Variables and Data Types: How to Talk to Computers

22 June 2021

This article is part of the series A Gentle Introduction to Computer Science.


If you've read the previous articles in this series, we're almost ready to start looking at some interesting code examples. Before we do, there's one more topic to cover: data types. Since computers operate exclusively within the realm of ones and zeros, data types tell the computer how to interpret those ones and zeros, so it can distinguish between different kinds of data, such as numbers and text. Let's find out what this means.

In the previous article, we covered the most basic data type: Boolean. We also introduced an important concept for making use of data types: variables. In practice, variables and data types rely heavily on each other: Without variables, data types provide limited utility to a programmer. Without data types, computers cannot accurately present the value of a variable to the programmer.

Data types come in many varieties; some too specific to be relevant here, others too general. In order to simplify things, we'll use some of Python's built-in types to guide the discussion: $^1$

Numbers (int, float)
Strings (str)
Arrays (list)
Dictionaries (dict)

There are other interesting types as well, but these will cover most use cases and therefore will satisfy our needs when we start writing some real code.

Numbers

For numbers, we'll cover two types: Integers (int) and floating-point numbers (float).

Integers

The integer type is used to store whole numbers. Since computers store information as bits (ones and zeros), integers are one of the most basic types. As we saw at the start of the series, whole numbers are first-class citizens in computer language. The range of an integer is limited by the number of bits of memory allocated to it. $^2$ In many programming languages (Java, for example), the default integer type uses 32 bits, allowing for whole numbers in the range $-2,147,483,648$ through $2,147,483,647$. Python is an exception: its int type automatically uses as many bits as a number needs, so it is limited only by the computer's available memory.
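The 32-bit range above comes from the way most languages store signed integers (two's complement): one bit effectively holds the sign, leaving $2^{31}$ values for the negative numbers and $2^{31}$ for zero and the positive numbers. A minimal Python sketch can compute those bounds and show that Python's own int keeps going past them:

```python
# Bounds of a signed 32-bit integer (two's complement).
BITS = 32
smallest = -(2 ** (BITS - 1))
largest = 2 ** (BITS - 1) - 1
print(smallest, largest)    # -2147483648 2147483647

# Python's int has no fixed size, so stepping past the 32-bit limit is fine.
beyond = largest + 1
print(beyond)               # 2147483648
print(beyond.bit_length())  # 32: this number no longer fits in 31 value bits
```

In a language with fixed 32-bit integers, the same `largest + 1` would overflow instead.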

Floating-Point Numbers

The main strength of the integer type is its ability to store large whole numbers exactly (a 32-bit integer goes beyond 2 billion). It has a big weakness, though, and that weakness is fractions ($\frac12$, $\frac34$, etc.). This is where the floating-point number (or "float") type comes in.

The float type is used to store numbers with a fractional part, written as decimals (such as $1.0$ or $0.5$). This gives it an obvious advantage over the integer type, but it has limitations of its own.
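In Python 3, the difference shows up as soon as we divide: `//` (floor division) keeps the result a whole-number int and discards the fraction, while `/` always produces a float. A minimal sketch:

```python
# Floor division keeps whole numbers only: the fraction is discarded.
whole = 3 // 4
print(whole, type(whole))        # 0 <class 'int'>

# True division returns a float, which can hold the fractional part.
fraction = 3 / 4
print(fraction, type(fraction))  # 0.75 <class 'float'>

# Even a division with no remainder gives a float when using "/".
print(4 / 2)                     # 2.0
```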

At the beginning of the 21st century, most PCs had 32-bit processors. This means that, by default, arithmetic operations were performed on 32 bits (one word) at a time. This is one reason why many languages use 32 bits for their basic integer and float types. For increased precision, most programming languages also implement a "double-precision floating-point number" (or "double"), which uses 64 bits of memory. Python's float type is always a double.
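We can check this from Python itself with two standard-library modules: `struct` converts a number to raw bytes in a chosen format (`"d"` for a 64-bit double, `"f"` for a 32-bit single-precision float), and `sys.float_info` describes Python's float type. A minimal sketch:

```python
import struct
import sys

# Round-trip 0.1 through 8 bytes (double precision) and 4 bytes (single).
as_double = struct.unpack("<d", struct.pack("<d", 0.1))[0]
as_single = struct.unpack("<f", struct.pack("<f", 0.1))[0]

print(struct.calcsize("<d"))    # 8 bytes = 64 bits
print(as_double == 0.1)         # True: nothing lost, Python's float is a double
print(as_single)                # 0.10000000149011612: 32 bits keep fewer digits
print(sys.float_info.mant_dig)  # 53 significand bits, as in an IEEE 754 double
```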

Since humans write decimal numbers in base 10 while floats are stored in base 2 (binary), many decimal numbers (such as $0.1$) cannot be stored exactly; the float is only a close approximation of the value assigned by the programmer. A common type of error among beginners, for example, occurs when comparing floats. The following snippet shows the output we get when using Python to add and compare floats: $^3$, $^4$

$ python
> 1.0 + 2.0 == 3.0
True
> 0.1 + 0.2 == 0.3
False
> 0.1 + 0.2
0.30000000000000004

Going into the details of why this type of error occurs would require quite a bit of effort and would bring us no closer to preventing it. Instead we can simply rely on a rule of thumb: We should try to avoid floats where we need to make strict equality comparisons.
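When floats can't be avoided, a common way to follow this rule of thumb is to compare them with a small tolerance instead of `==`. Python's standard library provides `math.isclose` for this purpose; a minimal sketch:

```python
import math

total = 0.1 + 0.2

print(total == 0.3)              # False: the exact comparison fails
print(math.isclose(total, 0.3))  # True: equal within a small relative tolerance

# rel_tol sets how close is "close enough", relative to the values' size.
print(math.isclose(total, 0.3, rel_tol=1e-20))  # False: tolerance too strict
```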

Strings

The string data type is interesting. First of all, letters and digits are entirely human concepts with little or no pattern to them. This means that, in order for a computer to store the letters of an alphabet, a binary value has to be assigned to every letter (or "character"). For a long time, this was mostly done using the ASCII (American Standard Code for Information Interchange) character encoding.

Since it was designed for early American computer systems, ASCII uses only 7 bits to encode characters. Those 7 bits give $2^7 = 128$ unique combinations, which is enough to encode each letter of the English alphabet in upper and lower case, the digits 0-9, several punctuation characters, and a number of control characters that were relevant to computers and/or printers in the 1960s.

ASCII doesn't cover various African and Asian scripts, special characters in Latin-derived alphabets (e.g. ö, ü, ß), or various other symbols and emoji. These are all covered by the Unicode standard. Its most widely used encoding, UTF-8, uses between 8 and 32 bits (1 to 4 bytes) per character.
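Python's built-in `ord` function returns a character's Unicode number (its "code point"), and `str.encode("utf-8")` returns the bytes UTF-8 uses to store it. A minimal sketch comparing an ASCII letter with a few characters ASCII can't represent:

```python
# The for loop (covered later in the series) repeats the indented lines
# once for each character in the list.
for char in ["A", "ö", "€", "😀"]:
    code_point = ord(char)
    size_in_bits = len(char.encode("utf-8")) * 8
    print(char, code_point, size_in_bits)

# Prints:
# A 65 8
# ö 246 16
# € 8364 24
# 😀 128512 32

print("Hello".isascii())  # True: every character fits in ASCII
print("Größe".isascii())  # False: ö and ß are outside ASCII
```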

Now that we know how letters are represented as characters (today, usually using Unicode), we can put them together to get strings. In other words, a string is a sequence of characters. Most programming languages use quotation marks to denote strings (either 'single' or "double"; Python allows both).
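Because a string is a sequence of characters, Python lets us count its characters with `len` and pick out a single character by its position, counting from 0. A minimal sketch (the variable name `greeting` is only illustrative):

```python
greeting = "Hello"

print(len(greeting))    # 5 characters
print(greeting[0])      # H (positions start at 0)
print(greeting[-1])     # o (negative positions count from the end)
print(list(greeting))   # ['H', 'e', 'l', 'l', 'o']

# Both quote styles create exactly the same kind of value.
print('Hello' == "Hello")  # True
```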

Strings can be combined using concatenation. In Python, this can be achieved using the + operator (notice how we create the variables hello and world to store the strings 'Hello' and 'World' for later use):

$ python
> hello = 'Hello'
> world = 'World'
> hello + ' ' + world + '!'
'Hello World!'
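Concatenation with `+` only works between strings: adding a number to a string raises a `TypeError`, so the number has to be converted with `str()` first. This is a small example of data types mattering in practice. A minimal sketch (the variable `year` is only illustrative):

```python
year = 2021

try:
    message = "Written in " + year  # str + int: Python refuses
except TypeError as error:
    print("TypeError:", error)

message = "Written in " + str(year)  # convert the int to a str first
print(message)  # Written in 2021
```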

Arrays (lists)

An array is a container for multiple values (or "elements"). For performance reasons (and perhaps simplicity), many programming languages require all elements in an array to be of the same type. Python's built-in list type has no such requirement. This can make it slower in some cases, but also makes it easier and more flexible for the programmer.

Lists are denoted with square brackets and can be populated with raw values and/or existing variables. A list can also contain more lists, which are called nested lists:

$ python
> hello = 'Hello'
> world = 'World'
> [hello, world]
['Hello', 'World']
> a = 1
> c = 3
> [a, 2, c]
[1, 2, 3]
> empty_list = []
> one_two = [1, 2]
> three_four = [3, 4]
> [empty_list, one_two, three_four]
[[], [1, 2], [3, 4]]
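To illustrate the flexibility of Python's list type, the following minimal sketch puts values of several types into one list, reads elements by position (counting from 0), and shows that a list can grow after it is created. The variable names are illustrative only:

```python
# One list holding an int, a float, a str, a bool and another list.
mixed = [42, 3.14, "text", True, [1, 2]]
print(type(mixed[0]))  # <class 'int'> (position 0 is the first element)
print(type(mixed[2]))  # <class 'str'>
print(len(mixed))      # 5

# Nested lists: index the outer list first, then the inner one.
one_two = [1, 2]
three_four = [3, 4]
nested = [[], one_two, three_four]
print(nested[2][0])    # 3

# Lists can change after creation. nested holds the same list object as
# one_two, so appending to one_two is visible through nested as well.
one_two.append(5)
print(nested)          # [[], [1, 2, 5], [3, 4]]
```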

Lists have countless practical uses, especially once we learn to iterate over them. To iterate over a list means to execute a block of code once for every item (element) in the list. We will see this in action once we talk about loops in the next article.

Dictionaries

Compared to the previous data types, dictionaries are not built into every programming language, but they deserve an honorable mention due to their extensive use in web-related technologies. Dictionaries are key-value based, meaning that each value in a dictionary is stored under a unique key, which is then used to look the value up. Let's see what this looks like:

$ python
> my_dict = { "a": 1, "b": 2, "c": 3 }
> my_dict["b"]
2
> this_article = {
... "category": "Computer Science",
... "year": 2021,
... "author": "Bjarki Sigurðsson",
... "tags": ["Binary", "Programming"],
... }
> this_article["tags"]
['Binary', 'Programming']
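Looking up a key that doesn't exist with square brackets raises a `KeyError`. The `in` operator and the `dict.get` method are two standard ways to handle keys that may be missing, and assigning to a new key adds it. A minimal sketch using a shortened version of the dictionary above:

```python
this_article = {"category": "Computer Science", "year": 2021}

# Checking for a key before using it.
print("tags" in this_article)        # False: no "tags" key yet
print(this_article.get("tags", []))  # []: get() returns a default instead

# Square brackets raise a KeyError for a missing key.
try:
    this_article["tags"]
except KeyError:
    print("No 'tags' key")

# Assigning to a new key adds a key-value pair.
this_article["tags"] = ["Binary", "Programming"]
print(this_article["tags"])  # ['Binary', 'Programming']
print(len(this_article))     # 3 key-value pairs
```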

How does this relate to the web, then? Some websites are stored in their entirety on a single server. Usually, however, they rely on data from other parts of the internet, which they fetch from API endpoints. These endpoints can be hosted on the same server as the website itself or somewhere else entirely. Just as a browser uses HTTP requests to load a web page, websites use HTTP requests to fetch data from endpoints. The responses to these requests can come in a number of formats, one of the most popular being JSON.

JSON stands for JavaScript Object Notation, and a JSON object looks very similar to a Python dictionary. JSON is popular because it is simple, flexible, and easy for humans to read, while still being relatively fast for computers to process. One example of an API endpoint is a random quote generator hosted at api.quotable.io/random. A JSON response from this API will look something like this:

{
  "_id":"bNHmV_xSgwi",
  "tags":["famous-quotes"],
  "content":"Time you enjoy wasting, was not wasted.",
  "author":"John Lennon",
  "authorSlug":"john-lennon",
  "length":39
}
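In Python, the standard library's `json` module turns a JSON document like this into an ordinary dictionary. The minimal sketch below parses the sample response above from a string rather than fetching it from the API, so it works offline:

```python
import json

# The sample response from above, stored as a string.
response_text = """
{
  "_id":"bNHmV_xSgwi",
  "tags":["famous-quotes"],
  "content":"Time you enjoy wasting, was not wasted.",
  "author":"John Lennon",
  "authorSlug":"john-lennon",
  "length":39
}
"""

quote = json.loads(response_text)  # JSON object -> Python dict

print(type(quote))      # <class 'dict'>
print(quote["author"])  # John Lennon
print(quote["tags"])    # ['famous-quotes']
print(len(quote["content"]) == quote["length"])  # True
```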

Armed with variables and these common data types, we can (and will) start writing some real computer programs in the next article!

$^1$ Since programming languages (like Python) are fundamentally an implementation of computer science concepts, these types have their counterparts in almost every programming language (not just Python).

$^2$ See Mental Arithmetic for Dummies (Using Binary Numbers) for examples of limitations imposed by the integer type.

$^3$ Try it yourself! The next article in this series will show you how to get started with an interactive Python console, either in your browser or in a terminal on your computer.

$^4$ Note that many computer languages use = to assign a value to a variable while == is used to check whether two values are equal.
