Getting sick of this cover image yet?

Welcome back to yet another foray into the wonderful world of Haskell, where this time, we're turning our gaze towards basic data types. The book presents this quote by Robin Milner at the start of the chapter:

There are many ways of trying to understand programs. People often rely too much on one way, which is called “debugging” and consists of running a partly-understood program to see if it does what you expected. Another way, which ML advocates, is to install some means of understanding in the very programs themselves.

Note that in this context, ML is not machine learning, but the language known as ML. If you don't know who Robin Milner was (I didn't), Wikipedia has this to say about him:

He developed LCF, one of the first tools for automated theorem proving. The language he developed for LCF, ML, was the first language with polymorphic type inference and type-safe exception handling. In a very different area, Milner also developed a theoretical framework for analyzing concurrent systems, the calculus of communicating systems (CCS), and its successor, the pi-calculus.

So I'd say he's reasonably influential.

But that's enough of a history lesson for now. Let's move on to some more theory, shall we?


Let's start right at the beginning: what are types? In short, types are sets of values, where a set is a collection with no duplicate entries. Every expression---when evaluated---yields a value, and that value has a type. Each type is defined by the members (the values) that inhabit (are part of) the type. Boolean values, for instance, have two inhabitants: True and False. Integers have an infinite number of inhabitants, every integral value from negative infinity to positive infinity.

Types help us group multiple values that have something in common. It could be a specific domain or model or something more abstract. Whatever it is, they let us talk about a set of values under a shared name.

Data declarations

Data Types in Haskell are defined by data declarations. Data declarations consist of a type constructor with belonging data constructors.

A type constructor is the name of the type, such as Bool or Int, and is used in type signatures, or the type level of your code.

Data constructors are the values that inhabit the type they are defined in, such as the aforementioned True and False for Bool. These are the values that appear in the logic of your code, at the so-called term level.

Let's look at the data declaration of the Bool data type to make it clearer:

data Bool = False | True
--   [1]    [2]  [3] [4]
  1. The name of the type, aka the type constructor. This is what you see in function signatures.
  2. Data constructor for the value False.
  3. A pipe operator, indicating that this is a sum type (or discriminated union). This tells us that a Bool value is either False or True, but never both.
  4. Data constructor for the value True.

This is a very simple data declaration. Others may take arguments or use product types instead of sum types, but what they all have in common is the data keyword followed by the name of the type. Most are followed by an equals operator and a set of data constructors.

Numeric types

We've worked a bit with numeric types previously, but never really looked too deeply at them. They work quite differently in Haskell than what they do in a number of other languages, so let's see what we can find.

The book lists two categories of numbers, which each have multiple types:

  • Integral numbers ::
  • Int
    Fixed-precision integer. Has a minimum and a maximum value.
    Integer
    Integer that supports arbitrarily large and small numbers.
  • Fractional numbers ::
  • Float
    Single precision floating point numbers. Like in most languages, limited in precision.
    Double
    Double-precision floating point number. Twice as many bits as a Float. Still limited in precision.
    Rational
    Fractional number that represents a ratio of two integers. A value such as x / y :: Rational will be a value consisting of two Integer values and represents the ratio of $x$ to $y$. Arbitrarily precise.
    Scientific
    Space efficient and almost arbitrarily precise. Represented using scientific notation, storing the coefficient as an Integer and the exponent as an Int. Because Int is bounded, there is technically a limit to the size of the number, but hitting it is very unlikely. More efficient, but less precise than Rational.

All these data types have instances of a typeclass called Num (more about typeclasses in the coming chapters), which lets us use any which one of them for most functions that only require standard operations.

Choosing an integral data type

As noted above, Integer can represent arbitrarily large integers, while Int (and the related Int8, Int16, and so forth) can only express as many as their predefined size allows them, falling back to overflow wrapping if they exceed their limits.

Because of this, the book advises us to always choose Integer over Int unless we really understand the limitations and the additional performance has an impact.

Fractional numbers

Of the common fractional types mentioned above, three of them come bundled with GHC, while the fourth, Scientific, comes from an external library.

As was the case with the integral numbers, we are advised to stick to the most precise or efficient type we can use. This would usually be Rational or Scientific, but for certain scenarios, such as graphics programming, you might want to use Float.

Certain mathematical operations require the inputs to be fractional rather than just numeric; division being a great example. The type of the division operator is (/) :: Fractional a => a -> a -> a. The Fractional a => part tells us that this is a typeclass constraint, and that the type a must have an instance of this typeclass.

The Num typeclass is a supertype of the Fractional typeclass, meaning that to create an instance of Fractional, the type must also define an instance of Num. In other words: all Fractional instances are also instances of Num, while the opposite is not true.

Comparison

Like most languages, Haskell defines a number of operators to perform comparisons. Most of these are the same as you see in most languages, but the inequality operator may be a bit different. Anyway, for your viewing pleasure, here they are:

~(<)~
Less than
~(>)~
Greater than
~(<=)~
Less than or equal to
~(>=)~
Greater than or equal to
~(==)~
Equal to
~(/=)~
Not equal to

The equality operators take two values that have instances of the Eq typeclass, and the comparison operators take instances of the Ord typeclass, which is a subclass of Eq. As such, an Ord instance can be used with any of the above operators, while an Eq instance can only be used with the equality operators.

if ... then ... else ...

Haskell operates with ~if~ expressions, rather than ~if~ statements. What this means is that the expression itself evaluates to a value; so you can assign a variable to an if expression, for example.

The syntax is:

if CONDITION
then EXPR_A
else EXPR_B

An assignment might look like this:

-- the value of a will evaluate to 2
let a = if True then 2 else 0

This is similar to how the ternary operator (?:) works in most C-like languages, and much in the same way, you must have an else clause. This is because it is an expression, and an expression must evaluate to a value.

Tuples

The Tuple type is a way to pack multiple values together and pass them around. Tuple syntax is the same at both type and term levels and is easily distinguished from other types by its distinct look: (x, y), a set of parentheses surrounding two or more values or types separated by commas.

At the type level, the tuple would contain types---e.g. (Int, Bool)---, while at the term level, it would contain expressions: (1, True), (x, y), etc.

A tuple has a fixed number of constituents (parts, members). A tuple with two element is called a pair or a tuple ((a, b)), while a three-element tuple is known as a three-tuple or a triple ((a, b, c)). The number of constituents is known as the tuple's arity.

The data declaration for a pair can be found by looking up the (,) operator in the GHCi:

data (,) a b = (,) a b

This is quite different from what we saw earlier with the Bool declaration: It takes two parameters, represented by type variables a and b, and it's a so-called product type, rather than a sum type, meaning that the number of possible combinations is the product of the number of possible values for each of the types it's applied to.

The two-element tuple comes with some default convenience functions in Haskell, fst and snd, which gets the first or second element of the tuple, respectively. Their type signatures are:

fst :: (a, b) -> a
snd :: (a, b) -> b

This way to describe the types of the tuples also extends to when we use them in the function implementation, where we can destructure it and name and use the elements directly. The two above functions can be implemented like this:

fst :: (a, b) -> a
fst (a, b) = a

snd :: (a, b) -> b
snd (a, b) = b

This destructuring and matching on the shape of the data is known as pattern matching, and is something we'll see a lot more of further down the road.

Lists

We had a little run-in with lists last time, and they're only barely mentioned in this chapter. The reason is that there's a full chapter devoted to lists coming up, so we'll save the deep-dive for then.

The one thing they do mention this time, though, is that, much like tuples, lists have special syntax too, where the type or the values are enclosed in square brackets, e.g. [Integer] for describing the type of a list of integral values, and [1, 2, 3, 4] to construct one of them.

As opposed to tuples, they can only take values of a single type, and there is no limit to how many or how few elements a list can have.

Names and variables

To close off the chapter, there is a little section on naming conventions in Haskell.

Functions, when used as parameters, are typically labeled with variables starting with f, continuing alphabetically (g, h, ....). May also have numbers (f1, f2) or an ending apostrophe (f'~)---pronounced /f-prime/---if they're closely related to a function labeled ~f.

They may also be given variable names that describe what they do, such as a function fetching text from some source being called txt.

Type variables, the ones in type signatures, are commonly given names starting from a and going along alphabetically.

Arguments to functions usually go from x, though they might also have some other letter assigned to serve a mnemonic function (such as n for a number or r for the radius of a circle).

Of course, variable names don't have to be single letters, but in smaller programs they usually are. In more domain-specific code, it might make more sense to use longer names.

If you have a list of things, it's common to use a name such as xs, especially if one element from the list might be called x (as if xs was the plural form of x).

This is demonstrated by the commonly seen construct (x:xs) which is a way to pattern match on a list using the cons operator we saw last time. This can be used to destructure the list if it has at least one element, assigning the variable x to the first element, and xs to the rest of the list:

sum :: Num a => [a] -> a
sum [] = 0
sum (x:xs) = x + sum xs

The above function will recursively sum a list by first checking whether it's empty. If it is, it returns $0$. Otherwise, it returns the sum of the current head and the value of the sum of the rest of the list.

Definitions

To tide us over until next time, this chapter provides a bunch of juicy definitions!

Tuple
An ordered grouping of values. You cannot have a tuple with only one element, but the type unit or () can be thought of as a zero-element tuple.
Typeclass
A set of operations defined for a polymorphic type. If a type has an instance of a typeclass, it can be used in any function requiring a value of that typeclass. In Haskell, you can only define one instance of any one typeclass for a given data type. In other words, type a can have at most one instance of Eq.
Data constructors
How we create values that inhabit a type in Haskell. Can be constant values (True), or take one or more arguments ((,) a b)
Type constructors
The name of a type. Can only be used in type signatures
Data declarations
Definition of a data type in Haskell. Consists of a type constructor and zero or more data constructors.
Type alias
A way to refer to a type constructor or type constant by a different name, usually to communicate something more specific. type Name = String would let you use Name in place of String in your code, allowing you to say something more about what kind of string you want on the type level.
Arity
The number of arguments a function accepts. In Haskell (lambda calculus), all functions are actually 1-arity (unary) due to currying, but we tend to let Haskell abstract that away and think of the total number of arguments needed to fully evaluate the function.
Polymorphism
The ability to write code using values that may be one of several, or any, type. There are two types of polymorphism: parametric and constrained.

Parametric polymorphism is when a function can take any type of value because it doesn't use any information about it. Parametric polymorphism is easily recognised by there being no constraints placed on the type variable.

Constrained (or bounded) polymorphism is when the valid set of types is constrained by a typeclass, such as for the equality operators that require their arguments to have an instance of Eq.

Summary

Whew; well done for making it to the end! There was a lot to get through this time around, but it's been very insightful. Next time we'll be looking more at types in Haskell, exploring constraints and polymorphism.



Thomas Heartman is a developer, writer, speaker, and one of those odd people who enjoy lifting heavy things and putting them back down again. Preferably with others. Doing his best to gain and share as much knowledge as possible.