Welcome back to yet another foray into the wonderful world of Haskell, where this time, we're turning our gaze towards basic data types. The book presents this quote by Robin Milner at the start of the chapter:
There are many ways of trying to understand programs. People often rely too much on one way, which is called “debugging” and consists of running a partly-understood program to see if it does what you expected. Another way, which ML advocates, is to install some means of understanding in the very programs themselves.
Note that in this context, ML is not machine learning, but the language known as ML. If you don't know who Robin Milner was (I didn't), Wikipedia has this to say about him:
He developed LCF, one of the first tools for automated theorem proving. The language he developed for LCF, ML, was the first language with polymorphic type inference and type-safe exception handling. In a very different area, Milner also developed a theoretical framework for analyzing concurrent systems, the calculus of communicating systems (CCS), and its successor, the pi-calculus.
So I'd say he's reasonably influential.
But that's enough of a history lesson for now. Let's move on to some more theory, shall we?
Let's start right at the beginning: what are types? In short, types are sets of values, where a set is a collection with no duplicate entries. Every expression---when evaluated---yields a value, and that value has a type. Each type is defined by the members (the values) that inhabit (are part of) the type. Boolean values, for instance, have two inhabitants:
False. Integers have an infinite number of inhabitants, every integral value from negative infinity to positive infinity.
Types help us group multiple values that have something in common. It could be a specific domain or model or something more abstract. Whatever it is, they let us talk about a set of values under a shared name.
Data Types in Haskell are defined by data declarations. Data declarations consist of a type constructor with belonging data constructors.
A type constructor is the name of the type, such as
Int, and is used in type signatures, or the type level of your code.
Data constructors are the values that inhabit the type they are defined in, such as the aforementioned
Bool. These are the values that appear in the logic of your code, at the so-called term level.
Let's look at the data declaration of the
Bool data type to make it clearer:
data Bool = False | True --    
- The name of the type, aka the type constructor. This is what you see in function signatures.
- Data constructor for the value
- A pipe operator, indicating that this is a sum type (or discriminated union). This tells us that a
Boolvalue is either
True, but never both.
- Data constructor for the value
This is a very simple data declaration. Others may take arguments or use product types instead of sum types, but what they all have in common is the
data keyword followed by the name of the type. Most are followed by an equals operator and a set of data constructors.
We've worked a bit with numeric types previously, but never really looked too deeply at them. They work quite differently in Haskell than what they do in a number of other languages, so let's see what we can find.
The book lists two categories of numbers, which each have multiple types:
- Integral numbers ::
- Fixed-precision integer. Has a minimum and a maximum value.
- Integer that supports arbitrarily large and small numbers.
- Fractional numbers ::
- Single precision floating point numbers. Like in most languages, limited in precision.
- Double-precision floating point number. Twice as many bits as a
Float. Still limited in precision.
- Fractional number that represents a ratio of two integers. A value such as
x / y :: Rationalwill be a value consisting of two
Integervalues and represents the ratio of $x$ to $y$. Arbitrarily precise.
- Space efficient and almost arbitrarily precise. Represented using scientific notation, storing the coefficient as an
Integerand the exponent as an
Intis bounded, there is technically a limit to the size of the number, but hitting it is very unlikely. More efficient, but less precise than
All these data types have instances of a typeclass called
Num (more about typeclasses in the coming chapters), which lets us use any which one of them for most functions that only require standard operations.
Choosing an integral data type
As noted above,
Integer can represent arbitrarily large integers, while
Int (and the related
Int16, and so forth) can only express as many as their predefined size allows them, falling back to overflow wrapping if they exceed their limits.
Because of this, the book advises us to always choose
Int unless we really understand the limitations and the additional performance has an impact.
Of the common fractional types mentioned above, three of them come bundled with GHC, while the fourth,
Scientific, comes from an external library.
As was the case with the integral numbers, we are advised to stick to the most precise or efficient type we can use. This would usually be
Scientific, but for certain scenarios, such as graphics programming, you might want to use
Certain mathematical operations require the inputs to be fractional rather than just numeric; division being a great example. The type of the division operator is
(/) :: Fractional a => a -> a -> a. The
Fractional a => part tells us that this is a typeclass constraint, and that the type
a must have an instance of this typeclass.
Num typeclass is a supertype of the
Fractional typeclass, meaning that to create an instance of
Fractional, the type must also define an instance of
Num. In other words: all
Fractional instances are also instances of
Num, while the opposite is not true.
Like most languages, Haskell defines a number of operators to perform comparisons. Most of these are the same as you see in most languages, but the inequality operator may be a bit different. Anyway, for your viewing pleasure, here they are:
- Less than
- Greater than
- Less than or equal to
- Greater than or equal to
- Equal to
- Not equal to
The equality operators take two values that have instances of the
Eq typeclass, and the comparison operators take instances of the
Ord typeclass, which is a subclass of
Eq. As such, an
Ord instance can be used with any of the above operators, while an
Eq instance can only be used with the equality operators.
if ... then ... else ...
Haskell operates with ~if~ expressions, rather than ~if~ statements. What this means is that the expression itself evaluates to a value; so you can assign a variable to an
if expression, for example.
The syntax is:
if CONDITION then EXPR_A else EXPR_B
An assignment might look like this:
-- the value of a will evaluate to 2 let a = if True then 2 else 0
This is similar to how the ternary operator (
?:) works in most C-like languages, and much in the same way, you must have an else clause. This is because it is an expression, and an expression must evaluate to a value.
Tuple type is a way to pack multiple values together and pass them around. Tuple syntax is the same at both type and term levels and is easily distinguished from other types by its distinct look:
(x, y), a set of parentheses surrounding two or more values or types separated by commas.
At the type level, the tuple would contain types---e.g.
(Int, Bool)---, while at the term level, it would contain expressions:
(x, y), etc.
A tuple has a fixed number of constituents (parts, members). A tuple with two element is called a pair or a tuple (
(a, b)), while a three-element tuple is known as a three-tuple or a triple (
(a, b, c)). The number of constituents is known as the tuple's arity.
The data declaration for a pair can be found by looking up the
(,) operator in the GHCi:
data (,) a b = (,) a b
This is quite different from what we saw earlier with the
Bool declaration: It takes two parameters, represented by type variables
b, and it's a so-called product type, rather than a sum type, meaning that the number of possible combinations is the product of the number of possible values for each of the types it's applied to.
The two-element tuple comes with some default convenience functions in Haskell,
snd, which gets the first or second element of the tuple, respectively. Their type signatures are:
fst :: (a, b) -> a snd :: (a, b) -> b
This way to describe the types of the tuples also extends to when we use them in the function implementation, where we can destructure it and name and use the elements directly. The two above functions can be implemented like this:
fst :: (a, b) -> a fst (a, b) = a snd :: (a, b) -> b snd (a, b) = b
This destructuring and matching on the shape of the data is known as pattern matching, and is something we'll see a lot more of further down the road.
We had a little run-in with lists last time, and they're only barely mentioned in this chapter. The reason is that there's a full chapter devoted to lists coming up, so we'll save the deep-dive for then.
The one thing they do mention this time, though, is that, much like tuples, lists have special syntax too, where the type or the values are enclosed in square brackets, e.g.
[Integer] for describing the type of a list of integral values, and
[1, 2, 3, 4] to construct one of them.
As opposed to tuples, they can only take values of a single type, and there is no limit to how many or how few elements a list can have.
Names and variables
To close off the chapter, there is a little section on naming conventions in Haskell.
Functions, when used as parameters, are typically labeled with variables starting with
f, continuing alphabetically (
h, ....). May also have numbers (
f2) or an ending apostrophe (
f'~)---pronounced /f-prime/---if they're closely related to a function labeled ~f.
They may also be given variable names that describe what they do, such as a function fetching text from some source being called
Type variables, the ones in type signatures, are commonly given names starting from
a and going along alphabetically.
Arguments to functions usually go from
x, though they might also have some other letter assigned to serve a mnemonic function (such as
n for a number or
r for the radius of a circle).
Of course, variable names don't have to be single letters, but in smaller programs they usually are. In more domain-specific code, it might make more sense to use longer names.
If you have a list of things, it's common to use a name such as
xs, especially if one element from the list might be called
x (as if
xs was the plural form of
This is demonstrated by the commonly seen construct
(x:xs) which is a way to pattern match on a list using the cons operator we saw last time. This can be used to destructure the list if it has at least one element, assigning the variable
x to the first element, and
xs to the rest of the list:
sum :: Num a => [a] -> a sum  = 0 sum (x:xs) = x + sum xs
The above function will recursively sum a list by first checking whether it's empty. If it is, it returns $0$. Otherwise, it returns the sum of the current head and the value of the sum of the rest of the list.
To tide us over until next time, this chapter provides a bunch of juicy definitions!
- An ordered grouping of values. You cannot have a tuple with only one element, but the type unit or
()can be thought of as a zero-element tuple.
- A set of operations defined for a polymorphic type. If a type has an instance of a typeclass, it can be used in any function requiring a value of that typeclass. In Haskell, you can only define one instance of any one typeclass for a given data type. In other words, type
acan have at most one instance of
- Data constructors
- How we create values that inhabit a type in Haskell. Can be constant values (
True), or take one or more arguments (
(,) a b)
- Type constructors
- The name of a type. Can only be used in type signatures
- Data declarations
- Definition of a data type in Haskell. Consists of a type constructor and zero or more data constructors.
- Type alias
- A way to refer to a type constructor or type constant by a different name, usually to communicate something more specific.
type Name = Stringwould let you use
Namein place of
Stringin your code, allowing you to say something more about what kind of string you want on the type level.
- The number of arguments a function accepts. In Haskell (lambda calculus), all functions are actually 1-arity (unary) due to currying, but we tend to let Haskell abstract that away and think of the total number of arguments needed to fully evaluate the function.
- The ability to write code using values that may be one of several, or any, type. There are two types of polymorphism: parametric and constrained.
Parametric polymorphism is when a function can take any type of value because it doesn't use any information about it. Parametric polymorphism is easily recognised by there being no constraints placed on the type variable.
Constrained (or bounded) polymorphism is when the valid set of types is constrained by a typeclass, such as for the equality operators that require their arguments to have an instance of
Whew; well done for making it to the end! There was a lot to get through this time around, but it's been very insightful. Next time we'll be looking more at types in Haskell, exploring constraints and polymorphism.