Quick tour of Haskell

To start developing in Haskell, we first need to be familiar with a few basic features of the language: laziness, datatypes, pattern matching, type classes, and the basic notion of monads.

Laziness

Haskell is a language with lazy evaluation. From a programmer's point of view, this means that a value is evaluated if and only if it is really needed. Imperative languages usually have strict evaluation, that is, function arguments are evaluated before function application. To see the difference, let's take a look at a simple expression in Haskell:

let x = 2 + 3

In a strict or eager language, the 2 + 3 expression would be immediately evaluated to 5, whereas in Haskell only a promise to perform this evaluation is created, and the value is computed when it is actually needed. In other words, this statement merely introduces a definition of x that might be used afterwards, unlike in a strict language, where it is an operator that assigns the computed value 5 to a memory cell named x.

This strategy also allows sharing of evaluations: since laziness defers a computation until its result is needed, that result can be memoized, which can reduce the running time by an exponential factor compared to strict evaluation. Laziness also lets us manipulate infinite data structures. For instance, we can construct an infinite list of natural numbers as follows:

let ns = [1..]

Moreover, we can manipulate it as if it were a normal list, although some caution is needed, as it is possible to run into an infinite loop. We can take the first five elements of this infinite list by means of the built-in function take as follows:

take 5 ns

Running this example in GHCi prints [1,2,3,4,5].
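The snippets above can be collected into a small runnable program; the names follow the text, while the main wrapper is added here purely for illustration:

```haskell
-- An infinite list, built lazily; only the demanded prefix is ever evaluated
ns :: [Integer]
ns = [1 ..]

main :: IO ()
main = print (take 5 ns)  -- prints [1,2,3,4,5]
```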

Functions as first-class citizens

The notion of a function is one of the core ideas in functional languages, and Haskell is no exception. The definition of a function consists of a function body and an optional type declaration. For instance, the take function is defined in Prelude as follows:

take :: Int -> [a] -> [a]
take = ...

Here, the type declaration says that the function takes an integer and a list of values of type a, and returns a new list of the same type. Haskell also allows partial application of a function. For example, you can construct a function that takes the first five elements of a list as follows:

take5 :: [a] -> [a]
take5 = take 5

Functions are themselves values, that is, you may pass a function as an argument to another function. Prelude defines the map function as a higher-order function:

map :: (a -> b) -> [a] -> [b]

map takes a function and applies it to each element of a list. Thus, functions are first-class citizens in the language, and it is possible to manipulate them as if they were ordinary values.
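Partial application and higher-order functions can be exercised together in a short sketch; take5 comes from the text, while the map example is our own:

```haskell
-- Partial application: supplying only the first argument of take
take5 :: [a] -> [a]
take5 = take 5

main :: IO ()
main = do
  print (take5 "abcdefg")      -- "abcde"
  print (map (* 2) [1, 2, 3])  -- [2,4,6]
```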

Datatypes

Datatypes are at the core of a strongly typed language such as Haskell. The distinctive feature of Haskell datatypes is that they are all immutable; that is, after an object is constructed, it cannot be changed. This might seem odd at first sight, but in the long run it has several positive consequences. First, it enables parallelization of computations. Second, all data is referentially transparent; that is, there is no difference between a reference to an object and the object itself. These two properties allow the compiler to reason about code optimization at a higher level than a C/C++ compiler can.

Now, let's turn to our domain model to show how datatypes are defined in Haskell. Consider a data model for a quote-driven market, where the market maker posts bid and ask quotes. We can express these facts in Quote.hs. The following are the three common ways to define datatypes in Haskell. The first declaration just creates a synonym for an existing datatype, and the type checker won't prevent you from using Integer instead of TimeStamp. You can also use a value of the TimeStamp type with any function that expects to work with an Integer.
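The listing itself is not reproduced in this chapter, but based on the description, Quote.hs might contain declarations along these lines (the exact field layout is our assumption):

```haskell
-- Synonym: TimeStamp and Integer are interchangeable for the type checker
type TimeStamp = Integer

-- newtype: a genuinely distinct type; a bare Double is rejected by the compiler
newtype Price = Price Double

-- Algebraic data type with two data constructors
data Quote = AskQuote TimeStamp Price
           | BidQuote TimeStamp Price

main :: IO ()
main =
  case AskQuote 1 (Price 101.5) of
    AskQuote _ (Price p) -> print p   -- prints 101.5
    BidQuote _ (Price p) -> print p
```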

The second declaration creates a new type for prices, and you are not allowed to use Double instead of Price. The compiler will raise an error in such cases and thus enforce the correct usage.

The last declaration introduces an algebraic data type (ADT) and says that the Quote type might be constructed either by the AskQuote data constructor or by BidQuote, with a timestamp and a price as its parameters. Quote itself is called a type constructor. A type constructor might be parameterized by type variables. For example, the Maybe and Either types are frequently used standard types and are defined in the Quotes2.hs file. Here, a and b are type variables; that is, any type can be substituted for them.

The Maybe type is often used to represent a computation that might have ended without a result. Since the logarithm is defined only on the positive half of the real line, we can define a type-safe version of the logarithm function in Log.hs. This version will always remind you to handle the NaN case in your code. What is more, the compiler performs an exhaustiveness check to see whether all paths are covered, and it will warn you if they are not. Please also note that we used a new syntactic construction, a guard, to define the conditions under which each equation applies. You can use any expression that evaluates to Bool in a guard. The compiler also checks that all the right-hand sides have the same type, even if the type is not declared explicitly; this ensures that the function type is unambiguous.

The Either type is commonly used for functions that can result either in an error or in a value, with the Right constructor holding the "right" values, as in LogEither.hs. In fact, data constructors are normal functions in Haskell that return, or construct, a new object. If you query GHCi about the type of the Left or Right constructor, it will print the following types:

Prelude> :t Left
Left :: a -> Either a b
Prelude> :t Right
Right :: b -> Either a b

This fact allows us to pass a data constructor as a function argument, create a partially "constructed" datatype, and do everything else that is possible with ordinary functions in Haskell.
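Log.hs and LogEither.hs are not shown here, but a minimal sketch consistent with the description could look like this (the function names are our assumption):

```haskell
-- Maybe version: Nothing signals that log is undefined for the input
logSafe :: Double -> Maybe Double
logSafe x
  | x > 0     = Just (log x)
  | otherwise = Nothing

-- Either version: Left carries an error message, Right the result
logEither :: Double -> Either String Double
logEither x
  | x > 0     = Right (log x)
  | otherwise = Left "log is undefined for non-positive arguments"

main :: IO ()
main = do
  print (logSafe 1)       -- Just 0.0
  print (logSafe (-1))    -- Nothing
  print (logEither (-1))  -- Left "log is undefined for non-positive arguments"
```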

Type classes

Type classes in Haskell are not classes in the object-oriented sense. They are more similar to interfaces with optional default implementations. You can find similar features in other languages as traits, mixins, and so on, but unlike those, this feature in Haskell enables ad-hoc polymorphism; that is, a function can be applied to arguments of different types. This is also known as function overloading, or operator overloading. A polymorphic function can specify different implementations for different types.

In principle, a type class consists of function declarations over some type. The Eq type class is a standard type class that specifies how to compare two objects of the same type, as given in Eq.hs. Here, Eq is the name of the type class, a is a type variable, and == and /= are the operations defined in the type class. This definition means that some type a is an instance of the Eq class if it defines the == and /= operations. Moreover, the definition provides a default implementation of the /= operation, so if you decide to implement this class for a datatype, you only need to provide an implementation of a single operation. For example, we might make Price an instance of the Eq type class as given in EqPrice.hs.

We have just defined a quite naïve implementation of the Eq type class. With such a definition, we can start to use functions that work over any type that implements the Eq type class. For instance, Prelude has the following functions for searching in lists that rely on an implementation of Eq:

elem :: (Eq a) => a -> [a] -> Bool
notElem :: (Eq a) => a -> [a] -> Bool
lookup :: (Eq a) => a -> [(a,b)] -> Maybe b

The constraint (Eq a) => requires the type a to have an Eq instance. The first function checks whether an element is in the list. The second is its negation. The last one looks up a key in a list of tuples.
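EqPrice.hs could be sketched as follows, together with one of the Eq-constrained list functions; the instance body shown is our assumption of its contents:

```haskell
newtype Price = Price Double

-- A naive Eq instance: compare the wrapped doubles;
-- /= comes from the default definition in the class
instance Eq Price where
  Price a == Price b = a == b

main :: IO ()
main = do
  print (Price 1.0 == Price 1.0)                   -- True
  print (Price 1.0 `elem` [Price 2.0, Price 1.0])  -- True
```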
It is important to note that type classes help to write very generic functions that work across many types by putting only a minimal required constraint on them. In object-oriented languages, this is usually achieved with interfaces, though Haskell additionally allows you to extend a library-defined datatype by implementing any type class for it. There are a large number of useful type classes defined in Prelude. For example, Eq is used to define equality between two objects, Ord is used to specify a total ordering, Show and Read are used to introduce a string representation of an object, and the Enum type class is used to describe enumerations, that is, datatypes with nullary constructors. It would be quite tedious to implement these rather trivial but useful type classes for each datatype, so Haskell supports automatic derivation of most of the standard type classes. This is given in Price.hs. Now, objects of the Price type can be converted from/to a string by means of the read/show functions and compared using the equality operators. We will not cover all the details of derivation here; those who want to know them can look them up in the Haskell language report.
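Price.hs presumably relies on the deriving clause; a minimal sketch, with the derived classes listed in the text:

```haskell
-- The compiler generates the Eq, Ord, Show, and Read instances for us
newtype Price = Price Double
  deriving (Eq, Ord, Show, Read)

main :: IO ()
main = do
  print (show (Price 1.5))           -- "Price 1.5"
  print (read "Price 1.5" :: Price)  -- Price 1.5
  print (Price 1.0 < Price 2.0)      -- True
```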

Pattern matching

Pattern matching is a core feature that makes Haskell so concise and readable. In simple words, you specify a pattern to which some input should conform and then provide an implementation for that case. Consider the classic example of factorial, as given in Pattern1.hs. We can see that the factorial function accepts values that can be compared for equality, specified by the Eq type class, and that belong to the Num type class, that is, numbers. Then, if the input is zero, the output is one; for all other inputs, the function is defined recursively, that is, in terms of itself.

Recursion is an important construct in Haskell, because the language supports a declarative description of computations; you specify what should be computed, not how it should be done. That is why there are no for and while loops; you should use recursion instead. The compiler has optimization techniques for recursion, so it doesn't introduce the penalty, usual for imperative languages, of stack overflows. Although imperative languages usually have tail recursion optimization built into their compilers, GHC supports a generalized tail call optimization that covers many more cases. So it is usually better to pay attention to the readability and correctness of recursive code than to strive for pure tail recursion.

The order of patterns is quite important; here we explicitly state that the first equation should be tried before the second. If you swap these two declarations, the compiler will display the following warning:

Warning: Pattern match(es) are overlapped
In an equation for `factorial': factorial 0 = ...

The compiler warns you that the last pattern has already been matched by the previous declaration, as given in Pattern2.hs. The structure of a datatype can also be matched. Here, we provide a "special" treatment of zero prices and only then the standard implementation, in Pattern3.hs.
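Pattern1.hs is the classic two-equation definition; the sketch below matches the description in the text:

```haskell
-- The equations are tried top to bottom: the 0 case first, then the general case
factorial :: (Eq a, Num a) => a -> a
factorial 0 = 1
factorial n = n * factorial (n - 1)

main :: IO ()
main = print (factorial 5)  -- prints 120
```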
It is also permitted to omit parts that are not important for the match, as is done with TimeStamp here, and nested matching can be used to deconstruct inner datatypes.
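The kind of nested matching described can be sketched as follows; the types repeat the earlier quote model, and the function name is our assumption:

```haskell
type TimeStamp = Integer
newtype Price = Price Double

data Quote = AskQuote TimeStamp Price
           | BidQuote TimeStamp Price

-- The timestamp is irrelevant here, so it is matched with the wildcard _;
-- the nested pattern (Price 0) deconstructs the inner datatype
describe :: Quote -> String
describe (AskQuote _ (Price 0)) = "zero-priced ask"
describe (AskQuote _ _)         = "ask"
describe (BidQuote _ _)         = "bid"

main :: IO ()
main = putStrLn (describe (AskQuote 42 (Price 0)))  -- prints zero-priced ask
```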

Monads

The term monad comes from category theory, a branch of mathematics that formalizes mathematics itself and its concepts as a collection of objects and arrows. From the point of view of a Haskell programmer, however, it is better to think of a monad as an abstract datatype that represents a computation and supports a common chaining semantics. A monad defines how to chain computations of its type together, allowing a programmer to build computation pipelines that process data in steps. You can find monadic characteristics in many programming languages. Monads are sometimes called "programmable semicolons", by analogy with imperative languages, where semicolons are used to chain individual statements.

The practical side of monads is that they allow us to hide state. In fact, they are strictly required only for I/O code, but using monads simplifies any code that passes state around: we don't need to write functions that take an input and a state and return an output and, possibly, a new state.

A monad consists of a type constructor and two operations, return and bind (>>=). The return operation injects a plain value into the monad. The bind operation does the reverse, that is, it extracts the value and passes it to the next function in the pipeline. It is worth mentioning, though, that monads are not required to provide a way to get values out of them. The real-world definition of a monad is extended by the fail operation and some shortcuts for the bind operation, as in MonadPrelude.hs.

The Prelude's lists and Maybe are also monads. We can use the Maybe type as a primitive form of checked exception: if the computation fails at any point, the rest of the computation is skipped and Nothing is returned; otherwise, if all steps succeed, the final result is Just x for some value x. Consider a simple example of the summation of two Maybe values.
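The essence of MonadPrelude.hs can be sketched as follows; we shadow the Prelude names so that the fragment compiles on its own, and we omit the fail operation and the (>>) shortcut mentioned in the text:

```haskell
import Prelude hiding (Monad, return, (>>=))

-- Minimal monad interface: inject a value, and chain computations
class Monad m where
  return :: a -> m a
  (>>=)  :: m a -> (a -> m b) -> m b

-- Maybe as a monad: Nothing short-circuits the rest of the pipeline
instance Monad Maybe where
  return        = Just
  Nothing >>= _ = Nothing
  Just x  >>= f = f x

main :: IO ()
main = print (Just 3 >>= \x -> return (x + 1))  -- prints Just 4
```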
Let's try to write an explicit version of such a function in SumExplicit.hs. Here we give the function an explicit type: it takes two numbers of some Num type wrapped in the Maybe monad and returns their sum in case of success. As you can see, it is quite a lengthy and ugly definition of such a sum; we have to duplicate the same case logic for each argument. Let's take a look at how Maybe implements the monad type class in the Prelude, as shown in MaybePrelude.hs. If the value is Just x for some x, bind is defined as f applied to x. If the value is Nothing, then nothing can be passed on to f, and hence Nothing is returned. This means that once a function in a chain of monadic Maybe operations returns Nothing, the whole chain returns Nothing. The Just data constructor serves as the return operation. So let's rewrite the summation using the bind and return operations in SumMonad.hs. It becomes more cryptic, but it is definitely a shorter version of the same function. To avoid such secret messages, Haskell has the do syntactic sugar for monads, used in SumClear.hs. Now it is significantly better. The rule of thumb: remember that implicit bind operations are inserted between the lines of a do block.
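The three files can be sketched together as follows; the function names mirror the file names but are our assumption:

```haskell
-- SumExplicit.hs: the case logic is spelled out for each argument
sumExplicit :: Num a => Maybe a -> Maybe a -> Maybe a
sumExplicit mx my =
  case mx of
    Nothing -> Nothing
    Just x  -> case my of
                 Nothing -> Nothing
                 Just y  -> Just (x + y)

-- SumMonad.hs: the same pipeline written with bind and return
sumMonad :: Num a => Maybe a -> Maybe a -> Maybe a
sumMonad mx my = mx >>= \x -> my >>= \y -> return (x + y)

-- SumClear.hs: do-notation desugars to the version above
sumClear :: Num a => Maybe a -> Maybe a -> Maybe a
sumClear mx my = do
  x <- mx
  y <- my
  return (x + y)

main :: IO ()
main = do
  print (sumClear (Just 2) (Just 3))  -- Just 5
  print (sumMonad (Just 2) Nothing)   -- Nothing
```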