I originally chose Erlang as the first functional programming language to attempt learning. However, the combination of a different programming paradigm, obscure Prolog-like syntax, and an unfamiliar concurrency model made it particularly difficult for me to see the big picture of functional programming.
I'm actually familiar with Haskell now. This very site is written in Haskell, and I have worked on a few other Haskell projects. That said, Haskell is also a programming language theory (PLT) playground. As a result, GHC contains various Haskell language extensions which I'm not entirely familiar with. This page of notes will therefore not cover the more basic aspects, and will instead cover language extensions, popular libraries, idiomatic patterns, modules, parallelism and concurrency, and so on.
It's possible to build an EPUB eBook of the GHC User Guide. You'll need the required dependencies to build the documentation; in my experience I ended up having to boot an Ubuntu VM because the xsltproc packages on Arch were too new. The documentation has to be built at least once to generate the necessary files, which unfortunately seems to entail building GHC itself. See this IBM article for more information on building eBooks with DocBook.
Many types can be reasoned about with a core set of type classes. This section goes over the Typeclassopedia.
A functor represents a computational context. The functor laws ensure that the
fmap operation doesn't change the structure of the container, only its elements. The functor laws aren't enforced at the type level; in fact, satisfying the first law automatically satisfies the second.
A way of thinking of
fmap is that it lifts a function into one that operates over contexts. This is evident from the type signature that explicitly emphasizes the fact that it is partially applied.
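A small sketch of this lifting view (the names here are mine, for illustration):

```haskell
-- fmap lifts a plain function into one operating over a functorial context.
-- Partially applying fmap to a function yields the "lifted" version of it:
liftedLength :: Functor f => f String -> f Int
liftedLength = fmap length

-- The functor laws, which instances are expected (but not forced) to satisfy:
--   fmap id      == id                (identity)
--   fmap (g . h) == fmap g . fmap h   (composition)
```

For example, `liftedLength` works uniformly over any context: a `Maybe String`, a list of `String`s, and so on.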
Applicative functors lie somewhere between functors and monads in expressivity. While functors allow the lifting of a normal function to a function on a computational context, applicative functors allow for the application of a function within a computational context. The type class includes a function
pure for embedding values in a default, effect-free context. Applicatives are functors by definition.
The <*> function is essentially function application within a computational context.
It's straightforward enough to create a convenient infix function that can be used like the regular function application operator
$ by embedding a regular function in a computational context before applying it to the parameter:
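A sketch of such an operator (the name <$$> is hypothetical, chosen here just for illustration):

```haskell
-- Embed the function in the context with pure, then apply it with <*>.
(<$$>) :: Applicative f => (a -> b) -> f a -> f b
f <$$> x = pure f <*> x
```

For example, `(+ 1) <$$> Just 1` behaves just like `($)` lifted into the `Maybe` context.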
However, this is exactly what fmap already does, based on its type. For this reason, the <$> convenience function is simply an infix alias for fmap.
Note that in a monadic context, the effects are applied sequentially, from left to right. This makes sense because function application is left-associative, and each application of
<*>, which for monads is
ap which itself is
liftM2 id, has to extract the pure function and value from their respective monad 'container' 1, thereby performing the monad's effects.
There also exist the functions <* and *>, which sequence both actions but keep only the value of the left or right argument, respectively. These are immensely useful for monadic parser combinators such as those found in the Parsec library. For example,
*> is useful in Parsec to run the first parser for its effects but return the result of the second, similar to
>> in a monadic context.
Finally, there's a function that replaces all of the locations in the input context with the provided value. This is useful in Parsec when we want to parse some input, and if the input is successfully consumed, return something else, such as parsing a URL-encoded space with
' ' <$ char '+'.
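A sketch combining these combinators in Parsec (a GHC boot library; the parser names are mine):

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- <$ replaces a successful parse with a fixed value:
-- a URL-encoded '+' becomes a space.
plusAsSpace :: Parser Char
plusAsSpace = ' ' <$ char '+'

-- *> discards the left result, <* discards the right result:
-- the quotes are consumed but only the contents are returned.
quoted :: Parser String
quoted = char '"' *> many (noneOf "\"") <* char '"'
```

Running `parse quoted "" "\"hi\""` consumes both quotes but yields only the string between them.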
Amdahl's law places an upper bound on the potential speedup available by adding more processors. The expected speedup can be described as a function of the number of processors and the proportion of the runtime that can be parallelized. The implication is that the potential speedup becomes increasingly negligible as processors are added, and more importantly, that most programs have a theoretical maximum amount of parallelism.
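With P the fraction of the runtime that can be parallelized and N the number of processors, the law is commonly written as:

```latex
S(N) = \frac{1}{(1 - P) + \frac{P}{N}}
\qquad\Longrightarrow\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - P}
```

So even with unlimited processors, the speedup is capped by the serial fraction 1 - P.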
The ThreadScope tool helps visualize parallel execution, particularly for profiling. To use ThreadScope, the program should be compiled with the
-eventlog GHC option and then run with the
+RTS -l option to generate the eventlog. ThreadScope is then run on this event log:
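The workflow might look like the following (the program and file names are illustrative):

```shell
# Compile with eventlog support, run with the -l RTS option to
# produce program.eventlog, then open that log in ThreadScope.
ghc -O2 -threaded -rtsopts -eventlog program.hs
./program +RTS -N2 -l
threadscope program.eventlog
```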
The Eval monad from Control.Parallel.Strategies expresses parallelism naturally. The
rpar combinator expresses that the argument can be evaluated in parallel, and
rseq forces sequential evaluation. Evaluation is to weak head normal form (WHNF), i.e. only the outermost type constructor is evaluated. Notice that the monad is completely pure, with no need for the IO monad.
For example, the following evaluates two thunks in parallel then waits for both to be evaluated before returning:
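A sketch of such a computation, assuming the parallel package is available (fib is just an illustrative expensive function):

```haskell
import Control.Parallel.Strategies (Eval, runEval, rpar, rseq)

-- An illustrative expensive computation.
fib :: Integer -> Integer
fib n = if n < 2 then n else fib (n - 1) + fib (n - 2)

-- Spark the first thunk, evaluate the second sequentially,
-- then wait for the spark before returning.
both :: Eval (Integer, Integer)
both = do
  a <- rpar (fib 25)
  b <- rseq (fib 26)
  rseq a
  return (a, b)

main :: IO ()
main = print (runEval both)
```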
Consider the following parallelized
An argument to
rpar is called a spark. The runtime collects sparks in a pool and distributes them to available processors using a technique known as work stealing. These sparks are to be converted, i.e. evaluated in parallel, though this may fail in a variety of ways. The
+RTS -s option provides information about the sparks and what became of them during the execution of the program in the following form:
- converted: evaluated in parallel
- overflowed: didn't fit in the spark pool
- dud: already evaluated, ignored
- GC'd: unused, garbage collected
- fizzled: evaluated some place else
The Control.DeepSeq module contains various utilities for forcing the evaluation of a thunk. It defines the
NFData, i.e. normal-form data, type class. This type class has only one method,
rnf, i.e. reduce to normal-form, which defines how the particular type may be evaluated to normal form, returning ().
The module also defines a
deepseq function which performs a deep evaluation to normal form, similar to
seq which performs shallow evaluation to WHNF. There's also a more convenient
force function which evaluates the argument to normal form and then returns the result.
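A sketch contrasting the two (the binding names are mine):

```haskell
import Control.DeepSeq (deepseq, force)

-- seq evaluates only to WHNF: the outermost constructor.
-- Here the (:) cell is enough, so the undefined inside is never touched.
shallowOk :: Int
shallowOk = [1, undefined :: Int] `seq` 42

-- force evaluates its argument all the way to normal form
-- and returns the result.
doubled :: [Int]
doubled = force (map (* 2) [1, 2, 3])
```

By contrast, `xs \`deepseq\` e` fully evaluates xs before producing e, so replacing `seq` with `deepseq` in `shallowOk` would hit the `undefined`.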
Evaluation strategies decouple algorithms from parallelism, allowing for parallelizing in different ways by substituting a different strategy. A
Strategy is a function in
the Eval monad that returns its argument.
A strategy takes a data structure as input, traverses it while introducing parallelism as it evaluates it, and returns the original value. For example, a strategy for evaluating a pair tuple could look like:
This strategy could then be used either directly or using the
using combinator, which reads as use this expression by evaluating it with this strategy:
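A sketch of both, assuming the parallel package:

```haskell
import Control.Parallel.Strategies (Strategy, rpar, using)

-- A Strategy (a, b) that sparks both components (evaluating each
-- to WHNF) and returns the original pair.
parPair :: Strategy (a, b)
parPair (a, b) = do
  a' <- rpar a
  b' <- rpar b
  return (a', b')

-- Read: compute the pair, using the parPair strategy.
example :: (Integer, Integer)
example = (sum [1 .. 1000], product [1 .. 10]) `using` parPair
```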
parPair always evaluates the pair components in parallel and always to WHNF. A parameterized strategy could take as arguments strategies to apply to the type's components. The following
evalPair function is so called because it no longer assumes parallelism, instead delegating that decision to the passed-in strategies.
It's then possible to define a
parPair function in terms of
evalPair that evaluates the pair's components in parallel. However, the
rpar strategy only evaluates to WHNF, restricting the evaluation strategy. We could instead use
rdeepseq---a strategy that evaluates to normal form---by wrapping it with rparWith. The
rparWith combinator allows the wrapping of strategies in this manner:
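A sketch of the parameterized pair strategies, assuming the parallel package:

```haskell
import Control.DeepSeq (NFData)
import Control.Parallel.Strategies (Strategy, rdeepseq, rparWith, using)

-- Parameterized: the caller decides how each component is evaluated.
evalPair :: Strategy a -> Strategy b -> Strategy (a, b)
evalPair sa sb (a, b) = do
  a' <- sa a
  b' <- sb b
  return (a', b')

-- Evaluate both components in parallel with the given strategies.
parPair :: Strategy a -> Strategy b -> Strategy (a, b)
parPair sa sb = evalPair (rparWith sa) (rparWith sb)

-- Parallel evaluation of both components to normal form.
deepPair :: (NFData a, NFData b) => (a, b) -> (a, b)
deepPair p = p `using` parPair rdeepseq rdeepseq
```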
This then allows us to use
parPair to create a strategy that evaluates a pair tuple to normal form, which can then be used with the using combinator.
Sometimes it may be required to not evaluate certain type components, which can be accomplished using the
r0 strategy. For example, the following evaluates only the first components of a pair of pairs, i.e.
A parMap function can be defined in terms of evaluation strategies. The
parList function is a strategy that evaluates list elements in parallel. Defining
parList can take the same approach as before: define a parameterized strategy on lists called
evalList and then define a parameterized function
parList that performs
evalList in parallel. Both of these functions are already defined in Control.Parallel.Strategies:
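A sketch of how they can be defined, assuming the parallel package:

```haskell
import Control.Parallel.Strategies (Strategy, rparWith, rseq, using)

-- Apply a strategy to every element of a list.
evalList :: Strategy a -> Strategy [a]
evalList _     []       = return []
evalList strat (x : xs) = do
  x'  <- strat x
  xs' <- evalList strat xs
  return (x' : xs')

-- Spark every element, evaluating each with the given strategy.
parList :: Strategy a -> Strategy [a]
parList strat = evalList (rparWith strat)

-- parMap in terms of the above: map lazily, then evaluate in parallel.
parMap :: (a -> b) -> [a] -> [b]
parMap f xs = map f xs `using` parList rseq
```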
Haddock is the documentation system that is most prevalent in the Haskell community. Documentation can be generated using the
haddock command or more commonly
Declarations can be annotated by beginning comments with
-- |, which applies the documentation to the following declaration in the source file. It's also possible to place annotations after a given declaration, in which case the caret
^ is used instead of the
| to denote an annotation.
Annotations can span multiple lines until the first non-comment line is encountered. It's also possible to use multi-line comments by opening them with {-|
Chunks of documentation can be given a name with
$name and then included elsewhere.
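A sketch of the annotation styles (the declarations themselves are made up for illustration):

```haskell
-- | Increment an 'Int'. This comment documents the declaration below it.
inc :: Int -> Int
inc x = x + 1

data Point = Point
  { px :: Int -- ^ A trailing annotation documenting the 'px' field.
  , py :: Int -- ^ And one documenting the 'py' field.
  }

-- $points
-- A named chunk of documentation, which can be included elsewhere
-- (e.g. from the module's export list) by referring to $points.
```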
One or more blank lines separate two paragraphs. Emphasis is denoted by surrounding text with forward-slashes /, whereas bold text is denoted by surrounding the text with two underscores __. Monospace text is denoted by surrounding it with
@. Other markup is valid inside each of these, for example,
@'f'@ will hyperlink the identifier
f within the monospace text.
Links can be inserted using
<url label> syntax, although Haddock automatically links free-standing URLs. It's also possible to link to other parts of the same page with
Images can be embedded with
<<path.png title>> syntax.
It's possible to link to Haskell identifiers that are types, classes, constructors, or functions by surrounding them with single quotes. If the target is not in the scope, they may be referenced by fully qualifying them.
Alternatively, it's possible to link to a module entirely by surrounding the name with double quotes.
Code blocks may be inserted by surrounding the paragraph with
@ signs, in which case their content is interpreted as normal markup. Alternatively, it's possible to do so by preceding each line with a
>, in which case the text is interpreted literally.
It's possible to denote REPL examples with
>>>, followed by the result.
Unordered lists are possible by simply preceding the paragraph with a
-. Ordered lists are possible by preceding each item with a number, such as 1. or (1).
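A sketch collecting the markup forms in a single doc comment ('double' is a made-up declaration):

```haskell
-- | /Emphasized/, __bold__, and @monospace 'double' markup@ text.
--
-- A link: <https://haskell.org Haskell>, and a linked module: "Data.List".
--
-- A code block interpreted literally:
--
-- > double (double 2)
--
-- A REPL example:
--
-- >>> double 2
-- 4
--
-- * an unordered item
--
-- 1. an ordered item
double :: Int -> Int
double x = x * 2
```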
Haddock accepts a comma-separated list of options that affect how it generates documentation for that module, much like
LANGUAGE pragmas in Haskell.
- hide: omits the module; doesn't affect re-exported definitions
- prune: omits definitions with no annotations
- ignore-exports: ignores the export list; all top-level declarations are exported
- not-home: the module shouldn't be considered to be the home module
- show-extensions: include all language extensions used in the module
Haskell is a PLT playground, and as a result GHC has a multitude of language extensions available. I found a series of articles that cover some of the more popular extensions.
Record puns allow for the easy creation of bindings with the same names as their fields.
This extension allows the use of two dots
.. to automatically create bindings for all fields at that location, named the same as the fields they bind.
Tuple sections are a straightforward extension that allows tuples to be used in sections.
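A sketch combining the three extensions (the Config type is made up for illustration):

```haskell
{-# LANGUAGE NamedFieldPuns, RecordWildCards, TupleSections #-}

data Config = Config { host :: String, port :: Int }

-- NamedFieldPuns: 'port' binds the field of the same name.
showPort :: Config -> String
showPort Config { port } = show port

-- RecordWildCards: '..' binds every field at that location.
render :: Config -> String
render Config {..} = host ++ ":" ++ show port

-- TupleSections: (, "ok") is a function awaiting the first component.
tagged :: [Int] -> [(Int, String)]
tagged = map (, "ok")
```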
This allows one to explicitly specify the package from which to import a module, which is useful for disambiguation.
This allows string literals to be polymorphic over the
IsString type class which is implemented by types like
ByteString, making it less of a pain to construct values of those types.
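A minimal sketch using the bytestring boot library:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString.Char8 (ByteString, unpack)

-- Without the extension this literal would only be a String;
-- with it, the literal is polymorphic over IsString.
greeting :: ByteString
greeting = "hello"
```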
This provides a shorthand for a lambda containing a case match.
This allows guard syntax to be used with if expressions, avoiding excessive nesting and repetition.
This allows evaluating a value to weak head normal form before matching on it.
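A sketch of all three extensions together (the function names are mine):

```haskell
{-# LANGUAGE LambdaCase, MultiWayIf, BangPatterns #-}

-- LambdaCase: shorthand for \x -> case x of ...
describe :: Maybe Int -> String
describe = \case
  Nothing -> "none"
  Just n  -> "got " ++ show n

-- MultiWayIf: guard syntax inside an if, avoiding nesting.
classify :: Int -> String
classify n = if | n < 0     -> "negative"
                | n == 0    -> "zero"
                | otherwise -> "positive"

-- BangPatterns: !acc forces the accumulator to WHNF at each step,
-- preventing a buildup of thunks.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go !acc []       = acc
    go !acc (x : xs) = go (acc + x) xs
```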
A view pattern
e -> p applies the view
e to the argument that would be matched, yielding a result which is matched against p.
Pattern guards are a generalization of guards which allow pattern guardlets in addition to the more common boolean guardlets. Pattern guardlets are of the form
p <- e which is fulfilled (i.e. accepted) when
e matches against
p. Further guardlets can be chained with commas. As usual, guardlets are tested in order from top to bottom before the guard as a whole is accepted.
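A sketch of both features (the function names are mine):

```haskell
{-# LANGUAGE ViewPatterns #-}
import Data.Char (toLower)

-- View pattern: apply (map toLower) to the argument,
-- then match the result against "yes".
isYes :: String -> Bool
isYes (map toLower -> "yes") = True
isYes _                      = False

-- Pattern guard: lookup's result is matched against Just v,
-- and a boolean guardlet is chained after a comma.
lookupPositive :: String -> [(String, Int)] -> Maybe Int
lookupPositive k env
  | Just v <- lookup k env, v > 0 = Just v
  | otherwise                     = Nothing
```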
This allows universal quantification to be explicitly annotated in polymorphic type signatures. On its own this extension isn't very useful; instead, it's required by many other extensions.
This allows type variables to have scope so that nested scopes' type signatures can close over them and refer to them. Without the extension, the
a in the type signature of
nubbed would not only be different from the a in func's type signature, but different occurrences would also be distinct from each other.
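A sketch reconstructing that example (only the names func and nubbed come from the notes; the bodies are my guess):

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Data.List (nub)

-- The explicit 'forall a.' brings 'a' into scope over the body,
-- so the nested signature for nubbed refers to the *same* 'a'.
func :: forall a. Eq a => [a] -> [a]
func xs = nubbed
  where
    nubbed :: [a]  -- without the extension, this 'a' would be a fresh variable
    nubbed = nub xs
```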
This relaxes restrictions on type synonyms so that type signatures aren't checked until after all type synonyms have been expanded, making something like this possible:
This allows the nesting of explicit universal quantifications within function types and data definitions. In the following example, the difference between the signatures is that the first signature specifically applies the universal quantification to the passed in function, whereas the second signature applies the universal quantification to the whole signature.
What this means is that the second signature would accept any function from
n -> n that applies for some
Num n, but the first signature requires a function from
n -> n that applies for every Num n. For example, (+1) would be valid for both signatures, since every Num n supports addition. However, (/5) would only be valid for the second signature because it's a function from
n -> n that only applies for some
Num n, particularly the subset of
Num n that also implements
Fractional. Since not every
Num n implements
Fractional, that function is not valid for the first signature.
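A sketch of the distinction (the function names are hypothetical):

```haskell
{-# LANGUAGE RankNTypes #-}

-- Rank-2: the quantifier is nested in the argument, so the caller
-- must supply a function that works for *every* Num n.
applyBoth :: (forall n. Num n => n -> n) -> (Int, Double) -> (Int, Double)
applyBoth f (i, d) = (f i, f d)

-- Rank-1: the quantifier scopes over the whole signature, so the
-- function only needs to work for the one n the caller picks.
applyOnce :: forall n. Num n => (n -> n) -> n -> n
applyOnce f x = f x
```

Here `applyBoth (+ 1)` is accepted, while `applyBoth (/ 5)` is rejected by the type checker; `applyOnce (/ 5)` is fine once n is instantiated to a Fractional type such as Double.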
Note: This extension deprecates the less general extensions Rank2Types and PolymorphicComponents.
When this extension is used in conjunction with
LiberalTypeSynonyms, it allows the universal quantification and/or constraints within type synonyms as well as the application or partial application of a type synonym to a type containing universal quantification and/or constraints.
This allows the definition of types with no constructors, which is useful for use as a phantom parameter to some other type.
Phantom types are parameterized types where some of the parameters don't appear on the RHS. They are often used to encode information at the type level to make code much more strict, usually paired with empty data types.
Consider an API for sending encrypted messages. It's possible to encode---at the type-system level---that only encrypted messages may be sent using the
send function, thus preventing plain-text messages from being sent. This is further enforced by making the
Message constructor private and exposing a single constructor to build a plain-text message, such as
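A sketch of that API (plainText, encrypt, and the reverse-based "encryption" are all made up for illustration; only Message and send come from the notes):

```haskell
-- Empty data types used only as phantom tags.
data PlainText
data Encrypted

-- 'a' is phantom: it appears on the LHS but not in any field.
newtype Message a = Message String

-- Hypothetical smart constructor: the only exposed way to build a message.
plainText :: String -> Message PlainText
plainText = Message

-- Illustrative "encryption" step; only the phantom tag really changes.
encrypt :: Message PlainText -> Message Encrypted
encrypt (Message s) = Message (reverse s)  -- stand-in for real crypto

-- Only encrypted messages can be sent; passing a plain-text
-- message to send is a compile-time type error.
send :: Message Encrypted -> IO ()
send (Message s) = putStrLn ("sending: " ++ s)
```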
- Categories from Scratch
- Category Theory on Youtube
- What I Wish I Knew
- Learning Haskell
- CS 194
- Performance profiling with ghc-events-analyze
- Haskell Wikibooks - Category Theory
- Getting it Done with Haskell
Or mobit as some have taken to calling values that monads manage.