Recently I've been thinking about my opinions on the various languages I know, particularly with regards to which I should focus on, and I decided that knowing a JVM language would be very beneficial because of how robust and time-proven the JVM is, especially compared to other language VMs. For this reason I considered Scala and Clojure, and Scala seemed more like Haskell to me so I decided to go with that one first.
I didn't have an overwhelming reaction to Scala, so I decided to give Clojure a shot as well to better compare them. My main resource is the book Clojure Programming. There's also Clojure for the Brave and True.
Clojure and other lisp languages are homoiconic, often referred to as "code as data," which means that the way the code is written is itself the abstract syntax tree (AST) of the program. This makes creating embedded domain specific languages (EDSLs) very straightforward, as well as simply making the code easier to reason about. Questions of precedence, for example, are directly encoded into the code.
Because Clojure code is itself an AST in a Clojure data structure, metaprogramming is also more powerful because it simply involves manipulating that data structure; this is the basis of macros.
Textual Clojure code can be deserialized into Clojure data structures using the
read-like functions. In the following examples, the structures are printed back out by the REPL using the
pr-str function. The fact that serialization of Clojure structures is this straightforward is what drives most Clojure developers to use it as the primary serialization mechanism.
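As a sketch of that round trip, a structure can be printed with pr-str and read back with clojure.edn/read-string (the EDN reader is the safe choice for data; the example map is illustrative):

```clojure
(require '[clojure.edn :as edn])

;; serialize a data structure to its textual representation
(def data {:name "Clojure" :tags #{:jvm :lisp} :versions [1 2 3]})
(def serialized (pr-str data))

;; read it back; the result is equal to the original value
(def restored (edn/read-string serialized))
(assert (= data restored))
```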
Note: the Clojure REPL always starts in the default user namespace.
The reader allows for different syntax to make code more concise. For example, evaluation of a form can be suppressed by prefixing it with a quote
'. Anonymous function literals can be defined with the #(...) reader syntax.
Characters are denoted by a backslash, as in
\c, and they natively support Unicode. It's also possible to write special characters by name, the way escapes like
\n would be used in strings, but as individual character literals: \newline, \space, and \tab.
Keywords are similar to Ruby/Scala symbols and Erlang atoms. They are prefixed by a colon
: and consist of any non-whitespace character, where a slash
/ denotes a namespaced keyword, and a double colon
:: is expanded by the reader to a namespaced keyword in the current namespace, or in another namespace if the keyword starts with a namespace alias, as in ::alias/kw.
Keywords are "named" values, that is, values with intrinsic names that can be accessed using the
name function; the namespace portion can be accessed with the namespace function.
As in Ruby, keywords are often used for indexing hashes. The following defines a hashmap with a
:city key and then accesses the value for the
:city key. Keywords can be used in the function position because they are functions that look themselves up in collections passed to them.
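That lookup pattern can be sketched as follows (the example map is illustrative):

```clojure
(def person {:name "Ada" :city "London"})

;; a keyword in function position looks itself up in the map
(:city person)             ; => "London"

;; equivalent lookup with get
(get person :city)         ; => "London"

;; a default can be supplied for missing keys
(:country person :unknown) ; => :unknown
```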
Symbols are identifiers that evaluate to the values they name. For example, in the following code,
average is a symbol referring to the function held in the var named
average. Symbols containing a slash
/ denote a namespaced symbol which evaluates to the named value in the specified namespace.
The vars themselves can be referred to directly by prefixing a symbol with #'.
Numeric literals exist for a variety of number types. Custom numerical bases can be used with the
Nr prefix, where
N is the desired number base, as in 2r101.

|literals|type|
|42, 0xff, 2r101, 040|long|
Strings prefixed with a hash
# are regex literals which yield java.util.regex.Pattern instances.
Single-line comments are started with a semicolon
;. There are also form-level comments prefixed by the
#_ reader macro, which cues the reader to ignore the next Clojure form following the macro. This is particularly useful for commenting out blocks of code. The
comment macro can also be used to comment out code, but comment forms always evaluate to
nil, which may be unexpected.
Commas are considered whitespace by the reader. Whether to use them or not is a question of style, but they're generally used when multiple pairs of values appear on the same line.
There are literals for lists, vectors, maps, and sets. Note that since lists denote calls in Clojure, it's necessary to quote them to prevent their evaluation as a call.
Record literals consist of the pound sign
# followed by the record name and a map literal containing the field names as keyword keys with their associated values. See records.
Tagged literals are custom data readers. On startup, Clojure looks for files named data_readers.clj at the root of the classpath which must contain a map of symbols mapping tags---to be recognized by the Clojure reader---to the name of fully-qualified Vars that are invoked by the reader to parse the form following the tag.
For example, the var #'my.project.foo/bar would be invoked by the reader on the vector
[1 2 3] after it has been read as a normal Clojure data structure by the reader. That is, the reader will parse the form after the tag as a data structure, then invoke the tagged literal function on the data structure itself. The tagged literal function should then return a value that replaces the tagged data structure in the final result 1.
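The built-in #inst and #uuid tagged literals (registered in default-data-readers) work the same way:

```clojure
;; the reader first parses the following form (a string), then invokes the
;; registered data reader on it, yielding a java.util.Date
(type #inst "2021-01-01T00:00:00.000-00:00") ; => java.util.Date

;; similarly for UUIDs
(type #uuid "3b8a31ed-fd89-4f1b-b00f-42a0ca1cd67b") ; => java.util.UUID
```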
Vars are defined using the
def special form which takes the symbol used to refer to the var and the value to store in that var. When the symbol is used on its own to access the var's value, the symbol is said to be unqualified and so it is resolved within the current namespace. Vars can also be redefined by supplying the same symbol with a different value to the def form.
In contrast to unqualified symbols, symbols can be namespace-qualified so that they are resolved within the specified namespace. For example, if we create a new namespace
foo, we can continue to refer to the symbol
x by qualifying the namespace. However, if we attempt to access the unqualified symbol, it will try to find
x within the
foo namespace, which doesn't exist:
The in-ns function can be used to switch to another namespace, creating it if it doesn't already exist. Symbols from other namespaces have to be fully qualified in order to be accessed. The
refer function can add mappings to all of another namespace's vars into the current namespace.
The refer function also takes optional keyword arguments
:exclude which can be used to exclude specific vars from the namespace,
:only which specifies which mappings should be available unqualified, and
:rename which can alias vars from the namespace to appear another way in the current namespace.
The require function can ensure that a namespace is loaded and can optionally establish aliases for the name of the namespace by passing a libspec, which is simply a vector containing the namespace, the
:as keyword, and the alias to use.
If multiple namespaces are to be required and they share a common prefix, a list can be provided where the first element is the common prefix and the remaining elements are the remaining segments for the specific namespaces to load.
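Both forms can be sketched as follows:

```clojure
;; libspec with an alias
(require '[clojure.string :as str])
(str/upper-case "clojure")      ; => "CLOJURE"

;; common prefix: loads clojure.set and clojure.walk in one call
(require '(clojure set walk))
(clojure.set/union #{1} #{2})   ; => #{1 2}
```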
The use function is similar to
require, except that it
refers the specified namespace after it's loaded.
The aforementioned functions create mappings from symbols to vars, but namespaces can also contain Java classes and interfaces. Mappings to these can be created with the
import function, which essentially makes the class's short name available for use. This function also supports the same common-prefix collection syntax that require does.
There is no equivalent of the Java wildcard import. Inner classes such as java.util.Map.Entry can be referred to by the Java-internal notation e.g. java.util.Map$Entry.
The aforementioned namespace functions are mainly useful in a REPL. In actual source files, the
ns macro should be used, which combines all of the aforementioned functionality by allowing one to specify what needs to be required, referred, used, and imported for the namespace to load and work properly.
The :refer keyword can be used in a
:require form similar to
:only, to allow the use of the provided symbols in an unqualified manner. In fact, the
:use form can be obviated entirely by passing
:refer a vector of symbols within a :require form instead.
The :refer-clojure form is a synonym for
refer-ing to the clojure.core namespace, which is particularly useful for
excluding symbols via :exclude to avoid collisions.
The following are equivalent.
Note that if a namespace contains dashes, the filename should contain underscores in their place.
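Putting these pieces together, a typical ns declaration might look like the following (the namespace name and imports are illustrative):

```clojure
(ns my.project.core
  ;; avoid colliding with clojure.string/replace below
  (:refer-clojure :exclude [replace])
  (:require [clojure.string :as str :refer [replace]]
            [clojure.set :refer [union]])
  (:import (java.util Date UUID)))

;; union and replace are now usable unqualified, Date without its package
(union #{1} #{2}) ; => #{1 2}
```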
There are functions such as
ns-map, ns-publics, and ns-interns, each of which yields a mapping of the symbols pertaining to the respective category.
The ns-unmap and ns-unalias functions can be used to remove mappings to certain symbols and to remove aliases for certain namespaces, respectively. The
remove-ns function can be used to drop a namespace from the namespace map, making the code under that namespace eligible for garbage collection if it's not referred from anyplace else.
Special forms are Clojure's primitives of computation upon which the rest of Clojure is built.
The special form
quote suppresses evaluation of a Clojure expression. For example, symbols evaluate to the value of the var they represent, but with
quote that evaluation is suppressed, so they evaluate to themselves like strings and numbers do. The quote character
' is reader syntax for
quote. In fact,
quote can be used on reader sugars to determine how they're actually represented.
The special form
do evaluates all of the expressions provided to it in order and yields the last expression's value as its value. Many other forms such as
defn wrap their body in an implicit
do expression so that multiple inner expressions are evaluated.
The special form
def defines or redefines a var with an optional value within the current namespace. Other forms implicitly create or redefine vars and are usually prefixed with
def, such as defn and defmacro.
It's possible to refer to vars instead of the values that they hold by using the special form
var. There's also a shorthand for this with the #' reader syntax.
The special form
let allows lexically scoped named references to be defined.
The let form also allows destructuring similar to pattern-matching in languages like Haskell, Rust, Scala, and Erlang. For example, to destructure a sequence, specifically a vector, we simply pass it a list of symbols that will take on the appropriate values. Destructuring can also be nested, as in other languages. The ampersand
& can be used to specify that the following symbol should take on the remaining sequence of values. The
:as keyword can be used to bind the collection to a value, similar to what
@ does in Haskell, Scala, and Rust.
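A sketch of sequential destructuring with both features:

```clojure
;; a and b bind positionally, more takes the rest as a seq,
;; and all binds the entire original collection
(let [[a b & more :as all] [1 2 3 4 5]]
  {:a a :b b :more more :all all})
;; => {:a 1, :b 2, :more (3 4 5), :all [1 2 3 4 5]}
```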
Maps can also be destructured in a similar manner. This works with Clojure's
hash-map and array-map, records, collections implementing java.util.Map, and values supported by the
get function, such as Clojure vectors, strings, and arrays, which can be keyed by their indices.
The :as keyword can be used to bind the collection itself. The
:or keyword can be used to provide a defaults map which will be consulted if the destructured keys aren't present.
Often it may be desirable to destructure a map such that the symbols are named after the keys of the map, but doing this explicitly can get repetitive, which is why the options
:keys, :strs, and :syms can be used.
It's also possible to destructure vectors which themselves contain key-value pairs. This can be done explicitly by binding the key-value pairs with
&, converting that to a
hash-map, and then destructuring that; however, it's also possible with regular destructuring syntax. This is specifically made possible by
let by allowing the destructuring of rest sequences if they have an even number of values, i.e. key-value pairs.
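This can be sketched as follows (the data is illustrative):

```clojure
;; the rest of the sequence is a run of key-value pairs,
;; so it can be destructured directly as a map
(let [[name & {:keys [city country]}] ["Ada" :city "London" :country "UK"]]
  [name city country])
;; => ["Ada" "London" "UK"]
```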
The special form
fn is used to create functions. A function defined this way has no name and so cannot be referred to later on. It can be placed inside a var using the
def form. The
fn form also takes an optional name by which the function can reference itself. Furthermore, a function can have multiple arities, that is, define different bodies depending on the number of arguments passed.
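Both features can be sketched together (the names are illustrative):

```clojure
(def greet
  ;; the inner name `self` lets the function refer to itself
  (fn self
    ([] (self "world"))                ; zero-arity delegates to one-arity
    ([name] (str "Hello, " name))))

(greet)        ; => "Hello, world"
(greet "Rich") ; => "Hello, Rich"
```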
The defn form encapsulates the functionality of def and fn combined.
The special form
letfn can be used to define multiple functions at once that are aware of each other. This is useful for defining mutually recursive functions.
Variadic functions are possible using the rest arguments syntax from destructuring.
It's also possible to use keyword arguments in functions, which is facilitated through map destructuring of rest sequences.
Function literals have specific, concise syntax by being prepended with
#. Placeholder arguments are written as % followed by their one-based position, as in %1 and %2, though the first argument can be referred to by a single
%. Function literals don't contain an implicit
do form, so multiple statements require an explicit
do form. It's also possible to specify variadic functions by assigning the rest of the arguments to %&.
Note: Function literals cannot be nested.
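These points can be sketched as follows:

```clojure
(#(+ % %2) 1 2)          ; => 3; % is the first argument, %2 the second

;; multiple forms require an explicit do
(#(do (println "computing") (* % %)) 3) ; prints, then => 9

(#(apply + %&) 1 2 3)    ; => 6; %& binds the rest of the arguments
```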
The special form
if is the single primitive conditional operator in Clojure. If no else-expression is provided it is assumed to be
nil. There are other conditionals based on this form that are more convenient in specific situations.
The special form
recur transfers control to the local-most
loop or function, allowing recursion without consuming stack space and thereby overflowing the stack. The
loop special form takes a vector of binding names and initial values. The final expression is taken as the value of the form itself. The
recur special form is considered a very low-level facility that is usually unnecessary; it's more idiomatic to use higher-level constructs such as map, reduce,
for, and so on.
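A minimal loop/recur sketch (the function is illustrative):

```clojure
(defn countdown [n]
  ;; loop establishes bindings with initial values; recur rebinds them
  ;; and jumps back without consuming stack space
  (loop [i n, acc []]
    (if (zero? i)
      acc
      (recur (dec i) (conj acc i)))))

(countdown 3) ; => [3 2 1]
```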
The special forms
new and . (the dot form) exist for Java interoperability. Their use is somewhat unnatural in Clojure, however, and so there are sugared forms which are idiomatic.
There are also special forms for exception handling and throwing. There are also lock primitives to synchronize on the monitor associated with every Java object, but this is usually unnecessary as there's a macro,
locking, that is better suited.
The set! special form can be used to perform in-place mutation of state, which is useful for setting thread-local values, Java fields, or mutable fields.
The eval form evaluates its single argument form, which is useful when used with
' to suppress evaluation of the argument until it's evaluated by
eval. With this final form, it's possible to reimplement a simple REPL.
The apply function can be used to apply a function to arguments, where the last argument can be a collection of arguments to apply, but the complete set of arguments must be passed. The
partial function allows partial application by providing only a subset of the arguments, yielding a function that can be used further. The
comp function can be used to compose other functions 2.
The -> and ->> threading macros can often be used in place of explicitly using
partial and comp. The
-> macro threads its first argument as the first argument of each subsequent form, which is very useful when the item is, for example, a Java object on which we're calling methods. The
->> macro, on the other hand, threads its first argument as the last argument of each subsequent form, which is useful for sequence operations. This makes it unnecessary to create partial functions, instead simply requiring function calls without the last argument applied.
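Both macros can be sketched as follows:

```clojure
;; thread-first: the map is the first argument of each form
(-> {:a 1}
    (assoc :b 2)
    (dissoc :a))
;; => {:b 2}

;; thread-last: the sequence is the last argument of each form
(->> (range 10)
     (filter even?)
     (map #(* % %))
     (reduce +))
;; => 120
```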
Functions can be memoized using the memoize function.
There are a set of core collection functions which allow data structures to participate in the common collection abstraction. These consist of the following functions:
|empty|yield empty instance|
The conj function has the guarantee that it adds values efficiently, so that it prepends items to lists instead of appending them 3.
Sequences are an abstraction to obtain and traverse sequential views over arbitrary values, such as in a collection. Sequences provide a base set of operations:
|first|yields first item|
|rest|yields items excluding first|
|next|consumes current item|
|lazy-seq|yields lazy sequence|
The important distinction between
next and
rest is that
next yields
nil when the sequence is empty, whereas
rest continues to yield an empty sequence.
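The distinction in brief:

```clojure
(rest [1]) ; => (), an empty sequence
(next [1]) ; => nil
(rest [])  ; => ()
(next [])  ; => nil
```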
Sequences can be constructed using either
cons or list*. The
cons function is similar to the Haskell (:) operator: unlike
conj, which may prepend or append items in order to satisfy the efficiency guarantee,
cons always prepends items. The
list* function is a helper which can take any number of head values, followed by a sequence, so that the following two are equivalent:
Sequence contents can be evaluated lazily, where the process of accessing a lazy sequence is referred to as realization 4, such that when all values of a lazy sequence have been computed, it is considered to be fully realized. An important characteristic of the
cons and list* functions is that they don't force the evaluation of lazy sequence arguments.
The repeatedly function constructs infinite lazy sequences by invoking a given function over and over, so that the above could be simply expressed as:
All of the core sequence-processing functions return lazy sequences, such as
drop, so that lazy sequences can be layered without forcing underlying lazy sequences.
The next function is able to return
nil instead of an empty sequence because it checks to see if it's empty, and that emptiness check forces the head of the tail sequence 5. Because sequential destructuring uses
next and not
rest, it's important to note that destructuring a lazy sequence always realizes the tail's head value.
Laziness in Clojure is mostly used to transparently process big datasets that otherwise wouldn't fit entirely in memory, in order to express algorithms in a more straightforward manner. In this sense, sequences should be viewed as an ephemeral medium of computation, not as collections. In fact, evaluation of some lazy sequences is batched as a performance optimization, which means that some items' side effects can be run ahead of consumption by some amount.
It's possible to force the realization of an entire lazy sequence with the functions
doall and dorun:
doall retains the contents of the sequence, while
dorun discards them 6.
When a reference is held to a sequence, which is done by keeping a reference to the head of a sequence, it implies that the program is interested in that sequence's items. As a result, this prevents garbage collection from occurring on the items of the sequence, which is known as head retention.
Consider the split-with function 7 in the following scenario, where a very long sequence is split into a very short prefix and a very long suffix and then both are counted. Lazy processing (counting) of the much longer suffix is negated (i.e. not lazy) due to the fact that a reference to the head of the larger sequence (of which the suffix is a part) is held as
t, causing any potentially lazily processed items in the sequence to be retained anyways.
If instead the short prefix were counted first so that the reference to the head of the full sequence were no longer necessary, there would be no reason to retain the elements of the entire sequence (including the longer suffix) and so it could process (count) the suffix lazily.
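Both orderings can be sketched as follows (the names t and d and the range size are illustrative; with a large enough range the first form retains every realized element):

```clojure
;; counting the long suffix first: t still references the head of the
;; full sequence, so all realized elements are retained
(let [[t d] (split-with #(< % 12) (range 1e6))]
  [(count d) (count t)])

;; counting the short prefix first releases the head reference,
;; letting the suffix be counted without retention
(let [[t d] (split-with #(< % 12) (range 1e6))]
  [(count t) (count d)])
```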
Inserting into a map or set, or using functions such as
count on a sequence, are common causes of potentially premature realization.
Like sequences, the associative abstraction is shared by data structures that associate keys with values, the most common one being a map. It consists of the following functions:
|assoc|establish new association|
|get|get value for given key|
|contains?|check if association exists|
The get function returns
nil if the key doesn't exist in the association. This poses a problem because it may be intentional to associate
nil with a key, making it ambiguous as to whether or not the association exists. The solution is to use
find, which always returns the entire association as a tuple, which in Clojure is expressed as a vector.
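The ambiguity and its resolution in brief:

```clojure
(def m {:a nil})

(get m :a)  ; => nil, the key exists with value nil
(get m :b)  ; => nil, indistinguishable from the previous case
(find m :a) ; => [:a nil], the entry exists
(find m :b) ; => nil, the entry doesn't exist
```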
The get-in function can yield a value in a nested associative structure given a vector path to the value. Like
get, it can accept a default value to return if the path is not found.
The assoc and dissoc functions can be used on multiple sets of key-value pairs and lists of keys, respectively.
The indexed abstraction works on things that can be numerically indexed, such as vectors, lists, sequences, Java arrays and lists, strings, and regular expression matchers.
The get and assoc functions can also be used on vectors, where the indices serve as the keys. There also exists the function
nth, which works the same way except that
nth throws an exception when an index is out of bounds. However, both can be provided a default return value, in which case their semantics are identical. The
get function is more resilient than
nth since it returns
nil when the subject of the lookup is unsupported.
Clojure doesn't have a distinct stack type, but it supports stack operations on lists and vectors through the following functions:
|peek|peek top value|
Sets can be added to with
conj and can be tested for membership with
get, acting like associations of items with themselves. There's also a
disj function which can be used to remove items from a set.
The sorted abstraction guarantees that items in the collection will be maintained in a stable ordering. This abstraction supports maps and sets with the following functions:
|rseq|reverse sequence in constant time|
|subseq, rsubseq|return sequence of values within a certain range|
Sorted maps and sets can be created with
sorted-map and sorted-set, or with the
sorted-map-by and sorted-set-by functions which accept a predicate or comparator that defines the sort order.
Comparators are functions that return a negative integer, zero, or a positive integer (e.g.
-1, 0, or 1) depending on whether the first argument is less than, equal to, or greater than the second argument. Any two-argument predicate can automatically be converted to a comparator using simple logic by treating the predicate as if it were <:
- if the predicate returns true for the arguments in order, return -1
- otherwise, flip the arguments
- if the flipped application returns true, return 1
- else, the arguments are considered equal, so return 0
This may be a possible implementation:
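One possible implementation, named comparator* here to avoid shadowing clojure.core/comparator, which provides exactly this conversion:

```clojure
(defn comparator* [pred]
  (fn [a b]
    (cond
      (pred a b) -1    ; a sorts before b
      (pred b a)  1    ; b sorts before a
      :else       0))) ; neither: considered equal

(sort (comparator* >) [3 1 2]) ; => (3 2 1)
```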
It's important to note that the sort order, i.e. the comparator, defines the equality operation used within a sorted collection. This can be mitigated by delegating to a more general comparison function such as
compare when the comparator results in 0, i.e. a tie.
Collections can also be used as functions to access values within themselves. For maps, an optional default return value can be provided, as with get.
Similarly, keywords and symbols are also functions that look themselves up in the provided collection. This is the recommended, idiomatic form of accessing values in a collection, as it helps avoid null pointer exceptions. This also makes them amenable for use as higher-order functions.
For example, the
some function returns the first logically true value produced by applying a given predicate to the items of a sequence. This can be used with a set to find the first item in a sequence that is contained in the set.
However, this can pose a problem when the set in question contains either
false or nil, since sets return the item itself if it's present in the set, and those two values are logically false. In these situations, it would be more appropriate to use contains?.
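The pitfall in brief:

```clojure
(some #{3 4} [1 2 3])   ; => 3, the first item found in the set
(some #{nil} [nil nil]) ; => nil, even though nil is in the set
(contains? #{nil} nil)  ; => true, an unambiguous membership test
```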
It's important to remember that list literals are the form in which code is specified in Clojure. To define an actual list without evaluating it as a function call, it's necessary to quote it so that it's not evaluated. This has the consequence of not evaluating any of the members, which may not be the intention. For this reason, the
list function exists which evaluates each argument and uses the result as an element in the list.
Vectors can be constructed using the
vector function, which is similar to
list, or the
vec function which accepts a sequence. Vectors are the type used to represent tuples.
Sets can be created from any sequence using the
set function, similar to vec.
Transient collections are mutable and, as a result, any reference to an old version of a transient collection may no longer be valid. Only vectors, unsorted maps, and sets can be transients. Transient collections are usually used within functions, such as those in the core, for purposes of efficiency, and are typically publicly exposed as immutable, persistent collections.
Transients have their own collection manipulation functions that end in a
! to signify that they are destructive. Using any of these functions on a transient renders that reference to the transient invalid. Instead, the result of these calls should be used going forward, just as with the persistent equivalents.
transient function can turn any persistent collection into a transient one, and the
persistent! function can turn a transient back into a persistent. Converting a transient into a persistent makes the transient unusable.
Consider this possible implementation of into built on transients:
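A transient-based version might be sketched as follows (named into* here to avoid shadowing clojure.core/into):

```clojure
(defn into* [to from]
  ;; build up a transient copy with the destructive conj!,
  ;; then freeze it back into a persistent collection
  (persistent! (reduce conj! (transient to) from)))

(into* [] '(1 2 3)) ; => [1 2 3]
```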
Transients have a concurrency safeguard to ensure that only the thread that created it can use or modify it. Transients don't have value semantics due to being mutable.
Metadata is expressed as a map and can be attached to data structures, sequences, records, symbols, or reference types. There is reader syntax for attaching metadata to a value literal. Metadata that contains only keyword keys whose values are
true can be provided in a shorter form (e.g. ^:private), and multiple such annotations can be stacked.
The with-meta function yields an object of the same type and value as a given object, but with the provided metadata. The
vary-meta function does the same thing but instead takes a function that modifies the original object's metadata to use its result as its metadata.
Zippers are a stack of all nodes traversed, thereby serving as a cursor into a data structure 9. The clojure.zip namespace provides a
zipper factory and specialized zipper instances
seq-zip for nested sequences,
vector-zip for nested vectors, and
xml-zip for clojure.xml.
Zippers can be moved with the operations
left and right to move along siblings,
up and down to move between parents and children,
prev and next which are depth-first traversals, and
leftmost and rightmost to move to the first or last sibling.
The node, branch?, children, lefts, and rights functions can be used to respectively yield the current node, whether it's a branch, its child nodes, and all left or right siblings of the current node. When operating on trees, the
root function is used to yield an updated tree which reflects all of the modifications made by the zipper since its creation.
Zippers can manipulate the structure they're representing with the
remove function to remove the current node,
replace to replace it with another node,
insert-child to insert a child node at the front, or
append-child to append one to the back. The
edit function takes a function and extra arguments which replaces the current node with the result of applying the current node and the extra arguments to that function. The
make-node function can also be used to create a new node, but it won't be added unless one of the above functions are used.
Custom zippers can be created with the
zipper function which takes three functions:
- predicate that determines if a node can have children
- function that yields a sequence of a given branch's children
- function that returns a new branch node given an existing node and a sequence of children
Consider creating a zipper for HTML elements represented as vectors and maps.
A helper function can be created that operates on a zipper, such as one to wrap a DOM node in another tag.
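A minimal sketch, assuming elements are represented as vectors of the form [tag & children] and ignoring attribute maps for brevity (the names html-zip and wrap are illustrative):

```clojure
(require '[clojure.zip :as zip])

(defn html-zip [root]
  (zip/zipper
    vector?                      ; branch?: only vectors can have children
    rest                         ; children: everything after the tag
    (fn [node children]          ; make-node: rebuild from tag + children
      (into [(first node)] children))
    root))

;; helper: wrap the current node in another tag
(defn wrap [loc tag]
  (zip/edit loc (fn [node] [tag node])))

(-> (html-zip [:body [:p "hello"]])
    zip/down        ; move to [:p "hello"]
    (wrap :div)
    zip/root)
;; => [:body [:div [:p "hello"]]]
```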
It's possible to use delays, futures, and promises to control when and how computations are performed.
The delay function suspends some body of code such that its evaluation is delayed until it's explicitly dereferenced with
deref. There exists convenient reader syntax for dereferencing as
@, which is in fact almost always preferred, except for when
deref is used as a higher-order function or when wanting to set a timeout. Many Clojure types are dereferenceable including delays, futures, promises, atoms, refs, agents, and vars.
The difference between delays and regular functions is that delays evaluate their body only once and from then on cache the return value so that subsequent dereferences are instant. The relation to this and concurrency is that multiple threads can safely attempt to dereference a delay and all of them will block until the delay is evaluated.
The realized? function can check whether a delay has been realized yet. This function can also be used with futures, promises, and lazy sequences.
A future evaluates a body of code in another thread. A future returns immediately, allowing the current thread to continue execution. The future can be dereferenced to access the return value of the future, which will naturally block if the separate thread isn't finished computing. An upper-bound can be placed on the amount of time that is spent blocking by providing a timeout and a timeout value.
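A brief sketch:

```clojure
;; the body runs on another thread; this call returns immediately
(def answer (future (Thread/sleep 50) (* 6 7)))

@answer                        ; blocks until the thread finishes, => 42
(deref answer 100 :timed-out)  ; bounded wait with a timeout value
```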
Promises are like delays and futures, except that they are not created with any code that will eventually define their value. Instead, they're initially an empty container which can be filled at a later point in time using the
deliver function. Like delays and futures, dereferencing a promise will block until there's a value to provide; timeouts can be specified 10.
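A brief sketch of a promise being filled from another thread:

```clojure
(def p (promise))

;; deliver the value from another thread
(future (deliver p 42))

@p                       ; blocks until delivered, => 42
(deref p 100 :timed-out) ; or wait with a timeout value
```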
Watches are functions that are called whenever the state of a reference is changed, though not necessarily to a different value. For that reason, it's common for watch functions to check that the new value is different before proceeding. Watches take four arguments: a key, the reference, its old state, and its new state. The watch function can be registered on a reference with the
add-watch function which takes the reference, the key to associate with the watch, and the function. The key associated with the watch can also be used to remove the watch later on.
It's important to note that the watch functions are called synchronously on the same thread that caused the state change, such that by the time the watch function is called, the value could have been changed again by another thread. For this reason it's important to rely only on the
old and new values passed to the watch function instead of dereferencing the reference.
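A watch sketch on an atom (the key :log is illustrative):

```clojure
(def counter (atom 0))

(add-watch counter :log
  (fn [key ref old new]
    ;; act only when the state actually changed to a different value
    (when (not= old new)
      (println "changed from" old "to" new))))

(swap! counter inc)         ; prints: changed from 0 to 1
(remove-watch counter :log) ; the key identifies which watch to remove
```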
Validators are essentially constraints on reference state. A validator consists of a single-argument function which takes the proposed new state; if the result is logically false or an exception is thrown, the change is aborted with an exception.
A validator can be attached at the time of creation by passing a key-value pair to
atom, ref, or agent consisting of the key
:validator and the function to use. Alternatively, it's possible to add a validator to a var, or change the validator associated with an atom, ref, or agent, using the set-validator! function.
The default thrown exception can be made more descriptive by assuring that the validator we use throws its own exception with a more descriptive string.
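Both approaches can be sketched as follows (the balance example is illustrative):

```clojure
;; reject negative balances at creation time
(def balance (atom 100 :validator #(>= % 0)))

(swap! balance - 30) ; => 70

;; replace the validator with one that throws a descriptive exception
(set-validator! balance
  (fn [n]
    (or (>= n 0)
        (throw (IllegalStateException. "balance cannot go negative")))))

;; (swap! balance - 1000) would now throw that exception
```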
The pmap function can be used to perform a
map in parallel, evenly spread across available computational cores. However, as usual, the parallelization overhead must be taken into account with respect to the work being done, so that the overhead doesn't outweigh the computation itself. In the event that it does, however, a common workaround is to chunk the dataset so that each unit of work is larger.
Two other constructs are built on top of pmap. The
pcalls function evaluates a variable number of zero-arity functions, yielding a lazy sequence of their return values, while the
pvalues function does the same but for a variable number of expressions.
References are essentially boxes that hold a value which can be changed by certain functions, depending on the reference type. The value "within" can be accessed using the
deref function, which never blocks for these reference types. The four reference types are
var, ref, agent, and atom. Each of these reference types is intended for use in different concurrent operations.
Reference types can be decorated with metadata, although the metadata can only be changed using the
alter-meta! function, which modifies it in place. The metadata for
atom, ref, and agent references can be specified at creation time using an optional
:meta keyword argument. Reference types can notify specified functions when their state changes, where the functions are known as watches. Changes to the state that reference types hold can be validated with validator functions, potentially aborting non-conforming change operations.
Atoms are the most basic reference type which implement synchronous, uncoordinated, atomic compare-and-set modifications. Operations that modify the state of atoms block until the modification is complete, i.e. the modifications are atomic.
The atom function is used to create an atom, and the
swap! function is the most common modification operation used on them, which replaces the value of the atom with the result of applying some function and additional arguments to its present value, returning the new value.
Since atoms use compare-and-set semantics, if the value of the atom changes (e.g. from another thread) before the new value is finished computing, the update is aborted and re-attempted against the new current value, repeating until the value read at the start of a computation is still the atom's value when the computation finishes.
Since atoms are synchronous reference types, functions that change their values don't return until they have completed.
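For example:

```clojure
(def counter (atom 0))

(swap! counter inc)   ; => 1
(swap! counter + 10)  ; applies (+ 1 10) => 11
@counter              ; deref, i.e. (deref counter) => 11
```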
A more direct function
compare-and-set! can be used if we already know the value that the atom currently holds. The function returns a boolean indicating whether the value was indeed changed, which happens only if the value provided is indeed what's currently in the atom.
Note that this function doesn't use value semantics when comparing the provided value to the value inside the atom. Instead, the provided value must be identical to the current one, i.e. the very same object.
We can also completely overwrite the contained value using the
reset! function, regardless of what its current value may be.
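A small sketch of both; keywords are interned, so the identity comparison behaves as expected here:

```clojure
(def a (atom :a))

(compare-and-set! a :a :b)  ; => true, the atom now holds :b
(compare-and-set! a :a :c)  ; => false, :a is no longer the current value
(reset! a :z)               ; => :z, overwrites unconditionally
```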
Refs are the coordinated reference type, ensuring that multiple identities can be involved in concurrent operations with:
- no possibility of being in an observable inconsistent state
- no possibility of race conditions or deadlocks
- no manual use of locks, monitors, or other synchronization primitives
See the section on Software Transactional Memory for more information.
Vars are generally used implicitly in Clojure. Top-level functions and values are stored in vars, which are defined in the current namespace using the special form
def or its derivatives, which also copy the provided metadata. Recall that vars can be referred to with the reader syntax #', which is sugar for the var special form that can also be used explicitly.
Words are pale shadows of forgotten names. As names have power, words have power. Words can light fires in the minds of men. Words can wring tears from the hardest hearts. There are seven words that will break a strong man's will. But a word is nothing but a painting of a fire. A name is the fire itself.
The Name of the Wind by Patrick Rothfuss
Evaluating a symbol involves looking for a var with the name of the symbol in the current namespace and dereferencing it to obtain its value. The above passage reminds me of this relationship, where words are to names as symbols are to vars.
Private vars can only be referred to using the fully qualified name when in another namespace and their value can only be manually dereferenced. They're created by providing the
:private metadata key as true. The
defn- form creates private functions.
Docstrings are string literals that serve as documentation for top-level vars, immediately following the symbol name. The
def form actually takes the docstring as an optional argument and sets the
:doc metadata key to its value.
Constant values can be declared with the
:const metadata keyword. Constants aren't resolved at runtime; instead, the value held by the var is captured permanently at compile time. Later redefinitions of the var have no effect.
Although for the most part Clojure is lexically scoped, vars have a feature known as dynamic scope. Vars have a root binding, which is the value bound to the var when it's defined with
def or some derivative.
However, if the var is defined to be dynamic, with the
:dynamic metadata keyword, then the root binding can be overridden and shadowed on a per-thread basis with the
binding form. The naming convention for dynamic vars is to surround their names with asterisks, e.g. *out*, often called "earmuffs."
Dynamic bindings can stack by nesting
binding calls, each inner scope shadowing the outer one. As a result,
get-*var* will evaluate to
:c and not the outer scope bindings.
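The listing the text refers to isn't shown here, but a reconstruction along these lines matches the described behavior (*var* and get-*var* are the names used above):

```clojure
(def ^:dynamic *var* :root)

(defn get-*var* [] *var*)

(binding [*var* :a]
  (binding [*var* :b]
    (binding [*var* :c]
      (get-*var*))))  ; => :c, the innermost binding wins
```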
Dynamic scope is idiomatically used to specify optional return values. This is possible using the
thread-bound? function to check if the current thread has established a thread-local binding on the var. The
set! function can then be used to change the value of the current thread-local binding.
Binding conveyance is the propagation of dynamic var bindings across threads, and it's possible with agents, futures, and
pmap and its derivatives. In the following code,
future propagates the dynamic scope across to the other thread for the duration of its operation.
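The referenced code isn't shown, but a minimal reconstruction of binding conveyance with future might look like this (*greeting* is an illustrative var):

```clojure
(def ^:dynamic *greeting* "hello")

(binding [*greeting* "bonjour"]
  @(future *greeting*))  ; => "bonjour": the thread-local binding is conveyed
```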
However, lazy sequences in general do not support binding conveyance. To mitigate this, it's necessary to push the required dynamic scope into the code that realizes the sequence's values.
Although generally discouraged, the root binding of vars can be mutated as a function of its current value using the
alter-var-root function. Other functions like
with-redefs can change the root binding within their scope, which can be useful for testing, such as mocking out functions or values.
Forward declarations are possible by not providing a value when defining a var. Although the
def form can be used for this purpose, the
declare macro is more idiomatic since it makes the intention more explicit, and allows a variable number of unbound vars to be declared.
Agents are uncoordinated, asynchronous reference types. Changes to an agent's state are independent of changes to other agents' states, and all changes are made away from the thread of execution that scheduled them. I/O and other side-effecting functions may be safely used with agents. Agents are STM-aware, so they can safely be used in the context of retrying transactions.
The modification operations for agents are send and send-off. The function and optional arguments passed to these functions are called an agent action. Each agent maintains a queue of actions. Invoking a modification function returns immediately after queueing the specified action, and each action is evaluated serially, in order, on one of many threads dedicated to the evaluation of agent actions.
The send function sends actions to be evaluated within a fixed-size thread pool; because the pool is fixed in size, it's important not to give it actions that may perform I/O or other blocking operations, as that would prevent other non-blocking, CPU-bound actions from fully utilizing the pool.
On the other hand, actions queued with
send-off are evaluated in an unbounded thread pool (same one used by futures), allowing any number of potentially blocking actions to be evaluated concurrently.
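A minimal sketch of both dispatch functions:

```clojure
(def a (agent 0))

(send a inc)  ; CPU-bound action: fixed-size pool

(send-off a (fn [state]
              (Thread/sleep 100)  ; potentially blocking: unbounded pool
              (inc state)))

(await a)  ; block until both queued actions have run
@a         ; => 2
```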
It's possible to block on the completion of all actions sent from the current thread to a given set of agents using await. It's important to realize that by the time an agent is dereferenced, it could've been modified by another action. The
await-for function allows one to specify a timeout.
Because actions are evaluated in a separate thread, exceptions thrown by them can't be dealt with by the same thread that dispatched the action. As a result, an error causes the agent to fail silently.
The agent can be salvaged by using
restart-agent to reset the agent's state to the provided value, enabling it to receive actions again.
The restart-agent function also accepts a :clear-actions option, which causes it to discard any pending actions; otherwise they immediately begin evaluating once again.
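A sketch of failure and recovery (the error itself can be inspected with agent-error):

```clojure
(def a (agent 0))

(send a (fn [_] (throw (ex-info "boom" {}))))
(Thread/sleep 100)   ; give the action time to run and fail

(agent-error a)      ; => the thrown exception; the agent now rejects actions
(restart-agent a 0)  ; reset the state; the agent accepts actions again
```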
The default behavior for agents is for them to require intervention when an action fails. An alternative is to have the agent ignore a failed action and continue with the evaluation of the next action in the queue. This can be specified at creation time with the
agent function and the
:error-mode option with a value of
:continue; the default is :fail unless an error handler is provided.
It's also possible to specify an error handler which takes as argument the agent in question and the error using the
:error-handler option of the agent function.
The error mode and handler can be changed dynamically with the set-error-mode! and set-error-handler! functions.
Agent actions queued within Software Transactional Memory transactions aren't added to the agent's queue until the transaction is committed. This allows agents to be used within transactions regardless of the number of times the transaction is retried. The same applies to actions queued within the scope of other agent actions.
If a validator or something else causes the transaction to fail, then the held off queued actions will be discarded.
The *agent* var is usually unbound, except within the scope of an evaluating action, where it's bound to the agent that owns the action.
Software Transactional Memory (STM) transactions ensure that changes to refs are made atomically, consistently (transaction fails if changes don't satisfy ref constraints), and in isolation (transaction changes don't affect states from within transactions in other threads).
The dosync macro establishes the scope of a transaction. Nested transaction scopes are joined into a single logical transaction. All modifications of refs must occur within a transaction. Transactions are synchronous, blocking the current thread until the transaction is complete.
The alter function can be used to modify the value of a ref, much like swap! does for atoms. When alter returns, the in-transaction value of the ref has changed to the new value, allowing further modifications within the same transaction to build on previous ones.
When the transaction is about to be committed (i.e. completed), the value of the ref outside of the transaction must be the same as it was prior to the first in-transaction use of
alter on that ref. If this is not the case, then the transaction is restarted from the beginning with the new value of the involved refs.
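A classic sketch, transferring between two refs atomically:

```clojure
(def checking (ref 100))
(def savings  (ref 0))

(dosync
  (alter checking - 25)
  (alter savings  + 25))

[@checking @savings]  ; => [75 25], never observable half-done
```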
There are instances, however, in which the order in which a function is applied to a ref is not important. In these situations, the retry behavior of transactions would waste time doing unnecessary work just to keep the values up to date. The
commute function can be used for when the order of application doesn't matter.
A good example of this is incrementing a counter. Using
alter in this situation would lead to many retried transactions, even though what ultimately matters is the end result: the final count.
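The counter case might be sketched as:

```clojure
(def hits (ref 0))

;; Increment order is irrelevant, so commute lets concurrent
;; transactions commit without retrying each other.
(dosync (commute hits inc))
```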
It's important to realize that the in-transaction value of the ref will be the one altered by the function application passed to
commute, but the actual value of the ref once the transaction is committed will be based off of the latest value of that ref. For this reason, care needs to be taken to ensure that the transaction doesn't rely on the value of the ref.
The ref-set function sets the in-transaction value of the ref to the provided value, and is usually used to set refs to initial values. Like
alter, it will cause a retry of the transaction if the ref's state changes prior to commit-time. It's essentially
alter with a function that returns a constant value.
Validators can be used in transactions, since they throw exceptions when the validation fails, and any exception thrown within a transaction causes that transaction to fail.
The io! macro marks side-effecting code explicitly. If it's evaluated within a transaction, it throws an exception, aborting the transaction. This is useful because the retry behavior of transactions means side-effecting code may run multiple times, which is often not the intention. Operations on atoms should also be considered side-effecting, as they don't participate in STM semantics.
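For example (log! is an illustrative function):

```clojure
(defn log! [msg]
  (io! (println "log:" msg)))  ; marked as side-effecting

(log! "ok")             ; fine outside a transaction
(dosync (log! "oops"))  ; throws IllegalStateException
```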
It's also very important to ensure that values held in refs are immutable, or unintended effects may occur.
A livelock is the STM counterpart of a deadlock, occurring when a transaction never gets the chance to commit due to ref contention. One fallback that Clojure uses is called barging, where an older transaction is allowed to proceed in certain circumstances, forcing newer transactions to retry. If barging fails to push through the older transaction within a certain time frame, Clojure's STM will cause the transaction to fail, yielding a stack trace via a thrown exception. There is also a limit to the number of times a transaction can be retried, which, if exceeded, causes the transaction to fail.
In general, dereferencing ref types is guaranteed to never block, but inside of a transaction, dereferencing ref types may trigger a transaction retry. This may seem confusing, since a single read would always be a consistent snapshot of a ref.
However, consider the scenario where two refs are dereferenced for the purpose of adding them and returning their sum. If, during the course of this transaction, the state of one of the refs is modified, the computation would be inconsistent because the two values are from different points in time. This is why it's logical to retry the entire transaction.
This situation can be mitigated by maintaining a history of the ref states. If this were done, then the transaction could simply look for the state of the ref from the point in time that the transaction began.
With this scheme, each ref has a minimum and maximum history size, which can be specified as :min-history and :max-history key-value pairs to
ref. With the default minimum of 0, a transaction retry would increase the history size by one so that two states would be tracked throughout the duration of the transaction, and again increment it on the next retry, and so on.
In the worst case, if the max history size isn't large enough to track the target state of the ref throughout the duration of the transaction, then the transaction will continually retry until it can complete or until it reaches the retry limit, in which case it fails.
Consider the following code where transactions continuously increment a ref every 20 milliseconds for a total of 500 times within a future, followed by a transaction on the same thread that tries to read the ref after one second.
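The original listing isn't shown; a reconstruction consistent with the description might be:

```clojure
(def a (ref 0))

;; writer: 500 transactions, each incrementing a after a 20 ms pause
(future
  (dotimes [_ 500]
    (dosync
      (Thread/sleep 20)
      (alter a inc))))

;; reader: a slow transaction that sleeps for a second, then reads a
@(future
   (dosync
     (Thread/sleep 1000)
     @a))
```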
In the default case, the minimum history size is 0 and the maximum size is 10. This maximum history size is never enough to fully track the value that
a had at the time the transaction was started, since by the time that it wants to read
a, it would have changed 50 times. As a result, it has to keep retrying, effectively waiting until the writer transactions finish, by which time it'll then re-read the value of
a as 500, which is the end result.
If the minimum history size is set to 50 from the beginning, then Clojure won't have to wait to incrementally grow the history by one on each retry. As a result, the transaction will be able to keep track of the state of the ref as it was at the beginning of the transaction, which was 0. That state will be at the very end of the history list.
If instead we set the minimum size just under the required size, to 48, then the transaction will retry twice, by which time the history size (50) will be sufficient to keep track of the state. By then, twice 50 increments will have occurred, so the value at the beginning of the third transaction run will be around 100.
If we cap the maximum size under what is required, then it is as in the first case, where the history list will not be able to grow to the required capacity, and as a result the transaction will continue to retry until it can successfully complete, which should be once all writer transactions are complete.
Write skew refers to the situation in which a transaction doesn't modify a ref, but the consistency of the transaction's changes depend on the state of the ref that was read, and this state changes mid-transaction, causing the transaction's effects on other refs to end up being inconsistent with the read ref.
The ensure function can be used to mitigate this, as it dereferences a ref such that the read will conflict with any modifications made by other transactions until the transaction is complete, causing them to retry as necessary. Using ensure is equivalent to performing a dummy write, which likewise requires that the read value persist until the transaction is committed, with the advantage that ensure generally minimizes the number of transaction retries involving read-only refs.
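A sketch of the write-skew shape, with illustrative refs:

```clojure
(def limit   (ref 10))
(def current (ref 0))

;; This transaction only writes `current`, but its validity depends on
;; the value read from `limit`; ensure makes that read conflict with
;; concurrent writes to `limit`.
(dosync
  (when (< @current (ensure limit))
    (alter current inc)))
```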
The core.async library provides an implementation of channels similar to Go's.
The primary channel operations within go blocks are
>! for putting and
<! for taking. There are also analogous operations for ordinary threads, which block the thread on which they're called until complete: >!! for putting and
<!! for taking. These can be used on threads created with, for example,
future, but there's also a macro called
thread that's similar to
go which launches a first-class thread and returns a channel, and should be preferred over
future for channel work.
The alts! function can be used in
go blocks to wait for any one of a set of operations to complete, much like
Go's select statement. Timeouts are channels that close after a period of time, and can be created with the
timeout function and included in the set given to an
alt variant to place an upper bound on the amount of time spent waiting for an operation to complete.
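A small sketch combining alts! with a timeout channel:

```clojure
(require '[clojure.core.async :refer [chan go >! alts! timeout]])

(def c (chan))

(go (>! c :result))

(go (let [[v port] (alts! [c (timeout 1000)])]
      (println (if (= port c) v :timed-out))))
```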
Given the persistent, immutable nature of data structures in Clojure, it's safe and efficient to place them in channels.
Broadcast channels can be created by creating a mult channel from the source channel with the
mult function. Other channels can tap into the mult channel with the
tap function, which connects an existing channel to the mult (i.e. mutates the channel). Data is sent through the source channel as usual. To stop listening to the mult channel, the untap function is used with the mult channel and the tapped channel as arguments.
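A minimal sketch; the tapped channel is buffered so the put below doesn't block waiting on the consumer:

```clojure
(require '[clojure.core.async :refer [chan mult tap untap >!! <!!]])

(def src (chan))
(def m   (mult src))   ; broadcast wrapper around src

(def out (chan 1))
(tap m out)            ; out now receives everything put on src

(>!! src :hello)
(<!! out)              ; => :hello

(untap m out)          ; stop listening
```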
Code in Clojure is represented as data structures, as was described in homoiconicity. These structures are then evaluated depending on the data type's rules, such as most literals evaluating to themselves (integers, strings, keywords, etc.), symbols evaluating to the value in the var in some namespace, and lists to calls of functions, special forms, or macros.
Whereas function calls are provided their arguments already evaluated, macros are called by the compiler at compile-time with their arguments unevaluated and must return some Clojure data structure that itself can be evaluated. This data structure itself could contain other macro calls, which are recursively expanded until there are no macro calls. This is known as macroexpansion.
Because macros are expanded at compile time, errors in a macro's expansion surface as compile-time errors.
It's easy to refer to vars within macros that aren't yet defined, and this would produce an error when the macro is actually used. This is because macros are expanded at compile time, at which time Clojure can't know if the symbol will refer to a var that has a defined value at runtime.
The macroexpand-1 function can be used to debug macros, as it shows the code the compiler would expand the macro to. The 1 at the end of the function name indicates that it only expands the top-level macro once.
The expansion returned by macroexpand-1 can itself be another top-level macro call; for this, the macroexpand function exists, which repeatedly expands the top-level form until it's no longer a macro call.
Neither of the above functions expand nested macros. For this, the
clojure.walk/macroexpand-all function can be used.
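For example, expanding the built-in when macro:

```clojure
(macroexpand-1 '(when ready? (println "go")))
;; => (if ready? (do (println "go")))

(macroexpand '(when ready? (println "go")))
;; keeps expanding the top level until it's no longer a macro call
```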
It's common to want to return lists to represent further calls to functions, special forms, or macros. The
list function can be used with each symbol quoted to avoid its evaluation, but that's too cumbersome.
The first useful syntax is the backquote
`, also known as syntax-quote. The backquote is similar to the regular quote special form or its ' syntax, except that it fully qualifies unqualified symbols with the current namespace. Syntax-quote also qualifies symbols with the corresponding namespace in the event of namespace aliasing. This qualification behavior is critical to ensure that the macro doesn't expand to code that inadvertently refers to or redefines already-named values, a safety practice known as macro hygiene.
Another advantage of syntax-quoting is that specific elements of the list can be unquoted, causing them to be evaluated within the scope of the syntax-quoted form. This can be done with the tilde
~ symbol prefix.
The unquote-splicing operator
~@ can be used to unpack the contents of one list into another.
Using the unquote-splicing operator is idiomatic for macros that accept code bodies, such as the following.
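A sketch of such a macro (my-when is a hypothetical reimplementation of the built-in when):

```clojure
(defmacro my-when [test & body]
  `(if ~test
     (do ~@body)))  ; splice the body forms into the do

(macroexpand-1 '(my-when ready? (println "go") :done))
;; => (if ready? (do (println "go") :done))
```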
The gensym function returns a symbol that is guaranteed to be unique, and thus won't collide with the environment surrounding the macro. It can take a string with which to prefix the symbol name. Since using gensyms in macros is very common, there is shorthand syntax for them: the pound sign # suffix, which automatically expands into a gensym, each occurrence yielding the same one as long as it's within the same syntax-quoted form. This is known as an auto-gensym.
It's important to emphasize that the auto-gensym only evaluates to the same symbol as long as it's within the same syntax-quoted form. If auto-gensyms need to span across syntax-quoted forms, then an explicit gensym is required.
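A sketch using an auto-gensym (spy is a hypothetical debugging macro):

```clojure
(defmacro spy [expr]
  `(let [result# ~expr]     ; every result# here is the same gensym
     (println "=>" result#)
     result#))

(spy (+ 1 2))  ; prints "=> 3" and returns 3
```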
A macro that deliberately exposes a name to its caller is called anaphoric. Anaphoric macros are generally discouraged, as they require users of the macro to remember the names exposed within their scope. Instead, it's idiomatic to let users choose the names to use.
Double expansion is a common problem where a macro argument appears more than once in the expansion, causing multiple evaluations which can lead to unexpected results if they contain side-effects or decreased performance if the evaluation is computationally expensive.
The idiomatic solution to this problem is to introduce a local binding that holds the result of evaluating the expression once, and to refer to that binding from then on.
The defmacro macro introduces two implicit local bindings: &form and &env. The &env binding contains a map where the keys are the names of all of the current locals, though the values are unspecified. Since the keys don't have a namespace, their values can't be accessed outside of the macro. However, within the macro, the keys and their values can be accessed using typical unquote syntax.
Since macros are currently implemented as functions that take two extra arguments at the front, &form and &env, it's possible to access the macro's var and dereference it to use its implementing function directly. This is possible with the
#' syntax to refer to a var specifically (instead of the value the symbol represents) and
@ to then dereference that var.
The &form binding holds the whole form being macro-expanded, which consists of a list containing the name of the macro as a symbol and the arguments to the macro. As a result,
&form contains all of the metadata specified by the user or added by the reader.
One instance where having access to this metadata could be useful is to produce more accurate errors, with "correct" line numbers, specifically the line number of the macro's usage and not its location within the macro's expanded code.
It's important to note that by default, defining a macro does not preserve its arguments' metadata. To preserve it, the arguments should be re-bound by creating a symbol with
gensym and using the metadata of
&form and setting it to the gensym with
with-meta, then binding the argument to that gensym.
Protocols are the analog of interfaces in other languages. They consist of a variable number of methods, each with a variable number of arities, where every arity is at least 1; the first argument corresponds to Java's implicit this, except that in Clojure it's explicit and so can be given any name. By convention, protocol names are written in camel case to make them easy to distinguish from other Clojure entities.
Note that the "rest" arguments destructuring syntax can't be used with Clojure protocols since they generate JVM interfaces which can't support all of the argument structure variations of Clojure functions.
Protocols can then be extended to existing types using
extend-protocol, with the name of the protocol as the first argument and the type that will be extending it as the second. A variable number of symbols (type names) and corresponding lists (method implementations for the preceding symbol) can be provided to extend-protocol.
It's not necessary to implement every method in a protocol. Instead, calling an unimplemented method will simply throw an exception.
The converse to extend-protocol is extend-type, which can be used to extend several protocols to one type.
It's possible to extend a protocol to
nil, potentially doing away with many null pointer exceptions if a sane default exists for an operation.
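A sketch, with an illustrative protocol:

```clojure
(defprotocol Describable
  (describe [this]))

(extend-protocol Describable
  String
  (describe [s] (str "a string of length " (count s)))
  nil
  (describe [_] "nothing at all"))

(describe "hi")  ; => "a string of length 2"
(describe nil)   ; => "nothing at all", instead of a NullPointerException
```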
Clojure types are Java classes that can be defined with deftype, and records with defrecord. Records are meant for application-level data, while types are meant for low-level infrastructure, such as new data structures. The differences between the two lie in the defaults that records provide for interoperating with the rest of Clojure and Java, and in the capabilities of types to optimize low-level operations.
Creating an instance of a type is done by putting the type name in the function position with a period
. as suffix. Conversely, fields can be accessed with the field name in the function position with a period
. prefix and the instance as the argument.
It's important to realize that types are not defined in namespaces, so they're not implicitly imported when using the namespace from another one, and so they must be explicitly qualified when importing.
It's also possible to provide type metadata for primitive and non-primitive fields, otherwise they are assumed to be java.lang.Object.
The enumeration of defined fields for a class defined by
defrecord is known as its basis and is available via a static method
/getBasis, and each symbol in this basis retains the metadata provided in its definition.
Record types are defined by
defrecord and are a specialization of types defined by
deftype, specifically, they support:
- value semantics
- associative collection abstraction
- metadata support
- reader support (serialization)
- convenience constructors for instances with metadata and auxiliary fields
Value semantics refers to being immutable and basing equality on the constituent fields, so that two records are equal if their fields have the same values.
Since they implement the associative collection abstraction, fields can be accessed by keywords named after the fields. A consequence of this is that new fields can be added with assoc; they're stored in an auxiliary map associated with the record. However, if a declared field is removed with dissoc, the result is a plain map of the remaining fields, not the original record type. Note, however, that unlike maps, records are not functions.
Record literals consist of the pound sign
# followed by the record name and a map literal containing the field names as keyword keys with their associated values. Thanks to this syntax, records can very easily be serialized and deserialized.
Records also have auxiliary constructors that take a penultimate argument consisting of the metadata to attach to the instance and a final argument consisting of extra fields to create.
Both deftype and defrecord implicitly create a factory function of the form ->MyType, which takes field values positionally. Records also get a
map->MyType factory function which takes a single map used for populating the new instance. These are both useful when used as higher-order functions.
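For an illustrative record:

```clojure
(defrecord Point [x y])

(->Point 1 2)             ; positional factory
(map->Point {:x 1 :y 2})  ; map factory
(:x (->Point 1 2))        ; => 1, keyword access via the associative abstraction
```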
The deftype form is the lowest-level method of defining types in Clojure, which is why
defrecord is built on top of it. Types defined by
deftype allow for mutable fields. By contrast to
defrecord, immutable fields are accessible only via interop forms, where the field name is prefixed by a period
., and not via keyword as in the associative abstraction.
Fields can be specified via metadata to be volatile or unsynchronized, denoted respectively by ^:volatile-mutable and ^:unsynchronized-mutable. Volatile refers to
volatile in Java, where reads and writes of the field are atomic and cannot be reordered by the JIT compiler or CPU. Unsynchronized fields are regular Java mutable fields.
Unlike immutable fields, mutable fields are always private and as such can only be accessed from within method bodies provided inline with the type definition. Mutable fields can be set using the set! special form.
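A sketch of a type with a mutable field, using a hypothetical protocol to expose mutation through an inline method:

```clojure
(defprotocol Bump
  (bump! [this]))

(deftype Counter [^:unsynchronized-mutable n]
  Bump
  (bump! [this]
    (set! n (inc n))  ; only inline methods may touch the mutable field
    n))

(def c (Counter. 0))
(bump! c)  ; => 1
```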
Protocols can be extended to existing types using the aforementioned
extend* family of functions, or inline within the deftype or defrecord forms, which is known as an inline implementation.
Inline implementations correspond to Java interface implementations, which means that inline implementation methods have direct access to fields, whereas external implementations must use Java interop or keyword accessors, and method calls will be as fast as Java interface method calls.
However, this has a variety of consequences. For example, clashes can occur by implementing two protocols each containing methods with the same name and signature, which can occur with those interfaces for which
defrecord automatically provides implementations, or even by implementing protocols whose methods conflict with java.lang.Object methods.
Since inline implementations are baked into the type, those implementations can't be changed at runtime without redefining the entire type, necessitating the re-evaluation of all code that depends on the type. Worse, existing instances will not be updated after this redefinition and re-evaluation.
It's possible to create anonymous types with the reify macro, which evaluates to an instance of an unnamed type; this is useful for creating objects that satisfy a given protocol on the fly, much like anonymous inner classes in Java.
Both extend-type and extend-protocol are based on
extend, which takes the type to extend as the first argument and then a variable number of protocols and implementation maps, which are maps consisting of method names as keywords to functions implementing those methods.
Since the implementation maps are just maps, they can be passed around and expanded upon.
Similarly, it's possible to simulate mixin behaviors. Consider a
Measurable protocol which defines methods for retrieving dimensions of things, and a type
Button that extends it. Given another implementation map
bordered, we want to create a BorderedButton that combines the two.
To create a
BorderedButton, we would need the implementation map of
Button. A protocol's name resolves to a var containing a map of relevant information, including the implementation maps of its extending types. We can then use the merge-with function to merge the two implementation maps, providing a function that combines implementations present in both.
There are a variety of functions that can be used for protocol introspection. The
extenders function returns the classes that have been extended by the given protocol at the point in time in which the function is called, since classes can extend protocols at run-time. Similarly, the
extends? function tests if a type extends a given protocol. The
satisfies? function is the instance? analog of extends?, since it works on instances rather than types.
Protocols provide polymorphic dispatch by way of type-based single dispatch. Multimethods provide multiple dispatch, and can even dispatch on things other than argument type. Multimethods are created using
defmulti, which defines the multiple dispatch, and implementations are provided with
defmethod, which defines the methods to which the calls are dispatched.
Multimethods work by passing the arguments given to the multimethod to a dispatch function defined by
defmulti, which yields a dispatch value which is used to select the method to invoke for those arguments.
In the example below, the
:tag is the dispatch function, and the tag it extracts is the dispatch value.
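The original listing isn't shown; a reconstruction of a :tag-dispatched multimethod might be:

```clojure
;; The keyword :tag itself serves as the dispatch function.
(defmulti render :tag)

(defmethod render :em [node] (str "*" (:content node) "*"))
(defmethod render :h1 [node] (str "# " (:content node)))

(render {:tag :em :content "hi"})  ; => "*hi*"
```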
It's possible to define a base-case implementation using the special dispatch value :default.
It's also possible to specify what the default dispatch value should be by passing the :default option to defmulti.
Hierarchies express relationships between dispatch values, which multimethods can use to refine the selection of the implementation method. Relationships are defined with the
derive function, which accepts an optional hierarchy reference, and the remaining arguments express that the first derives from the second.
If the first optional argument---the hierarchy reference---is not provided to
derive, then the relationship is encoded in the default, global hierarchy. Otherwise, a separate hierarchy can be created with the make-hierarchy function.
The relationships are defined to be between named objects (e.g. keywords or symbols) and classes, but classes (and interfaces) can only serve as leaves in a relationship. Since the global hierarchy is global, and thus shared, collisions are avoided by prohibiting non-namespaced keywords or symbols.
The isa? function can be used to test a relationship. It also takes an optional first argument for the hierarchy reference, in case it's not the global one.
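A quick sketch using made-up namespaced keywords:

```clojure
;; ::square derives from ::shape in the default, global hierarchy.
(derive ::square ::shape)

(isa? ::square ::shape) ;; → true
(isa? ::shape ::square) ;; → false

;; The Java class hierarchy is always part of every hierarchy:
(isa? String Object)    ;; → true
```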
Multimethods can use independent hierarchies via a
:hierarchy option, and then use the relationships in method implementations to refer to categories of dispatch values.
The hierarchy can be modified dynamically with
alter-var-root, which enables a pattern where the default method implementation extends the hierarchy to include the new dispatch value that triggered it.
The isa? function treats vectors, specifically pairs, as special cases. As a result, dispatch values can be vectors expressing a relationship. Since the Java class hierarchy is part of every hierarchy, one pattern is to piggyback a Java class as added information. The implementation method below, for example, applies when the dispatch value is a
::checkable and the value in question is a Clojure set. This is possible because the dispatch function returns a pair of the dispatch value and the value's class, both of which are matched as a hierarchical relationship.
When a type implements multiple interfaces, multimethods won't know which implementation to pick unless one is defined as preferred using the
prefer-method function, which takes as arguments the multimethod in question and two dispatch values, the first of which is preferred over the second.
Multimethods support a variety of introspection functions, such as
get-method, which is pretty self-explanatory.
The type function is similar to the
class function, except that if a
:type metadata is present, it returns its value instead of what
class would return.
The most popular build tool in the Java world is Maven. Maven's dependency management model consists of artifacts, which are files that are the product of the build process. Artifacts are identified by coordinates, which are a group of attributes that uniquely identify a particular version of an artifact, in the form groupId:artifactId:packaging:version.
| coordinate | description |
| --- | --- |
| groupId | organization or group responsible for the artifact |
| artifactId | name of the artifact |
| packaging | type of artifact (jar) |
| version | version of the artifact |
A project defines its own coordinates in a Maven pom.xml or Leiningen project.clj file. The pom.xml file can be uploaded to a Maven repository in order to make it available to others. The most popular Maven repositories include Maven Central and, for Clojure, Clojars (clojars.org).
Dependencies can be expressed by specifying the versions that are required. There are snapshot and release version types, where snapshot versions are tracked "bleeding edge" releases suffixed with
-SNAPSHOT and release versions are frozen releases that may follow semantic versioning practices.
Version ranges can be used to restrict or relax certain dependencies. The version range format uses mathematical range notation to express inclusive or exclusive ends, where an omitted end represents "infinity." A version without range delimiters is a "soft" version requirement, so that the version chosen is deferred to the one required by any other transitive dependency, if any. Conversely, a version within brackets specifies a "hard" version dependency.
Although Maven is the most popular build tool in Java, its dependency management and project configuration is overly verbose for what's needed with Clojure. Instead, it's more common to use Leiningen when working with Clojure.
A project.clj file is used for project configuration and dependency management. The
defproject macro takes as its first two arguments the project's coordinates, with the second of the two being the version string on its own, so that Maven's coordinates com.abc:site:1.0.0 correspond to
com.abc/site "1.0.0" in Leiningen. If the first argument is a non-namespaced symbol, then the artifact ID and the group ID are assumed to be the same.
Leiningen has a variety of sub-commands to the main lein command:
| sub-command | description |
| --- | --- |
| repl | start a REPL with dependencies in the classpath |
| test | run all tests |
| jar | package the project into a jar file |
| compile | perform ahead-of-time compilation |
| deps | download all dependencies if necessary |
To perform ahead-of-time (AOT) compilation, an
:aot slot needs to be added to the
defproject macro with a value of
:all to compile all namespaces in the project, or a vector of namespaces specifying which to compile.
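For example, a hypothetical project.clj using the coordinates from earlier (the site.core namespace is made up):

```clojure
(defproject com.abc/site "1.0.0"
  :dependencies [[org.clojure/clojure "1.11.1"]]
  ;; AOT-compile only this namespace; use :all to compile every namespace
  :aot [site.core]
  :main site.core)
```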
It may be useful to perform AOT compilation as a sanity check while ignoring the produced class files. While Maven has support for this, Leiningen doesn't, so the best approximation is to perform a
compile followed by a clean.
I encountered problems when setting
:aot in the
dev environment; it seems to conflict with Leiningen plugins in particular. For this reason it's probably best to reserve this option for the
uberjar or another release environment.
The doto macro can perform a series of operations on an object, yielding the object at the end.
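For example:

```clojure
;; Each form is invoked on the ArrayList; doto then returns the list
;; itself rather than the result of the last .add call.
(doto (java.util.ArrayList.)
  (.add 1)
  (.add 2))
;; → [1 2]
```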
Exceptions are the same as Java's. The
try form defines the scope of the exception-handling. It can contain a variable number of
catch forms and optionally a single
finally form. The
finally form executes regardless of whether or not there were exceptions, and it's only useful for side-effecting actions since it doesn't affect the result of the try form. The
throw form can be used to throw an exception, which must be an instance of a class derived from Throwable.
Idiomatic Java uses
finally to ensure that resources are closed properly in the case of exceptions. Java 7 introduced try-with-resources statements, which can do this more concisely. Similarly, Clojure provides the
with-open form that does this as well, which can take a variable number of resource bindings.
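A small sketch, using a StringReader in place of a real file:

```clojure
;; The reader is closed when the body exits, even if it exits via an
;; exception.
(with-open [r (java.io.StringReader. "hello")]
  (slurp r))
;; → "hello"
```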
Type hints are explicit indications of the object type to the Clojure compiler. When this isn't done, Clojure code that uses interop might perform reflection at run-time to determine the method to call, which can impact performance. Setting
*warn-on-reflection* to true yields reflection warnings whenever a particular method can't be resolved statically.
Type hints can be added to return values by adding them to the argument vectors.
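For instance (the len function is a made-up name):

```clojure
(set! *warn-on-reflection* true)

;; Without the ^String hint, the .length call would be resolved via
;; reflection at run-time (and warned about).
(defn len [^String s]
  (.length s))

(len "clojure") ;; → 7
```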
Primitive Java arrays are supported via a variety of functions.
There are also helper functions to create primitive arrays, such as
byte-array, each of which takes as its argument the desired size or a collection from which to source the array.
Sometimes it's necessary to get the class of an array type in order to, for example, implement a protocol for arrays of that type. Classes for arrays are written using JVM notation: a left bracket
[ for each array dimension, followed by a letter denoting the element type. The
Class/forName method can then be used to yield a class from this notation.
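For example:

```clojure
;; "[B" denotes a one-dimensional byte array; "[[D" would be a
;; two-dimensional double array.
(Class/forName "[B")

;; It's the same class reported for an actual byte array:
(= (Class/forName "[B") (class (byte-array 0))) ;; → true
```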
There also exist type hints that are specifically for use with arrays.
The reduce function on arrays would box the values involved. For this reason, the
areduce and amap macros exist, which avoid autoboxing. The
amap macro takes as its first argument the array expression, followed by the name to bind the index to, the name to bind the result array to, and finally the expression whose result is used as the value at the current index in the result array.
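A sketch of amap doubling each element (type hints omitted for brevity; adding ^longs hints would avoid reflection):

```clojure
(def xs (long-array [1 2 3]))

;; i is the index, ret starts as a clone of xs, and the expression's
;; value becomes ret's element at index i.
(vec (amap xs i ret (* 2 (aget xs i))))
;; → [2 4 6]
```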
With multidimensional arrays, the intermediate levels are arrays of objects.
The aget and aset functions support multidimensional operations by recursively using
apply to get or set each level of the array. The use of
apply causes the primitives to get autoboxed, which can have a significant impact on performance. The workaround is to bind each array dimension to a different name in order to be able to provide type hints.
The proxy form can be used to yield an instance of an anonymous class, like
reify, except that it can be used to extend an existing base class in addition to interfaces. It takes a vector containing the base class and/or interfaces, and another vector with the arguments to use for that class' constructor. The rest of the forms are method implementations.
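For example, a minimal sketch extending Object:

```clojure
;; An anonymous subclass of Object overriding toString; the empty
;; vector holds the (no-arg) superclass constructor arguments.
(def p (proxy [Object] []
         (toString [] "hello from proxy")))

(str p) ;; → "hello from proxy"
```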
gen-class is more definitive in terms of defining a Java class, as it can define a class in any package with any name, extend a base class with access to its protected fields, implement interfaces, and define a variable number of constructors, static and instance methods, and a static main method.
A gen-class form must be AOT compiled, which causes it to emit Java class files. The implementations of the methods specified in
gen-class are delegated to regular Clojure functions, whose names start with a prefix which by default is
-. Method declarations specify their signature as a vector where the first element is the method name, the second is a vector of the arguments' types, and the third and last is the return type.
The namespace form
ns can accept a
:gen-class form which consists of a map that is forwarded to the
gen-class form, where the class takes on the name of the namespace.
Annotations can be created using regular metadata syntax on class-generation forms, such as deftype and gen-class.
In the REPL,
*1, *2, and *3 contain the three most recently evaluated results, with
*1 referring to the most recent, while
*e refers to the last uncaught exception.
The clojure.repl namespace contains various REPL utilities. The
pst function prints the stack trace of the provided exception, or of
*e if none is provided. The
apropos function shows which functions in the loaded namespaces match the provided regular expression or string, whereas
find-doc is similar but searches within the documentation. The
source function outputs the source code of the provided function. The
doc function prints the documentation for a given var, and
dir prints a list of the public vars in the given namespace.
Like Java, Clojure has a mixed numerics model, so that it can make use of both primitives and boxed numbers, although the only primitives it supports are long and
double. In fact, Clojure automatically widens return values of smaller types such as
short to their 64-bit or larger equivalents.
When a mathematical operation overflows the bounds of the value type, an exception is thrown instead of silently wrapping around as in other languages. There are
unchecked-* variants that don't perform this check. Alternatively, a global configuration value
*unchecked-math* can be set to true so that all mathematical operations are unchecked.
There are additional "prime" autopromoting variants of core mathematical functions which will automatically promote results that would otherwise overflow.
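For example:

```clojure
;; (+ Long/MAX_VALUE 1) throws an ArithmeticException, but the
;; autopromoting variant widens the result to a BigInt instead:
(+' Long/MAX_VALUE 1) ;; → 9223372036854775808N
```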
When comparing numbers,
= returns true if the numbers are the same and within the same category, e.g. both are integers. The
== function, on the other hand, returns true if the numbers are numerically equivalent, regardless of the categories they're in.
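For example:

```clojure
(= 1 1.0)  ;; → false: an integer and a double are in different categories
(= 1 1N)   ;; → true: both are integers
(== 1 1.0) ;; → true: numerically equivalent
```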
The clojure.test namespace contains the official Clojure test framework. Assertions are possible with the
is macro, which takes an expression it evaluates to determine if it is logically true and an optional message which will be printed if the assertion fails.
There are also assertions that can be used within expressions, such as
thrown? which ensures that a certain type of exception is thrown during evaluation, or
thrown-with-msg? which takes a regular expression to test the contents of the exception message.
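For example:

```clojure
(require '[clojure.test :refer [is]])

(is (= 4 (+ 2 2)) "arithmetic should work")
(is (thrown? ArithmeticException (/ 1 0)))
(is (thrown-with-msg? ArithmeticException #"Divide by zero" (/ 1 0)))
```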
The testing macro can be used to nest tests so that each test's description is printed out when a test fails.
The deftest macro can be used to define tests such that they are marked as tests. This is done by defining a zero-argument function and adding a
^:test metadata slot. This allows test runners to easily find tests.
The with-test macro can be used to define a function and its accompanying tests in a single location.
The run-tests function can dynamically find tests in the specified namespaces by checking for the
^:test metadata. If no namespaces are specified, it searches the current namespace.
It's possible to define test suites by simply having one test call another. However, this can cause the test runner to run the same tests multiple times. This can be avoided by omitting the
^:test metadata for subordinate tests, i.e. by not defining them with deftest.
Another way to avoid this is by defining an entry point for tests by creating a function called
test-ns-hook in the namespace, which, if found by
run-tests, is the only test function run in that namespace, giving complete control over which tests are run and their order.
Fixtures---which allow set-up and tear-down routines to establish and take down an environment required for a test---are naturally possible via higher-order functions, where the argument to the fixture is the test to run within the context of the environment the fixture establishes. Fixtures are mutually exclusive with the
test-ns-hook entry point for tests, so that if the entry point is defined, fixtures won't be run.
Fixtures are simply functions that take one argument---the test to run within its environment---and are registered using the
use-fixtures function. It takes as its first argument the kind of life cycle to use:
- :once runs the fixture once and then runs all of the tests within that environment
- :each re-runs the fixture separately for each test
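A sketch of a fixture (the set-up and tear-down here are just placeholders):

```clojure
(require '[clojure.test :refer [use-fixtures]])

;; A fixture is just a function of the test to run.
(defn db-fixture [test-fn]
  ;; set-up would go here, e.g. opening a connection
  (test-fn)
  ;; tear-down would go here
  )

;; Re-run the fixture around every test in this namespace.
(use-fixtures :each db-fixture)
```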
The are macro can templatize assertions in order to avoid repetition.
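For example:

```clojure
(require '[clojure.test :refer [are]])

;; Each row binds x and y, then asserts (= x y).
(are [x y] (= x y)
  4 (+ 2 2)
  6 (* 2 3))
;; → true
```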
The assert macro can be used to throw an error if the condition is not met, which is useful in instances where it's preferable to fail as early as possible. Assertions can be disabled globally by setting *assert* to
false, which should be done before compilation of code that uses it.
The fn special form has support for testing preconditions and postconditions, such that an error is thrown if one isn't met. This is done by specifying a map with
:pre and :post keys as the first value in the function body, which gets expanded into calls to
assert when the function is compiled.
The values of the keys are vectors where each item is a separate assertion. Postcondition assertions can refer to the to-be-returned value as %.
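For instance (safe-sqrt is a made-up name):

```clojure
;; :pre checks the argument; :post checks the return value, bound to %.
(defn safe-sqrt [x]
  {:pre  [(>= x 0)]
   :post [(>= % 0.0)]}
  (Math/sqrt x))

(safe-sqrt 9.0) ;; → 3.0
;; (safe-sqrt -1) throws an AssertionError from the :pre condition
```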