Clojure: code as data
In Clojure, the primary representation of code is the S-expression that takes the form of nested sequences (lists and vectors). The majority of Clojure's functions are designed to operate on sequences. As a result, Clojure code can be manipulated using the same functions that are used to manipulate Clojure data. In other words, there is no distinction between code and data. Code is data. This property is known as homoiconicity. This article will explore this concept.
Code equality
We can check two pieces of code for equality by turning the code into data with a '
symbol and then comparing them with =
:
(= '(map inc items)
'(map inc items))
=> true
(= '(map inc items)
'(map dec items))
=> false
Simple.
Code diff
We can use clojure.data/diff
to find out the difference between these two pieces of code:
(clojure.data/diff '(map inc items)
'(map dec items))
=> [[nil inc] [nil dec] [map nil items]]
From this we can see that the second item in the expression is different. But what happens with nested code?
(clojure.data/diff '(when activate?
(map dec items))
'(map dec items))
=> [[when activate? (map dec items)] [map dec items] nil]
Looks like the diff
function doesn't recognise the commonality between these two pieces of code.
Commonality with tree-seq
The tree-seq
function returns a lazy sequence of the nodes in a tree, via a depth-first walk. It takes two functions and the root node of a tree. The first function needs to return true
if the node is a branch (can have children); in this case we use coll?
which returns true
if the node is a collection. The second function will be called on these branch nodes; in this case seq
which returns a sequence of the children of that node or nil
if there are none:
(tree-seq coll? seq '(when activate?
(map dec items)))
=>
((when activate? (map dec items))
when
activate?
(map dec items)
map
dec
items)
We can use tree-seq
to build a sequence of each node for two pieces of code and then compare each node using a for
comprehension, filtering the results which are equal:
(defn find-common-code [code-a code-b]
(-> (for [a (tree-seq coll? seq code-a)
b (tree-seq coll? seq code-b)
:when (= a b)]
a)))
(find-common-code '(when activate?
(map dec items))
'(map dec items))
=> ((map dec items) map dec items)
This returns the common nodes: (map dec items)
, map
, dec
and items
.
Refactoring common code
Here we have some of the code used to generate the RSS feed for this blog:
(defn generate-feed-items [posts]
(map (fn [{:keys [post-name date post-path-name]}]
(let [post-url (str site-url "/" post-path-name)]
[:item
[:title post-name]
[:pubDate (date->rfc822 date)]
[:link post-url]
[:guid {:isPermaLink "true"} post-url]]))
posts))
(defn generate-rss-feed [posts]
(xml/sexp-as-element
[:rss
{:version "2.0"
:xmlns:atom "https://www.w3.org/2005/Atom"
:xmlns:dc "https://purl.org/dc/elements/1.1/"}
[:channel
[:title site-title]
[:description site-tagline]
[:link site-url]
[:atom:link
{:href site-rss :rel "self" :type "application/rss+xml"}]
(map (fn [{:keys [post-name date post-path-name]}]
(let [post-url (str site-url "/" post-path-name)]
[:item
[:title post-name]
[:pubDate (date->rfc822 date)]
[:link post-url]
[:guid {:isPermaLink "true"} post-url]]))
posts)]]))
We think some of the code might be duplicated and we want to refactor it out. With what we have just learnt, we can write a function to find the largest piece of common code.
First, we need a function to define the largest piece of code. We decide to use string length as a simple heuristic:
(defn code-string-length [code]
(count (str code)))
(sort-by code-string-length > '((+ 1 2 3) (* 1 200) (map inc [1 2 3 4])))
=> ((map inc [1 2 3 4]) (+ 1 2 3) (* 1 200))
The find-largest-common-code
uses tree-seq
to get a sequence of nodes and then frequencies
to find duplicates. We also make sure these duplicates are collections as we aren't interested in duplicates variable names, strings etc. Finally, we order by our string length heuristic and pick the first item:
(defn find-largest-common-code [code]
(->> (tree-seq coll? seq code)
frequencies
(keep (fn [[code freq]]
(when (and (> freq 1) (coll? code))
code)))
(sort-by code-string-length >)
first))
Trying this out on our large piece of code returns the largest duplicate:
(find-largest-common-code
'((defn generate-feed-items [posts]
(map (fn [{:keys [post-name date post-path-name]}]
(let [post-url (str site-url "/" post-path-name)]
[:item
[:title post-name]
[:pubDate (date->rfc822 date)]
[:link post-url]
[:guid {:isPermaLink "true"} post-url]]))
posts))
(defn generate-rss-feed [posts]
(xml/sexp-as-element
[:rss
{:version "2.0"
:xmlns:atom "https://www.w3.org/2005/Atom"
:xmlns:dc "https://purl.org/dc/elements/1.1/"}
[:channel
[:title site-title]
[:description site-tagline]
[:link site-url]
[:atom:link
{:href site-rss :rel "self" :type "application/rss+xml"}]
(map (fn [{:keys [post-name date post-path-name]}]
(let [post-url (str site-url "/" post-path-name)]
[:item
[:title post-name]
[:pubDate (date->rfc822 date)]
[:link post-url]
[:guid {:isPermaLink "true"} post-url]]))
posts)]]))))
=>
(map
(fn
[{:keys [post-name date post-path-name]}]
(let
[post-url (str site-url "/" post-path-name)]
[:item
[:title post-name]
[:pubDate (date->rfc822 date)]
[:link post-url]
[:guid {:isPermaLink "true"} post-url]]))
posts)
We can then refactor our code accordingly:
(defn generate-feed-items [posts]
(map (fn [{:keys [post-name date post-path-name]}]
(let [post-url (str site-url "/" post-path-name)]
[:item
[:title post-name]
[:pubDate (date->rfc822 date)]
[:link post-url]
[:guid {:isPermaLink "true"} post-url]]))
posts))
(defn generate-rss-feed [posts]
(xml/sexp-as-element
[:rss
{:version "2.0"
:xmlns:atom "https://www.w3.org/2005/Atom"
:xmlns:dc "https://purl.org/dc/elements/1.1/"}
[:channel
[:title site-title]
[:description site-tagline]
[:link site-url]
[:atom:link
{:href site-rss :rel "self" :type "application/rss+xml"}]
(generate-feed-items posts)]]))
Much better!
A macro implementation
If we don't want to pass the code as a quoted list, we can change the find-largest-common-code
function into a macro:
(defmacro find-largest-common-code [& body]
`(->> (tree-seq coll? seq ~body)
frequencies
(keep (fn [[code freq]]
(when (and (> freq 1) (coll? code))
code)))
(sort-by code-string-length >)
first))
(find-largest-common-code
(defn generate-feed-items [posts]
(map (fn [{:keys [post-name date post-path-name]}]
(let [post-url (str site-url "/" post-path-name)]
[:item
[:title post-name]
[:pubDate (date->rfc822 date)]
[:link post-url]
[:guid {:isPermaLink "true"} post-url]]))
posts))
(defn generate-rss-feed [posts]
(xml/sexp-as-element
[:rss
{:version "2.0"
:xmlns:atom "https://www.w3.org/2005/Atom"
:xmlns:dc "https://purl.org/dc/elements/1.1/"}
[:channel
[:title site-title]
[:description site-tagline]
[:link site-url]
[:atom:link
{:href site-rss :rel "self" :type "application/rss+xml"}]
(map (fn [{:keys [post-name date post-path-name]}]
(let [post-url (str site-url "/" post-path-name)]
[:item
[:title post-name]
[:pubDate (date->rfc822 date)]
[:link post-url]
[:guid {:isPermaLink "true"} post-url]]))
posts)]])))
=>
(map
(fn
[{:keys [post-name date post-path-name]}]
(let
[post-url (str site-url "/" post-path-name)]
[:item
[:title post-name]
[:pubDate (date->rfc822 date)]
[:link post-url]
[:guid {:isPermaLink "true"} post-url]]))
posts)
Hopefully, this example helps illustrate the value of code as data (homoiconicity).