Clojure: merging maps by key (join)


We have two sequences of maps (tables) and we want to merge them on matching keys (relations).

(def name-table [{:userid 1 :name "Bob"}
                 {:userid 2 :name "Jo"}
                 {:userid 3 :name "Sarah"}])

(def email-table [{:userid 1 :email "bob@email.com"}
                  {:userid 2 :email "jo@email.com"}
                  {:userid 3 :email "sarah@email.com"}])

Using concat, group-by and merge

We can achieve this using concat, gourp-by and merge. We concatenate the two sequences together, group the maps by :userid key and then map over the grouped maps (rows) merging them.

(defn merge-on-key
  "Merger rows of table1 and table2 on matching key k."
  [k table1 table2]
  (->> (concat table1 table2)
       (group-by k)
       (map val)
       (map (fn [[row1 row2]]
              (merge row1 row2)))))

(merge-on-key :userid
              name-table
              email-table)

=> ({:userid 1, :name "Bob", :email "bob@email.com"}
    {:userid 2, :name "Jo", :email "jo@email.com"}
    {:userid 3, :name "Sarah", :email "sarah@email.com"})
 

This works for sequences of the same length, but what do we get if the sequences are different lengths (one of the tables has missing rows)?

(def name-table [{:userid 1 :name "Bob"}
                 {:userid 2 :name "Jo"}
                 {:userid 3 :name "Sarah"}])

(def email-table [{:userid 1 :email "bob@email.com"}
                  {:userid 2 :email "jo@email.com"}])

(merge-on-key :userid
              name-table
              email-table)

=> ({:userid 1, :name "Bob", :email "bob@email.com"}
    {:userid 2, :name "Jo", :email "jo@email.com"}
    {:userid 3, :name "Sarah"})
 

This returns all the user names with their corresponding emails if they have one. We effectively get a LEFT OUTER JOIN.

Using clojure.set/join

If we just want all user names with corresponding emails, also known as an INNER JOIN, we can use the join function from the clojure.set namespace.

(def name-table [{:userid 1 :name "Bob"}
                 {:userid 2 :name "Jo"}
                 {:userid 3 :name "Sarah"}])

(def email-table [{:userid 1 :email "bob@email.com"}
                  {:userid 2 :email "jo@email.com"}])

(clojure.set/join (set name-table) (set email-table))

=> ({:userid 1, :name "Bob", :email "bob@email.com"}
    {:userid 2, :name "Jo", :email "jo@email.com"})

This just scratches the surface of the relational algebra you can do with Clojure data structures. For more check out the clojure.set namespace.