SQLite improving performance with pre-sort


In the last post we showed how the randomness of UUID4 can have a large impact on insert speed and how UUID7 can help. But, what about other data that might have random qualities? Where we can't reach for UUID7 to solve our problems?

Random data

We will use a 160-bit (20 byte) random value generated by SecureRandom similar to the one described in this article. Why? Because for things like session tokens you probably do not want to use UUID7 (leaks information, might not have enough entropy for your use case etc). This could also represent any data that has a primary key that is random (or more specifically unordered).

Here's the code to generate them:

(defn random-unguessable-uid []
  (let [buffer (byte-array 20)]
    (.nextBytes secure-random buffer)))

Let's see how this performs.

(d/q writer
  ["CREATE TABLE IF NOT EXISTS event(id BLOB PRIMARY KEY, data BLOB) WITHOUT ROWID"])
  
(dotimes [_ 10]
  (time
    (d/with-write-tx [db writer]
      (dotimes [_ 1000000]
        (d/q db ["INSERT INTO event (id, data) values (?, ?)"
                 (random-unguessable-id) data])))))

Results:

total rowstime in ms
10000002478
20000004927
30000006262
40000007195
50000008257
60000008704
70000009244
80000009771
900000010387
1000000011103

Around a hundred thousand inserts per second. Just like with UUID4 things are slow.

Pre sort

The defining characteristic of a B+ Tree is that it is ordered. Sequential writes are fast. Random data works against this thrashing pages, causing page splits and tree re-balancing. It's not a good time.

But, we are batching the data already. So what if we sort it before we insert?

First we need a fast way to compare our random id. It's 20 bytes, we don't want to iterate over them, so instead we just take the first 8 bytes and convert them into a long. We probably don't need to compare the entire byte array to get a good enough sort. The important part is this comparison is unsigned to match SQLite.

When two BLOB values are compared, the result is determined using memcmp().

Note: This probably isn't the fastest way to sort random data. I don't write blog post with an internet connection (it's too distracting). Search is also terrible these days. So mostly work with good old offline docs with fuzzy text search using dash (zeal is the linux equivalent). If you know a faster/better way to sort byte arrays in Java/Clojure let me know!

(defn bytes->long [^bytes bytes]
  (let [bb (ByteBuffer/allocate (count bytes))]
    (ByteBuffer/.put bb bytes)
    (ByteBuffer/.getLong bb 0)))
    
(defn byte-compare
  "Compares the first 8 most significant bytes of a byte array.
  Big Endian (matches SQLites blob sort)."
  [a b]
    (Long/compareUnsigned
      (bytes->long a)
      (bytes->long b)))

Let's see how this performs:

(d/q writer
  ["CREATE TABLE IF NOT EXISTS event(id BLOB PRIMARY KEY, data BLOB) WITHOUT ROWID"])

(dotimes [_ 10]
  (time
    (d/with-write-tx [db writer]
      (->> (repeatedly 1000000 random-unguessable-id)
        (sort byte-compare)
        (run!
          (fn [id]
            (d/q db ["INSERT INTO event (it, data) values (?, ?)"
                     id data])))))))

Results:

total rowstime in ms
10000002638
20000002846
30000002976
40000003125
50000003309
60000003898
70000003749
80000003984
90000004054
100000004332

Fun! Despite the overhead, sorting the batch improves performance by around 2-2.5x.

Conclusion

Hopefully, this post helps show you how batching data opens up useful optimisations like pre-sort when dealing with unordered data.

The full benchmark code can be found here.

Thanks to Everyone on the Datastar discord who read drafts of this and gave me feedback.