SQLite improving performance with pre-sort

07 Jun 2026

In the last post we showed how the randomness of UUID4 can have a large impact on insert speed and how UUID7 can help. But, what about other data that might have random qualities? Where we can't reach for UUID7 to solve our problems?

Random data

We will use a 160-bit (20 byte) random value generated by SecureRandom similar to the one described in this article. Why? Because for things like session tokens you probably do not want to use UUID7 (leaks information, might not have enough entropy for your use case etc). This could also represent any data that has a primary key that is random (or more specifically unordered).

Here's the code to generate them:

(defn random-unguessable-uid []
  (let [buffer (byte-array 20)]
    (.nextBytes secure-random buffer)))

Let's see how this performs.

(d/q writer
  ["CREATE TABLE IF NOT EXISTS event(id BLOB PRIMARY KEY, data BLOB) WITHOUT ROWID"])
  
(dotimes [_ 10]
  (time
    (d/with-write-tx [db writer]
      (dotimes [_ 1000000]
        (d/q db ["INSERT INTO event (id, data) values (?, ?)"
                 (random-unguessable-id) data])))))

Results:

total rows	time in ms
1000000	2478
2000000	4927
3000000	6262
4000000	7195
5000000	8257
6000000	8704
7000000	9244
8000000	9771
9000000	10387
10000000	11103

Around a hundred thousand inserts per second. Just like with UUID4 things are slow.

Pre sort

The defining characteristic of a B+ Tree is that it is ordered. Sequential writes are fast. Random data works against this thrashing pages, causing page splits and tree re-balancing. It's not a good time.

But, we are batching the data already. So what if we sort it before we insert?

First we need a fast way to compare our random id. It's 20 bytes, we don't want to iterate over them, so instead we just take the first 8 bytes and convert them into a long. We probably don't need to compare the entire byte array to get a good enough sort. The important part is this comparison is unsigned to match SQLite.

When two BLOB values are compared, the result is determined using memcmp().

Note: This probably isn't the fastest way to sort random data. I don't write blog post with an internet connection (it's too distracting). Search is also terrible these days. So mostly work with good old offline docs with fuzzy text search using dash (zeal is the linux equivalent). If you know a faster/better way to sort byte arrays in Java/Clojure let me know!

(defn bytes->long [^bytes bytes]
  (-> (ByteBuffer/wrap bytes 0 8)
    (ByteBuffer/.getLong 0)))
    
(defn byte-compare
  "Compares the first 8 most significant bytes of a byte array.
  Big Endian (matches SQLites blob sort)."
  [a b]
    (Long/compareUnsigned
      (bytes->long a)
      (bytes->long b)))

Let's see how this performs:

(d/q writer
  ["CREATE TABLE IF NOT EXISTS event(id BLOB PRIMARY KEY, data BLOB) WITHOUT ROWID"])

(dotimes [_ 10]
  (time
    (d/with-write-tx [db writer]
      (->> (repeatedly 1000000 random-unguessable-id)
        (sort byte-compare)
        (run!
          (fn [id]
            (d/q db ["INSERT INTO event (it, data) values (?, ?)"
                     id data])))))))

Results:

total rows	time in ms
1000000	1987
2000000	2251
3000000	2296
4000000	2614
5000000	2687
6000000	3244
7000000	3118
8000000	3311
9000000	3485
10000000	3835

Fun! Despite the overhead, sorting the batch improves performance by around 2-3x.

Conclusion

Hopefully, this post helps show you how batching data opens up useful optimisations like pre-sort when dealing with unordered data.

The full benchmark code can be found here.

Thanks to Everyone on the Datastar discord who read drafts of this and gave me feedback.

SQLite improving performance with pre-sort

Random data

Pre sort

Conclusion

Discussion