I first bumped into the in-memory database concept back around 2001, and have since considered it a classic disruptive technology that would inevitably marginalize, then unseat, conventional SQL databases.

All the old System R-derived database optimization strategies from back in the 1970s operate in a linear fashion, like a zipper. They assume you read one record at a time from each table ("stream"), compare those records with each other ("join"), and immediately write the result back to disk. (The metaphor actually predates disks; it started with punched card stacks.)
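That zipper-style join can be sketched in a few lines. This is a hypothetical illustration (the function and record streams are made up for the example), assuming both inputs arrive sorted on the join key with unique keys per stream:

```python
def merge_join(left, right, key):
    """Zipper-style join: walk two sorted record streams in lockstep,
    holding only one record from each table in memory at a time."""
    left, right = iter(left), iter(right)
    l = next(left, None)
    r = next(right, None)
    while l is not None and r is not None:
        if key(l) < key(r):
            l = next(left, None)       # left record has no match yet; advance it
        elif key(l) > key(r):
            r = next(right, None)      # right record is behind; advance it
        else:
            yield (l, r)               # matching pair: "write the result back out"
            l = next(left, None)       # unique keys assumed, so advance both
            r = next(right, None)
```

The point is what this code *doesn't* do: it never holds a whole table, so it works with almost no memory, which is exactly the constraint the design was built around.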

The old database design assumed you never have more than one record from each table in memory at a time, because who has that much memory? Everything is centered on that design assumption: you have a tiny amount of RAM, so everything comes from disk in bite-sized pieces and goes right back again. That assumption was undermined by Moore's Law, which has doubled available RAM every couple of years for decades, and continues to do so.

In 2004, the Department of Defense commissioned a 2.5 terabyte ramdisk for use as a database server. The industry switched to 64-bit (x86-64) processors in 2005. The Tyan Thunder supported 64 gigabytes of RAM, the Nvidia MCP55 Pro chipset supported 128 gigs, the elegantly named "Tyan S4989WG2NR-SI" supported 256 gigs, and these days the Supermicro 7500 supports half a terabyte of RAM.

So these days 64-bit systems with half a terabyte of RAM are available retail. (And if you want to wander away from the PC, IBM Power systems like the 780 server can hold a couple of terabytes of RAM each.)

Keeping all your tables in memory gives you a three-orders-of-magnitude speedup (1000x), and lets you use simple generic indexing strategies, so the code becomes really simple. The first entirely in-memory database implementation I saw, a SourceForge project back in 2001, was a 1000-line Python implementation that stored everything in Python dictionaries. Over 90% of that code implemented the SQL parser, not the actual database.
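To show how little code the "actual database" part takes, here's a toy sketch in the same spirit (the class and method names are my own, not from that project): rows live in a dict keyed by primary key, and a "generic index" is just another dict mapping a column value to the set of matching primary keys.

```python
class Table:
    """Toy in-memory table: rows in a dict, plus generic hash indexes."""
    def __init__(self):
        self.rows = {}       # primary key -> row dict
        self.indexes = {}    # column name -> {value: set of primary keys}

    def create_index(self, column):
        idx = {}
        for pk, row in self.rows.items():
            idx.setdefault(row[column], set()).add(pk)
        self.indexes[column] = idx

    def insert(self, pk, row):
        self.rows[pk] = row
        for column, idx in self.indexes.items():   # keep indexes current
            idx.setdefault(row[column], set()).add(pk)

    def where(self, column, value):
        if column in self.indexes:                 # O(1) hash lookup
            return [self.rows[pk] for pk in self.indexes[column].get(value, ())]
        # no index: fall back to a full scan, which is still just a RAM walk
        return [row for row in self.rows.values() if row[column] == value]
```

Everything is a dictionary lookup; there's no buffer pool, no page cache, no disk layout to manage.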

For persistence, all you need is an on-disk snapshot of the initial state, plus an append-only log of transactions that change that state. When your database needs to be restarted, you re-read the snapshot and replay the log to bring that state back up to date. Most of your atomicity and durability guarantees center on making sure the log entries are committed to disk, but those are all fast streaming writes instead of seeks all over the disk, and thus have no impact on reads.
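A minimal sketch of that snapshot-plus-log scheme, assuming a dict-shaped store and a single made-up transaction type ("set key to value") for illustration; the file names and function names are invented for the example:

```python
import json
import os

SNAPSHOT, LOG = 'db.snapshot', 'db.log'

def apply(state, entry):
    """The one transaction type in this sketch: set a key to a value."""
    state[entry['key']] = entry['value']

def commit(state, entry, log):
    """Append the transaction to the log, force it to disk, then apply it."""
    log.write(json.dumps(entry) + '\n')
    log.flush()
    os.fsync(log.fileno())   # the durability guarantee lives here
    apply(state, entry)      # mutate memory only after the log hits disk

def recover():
    """Restart: re-read the snapshot, then replay the log over it."""
    state = {}
    if os.path.exists(SNAPSHOT):
        with open(SNAPSHOT) as f:
            state = json.load(f)
    if os.path.exists(LOG):
        with open(LOG) as f:
            for line in f:
                apply(state, json.loads(line))
    return state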

When your log gets too long, you fork the database and have the child write a new snapshot, freeing its memory as it does so. The parent closes the old log file, opens a new one, and continues as normal. Due to the copy-on-write nature of fork(), this doesn't require more physical memory as long as the child frees its pages as fast as the parent dirties them. When the child finishes writing the new snapshot, the old snapshot and old log become obsolete, and the child can archive or delete them as appropriate before exiting. If something goes wrong before the new snapshot is written out, you can recover the database state by reading the old snapshot and replaying the old and new logs, in order.
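The fork-based rotation looks something like this sketch (POSIX-only, and the function name and file handling are assumptions for the example, not a real API). The child sees the state frozen at the moment of fork(); the parent immediately switches to a fresh log and keeps serving traffic:

```python
import json
import os

def rotate(state, old_log, snapshot_path, new_log_path):
    """Fork; the child serializes a copy-on-write view of the state
    while the parent opens a fresh log and carries on."""
    pid = os.fork()
    if pid == 0:                          # child: state as of fork()
        tmp = snapshot_path + '.tmp'
        with open(tmp, 'w') as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.rename(tmp, snapshot_path)     # atomically replace the old snapshot
        os._exit(0)                       # old snapshot and old log now obsolete
    old_log.close()                       # parent: start a fresh log
    return open(new_log_path, 'w'), pid
```

Writing to a temporary file and renaming it over the old snapshot matters: rename is atomic on POSIX filesystems, so a crash mid-snapshot leaves the old snapshot intact for the recovery path described above.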

The more recent "nosql" movement is basically following up on this by asking: if we've got our tables in memory, why do we need SQL to talk to them? The obvious way to design an in-memory database is thus in two parts: a direct layer providing function-call access to the data store (with transactions, searching, persistence, and so on), and an SQL layer on top of that. Whether the database lives in shared memory is an implementation detail.
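The two-layer split might look like this sketch (all names invented for the example, and the "SQL layer" deliberately handles only one statement form, to show that it's pure translation into direct-layer calls):

```python
class Store:
    """Direct layer: plain function calls against the data store."""
    def __init__(self):
        self.tables = {}

    def insert(self, table, row):
        self.tables.setdefault(table, []).append(row)

    def select(self, table, predicate=lambda row: True):
        return [row for row in self.tables.get(table, []) if predicate(row)]

def run_sql(store, query):
    """SQL layer (toy): understands only 'SELECT * FROM <table>'.
    Everything it does is translation into direct-layer calls."""
    words = query.strip().rstrip(';').split()
    if [w.upper() for w in words[:3]] != ['SELECT', '*', 'FROM']:
        raise ValueError('unsupported query: ' + query)
    return store.select(words[3])
```

A "nosql" client just calls `Store` methods directly; the SQL layer exists only for callers that want it.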

The current crop of databases (MySQL, PostgreSQL, Oracle, and so on) are all painfully obsolete. As with the introduction of Unix, they were obsoleted by something much simpler, which figured out that 90% of what they spent their time doing didn't need to be done at all.