From Merge Sort to Silicon: Why We Had to Rethink Computer Science
- Boaz Menuhin

Have you ever stopped to think about what’s hidden in your database?
There are many types of databases. Think of the relational ones you are familiar with, the ones with SQL and tables. Most of them support (roughly) the same set of operations, and they rely on the same algorithms to run as fast as possible on general-purpose processors such as x86 or Arm. "General purpose" means these processors can do anything, and they specialize in nothing.
But what happens if we move beyond the CPU to a processor that operates on entirely different principles?
For those who studied Computer Science, consider what the syllabus would look like if the only processor in existence were a GPU. What would the data structures course look like? Which algorithms would be taught? What would be emphasized in a compilers course?
There is a lot of theory embedded in the basic building blocks that we use every day. Some aspects of Hash-Tables (casually referred to as Dictionaries) are nothing less than magic happening on a daily basis. Compilers harness ideas from combinatorics and graph theory. Cache mechanisms and scheduling policies affect the behavior of every program running on a computer (and are usually transparent to the user), while having a rich literature behind them. And these are just the basic building blocks!
More theory emerges when we bring databases in. And when considering Big Data, big-O notation becomes extremely relevant and noticeable.
All of this theory (the algorithms we learn, the data structures we rely on, the optimizations compilers perform) was developed for a CPU-centric world. But when the underlying hardware changes fundamentally, that entire theoretical foundation needs rethinking.
At Speedata, replacing the CPU isn't a thought experiment. We're changing the compute perspective in real time and building the APU (Analytics Processing Unit).
Beyond the CPU/GPU Binary: The APU
The APU is a processor tailored specifically for database workloads. It is not a CPU. It is not a GPU. Its architecture is novel, and in many ways deliberately unconventional, combining the best of both the CPU and GPU worlds.
And because the architecture is so different, we have our own ISA, and thus we also need our own assembler, our own compiler optimizations, and our very own standard library. In fact, what we are doing at Speedata is building a whole computer from the ground up.
The Theory in Practice: The Sorting Problem
To understand the impact of this shift, consider the classic problem of sorting a list of $n$ elements. The default answer is reflexive: use Merge Sort. It runs in $O(n \log n)$ time, and we know there’s a lower bound of $\Omega(n \log n)$. Case closed.
Except it isn’t.
With more careful reflection, we recall that the lower bound is on the number of comparisons, not on time. Modern hardware can perform multiple operations in a single cycle, effectively "breaking" the traditional time-complexity model that assumes sequential execution. Furthermore, Merge Sort is highly "adaptive": each comparison depends on the results of previous ones, which makes it difficult to parallelize effectively in hardware.
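To make that data dependence concrete, here is a minimal Python sketch of the merge step (an illustration of the textbook algorithm, not our APU code):

```python
def merge(left, right):
    """Merge two sorted lists. The control flow is data-dependent:
    which pointer advances at each step is known only after the
    comparison resolves, so the next memory access depends on the
    previous result. This is what makes Merge Sort 'adaptive' and
    hard to parallelize at the hardware level."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:   # this branch decides the next access
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]
```

Every iteration stalls on the outcome of the comparison before it; contrast this with a schedule of operations that is fixed in advance.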
At the other extreme, we could investigate Sorting Networks. These are fully non-adaptive (making them excellent for parallelism and silicon), but the most efficient theoretical constructions (like the AKS network) are ruled out because of their massive "galactic" constants.
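For contrast, a sorting network that is actually practical is Batcher's bitonic sort: its compare-exchange schedule is determined entirely by the input size, never by the data, so every stage maps naturally onto parallel hardware. A minimal Python sketch of the classic iterative formulation (illustrative only):

```python
def bitonic_sort(a):
    """Sort a list whose length is a power of two using a bitonic
    sorting network. The (i, i ^ j) compare-exchange pairs depend only
    on n, so within each stage all exchanges could run in parallel."""
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    a = list(a)
    k = 2
    while k <= n:           # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:       # distance between compared elements
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

Note the absence of data-dependent control flow in the schedule itself: the loop structure is identical for every input, which is exactly the property that adaptive algorithms like Merge Sort lack.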
Breaking the Box
Trying to benefit from both worlds, we consider recent advancements in vectorized implementations of quicksort.
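The key idea these vectorized quicksorts exploit is a branchless partition: compare a whole chunk of elements against the pivot at once, producing a mask, then pack the selected lanes contiguously (on real hardware this uses SIMD compare and compress-store instructions). A pure-Python stand-in for the idea, for illustration only:

```python
def simd_style_partition(chunk, pivot):
    """Partition one chunk around a pivot without per-element branching.
    The mask computation stands in for a single vector compare; the two
    comprehensions stand in for lane-compress operations. No branch
    outcome feeds back into control flow, unlike scalar quicksort."""
    mask = [x <= pivot for x in chunk]                 # one vector compare
    left = [x for x, m in zip(chunk, mask) if m]       # compress lanes where mask is set
    right = [x for x, m in zip(chunk, mask) if not m]  # compress the complement
    return left, right
```

The chunk size and instruction names here are deliberately abstracted away; the point is that the partition step becomes a fixed pipeline of vector operations rather than a sequence of unpredictable branches.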
However, breaking out of the confines of the current hardware, we realize that we could do much better if only our APU had a certain feature.
Then comes the breakthrough. We aren't just software engineers trying to fit our code into a fixed box; we work in the same office as the architects of the silicon. We walked into the room next door, described the ideal instruction set for our sorting problem, and collaborated on a hardware feature that made the "impossible" optimal.
We aren't just writing software; we're co-designing the computer itself.
This experience isn't unique to our team. Every day, we have to keep our theoretical muscles in shape to gain better insight into the algorithmic properties that allow us to accelerate databases. (We also pursue theoretical research; see this talk from FOCS 2025.)
We often mistake the 'existing way' for the 'only way.' At Speedata, we’re changing the perspective, developing hardware that doesn't just run algorithms, but redefines what they can achieve. If you're tired of working inside the box, come help us build a new one.
See open positions here.

