259× Faster! What Microsoft's Award-Winning SIGMOD Paper Says About the Speedata APU 

  • Writer: Daniela Sztulwark
  • 4 days ago

Microsoft's paper "CoddSpeed: Hardware Accelerated Query Processing in Microsoft Fabric" won the Best Industry Paper Award at SIGMOD 2026, one of the top academic conferences in database research. The paper documents Microsoft's multi-year effort to move analytics from CPUs onto hardware accelerators, and the Speedata C-200 Analytics Processing Unit (APU) is one of the three coprocessor architectures Microsoft integrated and benchmarked inside its next-generation Fabric architecture.


Every measurement in the paper was run on Microsoft's infrastructure and published under Microsoft's name. Here's what's inside:


Why Microsoft is moving analytics off CPUs 


Microsoft's framing is blunt: hardware accelerators now surpass traditional CPU servers by orders of magnitude in compute, memory, and networking. In the paper's words, "Running analytics on hardware accelerators is therefore a business imperative."


But Microsoft also made a deliberate architectural bet: no single chip wins everything. Rather than committing to one accelerator, they built a hardware-agnostic Coprocessor Abstraction Layer (CAL) that lets host engines like SQL Server and Fabric Data Warehouse offload query fragments to any coprocessor that implements the API.


They validated three: GPUs, an in-house FPGA, and the Speedata APU. 
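
To make the idea of a hardware-agnostic layer concrete, here is a minimal sketch in Python. It is not Microsoft's CAL API, which this post does not reproduce. The names `QueryFragment`, `Coprocessor`, `ToyBackend`, `supports`, `execute`, and `offload` are all hypothetical, and the backends are toy stand-ins that run on the CPU. The only point is the shape of the design: the host talks to one interface, and any device that implements it can receive a query fragment.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class QueryFragment:
    """Hypothetical stand-in for a pushed-down piece of a query plan."""
    operators: tuple[str, ...]  # e.g. ("select", "join", "aggregation")


class Coprocessor(Protocol):
    """Hypothetical interface that every backend must implement."""
    name: str

    def supports(self, fragment: QueryFragment) -> bool: ...
    def execute(self, fragment: QueryFragment) -> str: ...


@dataclass
class ToyBackend:
    """Toy backend: advertises a set of operators and pretends to run them."""
    name: str
    operators: frozenset[str]

    def supports(self, fragment: QueryFragment) -> bool:
        # A fragment is accepted only if every operator in it is available.
        return set(fragment.operators) <= self.operators

    def execute(self, fragment: QueryFragment) -> str:
        return f"{self.name} ran {'+'.join(fragment.operators)}"


def offload(fragment: QueryFragment, backends: list[Coprocessor]) -> str:
    """Send the fragment to the first backend that supports all of it,
    falling back to the host engine if none does."""
    for backend in backends:
        if backend.supports(fragment):
            return backend.execute(fragment)
    return f"host ran {'+'.join(fragment.operators)}"
```

In this sketch, adding a new device means adding one more object that satisfies `Coprocessor`; the host-side `offload` function does not change. The operator sets you pass to `ToyBackend` are invented for illustration and say nothing about what any real GPU, FPGA, or APU supports.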


Microsoft CAL architecture diagram showing FPGA / GPU / ASIC implementations side by side

How Microsoft describes the APU 


Section 3.5 of the paper describes the engagement, which began in late 2023, and the hardware itself. In Microsoft's description, the C-200 APU is a CGRA-based (coarse-grained reconfigurable array) dataflow machine on a 16-lane PCIe 5.0 board with 32 GB of onboard RAM. It zero-copies columnar data files such as Parquet directly from host memory, performs decompression and decoding in hardware, executes relational operators (select, project, join, and aggregation), and supports complex and nested data structures in streaming mode, covering all core SQL primitives. The APU operates at PCIe 5.0 line rate for all one-pass operations.


That last set of details matters for a reason the paper itself spells out. Among Microsoft's "lessons learned" is that pushing only a few operators to a coprocessor rarely delivers meaningful speedups, because transfer costs outweigh the accelerator's benefits. Real acceleration requires pushing down large query fragments. Broad SQL coverage in silicon is what makes that possible.
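
The transfer-cost argument can be sketched with back-of-the-envelope arithmetic. The Python below is an illustration under stated assumptions, not a model from the paper. It uses the theoretical one-direction bandwidth of a PCIe 5.0 x16 link (32 GT/s per lane with 128b/130b encoding, roughly 63 GB/s before protocol overhead), and it treats offload as "move the data, then compute faster", which ignores the overlap a streaming device would achieve. The function names and the example workloads are hypothetical.

```python
PCIE5_GT_PER_LANE = 32.0    # PCIe 5.0 raw signalling rate per lane, in GT/s
PCIE5_ENCODING = 128 / 130  # 128b/130b line encoding efficiency
LANES = 16                  # the board is described as 16-lane


def pcie5_x16_gb_per_s() -> float:
    """Theoretical one-direction bandwidth in GB/s (ignores protocol overhead)."""
    return PCIE5_GT_PER_LANE * PCIE5_ENCODING * LANES / 8  # bits -> bytes


def offload_seconds(data_gb: float, cpu_seconds: float,
                    speedup: float, bandwidth_gb_s: float) -> float:
    """Simplified serial model: transfer time plus accelerated compute time."""
    return data_gb / bandwidth_gb_s + cpu_seconds / speedup


def worth_offloading(data_gb: float, cpu_seconds: float,
                     speedup: float, bandwidth_gb_s: float) -> bool:
    """Offload pays off only if it beats doing the work on the CPU."""
    return offload_seconds(data_gb, cpu_seconds, speedup, bandwidth_gb_s) < cpu_seconds


bw = pcie5_x16_gb_per_s()

# Hypothetical small fragment: 10 GB moved for only 0.1 s of CPU work.
small = worth_offloading(data_gb=10, cpu_seconds=0.1, speedup=50, bandwidth_gb_s=bw)

# Hypothetical large fragment: the same 10 GB, but 20 s of CPU work.
large = worth_offloading(data_gb=10, cpu_seconds=20, speedup=50, bandwidth_gb_s=bw)
```

With these made-up inputs, `small` is `False` and `large` is `True`: the transfer takes the same time in both cases, so only the fragment that carries enough compute recovers it. That is the lesson in miniature, and it is why the amount of work pushed down per byte moved matters more than the raw speedup of any single operator.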




The APU performance numbers 


Microsoft reports standalone APU performance through the CAL API, measured from the moment CAL is called to the moment result sets are generated. The workload is TPC-H at scale factor 100, stored in Parquet format and warm in CPU memory. The baseline is single-core SQL Server on an Intel Xeon Platinum 8473C CPU @ 2.80 GHz.


The results, from Table 2 of the paper: 

TPC-H Query    APU Speedup (vs. single-core SQL Server)
Q07            36.7×
Q10            40.7×
Q12            49.4×
Q13            259.2×
Q16            61.5×

The pattern Microsoft highlights is that the heavier the compute, the larger the gain.


Q13's 259.2× comes from a LIKE predicate over the O_COMMENT column. That is string-heavy, branching work that is, in the paper's words, "largely amenable to hardware acceleration." It's the class of query that strains CPU-based analytics at enterprise scale.
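
For readers who don't have Q13 in their heads, the sketch below mimics its logic in plain Python on a tiny invented dataset. The query shape follows the public TPC-H specification: a left outer join of customers to orders, excluding orders whose comment matches a two-word LIKE pattern, followed by two levels of grouping. The words "special" and "requests" are the commonly used validation values for that pattern; the function name `q13` and the sample rows are hypothetical. This shows what the query computes, not how the APU or SQL Server executes it.

```python
import re
from collections import Counter

# SQL pattern '%special%requests%' as a regex: "special", then "requests" later on.
PATTERN = re.compile(r"special.*requests", re.DOTALL)


def q13(customers, orders):
    """customers: iterable of c_custkey values.
    orders: iterable of (o_orderkey, o_custkey, o_comment) tuples.
    Returns (c_count, custdist) rows in Q13's sort order."""
    # Left outer join + inner GROUP BY: every customer starts with zero orders.
    orders_per_customer = {c: 0 for c in customers}
    for _orderkey, custkey, comment in orders:
        # NOT LIKE: count only orders whose comment does NOT match the pattern.
        if custkey in orders_per_customer and not PATTERN.search(comment):
            orders_per_customer[custkey] += 1
    # Outer GROUP BY c_count: how many customers have each order count.
    dist = Counter(orders_per_customer.values())
    # ORDER BY custdist DESC, c_count DESC.
    return sorted(dist.items(), key=lambda row: (-row[1], -row[0]))


# Invented sample data, for illustration only.
customers = [1, 2, 3]
orders = [
    (10, 1, "fast delivery"),
    (11, 1, "special handling requests noted"),  # matches, so it is excluded
    (12, 2, "requests special"),                 # wrong word order, so it is kept
    (13, 2, "ok"),
]
rows = q13(customers, orders)
```

Every order row forces a substring scan over free text, and the outcome of each scan decides whether the row is counted. That per-row scanning and branching is the kind of work the paragraph above describes.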



Speedup of the APU integration with CAL vs. SQL Server single core

Where GPUs fit and where they don't 


Microsoft is clear-eyed about what GPUs cost. In the Discussion section, they describe the FPGA and APU efforts as invaluable for exploring "different price-performance trade-offs when compared with the powerful but costly and power-hungry GPUs."


That sentence is the thesis Speedata was founded on, now in the peer-reviewed record under Microsoft's name: real analytic queries (joins, filters, aggregations, and branching logic over messy enterprise data) are not matrix math, and the economics of running them on GPU infrastructure don't hold. Purpose-built silicon offers a better price-performance path.




One more lesson worth noting 


Microsoft's first "lesson learned" in the paper is that transitioning to hardware-accelerated query processing "should be evolutionary, not revolutionary": coprocessors should be integrated into existing architectures rather than replacing entire systems. That is how the APU deploys: inside the engines and architectures enterprises already run, with no re-architecture.


Congratulations to the Speedata team - 16 of our amazing engineers contributed to this industry-altering project. Thank you, Jonathan, Dror, Boaz, Pop, Vlade, Luka, Tal S., Konst, Uros, Daniel Markovsky, Atheel, Ofer, Guy, Jovana, Ofek, and Matan. This is your win.


See what the APU can do for your workloads. Test it for free.



Read Microsoft's paper "CoddSpeed: Hardware Accelerated Query Processing in Microsoft Fabric" here.



*All images credited to Microsoft.

 
