
Deliver unmatched processing throughput to your Spark workloads

The Speedata APU achieves its breakthrough throughput by mapping the required processing onto its internal hardware pipeline. The Speedata Dash software automatically configures a dataflow in silicon, where row processing is broken into hundreds of stages, each passing its output to the next on every hardware clock cycle. At any given time, hundreds of rows are therefore at different stages of processing in parallel, resulting in a throughput of over a billion rows per second.
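As a back-of-the-envelope sketch of where that number comes from (the 1 GHz clock and the pipeline depth below are illustrative assumptions, not published Speedata specifications):

```scala
// Steady-state throughput of a fully pipelined dataflow.
// Assumptions (illustrative, not Speedata specs): a 1 GHz hardware clock
// and one row completing per clock cycle once the pipeline is full.
object PipelineThroughput extends App {
  val clockHz       = 1000000000L // assumed 1 GHz hardware clock
  val rowsPerCycle  = 1L          // one row exits the pipeline each cycle
  val pipelineDepth = 400         // illustrative: "hundreds of steps"

  // The depth determines how many rows are in flight at once...
  println(s"Rows in flight at any instant: $pipelineDepth")
  // ...but steady-state throughput depends only on clock * rows/cycle.
  println(s"Throughput: ${clockHz * rowsPerCycle} rows/second")
}
```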

[Illustration: the APU hardware pipeline, spanning decompression, decoding, columnar processing, row assembly, and joins and aggregations]

Accelerated Parquet processing in hardware

Parquet is the leading file format for analytics. Speedata’s APU efficiently processes Parquet files as part of its hardware pipeline, from decompressing and decoding columns, through columnar filters and projections, to row assembly and flattening of nested data (EXPLODE).

Decompression: uncompressing Parquet columns

Decoding: decoding Parquet columns

Columnar processing: computing column-level filters and projections

Row assembly: assembling columns into rows

Joins and aggregations: computing joins and aggregations

Shuffle preparation: preparing Spark task output
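To make the mapping concrete, here is an ordinary Spark job whose plan exercises each of these stages; the paths and column names are invented for illustration:

```scala
// An ordinary Spark job whose plan walks the stages above: Parquet scan
// (decompression + decoding), columnar filter and projection, EXPLODE,
// row assembly, a join, an aggregation, and per-task shuffle output.
// Paths and column names are hypothetical.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ParquetStagesDemo extends App {
  val spark = SparkSession.builder.appName("parquet-stages").getOrCreate()
  import spark.implicits._

  val events = spark.read.parquet("s3://bucket/events") // scan: decompress + decode
  val users  = spark.read.parquet("s3://bucket/users")

  val revenue = events
    .filter($"event_date" >= "2024-01-01")            // columnar filter
    .select($"user_id", explode($"items").as("item")) // projection + EXPLODE
    .join(users, Seq("user_id"))                      // join
    .groupBy($"country")
    .agg(sum($"item.price").as("revenue"))            // aggregation, then shuffle

  revenue.show()
  spark.stop()
}
```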

Seamless integration with Apache Spark

Speedata’s Dash software plugs transparently into the Spark Catalyst optimizer to automatically identify compute-intensive work and offload it to the APU, delivering dramatic acceleration for Apache Spark 3.x workloads on Kubernetes, YARN, and standalone cluster managers.
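In practice, this kind of integration is configuration rather than code changes. The sketch below uses Spark 3.x's standard extension hooks; the com.speedata.* class names are hypothetical placeholders, not Speedata's published API:

```scala
// A minimal sketch of configuration-only integration via Spark 3.x's
// standard extension points (spark.sql.extensions and spark.plugins).
// The com.speedata.* class names are hypothetical placeholders.
import org.apache.spark.sql.SparkSession

object AcceleratedSession extends App {
  val spark = SparkSession.builder
    .appName("accelerated-job")
    .config("spark.sql.extensions", "com.speedata.dash.DashSparkExtensions")
    .config("spark.plugins", "com.speedata.dash.DashPlugin")
    .getOrCreate()

  // Existing queries run unchanged; Catalyst rules injected through the
  // extension decide which operators are offloaded to the APU.
  spark.sql("SELECT count(*) FROM parquet.`s3://bucket/events`").show()
  spark.stop()
}
```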

[Diagram: Dash integration with the Apache Spark stack]

Analytics at the Speed of Silicon

Accelerate Apache Spark by 100x, right at the hardware layer, with zero code changes.
