HARDWARE ACCELERATION
The ever-increasing processing performance that Moore’s Law predicted has run its course.
The performance and power gains from simply continuing to shrink transistors have run up against diminishing returns for the past decade. Indeed, in many areas of computing, general-purpose chip architectures have all but reached their maximum processing ability per dollar and are no longer practical for today’s colossal analytics and AI workloads.
The CPU, historically the semiconductor workhorse and hero of Moore’s Law, was designed for generic tasks and has improved performance by only 3% year-over-year for the past three years, an especially small gain in the face of today’s massive and growing demands.
For data infrastructure engineers and data architects, this decline in price-performance relative to the demand for scale and efficiency means doing more under tighter data center constraints: extracting more performance, value, innovation, and business impact from finite space for additional hardware, little incremental infrastructure budget, and the pressures of ballooning energy costs and environmental policies.
Simultaneously, the data teams who rely on that infrastructure are forward-planning for new business-critical analytics priorities, while available capacity is still constrained by central IT planning cycles from three years ago (which themselves reflected data trends from up to three years before that).
So what’s the best course forward? Enter domain-specific hardware acceleration. Hardware accelerators are processing engines designed for particular workloads beyond generic compute. Recently, the most prominent example of hardware acceleration has been the GPU, whose architecture is specialized for graphics and AI processing. But, despite the performance benefits of GPUs over CPUs for certain workloads, the greatest need for hardware acceleration persists in database analytics, by far the costliest and least-optimized category in the data center.
In fact, analytics and database workloads like ETL, including data cleansing and transformations, collectively account for over 50% of current annual data center costs, and the associated hardware spend topped $135B in 2022, according to Statista research. However, GPUs are optimized for AI and machine learning workloads, and the capacity trade-offs that optimization entails make them a low-ROI mismatch for most analytics priorities.
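To make the category concrete, the sketch below shows the shape of a typical cleansing-and-transformation ETL job of the kind described above, written in PySpark. The paths, column names, and schema are illustrative assumptions, not details from this document.

```python
# A minimal sketch of a cleansing-and-transformation ETL job:
# read raw records, cleanse them, and write a transformed, columnar copy.
# Paths and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-cleanse-transform").getOrCreate()

raw = spark.read.json("s3://example-bucket/raw/events/")  # hypothetical source

cleaned = (
    raw
    .dropDuplicates(["event_id"])                   # remove duplicate records
    .filter(F.col("timestamp").isNotNull())         # drop malformed rows
    .withColumn("event_date", F.to_date("timestamp"))
)

cleaned.write.mode("overwrite").partitionBy("event_date") \
    .parquet("s3://example-bucket/curated/events/")  # hypothetical sink
```

Jobs of this shape run continuously at high volume, which is why they dominate the cost profile described above.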
KEY OBJECTIVES FOR HARDWARE ACCELERATION
The primary benefits of hardware acceleration include consolidation of workloads, optimized cluster utilization, reduced number of servers, higher density configurations, optimized rack space, and energy efficiency:
Consolidation of Workloads
- With hardware acceleration, a single server equipped with accelerators can handle more intensive workloads. This enables the consolidation of multiple analytics workloads onto fewer physical servers.
- By consolidating workloads, you can optimize resource utilization and minimize the overall footprint of your infrastructure.
Optimized Cluster Utilization
- Hardware acceleration optimizes cluster utilization by allowing Spark to make better use of available resources (see the configuration sketch after this list).
- Improved utilization means that you can achieve higher performance without having to scale up the cluster size, leading to cost savings.
Energy Efficiency
- Hardware accelerators are designed to be more energy-efficient for analytics and database workloads, compared to general-purpose CPUs.
- Utilizing hardware acceleration achieves higher computational efficiency per watt of power consumed, resulting in a more energy-efficient data center and reduced power consumption.
- Lower energy consumption not only reduces operational costs, but it also contributes to environmental sustainability.
Higher Density Configurations
- Hardware accelerators are designed to be more compact than traditional CPUs, allowing for higher density configurations in server racks.
- This higher density can contribute to better space utilization within the data center.
Optimized Rack Space
- As the number of servers decreases due to hardware acceleration, this presents opportunities to optimize the layout of server racks.
- Fewer servers can free up room for other equipment and improve overall airflow and cooling efficiency in the data center.
Reduced Number of Servers
- Hardware accelerators can handle tasks more efficiently than traditional CPUs, reducing the need for a large number of servers to process data.
- The decreased reliance on numerous servers can lead to a more compact and space-efficient data center and lower hardware costs.
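As referenced above, cluster-level gains typically come from telling the engine about the accelerator and letting it offload eligible operators. The sketch below shows what that wiring could look like for Spark, using Spark 3.x’s plugin and resource-scheduling settings; the plugin class and the "accel" resource name are hypothetical placeholders, not a documented product integration.

```python
# Illustrative SparkSession setup for an accelerator-aware cluster.
# spark.plugins and the executor/task resource settings are standard
# Spark 3.x configuration; the plugin class and the "accel" resource
# name are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("accelerated-analytics")
    # Load a (hypothetical) accelerator plugin that rewrites eligible
    # operators to run on the attached hardware.
    .config("spark.plugins", "com.example.accel.AcceleratorPlugin")
    # Advertise one accelerator device per executor...
    .config("spark.executor.resource.accel.amount", "1")
    # ...and let four concurrent tasks share each device.
    # (A real deployment would also configure a resource discovery script.)
    .config("spark.task.resource.accel.amount", "0.25")
    .getOrCreate()
)
```

Because offloaded operators finish faster and consume fewer CPU cycles, the same executor count sustains more concurrent work, which is where the utilization and consolidation gains above come from.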
THE SHORTCOMINGS OF HARDWARE ACCELERATION WITH FPGAs ALONE
Given the enormous challenge and opportunity of hardware acceleration for long-running database analytics workloads like ETL and SQL, field-programmable gate arrays (FPGAs) represent a common option for short-term processing performance improvements over CPUs alone. But their relatively low price-performance ultimately makes them an unsustainable choice for most enterprise analytics workloads.
One of the advantages of FPGAs for hardware acceleration is their general availability, since manufacturers like Xilinx (now part of AMD) have had them on the market for a long time. A shorter time to market, due to lower fabrication and manufacturing overhead than a processor chip, also means new FPGA options appear more often. But FPGAs are not especially performant, undercutting the priorities of most enterprises trying to accelerate their analytics capabilities to meet the needs and goals of their data teams.
FPGAs are programmable by nature, so the configurable logic blocks, interconnects, and other engineering-defined resources that make them flexible also consume much more power than other hardware accelerators. This configurability is likewise reflected in their low density and large form factor. When space is limited, and space unavailability has become a defining characteristic of the modern enterprise data center, FPGAs lose their appeal and value very rapidly by compounding the cost of overutilized space.
Although FPGAs tend to have low implementation costs, their heavy ongoing programming requirements and low levels of integration drive up their recurring engineering costs. Especially for the complex or high-volume production jobs typical of long-running analytics, these operating expenses inflate FPGAs’ total cost over their amortized lifecycle, often washing out their unit price-performance, even compared to generic CPUs. An example of an FPGA hardware accelerator for analytics is NeuroBlade.
SPEEDATA APU: MAXIMUM HARDWARE ACCELERATION FOR ANALYTICS
Despite their prevalence, FPGAs are, at best, a short-term approach to hardware acceleration, one that does little to address the underlying cost, scale, space, and operational issues that drive down the price-performance of analytics in the data center.
Alternatively, a much longer-term approach to hardware acceleration can start with an application-specific integrated circuit (ASIC) for database analytics. ASICs are specialized semiconductor devices and circuitry custom-designed for distinct applications, offering performance optimized to a set of use cases, increasing their operational efficiency over their lifecycle as they scale.
With an ASIC processor, users can integrate multiple functions into a single chip, reducing the need for additional components and simplifying how data infrastructure engineers and data architects design, plan, and integrate infrastructure. The benefits are reduced system complexity, lower costs, smaller footprint, higher compute density, lower maintenance, greater energy efficiency, and improved reliability. An ASIC is ideal for analytics’ high data volumes and long-running workloads, and its per-unit cost drops significantly as production volume increases.
Speedata’s Analytics Processing Unit (APU) is the world’s first ASIC architected as a hardware accelerator for analytics engines like Apache Spark, Trino, and Presto. The APU executes a broad range of tasks in parallel and can handle any data type and field length. Historically, the hardware and compute costs of analytics have grown in direct proportion to the growth of data (and have actually begun to outpace data growth once operating expenditure is counted alongside capital outlay). In direct comparisons, the APU runs real enterprise Spark analytics workloads up to 100x faster than CPUs and many multiples faster than GPUs. In head-to-head testing, the APU demonstrates an average 91% cost reduction for analytics workloads.
Speedata’s APU uniquely decompresses, decodes, and processes millions (or even billions) of records from Parquet or ORC files per second, eliminating the I/O, compute, and capacity bottlenecks created by other chips that have to write and store intermediate data back to memory.
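For a sense of the query shape this targets, the sketch below is a scan-heavy PySpark job over Parquet data, following the decompress-decode-aggregate pattern described above. The dataset path and column names are illustrative assumptions; on an accelerated cluster, a query like this would run unchanged, with the offload happening transparently beneath the engine.

```python
# A representative scan-heavy analytics query over columnar Parquet data:
# most of the work is decompressing and decoding records, then aggregating.
# The dataset path and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("parquet-scan-aggregate").getOrCreate()

orders = spark.read.parquet("s3://example-bucket/curated/orders/")

daily_revenue = (
    orders
    .filter(F.col("status") == "COMPLETE")   # predicate applied to decoded rows
    .groupBy("order_date")
    .agg(
        F.sum("amount").alias("revenue"),
        F.countDistinct("customer_id").alias("buyers"),
    )
)

daily_revenue.write.mode("overwrite") \
    .parquet("s3://example-bucket/reports/daily_revenue/")
```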