How server disaggregation could make cloud data centers more efficient

The growth in cloud computing has shone a spotlight on data centers, which by some estimates already consume at least 7 percent of the global electricity supply, a share that is still growing. This has prompted the IT industry to look for ways of making infrastructure more efficient, including some efforts that rethink the way computers and data centers are built in the first place.

In January, IBM researchers presented a paper at the High Performance and Embedded Architecture and Compilation (HiPEAC) conference in Manchester on their work towards a disaggregated computer architecture. The work is part of the EU-funded dReDBox project, which belongs to the Horizon 2020 research and innovation program.

Disaggregation means separating servers into their constituent compute and memory resources so that these can be allocated as required, according to the needs of each workload. At present, servers are the basic building blocks of IT infrastructure, but a workload cannot use more memory or CPU than is available in a single server, nor can servers easily share spare resources outside their own box.
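
To see why this matters for scheduling, consider a minimal sketch in Python (the server sizes, workload and function names below are invented for illustration and are not part of the dReDBox design): a cluster can have plenty of free CPU and memory in total, yet still be unable to place a job because no single server has enough of both.

    # Illustrative sketch only: the server sizes, workload and helper functions
    # are invented for this example and are not dReDBox figures.
    servers = [
        {"name": "server-1", "free_cores": 2,  "free_mem_gb": 200},
        {"name": "server-2", "free_cores": 30, "free_mem_gb": 8},
    ]
    workload = {"cores": 4, "mem_gb": 64}

    def fits_on_one_server(workload, servers):
        # Conventional placement: everything must fit inside a single box.
        return any(s["free_cores"] >= workload["cores"]
                   and s["free_mem_gb"] >= workload["mem_gb"]
                   for s in servers)

    def fits_in_shared_pool(workload, servers):
        # Disaggregated placement: compute and memory come from shared pools.
        return (sum(s["free_cores"] for s in servers) >= workload["cores"]
                and sum(s["free_mem_gb"] for s in servers) >= workload["mem_gb"])

    print(fits_on_one_server(workload, servers))   # False: no single server fits the job
    print(fits_in_shared_pool(workload, servers))  # True: the pool as a whole does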

“Workloads deployed to data centers often have a big disproportionality in the way they use resources. There are some workloads that consume lots of CPU but don’t need much memory, and on the other hand other workloads that will use up to four orders of magnitude more memory than CPU,” said Dr Andrea Reale, a research engineer at IBM.

Across the data center, this means that some servers will be using all of their CPUs while still having lots of spare memory, while for others the opposite is true, and these idle resources continue to draw power even when they are not being used. According to Reale, about 16 percent of CPU and 30 percent of memory resources in a typical data center may be wasted this way.

But what if you could compose servers under software control to have as many CPUs and as much memory as each particular workload requires?

Separating compute and memory

The dReDBox project aims to address this by using discrete compute and memory modules known as bricks. These are connected by high-speed links, so that enough compute bricks can be paired with enough memory bricks to meet the requirements of whichever workload is running at a given moment. In theory, this allows a server to be composed for a specific application, with as many CPU cores and as much memory as the job requires, and those resources can then be returned to the pool and used for something else once the workload no longer needs them.

As part of its research project, the dReDBox team has built a demonstration system in which the bricks are built around Xilinx Zynq UltraScale+ ARM-based system-on-chip (SoC) silicon. The compute bricks have a small amount of local memory, while the memory bricks carry a much larger amount of DDR4 memory that they serve up to the compute bricks.

There are also two other kinds of brick in the dReDBox architecture: accelerator bricks, which may provide either GPU or FPGA hardware to boost applications such as machine learning or analytics, and a controller brick, which manages all the others.
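
To illustrate the role of the controller brick, the following rough sketch (the class names and allocation logic are assumptions made for this example, not the project's actual control software) treats the controller as keeping an inventory of bricks, reserving just enough free compute and memory bricks to satisfy a request, and returning them to the pool when the workload finishes.

    # Hypothetical sketch of brick composition; not the real dReDBox controller API.
    from dataclasses import dataclass

    @dataclass
    class Brick:
        kind: str        # "compute", "memory" or "accelerator"
        capacity: int    # CPU cores for compute bricks, GB for memory bricks
        in_use: bool = False

    class Controller:
        def __init__(self, bricks):
            self.bricks = bricks

        def compose(self, cores_needed, mem_gb_needed):
            # Walk the inventory, reserving free bricks until the request is met.
            chosen, cores, mem = [], 0, 0
            for b in self.bricks:
                if b.in_use:
                    continue
                if b.kind == "compute" and cores < cores_needed:
                    chosen.append(b)
                    cores += b.capacity
                elif b.kind == "memory" and mem < mem_gb_needed:
                    chosen.append(b)
                    mem += b.capacity
            if cores < cores_needed or mem < mem_gb_needed:
                return None             # not enough free bricks; nothing is reserved
            for b in chosen:
                b.in_use = True         # these bricks now form one logical server
            return chosen

        def release(self, chosen):
            # Return the bricks to the pool once the workload has finished.
            for b in chosen:
                b.in_use = False

A real controller would also have to configure the interconnect so that compute bricks can actually reach the memory bricks reserved for them, which is where the switching fabric described below comes in.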

To fit in with existing data center infrastructure, the dReDBox team envisages that in any production deployment the bricks would be housed in 2U enclosures resembling standard rack-mount server systems. These enclosures may contain any mixture of brick types.

The beauty of this modular arrangement is that it also makes upgrades easier: the operator can simply swap compute bricks for newer, higher-performance ones, or likewise replace memory bricks with higher-capacity ones, rather than junking the entire server.

However, the key part of the whole architecture is the interconnect that links the bricks together. It has to be both high-speed and low-latency; otherwise, performance would take a hit whenever a compute brick reads data stored in a memory brick.

For its demonstration system, the dReDBox team used an electrical switch matrix to connect bricks within an enclosure, and an optical switch matrix to link bricks in different enclosures within the rack. Unusually for an IT environment, these switch matrices are circuit switched: once configured, they create a dedicated pathway between bricks, unlike a packet-switched network such as Ethernet, where data is routed to its destination based on the address in each data packet.

This arrangement was chosen precisely because of the need for low latency, according to Reale.
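
A toy model gives a feel for the trade-off (all the numbers below are invented for illustration and are not dReDBox measurements): a circuit pays a one-off setup cost when the path is configured, after which each memory access sees only a fixed path delay, whereas in a packet-switched network every access also pays a per-hop routing cost.

    # Toy latency model, for illustration only; the figures are invented, not measured.
    HOPS = 2  # assumed number of switches between a compute brick and a memory brick

    def circuit_switched_ns(n_accesses, setup_ns=500, path_ns=100):
        # One-off circuit setup, then a fixed path delay for every access.
        return setup_ns + n_accesses * path_ns

    def packet_switched_ns(n_accesses, path_ns=100, per_hop_ns=300):
        # Every access is a packet that must be processed and routed at each hop.
        return n_accesses * (path_ns + HOPS * per_hop_ns)

    for n in (1, 1_000, 1_000_000):
        print(n, circuit_switched_ns(n), packet_switched_ns(n))

Under these assumptions, the per-access routing cost quickly dominates once a workload makes more than a handful of remote memory accesses, which is why a pre-configured circuit suits steady, fine-grained memory traffic.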