GPU systems

The classic graphics card, forefather of the GPU:

Classic graphics cards sit in a PCIe slot and, if they need more than the 75 W the slot can supply, receive additional power via supplementary power cables.
A GPU is ultimately nothing more than a very powerful graphics card that is not used to display anything on a screen, but for the fast, parallel calculation of complex tasks. In many areas it is far superior to a classic CPU – but not everywhere: it offers no advantage, for example, when a task cannot be parallelised.
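
A minimal sketch of this difference (assuming PyTorch and a CUDA-capable GPU are available; the matrix size is arbitrary): the same large matrix multiplication, a highly parallel workload, is run once on the CPU and once on the GPU.

  import time
  import torch

  # A large matrix multiplication is an almost perfectly parallel workload.
  a = torch.randn(8192, 8192)
  b = torch.randn(8192, 8192)

  t0 = time.time()
  _ = a @ b                              # runs on the CPU cores
  print(f"CPU: {time.time() - t0:.2f} s")

  if torch.cuda.is_available():
      a_gpu, b_gpu = a.cuda(), b.cuda()  # copy the data over PCIe to the GPU
      torch.cuda.synchronize()
      t0 = time.time()
      _ = a_gpu @ b_gpu                  # runs on thousands of GPU cores in parallel
      torch.cuda.synchronize()           # wait until the GPU kernel has finished
      print(f"GPU: {time.time() - t0:.2f} s")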

The PCIe bottleneck:

PCIe in version 5 or 6 is already very fast, but it is not enough if you want several GPUs to work together as if they were a single GPU. That requires very high bandwidth between the GPUs, which must be able to access each other directly. This is why NVIDIA developed NVLink, a high-performance interconnect between the professional GPUs that support it.
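
A small sketch of what “direct access” means in practice (assuming PyTorch on a multi-GPU machine): the check below reports, for each pair of GPUs, whether one can address the other's memory directly; on NVLink-connected GPUs this peer-to-peer path bypasses the PCIe bottleneck.

  import torch

  # For every pair of GPUs in the machine, report whether one GPU can address
  # the memory of the other directly (peer-to-peer), bypassing the host.
  n = torch.cuda.device_count()
  for i in range(n):
      for j in range(n):
          if i != j:
              ok = torch.cuda.can_device_access_peer(i, j)
              print(f"GPU {i} -> GPU {j}: peer access {'yes' if ok else 'no'}")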

Who needs this?

Whether NVLink is required depends on the task. If the work can be distributed across several (even many) independent GPUs, you do not need a high-speed connection between them. If, however, you need what amounts to a single, very powerful GPU to solve the task efficiently, then NVLink is the solution.

NVIDIA NVLink

Source: https://www.nvidia.com/de-de/data-center/nvlink/

NVIDIA® NVLink® is a direct high-speed connection between GPUs. NVIDIA NVSwitch™ takes interconnectivity to the next level by integrating multiple NVLinks to enable all-to-all GPU communication at full NVLink speed within a single node such as NVIDIA HGX™ A100.

With the combination of NVLink and NVSwitch, NVIDIA was able to efficiently scale AI performance across multiple GPUs and win MLPerf 0.6 – the first industry-wide AI benchmark.

How does it work?

Up to 4 modern GPUs can be connected via NVLink, which is a cost-effective way of creating a virtual GPU consisting of 4 physical GPUs.
If more performance is required, NVSwitch is the way to go. It uses the special SXM socket designed for this purpose, which carries both the PCIe signals and the NVLink signals directly to the switch and is built for the power that high-performance GPUs draw (in 2025 the standard value is 1500 W per GPU).
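
To see how the GPUs in such a machine are actually connected, the NVIDIA driver tools can print the interconnect topology. A hedged sketch (assuming nvidia-smi is installed and on the PATH):

  import subprocess

  # Interconnect matrix: entries such as NV4 or NV12 mean that many NVLink links
  # between two GPUs; PIX/PHB/SYS mean the path runs over PCIe instead.
  print(subprocess.run(["nvidia-smi", "topo", "-m"],
                       capture_output=True, text=True).stdout)

  # Status of the individual NVLink links per GPU.
  print(subprocess.run(["nvidia-smi", "nvlink", "--status"],
                       capture_output=True, text=True).stdout)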

What if that is not enough?

If a single server is not sufficient, eight servers, for example, can be grouped together in a rack and connected via a high-speed network (400-800 Gbit/s in 2025). They are then configured so that a single virtual GPU is created, but with the performance of 64 physical GPUs.
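
A hedged sketch of how software makes such a cluster look like one big accelerator (assuming PyTorch with the NCCL backend and a launcher such as torchrun that sets LOCAL_RANK and the rendezvous variables): each process drives one GPU, and collective operations run over NVLink/NVSwitch inside a server and over the 400-800 Gbit/s fabric between servers.

  import os
  import torch
  import torch.distributed as dist

  # One process per GPU; with 8 servers x 8 GPUs this gives a world size of 64.
  dist.init_process_group(backend="nccl")      # NCCL uses NVLink/NVSwitch inside a node
  local_rank = int(os.environ["LOCAL_RANK"])   # set by the launcher (e.g. torchrun)
  torch.cuda.set_device(local_rank)

  # A collective operation spanning all GPUs: every rank contributes a 1,
  # every rank receives the total sum over the whole cluster.
  x = torch.ones(1, device="cuda")
  dist.all_reduce(x, op=dist.ReduceOp.SUM)
  print(f"rank {dist.get_rank()}/{dist.get_world_size()}: sum = {x.item()}")

  dist.destroy_process_group()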

And then there is the top class:

Enormous computing power is required, especially for tasks in the AI sector. Today's installations reach 100,000 GPUs and more (including their own power plant). They consist of many racks full of GPU servers, connected via high-speed fabrics, so that the entire data centre can act like one virtual GPU.

NVIDIA NVSwitch

Source: https://www.nvidia.com/de-de/data-center/nvlink/

With the rapid spread of deep learning, the need for faster and more scalable networking has also increased. This is because PCIe bandwidth often proves to be a bottleneck for multi-GPU systems. Scaling deep learning workloads requires significantly higher bandwidth and lower latency.

NVIDIA NVSwitch relies on NVLink's advanced communication capability to solve this problem. For even higher deep learning performance, a GPU fabric supports more GPUs on a single server, networked together through full-bandwidth connections. Each GPU has 12 NVLinks to the NVSwitch to enable high-speed all-to-all communication.
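
A rough back-of-the-envelope comparison of what those 12 links add up to (the per-link and PCIe figures are assumptions for an A100-class system, not taken from the text above):

  # Assumed figures for an A100-class GPU with third-generation NVLink:
  links_per_gpu = 12
  gb_s_per_link = 50        # GB/s per NVLink link, both directions combined (assumption)
  pcie4_x16_gb_s = 64       # GB/s for a PCIe 4.0 x16 slot, both directions (assumption)

  nvlink_total = links_per_gpu * gb_s_per_link
  print(f"NVLink aggregate: {nvlink_total} GB/s, PCIe 4.0 x16: {pcie4_x16_gb_s} GB/s")
  # -> roughly ten times the GPU-to-GPU bandwidth of a plain PCIe connection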

As always:

Since the mid-90s, processors have been actively cooled with fans. Before that, a small heat sink just 10 mm high was enough to cool a 486 processor with the barely existing airflow inside the case.

Modern CPUs have TDP values of up to 500 W, GPUs of up to 1500 W. Cooling all components with air is hardly feasible any more: up to 30% of the energy consumed goes into the high-performance fans that blow the heat out of the device, and that heat then has to be transported outside by chillers that use even more energy, while the thermal energy itself is lost unused. Not very efficient, and not kind to the electricity bill either.
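
A small worked example of what that 30% overhead means (the server power figure is an illustrative assumption):

  # Assumed example: an air-cooled GPU server drawing 10 kW at the wall.
  server_kw = 10.0
  fan_share = 0.30                    # up to 30% of the energy goes into the fans

  fan_kw = server_kw * fan_share      # power used only to move air
  compute_kw = server_kw - fan_kw     # power left for the actual computation
  print(f"{fan_kw:.1f} kW for fans, {compute_kw:.1f} kW for computing")
  # On top of that, the chillers that move this heat out of the building
  # consume additional energy.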

That's better:

Everyone knows that if you want to cool something, you use water: every modern car with an internal combustion engine does this.

Why is that? Hydrogen has the highest specific heat capacity of any substance. Unfortunately, liquid hydrogen cannot be used in any technically sensible way, but fortunately water consists of two parts hydrogen and one part oxygen. Although water has less than 30% of hydrogen's capacity, that is still far more than other common materials, so it can absorb a comparatively large amount of heat energy while warming only slightly.

Good for cooling, bad for the energy bill for a hot bath.

For the same cooling effect as one litre of water, you theoretically need about 4 m³ of air. While the water circulates comfortably, the air has to be blown at high speed over heat sinks. In reality the amount of air is even greater, because heat transfer to air is less efficient.
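
The figure of roughly 4 m³ can be checked with approximate textbook values for the volumetric heat capacity of water and air; a minimal sketch:

  # Approximate textbook values at room conditions:
  water_kj_per_litre_k = 4.186      # heat capacity of water, kJ per litre per kelvin
  air_kj_per_m3_k = 1.2 * 1.005     # density ~1.2 kg/m³ times ~1.005 kJ/(kg·K)

  air_m3_per_litre_water = water_kj_per_litre_k / air_kj_per_m3_k
  print(f"~{air_m3_per_litre_water:.1f} m³ of air per litre of water")
  # -> about 3.5 m³ in theory; in practice even more, because heat transfer
  #    from hot surfaces to air is far less efficient than to water.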

Air Cooling

Source: https://store.supermicro.com/us_en/pub/media/catalog/

  • Performance: Good for standard applications. Requires effective case ventilation.
  • Noise: Can be loud depending on the fan speed, especially under high load.
  • Installation: Simple and uncomplicated.
  • Costs: Cheaper to buy.
  • Maintenance: Low maintenance effort.
  • Aesthetics: Functional.
  • Service life: High-quality fans can last over 10 years.
  • Risk: Low risk.

The solution:

To achieve optimum heat dissipation, special heat exchangers (cold plates) are mounted on all relevant components (GPUs, CPUs, memory, voltage regulators). In this way the heat is transferred directly into a coolant, which in turn passes the energy to water in a heat exchanger.

The great thing about this is that you now have “hot” water at around 60-70 °C, which can be put to good use, e.g. for heating or domestic hot water. If that is not possible, the warm water can be cooled back down with a cooling tower on the roof or other heat exchangers without using much energy, so that it can be sent to the servers again. In contrast to air cooling, which needs intake air at around 20 °C, the water can return at around 40 °C – so there is no need for energy-hungry chillers even in summer.
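
To get a feel for the quantities involved, here is a hedged back-of-the-envelope calculation; the rack power and temperatures are illustrative assumptions in line with the figures above:

  # Assumed example: one 100 kW liquid-cooled rack, water returning at 40 °C
  # and leaving the rack at 65 °C, i.e. a temperature rise of 25 K.
  rack_kw = 100.0
  delta_t_k = 25.0
  water_kj_per_kg_k = 4.186

  flow_kg_per_s = rack_kw / (water_kj_per_kg_k * delta_t_k)   # kW = kJ/s
  litres_per_hour = flow_kg_per_s * 3600                      # 1 kg of water ~ 1 litre
  print(f"~{flow_kg_per_s:.2f} kg/s, i.e. ~{litres_per_hour:.0f} litres of warm water per hour")
  # That warm water can feed a heating system, or be cooled back down cheaply
  # on the roof instead of with energy-hungry chillers.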

The Workaround:

Many data centres in Europe are stuck in the 90s: rack power is often 5 kW, and “liquid cooling” is a foreign concept.

You can’t run a GPU server there; you won’t get far with 5 kW. We don’t have a solution for that – it is up to the data centre operator. For the cooling issue, however, there is a solution. The air-cooled alternative is unattractive: modern GPUs with air cooling have a reduced TDP (unfortunately, less energy also means less computing power for the same purchase price), and, as mentioned above, the air movement can only be achieved through high fan energy consumption.

Our side-by-side rack contains a large heat exchanger that sits between two GPU racks. Two “Direct Liquid Cooled” racks of 100 kW each can be cooled optimally with this “side-by-side” rack: you get the full GPU performance with lower overall power consumption, no conversion work is required in the data centre, and the thermal energy is released into the room just as with ordinary servers, only more efficiently.

Liquid Cooling

Source: https://store.supermicro.com/us_en/pub/media/catalog/

  • Performance: Better for high-performance systems, overclocking and heavy workloads, as water carries heat away better than air.
  • Noise: Tends to be quieter, as the fans on the radiator can run at lower speed.
  • Installation: More complex; requires more time and space for the pump, radiator and hoses.
  • Costs: More expensive, especially for customised systems.
  • Maintenance: Requires regular maintenance. May develop air bubbles or lose liquid.
  • Aesthetics: Can be designed very attractively, especially with customised systems and lighting.
  • Service life: AIO systems typically last 3-7 years.
  • Risk: Low risk; even in the event of a leak, the non-conductive coolant causes little damage.