Everyone’s talking about liquid cooling. Should we adopt direct-to-chip (D2C) or immersion? How long do we have before it becomes a market imperative? And how much do we actually know about the practicalities of installing liquid cooling technology (LCT)?
The step change that’s needed to bring LCT into data centres can be daunting. It requires a rethink of how we’re doing things – just like the AI processes it supports. Along with procuring new equipment, operators will need to review their mechanical, electrical and structural infrastructure, forge a new path for commissioning and overhaul their resiliency and maintenance plans.
Operational procedures will require a total revamp, along with new support equipment. This may include lifting gantries to pull servers out of immersion tanks, modified small form-factor pluggable transceivers (SFPs) to interface with servers in an immersion environment, leak detection equipment, coolant distribution units (CDUs), tertiary cooling loops and distribution manifolds.
Operations managers will need to upskill and develop strategies for the new technology and methods to deploy it. They’re no longer just patching in data and power; they may now find themselves connecting coolant piping, bleeding cooling loops and managing contamination within the systems.
The transition to hybrid data centres will pose an especially complicated challenge for infrastructure design. We’re looking at two very different types of cooling technology, each with their own subsets, alongside a huge amount of traditional gear that remains air cooled.
Taking the heat out of AI: Sustainable solutions for liquid cooled AI data centres
To learn more about why data centres need liquid cooling technology, download Tetra Tech’s white paper.
Considerations for getting liquid-cooled ready
How far do you want to go?
When planning for LCT, an operator needs to be clear on how far they want to deploy. This is a difficult question in the current environment, where multiple technologies are available and operators are trying to cater for equipment that may not be fully rolled out in the market and is still evolving rapidly.
Do you want:
- tap-off points in the chilled water system
- fully deployed suites ready to start accepting D2C equipment
- to upsize the electrical infrastructure and floor loadings to suit?
What to know going into design:
- Technology type, i.e. D2C or immersion
- Power densities
- Target client, i.e. a hyperscaler (that takes the entire hall and fills it with typical equipment) or multiple clients in a co-location model (where there are varying requirements for equipment)
- Specific equipment requirements, for example temperatures, pressures and flow rates
- Resiliency requirements and whether any formal accreditation is being sought for the facility.
Every operator wants to focus on their return on investment, so spending money on infrastructure that doesn’t align with customers’ needs should be avoided.
Liquid cooling options available
Partial design and installation: consideration for tap-off points from existing pipework, additional branches off the primary water loop and spatial allowances for liquid-cooled equipment.
Design without installation: choose equipment, review infrastructure (electrical, mechanical and structural loading) and complete a design, but don’t proceed with installation. The hall remains compatible with traditional air-cooled equipment in the interim.
Extended design and installation: install CDUs and extend secondary pipework into the data hall. This may extend to provisioning racks with manifolds when considering D2C.
Equipment and servers
Many different solutions and products are available. The industry will standardise over time but, for now, it’s essential to know the specific requirements of planned equipment. Nobody wants to own the next Betamax player when the whole world moves to VHS.
The first step is to ascertain the equipment going in and its performance requirements. These might include water supply temperatures, temperature rise across equipment, and pressures and case temperatures on the graphics processing unit (GPU).
If you don’t know, don’t go too far. Instead, work on your strategy for procurement, design and build. Hold off on action until you’ve secured your customer and confirmed their requirements and equipment compatibility.
What you need to know about the equipment you plan to install:
- Required supply temperature – this varies from 25 to 40°C and has significant effects on equipment ratings, facility water temperatures and the system topology of the heat-rejection plant.
- Expected temperature rise – the return water loop temperature impacts CDU selection and required performance (a simple flow-rate check is sketched after this list).
- Compatible cooling fluids – the industry appears to be aligning on PG25 for single-phase D2C solutions; however, compatibility should always be reviewed. Incompatible materials present a significant risk to cooling performance: accelerated degradation can lead to leaks, contamination or blockages, and changing the fluid selection later is a costly process.
- Pressure requirements – equipment needs to be compatible with the proposed cooling loop pressure.
- Residual heat loss that still requires air cooling – in D2C applications, traditional air cooling is still required to dissipate some residual load from power supplies, memory and more.
- Server warranty – is it warranted for installation in an immersion environment? If going for an immersion solution, who carries the warranty risk for equipment?
- Cold plate channel size – filtration is required to protect the cold plate from fouling, which reduces flow rate and risks overheating.
- Allowable change and rate of change of flow temperatures at server level – thermal stability is imperative given the much larger quantities of heat being rejected.
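To make the interplay between supply temperature and temperature rise concrete, here’s the minimal flow-rate sketch referenced above. The rack heat load, allowable rise and PG25 fluid properties are illustrative assumptions, not figures from any particular product.

```python
# Minimal sketch: estimate the coolant flow needed to remove a rack's heat load.
# All values are illustrative assumptions (80 kW rack, 10 K rise, approximate PG25 properties).

RACK_HEAT_LOAD_KW = 80.0          # heat captured by the D2C loop (assumption)
ALLOWABLE_RISE_K = 10.0           # supply-to-return temperature rise (assumption)
DENSITY_KG_PER_L = 1.02           # approximate PG25 density (assumption)
SPECIFIC_HEAT_KJ_PER_KG_K = 3.9   # approximate PG25 specific heat (assumption)

def required_flow_lpm(heat_kw: float, rise_k: float) -> float:
    """Volumetric flow (L/min) so that heat = mass_flow * cp * rise."""
    mass_flow_kg_s = heat_kw / (SPECIFIC_HEAT_KJ_PER_KG_K * rise_k)
    volume_flow_l_s = mass_flow_kg_s / DENSITY_KG_PER_L
    return volume_flow_l_s * 60.0

if __name__ == "__main__":
    flow = required_flow_lpm(RACK_HEAT_LOAD_KW, ALLOWABLE_RISE_K)
    print(f"~{flow:.0f} L/min per rack at a {ALLOWABLE_RISE_K:.0f} K rise")
```

The tighter the allowable rise, the higher the required flow, which is why the expected temperature rise feeds directly into CDU selection and pipe sizing.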
How is the equipment configured internally?
- Are the cold plates configured in parallel or in series? This will impact the expected supply temperature and the temperature rise expected across the equipment.
- What material is the cold plate constructed from and what other materials may be present in the equipment? This will impact compatible coolants and materials selections for supporting infrastructure.
The racks for this equipment are also trending up in size and weight to support the servers and additional cooling infrastructure. If you’re building a new data hall, your floor-to-floor heights, rack layouts and footprints need to accommodate this size change.
If retrofitting an existing space, you’ll need to review existing limitations to ensure compatibility.
Installing D2C in an existing facility
Net zero
While liquid cooling technologies provide an opportunity to reduce power usage effectiveness (PUE) and limit the energy devoted to keeping servers cool, the AI hardware driving this transition significantly increases overall energy consumption.
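Because PUE is simply total facility energy divided by IT energy, a rough comparison shows both sides of this trade-off. The power figures below are illustrative assumptions only.

```python
# Minimal sketch: compare PUE for an air-cooled and a liquid-cooled scenario.
# All power figures are illustrative assumptions, not measured data.

def pue(it_kw: float, cooling_kw: float, other_overheads_kw: float) -> float:
    """PUE = total facility power / IT power."""
    return (it_kw + cooling_kw + other_overheads_kw) / it_kw

# Assumed air-cooled hall: fans, CRAHs and chillers carry most of the overhead.
air_cooled = pue(it_kw=1000, cooling_kw=400, other_overheads_kw=100)

# Assumed D2C hall: pumps and CDUs replace much of the fan and chiller energy.
liquid_cooled = pue(it_kw=1000, cooling_kw=150, other_overheads_kw=100)

print(f"air-cooled PUE ~ {air_cooled:.2f}, liquid-cooled PUE ~ {liquid_cooled:.2f}")
```

A better ratio doesn’t guarantee lower total consumption: the IT denominator itself grows sharply with AI hardware, which is the point above.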
Using existing equipment and infrastructure
Essential considerations:
- Structural loading capacity of the existing floors – liquid-cooled racks have substantially higher weights and floor loadings (a first-pass check is sketched after this list).
- Space – a data hall needs sufficient space to install LCT. Pipework needs to be reticulated through what is often an already cramped space, and future installation, maintenance and leak management considerations may constrain where it can be run.
- Electrical supply and distribution – this needs to be sized to accept the higher rack loadings that come with higher density equipment.
- Chilled water infrastructure – do you have existing infrastructure to supply LCT and are the temperatures compatible?
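For the structural question flagged above, a first-pass check is to convert rack mass and footprint into a distributed load and compare it against the slab’s rated capacity. The rack mass, footprint and slab rating below are illustrative assumptions; a structural engineer still needs to assess point loads and the real slab design.

```python
# Minimal sketch: first-pass floor loading check for liquid-cooled racks.
# All values are illustrative assumptions; confirm with a structural engineer.

GRAVITY = 9.81  # m/s^2

def rack_floor_load_kpa(rack_mass_kg: float, footprint_m2: float) -> float:
    """Distributed load (kPa) from a rack spread over its footprint."""
    return rack_mass_kg * GRAVITY / footprint_m2 / 1000.0

rack_load = rack_floor_load_kpa(rack_mass_kg=1800, footprint_m2=0.6 * 1.2)
slab_capacity_kpa = 12.0  # assumed rated capacity of the existing slab

print(f"rack load ~ {rack_load:.1f} kPa vs slab capacity {slab_capacity_kpa:.1f} kPa")
if rack_load > slab_capacity_kpa:
    print("exceeds rated capacity: strengthening or layout changes may be needed")
```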
Plant compatibility
Many data centres have transitioned to alternative technologies – such as indirect evaporative coolers – to reduce energy consumption. Where centralised chiller plant already exists, review its compatibility with the temperatures required in the facility water loop. New central plant will be required if some of the proposed rack supply temperatures can’t be achieved with dry coolers alone.
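One way to frame that compatibility question is whether dry coolers alone can deliver the required facility water temperature at the site’s design ambient, allowing for a realistic approach temperature. The ambient, approach and required supply values below are illustrative assumptions for a minimal sketch.

```python
# Minimal sketch: can dry coolers alone meet the facility water supply temperature?
# Design ambient, approach and required supply temperatures are assumptions.

DESIGN_AMBIENT_C = 38.0      # assumed peak design dry-bulb temperature
DRY_COOLER_APPROACH_K = 5.0  # assumed approach between ambient air and water off the cooler

def dry_cooler_supply_c(ambient_c: float = DESIGN_AMBIENT_C,
                        approach_k: float = DRY_COOLER_APPROACH_K) -> float:
    """Lowest water temperature a dry cooler can deliver at design ambient."""
    return ambient_c + approach_k

required_supply_c = 32.0  # assumed facility water temperature needed by the CDUs

achievable = dry_cooler_supply_c()
if achievable > required_supply_c:
    print(f"dry coolers bottom out at ~{achievable:.0f}°C; "
          f"supplementary chiller capacity is needed to reach {required_supply_c:.0f}°C")
else:
    print("dry coolers alone can meet the required supply temperature")
```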
Resiliency
Thermal conditions
With significantly greater thermal output, D2C cold plates require a constant supply of coolant to maintain thermal conditions. Some manufacturers suggest that just seconds of cooling loss can push a GPU into thermal overload, so maintaining a constant supply is far more critical than in a traditional air-cooled data centre.
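As a rough order-of-magnitude sketch of why seconds matter, the adiabatic estimate below shows how quickly a package heats through its headroom once coolant flow stops. The GPU power, lumped thermal mass and headroom values are assumptions for illustration, not manufacturer figures.

```python
# Minimal sketch: rough time for a GPU package to heat through its headroom
# once coolant flow stops. All values are order-of-magnitude assumptions.

GPU_POWER_W = 700.0            # assumed sustained package power
THERMAL_MASS_J_PER_K = 400.0   # assumed lumped thermal mass of die plus cold plate
HEADROOM_K = 20.0              # assumed margin to the thermal throttle/shutdown point

def seconds_to_overheat(power_w: float, thermal_mass_j_per_k: float, headroom_k: float) -> float:
    """Adiabatic estimate: time = energy to absorb / power input."""
    return thermal_mass_j_per_k * headroom_k / power_w

print(f"~{seconds_to_overheat(GPU_POWER_W, THERMAL_MASS_J_PER_K, HEADROOM_K):.0f} s "
      "of lost cooling before thermal limits are reached")
```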
Redundancy
The traditional knowledge around how to deliver redundancy is being challenged. Traditionally, redundancy for mechanical systems has been at the room level. Designs need to be rethought to bring this down to the rack level.
In an environment already becoming more congested, this presents spatial challenges and introduces additional cost and complexity.
Managing a single loop
For D2C solutions, equipment typically still provides for only a single coolant loop at the server level. If this loop is compromised, so are the racks it serves. Smart valving strategies, automatic cut-offs and rapid response will become essential.
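A smart valving strategy ultimately comes down to control logic: monitor leak detection, pressure and flow on each rack branch and automatically isolate the smallest affected segment. The sketch below shows that decision logic only; the sensor fields and thresholds are assumptions, not any vendor’s interface.

```python
# Minimal sketch of an automatic isolation decision for a rack-level coolant branch.
# Sensor fields and thresholds are illustrative assumptions, not a vendor interface.

from dataclasses import dataclass

@dataclass
class BranchReading:
    rack_id: str
    leak_detected: bool           # rope/spot leak sensor state
    supply_pressure_kpa: float
    flow_lpm: float

MIN_PRESSURE_KPA = 150.0  # assumed low-pressure alarm threshold
MIN_FLOW_LPM = 20.0       # assumed minimum healthy flow per rack

def should_isolate(reading: BranchReading) -> bool:
    """Close the branch isolation valves if a leak is detected or the loop
    has lost pressure or flow, limiting the impact to a single rack."""
    return (
        reading.leak_detected
        or reading.supply_pressure_kpa < MIN_PRESSURE_KPA
        or reading.flow_lpm < MIN_FLOW_LPM
    )

if __name__ == "__main__":
    reading = BranchReading("R12", leak_detected=True,
                            supply_pressure_kpa=180.0, flow_lpm=35.0)
    if should_isolate(reading):
        print(f"isolate branch for {reading.rack_id} and raise an alarm")
```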
Commissioning
Commissioning is a critical element for data centres, demonstrating that the facility will perform as designed and risk will be minimised.
The biggest limitation with LCT is that systems can only be commissioned so far down the network. The secondary pipework network can still be commissioned and CDU performance confirmed; however, the impact of each rack’s installation is impossible to simulate.
For traditional mechanical technologies, commissioning was a one-off activity until major repair or modification. With D2C solutions, the system needs to be recommissioned every time new equipment is installed: loops need to be purged, and coolant concentrations, flow rates and fluid temperatures reviewed. This all takes time and requires specialist knowledge and training.
Most importantly, records are going to need to be maintained. Effective management of this process will be essential and will require well-thought-out monitoring and record keeping.
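That record keeping lends itself to a simple structured log, with one entry per rack commissioning event capturing the parameters listed above. The fields below are an illustrative assumption of what such a record might hold.

```python
# Minimal sketch: a structured commissioning record for each rack connection event.
# Field names are illustrative assumptions about what should be captured.

from dataclasses import dataclass, asdict
from datetime import date
import json

@dataclass
class RackCommissioningRecord:
    rack_id: str
    commissioned_on: date
    coolant: str                      # e.g. "PG25"
    glycol_concentration_pct: float
    flow_rate_lpm: float
    supply_temp_c: float
    return_temp_c: float
    purged_and_leak_tested: bool
    commissioned_by: str

record = RackCommissioningRecord(
    rack_id="R12",
    commissioned_on=date(2025, 1, 15),
    coolant="PG25",
    glycol_concentration_pct=25.0,
    flow_rate_lpm=120.0,
    supply_temp_c=32.0,
    return_temp_c=42.0,
    purged_and_leak_tested=True,
    commissioned_by="J. Smith",
)

# Persist each record so the recommissioning history stays auditable.
print(json.dumps(asdict(record), default=str, indent=2))
```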
The evolving roles of stakeholders
Mechanical designers will find themselves down at rack level delivering bespoke design solutions until the industry begins to standardise.
Designers will need greater information on proposed equipment to allow solutions to be tailored and compatibility verified. As it currently stands, there’s no one-size-fits-all solution.
Client support teams will require knowledge of the questions to ask future tenants about their systems requirements and proposed equipment.
Operators need to upskill to understand, operate and maintain this new equipment. They now need expertise in chemical treatment and management, hydraulic purging, contamination control, commissioning and flow-rate balancing.
Standards organisations such as IEEE, IEC, ASHRAE and manufacturers will need to continue to work together to standardise equipment and performance parameters to allow standardised solutions. With the rate that technology is changing and the race for performance gains, this will not be an easy task.
Redundancy guidelines such as those from the TIA and the Uptime Institute will also need to continue to adapt and define expectations for the performance of these new technologies.
While many of the components are similar to those found in existing applications, the line between customer and facility owner is more blurred than ever. For traditional cooling systems, the facility owner’s responsibility was to provide air at a set temperature to the rack; this has changed to supplying liquid. Is the facility operator responsible for delivering coolant to the CDU, the tertiary water loop, the rack or the server itself?
Enterprise customers will need strict IT procurement policies and close liaison with their operations teams to ensure compatibility with any LCT infrastructure installed.
It’s not a question of ‘if’ but ‘when’
Becoming liquid-cooled ready is a significant shift that operators must embrace to remain competitive and efficient. The transition to LCT, through direct-to-chip or immersion, necessitates a comprehensive re-evaluation of existing infrastructure, operational procedures and skill sets.
As operators navigate this complex landscape, they must balance the adoption of cutting-edge equipment with the practicalities of design, installation and maintenance.
By proactively addressing critical considerations – such as cooling demands, equipment specifications, structural accommodations and resiliency challenges – operators can position themselves for success.
Upskilling personnel and fostering collaboration among stakeholders will be essential. Adhering to evolving standards will ensure that data centres are not only efficient and effective but also aligned with the industry’s push for sustainability and reduced energy consumption.
Ultimately, the question is not whether to adopt liquid cooling, but how quickly and strategically operators can integrate these technologies into their facilities. As the market continues to evolve, those who take the initiative to become liquid-cooled ready will be well-equipped to leverage the benefits of enhanced performance and increased energy efficiency.
The journey may be complex, but the potential rewards make it a necessary path for the future of data centre operations.