AI Infrastructure Meets Grid Optimization | Hydra Host, Parasail & Mercury

We are witnessing a striking generational shift: by 2030, the United States is projected to use more electricity for AI than for manufacturing all industrial products combined, with roughly 40% of new grid demand coming from AI.

For this collaboration, we built an end-to-end solution that optimizes load on the electrical grid to deliver significant savings to AI customers. We deliver state-of-the-art open-source models (e.g., Kimi K2) to thousands of users on the latest NVIDIA GPUs (B200s, H200s). These customers are building frontier applications with previously unimaginable ramp-ups in users, so cost and scale are critical to them.


Because the power grid is sized for its brief moments of peak load, it runs overprovisioned the rest of the time. Reducing load during windows totaling only about 3% of the day can turn a "no" from a utility company into a "yes," producing a surprising magnification of cost savings across the physical infrastructure chain. With Parasail's AI Deployment Network able to match load to supply and rebalance workloads, we can pass cost savings of more than 10% on to end customers with no meaningful impact, and we can ensure hardware is available when they scale.
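The leverage described above comes down to simple arithmetic. A minimal sketch, using the 3% figure from this paragraph and an assumed 25% power reduction during each curtailment window (an illustrative number, not a measured one):

```python
# Back-of-the-envelope sketch (illustrative numbers, not measured data):
# a site that sheds part of its load during the grid's brief peak windows
# gives up very little energy while sharply cutting its peak demand.

HOURS_PER_YEAR = 8760
peak_fraction = 0.03      # curtailment windows total ~3% of the time
shed_fraction = 0.25      # assumed power reduction during those windows

curtailed_hours = HOURS_PER_YEAR * peak_fraction
energy_forgone = peak_fraction * shed_fraction   # share of annual energy

print(f"Curtailment windows: ~{curtailed_hours:.0f} hours/year")
print(f"Annual energy forgone: {energy_forgone:.2%}")
print(f"Peak-demand reduction seen by the utility: {shed_fraction:.0%}")
```

Under these assumptions, the site forgoes well under 1% of its annual energy while the utility sees a 25% smaller peak, which is why a small operational concession can move a utility from "no" to "yes."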


As this project demonstrates, great things happen when companies, each a master of its part of this value chain, come together: from end AI users to the inference platform, through the data center and GPU rental ecosystem, all the way to power generation and grid operation. Here is how Parasail, Hydra Host, and Mercury Computing came together to build a unique solution that saves customers money and helps solve the toughest challenge our electrical grid has ever faced.


Hydra's Key Contribution:

With Hydra, Parasail and Mercury did not have to worry about sourcing GPUs, managing the physical infrastructure, or running the software that manages power-load events. Our Brokkr API automated the provisioning of bare-metal B200s and H200s across a vast network of data center partners, allowing compute to be deployed close to power sources and scaled with real-time demand. Parasail simply obtained high-performance GPUs through a familiar interface at a price far better than anything else on the market, thanks to the cost advantages of bare metal over cloud. Mercury focused on securing power from providers and installing its software in Hydra's infrastructure. Parasail lives at the far end of demand (AI inference for end users); Mercury lives at the far end of supply (power delivery to data centers). Hydra Host handled everything in between.


Parasail:

About Parasail:

Parasail pulls together the world's supply of GPU compute to build the best AI inference platform. Our AI Deployment Network connects a massive fleet of GPUs around the world into a single global pool with managed inference services, giving customers the ability to run AI deployments at peak performance and efficiency anywhere in the world with no effort or complexity. Serving over 40B tokens of production traffic daily, Parasail is quickly becoming the AI inference platform of choice for customers who want to run the latest models and deploy large-scale workloads while maintaining the flexibility and cost-effectiveness critical for success.


Parasail's Key Contribution:

Parasail's role is to bring AI demand to this GPU infrastructure collaboration and to manage the inference endpoints for customers. Parasail's AI inference platform is built for the unprecedented explosion of demand for AI and the planetary-scale buildout of GPU infrastructure. It powers a wide variety of workloads and use cases, from high-reliability, production-grade AI endpoints to overnight processing of massive batch jobs. The platform excels at matching the supply of compute at any given time to demand, meaning power-throttle events can easily be absorbed by load-balancing to other sources or by assigning the affected GPUs to less time-sensitive workloads.
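As a rough illustration of the rebalancing described above, here is a toy sketch of absorbing a power-throttle event: deferrable batch work simply waits out the event, and any remaining latency-sensitive traffic migrates to sites with headroom. The `Site` model and `absorb_throttle` function are hypothetical constructs for this sketch, not Parasail's actual scheduler:

```python
# Toy model (hypothetical, not Parasail's real API): absorb a power-throttle
# event by deferring batch work first, then migrating realtime traffic.

from dataclasses import dataclass

@dataclass
class Site:
    name: str
    capacity: float       # available GPU capacity (arbitrary units)
    realtime_load: float  # latency-sensitive inference traffic
    batch_load: float     # deferrable batch work

def absorb_throttle(sites, throttled, reduction):
    """Shed `reduction` capacity at one site; return (deferred, migrated)."""
    src = next(s for s in sites if s.name == throttled)
    src.capacity -= reduction
    overflow = max(0.0, src.realtime_load + src.batch_load - src.capacity)
    deferred = min(overflow, src.batch_load)   # batch work waits out the event
    src.batch_load -= deferred
    overflow -= deferred
    migrated = 0.0
    for s in sites:                            # move realtime load to headroom
        if s is src or overflow <= 0:
            continue
        take = min(max(s.capacity - s.realtime_load - s.batch_load, 0.0),
                   overflow)
        s.realtime_load += take
        src.realtime_load -= take
        migrated += take
        overflow -= take
    return deferred, migrated
```

In practice rebalancing would operate on request routing rather than abstract load numbers; the sketch only captures the accounting that makes throttle events absorbable without dropping latency-sensitive traffic.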



Mercury Computing:

About Mercury Computing:

Mercury Computing provides workload flexibility for data centers, speeding time to power through flexible utility interconnection. Mercury’s FlexSLA solution, ideal for AI workloads, lowers the cost of computing in exchange for occasional, non-interruptive power scaling—turning data centers into flexible grid resources and aligning incentives between data center operators, their compute tenants and local utilities.


Mercury's Key Contribution:

Through its partnership with Parasail and Hydra Host, Mercury has officially launched its FlexSLA solution—and along with it the first live demonstration of AI workload flexibility.


Mercury’s FlexSLA provides discounted computing in exchange for occasional workload flexibility via non-interruptive power scaling. Workload flexibility is highly valued by both data center operators and local utilities. A recent study by the Nicholas Institute for Energy, Environment and Sustainability at Duke University found that flexibility in just one quarter of one percent of hours—roughly 20 hours per year—would unlock 76 gigawatts of headroom for data center expansion over today’s grid infrastructure, avoiding the need for costly and prolonged grid infrastructure upgrades.
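As a quick sanity check on the hours figure quoted above:

```python
# One quarter of one percent of the hours in a year.
hours_per_year = 24 * 365              # 8,760
flex_hours = hours_per_year * 0.0025
print(f"{flex_hours:.0f} hours/year")  # ~22 hours
```

The small difference from "roughly 20" is just rounding; either way, the flexibility window is a tiny sliver of the year relative to the 76 gigawatts of headroom it unlocks.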


Through collaboration with Parasail and Hydra Host, FlexSLA is now live and in production at a state-of-the-art colocated AI data center in Brooklyn, New York, powered by local utility Con Edison. The flexible workloads are production inference requests run by Parasail against Kimi K2, a recently released open-source model with over 1 trillion parameters. Parasail is hosting Kimi K2 on a newest-generation NVIDIA Blackwell B200 8-way HGX server managed by Hydra Host.


FlexSLA is currently available for any Hydra Host GPU reservation in data centers across the US. It can be added to existing agreements or embedded in new GPU reservations. Moreover, Mercury has integrated its software with Brokkr, Hydra Host's bare-metal GPU server management platform, making it fast and easy for customers to get started: no installation, configuration, or hardware retrofit is necessary.


For Parasail, Mercury's first live FlexSLA customer, the impact on performance has been minimal: initial results show up to a 25 percent decrease in energy consumption with only a 5 percent reduction in tokens-per-second inference throughput. Parasail's distributed AI infrastructure easily handles occasional periods of brief power scaling, making the performance impact of FlexSLA barely noticeable, and its economics a no-brainer.
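Interestingly, those two figures imply that the energy consumed per token actually drops during a throttle event. A quick calculation with the numbers above:

```python
# Energy-per-token impact of a throttle event, using the reported figures.
energy_scale = 1 - 0.25      # energy consumption drops 25% during the event
throughput_scale = 1 - 0.05  # tokens/sec drops only 5%

energy_per_token = energy_scale / throughput_scale
print(f"Energy per token during throttling: {energy_per_token:.2f}x baseline")
```

Power falls faster than throughput, so each token costs roughly 21 percent less energy while FlexSLA scaling is active.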
