Blog Article:

How Infrastructure-First AI Factories Maximize the Economics of AI

Understanding the ROI of an OpenNebula AI Factory

NVIDIA GTC Taipei marked an important milestone for OpenNebula. During Jensen Huang’s keynote, OpenNebula was highlighted as part of NVIDIA DSX’s AI Factory Software ecosystem, reinforcing the role of infrastructure control planes in the next generation of AI infrastructure. One of the key messages from the presentation was simple but powerful: in an AI Factory, compute is revenue. That means the economics of AI infrastructure are shaped not only by the number of GPUs deployed, but also by operational parameters such as time to first token, tokens per watt, reliability, and the useful life of the infrastructure.

AI Jensen Talk

AI Factories are not just GPU clusters. Once they move into production, the challenge is no longer simply running models. The real challenge is turning expensive GPU infrastructure into a platform that can be shared securely across users and organizations, operated efficiently, measured accurately, and offered as a reliable service.

That is why the management layer matters. Kubernetes platforms help run AI applications, and MLOps platforms help manage models and pipelines. But neither is designed to operate an AI Factory as a whole. Large-scale AI environments also need a foundation that can manage infrastructure, virtualization, bare metal, Kubernetes clusters, tenants, quotas, accounting, networking, storage, observability, and day-to-day operations in a consistent way.

This is where OpenNebula fits. OpenNebula provides the infrastructure control plane that brings these elements together, allowing GPU resources to be delivered as secure, multi-tenant cloud services while maintaining operational simplicity and governance.

The result is straightforward: infrastructure providers can extract more value from the same hardware investment. By simplifying operations, improving resource utilization, and enabling flexible service delivery models, OpenNebula helps reduce time to first token, increase tokens per watt, improve reliability, and extend the useful life of AI infrastructure across multiple generations of hardware and software.

A Simple Example: 1,000 GB200 GPUs

To keep the example simple, consider an AI Cloud with 1,000 GB200 GPUs. If the actual deployment uses multi-GPU nodes, the numbers scale linearly with the number of GPUs, while the economic logic remains the same.

Based on publicly available pricing for B200/GB200-class cloud GPU capacity, a reference market value of €7–€15 per GPU-hour is used, with the lower end reflecting B200-class on-demand pricing and the middle-to-upper range reflecting GB200/B200 capacity from major cloud and specialist GPU providers.

ItemValue
GPUs managed1,000
Reference value per GPU-hour€7–€15
GPU capacity value per hour€7,000–€15,000
GPU capacity value per year, 24×7€61M–€131M

This means that even a relatively small 1,000-GPU AI cloud represents tens or hundreds of millions of euros per year in potential GPU capacity value.

The ROI question is therefore not only how much infrastructure is deployed, but how much of that infrastructure can be effectively used, shared, governed, and monetized. Small improvements in effective utilization have a major financial impact.

Effective utilization improvementAdditional usable GPU capacity value / year
1%~€0.6M–€1.3M
2%~€1.2M–€2.6M
3%~€1.8M–€3.9M

This is why the infrastructure control plane matters. At AI Factory scale, even modest gains in utilization, provisioning speed, tenant allocation, reliability, or workload placement can unlock millions of euros of additional usable GPU capacity.

That changes the ROI discussion. The question is not simply, “How much does the platform cost?” The better question is, “How much additional usable GPU capacity can the platform unlock?

If OpenNebula improves effective GPU utilization by only 1–2 percentage points, the additional usable GPU capacity can offset a substantial part, or potentially all, of the annual platform subscription cost. For the proposed GPU model, a 1% improvement in effective utilization represents ~€0.6M–€1.3M per year of additional usable GPU capacity, which is directly comparable to the annual OpenNebula subscription cost.

In other words, OpenNebula is not just a control-plane cost. It is an operational efficiency layer that helps maximize the utilization, governance, monetization, and sovereignty of the AI Gigafactory’s GPU investment.

OpenNebula helps infrastructure providers get more value from their AI investments by reducing time to first token, increasing tokens per watt, improving reliability, and extending the useful life of their infrastructure.

Reducing Time to First Token

For users, AI infrastructure becomes valuable when the model starts responding. That makes “time to first token” more than just an inference metric. It is also an infrastructure metric.

If a tenant has to wait days for GPU allocation, network setup, storage access, identity configuration, and model deployment, the infrastructure is not behaving like a cloud. It is behaving like a queue.

OpenNebula shortens that path by automating the full lifecycle of GPU-enabled virtual machines, Kubernetes clusters, Slurm environments, and inference services. A tenant can request capacity or an AI service from a catalog, while OpenNebula handles the infrastructure allocation, isolation, quotas, accounting, and lifecycle operations behind it.

For the user, that means faster access to AI services. For the provider, it means faster time to revenue.

Increasing Tokens per Watt

The next generation of AI infrastructure will be judged not only by tokens per second, but by tokens per watt.

Power, cooling, and utilization are now strategic constraints. A GPU that is powered and cooled but poorly allocated is wasting money and energy.

OpenNebula helps operators place the right workload on the right infrastructure. High-end GB200 capacity can be reserved for workloads that need it. Other workloads can run on different GPU classes, VMs, Kubernetes clusters, or batch environments. The platform can also support quotas, accounting, placement policies, service catalogs, and lifecycle automation to reduce stranded capacity.

That is the practical meaning of tokens per watt: not only making the model efficient, but making the whole infrastructure more useful.

Improving Reliability

AI Factories will be operated as critical infrastructure. They cannot depend on isolated scripts, manual processes, or disconnected tools.

OpenNebula provides a unified operational model across the infrastructure layer: tenants, GPUs, VMs, Kubernetes clusters, bare metal, Slurm, AI services, quotas, accounting, monitoring, automation, and lifecycle management.

This matters because production incidents rarely stay inside one layer. A tenant issue may involve GPU allocation, DNS, certificates, IAM, firewall rules, storage, Kubernetes, Slurm, and observability at the same time.

When those layers are managed separately, troubleshooting becomes slow and risky. When they are governed through a single infrastructure control plane, operations become more predictable.

Reliability improves because complexity is reduced.

Extending the Useful Life of the Infrastructure

AI hardware evolves quickly. New GPU generations arrive, new interconnects appear, new inference engines become popular, and customer requirements change.

A platform tied too closely to one runtime, one hardware generation, or one operating model will age quickly.

OpenNebula is designed to avoid that. It can manage heterogeneous infrastructure across different CPU architectures, GPU families, virtualization models, Kubernetes runtimes, Slurm environments, storage systems, networking fabrics, and AI platforms.

That flexibility helps extend the useful life of the infrastructure. Not every workload needs the newest GPU. Older or lower-cost accelerators can still be useful for development, batch inference, testing, education, fine-tuning, and SME workloads. Premium GPUs can be reserved for the workloads that truly need them.

This also supports sovereignty. If new European accelerators, DPUs, or AI processors become available, the management platform should be able to incorporate them without forcing a redesign of the whole AI Factory.

The Real Role of OpenNebula

OpenNebula is not another AI framework. It is the infrastructure-first control plane that makes an AI Factory manageable as a cloud.

It enables providers to expose multiple service models from the same infrastructure: GPU-as-a-Service, VM-as-a-Service, Bare-Metal-as-a-Service, Kubernetes-as-a-Service, Slurm jobs, LLM-as-a-Service, and RAG services.

This flexibility matters because no two AI Factories are the same. Some tenants want Kubernetes. Some want VMs. Some want bare metal. Some only want an API endpoint. Some need strong isolation. Others need maximum throughput.

An infrastructure-first platform lets the provider support all of these models without rebuilding the platform each time.

For a 1,000-GB200-GPU AI cloud, a 1% utilization improvement can already be worth roughly the same order of magnitude as the annual OpenNebula subscription. At 2% or 3%, the value unlocked can exceed the platform cost by a wide margin. 

This is the core argument: OpenNebula helps infrastructure providers get more value from their AI investments by reducing time to first token, increasing tokens per watt, improving reliability, and extending the useful life of their infrastructure. 

That is the difference between owning GPUs and operating an AI Factory. Learn how to build and operate AI infrastructure with full control.

Ignacio M. Llorente

Managing Director - Business Development at OpenNebula Systems

Jun 10, 2026

0 Comments

Submit a Comment

Your email address will not be published. Required fields are marked *