Building Sovereign AI Factories: Multi-Tenant AI-as-a-Service with OpenNebula

AI Factories

Pablo del Arco

Cloud-Edge Ecosystem Engineer at OpenNebula Systems

Apr 29, 2025

The demand for enterprise-grade platforms to deploy and scale AI models is growing rapidly, especially with the emergence of AI Factories: dedicated environments where multiple teams run AI workloads independently on shared infrastructure. OpenNebula is meeting this challenge with incremental releases of its AI-as-a-Service (AIaaS) stack, now available in the OpenNebula Marketplace. The stack combines a powerful multi-tenant back-end architecture, ensuring complete workload isolation and secure resource sharing, with major enhancements in hardware integration, including support for GPUs, ARM virtualization, NVIDIA BlueField-3 DPUs, InfiniBand, and 5G, all designed to deliver optimal performance for demanding AI workloads.

The Ray Appliance for AI Factories

At the heart of this solution is the Ray Appliance, a ready-to-use software appliance available directly from the OpenNebula Marketplace. It automatically manages deployment, model optimization, and the creation of endpoints for LLM inference using vLLM and Hugging Face, while leveraging Ray’s distributed computing to run larger models across multiple GPUs. Key features include:

  • Flexible inference engines. Deploy your model with either the vLLM or Hugging Face framework, choosing between them based on your specific needs, whether you’re prioritizing performance or compatibility.
  • Hugging Face integration. Get started quickly with popular, ready-to-use models like LLaMA, Qwen, and EuroLLM, all pre-integrated into the platform and pulled directly from the Hugging Face Hub using your own access tokens.
  • OpenAI API compatibility. If you’re already using OpenAI models in your products, you can seamlessly integrate our deployed models with minimal code changes, as shown in the sketch after this list.
  • Quantization for larger models. Use 4- and 8-bit quantization to fit larger models onto your infrastructure and enjoy the power of large models on small GPUs (a sketch of the underlying mechanics follows below).
  • Advanced GPU support. Enjoy full support for vGPUs, GPU passthrough, and multi-GPU configurations, enabling fine-grained resource allocation per tenant or workload.
  • Infrastructure-agnostic deployment. Run your AI workloads wherever you need them—on-premises, in the public cloud, or at the edge, with a platform that adapts to your infrastructure strategy.
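
To make the OpenAI compatibility concrete, here is a minimal sketch of querying a model served by the appliance with the standard openai Python client. The endpoint URL, API key, and model name are placeholders for your own deployment, not values fixed by the appliance; vLLM’s OpenAI-compatible server conventionally listens on port 8000.

    # Minimal sketch: querying a locally deployed model through its
    # OpenAI-compatible endpoint. base_url, api_key, and the model name
    # are placeholders -- substitute the values of your own deployment.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://my-ray-appliance:8000/v1",  # hypothetical endpoint
        api_key="unused",  # local deployments typically ignore the key
    )

    response = client.chat.completions.create(
        model="utter-project/EuroLLM-9B-Instruct",  # example model name
        messages=[{"role": "user", "content": "Summarize what an AI Factory is."}],
    )
    print(response.choices[0].message.content)

Since only base_url and api_key change, an existing OpenAI-based application can be pointed at the local endpoint without touching the rest of its code.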

Whether you’re running LLMs like EuroLLM, LLaMA, or Qwen, the platform delivers a plug-and-play experience for high-performance, fully local AI deployment.
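
For a sense of the quantization and multi-GPU support underneath, here is a minimal sketch using vLLM’s offline Python API. The appliance automates this configuration for you; the model name and GPU count below are illustrative assumptions, not appliance defaults.

    # Minimal sketch of the mechanics underneath: loading a quantized model
    # across multiple GPUs with vLLM. The model name and GPU count are
    # illustrative assumptions. Gated models (e.g. LLaMA) also need a
    # Hugging Face access token, read from the HF_TOKEN environment variable.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="Qwen/Qwen2.5-7B-Instruct-AWQ",  # example 4-bit (AWQ) quantized model
        quantization="awq",                    # load the 4-bit weights
        tensor_parallel_size=2,                # shard the model across 2 GPUs
    )

    params = SamplingParams(temperature=0.7, max_tokens=128)
    outputs = llm.generate(["What is GPU passthrough?"], params)
    print(outputs[0].outputs[0].text)

AWQ is just one of the 4-bit schemes vLLM understands, and the same tensor_parallel_size knob is what lets a model too large for a single GPU be sharded across several.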

Watch the Full Demo: Deploying EuroLLM with GPU Acceleration

We’ve produced a detailed screencast that walks you through the entire process: launching the Ray Appliance, configuring resources, selecting your model, deploying it on a GPU-enabled host, and finally interacting with a fully local LLM instance through a sleek chatbot interface.

This video is the perfect starting point for anyone interested in setting up their own AI Factory with OpenNebula—no external APIs, no vendor lock-in, and full on-prem performance. 
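
If you prefer scripting the interaction over the web interface, the sketch below shows a bare-bones terminal chatbot talking to the same kind of OpenAI-compatible endpoint, streaming the reply token by token as in the demo. As before, the endpoint URL and model name are placeholders for your own deployment.

    # Minimal sketch of a terminal chatbot against a locally deployed model.
    # The endpoint URL and model name are placeholders for your own deployment.
    from openai import OpenAI

    client = OpenAI(base_url="http://my-ray-appliance:8000/v1", api_key="unused")
    history = []  # keep prior turns so the model sees the whole conversation

    while True:
        user_msg = input("you> ")
        if user_msg in ("quit", "exit"):
            break
        history.append({"role": "user", "content": user_msg})
        # stream=True yields the reply chunk by chunk, like a chat UI
        stream = client.chat.completions.create(
            model="utter-project/EuroLLM-9B-Instruct",  # example model name
            messages=history,
            stream=True,
        )
        reply = ""
        for chunk in stream:
            delta = chunk.choices[0].delta.content or ""
            print(delta, end="", flush=True)
            reply += delta
        print()
        history.append({"role": "assistant", "content": reply})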

What’s Next?

This is just the beginning. We’re actively extending the AIaaS stack with:

  • Training and fine-tuning workflows built directly into the Ray Appliance
  • Integration with NVIDIA Dynamo for ultra-low-latency inference at scale
  • Retrieval-Augmented Generation (RAG) support for more accurate, context-aware responses

OpenNebula Systems remains committed to building open, flexible, and production-ready AI infrastructure, giving you the tools to transform your cloud into a fully operational AI Factory.



Funded by the Spanish Ministry for Digital Transformation and Civil Service through the ONEnextgen Project (UNICO IPCEI-2023-003), and co-funded by the European Union’s NextGenerationEU through the RRF.

