AI Factories: Automated Deployment with OneDeploy

As companies increasingly turn to AI for competitive advantage, having a cloud infrastructure that can reliably support GPU-intensive workloads becomes critical. AI Factories address this need by providing a dedicated, scalable foundation for training, inference, and data-intensive AI pipelines.

The AI Factory Deployment Blueprints are practical guides for building a multi-tenant OpenNebula cloud optimized for AI workloads, covering both training and inference use cases. They offer a clear and reproducible path to stand up this infrastructure quickly, either on-premises or in the cloud. They combine infrastructure and architecture recommendations with automated deployment and validation workflows, helping ensure that the environment delivers the expected performance and behaves correctly once deployed. These blueprints are intentionally simplified deployments designed to run on mainstream GPU platforms, making them ideal for demonstrations, proofs of concept, and evaluations.

To keep the deployments lightweight and easy to adopt, the blueprints don’t include integrations with advanced platforms such as GB200/GB300, NVLink, or Spectrum-X. Instead, they focus on enabling organizations to rapidly validate AI Factory concepts, assess performance, and understand operational requirements before moving to more complex, large-scale production architectures.

This first blog post presents the initial blueprint, designed to automate the deployment of a simplified AI Factory.

Deployment Automation: On-Premises or Cloud Bare Metal

The blueprints use OneDeploy to automate installation and configuration in a consistent, repeatable way. By minimizing manual steps, OneDeploy reduces configuration drift and simplifies ongoing operations.
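As a minimal sketch of this workflow (the collection and playbook names follow the one-deploy documentation; the inventory path is illustrative, and the exact installation method may vary by version):

```bash
# Install the one-deploy Ansible collection (it can also be used
# directly from a checkout of the OpenNebula/one-deploy repository).
ansible-galaxy collection install git+https://github.com/OpenNebula/one-deploy.git

# Run the main playbook against an inventory describing the target cloud.
ansible-playbook -i inventory/ai-factory.yml opennebula.deploy.main
```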

The cloud topology—frontend services, compute nodes, and GPU-enabled hosts—is defined in an inventory file. This includes GPU passthrough assignment, networking and storage settings, and virtual network configuration. With this approach, deploying an AI-ready cloud becomes straightforward and reproducible, whether for a small proof of concept or a larger production setup.
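A trimmed-down inventory might look like the sketch below. Host names, addresses, and network values are illustrative, and the exact variable names (including those for GPU passthrough) depend on the one-deploy version in use:

```yaml
---
all:
  vars:
    ansible_user: root
    one_pass: opennebula          # oneadmin password (illustrative)
    one_version: '6.10'           # target OpenNebula release (illustrative)
    ds: { mode: ssh }             # datastore mode (illustrative)
    vn:                           # virtual network definitions
      admin_net:
        managed: true
        template:
          VN_MAD: bridge
          PHYDEV: eth0
          AR:
            TYPE: IP4
            IP: 172.20.0.100
            SIZE: 48
          GATEWAY: 172.20.0.1
          DNS: 1.1.1.1

frontend:
  hosts:
    fe1: { ansible_host: 172.20.0.6 }

node:
  hosts:
    gpu1: { ansible_host: 172.20.0.7 }   # GPU-enabled KVM host
    gpu2: { ansible_host: 172.20.0.8 }
# GPU/PCI passthrough settings for the hosts above would also be
# declared in the inventory; variable names are version-dependent.
```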

Multiple deployment models are supported. Organizations can deploy on-premises using standard hypervisor hosts, or use public cloud bare-metal servers with GPUs when local hardware isn’t available. As an example, the blueprint documents the full deployment process on Scaleway Elastic Metal.

This flexibility allows teams to adopt AI-ready infrastructure without committing to a single operating model, making it well suited for private, hybrid, or edge environments.

Validating the Deployment

After deployment, the environment can be validated to confirm it's ready for AI workloads. Validation options include running LLM inference benchmarks with vLLM, or deploying a GPU-enabled Kubernetes cluster and verifying it with representative workloads. For Kubernetes-based AI use cases, the blueprint also supports integration with GPU orchestration and scheduling frameworks such as NVIDIA Dynamo or the NVIDIA KAI Scheduler, enabling flexible, multi-tenant AI workloads on top of the AI-ready cloud.
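As a sketch of these checks (model name, port, and prompt are illustrative; vLLM is assumed to be installed on a GPU-enabled VM, and the Kubernetes check assumes the NVIDIA device plugin is deployed):

```bash
# Serve a model with vLLM's OpenAI-compatible server (model choice illustrative).
vllm serve Qwen/Qwen2.5-7B-Instruct --port 8000 &

# Smoke-test the inference endpoint once the server is up.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen2.5-7B-Instruct", "prompt": "Say hello.", "max_tokens": 32}'

# On the Kubernetes path, confirm GPUs are advertised to the scheduler.
kubectl get nodes -o jsonpath='{.items[*].status.allocatable.nvidia\.com/gpu}'
```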

Stay tuned for the next post in the series.

Carlos Moral

Senior Technologist at OpenNebula Systems

Jan 14, 2026
