AI infrastructure is evolving quickly, and so are the ways we present new capabilities in OpenNebula.
Alongside our existing Product Features screencasts, we’re introducing a new format: Feature Preview. These short videos focus on upcoming functionality, highlighting the problems being addressed and the direction of the platform as it continues to expand.
The first Feature Preview focuses on OneLLM, a new capability currently in development designed to bring managed AI inference directly into Sunstone, OpenNebula’s graphical user interface.
For many organizations, running AI models on their own infrastructure still requires stitching together multiple tools for infrastructure management, model distribution, and inference serving. OneLLM aims to simplify this experience by unifying these workflows inside a single platform, enabling administrators to govern AI resources while giving tenants a straightforward self-service experience.
The Challenge of Running AI Models in Private Infrastructure
Deploying AI workloads in private infrastructure typically involves managing GPU servers, handling model weights, and operating inference services as separate layers.
There is often no unified way to define hardware profiles, curate model catalogs, and manage deployed endpoints in one place. As a result, teams frequently rely on external tooling outside the cloud management platform, which fragments operations and increases operational overhead.
Bringing AI Inference into OpenNebula
OneLLM brings AI inference into OpenNebula as a native capability inside Sunstone. It introduces a unified workflow where administrators define the available compute resources for AI workloads, curate the models that can be deployed, and enable tenants to consume those models through a self-service interface.
Administrators start by creating reusable instance types that define the hardware profiles available for AI workloads. These include GPU type, VRAM capacity, compute tier, and supported model sizes, ensuring consistent and reusable configurations across the infrastructure.
OneLLM also introduces a centralized model catalog where AI models are downloaded, versioned, and access-controlled, ensuring that only approved models are exposed to tenants within the environment.
From the tenant side, the experience is reduced to selecting a model, choosing an instance type, and deploying an inference endpoint directly from Sunstone. OpenNebula handles the underlying provisioning, model loading, and inference setup.
Once deployed, the endpoint is exposed through an OpenAI-compatible API, allowing existing applications to connect without any code changes.
The Feature Preview: OneLLM in Action
In this screencast, we walk through the experience from both administrator and tenant perspectives.
From the administrator side, we define instance types and manage the model catalog. From the tenant side, we deploy a model, launch an inference endpoint, and interact with it directly through the platform.
The demo highlights how OneLLM reduces operational complexity and brings AI inference into the same control plane used to manage infrastructure.

Toward a Unified AI Platform
OneLLM is a step toward a more unified AI infrastructure experience in OpenNebula, where infrastructure management and AI service consumption come together in a single platform.
By reducing fragmentation between infrastructure, models, and inference services, OneLLM will help organizations move from raw GPU resources to usable AI services more directly.
This is the first entry in our Feature Preview series, offering an early look at how OpenNebula is progressing. Stay tuned for upcoming videos.
Want more? Head over to our screencasts hub and see it all in action.



0 Comments