Blog Article:

OneLLM: Bringing Managed AI Inference into OpenNebula

AI infrastructure is evolving quickly, and so are the ways we present new capabilities in OpenNebula.

Alongside our existing Product Features screencasts, we’re introducing a new format: Feature Preview. These short videos focus on upcoming functionality, highlighting the problems being addressed and the direction of the platform as it continues to expand. 

The first Feature Preview focuses on OneLLM, a new capability currently in development designed to bring managed AI inference directly into Sunstone, OpenNebula’s graphical user interface.

For many organizations, running AI models on their own infrastructure still requires stitching together multiple tools for infrastructure management, model distribution, and inference serving. OneLLM aims to simplify this experience by unifying these workflows inside a single platform, enabling administrators to govern AI resources while giving tenants a straightforward self-service experience.

The Challenge of Running AI Models in Private Infrastructure

Deploying AI workloads in private infrastructure typically involves managing GPU servers, handling model weights, and operating inference services as separate layers.

There is often no unified way to define hardware profiles, curate model catalogs, and manage deployed endpoints in one place. As a result, teams frequently rely on external tooling outside the cloud management platform, which fragments operations and increases operational overhead.

Bringing AI Inference into OpenNebula

OneLLM brings AI inference into OpenNebula as a native capability inside Sunstone. It introduces a unified workflow where administrators define the available compute resources for AI workloads, curate the models that can be deployed, and enable tenants to consume those models through a self-service interface.

Administrators start by creating reusable instance types that define the hardware profiles available for AI workloads. These include GPU type, VRAM capacity, compute tier, and supported model sizes, ensuring consistent and reusable configurations across the infrastructure.

OneLLM also introduces a centralized model catalog where AI models are downloaded, versioned, and access-controlled, ensuring that only approved models are exposed to tenants within the environment.

From the tenant side, the experience is reduced to selecting a model, choosing an instance type, and deploying an inference endpoint directly from Sunstone. OpenNebula handles the underlying provisioning, model loading, and inference setup.

Once deployed, the endpoint is exposed through an OpenAI-compatible API, allowing existing applications to connect without any code changes.

The Feature Preview: OneLLM in Action

In this screencast, we walk through the experience from both administrator and tenant perspectives. 

From the administrator side, we define instance types and manage the model catalog. From the tenant side, we deploy a model, launch an inference endpoint, and interact with it directly through the platform.

The demo highlights how OneLLM reduces operational complexity and brings AI inference into the same control plane used to manage infrastructure.

OneLLM

Toward a Unified AI Platform

OneLLM is a step toward a more unified AI infrastructure experience in OpenNebula, where infrastructure management and AI service consumption come together in a single platform.

By reducing fragmentation between infrastructure, models, and inference services, OneLLM will help organizations move from raw GPU resources to usable AI services more directly. 

This is the first entry in our Feature Preview series, offering an early look at how OpenNebula is progressing. Stay tuned for upcoming videos.

Want more? Head over to our screencasts hub and see it all in action.

Marco Mancini

Principal Technologist AI & Data Operations at OpenNebula Systems

May 29, 2026

0 Comments

Submit a Comment

Your email address will not be published. Required fields are marked *