As AI workloads increasingly move toward the edge, driven by demand for low latency, cost efficiency, and data sovereignty, a flexible and efficient infrastructure stack becomes essential. In this context, deploying AI inference directly on energy-efficient edge nodes, such as Ampere ARM64 servers, is a compelling option for serving small models.
OpenNebula now offers native ARM64 support, officially certified on Ampere processors, making it easier than ever to run AI applications outside centralized public clouds. This post shows OpenNebula deployed on a compact, two-node ARM64 edge cluster that serves AI inference workloads using Hugging Face models and the Ray framework.
Why AI Inference with ARM?
For optimized inference workloads, ARM provides a powerful alternative to GPUs—especially in scenarios where low latency, energy efficiency, and a compact deployment footprint are key requirements.
- Power Efficiency and Thermals: ARM64 processors like Ampere Altra offer high performance per watt, making them ideal for power-constrained environments such as edge data centers, telecom sites, or remote industrial locations.
- Cost-Effectiveness: Running inference workloads on ARM CPUs can be significantly more cost-efficient than using GPUs, especially for lightweight or medium-complexity models, where GPU acceleration might be overkill.
- Scalability and Density: ARM-based servers often provide higher core counts with lower power consumption, allowing more inference workloads to run in parallel per rack—supporting scalable edge deployments.
- Deployment Simplicity and Portability: ARM-based inference appliances can be fully containerized or virtualized, making them easier to deploy, manage, and replicate across distributed edge sites without requiring specialized infrastructure or drivers.
- Sovereignty and Supply Chain Resilience: ARM64 platforms offer vendor diversity and hardware independence, aligning with sovereign computing goals in Europe and elsewhere—reducing dependency on closed GPU ecosystems.
Edge-Ready Cloud with OneDeploy and Ampere ARM64
OpenNebula can be seamlessly deployed on Ampere-based ARM64 edge nodes using OneDeploy, an automation tool powered by Ansible. This approach treats the entire infrastructure as code, allowing organizations to configure, replicate, and scale edge environments with minimal effort.
From setting up the OpenNebula cluster to configuring networking and storage, everything is managed up front—ensuring a fast, repeatable deployment process suitable for production-grade edge AI operations.
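To make this concrete, here is a minimal sketch of a one-deploy Ansible inventory for a two-node edge cluster. The `frontend`/`node` group layout follows the one-deploy documentation, while all hostnames, addresses, and credentials below are placeholders to adapt to your environment:

```yaml
---
all:
  vars:
    ansible_user: root
    one_version: '6.10'      # target OpenNebula release
    one_pass: changeme       # oneadmin password (placeholder)
    ds: { mode: ssh }        # simple SSH-based datastores
    vn:
      admin_net:             # virtual network the appliances will attach to
        managed: true
        template:
          VN_MAD: bridge
          PHYDEV: eth0
          BRIDGE: br0
          AR:
            TYPE: IP4
            IP: 192.168.1.100
            SIZE: 48
          NETWORK_ADDRESS: 192.168.1.0
          NETWORK_MASK: 255.255.255.0
          GATEWAY: 192.168.1.1
          DNS: 1.1.1.1

frontend:
  hosts:
    fe1: { ansible_host: 192.168.1.10 }

node:
  hosts:
    node1: { ansible_host: 192.168.1.11 }
    node2: { ansible_host: 192.168.1.12 }
```

With the inventory in place, the whole cluster comes up from a single playbook run, for example `ansible-playbook -i inventory/edge.yml opennebula.deploy.main` (check the one-deploy README for the exact invocation matching your version).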
ARM-Compatible Apps from the OpenNebula Marketplace
Once the cluster is live, the next step is application provisioning. OpenNebula’s Marketplace offers a catalog of pre-built appliances, including ARM-compatible images.
The Ray appliance—designed specifically for ARM64—can be downloaded and launched directly within the environment. This appliance comes preconfigured to run AI inference using a model from Hugging Face, demonstrating how easy it is to deploy powerful AI workloads at the edge.
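From the command line, pulling the appliance looks roughly like the following; the appliance and datastore names are placeholders, so check the `onemarketapp list` output for the exact names in your Marketplace:

```shell
# Locate the ARM64 Ray appliance in the Marketplace
onemarketapp list | grep -i ray

# Export it to the local image datastore; this creates both the image
# and a matching VM template ("Ray" and "default" are placeholders)
onemarketapp export "Ray" ray-arm64 --datastore default
```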
Real-Time AI Inference Using Ray and Hugging Face
After the appliance is downloaded, the virtual machine is instantiated with workload-specific parameters: the model to serve, the sampling temperature, the prompt configuration, and the network interface to attach to the VM.
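As an illustration, instantiation from the CLI could look like the sketch below. The `ONEAPP_*` context variable names are hypothetical (each appliance documents its own contextualization inputs), and the network name matches the inventory sketch above:

```shell
# Extra attributes merged into the template at instantiation time.
# The ONEAPP_* names are illustrative only; depending on the OpenNebula
# version, a CONTEXT section supplied here may replace rather than merge
# the template's existing CONTEXT, so consult the appliance docs first.
cat > ray_params.txt <<'EOF'
CONTEXT = [
  NETWORK = "YES",
  ONEAPP_RAY_MODEL_ID = "Qwen/Qwen2.5-1.5B-Instruct",
  ONEAPP_RAY_TEMPERATURE = "0.7",
  ONEAPP_RAY_SYSTEM_PROMPT = "You are a helpful assistant."
]
EOF

# Instantiate the exported template with a NIC on the edge network
onetemplate instantiate ray-arm64 --name ray-inference \
  --nic admin_net ray_params.txt
```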
Once deployed, the Ray-based VM is ready to serve inference requests. The included chatbot interface enables real-time interactions with the Qwen 2.5 model, such as answering general knowledge questions and performing basic math. These are lightweight yet meaningful tasks that demonstrate how compact AI models can run efficiently on ARM-based edge hardware.
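For reference, a raw request against the running VM might look like the sketch below. It assumes the appliance exposes an OpenAI-compatible chat endpoint on Ray Serve's default HTTP port 8000; the actual route, payload schema, and VM address depend on your deployment, so verify them against the appliance documentation:

```shell
# Query the inference endpoint (IP, route, and model id are assumptions)
curl -s http://192.168.1.100:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen2.5-1.5B-Instruct",
        "messages": [{"role": "user", "content": "What is 17 * 23?"}],
        "temperature": 0.7
      }'
```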
Watch the Demo: Deploying AI at the Edge with Ampere and OpenNebula
The demo guides you through the entire process, including:
- Setting up an ARM64 edge cluster using OneDeploy
- Downloading the Ray AI inference appliance from the Marketplace
- Configuring and deploying an inference VM
- Running real-time queries via chatbot
- Observing the entire deployment process via CLI tools (a sample session is sketched below)
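A typical session for that last step might look like this (IDs and names will differ in your environment):

```shell
# Watch the VM move through PENDING -> PROLOG -> RUNNING
onevm list

# Inspect the inference VM's configuration and state
onevm show ray-inference

# Follow the deployment log on the front-end
tail -f /var/log/one/<vm_id>.log   # replace <vm_id> with the actual VM ID
```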
Looking Ahead: AI at the Edge Made Easy
This example highlights the growing relevance of cloud repatriation and sovereign edge AI—bringing workloads closer to users, reducing costs, and ensuring full control. With native support for ARM64 and integration with tools like OneDeploy and Ray, OpenNebula empowers organizations to embrace AI at the edge—efficiently, securely, and with complete sovereignty.
Stay tuned for more demos showcasing how OpenNebula enables distributed, real-time AI for next-gen infrastructure needs.
We’d like to thank Ampere for providing the ARM-based servers used in this demo, which are also part of the ongoing certification process for OpenNebula.