As AI workloads increasingly move toward the edge, driven by demand for low latency, cost efficiency, and data sovereignty, a flexible and efficient infrastructure stack becomes essential. In this context, deploying AI inference directly on energy-efficient edge nodes, such as Ampere ARM64 servers, is a compelling option for serving small models.
OpenNebula now offers native ARM64 support, officially certified on Ampere processors, making it easier than ever to run AI applications outside centralized public clouds. This post shows OpenNebula deployed on a compact, two-node ARM64 edge cluster that serves AI inference workloads using Hugging Face models and the Ray framework.
Why AI Inference with ARM?
For optimized inference workloads, ARM provides a powerful alternative to GPUs—especially in scenarios where low latency, energy efficiency, and a compact deployment footprint are key requirements.
- Power Efficiency and Thermals: ARM64 processors like Ampere Altra offer high performance per watt, making them ideal for power-constrained environments such as edge data centers, telecom sites, or remote industrial locations.
- Cost-Effectiveness: Running inference workloads on ARM CPUs can be significantly more cost-efficient than using GPUs, especially for lightweight or medium-complexity models, where GPU acceleration might be overkill.
- Scalability and Density: ARM-based servers often provide higher core counts with lower power consumption, allowing more inference workloads to run in parallel per rack—supporting scalable edge deployments.
- Deployment Simplicity and Portability: ARM-based inference appliances can be fully containerized or virtualized, making them easier to deploy, manage, and replicate across distributed edge sites without requiring specialized infrastructure or drivers.
- Sovereignty and Supply Chain Resilience: ARM64 platforms offer vendor diversity and hardware independence, aligning with sovereign computing goals in Europe and elsewhere—reducing dependency on closed GPU ecosystems.
Edge-Ready Cloud with OneDeploy and Ampere ARM64
OpenNebula can be seamlessly deployed on Ampere-based ARM64 edge nodes using OneDeploy, an automation tool powered by Ansible. This approach treats the entire infrastructure as code, allowing organizations to configure, replicate, and scale edge environments with minimal effort.
From setting up the OpenNebula cluster to configuring networking and storage, everything is managed up front—ensuring a fast, repeatable deployment process suitable for production-grade edge AI operations.
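To make this concrete, here is a minimal sketch of a one-deploy Ansible inventory for a two-node edge cluster. The `frontend`/`node` group layout follows the one-deploy documentation, while all hostnames, addresses, and credentials below are placeholders to adapt to your environment:

```yaml
---
all:
  vars:
    ansible_user: root
    one_version: '6.10'      # target OpenNebula release
    one_pass: changeme       # oneadmin password (placeholder)
    ds: { mode: ssh }        # simple SSH-based datastores
    vn:
      admin_net:             # virtual network the appliances will attach to
        managed: true
        template:
          VN_MAD: bridge
          PHYDEV: eth0
          BRIDGE: br0
          AR:
            TYPE: IP4
            IP: 192.168.1.100
            SIZE: 48
          NETWORK_ADDRESS: 192.168.1.0
          NETWORK_MASK: 255.255.255.0
          GATEWAY: 192.168.1.1
          DNS: 1.1.1.1

frontend:
  hosts:
    fe1: { ansible_host: 192.168.1.10 }

node:
  hosts:
    node1: { ansible_host: 192.168.1.11 }
    node2: { ansible_host: 192.168.1.12 }
```

With the inventory in place, the whole cluster comes up from a single playbook run, for example `ansible-playbook -i inventory/edge.yml opennebula.deploy.main` (check the one-deploy README for the exact invocation matching your version).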
ARM-Compatible Apps from the OpenNebula Marketplace
Once the cluster is live, the next step is application provisioning. OpenNebula’s Marketplace offers a catalog of pre-built appliances, including ARM-compatible images.
The Ray appliance—designed specifically for ARM64—can be downloaded and launched directly within the environment. This appliance comes preconfigured to run AI inference using a model from Hugging Face, demonstrating how easy it is to deploy powerful AI workloads at the edge.
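From the command line, pulling the appliance looks roughly like the following; the appliance and datastore names are placeholders, so check the `onemarketapp list` output for the exact names in your Marketplace:

```shell
# Locate the ARM64 Ray appliance in the Marketplace
onemarketapp list | grep -i ray

# Export it to the local image datastore; this creates both the image
# and a matching VM template ("Ray" and "default" are placeholders)
onemarketapp export "Ray" ray-arm64 --datastore default
```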
Real-Time AI Inference Using Ray and Hugging Face
After the appliance is downloaded, the virtual machine is instantiated with workload-specific parameters: the model to serve, the sampling temperature, the prompt configuration, and the network interface to attach to the VM.
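As an illustration, instantiation from the CLI could look like the sketch below. The `ONEAPP_*` context variable names are hypothetical (each appliance documents its own contextualization inputs), and the network name matches the inventory sketch above:

```shell
# Extra attributes merged into the template at instantiation time.
# The ONEAPP_* names are illustrative only; depending on the OpenNebula
# version, a CONTEXT section supplied here may replace rather than merge
# the template's existing CONTEXT, so consult the appliance docs first.
cat > ray_params.txt <<'EOF'
CONTEXT = [
  NETWORK = "YES",
  ONEAPP_RAY_MODEL_ID = "Qwen/Qwen2.5-1.5B-Instruct",
  ONEAPP_RAY_TEMPERATURE = "0.7",
  ONEAPP_RAY_SYSTEM_PROMPT = "You are a helpful assistant."
]
EOF

# Instantiate the exported template with a NIC on the edge network
onetemplate instantiate ray-arm64 --name ray-inference \
  --nic admin_net ray_params.txt
```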
Once deployed, the Ray-based VM is ready to serve inference requests. The included chatbot interface enables real-time interactions with the Qwen 2.5 model, such as answering general knowledge questions and performing basic math. These are lightweight yet meaningful tasks that demonstrate how compact AI models can run efficiently on ARM-based edge hardware.
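For reference, a raw request against the running VM might look like the sketch below. It assumes the appliance exposes an OpenAI-compatible chat endpoint on Ray Serve's default HTTP port 8000; the actual route, payload schema, and VM address depend on your deployment, so verify them against the appliance documentation:

```shell
# Query the inference endpoint (IP, route, and model id are assumptions)
curl -s http://192.168.1.100:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen2.5-1.5B-Instruct",
        "messages": [{"role": "user", "content": "What is 17 * 23?"}],
        "temperature": 0.7
      }'
```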
Watch the Demo: Deploying AI at the Edge with Ampere and OpenNebula
The demo guides you through the entire process, including:
- Setting up an ARM64 edge cluster using OneDeploy
- Downloading the Ray AI inference appliance from the Marketplace
- Configuring and deploying an inference VM
- Running real-time queries via chatbot
- Observing the entire deployment process via CLI tools (a sample session is sketched below)
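A typical session for that last step might look like this (IDs and names will differ in your environment):

```shell
# Watch the VM move through PENDING -> PROLOG -> RUNNING
onevm list

# Inspect the inference VM's configuration and state
onevm show ray-inference

# Follow the deployment log on the front-end
tail -f /var/log/one/<vm_id>.log   # replace <vm_id> with the actual VM ID
```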
Looking Ahead: AI at the Edge Made Easy
This example highlights the growing relevance of cloud repatriation and sovereign edge AI—bringing workloads closer to users, reducing costs, and ensuring full control. With native support for ARM64 and integration with tools like OneDeploy and Ray, OpenNebula empowers organizations to embrace AI at the edge—efficiently, securely, and with complete sovereignty.
Stay tuned for more demos showcasing how OpenNebula enables distributed, real-time AI for next-gen infrastructure needs.
We’d like to thank Ampere for providing the ARM-based servers used in this demo, which are also part of the ongoing certification process for OpenNebula.