Revolutionizing Telco Cloud with AI and Automation: Insights from Juan Carlos Garcia

In this exclusive interview, Juan Carlos Garcia, Principal Technology Advisor at OpenNebula Systems, shares his extensive expertise in the telco industry and discusses how AI and automation are driving the transformation of cloud-edge infrastructures, paving the way for advancements in B5G and 6G networks.

Juan Carlos – could you tell us a bit about yourself and your role at OpenNebula Systems?

Sure! I’m a telecom engineer with 35 years of experience in the telco industry and a background in computer science. I started working at Telefónica in 1990, writing software for a packet switch, while also teaching computer science at Carlos III University in the evenings. After a period in which the industry focused on infrastructure deployment and service delivery, it’s been great to see telecom operators re-engage with technology development: acquiring new skills in software, data and AI, participating in different communities and ecosystems, and collaborating with technology providers like OpenNebula Systems.

I joined OpenNebula Systems because of its open source approach, something refreshingly different from what I experienced in telecom, and its mission to create a multi-provider edge cloud continuum for hosting complex applications such as telecom services or AI. That has been one of the subjects I worked on in recent years at Telefónica. As Principal Technology Advisor, my role at OpenNebula Systems is to contribute that experience and help steer its technology and business development towards solutions that are useful for the telecom industry.

What role do you think AI plays in telco cloud transformation? Any specific use cases?

AI can be a game-changer for the telecom industry, even though it is not entirely new to operators. Early AI applications date back to the beginning of the 2000s: automated event and alarm monitoring, correlation and fault detection, which provided the first insights and recommendations for troubleshooting. Today, advances in technology, and especially in data availability and accuracy, have significantly improved its applicability and potential benefits.

AI can optimize complex processes like planning, engineering, and operation and maintenance of the telco cloud infrastructure. It can also help make the best use of scarce or expensive resources, typically spectrum, transport link bandwidth, edge compute capacity, and energy. Generative AI adds another layer of value by letting technicians access company knowledge (technical documentation, reports and databases) through natural language, which supports better-informed decision-making.

From an operational perspective, AI can greatly enhance telco cloud management by facilitating orchestration, automation, and monetization opportunities. It helps optimize data processing workflows, reduce latency, and improve response times. It also enables more efficient resource management in edge and cloud data centers, cutting costs and energy consumption. Moreover, the infrastructure and tools required for these purposes aren’t just mechanisms for boosting efficiency within telecom operators: they can also be monetized, offering value to the operators’ customers.

Presenting to the European Commission the new Telco Cloud Thematic Roadmap produced by the *European Alliance for Industrial Data, Edge and Cloud* (June 2024).

Could you please elaborate on the role of AI in addressing the challenges of automation, optimization, management, and orchestration within the cloud-edge continuum? How relevant are AI and automation for B5G/6G networks?

When it comes to the cloud-edge continuum, AI plays a pivotal role in several areas.

AI both depends on and enables advanced data collection techniques that provide more accurate and comprehensive datasets. AI-driven analytics help uncover hidden patterns and insights, supporting better decision-making and a better understanding of the dynamic behaviour of the telco cloud and the applications running on it. This is the basis for the other use cases.

AI models identify potential failures or deviations earlier, enabling a faster, proactive reaction (AI-driven anomaly detection), and predict equipment failures and maintenance needs before they impact operations (AI-driven metrics prediction). This reduces downtime (shorter MTTR and a better customer experience) and operational costs (fewer engineers and customer service staff required). For example, AI could be used to predict CPU, GPU, storage or network usage, such as the hourly CPU usage of an individual VM or overall CPU/GPU usage.
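The two techniques just mentioned can be illustrated with a deliberately simple sketch: a rolling z-score flags a sample as anomalous when it deviates strongly from recent history, and a least-squares linear trend forecasts the next hourly value, such as a VM's CPU usage. These are basic statistical stand-ins, not the ML models an operator or OpenNebula would actually deploy; the function names, the 24-sample window and the 3-standard-deviation threshold are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def zscore_anomalies(samples, window=24, threshold=3.0):
    """Return the indices of samples that deviate from the mean of the
    preceding `window` samples by more than `threshold` standard deviations.
    Window size and threshold are illustrative assumptions."""
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # A perfectly flat history has sigma == 0; skip it to avoid dividing by zero.
        if sigma > 0 and abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


def forecast_next(samples, window=24):
    """Forecast the next value (e.g. next hour's CPU usage of a VM) by fitting
    a least-squares straight line to the last `window` samples."""
    ys = samples[-window:]
    n = len(ys)
    x_bar, y_bar = (n - 1) / 2, mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys))
    slope = sxy / sxx if sxx else 0.0
    # Evaluate the fitted line at x = n, one step past the last sample.
    return y_bar + slope * (n - x_bar)


# Usage: hourly CPU usage (%) of a hypothetical VM with a spike at hour 27.
cpu = [50, 52, 51] * 10
cpu[27] = 95
print(zscore_anomalies(cpu))                        # [27]
print(forecast_next([10 + h for h in range(24)]))   # 34.0
```

In a real deployment the rolling statistics would be replaced by trained models, but the shape of the interface stays the same: a window of history goes in, and an anomaly flag or a forecast comes out.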

AI optimizes resource allocation, improving efficiency and productivity (smart scheduling). This is especially important for scarce and expensive resources like edge computing capacity, GPUs, energy and transport capacity. It makes it possible to balance workloads across the available infrastructure and assets to deliver the highest-quality, most cost-effective service. In the future, load balancing will be essential in the massive edge cloud continuum to ensure a good telco cloud service. For example, AI can suggest VM allocations that balance load, manage resource contention and reduce migrations. AI optimizes processes like:

Infrastructure planning and upgrades: predicting future telco cloud demand and anticipating where future infrastructure capacity will be needed.
Self-optimizing telco cloud resource allocation to meet specific policies or objectives, such as reducing the risk of SLA breaches or prioritizing resources where breaches would be most costly.
Planning firmware and software updates for times that minimize disruption to customers and dependent infrastructure.
Detecting demand peaks and dynamically scaling up or reassigning resources and capacity.
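Smart scheduling can be sketched, under strong simplifying assumptions, as a placement heuristic: among the hosts that would stay below a utilization cap, pick the one left least loaded (worst-fit), which spreads load and leaves headroom against contention. This is a hypothetical illustration, not OpenNebula's scheduler: only CPU is modelled, migrations are not considered, the 80% cap and all host names are made up, and an AI-driven version would feed predicted rather than current usage into `cpu_used`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Host:
    name: str
    cpu_capacity: float    # total vCPUs on the host
    cpu_used: float = 0.0  # vCPUs in use (could be a *predicted* value)
    vms: List[str] = field(default_factory=list)


def suggest_placement(vm_name: str, vm_cpu: float, hosts: List[Host],
                      max_util: float = 0.8) -> Optional[str]:
    """Suggest a host for a new VM (hypothetical heuristic).

    Only hosts that would stay at or below `max_util` utilization are eligible,
    which leaves headroom against resource contention. Among those, the host
    that ends up least utilized is chosen (worst-fit), spreading load evenly.
    Returns None when no host can take the VM.
    """
    def util_after(h: Host) -> float:
        return (h.cpu_used + vm_cpu) / h.cpu_capacity

    candidates = [h for h in hosts if util_after(h) <= max_util]
    if not candidates:
        return None
    best = min(candidates, key=util_after)
    best.cpu_used += vm_cpu  # record the allocation so later suggestions see it
    best.vms.append(vm_name)
    return best.name


hosts = [Host("edge-a", 16, 10), Host("edge-b", 16, 4), Host("edge-c", 8, 6)]
print(suggest_placement("vm-1", 4, hosts))  # edge-b
print(suggest_placement("vm-2", 4, hosts))  # edge-b
print(suggest_placement("vm-3", 4, hosts))  # None
```

Worst-fit is used here because it spreads load; a bin-packing (best-fit) rule would instead consolidate VMs onto fewer hosts, which can save energy at the cost of less headroom.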

All these AI-based automation and orchestration mechanisms help optimize both the infrastructure (Telco Cloud and Edge) and the application layers (B5G/6G networks and services deployed on top of it).

How is OpenNebula handling these two crucial topics—AI and automation—that are so important for B5G networks?

OpenNebula is incorporating AI/ML techniques and Zero-Touch resource management for dynamic deployment and operation of 5G Advanced nodes across distributed edge infrastructure environments. This approach aims to optimize aspects like energy efficiency, resource utilization, or latency.

AI is applied to the orchestration of application workflows, like those required in B5G networks. This orchestration enables scalable monitoring, intelligent prediction of metrics, and scheduling algorithms for application placement and automatic scaling.

The AI-Driven Orchestrator implements a closed-loop agent based on three components: monitoring and prediction, planners, and the executor.

The Monitoring and Prediction component collects and stores metrics related to applications and Edge Cluster resources using Monitoring Agents in both the application (VNFs) and infrastructure (servers, VMs) layers. It manages ML models that predict demand and traffic, and provides an alerting system that can trigger actions such as scaling when current or predicted metric values cross predefined thresholds.
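A minimal sketch of the alerting logic just described, assuming metrics arrive as plain dictionaries of current and predicted values and that thresholds are simple upper limits. The names and data layout are hypothetical, not the OpenNebula implementation; the point is that an alert can fire on a forecast before the current value crosses the limit.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Alert:
    metric: str
    value: float
    predicted: bool  # True when raised by a forecast, False when by the current value


def check_thresholds(current: Dict[str, float], predicted: Dict[str, float],
                     thresholds: Dict[str, float]) -> List[Alert]:
    """Raise at most one alert per metric whose current or predicted value
    exceeds its upper threshold (hypothetical alert model). A current breach
    takes precedence over a predicted one."""
    alerts = []
    for metric, limit in thresholds.items():
        if current.get(metric, float("-inf")) > limit:
            alerts.append(Alert(metric, current[metric], predicted=False))
        elif predicted.get(metric, float("-inf")) > limit:
            alerts.append(Alert(metric, predicted[metric], predicted=True))
    return alerts


alerts = check_thresholds(
    current={"cpu_util": 0.60, "latency_ms": 12.0},
    predicted={"cpu_util": 0.92, "latency_ms": 11.0},
    thresholds={"cpu_util": 0.80, "latency_ms": 10.0},
)
for a in alerts:
    print(a)
```

Here CPU utilization is still below its limit, but the forecast of 0.92 raises a predicted alert that could trigger scaling ahead of time, while latency has already crossed its threshold and raises a current alert.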

The system has two planners: the Scheduler and the Scaler.

The Scheduler component deploys 5G applications on Edge Clusters across the cloud-edge continuum, ensuring that the latency and bandwidth requirements of the service function chain (SFC) are met while taking into account the capacity of the physical infrastructure, administrative constraints, and optimization criteria such as energy consumption. Using ML models, the Scheduler determines the optimal number and size of the VNFs that make up the SFC workflow, and produces a plan for placing those VNFs on the most suitable Edge Clusters.
The Scaler component dynamically autoscales 5G applications while they are running to keep application performance and service quality at predefined levels, helping to avoid both under- and over-provisioning. Scaling decisions are based on predictions rather than only on the current state, so potential performance and quality issues can be anticipated and addressed in time. When an alert is triggered by an exceeded threshold, the Scaler receives a scaling request for the corresponding SFC workflow and produces a plan to scale the necessary VNFs up or down.
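The two planners can be illustrated with a small, hypothetical sketch. `place_sfc` exhaustively searches VNF-to-cluster assignments for a short chain, keeps only those that respect cluster capacity and an end-to-end latency budget, and returns the one with the lowest estimated energy. `plan_scaling` sizes an application role for a predicted load rather than the current one. Several assumptions simplify things: the VNF set is taken as given (the ML step that sizes it is not modelled), energy is approximated as watts per vCPU, bandwidth is ignored, all names and numbers are made up, and exhaustive search is only practical for a handful of VNFs and clusters; a production scheduler would use heuristics or an optimization solver.

```python
import math
from itertools import product


def place_sfc(vnfs, clusters, latency, ingress, max_latency_ms):
    """Scheduler sketch: exhaustively place a service function chain (SFC).

    vnfs:      [(name, vcpus), ...] in chain order (assumed already sized)
    clusters:  {cluster: {"free_vcpus": int, "watts_per_vcpu": float}}
    latency:   {(site_a, site_b): one-way ms}, looked up in either direction
    ingress:   site where user traffic enters the chain
    Returns ({vnf: cluster}, energy) with the lowest energy that satisfies
    capacity and the end-to-end latency budget, or None if nothing fits.
    """
    def lat(a, b):
        if a == b:
            return 0.0
        return latency.get((a, b), latency.get((b, a), math.inf))

    best = None
    for assignment in product(clusters, repeat=len(vnfs)):
        demand = {}  # total vCPUs this assignment puts on each cluster
        for (_, vcpus), c in zip(vnfs, assignment):
            demand[c] = demand.get(c, 0) + vcpus
        if any(demand[c] > clusters[c]["free_vcpus"] for c in demand):
            continue
        path = [ingress, *assignment]  # traffic path through the chain
        if sum(lat(a, b) for a, b in zip(path, path[1:])) > max_latency_ms:
            continue
        energy = sum(vcpus * clusters[c]["watts_per_vcpu"]
                     for (_, vcpus), c in zip(vnfs, assignment))
        if best is None or energy < best[1]:
            best = ({name: c for (name, _), c in zip(vnfs, assignment)}, energy)
    return best


def plan_scaling(role, current, predicted_load, capacity_per_replica,
                 target_util=0.7, min_replicas=1, max_replicas=10):
    """Scaler sketch: size a role for the *predicted* load so that each
    replica runs at about `target_util`. Returns None if no change is needed."""
    needed = math.ceil(predicted_load / (capacity_per_replica * target_util))
    desired = max(min_replicas, min(max_replicas, needed))
    if desired == current:
        return None
    action = "scale_up" if desired > current else "scale_down"
    return {"role": role, "action": action, "cardinality": desired}


vnfs = [("upf", 4), ("firewall", 2)]
clusters = {"edge1": {"free_vcpus": 4, "watts_per_vcpu": 10.0},
            "regional": {"free_vcpus": 32, "watts_per_vcpu": 5.0}}
latency = {("cell", "edge1"): 1.0, ("cell", "regional"): 8.0,
           ("edge1", "regional"): 7.0}
print(place_sfc(vnfs, clusters, latency, "cell", max_latency_ms=10.0))
print(plan_scaling("upf", current=3, predicted_load=900, capacity_per_replica=300))
```

With a 10 ms budget both VNFs land on the cheaper regional cluster; tightening the budget to 5 ms makes every placement infeasible, because the regional site is too far away and edge1 lacks the capacity to host the whole chain.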

The Plan Executor is in charge of converting the plans produced by the Scheduler and the Scaler into a sequence of actions, such as creating a VM on a new host, setting the cardinality for an application role, etc.
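The executor's role can be sketched as translating a plan into an ordered list of actions and running each one through a handler. The plan format, the action names and the handler callbacks are all hypothetical; in a real deployment the handlers would call the platform's own management interfaces, which are deliberately not modelled here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Action:
    kind: str    # hypothetical action names: "create_vm", "set_cardinality"
    target: str  # VNF or application role the action applies to
    value: Any   # host name for create_vm, replica count for set_cardinality


def plan_to_actions(plan: Dict[str, Dict[str, Any]]) -> List[Action]:
    """Turn a plan (hypothetical format) into an ordered action list.

    plan = {"placements": {vnf: host, ...}, "cardinality": {role: n, ...}}
    VMs are created first, then role cardinalities are adjusted.
    """
    actions = [Action("create_vm", vnf, host)
               for vnf, host in sorted(plan.get("placements", {}).items())]
    actions += [Action("set_cardinality", role, n)
                for role, n in sorted(plan.get("cardinality", {}).items())]
    return actions


def execute(actions: List[Action],
            handlers: Dict[str, Callable[[str, Any], None]]) -> List[Action]:
    """Run each action through the handler registered for its kind.
    An exception from a handler propagates and stops the remaining actions."""
    done = []
    for action in actions:
        handlers[action.kind](action.target, action.value)
        done.append(action)
    return done


# Stub handlers that only log; real ones would call the platform's management API.
log = []
handlers = {
    "create_vm": lambda vnf, host: log.append(f"create VM for {vnf} on {host}"),
    "set_cardinality": lambda role, n: log.append(f"set {role} cardinality to {n}"),
}
plan = {"placements": {"upf": "edge-host-2"}, "cardinality": {"firewall": 3}}
execute(plan_to_actions(plan), handlers)
print(log)
```

Creating VMs before changing role cardinality is one plausible ordering, not a documented rule. Letting a handler's exception stop execution keeps later actions from running on top of a partially applied plan; the monitoring component then observes the resulting state on the next pass of the closed loop.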

All this functionality has been developed and tested through EU-funded innovation projects like OneEdge5G, 6G-ENABLERS, and COGNIT, positioning OpenNebula as one of the most advanced virtualization management platforms on the market.

This initiative is funded by the Spanish Ministry for Digital Transformation and Civil Service through the project titled “Zero-Contact Technologies for Adaptation in Intelligent and Distributed 5G Networks – 6GENABLERS: AI for 6G, Ultra Automation and Optimization (TSI-063000-2021-10)” and co-financed by the European Union’s NextGenerationEU program via the Recovery and Resilience Facility (RRF).