Increasing dependencies spanning enterprise IT and network domains will call for closer alignment of Service Assurance processes as well. Unlike the relatively monolithic environments for which legacy Service Assurance platforms and operations were designed, modern telecommunications businesses run on a complex and dynamic ecosystem encompassing multiple in-house, supplier, and partner solutions and technologies. The increasing volumes associated with 5G will add to the stress, further invalidating traditional operations.
With more data and more devices to accommodate, manual or strictly deterministic approaches to Service Assurance will not scale effectively. The evolution of Service Assurance should be seen in the context of broader trends in telecommunications technology. In a rapidly transforming industry, operators need to be able to make an agile response to changing customer needs and emerging business opportunities. This has driven a trend toward more proactive, predictive, and ultimately autonomous operations, where decision-making can be entirely automated and human interaction for mundane tasks can be eliminated.
Service Assurance is an important element of this strategy, as operators seek to support greater automation across the lifecycle, including the creation, assessment, assignment, notification, and remediation of trouble tickets with minimal human involvement. As operators seek to accelerate operations to the velocity and scale needed to deliver modern 5G services, artificial intelligence and machine learning will play a key role in enabling this higher level of automation across a range of use cases.
Beyond automation, meeting customer objectives will require operators to increase their focus on interoperability. By promoting easier integration and operation across diverse and complex ecosystems, operators will be able to mature beyond KPI-driven platform management to a more fluid and unified approach to management across services. Delivering the right data to the right people and processes at the right time will be important as well, providing enriched resource and service context to enable greater levels of insight, actionability, and automation.
In my earlier blogs, I discussed evolving approaches to Service Assurance and the growing role of artificial intelligence (AI), analytics, and automation in technology operations. Now, as autonomous technologies increase the speed of network operations across software-driven environments, operators need to make sure that the service management layer becomes faster and more automated as well.
The drive toward autonomous networking is advancing at full speed, as initiatives such as the Open Network Automation Platform (ONAP) seek to enable real-time, policy-driven orchestration and automation of physical and virtual network functions. While enabling operators to respond more quickly to customer requests, and to optimize operations across their increasingly software-driven network environment, this increase in automation will also place new importance on service assurance.
Network automation can be thought of in terms of in-band and out-of-band use cases. In-band use cases (including many performance or capacity issues) follow a relatively predictable set of patterns, enabling end-to-end automation. For out-of-band use cases (physical infrastructure failures or rare alarm conditions), however, it can be less clear what should happen next, or the fix may require boots-on-the-ground interaction.
As exceptions arise, an operator may need to get involved to make decisions. At that point, we need to ensure a level of governance across operational processes such as change management, though without reverting to a fully manual approach. AIOps offers a solution. AI, big data analytics, and machine learning make it possible to augment, guide, and increasingly replace human decision processes so that operators can ensure service quality more efficiently at scale to meet customer expectations.
Service Assurance offers a variety of suitable use cases for AIOps, with reasonably large data sets to which AI and machine learning can be applied to cluster related faults, identify underlying network problems, prioritize resolution, and so on. High-value use cases include the following. Once a potential problem has been identified, machine learning enables fault clustering of related problems with the same root cause to speed troubleshooting and prioritize resolution.
In some cases, it may even be possible to fix the problem automatically without human intervention. One form of correlation involves ticket hierarchy: the relationships among tickets at different layers of the network.
A fiber cable cut, for example, will generate a ticket at the resource layer. At the same time, the interruption of services provided through that cable, such as customer broadband, will generate a ticket at the service layer. Based on the ticket relationship hierarchy, these two tickets can be correlated and understood as stemming from the same issue. Given this understanding, the NOC agent can gain insight into both the customer and service impact of the fiber cable cut.
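The hierarchy-based correlation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Ticket` fields, the `parent_id` link, and the ticket IDs are hypothetical and do not come from any particular ticketing product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ticket:
    # All field names here are hypothetical, chosen for illustration only.
    ticket_id: str
    layer: str                       # "resource" or "service"
    summary: str
    parent_id: Optional[str] = None  # ticket this one stems from, if known


def group_by_root(tickets):
    """Group ticket IDs under the root ticket they trace back to."""
    by_id = {t.ticket_id: t for t in tickets}

    def root_of(ticket):
        seen = set()
        # Follow parent links; stop at a root, an unknown parent, or a cycle.
        while ticket.parent_id in by_id and ticket.ticket_id not in seen:
            seen.add(ticket.ticket_id)
            ticket = by_id[ticket.parent_id]
        return ticket.ticket_id

    groups = {}
    for t in tickets:
        groups.setdefault(root_of(t), []).append(t.ticket_id)
    return groups


tickets = [
    Ticket("RES-1", "resource", "Fiber cable cut"),
    Ticket("SVC-7", "service", "Customer broadband down", parent_id="RES-1"),
    Ticket("SVC-9", "service", "Unrelated service issue"),
]
groups = group_by_root(tickets)
print(groups)  # {'RES-1': ['RES-1', 'SVC-7'], 'SVC-9': ['SVC-9']}
```

Grouping under the root ticket puts the resource-layer fault and its service-layer impact in one view, which is the insight the NOC agent needs.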
The correlation also helps with common cause analysis for these tickets, and helps the agent prioritize which ticket to work on first.
This can occur in tandem with temporary remediation work on the service layer trouble ticket, such as rerouting customer broadband traffic while the fiber is being repaired. This more complete understanding can also help NOC agents maintain quality levels for customers. For instance, the internal operational level agreement (OLA) to fix the fiber may be five days, while the child trouble ticket for the broadband at the service level may have a more aggressive service level agreement (SLA) of three hours, reflecting its direct impact on customers.
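To make the gap between those two deadlines concrete, here is a minimal Python sketch using the five-day OLA and three-hour SLA from the example. The opening time and the `effective_due` rule are assumptions for illustration: the rule simply says the parent ticket inherits the child's tighter deadline unless remediation is already protecting the customer.

```python
from datetime import datetime, timedelta

# Hypothetical time at which both tickets were opened.
opened = datetime(2024, 1, 1, 9, 0)

ola_fiber = timedelta(days=5)       # internal OLA on the resource-layer ticket
sla_broadband = timedelta(hours=3)  # customer SLA on the service-layer ticket

parent_due = opened + ola_fiber
child_due = opened + sla_broadband


def effective_due(parent_due, child_due, child_remediated):
    """Illustrative policy (an assumption, not a standard): work the parent
    to the tighter of the two deadlines unless the child is remediated."""
    return parent_due if child_remediated else min(parent_due, child_due)


print(effective_due(parent_due, child_due, child_remediated=False))
# 2024-01-01 12:00:00
print(effective_due(parent_due, child_due, child_remediated=True))
# 2024-01-06 09:00:00
```

In practice the decision would weigh how effective the remediation is rather than a simple yes or no, but the sketch shows why the parent ticket's own five-day target can be misleading on its own.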
Seeing this relationship, the NOC agent can make an informed decision whether to prioritize the fiber fix at a higher level, perhaps based on the effectiveness of the remediation activities in place for the child ticket. In addition to correlation based on ticket hierarchy, tickets generated around the same time, event, or location can be clustered to make it easier to see potential common causes.
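Clustering by time and location can be approximated with a simple rule, sketched below in Python. This is a plain rule-based stand-in, not a machine learning model: the 15-minute window, the tuple layout, and the site names are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def cluster_tickets(tickets, window=timedelta(minutes=15)):
    """Group tickets that share a location and were opened close together.

    `tickets` is a list of (ticket_id, location, opened_at) tuples. A ticket
    joins the latest cluster for its location if it was opened within
    `window` of the previous ticket in that cluster.
    """
    clusters = []
    latest = {}  # location -> (cluster list, time of its most recent ticket)
    for ticket_id, location, opened_at in sorted(tickets, key=lambda t: t[2]):
        entry = latest.get(location)
        if entry and opened_at - entry[1] <= window:
            entry[0].append(ticket_id)
            latest[location] = (entry[0], opened_at)
        else:
            cluster = [ticket_id]
            clusters.append(cluster)
            latest[location] = (cluster, opened_at)
    return clusters


sample = [
    ("T1", "site-A", datetime(2024, 1, 1, 9, 0)),
    ("T2", "site-A", datetime(2024, 1, 1, 9, 10)),
    ("T3", "site-B", datetime(2024, 1, 1, 9, 12)),
    ("T4", "site-A", datetime(2024, 1, 1, 11, 0)),
]
clusters = cluster_tickets(sample)
print(clusters)  # [['T1', 'T2'], ['T3'], ['T4']]
```

Tickets T1 and T2 land in one cluster because they share a site and arrive ten minutes apart, which flags them to the agent as candidates for a common cause.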