How telecoms service providers set out on a journey to specify the next set of network technologies but ended up completely re-thinking the way networks should be built and developed.
Back in 2012, the circulation of a white paper advocating a virtualization framework for network operators, authored by a tight group of top telco executives, caused an instant ripple of excitement. Something like this was by then overdue. For several years people in telecoms had been talking about the wonders of software-defined networking (SDN) and the potential for virtualization. And they'd been watching as the likes of Google and Amazon Web Services specified hundreds (nay thousands) of white box commodity servers to run at so-called 'Web scale', with huge attendant cost savings for their strategic customer-facing applications.
The software-driven, white box approach was clearly working well for them, but the problem was that it was difficult to see how it might be grafted over to telecoms. Wouldn't telcos be in danger of throwing the service quality baby out with the high cost-per-bit bathwater?
Most of all, Google, AWS & Co were running relatively closed systems. Easy for them to come up with a new approach, test it, specify the components, roll out a secret trial and then go for it by packing a huge data centre with white box servers.
The telecoms industry was different. You were dealing with end-to-end services. Whatever you came up with had to interwork with other telcos' networks and with vendors' end-devices. You had to win broad agreement and have standards fully ratified before you could move forward. It was difficult and it always took time. This was the way it was. Nobody sold telcos non-standard gear - they set about getting it standardized first.
Nevertheless the existential threat was clear. Unless the telecoms industry could get itself onto the same 'Moore's curve' as Google, so that infrastructure commoditization and software-driven automation could drag the cost per network bit down far enough to at least stay in touch with bandwidth prices, then the future looked bleak. Eventually, the Webscalers would take the lot.
Urgent change required
So the objective of 'network functions virtualisation' (NFV), as laid out in the white paper, was: "To significantly reduce the capital and operational costs associated with deploying communications services." In addition, the founders threw in agility. Once everything was being driven by software, it was reasoned, operators could be much more nimble when it came to developing services. They would be able to differentiate their offerings, appeal to new constituencies and claw back ground they felt they had lost to so-called 'over the top' players. Capital costs, operational costs and agility (the ability to change tack and launch new services quickly) were to be the three-legged stool upon which the proposition would stand.
And stand it did! Any new technology development, certainly one for which the term 'transformational' is not an exaggeration, will almost immediately gather a host of Yaysayers and, as its implications become clearer, Naysayers. One of the prime NFV movers, Don Clarke, explained to TelecomTV why the group decided to underhype the implications of its establishment. Don knew that many in the industry would be keen to climb on board. But he also knew that this would lead to over-expectation which would likely come back to bite the group at a later stage.
Yaysayers and Naysayers
It's worth remembering at this point just how much attention the NFV idea attracted almost immediately. According to Don Clarke, ETSI had originally 'dimensioned' the ISG (the industry specification group formed to orchestrate the agenda outlined in the white paper) to be 20 or so companies. Instead the group grew from seven founding operators to 37 operators, 245 member companies and 1,245 delegates subscribed to the mailing list, all in less than a year (there are now more than 290 companies). The predominance of vendors has led to a mistaken view, held in some quarters, that NFV is a vendor club of the old school - coming up with new classes of product to be championed and sold to telcos. In fact the reverse is closer to the truth. Telcos very much set and have maintained the pace, but what has emerged isn't one side leading the other, but a new set of relationships built around joint endeavour.
This is one important aspect of what NFV has become - a different open source way of advancing and refining the underlying technology where the old demarcation lines between what's understood to be the roles of vendors and operators have been, if not obliterated, certainly redefined.
To get NFV to work properly it pretty soon became apparent that there had to be a 'decomposition' of network functions. This is where you take what might currently be a large chunk of code, lovingly stitched together and running something important (like the IP Multimedia Subsystem - IMS) and break it up into smaller modules, each with its own API and each acting as a small but perfectly formed program in its own right. Then you can put a string of several of these together to form your network services. There are several reasons why this 'decomposing' or 'modularising' (also known as going 'cloud native') is a very good idea.
Reuse of code is just a good idea in its own right. Instead of starting from scratch for different applications, you get to reuse modules that you've already tested and installed. Re-use also grants extra agility, since existing modules can easily be strung together to make up new services (with just a module change here and there). The promise is that new service introductions which might before have taken months in 'waterfall development' time (writing, testing, validating and so on) could theoretically be implemented in weeks or even days.
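To make the idea concrete, here is a minimal, purely illustrative sketch in Python. The 'firewall' and 'NAT' modules are hypothetical toys, not taken from any real VNF; the point is simply that small modules with narrow APIs can be chained, re-ordered and re-used to compose different services.

```python
# Purely illustrative sketch: two tiny, hypothetical 'network function' modules,
# each with its own narrow API, chained together to form a service. Real VNFs
# are far richer; the point is the composition, not the packet handling.
from typing import Callable, Optional

Packet = dict                                   # a toy packet representation
Stage = Callable[[Packet], Optional[Packet]]    # every module exposes the same API

def firewall(packet: Packet) -> Optional[Packet]:
    """Drop traffic to a blocked port, pass everything else."""
    return None if packet.get("dst_port") == 23 else packet

def nat(packet: Packet) -> Optional[Packet]:
    """Rewrite the private source address to a public one."""
    return dict(packet, src_ip="203.0.113.1")

def service_chain(packet: Packet, stages) -> Optional[Packet]:
    """Run a packet through a chain of modules; stop if any module drops it."""
    for stage in stages:
        packet = stage(packet)
        if packet is None:
            return None
    return packet

# Re-using the same modules in different combinations yields different services.
print(service_chain({"src_ip": "10.0.0.5", "dst_port": 443}, [firewall, nat]))
print(service_chain({"src_ip": "10.0.0.5", "dst_port": 23}, [firewall, nat]))
```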
Protecting legacy infrastructure
But decomposition hasn't proved to be an unalloyed good for all the players. Vendors and many service providers have always pointed out that whatever happens with NFV, neither side can simply write off the huge investment already made in legacy infrastructure.
So an essential part of the 'art' of NFV integration must be about getting old and new infrastructure interconnected and working side by side. But that is far easier said than done. Engineering the desired resiliency when transforming a legacy software base into virtualised network functions (VNFs) might mean reworking that software - a time-consuming and expensive task.
Companies may have spent many years, perhaps more than a decade, writing their particular pieces of software and embedding all the features that their customers wanted. To start from scratch and write it all as small modules would not only be hugely costly, but might take years to accomplish. Years that may not be available.
So an initial NFV tactic was to copy what had happened in the IT environment and use a 'hypervisor' approach - essentially taking the shortcut of virtualizing the code's original server and running the unmodified application within that virtual machine.
This approach might have worked in the early days of IT virtualisation, when people were consolidating all those old servers and applications. But today, NFV advocates claim, that shortcut can only be bought at the cost of low speed and an inability to cope gracefully with infrastructure failures.
View full video: Patrick Lopez, VP Networks Innovation, Telefonica
As Patrick (above) points out, once it became apparent in the industry that NFV was the destination, network equipment vendors had two pressing objectives. They needed to protect their existing investments while at the same time moving on to support SDN and NFV as much as they could. The idea of starting from scratch and rewriting all their code in modules didn't appeal greatly - after all, they had also been promised in the white paper that part of NFV's 'win-win' was that existing vendors would be able to keep the code that they'd spent years developing.
But as time and experimentation ground on, it became apparent that monolithic code and decomposed modules just didn't mix properly.
On their NFV journey telcos wanted, most of all, to avoid the vendor lock-in which had bedeviled them for several decades. So it was made clear to the vendors that they had to work together and 'play nice' with each other. Interoperability would be the major marker of success for them.
So, where are we now?
There has been much working together, recently through open source groups, and, by all reports, huge progress is being made. But the scope of the transformation has broadened, and the number of open source groups, and the amount of collaboration they make necessary, has grown as well.
It's all taking time and that's led to an upsurge in Naysaying. Early optimistic estimates of how long it might take before we saw the first major NFV rollouts have been just that... optimistic. While there's no doubt that the development and large scale adoption of NFV by network operators is taking longer than its more optimistic proponents had hoped for, it's also the case that the more realistic proponents - the seven founding operators behind the ETSI NFV white paper published back in October 2012 - were fully aware that what they were proposing wasn't just a new technology generation designed to help do the old things better, but a once-in-a-lifetime switch of approach. As such they were cagey about how long it might take. It would 'probably', they said (nobody could be sure), result in a difficult period of transition, with awkward moments along the way.
And it has.
What impediments are we now facing?
For several of the last five years there were persistent doubts about NFV's ability to deliver the crucial 'five nines' of reliability in the network. The concern cropped up over and over again in our interviews and never seemed to resolve itself - in fact the crux of the problem seemed to relate to disaggregation, or rather the lack of it. Here's the reason monolithic code can't cut it.
One of the big advantages of 'cloud native' is the speed with which any one of the small modules can (to borrow a PC metaphor) switch itself off and then back on again in the unlikely event that it falls over. A big old chunk of legacy code, on the other hand, running as a monolithic virtual application on a server, takes considerably longer to get itself back into place should something go wrong.
It turns out, therefore, that cloud native software made up of microservices, running in containers and properly distributed across the cloud, can back itself up and self-repair so fast that service is resumed before the rest of the system is even aware that something has happened.
According to Martin Taylor, CTO of Metaswitch, "if NFV is designed and distributed properly, no single fault of any kind, at any layer in the stack, including the loss of an entire data centre, will stop the service from running."
Putting a big codebase on a virtual machine just didn't work that well. The result often lacked the all-important speed necessary to protect against failures, unless you invoked redundancy (putting in one back-up server for every two, say), in which case you were back to where you started in terms of hardware costs. In short, taking a non-native approach might see you gather many of the original drawbacks of legacy infrastructure without quite winning any of the bonuses of NFV.
Going cloud native, on the other hand, means you can swap out the virtual machines for containers and run an NFV environment across hybrid (public and private) clouds. Services can be scaled up and out, providing capacity on demand. Because of the small but perfectly-formed modules, each with their own APIs, failovers when things go wrong are organised in a fraction of the time required for a similar manoeuvre in a virtual machine environment. The same modular characteristics mean that individual microservices can be upgraded or swapped out. As important: ideally, a mix of microservices from different vendors can be run together to form new end-user services, or new twists on old ones.
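To make the failover point concrete, here is a minimal, purely illustrative Python sketch of the kind of supervision loop a container orchestrator performs on a far larger scale. The replica count, commands and timings are invented for the example, not drawn from any real NFV deployment.

```python
# Toy 'self-repair' loop: keep N replicas of a stand-in worker process alive,
# restarting any replica the moment it dies. In a real cloud native deployment
# an orchestrator such as Kubernetes does this job across whole data centres.
import subprocess
import sys
import time

REPLICAS = 3                                     # run three copies for resilience
WORKER_CMD = [sys.executable, "-c", "import time; time.sleep(10)"]  # stand-in for a VNF module

def start_worker():
    return subprocess.Popen(WORKER_CMD)

workers = [start_worker() for _ in range(REPLICAS)]

while True:
    for i, proc in enumerate(workers):
        if proc.poll() is not None:              # this replica has exited or crashed
            print(f"replica {i} down - restarting")
            workers[i] = start_worker()          # a restart takes milliseconds, not the
                                                 # minutes a monolith in a VM can need
    time.sleep(1)                                # simple health-check interval
```

While a failed replica is being replaced, the surviving replicas keep carrying the load, which is how service continues uninterrupted.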
That's the theory, and that's the technology story. It's a remarkable one: NFV began as the relatively vague notion expressed in the original white paper - white box servers running virtual machines, with vendors able to resuscitate all their legacy code and run it on the new infrastructure.
View full video: Beth Cohen, New Product Strategist, SDN/NFV Product Management, Verizon
Now, five years later, the industry is fast settling on a different mainstream approach: microservices, glued together on the open source OpenStack cloud platform.
Alongside that, 'open source' has really arrived in telco-land and is driving the real change.
An understandable mistake was made early on. In our first interviews it was clear that some operators were expecting vendors to step forward with solutions, as they had done in the past: the assumption was that NFV was going to be something vendors would bring to operators, as long as their requirements were properly specified.
As it turned out, that was not the way things could go, however understandable the expectation was: for a couple of decades many operators had relied increasingly on vendors to provide infrastructure solutions and even managed services for them.
As Patrick Lopez says, with hindsight this was a naive strategy. Both sides needed to engage with each other at the right level and get involved.
Actually, it's all about getting to the innovation faster, says Phil Robb, Vice President of Operations for Networking & Orchestration at the Linux Foundation.
With agile you iterate in small steps, says Phil, and the balance and collaboration between open source efforts and the standards bodies is becoming clearer: it's essentially all about getting to the innovation faster. With standards, intellectual property is handled in a particular way - the standards bodies implement IP by sharing it and giving rights to everybody else. Open source works differently: because players need to use the software in a variety of contexts there is still a licensing regime, but it's often a free one.
Too many sources?
If the nagging question we kept hearing two to four years ago was 'How can you ensure five nines with NFV?', today it's often 'Aren't there just too many competing projects? Surely they're just getting in each other's way?'
Phil Robb of The Linux Foundation takes a more positive view (as he would). He says there are a lot of open source projects in networking because there's a lot of enthusiasm, and when people are enthusiastic they kick off open source projects.
But even open source enthusiasts such as Chris Wright concede that there may be a good case now for some sort of project consolidation or harmonisation so that effort can be focussed on the leading projects.
There is clearly a fine balance to be struck between tapping all the creativity unleashed by the open source process and eventually narrowing down the number of overlapping projects to prevent chaos. "Now we're in a consolidation phase in the industry," says Chris. "So how do we bring things together, consolidate and create efficiencies?" he asks. OPNFV gives users the opportunity to play with open source components, not in isolation but in the context of an integrated solution, and then work out which kinds of projects are useful for which kinds of use cases. That experience can go on to inform selection. Ultimately, he says, it's the users and developers who need to work together to narrow the focus.
One of the big advantages of open source development is that it drops the old 'waterfall model' - where you do development, create a product, deliver it, and then spend a lot of time hardening it in place. Now we are moving to a model of continuous integration and continuous deployment (CI/CD), built on real collaboration between vendors, service providers and intermediaries. It's test-centric and it gives users confidence that all changes have been validated.
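As a purely illustrative sketch of what 'test-centric' means in practice, the Python snippet below shows a hypothetical CI gate: every proposed change must pass the automated test suite before it is promoted towards deployment. The commands, version number and promote step are placeholders, not taken from any real pipeline.

```python
# Hypothetical CI gate: run the automated checks and only promote the build
# if everything passes. Real pipelines add many more stages (linting,
# integration tests, staged rollouts), but the gating principle is the same.
import subprocess
import sys

def run_checks() -> bool:
    """Run the project's test suite (pytest used here purely as an example runner)."""
    result = subprocess.run([sys.executable, "-m", "pytest", "-q"])
    return result.returncode == 0

def promote_build(version: str) -> None:
    """Placeholder for pushing a validated build to the next pipeline stage."""
    print(f"build {version} validated - promoting to staging")

if __name__ == "__main__":
    if run_checks():
        promote_build("1.2.3")
    else:
        sys.exit("checks failed - the change is not deployed")
```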
Embracing open
We're now a world away from the old standards-based telco model that looked as immovable as stone in October 2012 when the finishing touches were applied to the NFV White Paper. In striving to meet its requirements over the past five years, the NFV movement has continuously refined its approach as early NFV offerings failed to meet expectations around cost, interoperability and resilience.
Getting on track saw the telecoms infrastructure playbook given a complete rewrite. The term 'open source', once anathema to many in the industry, has now taken centre stage; 'cloud native', once left of centre, has now joined it, so that together they represent the preferred approach to the DevOps-driven (continuous integration and continuous deployment) virtualised network currently being evolved.
Key to this conversion is the collaborative project founded by the Linux Foundation, the Open Platform for NFV (OPNFV). Its role is to take existing upstream open source code and integrate and assemble it into a reference platform. Communications service providers (CSPs) can then test their NFV implementations using OPNFV's test scenarios. OPNFV has boomed: it now boasts 51 member companies and 534 developer members, and is on its fourth major software release.
Also crucial now is OpenStack, which has become the cloud operating system of choice for NFV. OpenStack is the glue that pulls together and controls the necessary compute, storage and networking resources for the telco cloud, and thus provides the substrate upon which CSPs can place software. Having a consistent OpenStack deployment in every location allows CSPs to roll out new software and services at speed and across geographies.
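As a minimal sketch of what placing software on that substrate can look like, the snippet below uses the openstacksdk Python library to boot a server on an OpenStack cloud. The cloud name, image, flavor and network are hypothetical placeholders, and a real NFV deployment would drive this through an orchestrator rather than by hand.

```python
# Illustrative sketch, assuming the openstacksdk library and a clouds.yaml
# entry named 'mycloud'. All resource names below are invented placeholders.
import openstack

conn = openstack.connect(cloud="mycloud")            # authenticate against the cloud

image = conn.compute.find_image("ubuntu-20.04")      # base image for the workload
flavor = conn.compute.find_flavor("m1.small")        # compute sizing
network = conn.network.find_network("private")       # tenant network to attach

# Ask the compute service for a new server to host a (hypothetical) VNF.
server = conn.compute.create_server(
    name="vnf-node-1",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)        # block until it is ACTIVE
print(server.name, server.status)
```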
A recent Heavy Reading survey found that 89 per cent of telecom respondents considered OpenStack to be essential or important to their success and that more than 60 per cent of CSPs were already using or currently testing new use cases with OpenStack for NFV.
To find out more about OPNFV, OpenStack and other key NFV open source groups and initiatives, visit the HPE/Intel channel.