OVS Orbit Podcast

Interviews and topics of interest to Open vSwitch developers and users, hosted by Ben Pfaff.

RSS Feed

Use the RSS feed to keep up with new episodes in your favorite podcast listening application.

Episodes

20. Protocol-Independent FIB Architecture, with Ryo Nakamura from University of Tokyo (Nov 29) MP3
19. The Faucet SDN Controller, with Josh Bailey from Google and Shivaram Mysore from ONF (Nov 13) MP3
18. OVN Launch, with Russell Bryant from Red Hat (Oct 28) MP3
17. Debugging OpenStack Problems using a State Graph Approach, with Yong Xiang from Tsinghua University (Oct 13) MP3
16. Tunneling and Encapsulation, with Jesse Gross from VMware (Sep 26) MP3
15. Lagopus, with Yoshihiro Nakajima from NTT (Sep 10) MP3
14. Converging Approaches to Software Switches (Aug 28) MP3
13. Time Capsule, with Jia Rao and Kun Suo from University of Texas at Arlington (Aug 20) MP3
12. Open vSwitch Joins Linux Foundation (Aug 10) MP3
11. P4 on the Edge, with John Fastabend from Intel (Aug 9) MP3
★★ 10. SoftFlow, with Ethan Jackson from Berkeley (Jul 20) MP3
★★ 9. Adding P4 to OVS with PISCES, with Muhammad Shahbaz from Princeton (Jun 25) MP3
8. Mininet, with Bob Lantz and Brian O'Connor from ON.LAB (Jun 18) MP3
7. The OVS Development Process, with Kyle Mestery from IBM (Jun 11) MP3
6. sFlow, with Peter Phaal from InMon (Jun 2) MP3
★★★ 5. nlog, with Teemu Koponen from Styra and Yusheng Wang from VMware (May 26) MP3
★★★★ 4. Cilium, with Thomas Graf from Cisco (May 21) MP3
3. OVS in Production, with Chad Norgan from Rackspace (May 8) MP3
★★★★ 2. OPNFV and OVS, with Dave Neary from Red Hat (May 4) MP3
★★★ 1. Porting OVS to Hyper-V, with Alessandro Pilotti from Cloudbase (May 1) MP3

Episode 20: Protocol-Independent FIB Architecture, with Ryo Nakamura from University of Tokyo (Nov 29, 2016)

Ryo Nakamura is a PhD student at The University of Tokyo, studying IP networking, overlay networking, and network operation. This episode is a recording I made of his talk during APSys 2016, the Asia-Pacific Workshop on Systems, on Aug. 5, based on Protocol-Independent FIB Architecture for Network Overlays, written with co-authors Yohei Kuga (from Keio University), Yuji Sekiya, and Hiroshi Esaki.

The abstract for this paper says:

We introduce a new forwarding information base architecture into the stacked layering model for network overlays. In recent data center networks, network overlay built upon tunneling protocols becomes an essential technology for virtualized environments. However, the tunneling stacks network layers twice in the host OS, so that processing to transmit packets increases and throughput will degrade. First, this paper shows the measurement result of the degradation on a Linux kernel, in which throughputs in 5 tunneling protocols degrade by over 30%. Then, we describe the proposed architecture that enables the shortcut for the second protocol processing for network overlays. In the evaluation with a dummy interface and a modified Intel 10-Gbps NIC driver, transmitting throughput is improved in 5 tunneling protocols and the throughput of the Linux kernel is approximately doubled in particular protocols.

Before the talk, session chair Florin Dinu introduces the speaker. Following the talk, the questions come from Ben Pfaff, Sorav Bansal, and Florin, respectively. Sorav's question refers to my own talk from earlier the same day at the conference, which is published as OVS Orbit Episode 14.

OVS Orbit is produced by Ben Pfaff. The intro music in this episode is Drive, featuring cdk and DarrylJ, copyright 2013, 2016 by Alex. The bumper music is Yeah Ant featuring Wired Ant and Javolenus, copyright 2013 by Speck. The outro music is Space Bazooka featuring Doxen Zsigmond, copyright 2013 by Kirkoid. All content is licensed under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) license.

Listen: MP3 (22 MB, 24 min).

Episode 19: The Faucet SDN Controller, with Josh Bailey from Google and Shivaram Mysore from ONF (Nov 13, 2016)

Faucet is an open source SDN controller developed by a community that includes engineers at Google's New Zealand office, the Open Networking Foundation (ONF), and others. This episode is an interview with Josh Bailey from Google and Shivaram Mysore from the ONF. It was recorded on Nov. 7, at Open vSwitch 2016 Fall Conference.

The episode begins with a description of Faucet's goals. Unlike the higher-profile OpenDaylight and ONOS controllers, which focus on performance at high scale, Faucet treats simplicity, ease of development, and small code size as higher priorities.

Also in contrast to most controllers, Faucet does not contain code specific to individual vendors or models of OpenFlow switch. Rather, it targets any OpenFlow 1.3 switch that fulfills its minimum multi-table and other requirements, using a pipeline of tables designed to be suitable for many purposes. In Josh's words, “The most important one was tables. Once you have tables, you can say ‘if-then’. If you don't have tables, you can only go ‘if-and-and-and-and’.”

Faucet development has focused on deployments. Several Faucet users have come forward to publicly talk about their use, with the highest profile of those being the Open Networking Foundation deployment at their own offices. See also a map of public deployments. Shiva describes a temporary deployment at the ONF Member Workdays for conference wi-fi use.

Performance is not a focus for Faucet. Instead, the developers encourage users to experiment with deployments and find out whether there is an actual performance problem in practice. Shivaram reports that this has worked out well.

Faucet can control even very low-end switches, such as the Zodiac, a 4-port switch from Northbound Networks that costs AUD $99 (about USD $75). Faucet itself has low memory and CPU requirements, which means that it can run on low-end hardware such as a Raspberry Pi (about $30), which has actually been deployed as a production controller for enterprise use.

Last summer, the ONF hosted a Faucet hackfest in Bangalore, where each team was supplied its own “Pizod,” a combination of a Zodiac and a Raspberry Pi, for development. Hackers at the hackfest were required to have Python experience, but not networking or OpenFlow experience. Each team of 4, which included a documentation person and a UX person, chose a project from an assigned list of possibilities.

Faucet records the state of the system, over time, to an InfluxDB database and exposes that for inspection through a Grafana dashboard.

The Faucet code is small, about 2,500 lines of code. About this size, Josh says, “I'd be surprised if it gets about four times the size, because we've got quite a clear idea of its scope... Think of Faucet as your autonomic nervous system, a small important part of your brain but it keeps you breathing and it reacts to high-priority threats before your conscious mind sets in. You keep that code small and you test the heck out of it.”

Josh is working on extending support for distributed switching within Faucet. Troubleshooting large L2 fabrics is especially frustrating, and Josh aims to make it easier. Shiva is encouraging deployments, especially feedback from deployments, and control over wi-fi. Other priorities are better dashboards and better IPv6 support.

For more information on Faucet, visit the Faucet blog, read the ACM Queue article on Faucet, dive into the Faucet GitHub repo, or search for “Faucet SDN” on YouTube.

OVS Orbit is produced by Ben Pfaff. The intro music in this episode is Drive, featuring cdk and DarrylJ, copyright 2013, 2016 by Alex. The bumper music is Yeah Ant featuring Wired Ant and Javolenus, copyright 2013 by Speck. The outro music is Space Bazooka featuring Doxen Zsigmond, copyright 2013 by Kirkoid. All content is licensed under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) license.

Listen: MP3 (43 MB, 47 min).

Episode 18: OVN Launch, with Russell Bryant from Red Hat (Oct 28, 2016)

OVN is a network virtualization system that has been under development as part of the Open vSwitch project for about the last two years. On this podcast, Ben Pfaff and Russell Bryant, two major contributors to OVN, describe OVN: its architecture and its features, focusing on those added in the recent release of Open vSwitch 2.6, as well as some future directions.

This episode is based on the material presented at the OpenStack Summit in the session titled “OVN - Moving to Production.” The summit talk was recorded and video and slides are available. This podcast follows the structure of the slides pretty closely, making them a useful companion document to look at while listening, but the episode is meant to stand alone.

Resources mentioned in this episode:

Russell Bryant is a software developer in Red Hat's Office of the CTO. You can find him at russellbryant.net or on Twitter as @russellbryant.

OVS Orbit is produced by Ben Pfaff. The intro music in this episode is Drive, featuring cdk and DarrylJ, copyright 2013, 2016 by Alex. The bumper music is Yeah Ant featuring Wired Ant and Javolenus, copyright 2013 by Speck. The outro music is Space Bazooka featuring Doxen Zsigmond, copyright 2013 by Kirkoid. All content is licensed under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) license.

Listen: MP3 (32 MB, 35 min).

Episode 17: Debugging OpenStack Problems using a State Graph Approach, with Yong Xiang from Tsinghua University (Oct 13, 2016)

Yong Xiang is an associate professor at Tsinghua University. This episode is a recording of his talk during APSys 2016, the Asia-Pacific Workshop on Systems, on Aug. 5, based on Debugging OpenStack Problems Using a State Graph Approach, written with co-authors Hu Li, Sen Wang, Charley Peter Chen, and Wei Xu, which was awarded “best paper” at the conference. A preprint of the paper is also available at arxiv.org.

Slides from the talk are available. The talk is probably easier to follow with the slides at hand, but they are certainly not necessary.

This is a very practical paper that seeks ways to make it easier for non-experts to troubleshoot and debug an OpenStack deployment. Its abstract is:

It is hard to operate and debug systems like OpenStack that integrate many independently developed modules with multiple levels of abstractions. A major challenge is to navigate through the complex dependencies and relationships of the states in different modules or subsystems, to ensure the correctness and consistency of these states. We present a system that captures the runtime states and events from the entire OpenStack-Ceph stack, and automatically organizes these data into a graph that we call system operation state graph (SOSG). With SOSG we can use intuitive graph traversal techniques to solve problems like reasoning about the state of a virtual machine. Also, using a graph-based anomaly detection, we can automatically discover hidden problems in OpenStack. We have a scalable implementation of SOSG, and evaluate the approach on a 125-node production OpenStack cluster, finding a number of interesting problems.

The first question at the end of the talk comes from me, with an answer assisted by the paper's coauthor Wei Xu, and the second one from Sorav Bansal.

OVS Orbit is produced by Ben Pfaff. The intro music in this episode is Drive, featuring cdk and DarrylJ, copyright 2013, 2016 by Alex. The bumper music is Yeah Ant featuring Wired Ant and Javolenus, copyright 2013 by Speck. The outro music is Space Bazooka featuring Doxen Zsigmond, copyright 2013 by Kirkoid. All content is licensed under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) license.

Listen: MP3 (18 MB, 20 min).

Episode 16: Tunneling and Encapsulation, with Jesse Gross from VMware (Sep 26, 2016)

Tunneling and encapsulation, with protocols from GRE to Geneve, have been a key part of Open vSwitch since the early days. Jesse Gross, an early employee at Nicira and major contributor to Open vSwitch, and perhaps most importantly the maintainer of the Open vSwitch kernel module, joins this episode of the podcast to talk about this aspect of OVS.

The conversation begins with a discussion of the reasons for L2-in-L3 tunnels. Jesse's reasons for such tunnels include adding a layer of indirection between physical and virtual networks. VLANs can provide a solution for partitioning networks, but they don't provide the same layer of indirection.

Jesse describes the motivation for designing and implementing STT encapsulation in Open vSwitch. The biggest reason was performance, primarily the cost of losing the network card hardware support for various kinds of offloads, such as checksum and TCP segmentation offload support. Most network cards can only implement these for specific protocols, so that using an encapsulation not specifically supported by the card caused performance degradation. STT worked around this by using (abusing?) TCP as an encapsulation. Since most network cards can offload TCP processing, this allowed STT encapsulation to be very fast on both the send and receive sides. Jesse also describes the challenges in implementing STT and his view of STT's future.

Whereas STT was designed as a performance hack for existing network cards, Geneve, the second encapsulation that Jesse designed and implemented in Open vSwitch, addresses the paucity of metadata that the existing tunneling protocols supported. GRE, for example, supports a 32-bit key, VXLAN supports a 24-bit VNI, STT a 64-bit key, and so on. None of them supports a large or flexible amount of metadata. Geneve, on the other hand, supports an almost arbitrary number of type-length-value (TLV) options, intended to be future-proof. Geneve has been working its way through the IETF for about 3 1/2 years.
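To make the metadata comparison concrete, here is a rough Python sketch of the Geneve base header layout as described in the IETF drafts; the field values and the option bytes are illustrative, not taken from the episode:

    import struct

    def geneve_header(vni, options=b"", protocol=0x6558):
        # Geneve base header: Ver(2) OptLen(6) | O C Rsvd(6) | Protocol(16),
        # then VNI(24) + Reserved(8). OptLen counts options in 4-byte words,
        # so the TLV option space can grow almost arbitrarily.
        # Protocol 0x6558 is Transparent Ethernet Bridging (L2-in-L3).
        assert len(options) % 4 == 0
        ver_optlen = (0 << 6) | (len(options) // 4)   # version 0
        flags = 0                                     # O and C bits clear
        return (struct.pack("!BBH", ver_optlen, flags, protocol)
                + struct.pack("!I", vni << 8)         # 24-bit VNI, 8 reserved bits
                + options)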

Jesse talks about NSH (Network Service Header), which is often mentioned in conjunction with Geneve. NSH has some specialization for service function chaining, whereas Geneve takes a more general-purpose stance. NSH does support TLVs, but its primary focus is on a fixed number of fixed-size headers that it keeps in the packet, and those fixed headers are what most implementations actually support. NSH can be used inside L2 or L3, whereas Geneve currently runs only inside L3. Jesse discusses pros and cons of each design.

Jesse discusses MTU issues in tunneling and encapsulation, which come up because these techniques add bytes to each packet, so that a packet of maximum length before encapsulation exceeds the MTU afterward. Jesse says that the solution to MTU problems depends on the use case: for example, in data center use cases, a simple solution can be to increase the MTU of the physical network. In the pre-1.10 era, Open vSwitch supported path MTU discovery for tunnels, and Jesse describes why it was dropped and what it would take to reintroduce it.
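A quick worked example, using VXLAN-over-IPv4 with its usual header sizes (my own illustration, not figures from the episode), shows why a maximum-length packet breaks after encapsulation:

    # An inner Ethernet frame carrying a full 1500-byte IP packet, once
    # wrapped in VXLAN over IPv4, produces an outer IP packet that no
    # longer fits a standard 1500-byte MTU.
    inner_ip = 1500   # inner IP packet at the physical-network MTU
    inner_eth = 14    # inner Ethernet header carried inside the tunnel
    vxlan = 8
    outer_udp = 8
    outer_ip = 20
    outer_packet = inner_ip + inner_eth + vxlan + outer_udp + outer_ip
    print(outer_packet)   # 1550: 50 bytes too large for a 1500-byte MTU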

Jesse describes CAPWAP tunneling, why OVS implemented it, and why OVS dropped support.

Jesse describes GTP tunneling and the potential for including it in OVS, as well as ERSPAN encapsulation.

Jesse describes the challenges of encapsulations for IP (as opposed to encapsulations for Ethernet).

Jesse lays out some thoughts on the future of tunneling in Open vSwitch.

OVS Orbit is produced by Ben Pfaff. The intro music in this episode is Drive, featuring cdk and DarrylJ, copyright 2013, 2016 by Alex. The bumper music is Yeah Ant featuring Wired Ant and Javolenus, copyright 2013 by Speck. The outro music is Space Bazooka featuring Doxen Zsigmond, copyright 2013 by Kirkoid. All content is licensed under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) license.

Listen: MP3 (31 MB, 34 min).

Episode 15: Lagopus, with Yoshihiro Nakajima from NTT (Sep 10, 2016)

Lagopus is a high-performance, open source software switch, primarily for DPDK on Linux, developed at NTT in its Network Innovation Laboratories research group. Lagopus features OpenFlow 1.3 conformance, plus extensions to better support NTT's use cases. This episode is a discussion with Yoshihiro Nakajima, one of the switch's developers, about Lagopus, its history, goals, and future.

Lagopus supports protocols that are particularly important to carriers, such as PBB and MPLS, and includes OpenFlow extensions for general-purpose tunnel support with VXLAN, GRE, and other encapsulations. Yoshihiro talks about how, with DPDK, Lagopus implements some protocols, such as ARP and ICMP, by delegating them to the Linux kernel through TAP devices.

Yoshihiro describes the architecture of Lagopus and how it achieves high performance. It has optimizations specific to the flows that each table is expected to contain; for example, a different lookup implementation for L2 and L3 tables. We talk about how the number of tables in a given application affects performance.

Lagopus targets two main application domains: high-performance switching or routing on bare-metal servers, and high-performance virtual switching for NFV. Some of the latter applications are in a testing phase, aiming for eventual production deployment.

We discuss some philosophy of SDN (some audio was lost at the beginning of this discussion). The important part of SDN, to Yoshihiro, is to avoid the need to use CLIs to configure switches, instead moving to a “service-defined” model.

We discuss how to fit stateful services into the stateless OpenFlow match-and-action pipeline model, particularly how to handle the need for sequence numbers in some tunneling protocols such as GRE and GTP.

We talk about the difficulties in forming an open source community around a software switch and attracting contributions from a group outside the immediate organization writing the software. Yoshihiro reports receiving feedback from several users, including suggestions for improvement.

Lagopus has a growing worldwide community but some of the outreach from the team has focused on Asia in general and Japan in particular because of lower geographical and communication barriers.

The Lagopus team is currently working on a switch and routing control API that works at a higher level than OpenFlow, based on YANG models.

OVS Orbit is produced by Ben Pfaff. The intro music in this episode is Drive, featuring cdk and DarrylJ, copyright 2013, 2016 by Alex. The bumper music is Yeah Ant featuring Wired Ant and Javolenus, copyright 2013 by Speck. The outro music is Space Bazooka featuring Doxen Zsigmond, copyright 2013 by Kirkoid. All content is licensed under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) license.

Listen: MP3 (24 MB, 26 min).

Episode 14: Converging Approaches to Software Switches (Aug 28, 2016)

On Aug. 4 and 5, I attended APSys 2016, the Asia-Pacific Workshop on Systems. This episode is my own “industry talk” from APSys, titled “Converging Approaches in Software Switches: Combining code- and data-driven approaches to achieve a better result.” Slides from the talk are available and may provide a little extra insight, but it is not necessary to view them to follow along with the talk.

This talk introduces the idea that software switches can be broadly divided, in terms of their architecture, into two categories: “code-driven” switches that call a series of arbitrary functions on each packet, and “data-driven” switches that use a single engine to apply actions selected from a series of tables. The talk explains the two models and the usefulness of the categorization, and explains how hybrids of the two models can build on the strengths of both.
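As a toy illustration of the two categories (the function and table names here are invented, not from the talk), the contrast looks something like this:

    # Code-driven: each packet runs through a chain of arbitrary functions.
    def code_driven(packet, stages):
        for stage in stages:          # e.g. [parse, firewall, nat, forward]
            packet = stage(packet)
            if packet is None:        # a stage may drop the packet
                return None
        return packet

    # Data-driven: a single engine walks a series of match/action tables.
    def data_driven(packet, tables):
        for table in tables:
            for action in table.lookup(packet):   # classifier picks actions
                packet = action(packet)
        return packet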

In the past, people have asked me to compare Open vSwitch to other software switches, both architecture- and performance-wise. This talk is the closest that I plan to come to a direct comparison. In it, I cover a key architectural difference between Open vSwitch and most other software switches, and I explain why that architectural difference makes a difference for benchmarks that authors of many software switches like to tout.

This talk includes a very kind introduction from Sorav Bansal, assistant professor at IIT-Delhi, as well as several questions and answers interleaved, including some from Sorav and some from others whose names I did not catch.

OVS Orbit is produced by Ben Pfaff. The intro music in this episode is Drive, featuring cdk and DarrylJ, copyright 2013, 2016 by Alex. The bumper music is Yeah Ant featuring Wired Ant and Javolenus, copyright 2013 by Speck. The outro music is Space Bazooka featuring Doxen Zsigmond, copyright 2013 by Kirkoid. All content is licensed under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) license.

Listen: MP3 (26 MB, 28 min).

Episode 13: Time Capsule, with Jia Rao and Kun Suo from University of Texas at Arlington (Aug 20, 2016)

On Aug. 4 and 5, I attended APSys 2016, the Asia-Pacific Workshop on Systems. I was impressed with how many of the papers presented there were relevant to Open vSwitch and virtualization in general. This episode is an interview with Jia Rao and Kun (Tony) Suo of the University of Texas at Arlington, to talk about their APSys paper, Time Capsule: Tracing Packet Latency across Different Layers in Virtualized Systems, which received the conference's Best Paper award.

The paper's abstract is:

Latency monitoring is important for improving user experience and guaranteeing quality-of-service (QoS). Virtualized systems, which have complex I/O stacks spanning multiple layers and often with unpredictable performance, present more challenges in monitoring packet latency and diagnosing performance abnormalities compared to traditional systems. Existing tools either trace network latency at a coarse granularity, or incur considerable overhead, or lack the ability to trace across different boundaries in virtualized environments. To address this issue, we propose Time Capsule (TC), an in-band profiler to trace packet level latency in virtualized systems with acceptable overhead. TC timestamps packets at predefined tracepoints and embeds the timing information into packet payloads. TC decomposes and attributes network latency to various layers in the virtualized network stack, which can help monitor network latency, identify bottlenecks, and locate performance problems.
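The core mechanism is easy to picture. Here is a minimal Python sketch of in-band timestamping in the spirit of the paper, with an invented record format (the paper's actual encoding may differ):

    import struct, time

    RECORD = "!Bd"   # hypothetical record: tracepoint id + timestamp

    def stamp(payload: bytes, tracepoint_id: int) -> bytes:
        # At each predefined tracepoint, append timing info to the payload.
        return payload + struct.pack(RECORD, tracepoint_id, time.time())

    def decompose(payload: bytes, n: int):
        # Strip the trailing n records and attribute latency to each layer
        # as the difference between consecutive tracepoint timestamps.
        size = struct.calcsize(RECORD)
        recs = [struct.unpack_from(RECORD, payload,
                                   len(payload) - size * (n - i))
                for i in range(n)]
        return [(b[0], b[1] - a[1]) for a, b in zip(recs, recs[1:])]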

The interview covers the basic idea behind Time Capsule, the mechanism that it uses, techniques for comparing clocks of different machines across a network, and how it helps users and administrators track down latency issues in a virtual network, with reference to a specific example in the paper that shows the advantage of the fine-grained latency monitoring available in Time Capsule. “You can find some interesting results that are totally different from the results you get from coarse-grained monitoring.”

Other topics include comparison against whole-system profilers such as Perf or Xenoprof, the overhead of using Time Capsule, how many tracepoints are usually needed, how to decide where to put them, and how to insert a tracepoint.

There is a brief discussion of the relationship between Time Capsule and In-Band Network Telemetry (INT). Time Capsule focuses on virtualization, timing, and network processing within computer systems, whereas INT tends to focus more on switching and properties of the network such as queue lengths.

Time Capsule has not yet been released but it will be made available in the future. For now, the best way to learn more is to read the paper. Readers who want to know more can contact the authors at the email addresses listed in the paper.

The authors are using Time Capsule as the basis for continuing research into the performance of virtualized systems.

Time Capsule has some limitations. For example, it is limited to measurements of latency, and it cannot record packet drops. It also, currently, requires tracepoints to be inserted manually, although eBPF might be usable in the future.

OVS Orbit is produced by Ben Pfaff. The intro music in this episode is Drive, featuring cdk and DarrylJ, copyright 2013, 2016 by Alex. The bumper music is Yeah Ant featuring Wired Ant and Javolenus, copyright 2013 by Speck. The outro music is Space Bazooka featuring Doxen Zsigmond, copyright 2013 by Kirkoid. All content is licensed under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) license.

Listen: MP3 (28 MB, 31 min).

Episode 12: Open vSwitch Joins Linux Foundation (Aug 10, 2016)

On August 9, Open vSwitch joined the Linux Foundation as a Linux Foundation Collaborative Project, as previously discussed on ovs-discuss.

This episode is a recording of a conference call held by the Open vSwitch developers on August 10 to talk about this move, what will change and what will not change as a result, and open up for Q&A. Justin Pettit and Ben Pfaff are the main speakers in the call. You will also hear comments and questions from Simon Horman from Netronome and Mike Dolan from the Linux Foundation.

OVS Orbit is produced by Ben Pfaff. The intro music in this episode is Drive, featuring cdk and DarrylJ, copyright 2013, 2016 by Alex. The bumper music is Yeah Ant featuring Wired Ant and Javolenus, copyright 2013 by Speck. The outro music is Space Bazooka featuring Doxen Zsigmond, copyright 2013 by Kirkoid. All content is licensed under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) license.

Listen: MP3 (11 MB, 12 min).

Episode 11: P4 on the Edge, with John Fastabend from Intel (Aug 9, 2016)

Interview with John Fastabend, an engineer at Intel whose work in the Linux kernel has focused on the scheduling core of the networking stack and Intel NIC drivers. John has also been involved in IEEE standardization of 802.1Q and Data Center Bridging (DCB).

The interview focuses on John's recent work on P4 for edge devices, which he presented at the P4 Workshop held at Stanford in May. The slides for his talk are available.

John's work originated in the use of P4 as a language for describing the capabilities of Intel NICs, as an alternative to thousand-page manuals written in English. He moved on to explore ways that software can be offloaded into hardware, to improve performance and of course to make Intel's hardware more valuable. That led to the use of P4 to describe software as well, and eventually to the question that kicked off his talk, “Is P4 a useful abstraction for an edge node?” where an edge node in this case refers to a server running VMs or containers.

The work presented at the P4 conference includes a P4 compiler that generates IR code (that is, portable bytecode) for LLVM, a portable compiler that can generate code for many architectures and that is designed to be easily amenable to extensions. John then used an existing backend to LLVM to generate eBPF code that runs inside the Linux kernel on any architecture through an in-kernel just-in-time (JIT) compiler.

John used this infrastructure to answer a few different questions. Is P4 expressive enough to build a virtual switch? Does eBPF have enough infrastructure to implement a virtual switch? The answer in each case appears to be “yes.”

The runtime interface to the eBPF P4 programs works through eBPF maps. John's work includes tools for populating maps, including command-line and NETCONF interfaces.
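As a flavor of what a map-based runtime interface looks like (a generic bcc example I wrote for illustration, not John's tooling), userspace can create and populate an eBPF map directly:

    import ctypes
    from bcc import BPF   # requires the bcc toolkit

    # A toy eBPF array map standing in for a P4 table; the kernel side
    # would consult it per packet, while userspace populates it.
    b = BPF(text="BPF_ARRAY(port_table, u32, 256);")
    table = b["port_table"]
    table[ctypes.c_int(0)] = ctypes.c_uint32(42)   # write one table entry
    print(table[ctypes.c_int(0)].value)            # read it back: 42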

John is exploring the idea of using Intel AVX instructions to accelerate packet processing. He also points out that the JIT can actually be an asset for performance, rather than a liability, if it can specialize the code to better run on particular hardware. The well-established JITs for Java and Lua might point in the right direction.

John describes the performance advantages of XDP (Express Data Path), which processes packets that do not need to go to the operating system without constructing a full Linux sk_buff data structure.

The main application of this work, so far, has been to experiment with software implementations of hardware. John is also experimenting with a load balancer and a connection tracker.

John's work is all in the context of the Linux kernel. He speculates on how it could be applied to a switch running on DPDK in userspace. In such an environment, it might make sense to have LLVM compile directly to native code instead of via eBPF.

John talks about P4-specific optimizations to improve P4 programs that are written in a way that is difficult to implement efficiently in eBPF.

John and Ben discuss some of the differences between software and hardware implementations of P4.

John describes two models for network processing in software. In the “run-to-completion” model, a packet is processed from ingress to egress on a single core. In the “pipeline” model, the packet passes from one core to another at multiple stages in its processing. DPDK supports both models. John and Ben both have the intuition that the run-to-completion model is likely to be faster because it avoids the overhead of passing packets between cores, and they discuss why there might be exceptions.

The next steps are performance testing and optimization, gathering users, and moving to the 2016 revision of P4.

John and Ben discuss related work in P4 and eBPF. Thomas Graf's eBPF-based work on Cilium, discussed in Episode 4, leans more toward orchestration and scale over a large system than as a general-purpose switch. Ethan Jackson's work on SoftFlow, discussed in Episode 10, is more about how to integrate state with Open vSwitch. Muhammad Shahbaz's work on integrating P4 into Open vSwitch, discussed in Episode 9, can benefit from John's experience using LLVM.

If you're interested in experimenting with the prototype that John has developed, or if you have other questions for him, the best way to contact him is via email.

OVS Orbit is produced by Ben Pfaff. The intro music in this episode is Drive, featuring cdk and DarrylJ, copyright 2013, 2016 by Alex. The bumper music is Yeah Ant featuring Wired Ant and Javolenus, copyright 2013 by Speck. The outro music is Space Bazooka featuring Doxen Zsigmond, copyright 2013 by Kirkoid. All content is licensed under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) license.

Listen: MP3 (37 MB, 41 min).

Episode 10: SoftFlow, with Ethan Jackson from Berkeley (Jul 20, 2016)

Interview with Ethan Jackson, a PhD student at Berkeley advised by Scott Shenker. Before Berkeley, Ethan worked on Open vSwitch as an employee at Nicira Networks and then at VMware. His contributions to Open vSwitch have greatly slowed since he moved on to Berkeley, but as of this writing Ethan is still the second most prolific all-time contributor to Open vSwitch measured in terms of commits, with over 800.

Ethan talks about his experience implementing CFM and BFD protocols in Open vSwitch. He found out that, whenever anything went wrong in a network, the first thing that found the problem was CFM (or BFD), and so that was always reported as the root of the problem:

“Every bug in the company came directly to me, and I got very good at debugging and pointing out that other people's code was broken... That's really how I progressed as an engineer. Being forced to debug things makes you a better systems person.”

The body of the interview is about SoftFlow, a paper published at USENIX ATC about integrating middleboxes into Open vSwitch. The paper looks at the spectrum of ways to implement a software switch, which currently has two main points. At one end of the spectrum is the code-driven, Click-like model, where each packet passes through a series of black-box-like stages. At the other end is the data-driven Open vSwitch model, in which a single code engine applies a series of packet-classifier-based stages to a packet.

The data-driven model has some important advantages, especially regarding performance, but it's really bad at expressing middleboxes, particularly when state must be maintained between packets. SoftFlow is an attempt to bring Click-like functionality into an Open vSwitch world, where firewalls and NATs can be expressed and OpenFlow functionality can be incorporated where it is appropriate as well.

Part of the problem comes down to safety. It's not reasonable to trust all vendors to put code directly into the Open vSwitch address space, because of code quality and trust issues. The common solution, in an NFV environment, is to put each virtual network function into its own isolated virtual machine, but this has a high cost in performance and other resources.

SoftFlow is an extension to OpenFlow actions. Traditionally, actions are baked into the switch. SoftFlow allows a third party to augment actions in the switch via a well-defined interface. Actions are arbitrary code that can perform pretty much anything, but the intention is that they should integrate in well-defined ways with OpenFlow. For example, a firewall has a need for packet classification, which is easily and naturally implemented in OpenFlow, but a connection tracker, which cannot be expressed in OpenFlow, might be expressed in SoftFlow and then integrated with OpenFlow classifiers. The paper talks about a number of these SoftFlow features.
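To give a flavor of the idea (the names and the interface below are hypothetical, not the paper's actual API), third-party stateful actions might plug into the pipeline through a narrow registration interface:

    # Hypothetical sketch of pluggable, stateful actions alongside OpenFlow.
    ACTIONS = {}

    def register_action(name):
        def wrap(fn):
            ACTIONS[name] = fn
            return fn
        return wrap

    @register_action("conntrack")
    def conntrack(packet, state):
        # Stateful action: remember this connection across packets.
        key = (packet["ip_src"], packet["ip_dst"])
        state.setdefault(key, 0)
        state[key] += 1
        return packet

    def apply_actions(packet, names, state):
        # The OpenFlow classifier chooses which named actions to run.
        for name in names:
            packet = ACTIONS[name](packet, state)
        return packet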

Ethan contrasts connection tracking via SoftFlow against the Linux kernel based connection tracking that has been recently integrated into Open vSwitch. According to Ethan, the value of SoftFlow for such an action is the infrastructure. Kernel-based connection tracking required a great deal of infrastructure to be built up, and that infrastructure can't necessarily be reused for another stateful action. However, SoftFlow itself provides a reusable framework, simplifying development for each new action built with it.

Ethan explains a firewall example in some detail.

The paper compares the performance of SoftFlow to various alternate implementations, with a focus on Open vSwitch. They measured several pipelines with various traffic patterns and compared a SoftFlow implementation to a more standard NFV implementation, with Open vSwitch as a software switch and the virtual functions implemented as virtual machines. SoftFlow provided a significant performance gain in this comparison.

Ethan describes why he is skeptical of performance measurements of NFV systems in general: first, because they generally measure trivial middleboxes, where the overhead of the actual middlebox processing is negligible, and second, because they focus on minimum-length packets, which may not be realistic in the real world.

Ethan talks about hardware classification offload. This is a general Open vSwitch feature, not actually specific to SoftFlow. Open vSwitch does a packet classification for every packet in the datapath, which is expensive and the bulk of the cost of Open vSwitch packet forwarding. NICs from Intel and Broadcom and others have TCAMs that can perform packet classification in hardware much more quickly than software. These TCAMs have significant limitations but the paper describes how these can be overcome to obtain major speedups for software switching. (This is an area where Open vSwitch's architecture gives it a major advantage over one with an architecture like Click.)

Ethan's current project is Quilt, a container orchestration system whose goal is to find the right model for expressing distributed systems. Quilt assumes the flexibility provided by network virtualization systems and explores how a system built on this flexibility should be architected. It uses a declarative programming language to describe a distributed system and includes software to implement and maintain a system described using the language. The system is designed to be easy to deploy and use with popular distributed systems such as Apache Spark.

You can reach Ethan via email at his website, ej2.org.

OVS Orbit is produced by Ben Pfaff. The intro music in this episode is Drive, featuring cdk and DarrylJ, copyright 2013 by Alex. The bumper music is Yeah Ant featuring Wired Ant and Javolenus, copyright 2013 by Speck. The outro music is Space Bazooka featuring Doxen Zsigmond, copyright 2013 by Kirkoid. All content is licensed under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) license.

Listen: MP3 (35 MB, 38 min).

Episode 9: Adding P4 to OVS with PISCES, with Muhammad Shahbaz from Princeton (Jun 25, 2016)

Interview with Muhammad Shahbaz, a third-year grad student at Princeton advised by Jennifer Rexford and Nick Feamster. Shahbaz talks about his work on PISCES, a version of Open vSwitch modified to add support for P4, a language for programming flexible hardware switches, which will be presented at SIGCOMM in August 2016. Shahbaz is spending this summer as an intern at VMware, where he is working to bring PISCES's features into a form where they can be integrated into an upstream Open vSwitch release.

A P4 program specifies a number of different aspects of a switch: how packets are parsed, how they are processed as they pass through a series of tables, and how the packets are reassembled (“deparsed”) when they egress the switch.

From an Open vSwitch perspective, the main way that P4 differs from OpenFlow is that it allows the user to specify the protocols to be used. Any given version of Open vSwitch, when controlled over OpenFlow, is essentially a fixed-function switch, in the sense that it supports a specific set of fixed protocols and fields, but when P4 is integrated into Open vSwitch, a network developer can easily add, remove, and customize the protocols that it supports.

Modifying C source code and modifying P4 source code are both forms of programming, but P4 source code is much smaller and much more in the “problem domain” for network programming, and thus more programmer-efficient. Because P4 programs tend to be simple and specific to the problem domain, end users who want special features but don't have strong C programming skills can add features themselves. Shahbaz quotes some measurements on the difference in code size: a 20x to 40x reduction when a pipeline is implemented in P4 rather than C.

One must trade some costs for these improvements. In particular, it is a challenge to make P4 perform well in Open vSwitch because the P4 abstract forwarding model is not an exact match for the Open vSwitch or OpenFlow abstract forwarding model. For this reason, the initial PISCES prototype had a 40% performance overhead over regular Open vSwitch for a simple L2/L3 routing application. With a number of optimizations, including those around field updates and checksum verification and update, the penalty was reduced to about 3%, and Shahbaz is optimistic that it can be made faster still, perhaps faster than the current OVS code. The optimizations both reduced the cost in the Open vSwitch “fast path” cache and increased the hit rate for the cache.

The quoted 40% and 3% performance hits for PISCES are actually comparisons against Open vSwitch with its microflow cache disabled, which is not a normal way to run Open vSwitch. This is because PISCES does not yet have a way to specify how to compute the hash used for indexing the microflow cache; in plain Open vSwitch, this hash is computed in (protocol-dependent) NIC hardware, whereas in PISCES it would need to be computed in software.
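For readers unfamiliar with the microflow cache, here is a stripped-down sketch of the concept (the field names are illustrative): an exact-match table, keyed by a hash over selected headers, consulted before the full pipeline:

    # Toy microflow cache: exact match on a key built from packet headers.
    # Stock OVS gets the hash for this lookup cheaply from the NIC; a
    # protocol-independent switch like PISCES would have to compute it in
    # software, since the NIC's hash is protocol-dependent.
    cache = {}

    def flow_key(pkt):
        return (pkt["ip_src"], pkt["ip_dst"], pkt["proto"],
                pkt["tp_src"], pkt["tp_dst"])

    def process(pkt, slow_path):
        key = flow_key(pkt)
        actions = cache.get(key)
        if actions is None:          # cache miss: run the full pipeline
            actions = slow_path(pkt)
            cache[key] = actions     # cache the result for later packets
        return actions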

Shahbaz mentioned that PISCES may be used in the next iteration of Nick Feamster's Coursera course on Software-Defined Networking and talks about the target audience for the course.

Work for the summer, besides getting some of this work into OVS, includes looking into more advanced P4 stateful processing features such as counters, meters, and registers. Ethan Jackson's SoftFlow paper recently presented at USENIX ATC is also relevant to stateful processing in OVS.

To find out more about PISCES or to contact Shahbaz, visit its website at Princeton, which includes a preprint of the paper and links to the Git repository with source code. You can also view slides and video that Shahbaz presented about an early version of PISCES at Open vSwitch 2015 Fall Conference.

OVS Orbit is produced by Ben Pfaff. The intro music in this episode is Drive, featuring cdk and DarrylJ, copyright 2013 by Alex. The bumper music is Yeah Ant featuring Wired Ant and Javolenus, copyright 2013 by Speck. The outro music is Space Bazooka featuring Doxen Zsigmond, copyright 2013 by Kirkoid. All content is licensed under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) license.

Listen: MP3 (29 MB, 31 min).

Episode 8: Mininet, with Bob Lantz and Brian O'Connor from ON.LAB (Jun 18, 2016)

Interview with Bob Lantz and Brian O'Connor of ON.LAB, about Mininet software for simulating networks.

Bob previously gave a talk about Mininet (slides, video) at the Open vSwitch 2015 Fall Conference.

Bob describes the mission of ON.LAB and how he ended up there. He talks about introducing the idea of a network operating system to ON.LAB. He mentioned that his interest in networks arose from a lecture by Nick McKeown in the EE380 lecture series at Stanford, in which Nick stated: “Networks are like hardware without an operating system,” which piqued Bob's interest.

Brian relates his own experience getting involved with SDN, Mininet, and ON.LAB.

Bob describes the genesis of Mininet by analogy to mobile device development. Mobile device development is a pain because no one wants to spend all their time with these tiny devices, so you use a simulator. For network development, you need a simulator too because otherwise you need a huge stack of expensive hardware. Mininet was directly inspired by a network namespaces-based simulator developed in-house at Arista for testing EOS.

Bob compares Mininet to Docker and other container systems. All of these are container orchestration systems that make use of the “namespace” and control group (cgroup) features of the Linux kernel. Mininet gives more control over the network topology than the others.

Bob talks about limitations in OpenStack networking and what he'd like to see OpenStack support in networking.

Brian describes a trend in NFV toward minimization, that is, reducing the amount of overhead due to VMs, often by running in containers instead. He speculates that containers might later be considered too heavyweight. In Mininet, isolation is à la carte: the aspects of network isolation, process isolation, and so on can all be configured independently, so that users do not experience overhead that is not needed for a particular application.

Bob talks about the scale that Mininet can achieve and that users actually want to simulate in practice and contrasts it against the scale (and particularly the diameter) of real networks. Versus putting each switch in a VM, Bob says that Mininet allows for up to two orders of magnitude scale improvement. His original vision was to simulate the entire Stanford network of 25,000 nodes on a rack of machines. Bob talks about distributed systems built on Mininet, which are not officially integrated into Mininet. Distributed Mininet clusters are a work in progress. In general, Mininet scales better than most controllers.

Bob compares Mininet to ns3. ns3 was originally a cycle-accurate simulator, but this made it hard to connect to real hardware and run in real time, so it has moved in a direction where it works in a mode similar to Mininet.

Bob describes the Mininet development community, based on GitHub pull requests. Bob describes a paradox in which they'd like to accept contributions but most of the patches that they receive are not of adequate quality.

Bob talks about performance in OVS related to Mininet, as a review of his previous talk, and especially related to how Mininet speaks to OVSDB. The scale of Mininet doesn't interact well with the design of the OVS command-line configuration tool (ovs-vsctl), which doesn't expect thousands of ports or perform well when they are present. Bob reports that creating Linux veth devices is also slow.
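One documented mitigation, sketched below with invented port names, is that ovs-vsctl accepts many commands in a single invocation, separated by --, which amortizes the per-invocation cost:

    import subprocess

    # Add 1000 ports with one ovs-vsctl process instead of 1000.
    ports = ["s1-eth%d" % i for i in range(1, 1001)]   # hypothetical names
    cmd = ["ovs-vsctl"]
    for port in ports:
        cmd += ["--", "add-port", "br0", port]
    subprocess.run(cmd, check=True)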

Bob describes how to generate traffic with Mininet: however you like! Since you can run any application with Mininet, you can generate traffic with any convenient software.
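For instance, a minimal Mininet script (using the standard Python API; the choice of iperf is just one convenient option) runs ordinary programs on the simulated hosts:

    from mininet.net import Mininet
    from mininet.topo import SingleSwitchTopo

    net = Mininet(topo=SingleSwitchTopo(k=2))   # two hosts, one switch
    net.start()
    h1, h2 = net.get('h1', 'h2')
    h1.cmd('iperf -s &')                        # any program can generate traffic
    print(h2.cmd('iperf -c %s -t 5' % h1.IP()))
    net.stop()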

Brian's wish list: improved support for clustering Mininet, the ability to “dilate time” to make Mininet simulation more accurate to specific hardware, and the ability to model the control network.

You can contact Brian via email. Bob recommends emailing the Mininet mailing list to get in contact with him.

OVS Orbit is produced by Ben Pfaff. The intro music in this episode is Drive, featuring cdk and DarrylJ, copyright 2013 by Alex. The bumper music is Yeah Ant featuring Wired Ant and Javolenus, copyright 2013 by Speck. The outro music is Space Bazooka featuring Doxen Zsigmond, copyright 2013 by Kirkoid. All content is licensed under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) license.

Listen: MP3 (45 MB, 49 min).

Episode 7: The OVS Development Process, with Kyle Mestery from IBM (Jun 11, 2016)

Interview with Kyle Mestery, a Distinguished Engineer at IBM who has been involved with Open vSwitch since about 2012, about the Open vSwitch development process. Our conversation was based on Upstream Open Source Networking Development: The Good, The Bad, and the Ugly, a presentation at ONS 2016 given by Kyle along with Justin Pettit from VMware and Russell Bryant from Red Hat. Kyle also gave a version of the talk with Armando Migliaccio at OpenStack Austin. The latter talk was recorded on video.

The focus of the conversation is to present the Open vSwitch development process by comparing it against the process for OpenStack Neutron and OpenDaylight. All three project names begin with “Open,” but there are significant differences in how they develop code!

How do these projects communicate? All of them have mailing lists, although there are subtle differences in how they use them. Open vSwitch has two main lists, ovs-discuss and ovs-dev. OpenStack, despite being a much bigger project, has only a single development mailing list that it divides up using bracketed “topic tags” supported by the GNU Mailman mailing list manager. OpenDaylight, finally, has many mailing lists per subproject. Kyle explains the advantages and disadvantages of each approach.

All of these projects have IRC channels also. Open vSwitch has a single channel #openvswitch and the other projects have multiple, subproject-specific channels.

OpenDaylight stands out as the only project among the three that relies heavily on conference calls.

Are the projects friendly to newcomers? In general, Kyle thinks so. As with any project, regardless of open or closed source, there will be some existing developers who are super-helpful and others who are overworked or overstressed and less helpful initially. In the end, how you cycle through leaders and contributors in a project is how the project grows.

The projects handle bugs differently as well. Open vSwitch primarily handles bugs on the mailing list. OpenStack files bugs in Launchpad using a carefully designed template. OpenDaylight has a Bugzilla instance and a wiki with instructions and advice. Kyle thinks that Open vSwitch may need to make heavier use of a bug tracker sometime in the future.

The projects have different approaches to code review. OpenDaylight and OpenStack use Gerrit, a web-based code review system, although many developers do not like and avoid the Gerrit web interface, instead using a command-line tool called Gertty. Open vSwitch primarily uses patches emailed to the ovs-dev mailing list, similar to the Linux kernel patch workflow. In-flight patches can be monitored via Patchwork, although this is only a tracking system and has no direct control over the Open vSwitch repository. Open vSwitch also accepts pull requests via GitHub.

Kyle mentions some ways that the Open vSwitch development process might benefit from approaches used in other projects, such as by assigning areas to particular reviewers and dividing the project into multiple, finer-grained repositories. OVN, for example, might be appropriate as a separate project in the future.

Kyle's advice: plan ahead, research the projects, give your developers time to become comfortable with the projects, treat everyone with respect, treat everyone equally, and give back to the core of the project. Keep in mind that little maintenance patches are as important as huge new features. Finally, trust your developers: you hired good people, so trust their ability to work upstream.

The interview also touches on:

You can reach Kyle as @mestery on Twitter and follow his blog at siliconloons.com.

OVS Orbit is produced by Ben Pfaff. The intro music in this episode is Drive, featuring cdk and DarrylJ, copyright 2013 by Alex. The bumper music is Yeah Ant featuring Wired Ant and Javolenus, copyright 2013 by Speck. The outro music is Space Bazooka featuring Doxen Zsigmond, copyright 2013 by Kirkoid. All content is licensed under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) license.

Listen: MP3 (35 MB, 38 min).

Episode 6: sFlow, with Peter Phaal from InMon (Jun 2, 2016)

Interview with Peter Phaal of InMon, about sFlow monitoring and how it is used with Open vSwitch. In summary, an sFlow agent in a switch (such as Open vSwitch or a hardware switch) selects a specified statistical sample of packets that pass through it, along with information on how the packet was treated (e.g. a FIB entry in a conventional switch or OpenFlow actions in Open vSwitch) and sends them across the network to an sFlow collector. sFlow agents also periodically gather up interface counters and other statistics and send them to collectors. Data collected from one or more switches can then be analyzed to learn useful properties of the network.
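As a concrete point of reference, here is a commonly documented way to attach an sFlow agent to an Open vSwitch bridge (the bridge name, agent interface, and collector address are placeholders):

    import subprocess

    # Create an sFlow record sampling 1 in 64 packets, polling interface
    # counters every 10 seconds, and point bridge br0 at it.
    subprocess.run([
        "ovs-vsctl", "--", "--id=@s", "create", "sflow",
        "agent=eth0", 'target="10.0.0.1:6343"',
        "header=128", "sampling=64", "polling=10",
        "--", "set", "bridge", "br0", "sflow=@s",
    ], check=True)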

Peter begins with a description of the history of sFlow, including its pre-history in network monitoring products that Peter was involved in at HP Labs in Bristol. At the time, network monitoring did not require a special protocol such as sFlow, because networks were based on a shared medium to which any station could listen. With the advent of switched networks, the crossbar inside each switch effectively became the shared medium and required a protocol such as sFlow to look inside.

Peter compares the data collected by sFlow to a “ship in a bottle,” a shrunken model of the network on which one can later explore route analytics, load balancing, volumetric billing, and more. He says that SDN has empowered users of sFlow by providing a control plane in which one can better act on the information obtained from analytics:

“If you see a DDoS attack, you drop a filter in and it's removed from the network. If you see a large elephant flow taking a path that's congested, you apply a rule to move it to an alternative path. So it really unlocks the value of the analytics, having a control plane that's programmable, and so I think the analytics and control really go hand-in-hand.”

sFlow can be used in real time or for post-facto analysis. The latter is more common historically, but Peter thinks that the potential for real-time control is an exciting current development.

In contrast to NetFlow and IPFIX, sFlow exports relatively raw data for later analysis. Data collected by sFlow can be later converted, approximately, into NetFlow or IPFIX formats.

Other topics:

Further resources on sFlow include sflow.org for the sFlow protocol, sflow.net for the sFlow host agent, and Peter's blog at blog.sflow.com.

You can find Peter on Twitter as @sFlow.

OVS Orbit is produced by Ben Pfaff. The intro and bumper music is Electro Deluxe, featuring Gurdonack, copyright 2014 by My Free Mickey. The outro music is Girls like you, featuring Thespinwires, copyright 2014 by Stefan Kartenberg. All content is licensed under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) license.

Listen: MP3 (38 MB, 41 min).

Episode 5: nlog, with Teemu Koponen from Styra and Yusheng Wang from VMware (May 26, 2016)

Interview with Teemu Koponen of Styra and Yusheng Wang of VMware, about the nlog language.

nlog, in this context, is unrelated to the logging platform for .NET. It is a database language, a simplified form of Datalog that lacks recursion and negation. Teemu designed this language for use in Nicira NVP, the forerunner of VMware NSX-MH. Yusheng is now working to implement nlog in OVN.

Teemu and Yusheng begin by describing the nlog language, its name (the “N” stands for “Nicira”), and its purpose, and contrast it with more commonly known languages such as SQL. An nlog (or Datalog) program consists of a series of queries against input tables that produce new tables, which can be reused in subsequent queries to eventually produce output tables.

In a network virtualization system such as NVP or OVN, input tables contain information on the configuration or the state of the system. The queries transform this input into flow tables to push down to switches. The nlog program acts as a function of the entire contents of the input tables, without reference to a concept of time or order. This simplifies implementation, because it avoids the ordering problems found so pervasively in distributed systems. Thus, versus hand-coded state machines, nlog offers better hope of correctness and easier quality assurance, since it allows programmers to specify the desired results rather than all of the possible state transitions that could lead there.
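A tiny Python rendering of that evaluation model (the table and column names are invented for illustration) shows output tables derived as a pure function of the inputs:

    # Datalog-style rule: flow(port, hv) :-
    #     logical_port(port, vm), vm_location(vm, hv).
    logical_ports = {("lp1", "vm1"), ("lp2", "vm2")}   # (port, vm)
    vm_locations = {("vm1", "hv-a"), ("vm2", "hv-b")}  # (vm, hypervisor)

    def derive_flows(logical_ports, vm_locations):
        # Re-derived from scratch whenever an input table changes: no
        # ordering or time, just a join over the current table contents.
        return {(port, hv)
                for (port, vm) in logical_ports
                for (vm2, hv) in vm_locations
                if vm == vm2}

    print(derive_flows(logical_ports, vm_locations))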

Topics include:

You can reach Teemu at koponen@styra.com and Yusheng at yshwang@vmware.com.

OVS Orbit is produced by Ben Pfaff. The intro and bumper music is Electro Deluxe, featuring Gurdonack, copyright 2014 by My Free Mickey. The outro music is Girls like you, featuring Thespinwires, copyright 2014 by Stefan Kartenberg. All content is licensed under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) license.

Listen: MP3 (34 MB, 38 min).

Episode 4: Cilium, with Thomas Graf from Cisco (May 21, 2016)

Interview with Thomas Graf of Cisco, regarding the Cilium project.

Cilium is a “science project” that Thomas and others at Cisco and elsewhere are hacking on, to explore how to enforce policy in a legacy-free container environment that scales to millions of endpoints. It's an experiment because the outcome isn't yet certain, and it's a question that hasn't seen much work outside of hyperscale providers.

Cilium is based on eBPF, a Linux kernel technology that lets userspace inject custom programs into the kernel, using a bytecode analogous to Java virtual machine bytecode. Cilium uses eBPF-based hooks that intercept packets at various places in their path through the kernel to implement a flexible policy engine.
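A minimal taste of the mechanism (a generic bcc example, not Cilium's code): userspace compiles a small C program and the kernel runs it at a hook point:

    from bcc import BPF   # requires the bcc toolkit

    # Inject a program that fires on entry to the kernel's IPv4 receive
    # path; Cilium attaches richer programs at networking hooks instead.
    prog = r"""
    int on_ip_rcv(void *ctx) {
        bpf_trace_printk("ip_rcv hit\n");
        return 0;
    }
    """
    b = BPF(text=prog)
    b.attach_kprobe(event="ip_rcv", fn_name="on_ip_rcv")
    b.trace_print()   # stream the in-kernel program's messages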

Topics include:

More information about Cilium: slides and the code repository.

You can find Thomas on the ovs-dev mailing list, @tgraf__ on Twitter, or on Facebook.

OVS Orbit is produced by Ben Pfaff. The intro and bumper music is Electro Deluxe, featuring Gurdonack, copyright 2014 by My Free Mickey. The outro music is Girls like you, featuring Thespinwires, copyright 2014 by Stefan Kartenberg. All content is licensed under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) license.

Listen: MP3 (30 MB, 32 min).

Episode 3: OVS in Production, with Chad Norgan from Rackspace (May 8, 2016)

Interview with Chad Norgan of Rackspace, about use of Open vSwitch at Rackspace over the years.

Topics include:

Chad can be contacted as @chadnorgan on Twitter and as BeardyMcBeard on the freenode IRC network.

OVS Orbit is produced by Ben Pfaff. The intro and bumper music is Electro Deluxe, featuring Gurdonack, copyright 2014 by My Free Mickey. The outro music is Girls like you, featuring Thespinwires, copyright 2014 by Stefan Kartenberg. All content is licensed under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) license.

Listen: MP3 (17 MB, 19 min).

Episode 2: OPNFV and OVS, with Dave Neary from Red Hat (May 4, 2016)

Interview with Dave Neary of Red Hat, concerning OPNFV and its relationship with Open vSwitch.

Topics include:

You can find Dave at @nearyd on Twitter.

OVS Orbit is produced by Ben Pfaff. The intro and bumper music is Electro Deluxe, featuring Gurdonack, copyright 2014 by My Free Mickey. The outro music is Girls like you, featuring Thespinwires, copyright 2014 by Stefan Kartenberg. All content is licensed under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) license.

Listen: MP3 (21 MB, 23 min).

Episode 1: Porting OVS to Hyper-V, with Alessandro Pilotti from Cloudbase (May 1, 2016)

An interview with Alessandro Pilotti of Cloudbase, which Alessandro describes as the company that takes care of everything related to Microsoft technologies in OpenStack. The interview focuses on the Open vSwitch port to Hyper-V, to which Cloudbase is a top contributor.

Highlights and topics in this episode include:

OVS Orbit is produced by Ben Pfaff. The intro and bumper music is Electro Deluxe, featuring Gurdonack, copyright 2014 by My Free Mickey. The outro music is Girls like you, featuring Thespinwires, copyright 2014 by Stefan Kartenberg. All content is licensed under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) license.

Listen: MP3 (31 MB, 34 min).