How OCaml Is Democratizing Operating Systems with Packet
OCaml Labs was created with a simple mission: to build reliable, secure, scalable computer systems using understandable programming languages. And by understandable, that means they “don’t require PhDs to write,” quips Anil Madhavapeddy, the University of Cambridge Computer Laboratory professor who runs the group.
Over the past 15 years, the lab has developed and open sourced such technologies as the MirageOS unikernel framework, and conducted research on using custom kernels to build large-scale systems with minimal amounts of code exposed at runtime. “The idea is that whenever you build an application, you strip away all the pieces of the deployment infrastructure that are unnecessary, and you end up with a really minimal, specialized operating system,” says Madhavapeddy.
But the work that the lab is best known for is the OCaml language, which was developed in the early 1990s. Designed for functional programming, OCaml “is an industrial-grade language that can be used to build very large-scale systems, but in this mathematical style, so that you can start reasoning about them and applying a bit of rigor to them,” says Madhavapeddy. Thanks to these benefits, as well as its portability, OCaml’s popularity has skyrocketed over the past few years, and today the language is used worldwide for mission-critical systems by companies such as Jane Street Capital and Facebook. “It’s one of these quiet languages being used to build large-scale systems with very little drama,” he adds.
The Challenges of Supporting a Growing Language
But keeping drama at bay with such a portable language requires a lot of CI and testing on many different architectures. “There’s pretty much no CPU on which OCaml cannot run,” says Madhavapeddy, so the team maintaining it has to “support running it on all kinds of exotic hardware, and it’s really hard to find that hardware for CI purposes.”
As an open source project hosted at universities, OCaml didn’t have easy access to such a large number of resources. “The problem becomes one of scale because you’re now trying every day to build thousands of packages across a cluster of machines, across multiple CPU architectures, across multiple operating systems,” says Madhavapeddy. “We have to support 14 variants of Linux and Windows and macOS and even Solaris, because we still run there.”
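To get a feel for that scale, the build matrix can be sketched as a cross product of packages, architectures, and operating systems. The Python snippet below is a minimal illustration, not the OCaml team’s actual CI tooling: the package names, the platform lists, and the excluded combinations are all hypothetical placeholders.

```python
from itertools import product

# Hypothetical inputs: placeholders, not the real OCaml package set or CI targets.
PACKAGES = ["pkg-a", "pkg-b", "pkg-c"]
ARCHITECTURES = ["x86_64", "arm64", "ppc64"]
OPERATING_SYSTEMS = ["linux", "windows", "macos"]

# Assumed for illustration: (os, arch) combinations the CI would skip.
EXCLUDED = {("windows", "ppc64"), ("macos", "ppc64")}


def build_matrix(packages, architectures, operating_systems, excluded=frozenset()):
    """Return every (package, os, arch) job, skipping excluded (os, arch) pairs."""
    return [
        (pkg, os_name, arch)
        for pkg, os_name, arch in product(packages, operating_systems, architectures)
        if (os_name, arch) not in excluded
    ]


jobs = build_matrix(PACKAGES, ARCHITECTURES, OPERATING_SYSTEMS, EXCLUDED)
print(f"{len(jobs)} build jobs for {len(PACKAGES)} packages")
```

Every supported operating system and architecture pair adds one job per package, so with thousands of packages instead of three, each new platform adds thousands of builds to every run.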
The team approached different cloud providers for help. Some offered short-term free credits, and Rackspace gave OCaml free access to about $5,000 worth of virtual machine resources every month. “That was massively appreciated,” he says, “but it became kind of obvious that it’s not enough to run the builds we need.”
Partnering with Packet
Everything changed when Madhavapeddy talked to Packet in 2017. Rackspace had decided to terminate its cloud support program, and OCaml had to find another solution. “Packet was almost unconditional in its support,” he says. “We just need access to drama-free, extremely fast infrastructure where we can have low-level control, and Packet gave us that.”
In 2018, funding for the support began coming through Works on Arm, a collaborative project to expand the ecosystem for Armv8 in the data center.
One great benefit of working with Packet is the access that the OCaml team has to bare metal machines, rather than the virtual machines offered by other cloud providers. “We work on the lowest level of the system, so we need access to the bare metal machines in order to do accurate benchmarking, accurate profiling, and so on,” Madhavapeddy says. “They have machines that are really big, so we don’t have to deal with managing large clusters of machines. We can often just deploy one Packet machine and build thousands of packages in one go.”
A challenge OCaml regularly faces is getting access to and supporting all kinds of emerging hardware that might eventually become more prevalent. Packet is helping with that, too: Through Works on Arm, OCaml has access to server-class Arm hardware.
Before, Madhavapeddy was thinking of building a cluster of 1,000 Raspberry Pis in order to build at the same scale as he would on x86. “You can imagine what a pain this is to have 1,000 tiny machines hooked together, and if one of them goes wrong, the end result is madness,” he says. “Now we can support Arm as a first-class citizen in our CI, along with x86. That’s the first time we’ve had a non-x86 architecture supported in the same tier that we support x86, and that’s a big deal.”
As a result, the OCaml team just published a paper showing that OCaml can have feature parity across Arm, x86, and PowerPC. “The access to this hardware given to us by Packet and IBM let us do this top-tier computer science research to make sure that OCaml wouldn’t be pigeonholed into being an x86-only language,” he says.
Contributing Research Back to Works on Arm
On a daily basis, the OCaml team pumps hundreds of thousands of builds through the machines across x86, Arm, and PowerPC, and generates CI results and logs. “We use those results to get Packet and Arm an alternative workload to their conventional workloads,” says Madhavapeddy. “It’s very easy to measure those workloads on x86 vs. Arm, but also to stress test the Arm machines themselves.”
Through Works on Arm, the team regularly gets access to experimental new machines from vendors. “I’ve personally melted a few of the ThunderX machines, because our workloads are so heavy that they just got too hot and they actually melted!” he reports. “We have direct access to high quality technical support for bare metal machines. This enables us to do the work we need to do, and we can give Packet high quality feedback on new machines as quickly as possible.”
OCaml has given valuable feedback for figuring out how to balance Arm machines. “Often they have too much memory or too much CPU or their disks are too slow, and if any one of these things isn’t quite balanced, then it means that the whole machine slows down and the cost is too high,” Madhavapeddy explains. “So we’ve run unusual in-memory workloads where we can test the CPU performance or run build tests for the disks along with the memory.”
One test the OCaml team ran showed that reducing the memory on a 96-core ThunderX machine from 128 gigabytes to 32 gigabytes of RAM caused performance to drop by an order of magnitude. “Contributing to that balance process has been interesting,” he says. “There’s no point having a lot of one resource that can’t be used because of a lack of another resource. That’s the basic bare metal design question that I think Packet is trying to figure out for its customers.”
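The balance question can be made concrete with some back-of-the-envelope arithmetic. The Python sketch below uses the core count and memory sizes quoted above; the 1 GB of RAM per build job is purely an assumption for illustration, not a figure from the OCaml team.

```python
import math

CORES = 96                    # core count quoted for the ThunderX machine
ASSUMED_RAM_PER_JOB_GB = 1.0  # hypothetical: memory one build job needs


def memory_per_core_gb(total_ram_gb, cores):
    """RAM available to each core if memory is shared evenly."""
    return total_ram_gb / cores


def max_parallel_jobs(total_ram_gb, cores, ram_per_job_gb):
    """Jobs that fit at once: limited by cores or by memory, whichever runs out first."""
    jobs_that_fit_in_memory = math.floor(total_ram_gb / ram_per_job_gb)
    return min(cores, jobs_that_fit_in_memory)


for total_ram_gb in (128, 32):
    per_core = memory_per_core_gb(total_ram_gb, CORES)
    jobs = max_parallel_jobs(total_ram_gb, CORES, ASSUMED_RAM_PER_JOB_GB)
    print(f"{total_ram_gb} GB: {per_core:.2f} GB per core, {jobs} of {CORES} cores busy")
```

Under that assumed job size, the 32-gigabyte configuration can keep only a third of the cores busy. This simple model ignores effects such as swapping and reduced file caching, so it should be read as an illustration of the imbalance rather than an explanation of the measured slowdown.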
The Databox Project
A Packet feature has also enabled OCaml Labs’ Databox project. Packet is one of the few cloud providers to expose the Border Gateway Protocol (BGP) to users, and the team is experimenting with that access to develop a way to give every single person on earth an individual data store for privacy-sensitive information. The goal: “If you’re sending traffic from your phone to your laptop, it’s not going to any third-party cloud provider,” says Madhavapeddy. “It’s going directly over encrypted lines over the internet to your devices in a peer-to-peer fashion, and in such a way that’s very usable and very resilient, just like the way the internet routes around problems in its infrastructure.”
The team recently published a paper about the project.
“The fact that Packet exposes BGP, and we’re already familiar with the low-level parts of Packet from our existing usage, means that we might be able to do some world-changing stuff based on the learnings we’ve had from the day-to-day stuff,” he adds. “It’s quite good fun.”
Democratizing Operating Systems
Of course, the day-to-day stuff is important, too. Using Packet infrastructure, OCaml became one of the first languages to push multiarch images into Docker Hub. “So now, whenever you’re building something using OCaml and Docker, there are multiarch images for x86_64, Arm64, Arm 32, and PowerPC64 as well,” says Madhavapeddy. “We’ve unlocked the potential of all of these other machines to conveniently use CI based on our programming language. And before, that was just not possible because we didn’t have the physical infrastructure to build all those other variants.”
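For readers unfamiliar with the term, a multiarch image is a single tag that points to a list of per-platform images, and the client picks the entry matching its own machine. The Python sketch below mimics that lookup in a simplified form. The tag, the digests, and the exact platform strings are placeholders chosen for illustration, not the contents of the real OCaml images on Docker Hub.

```python
# Hypothetical manifest list: one tag, several per-platform image entries.
MANIFEST_LIST = {
    "tag": "example/ocaml:latest",  # placeholder tag
    "manifests": [
        {"platform": {"os": "linux", "architecture": "amd64"}, "digest": "sha256:placeholder-amd64"},
        {"platform": {"os": "linux", "architecture": "arm64"}, "digest": "sha256:placeholder-arm64"},
        {"platform": {"os": "linux", "architecture": "arm"}, "digest": "sha256:placeholder-arm32"},
        {"platform": {"os": "linux", "architecture": "ppc64le"}, "digest": "sha256:placeholder-ppc64"},
    ],
}


def select_image(manifest_list, os_name, architecture):
    """Return the digest of the entry that matches the requested platform."""
    for entry in manifest_list["manifests"]:
        platform = entry["platform"]
        if platform["os"] == os_name and platform["architecture"] == architecture:
            return entry["digest"]
    raise LookupError(f"no image for {os_name}/{architecture}")


print(select_image(MANIFEST_LIST, "linux", "arm64"))
```

The practical effect is that the same image name works on every listed architecture, which is why building each variant requires access to hardware (or emulation) for each one.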
Packet is currently working with IBM to get PowerPC hardware so that OCaml can use Packet for all of these builds. And OCaml is hoping that in the future, macOS and Windows can run on Packet as well.
And that’s OCaml’s overarching ambition for its partnership with Packet. “We want a democratization of operating systems and architectures,” Madhavapeddy says. “Our goal is for OCaml to support everything equally, irrespective of market share. We’re just learning to close the gaps in the open source matrix. If we can do this, we can help build a template that other open source projects can use. We view this as a maintainer’s responsibility to keep all the paths open where we can. And Packet is one of the few games in town that recognizes the importance of this as well.”