This analysis was performed using Kata containers version 1.6.2, the latest at
the time of writing.
After attending a Kata Containers workshop
at OpenInfra Days 2019 in London, we were
impressed by their start-up time, which was only marginally slower than that
of ordinary runC containers in a Kubernetes cluster. We were naturally curious
about their disk I/O bound performance and whether the speed claims hold up
there too. In this article we explore this subject with a view to understanding
the trade-offs of using this technology in environments where I/O bound
performance and security are both critical requirements.
What are Kata containers?
Kata containers are lightweight VMs designed to integrate seamlessly with
container orchestration software like Docker and Kubernetes. One envisaged use
case is running untrusted workloads, exploiting the additional isolation gained
by not sharing the operating system kernel with the host.
However, the unquestioning assumption that using a guest kernel leads to
additional security is challenged in a recent survey of virtual machines and
containers. Kata has roots in Intel Clear
Containers and Hyper runV technology. They are also often mentioned alongside
gVisor, which aims to solve a similar problem by
filtering and redirecting system calls to a separate user-space kernel; as a
result, gVisor incurs runtime performance penalties. Further
discussion of gVisor is out of scope for this post.
Configuring Kubernetes for Kata
Kata containers are OCI conformant, which
means that any Container Runtime Interface (CRI) implementation that supports
external runtime classes can use Kata to run workloads. Examples of these CRIs
currently include CRI-O and containerd,
which both use runC by default, but this can be swapped for the kata-qemu
runtime. From Kubernetes 1.14 onwards, the RuntimeClass feature gate has been
promoted to beta and is therefore enabled by default. Consequently the setup is
relatively straightforward, as the sketch below illustrates.
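As an illustrative sketch (the `kata` class name and `kata-qemu` handler below
are assumptions, and the handler must match a runtime entry configured in your
CRI), registering a Kata runtime class and scheduling a pod onto it might look
like this:

```sh
# Register a RuntimeClass whose handler matches the Kata runtime configured in
# the CRI (containerd or CRI-O), then run a pod against it. Names are
# illustrative.
kubectl apply -f - <<EOF
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: kata
handler: kata-qemu
---
apiVersion: v1
kind: Pod
metadata:
  name: kata-test
spec:
  runtimeClassName: kata
  containers:
  - name: test
    image: busybox
    command: ["sh", "-c", "uname -r && sleep 3600"]
EOF
# Inside a Kata pod, "uname -r" reports the guest kernel version, which will
# differ from the host kernel - a quick way to confirm the pod is really
# running under Kata.
```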
At present Kata supports qemu and firecracker hypervisor
backends, but support for the latter is considered preliminary,
notably lacking host-to-guest file sharing.
This leaves us with kata-qemu as the current option, in which
virtio-9p provides the basic shared filesystem functionality
critical for this analysis (the test path is a network filesystem
mounted on the host).
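For reference, the shared filesystem behaviour is governed by the Kata runtime
configuration file; a minimal sketch of inspecting the relevant 9p setting
(the path below is the default for package installs and may differ on your
system):

```sh
# msize_9p sets the 9p payload size per request; larger values can improve
# virtio-9p throughput at the cost of guest memory. The default configuration
# path is an assumption based on a package install.
grep -E 'msize_9p' /usr/share/defaults/kata-containers/configuration.toml
```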
This example Gist shows how
to swap runC for the Kata runtime in a Minikube cluster. Note that at the time
of writing, Kata containers have additional host requirements: the host CPU
must support hardware virtualisation, and if the host is itself a VM, nested
virtualisation must be enabled. Without these prerequisites Kata start-up will
fail silently (we learnt this the hard way); a quick sanity check is sketched
below.
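A minimal sanity check, assuming the `kata-runtime` binary is on the path:

```sh
# Verify that the host can run Kata: this checks for CPU virtualisation
# extensions, the kvm kernel modules and related prerequisites.
kata-runtime kata-check

# If the host is itself a VM, nested virtualisation must be enabled, e.g. on
# Intel hosts:
cat /sys/module/kvm_intel/parameters/nested   # should print "Y" (or "1")
```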
For this analysis a bare metal Kubernetes cluster was deployed, using OpenStack Heat to
provision the machines via our appliances playbooks and Kubespray to configure them as a
Kubernetes cluster. Kubespray supports
specification of container runtimes other than Docker, e.g. CRI-O and
containerd, one of which is required to support the Kata runtime (see the
sketch below).
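As a sketch, the runtime is selected through a Kubespray group variable
(the inventory path and the `container_manager` variable are as found in
recent Kubespray releases; names may differ between versions):

```sh
# Select a CRI that supports RuntimeClass in the cluster group variables.
cat >> inventory/mycluster/group_vars/k8s-cluster/k8s-cluster.yml <<EOF
container_manager: containerd
EOF
```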
Results
Disk I/O Bandwidth
In these results we plot the aggregate bandwidth across all clients,
demonstrating the scale-up bandwidth achievable by a single client and the
scale-out throughput achieved across many clients.
Commit Latency Cumulative Distribution Function (CDF)
In latency-sensitive workloads, I/O latency can dominate. I/O
operation commit latency is plotted on a logarithmic scale to accommodate
the very broad range of data points.
| Mode | Scenario | 1 client, 50% | 1 client, 99% | 8 clients, 50% | 8 clients, 99% | 64 clients, 50% | 64 clients, 99% |
|------|----------|--------------:|--------------:|---------------:|---------------:|----------------:|----------------:|
| sequential read | bare | 1581 | 2670 | 2416 | 3378 | 14532 | 47095 |
| | runC | 2007 | 2506 | 2391 | 3907 | 15062 | 46022 |
| | Kata | 4112 | 4620 | 12648 | 46464 | 86409 | 563806 |
| random read | bare | 970 | 2342 | 2580 | 3305 | 14935 | 43884 |
| | runC | 1155 | 2277 | 2506 | 3856 | 15378 | 42229 |
| | Kata | 5472 | 6586 | 13517 | 31080 | 109805 | 314277 |
| sequential write | bare | 1011 | 1728 | 2592 | 15023 | 3730 | 258834 |
| | runC | 1011 | 1990 | 2547 | 14892 | 4308 | 233832 |
| | Kata | 3948 | 4882 | 4102 | 6160 | 14821 | 190742 |
| random write | bare | 1269 | 2023 | 3698 | 11616 | 19722 | 159285 |
| | runC | 1286 | 1957 | 3928 | 11796 | 19374 | 151756 |
| | Kata | 4358 | 5275 | 4566 | 14254 | 1780559 | 15343845 |
Table summarising the 50% and 99% commit latencies (in μs)
corresponding to the figure shown earlier.
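For context, latency percentiles like those in the table can be collected with
a tool such as fio; a hypothetical invocation (not necessarily the exact job
used in this analysis) might be:

```sh
# Random-write job reporting the 50th and 99th percentile completion latencies.
# --directory should point at the filesystem under test (here, the shared path
# visible inside the container); the path is illustrative.
fio --name=randwrite --rw=randwrite --bs=4k --direct=1 \
    --size=1g --runtime=60 --time_based \
    --directory=/mnt/test --percentile_list=50:99
```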
Looking Ahead
In an I/O intensive scenario such as this one, Kata containers do not yet match the
performance of conventional containers.
It is clear from the results that there are significant trade-offs
to consider when choosing between bare metal, runC and Kata containers.
While runC containers provide valuable abstractions for most use
cases, they still leave the host kernel vulnerable to exploits, with
the system call interface as the attack surface. Kata containers provide
hardware-supported isolation, but currently at a significant
performance overhead, especially for disk I/O bound operations.
Kata's development roadmap and pace of evolution provide substantial
grounds for optimism. The Kata team are aware of the performance
drawbacks of using virtio-9p as the storage driver for sharing
paths between host and guest VMs.
Kata version 1.7 (due on 15 May 2019) is expected to ship with
experimental support for virtio-fs, which should alleviate these I/O
performance issues. Preliminary results look encouraging, with
other published benchmarks reporting a 2x to 8x disk I/O bandwidth
improvement for the virtio-fs driver over virtio-9p.
We will repeat our analysis when the new capabilities become available.
In the meantime, if you would like to get in touch we would love to hear
from you, especially if there is a specific configuration which we may not
have considered. Reach out to us on Twitter
or directly via our contact page.