[k8s][zun] Introduce a new feature in Ussuri - CRI integration
Hi all,

As we approach the end of the Ussuri cycle, I would like to take this chance to introduce a new feature the Zun team implemented in this cycle: CRI integration [1].

As many of you know, Zun is the OpenStack Container service. It provides an API for users to create and manage application containers in an OpenStack cloud. The main concepts in Zun are "Container" and "Capsule". A container is a single container, while a capsule is a group of co-located and co-scheduled containers (basically the same as a k8s pod).

The "Container" concept is probably more widely used. People can use the /containers API endpoint to create and manage a single container. Under the hood, a container is a Docker container on a compute node. What is special is that each Docker container is given a Neutron port, so the container is connected to a tenant network in Neutron. Kuryr-libnetwork is the Docker network plugin we use to perform the Neutron port binding, which basically connects the container to the virtual switch managed by Neutron.

As mentioned before, the concept of "Capsule" in Zun is basically the same as a pod in k8s. We introduced this concept mainly for k8s integration. Roughly speaking, the Zun-k8s integration is achieved by (i) registering a special node in k8s, (ii) watching the k8s API for pods scheduled to this node, and (iii) invoking Zun's /capsules API endpoint to create a capsule for each incoming pod. Steps (i) and (ii) are done by a CNCF sandbox project called Virtual Kubelet [2]. Step (iii) is achieved by providing an OpenStack provider [3] for Virtual Kubelet. The special node registered by Virtual Kubelet is called a virtual node because it doesn't physically exist. Pods scheduled to the virtual node are basically offloaded from the current k8s cluster and eventually land on an external platform such as an OpenStack cloud.

At a high level, what is offered to end users is a "serverless kubernetes pod" [4]. This term basically means the ability to run pods on demand without planning the capacity (i.e. nodes) upfront. An example of that is AWS EKS on Fargate [5]. In comparison, the traditional approach is to create an entire k8s cluster upfront in order to run the workload. Let's give a simple example. Suppose you want to run a pod: the traditional approach is to provision a k8s cluster with a worker node, then run the pod on that worker node. In contrast, the "serverless" approach is to create a k8s cluster without any worker nodes; the pod is offloaded to a cloud provider that provisions it at runtime. This approach works well for applications with fluctuating workloads, for which it is hard to provision a cluster of the right size. Furthermore, from the cloud provider's perspective, if all tenant users offload their pods to the cloud, the provider might be able to pack the workload better (i.e. onto fewer physical nodes), thus saving cost.

Under the hood, a capsule is a pod sandbox with one or more containers in a CRI runtime (e.g. containerd). Compared to Docker, a CRI runtime has better support for the pod concept, so we chose it to implement capsules. A caveat is that CRI requires a CNI plugin for the networking, so we needed to implement a CNI plugin for Zun (called zun-cni). The role of the CNI plugin is similar to that of kuryr-libnetwork, which we are using for Docker, except it implements a different networking model (CNI).
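To give a feel for how zun-cni hands work off to its daemon, here is a rough Python sketch of the thin-executable pattern. This is illustrative only and not the actual zun-cni source: the daemon port, URL path and payload layout below are made-up assumptions.

#!/usr/bin/env python3
"""Minimal sketch of a CNI shim that forwards calls to a local daemon.

Not the real zun-cni; the daemon URL and payload layout are assumptions.
It only shows the general pattern: the container runtime execs the
binary, the binary packages the CNI environment variables plus the
network config read from stdin, and a long-running daemon does the real
Neutron port lookup and binding.
"""
import json
import os
import sys
import urllib.request

DAEMON_URL = "http://127.0.0.1:9036/cni"  # assumed endpoint, not the real one


def main():
    # CNI passes the verb (ADD/DEL/...) and the container identity via
    # CNI_* environment variables, and the network config as JSON on stdin.
    payload = {
        "env": {k: v for k, v in os.environ.items() if k.startswith("CNI_")},
        "config": json.loads(sys.stdin.read() or "{}"),
    }
    req = urllib.request.Request(
        DAEMON_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        # The daemon replies with a CNI result document (interfaces, IPs),
        # which the shim simply relays to the runtime on stdout.
        sys.stdout.write(resp.read().decode())


if __name__ == "__main__":
    main()

The point of the split is that the short-lived binary stays trivial, while the daemon holds the long-lived state and does the Neutron port lookup and binding.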
Coming back to the Container/Capsule comparison, I summarize it as below:

+--------------+------------------------+---------------+
| Concept      | Container              | Capsule (Pod) |
+--------------+------------------------+---------------+
| API endpoint | /containers            | /capsules     |
| Engine       | Docker                 | CRI runtime   |
| Network      | kuryr-libnetwork (CNM) | zun-cni (CNI) |
+--------------+------------------------+---------------+

Typically, a CRI runtime works well with Kata Containers, which provides hypervisor-based isolation for neighboring containers on the same node. As a result, it is secure to consolidate pods from different tenants onto a single node, which increases resource utilization. For deployment, a typical stack looks like below:

+----------------------------------------------+
| k8s control plane                            |
+----------------------------------------------+
| Virtual Kubelet (OpenStack provider)         |
+----------------------------------------------+
| OpenStack control plane (Zun, Neutron, etc.) |
+----------------------------------------------+
| OpenStack data plane                         |
| (Zun compute agent, Neutron OVS agent, etc.) |
+----------------------------------------------+
| Containerd (with CRI plugin)                 |
+----------------------------------------------+
| Kata Container                               |
+----------------------------------------------+

In this stack, if a user creates a deployment or pod in k8s, the k8s scheduler will schedule the pod to the virtual node registered by Virtual Kubelet. Virtual Kubelet will pick up the pod and let the configured cloud provider handle it. The cloud provider invokes the Zun API to create a capsule. Upon receiving the API request to create a capsule, the Zun scheduler will schedule the capsule to a compute node. The Zun compute agent on that node will provision the capsule using a CRI runtime (containerd in this example). The Zun-CRI runtime communication is done via gRPC over a unix socket. The CRI runtime will first create the pod sandbox in Kata Containers (or runc as an alternative); Kata realizes the pod as a lightweight VM. Furthermore, the CRI runtime will use a CNI plugin, which is the zun-cni binary, to set up the network. The zun-cni binary is a thin executable that dispatches the CNI command to a daemon service called zun-cni-daemon. The communication between them is via HTTP over localhost. The zun-cni-daemon will look up the Neutron port information from the DB and perform the port binding.

In conclusion, starting from Ussuri, Zun adds support for CRI-compatible runtimes. Zun uses a CRI runtime to realize the concept of a pod. Using this feature together with Virtual Kubelet and Kata Containers, we can offer a "serverless kubernetes pod" service comparable to AWS EKS on Fargate.

[1] https://blueprints.launchpad.net/zun/+spec/add-support-cri-runtime
[2] https://github.com/virtual-kubelet/virtual-kubelet
[3] https://github.com/virtual-kubelet/openstack-zun
[4] https://aws.amazon.com/about-aws/whats-new/2019/12/run-serverless-kubernetes...
[5] https://aws.amazon.com/blogs/aws/amazon-eks-on-aws-fargate-now-generally-ava...
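P.S. For anyone who wants to try the k8s-facing side, below is a small illustrative example (using the Python kubernetes client) of running a pod on the virtual node registered by Virtual Kubelet. The node name ("virtual-kubelet") and the "virtual-kubelet.io/provider" taint are assumptions that depend on how the Virtual Kubelet instance is deployed, and the snippet assumes a working kubeconfig.

"""Sketch: run a pod on the Virtual Kubelet node that fronts Zun.

Assumed, not taken from the post above: the virtual node is named
"virtual-kubelet" and carries the conventional
"virtual-kubelet.io/provider" taint. Once the pod is scheduled there,
the OpenStack provider turns it into a Zun capsule.
"""
from kubernetes import client, config


def run_pod_on_virtual_node():
    config.load_kube_config()  # or config.load_incluster_config()
    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="demo-capsule"),
        spec=client.V1PodSpec(
            containers=[
                client.V1Container(name="web", image="nginx:alpine"),
            ],
            # Pin the pod to the virtual node and tolerate its taint so
            # the scheduler is allowed to place it there.
            node_selector={"kubernetes.io/hostname": "virtual-kubelet"},
            tolerations=[
                client.V1Toleration(
                    key="virtual-kubelet.io/provider",
                    operator="Exists",
                    effect="NoSchedule",
                )
            ],
        ),
    )
    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)


if __name__ == "__main__":
    run_pod_on_virtual_node()

From the user's point of view nothing else changes: once the pod lands on the virtual node, the OpenStack provider calls Zun's /capsules endpoint and the flow described above takes over.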
Hongbin Lu wrote:
[...] In conclusion, starting from Ussuri, Zun adds support for CRI-compatible runtimes. Zun uses a CRI runtime to realize the concept of a pod. Using this feature together with Virtual Kubelet and Kata Containers, we can offer a "serverless kubernetes pod" service comparable to AWS EKS on Fargate.
That's very exciting integration work, Hongbin! Don't forget to mention it in your cycle-highlights, as I expect it to be picked up in Ussuri release communications :) -- Thierry
On Sun, 2020-03-22 at 13:28 -0400, Hongbin Lu wrote:
Hi all,
As we approach the end of the Ussuri cycle, I would like to take this chance to introduce a new feature the Zun team implemented in this cycle: CRI integration [1].
<snip!>
Under the hood, a capsule is a pod sandbox with one or more containers in a CRI runtime (e.g. containerd). Compared to Docker, a CRI runtime has better support for the pod concept, so we chose it to implement capsules. A caveat is that CRI requires a CNI plugin for the networking, so we needed to implement a CNI plugin for Zun (called zun-cni). The role of the CNI plugin is similar to that of kuryr-libnetwork, which we are using for Docker, except it implements a different networking model (CNI). I summarize it as below:
Hi,

I noticed that Zun's CNI plugin [1] is basically a simplified version of kuryr-kubernetes code. While it's totally fine you've copied that, I wonder what modifications had been made to make it suit Zun? Is there a chance to converge this to make Zun use kuryr-kubernetes directly so that we won't develop two versions of that code in parallel?

Thanks,
Michał

[1] https://github.com/openstack/zun/tree/master/zun/cni
On Mon, Mar 23, 2020 at 11:48 AM <mdulko@redhat.com> wrote:
On Sun, 2020-03-22 at 13:28 -0400, Hongbin Lu wrote:
Hi all,
As we approach the end of the Ussuri cycle, I would like to take this chance to introduce a new feature the Zun team implemented in this cycle: CRI integration [1].
<snip!>
Under the hood, a capsule is a pod sandbox with one or more containers in a CRI runtime (e.g. containerd). Compared to Docker, a CRI runtime has better support for the pod concept, so we chose it to implement capsules. A caveat is that CRI requires a CNI plugin for the networking, so we needed to implement a CNI plugin for Zun (called zun-cni). The role of the CNI plugin is similar to that of kuryr-libnetwork, which we are using for Docker, except it implements a different networking model (CNI). I summarize it as below:
Hi,
I noticed that Zun's CNI plugin [1] is basically a simplified version of kuryr-kubernetes code. While it's totally fine you've copied that, I wonder what modifications had been made to make it suit Zun? Is there a chance to converge this to make Zun use kuryr-kubernetes directly so that we won't develop two versions of that code in parallel?
Right. I also investigated the possibility of reusing the kuryr-kubernetes codebase. Definitely, some code is common between the two projects. If we can move the common code to a library (i.e. kuryr-lib), Zun should be able to consume it directly. In particular, I am interested in directly consuming the CNI binding code (kuryr_kubernetes/cni/binding/) and the VIF versioned objects (kuryr_kubernetes/objects).

Most of the kuryr-kubernetes code is coupled with the "list-and-watch" logic against the k8s API, and Zun is not able to reuse those parts. However, I do advocate moving all the common code to kuryr-lib so Zun can reuse it whenever appropriate.
Thanks, Michał
[1] https://github.com/openstack/zun/tree/master/zun/cni
On Mon, 2020-03-23 at 12:17 -0400, Hongbin Lu wrote:
On Mon, Mar 23, 2020 at 11:48 AM <mdulko@redhat.com> wrote:
On Sun, 2020-03-22 at 13:28 -0400, Hongbin Lu wrote:
Hi all,
As we approach the end of the Ussuri cycle, I would like to take this chance to introduce a new feature the Zun team implemented in this cycle: CRI integration [1].
<snip!>
Under the hood, a capsule is a pod sandbox with one or more containers in a CRI runtime (e.g. containerd). Compared to Docker, a CRI runtime has better support for the pod concept, so we chose it to implement capsules. A caveat is that CRI requires a CNI plugin for the networking, so we needed to implement a CNI plugin for Zun (called zun-cni). The role of the CNI plugin is similar to that of kuryr-libnetwork, which we are using for Docker, except it implements a different networking model (CNI). I summarize it as below:
Hi,
I noticed that Zun's CNI plugin [1] is basically a simplified version of kuryr-kubernetes code. While it's totally fine you've copied that, I wonder what modifications had been made to make it suit Zun? Is there a chance to converge this to make Zun use kuryr-kubernetes directly so that we won't develop two versions of that code in parallel?
Right. I also investigated the possibility of reusing the kuryr-kubernetes codebase. Definitely, some code is common between the two projects. If we can move the common code to a library (i.e. kuryr-lib), Zun should be able to consume it directly. In particular, I am interested in directly consuming the CNI binding code (kuryr_kubernetes/cni/binding/) and the VIF versioned objects (kuryr_kubernetes/objects).

Most of the kuryr-kubernetes code is coupled with the "list-and-watch" logic against the k8s API, and Zun is not able to reuse those parts. However, I do advocate moving all the common code to kuryr-lib so Zun can reuse it whenever appropriate.
Uhm, moving more code into kuryr.lib is something the Kuryr team would like to avoid. Our tendency is rather to stop depending on it, as kuryr-kubernetes, being a CNI plugin, is normally consumed as a container image, and having any dependencies is a burden there. That's why I was asking about the modifications to the kuryr-daemon code that Zun required - to see if we can modify kuryr-daemon to be pluggable enough to be consumed by Zun directly.
Thanks, Michał
[1] https://github.com/openstack/zun/tree/master/zun/cni
On Tue, Mar 24, 2020 at 7:28 AM <mdulko@redhat.com> wrote:
On Mon, 2020-03-23 at 12:17 -0400, Hongbin Lu wrote:
On Mon, Mar 23, 2020 at 11:48 AM <mdulko@redhat.com> wrote:
On Sun, 2020-03-22 at 13:28 -0400, Hongbin Lu wrote:
Hi all,
As we approach the end of the Ussuri cycle, I would like to take this chance to introduce a new feature the Zun team implemented in this cycle: CRI integration [1].
<snip!>
Under the hood, a capsule is a pod sandbox with one or more containers in a CRI runtime (e.g. containerd). Compared to Docker, a CRI runtime has better support for the pod concept, so we chose it to implement capsules. A caveat is that CRI requires a CNI plugin for the networking, so we needed to implement a CNI plugin for Zun (called zun-cni). The role of the CNI plugin is similar to that of kuryr-libnetwork, which we are using for Docker, except it implements a different networking model (CNI). I summarize it as below:
Hi,
I noticed that Zun's CNI plugin [1] is basically a simplified version of kuryr-kubernetes code. While it's totally fine you've copied that, I wonder what modifications had been made to make it suit Zun? Is there a chance to converge this to make Zun use kuryr-kubernetes directly so that we won't develop two versions of that code in parallel?
Right. I also investigated the possibility of reusing the kuryr-kubernetes codebase. Definitely, some code is common between the two projects. If we can move the common code to a library (i.e. kuryr-lib), Zun should be able to consume it directly. In particular, I am interested in directly consuming the CNI binding code (kuryr_kubernetes/cni/binding/) and the VIF versioned objects (kuryr_kubernetes/objects).

Most of the kuryr-kubernetes code is coupled with the "list-and-watch" logic against the k8s API, and Zun is not able to reuse those parts. However, I do advocate moving all the common code to kuryr-lib so Zun can reuse it whenever appropriate.

Uhm, moving more code into kuryr.lib is something the Kuryr team would like to avoid. Our tendency is rather to stop depending on it, as kuryr-kubernetes, being a CNI plugin, is normally consumed as a container image, and having any dependencies is a burden there.
Kuryr-lib is already a dependency of kuryr-kubernetes: https://github.com/openstack/kuryr-kubernetes/blob/master/requirements.txt . Do you mean kuryr-kubernetes is going to remove kuryr-lib as a dependency? And I don't quite get the "container image" justification. Could you explain more?
That's why I was asking about modifications to kuryr-daemon code that Zun required - to see if we can modify kuryr-daemon to be pluggable enough to be consumed by Zun directly.
In theory, you can refactor the code and make it pluggable. Even supposing you are able to do that, I would still suggest moving the whole framework out as a library. That is a prerequisite for Zun (or any other project) to consume it, right?
Thanks, Michał
[1] https://github.com/openstack/zun/tree/master/zun/cni
On Tue, 2020-03-24 at 09:16 -0400, Hongbin Lu wrote:
On Tue, Mar 24, 2020 at 7:28 AM <mdulko@redhat.com> wrote:
On Mon, 2020-03-23 at 12:17 -0400, Hongbin Lu wrote:
On Mon, Mar 23, 2020 at 11:48 AM <mdulko@redhat.com> wrote:
On Sun, 2020-03-22 at 13:28 -0400, Hongbin Lu wrote:
Hi all,
As we approach the end of the Ussuri cycle, I would like to take this chance to introduce a new feature the Zun team implemented in this cycle: CRI integration [1].
<snip!>
Under the hood, a capsule is a pod sandbox with one or more containers in a CRI runtime (e.g. containerd). Compared to Docker, a CRI runtime has better support for the pod concept, so we chose it to implement capsules. A caveat is that CRI requires a CNI plugin for the networking, so we needed to implement a CNI plugin for Zun (called zun-cni). The role of the CNI plugin is similar to that of kuryr-libnetwork, which we are using for Docker, except it implements a different networking model (CNI). I summarize it as below:
Hi,
I noticed that Zun's CNI plugin [1] is basically a simplified version of kuryr-kubernetes code. While it's totally fine you've copied that, I wonder what modifications had been made to make it suit Zun? Is there a chance to converge this to make Zun use kuryr-kubernetes directly so that we won't develop two versions of that code in parallel?
Right. I also investigated the possibility of reusing the kuryr-kubernetes codebase. Definitely, some code is common between the two projects. If we can move the common code to a library (i.e. kuryr-lib), Zun should be able to consume it directly. In particular, I am interested in directly consuming the CNI binding code (kuryr_kubernetes/cni/binding/) and the VIF versioned objects (kuryr_kubernetes/objects).

Most of the kuryr-kubernetes code is coupled with the "list-and-watch" logic against the k8s API, and Zun is not able to reuse those parts. However, I do advocate moving all the common code to kuryr-lib so Zun can reuse it whenever appropriate.

Uhm, moving more code into kuryr.lib is something the Kuryr team would like to avoid. Our tendency is rather to stop depending on it, as kuryr-kubernetes, being a CNI plugin, is normally consumed as a container image, and having any dependencies is a burden there.

Kuryr-lib is already a dependency of kuryr-kubernetes: https://github.com/openstack/kuryr-kubernetes/blob/master/requirements.txt . Do you mean kuryr-kubernetes is going to remove kuryr-lib as a dependency? And I don't quite get the "container image" justification. Could you explain more?
Hi! Sorry for the late reply, I must have missed that email when reading the list.

Our plan was to move the bits from kuryr-lib that we use directly into kuryr-kubernetes. This is basically the segmentation_type_drivers module and some utility functions.

kuryr-kubernetes is built as part of the OpenShift (OKD) platform and OKD's build system doesn't handle dependencies too well. The problem is that if we'd start to depend more on kuryr-lib, we would probably need to fork it too under openshift's GitHub (note that we maintain a fork of kuryr-kubernetes at github.com/openshift/kuryr-kubernetes with the bits related to OKD builds).
That's why I was asking about modifications to kuryr-daemon code that Zun required - to see if we can modify kuryr-daemon to be pluggable enough to be consumed by Zun directly.
In theory, you can refactor the code and make it pluggable. Even supposing you are able to do that, I would still suggest moving the whole framework out as a library. That is a prerequisite for Zun (or any other project) to consume it, right?

Right, without a shared library it's not an ideal solution, but a library is problematic for sure… Would it be possible for Zun to just run kuryr-daemon directly from a kuryr-kubernetes release?
Thanks, Michał
[1] https://github.com/openstack/zun/tree/master/zun/cni
On Fri, Apr 3, 2020 at 1:01 PM <mdulko@redhat.com> wrote:
On Tue, 2020-03-24 at 09:16 -0400, Hongbin Lu wrote:
On Tue, Mar 24, 2020 at 7:28 AM <mdulko@redhat.com> wrote:
On Mon, 2020-03-23 at 12:17 -0400, Hongbin Lu wrote:
On Mon, Mar 23, 2020 at 11:48 AM <mdulko@redhat.com> wrote:
On Sun, 2020-03-22 at 13:28 -0400, Hongbin Lu wrote:
Hi all,
As we approach the end of the Ussuri cycle, I would like to take this chance to introduce a new feature the Zun team implemented in this cycle: CRI integration [1].
<snip!>
Under the hood, a capsule is a pod sandbox with one or more containers in a CRI runtime (e.g. containerd). Compared to Docker, a CRI runtime has better support for the pod concept, so we chose it to implement capsules. A caveat is that CRI requires a CNI plugin for the networking, so we needed to implement a CNI plugin for Zun (called zun-cni). The role of the CNI plugin is similar to that of kuryr-libnetwork, which we are using for Docker, except it implements a different networking model (CNI). I summarize it as below:
Hi,
I noticed that Zun's CNI plugin [1] is basically a simplified version of kuryr-kubernetes code. While it's totally fine you've copied that, I wonder what modifications had been made to make it suit Zun? Is there a chance to converge this to make Zun use kuryr-kubernetes directly so that we won't develop two versions of that code in parallel?
Right. I also investigated the possibility of reusing the kuryr-kubernetes codebase. Definitely, some code is common between the two projects. If we can move the common code to a library (i.e. kuryr-lib), Zun should be able to consume it directly. In particular, I am interested in directly consuming the CNI binding code (kuryr_kubernetes/cni/binding/) and the VIF versioned objects (kuryr_kubernetes/objects).

Most of the kuryr-kubernetes code is coupled with the "list-and-watch" logic against the k8s API, and Zun is not able to reuse those parts. However, I do advocate moving all the common code to kuryr-lib so Zun can reuse it whenever appropriate.

Uhm, moving more code into kuryr.lib is something the Kuryr team would like to avoid. Our tendency is rather to stop depending on it, as kuryr-kubernetes, being a CNI plugin, is normally consumed as a container image, and having any dependencies is a burden there.

Kuryr-lib is already a dependency of kuryr-kubernetes: https://github.com/openstack/kuryr-kubernetes/blob/master/requirements.txt . Do you mean kuryr-kubernetes is going to remove kuryr-lib as a dependency? And I don't quite get the "container image" justification. Could you explain more?

Hi! Sorry for the late reply, I must have missed that email when reading the list.

Our plan was to move the bits from kuryr-lib that we use directly into kuryr-kubernetes. This is basically the segmentation_type_drivers module and some utility functions.

kuryr-kubernetes is built as part of the OpenShift (OKD) platform and OKD's build system doesn't handle dependencies too well. The problem is that if we'd start to depend more on kuryr-lib, we would probably need to fork it too under openshift's GitHub (note that we maintain a fork of kuryr-kubernetes at github.com/openshift/kuryr-kubernetes with the bits related to OKD builds).
Right. In this case, adding a new library dependency is not ideal from your perspective because it increases the maintenance burden for your downstream. From Zun's perspective, this is not ideal either because we are not able to directly reuse the common code (which is primarily the binding module). This is the reason we end up with two copies of some code.
That's why I was asking about modifications to kuryr-daemon code that Zun required - to see if we can modify kuryr-daemon to be pluggable enough to be consumed by Zun directly.
In theory, you can refactor the code and make it pluggable. Even supposing you are able to do that, I would still suggest moving the whole framework out as a library. That is a prerequisite for Zun (or any other project) to consume it, right?

Right, without a shared library it's not an ideal solution, but a library is problematic for sure… Would it be possible for Zun to just run kuryr-daemon directly from a kuryr-kubernetes release?
First, I am not sure if that is possible from a technical perspective. Second, to address the problem, an alternative approach is to have kuryr-kubernetes run zun-cni-daemon directly from a Zun release. In theory, we can make Zun pluggable so other projects can extend it. The first approach (Zun depends on kuryr-kubernetes) is not ideal for Zun because Zun would have a heavy dependency on an external service. The second approach (kuryr-kubernetes depends on Zun) is not ideal for kuryr-kubernetes for the same reason.

I think there is a middle ground that can balance the benefits of both projects. For example, refactoring the common code into a shared library would be such an option.
Thanks, Michał
[1] https://github.com/openstack/zun/tree/master/zun/cni
On Fri, 2020-04-03 at 20:29 -0400, Hongbin Lu wrote:
On Fri, Apr 3, 2020 at 1:01 PM <mdulko@redhat.com> wrote:
On Tue, 2020-03-24 at 09:16 -0400, Hongbin Lu wrote:
On Tue, Mar 24, 2020 at 7:28 AM <mdulko@redhat.com> wrote:
On Mon, 2020-03-23 at 12:17 -0400, Hongbin Lu wrote:
On Mon, Mar 23, 2020 at 11:48 AM <mdulko@redhat.com> wrote:
On Sun, 2020-03-22 at 13:28 -0400, Hongbin Lu wrote:
Hi all,
As we approach the end of the Ussuri cycle, I would like to take this chance to introduce a new feature the Zun team implemented in this cycle: CRI integration [1].
<snip!>
Under the hood, a capsule is a pod sandbox with one or more containers in a CRI runtime (e.g. containerd). Compared to Docker, a CRI runtime has better support for the pod concept, so we chose it to implement capsules. A caveat is that CRI requires a CNI plugin for the networking, so we needed to implement a CNI plugin for Zun (called zun-cni). The role of the CNI plugin is similar to that of kuryr-libnetwork, which we are using for Docker, except it implements a different networking model (CNI). I summarize it as below:
Hi,
I noticed that Zun's CNI plugin [1] is basically a simplified version of kuryr-kubernetes code. While it's totally fine you've copied that, I wonder what modifications had been made to make it suit Zun? Is there a chance to converge this to make Zun use kuryr-kubernetes directly so that we won't develop two versions of that code in parallel?
Right. I also investigated the possibility of reusing the kuryr-kubernetes codebase. Definitely, some code is common between the two projects. If we can move the common code to a library (i.e. kuryr-lib), Zun should be able to consume it directly. In particular, I am interested in directly consuming the CNI binding code (kuryr_kubernetes/cni/binding/) and the VIF versioned objects (kuryr_kubernetes/objects).

Most of the kuryr-kubernetes code is coupled with the "list-and-watch" logic against the k8s API, and Zun is not able to reuse those parts. However, I do advocate moving all the common code to kuryr-lib so Zun can reuse it whenever appropriate.

Uhm, moving more code into kuryr.lib is something the Kuryr team would like to avoid. Our tendency is rather to stop depending on it, as kuryr-kubernetes, being a CNI plugin, is normally consumed as a container image, and having any dependencies is a burden there.

Kuryr-lib is already a dependency of kuryr-kubernetes: https://github.com/openstack/kuryr-kubernetes/blob/master/requirements.txt . Do you mean kuryr-kubernetes is going to remove kuryr-lib as a dependency? And I don't quite get the "container image" justification. Could you explain more?

Hi! Sorry for the late reply, I must have missed that email when reading the list.

Our plan was to move the bits from kuryr-lib that we use directly into kuryr-kubernetes. This is basically the segmentation_type_drivers module and some utility functions.

kuryr-kubernetes is built as part of the OpenShift (OKD) platform and OKD's build system doesn't handle dependencies too well. The problem is that if we'd start to depend more on kuryr-lib, we would probably need to fork it too under openshift's GitHub (note that we maintain a fork of kuryr-kubernetes at github.com/openshift/kuryr-kubernetes with the bits related to OKD builds).

Right. In this case, adding a new library dependency is not ideal from your perspective because it increases the maintenance burden for your downstream. From Zun's perspective, this is not ideal either because we are not able to directly reuse the common code (which is primarily the binding module). This is the reason we end up with two copies of some code.
That's why I was asking about modifications to kuryr-daemon code that Zun required - to see if we can modify kuryr-daemon to be pluggable enough to be consumed by Zun directly.
In theory, you can refactor the code and make it pluggable. Even supposing you are able to do that, I would still suggest moving the whole framework out as a library. That is a prerequisite for Zun (or any other project) to consume it, right?

Right, without a shared library it's not an ideal solution, but a library is problematic for sure… Would it be possible for Zun to just run kuryr-daemon directly from a kuryr-kubernetes release?
First, I am not sure if that is possible from a technical perspective.

Second, to address the problem, an alternative approach is to have kuryr-kubernetes run zun-cni-daemon directly from a Zun release. In theory, we can make Zun pluggable so other projects can extend it. The first approach (Zun depends on kuryr-kubernetes) is not ideal for Zun because Zun would have a heavy dependency on an external service. The second approach (kuryr-kubernetes depends on Zun) is not ideal for kuryr-kubernetes for the same reason.

I think there is a middle ground that can balance the benefits of both projects. For example, refactoring the common code into a shared library would be such an option.
Well, I've got to agree with you. I'll check what we can do to have kuryr.lib built on the OpenShift side too and see if it's feasible. Unfortunately, that's certainly not something that can happen in the Ussuri timeframe.
Thanks, Michał
[1] https://github.com/openstack/zun/tree/master/zun/cni
participants (3)
- Hongbin Lu
- mdulko@redhat.com
- Thierry Carrez