On Sun, 2020-03-22 at 13:28 -0400, Hongbin Lu wrote:
> Hi all,
>
> As we are approaching the end of the Ussuri cycle, I would like to
> take this chance to introduce a new feature the Zun team implemented
> in this cycle - CRI integration [1].
>
> As many of you know, Zun is an OpenStack Container service. It
> provides API for users to create and manage application containers in
> an OpenStack cloud. The main concepts in Zun are "Container" and
> "Capsule". A container is a single container, while a capsule is a
> group of co-located and co-scheduled containers (basically the same
> as k8s pod).
>
> The "Container" concept is probably the more widely used one. People
> can use the /containers API endpoint to create and manage a single
> container. Under the hood, a container is a Docker container on a
> compute node.
> What is special is that each Docker container is given a Neutron port
> so the container is connected to a tenant network in Neutron. Kuryr-
> libnetwork is the Docker network plugin we use to perform the Neutron
> port binding which basically connects the container to the virtual
> switch managed by Neutron.
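>
> As a concrete illustration, here is a minimal sketch of talking to
> the /containers endpoint directly with python-requests (normally the
> python-zunclient CLI/library would be used). The endpoint URL, token
> handling and request fields below are assumptions for illustration
> only:
>
>     import requests
>
>     # Assumed values; in a real cloud these come from Keystone and
>     # the service catalog.
>     ZUN_URL = "http://zun-api.example.com:9517/v1"
>     TOKEN = "<keystone-token>"
>     HEADERS = {"X-Auth-Token": TOKEN, "Content-Type": "application/json"}
>
>     # Create a single container; Zun gives it a Neutron port so it
>     # lands on a tenant network (bound by kuryr-libnetwork).
>     body = {"name": "web", "image": "nginx:latest"}
>     resp = requests.post(f"{ZUN_URL}/containers", json=body, headers=HEADERS)
>     print(resp.status_code, resp.json())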
>
> As mentioned before, the concept of "Capsule" in Zun is basically the
> same as pod in k8s. We introduced this concept mainly for k8s
> integration. Roughly speaking, the Zun-k8s integration is achieved by
> (i) registering a special node in k8s, (ii) watching the k8s API for
> pods being scheduled to this node and (iii) invoking Zun's /capsules
> API endpoint to create a capsule for each incoming pod. Steps (i) and
> (ii) are done by a CNCF sandbox project called Virtual Kubelet [2].
> Step (iii) is achieved by providing an OpenStack provider [3] for
> Virtual Kubelet. The special node registered by Virtual Kubelet is
> called a virtual node because the node doesn't physically exist. Pods
> scheduled to the virtual node are basically offloaded from the
> current k8s cluster, eventually landing on an external platform such
> as an OpenStack cloud.
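>
> For intuition, here is a rough Python sketch of that watch-and-offload
> loop. The real integration is the Go-based Virtual Kubelet provider
> [3], so this is only a conceptual illustration; the virtual node name
> and the create_capsule() helper are made up for the example:
>
>     from kubernetes import client, config, watch
>
>     def create_capsule(pod):
>         # Hypothetical helper: translate the pod spec into a Zun
>         # capsule template and POST it to Zun's /capsules endpoint.
>         print(f"offloading {pod.metadata.namespace}/{pod.metadata.name} to Zun")
>
>     config.load_kube_config()
>     v1 = client.CoreV1Api()
>
>     # (ii) watch the k8s API for pods that the scheduler assigned to
>     # the virtual node registered in step (i) by Virtual Kubelet.
>     w = watch.Watch()
>     for event in w.stream(v1.list_pod_for_all_namespaces,
>                           field_selector="spec.nodeName=virtual-kubelet-node"):
>         if event["type"] == "ADDED":
>             create_capsule(event["object"])  # (iii) offload to Zun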
>
> At a high level, what is offered to end-users is a "serverless
> kubernetes pod" [4]. This term basically means the ability to run
> pods on demand without planning the capacity (i.e. nodes) upfront. An
> example of that is AWS EKS on Fargate [5]. In comparison, the
> traditional approach is to create an entire k8s cluster upfront in
> order to run the workload. Let's give a simple example. Suppose you
> want to run a pod. The traditional approach is to provision a k8s
> cluster with a worker node, then run the pod on that worker node. In
> contrast, the "serverless" approach is to create a k8s cluster
> without any worker nodes; the pod is offloaded to a cloud provider
> that provisions it at runtime. This approach works well for
> applications with fluctuating workloads, for which it is hard to
> provision a cluster of the right size upfront. Furthermore, from the
> cloud provider's perspective, if all tenants offload their pods to
> the cloud, the provider might be able to pack the workload better
> (i.e. with fewer physical nodes), thus saving cost.
>
> Under the hood, a capsule is a pod sandbox with one or more containers
> in a CRI runtime (e.g. containerd). Compared to Docker, a CRI runtime
> has better support for the pod concept, so we chose it to implement
> capsules. A caveat is that CRI requires a CNI plugin for the
> networking, so we needed to implement a CNI plugin for Zun (called
> zun-cni). The role of the CNI plugin is similar to that of
> kuryr-libnetwork, which we are using for Docker, except it implements
> a different networking model (CNI). I summarize it as below:
Hi,
I noticed that Zun's CNI plugin [1] is basically a simplified version
of the kuryr-kubernetes code. While it's totally fine that you've copied
it, I wonder what modifications were made to make it suit Zun? Is there
a chance to converge this and make Zun use kuryr-kubernetes directly, so
that we won't develop two versions of that code in parallel?
Right. I also investigated the possibility of reusing the kuryr-kubernetes codebase. Definitely, some code is common between the two projects. If we can move the common code to a library (i.e. kuryr-lib), Zun should be able to consume it directly. In particular, I am interested in directly consuming the CNI binding code (kuryr_kubernetes/cni/binding/) and the VIF versioned objects (kuryr_kubernetes/objects).
Most of the kuryr-kubernetes code is coupled with the "list-and-watch" logic against the k8s API, and Zun is not able to reuse those parts. However, I do advocate moving all the common code to kuryr-lib so Zun can reuse it wherever appropriate.
Thanks,
Michał
[1] https://github.com/openstack/zun/tree/master/zun/cni
> +--------------+------------------------+---------------+
> | Concept | Container | Capsule (Pod) |
> +--------------+------------------------+---------------+
> | API endpoint | /containers | /capsules |
> | Engine | Docker | CRI runtime |
> | Network | kuryr-libnetwork (CNM) | zun-cni (CNI) |
> +--------------+------------------------+---------------+
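>
> For illustration, a capsule is created from a pod-like template. A
> rough sketch of submitting one through the /capsules endpoint follows;
> the template fields and request format are simplified assumptions
> based on the pod-like schema described above (normally a YAML template
> would be passed to the Zun CLI instead):
>
>     import requests
>
>     ZUN_URL = "http://zun-api.example.com:9517/v1"   # assumed endpoint
>     TOKEN = "<keystone-token>"
>
>     # Pod-like capsule template: a group of co-located, co-scheduled
>     # containers (field names are illustrative).
>     template = {
>         "kind": "capsule",
>         "metadata": {"name": "web-capsule"},
>         "spec": {
>             "containers": [
>                 {"image": "nginx:latest"},
>                 {"image": "fluentd:latest"},
>             ],
>         },
>     }
>
>     resp = requests.post(f"{ZUN_URL}/capsules",
>                          json={"template": template},
>                          headers={"X-Auth-Token": TOKEN})
>     print(resp.status_code)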
>
> Typically, a CRI runtime works well with Kata Container, which
> provides hypervisor-based isolation for neighboring containers on the
> same node. As a result, it is secure to consolidate pods from
> different tenants onto a single node, which increases resource
> utilization. For deployment, a typical stack looks like below:
>
> +----------------------------------------------+
> | k8s control plane |
> +----------------------------------------------+
> | Virtual Kubelet (OpenStack provider) |
> +----------------------------------------------+
> | OpenStack control plane (Zun, Neutron, etc.) |
> +----------------------------------------------+
> | OpenStack data plane |
> | (Zun compute agent, Neutron OVS agent, etc.) |
> +----------------------------------------------+
> | Containerd (with CRI plugin) |
> +----------------------------------------------+
> | Kata Container |
> +----------------------------------------------+
>
> In this stack, if a user creates a deployment or pod in k8s, the k8s
> scheduler will schedule the pod to the virtual node registered by
> Virtual Kubelet. Virtual Kubelet will pick up the pod and let the
> configured cloud provider handle it. The cloud provider invokes the
> Zun API to create a capsule. Upon receiving the API request to create
> a capsule, the Zun scheduler will schedule the capsule to a compute
> node. The Zun compute agent on that node will provision the capsule
> using a CRI runtime (containerd in this example). The Zun-CRI runtime
> communication is done via gRPC over a unix socket. The CRI runtime
> will first create the pod in Kata Container (or runc as an
> alternative), which realizes the pod as a lightweight VM.
> Furthermore, the CRI runtime will use a CNI plugin, which is the
> zun-cni binary, to set up the network. The zun-cni binary is a thin
> executable that dispatches the CNI command to a daemon service called
> zun-cni-daemon. The communication is via HTTP over localhost. The
> zun-cni-daemon will look up the Neutron port information from the DB
> and perform the port binding.
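>
> To make the zun-cni split concrete, below is a minimal Python sketch
> of such a thin CNI shim. Per the CNI spec, the runtime passes the
> command and container identity in CNI_* environment variables and the
> network config JSON on stdin; the daemon URL and request path below
> are assumptions, not the actual zun-cni wire format:
>
>     import json
>     import os
>     import sys
>
>     import requests
>
>     # Assumed local endpoint of zun-cni-daemon (HTTP over localhost).
>     DAEMON_URL = "http://127.0.0.1:9036"
>
>     def main():
>         # Standard CNI inputs: command and container identity from the
>         # environment, network configuration JSON from stdin.
>         payload = {
>             "command": os.environ["CNI_COMMAND"],  # ADD / DEL / CHECK
>             "container_id": os.environ.get("CNI_CONTAINERID"),
>             "netns": os.environ.get("CNI_NETNS"),
>             "ifname": os.environ.get("CNI_IFNAME"),
>             "config": json.load(sys.stdin),
>         }
>         # Forward to the long-running daemon, which looks up the
>         # Neutron port info and performs the actual binding.
>         resp = requests.post(f"{DAEMON_URL}/cni", json=payload, timeout=60)
>         resp.raise_for_status()
>         # A CNI plugin reports its result as JSON on stdout.
>         sys.stdout.write(resp.text)
>
>     if __name__ == "__main__":
>         main()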
>
> In conclusion, starting from Ussuri, Zun adds support for CRI-
> compatible runtimes. Zun uses a CRI runtime to realize the concept of
> a pod. Using this feature together with Virtual Kubelet and Kata
> Container, we can offer a "serverless kubernetes pod" service which
> is comparable to AWS EKS on Fargate.
>
> [1] https://blueprints.launchpad.net/zun/+spec/add-support-cri-runtime
> [2] https://github.com/virtual-kubelet/virtual-kubelet
> [3] https://github.com/virtual-kubelet/openstack-zun
> [4] https://aws.amazon.com/about-aws/whats-new/2019/12/run-serverless-kubernetes-pods-using-amazon-eks-and-aws-fargate/
> [5] https://aws.amazon.com/blogs/aws/amazon-eks-on-aws-fargate-now-generally-available/