[openstack-dev] [Nova][blueprint] Accelerate the booting process of a number of vms via VMThunder

Zhi Yan Liu lzy.dev at gmail.com
Thu Apr 17 22:05:23 UTC 2014


Replied with inline comments.

On Thu, Apr 17, 2014 at 9:33 PM, lihuiba <magazine.lihuiba at 163.com> wrote:
>>IMO we'd better use a backend-storage-optimized approach to access
>>remote images from compute nodes instead of using iSCSI only. And from
>>my experience, I'm sure iSCSI lacks stability under heavy I/O
>>workload in production environments; it can cause either the VM
>>filesystem to be marked read-only or a VM kernel panic.
>
> Yes, in this situation, the problem lies in the backend storage, so no other
> protocol will perform better. However, P2P transferring will greatly reduce
> the workload on the backend storage, and so increase responsiveness.
>

That's not 100% true, at least in my case. We fixed this problem in the
network interface driver; it was actually causing the kernel panic and
read-only issues under heavy networking workload.

>
>>As I said, Nova already has an image caching mechanism, so in
>>this case P2P is just an approach that could be used for downloading
>>or preheating images for the cache.
>
> Nova's image caching is file-level, while VMThunder's is block-level. And
> VMThunder is designed to work in conjunction with Cinder, not Glance. VMThunder
> currently uses Facebook's flashcache to realize caching; dm-cache and
> bcache are also options for the future.
>

Hmm, since you mention bcache, dm-cache and flashcache, I'm wondering
whether they could be leveraged at the operations/best-practice level.
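
For instance, an operator could put a flashcache device in front of an
attached image LUN without changing Nova at all. A rough sketch of that
idea in Python (just an illustration: it assumes the flashcache
utilities are installed, and the device paths and cache name are
placeholders):

import subprocess

def create_flashcache(cache_name, ssd_dev, disk_dev, mode="back"):
    # Create a flashcache mapping; mode "back" selects write-back caching.
    subprocess.check_call([
        "flashcache_create", "-p", mode,
        cache_name, ssd_dev, disk_dev,
    ])
    # The cached device then shows up as /dev/mapper/<cache_name> and can
    # be used wherever the raw disk device would have been used.

create_flashcache("imgcache", "/dev/sdb", "/dev/mapper/remote_lun")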

btw, we are doing some work to make Glance integrate with Cinder as a
unified block storage backend.

>
>>I think P2P transferring/pre-caching sounds like a good way to go, as
>>I mentioned as well, but actually in this area I'd like to see something
>>like zero-copy + CoR. On the one hand we can leverage the capability of
>>downloading image bits on demand via the zero-copy approach; on the
>>other hand we can avoid reading data from the remote image every time
>>via CoR.
>
> Yes, on-demand transferring is what you mean by "zero-copy", and caching
> is something close to CoR. In fact, we are working on a kernel module called
> foolcache that realizes a true CoR. See
> https://github.com/lihuiba/dm-foolcache.
>

Yup. It's really interesting to me; I will take a look. Thanks for sharing.
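
To make the CoR idea concrete, here is a minimal sketch in Python,
assuming a block-granular read path (only an illustration of the
concept, not how dm-foolcache is actually implemented):

import os

BLOCK = 4096  # cache granularity in bytes

class CoRDevice(object):
    # Serve reads from a local cache file, fetching each missing block
    # from the remote image exactly once (copy-on-read).

    def __init__(self, remote, cache_path):
        self.remote = remote                 # any object exposing read_block(n)
        if not os.path.exists(cache_path):
            open(cache_path, "wb").close()   # start with an empty cache file
        self.cache = open(cache_path, "r+b")
        self.cached = set()                  # block numbers already local

    def read_block(self, n):
        if n not in self.cached:
            data = self.remote.read_block(n)  # one-time remote read
            self.cache.seek(n * BLOCK)
            self.cache.write(data)            # the "copy" in copy-on-read
            self.cached.add(n)
        self.cache.seek(n * BLOCK)
        return self.cache.read(BLOCK)         # all later reads stay local

A real implementation would also persist the cached-block bitmap so the
cache survives reboots, which is exactly what a device-mapper target can
do at the block layer.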

>
> National Key Laboratory for Parallel and Distributed
> Processing, College of Computer Science, National University of Defense
> Technology, Changsha, Hunan Province, P.R. China
> 410073
>
> At 2014-04-17 17:11:48,"Zhi Yan Liu" <lzy.dev at gmail.com> wrote:
>>On Thu, Apr 17, 2014 at 4:41 PM, lihuiba <magazine.lihuiba at 163.com> wrote:
>>>>IMHO, a zero-copy approach is better
>>> VMThunder's "on-demand transferring" is the same thing as your
>>> "zero-copy approach".
>>> VMThunder uses iSCSI as the transfer protocol, which is your option #b.
>>>
>>
>>IMO we'd better use a backend-storage-optimized approach to access
>>remote images from compute nodes instead of using iSCSI only. And from
>>my experience, I'm sure iSCSI lacks stability under heavy I/O
>>workload in production environments; it can cause either the VM
>>filesystem to be marked read-only or a VM kernel panic.
>>
>>>
>>>>Under the #b approach, my former experience from a previous similar
>>>>cloud deployment (not OpenStack) was: with 2 PC-server storage
>>>>nodes (plain *local SAS disks*, without any storage backend),
>>>>2-way/multi-path iSCSI, and 1G network bandwidth, we could provision
>>>>500 VMs in a minute.
>>> Suppose booting one instance requires reading 300 MB of data; then 500
>>> instances require 150 GB. Each storage server needs to send data at a
>>> rate of 150 GB / 2 / 60 s = 1.25 GB/s on average. This is absolutely a
>>> heavy burden even for high-end storage appliances. In production
>>> systems, this request (booting 500 VMs in one shot) would significantly
>>> disturb other running instances accessing the same storage nodes.
>>>

btw, I believe the case/numbers above are not quite right either, since
the remote image bits could be loaded on demand instead of all being
loaded at boot time.
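
A quick back-of-the-envelope check, reusing the figures quoted above
(300 MB per boot, 500 VMs, 2 storage servers, a 60-second window; the
10% on-demand read ratio below is purely a hypothetical illustration):

vms = 500
mb_per_boot = 300        # MB read to boot one instance (figure quoted above)
servers = 2
window_s = 60            # "in a minute"

total_gb = vms * mb_per_boot / 1000.0           # 150 GB in total
per_server_rate = total_gb / servers / window_s
print(per_server_rate)                          # 1.25 GB/s per server, full copy

# With on-demand loading only the blocks actually read at boot cross the
# network; if, hypothetically, 10% of each image is touched during boot:
print(per_server_rate * 0.10)                   # 0.125 GB/s per server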

zhiyan

>>> VMThunder eliminates this problem with P2P transferring and
>>> on-compute-node caching. Even a PC server with one 1 Gb NIC (a true PC
>>> server!) can boot 500 VMs in a minute with ease. For the first time,
>>> VMThunder makes bulk provisioning of VMs practical for production
>>> cloud systems. This is the essential value of VMThunder.
>>>
>>
>>As I said, Nova already has an image caching mechanism, so in
>>this case P2P is just an approach that could be used for downloading
>>or preheating images for the cache.
>>
>>I think P2P transferring/pre-caching sounds like a good way to go, as
>>I mentioned as well, but actually in this area I'd like to see something
>>like zero-copy + CoR. On the one hand we can leverage the capability of
>>downloading image bits on demand via the zero-copy approach; on the
>>other hand we can avoid reading data from the remote image every time
>>via CoR.
>>
>>zhiyan
>>
>>>
>>> ===================================================
>>> From: Zhi Yan Liu <lzy.dev at gmail.com>
>>> Date: 2014-04-17 0:02 GMT+08:00
>>> Subject: Re: [openstack-dev] [Nova][blueprint] Accelerate the booting
>>> process of a number of vms via VMThunder
>>> To: "OpenStack Development Mailing List (not for usage questions)"
>>> <openstack-dev at lists.openstack.org>
>>>
>>> Hello Yongquan Fu,
>>>
>>> My thoughts:
>>>
>>> 1. Nova already supports an image caching mechanism. It caches an
>>> image on the compute host the first time a VM is provisioned from it,
>>> so the next provisioning (booting the same image) doesn't need to
>>> transfer it again, unless the cache manager has cleared it.
>>> 2. P2P transferring and prefetching are still based on a copy
>>> mechanism; IMHO, a zero-copy approach is better, and even
>>> transferring/prefetching could be optimized by such an approach. (I
>>> have not checked VMThunder's "on-demand transferring", but it is a
>>> kind of transferring as well, at least judging by its literal meaning.)
>>> And btw, IMO, there are two ways we can go to follow the zero-copy idea:
>>> a. When Nova and Glance use the same backend storage, we can use the
>>> storage's native CoW/snapshot capability to prepare the VM disk
>>> instead of copying/transferring image bits (over HTTP/the network or
>>> by local copy).
>>> b. Without "unified" storage, we can attach a volume/LUN from the
>>> backend storage to the compute node as a base image, then do the
>>> CoW/snapshot on it to prepare the root/ephemeral disk of the VM. This
>>> is just like boot-from-volume, but the difference is that we do the
>>> CoW/snapshot on the Nova side instead of the Cinder/storage side.
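
btw, to make option #b concrete: a rough sketch of the Nova-side CoW in
Python, assuming a qcow2 overlay on top of the attached LUN (the paths
below are placeholders, not Nova's real layout):

import subprocess

def create_cow_disk(base_lun, overlay_path):
    # Create a qcow2 overlay backed by the attached LUN: the VM's writes
    # go to the overlay, while reads of untouched blocks fall through to
    # the shared, read-only base image.
    subprocess.check_call([
        "qemu-img", "create", "-f", "qcow2",
        "-o", "backing_file=%s,backing_fmt=raw" % base_lun,
        overlay_path,
    ])

create_cow_disk("/dev/mapper/base_image_lun",
                "/var/lib/nova/instances/<uuid>/disk")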
>>>
>>> For option #a, we have already made some progress:
>>> https://blueprints.launchpad.net/nova/+spec/image-multiple-location
>>> https://blueprints.launchpad.net/nova/+spec/rbd-clone-image-handler
>>> https://blueprints.launchpad.net/nova/+spec/vmware-clone-image-handler
>>>
>>> Under the #b approach, my former experience from a previous similar
>>> cloud deployment (not OpenStack) was: with 2 PC-server storage
>>> nodes (plain *local SAS disks*, without any storage backend),
>>> 2-way/multi-path iSCSI, and 1G network bandwidth, we could provision
>>> 500 VMs in a minute.
>>>
>>> For the vmThunder topic, I think it sounds like a good idea; IMO P2P
>>> transferring and prefetching are valuable optimizations for image
>>> transfer.
>>>
>>> zhiyan
>>>
>>> On Wed, Apr 16, 2014 at 9:14 PM, yongquan Fu <quanyongf at gmail.com> wrote:
>>>>
>>>> Dear all,
>>>>
>>>> We would like to present an extension to the VM-booting functionality
>>>> of Nova for cases when a number of homogeneous VMs need to be
>>>> launched at the same time.
>>>>
>>>> The motivation for our work is to increase the speed of provisioning
>>>> VMs for large-scale scientific computing and big data processing. In
>>>> such cases, we often need to boot tens or hundreds of virtual machine
>>>> instances at the same time.
>>>>
>>>> Currently, under OpenStack, we found that creating a large number of
>>>> virtual machine instances is very time-consuming. The reason is that
>>>> the booting procedure is a centralized operation with performance
>>>> bottlenecks. Before a virtual machine can actually be started,
>>>> OpenStack must either copy the image file (Swift) or attach the image
>>>> volume (Cinder) from a storage server to the compute node over the
>>>> network. Booting a single VM needs to read a large amount of image
>>>> data from the image storage server, so creating a large number of
>>>> virtual machine instances causes a significant workload on those
>>>> servers. The servers become quite busy, even unavailable, during the
>>>> deployment phase, and it takes a very long time before the whole
>>>> virtual machine cluster is usable.
>>>>
>>>> Our extension is based on our work on VMThunder, a novel mechanism
>>>> for accelerating the deployment of a large number of virtual machine
>>>> instances. It is written in Python and can be integrated with
>>>> OpenStack easily. VMThunder addresses the problem described above
>>>> through the following improvements: on-demand transferring
>>>> (network-attached storage), compute-node caching, P2P transferring,
>>>> and prefetching. VMThunder is a scalable and cost-effective
>>>> accelerator for bulk provisioning of virtual machines.
>>>>
>>>> We hope to receive your feedback. Any comments are extremely welcome.
>>>> Thanks in advance.
>>>>
>>>> PS:
>>>>
>>>> VMThunder enhanced Nova blueprint:
>>>> https://blueprints.launchpad.net/nova/+spec/thunderboost
>>>> VMThunder standalone project: https://launchpad.net/vmthunder
>>>> VMThunder prototype: https://github.com/lihuiba/VMThunder
>>>> VMThunder etherpad: https://etherpad.openstack.org/p/vmThunder
>>>> VMThunder portal: http://www.vmthunder.org/
>>>> VMThunder paper:
>>>> http://www.computer.org/csdl/trans/td/preprint/06719385.pdf
>>>>
>>>> Regards,
>>>>
>>>> vmThunder development group
>>>> PDL
>>>> National University of Defense Technology
>>>>
>>>
>>> --
>>> Yongquan Fu
>>> PhD, Assistant Professor,
>>> National Key Laboratory for Parallel and Distributed
>>> Processing, College of Computer Science, National University of Defense
>>> Technology, Changsha, Hunan Province, P.R. China
>>> 410073



More information about the OpenStack-dev mailing list