[openstack-dev] [Nova][blueprint] Accelerate the booting process of a number of vms via VMThunder

Zhi Yan Liu lzy.dev at gmail.com
Fri Apr 18 04:14:25 UTC 2014


On Fri, Apr 18, 2014 at 10:53 AM, lihuiba <magazine.lihuiba at 163.com> wrote:
>>It's not 100% true, at least in my case. We fixed this problem in the
>>network interface driver; it was the driver that caused the kernel panics
>>and read-only issues under heavy networking workload.
>
> Network traffic control could help. The point is to ensure no instance
> is starved to death. Traffic control can be done with tc.
>

btw, I see, but in our case we fixed it in the network interface device
driver itself rather than working around it by limiting or slowing down
network traffic.
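
For reference, the tc-based throttling huiba mentions would look roughly
like the sketch below; the interface name and rate cap are assumptions for
illustration only, and as noted we ultimately fixed the driver rather than
shaping traffic.

    # Illustrative sketch only: cap storage-network traffic on one interface
    # with a token bucket filter so image transfers cannot starve instances.
    import subprocess

    def limit_storage_iface(dev="eth1", rate="500mbit"):
        # Drop any existing root qdisc, then attach a tbf with the given rate.
        subprocess.call(["tc", "qdisc", "del", "dev", dev, "root"])
        subprocess.check_call(
            ["tc", "qdisc", "add", "dev", dev, "root", "tbf",
             "rate", rate, "burst", "64kb", "latency", "50ms"])

    # limit_storage_iface("eth1", "500mbit")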

>
>
>>btw, we are doing some work to make Glance use Cinder as a
>>unified block storage backend.
> That sounds interesting. Are there any more materials?
>

Some work has already been done in Glance
(https://blueprints.launchpad.net/glance/+spec/glance-cinder-driver ),
but I'm sure more is still needed. Part of it is still at the drafting
stage, and some dependencies need to be resolved as well.

>
>
> At 2014-04-18 06:05:23,"Zhi Yan Liu" <lzy.dev at gmail.com> wrote:
>>Replied with inline comments.
>>
>>On Thu, Apr 17, 2014 at 9:33 PM, lihuiba <magazine.lihuiba at 163.com> wrote:
>>>>IMO we'd better use a backend-storage-optimized approach to access
>>>>remote images from compute nodes instead of using iSCSI only. And from
>>>>my experience, iSCSI lacks stability under heavy I/O workload in
>>>>production environments; it can cause either the VM filesystem to be
>>>>marked read-only or a VM kernel panic.
>>>
>>> Yes, in this situation, the problem lies in the backend storage, so no
>>> other protocol will perform better. However, P2P transferring will
>>> greatly reduce workload on the backend storage, so as to increase
>>> responsiveness.
>>>
>>
>>It's not 100% true, at least in my case. We fixed this problem in the
>>network interface driver; it was the driver that caused the kernel panics
>>and read-only issues under heavy networking workload.
>>
>>>
>>>
>>>>As I said, Nova already has an image caching mechanism, so in this
>>>>case P2P is just an approach that could be used for downloading or
>>>>preheating images for that cache.
>>>
>>> Nova's image caching is file-level, while VMThunder's is block-level.
>>> VMThunder is designed to work in conjunction with Cinder, not Glance.
>>> It currently uses Facebook's flashcache to implement caching; dm-cache
>>> and bcache are also options for the future.
>>>
>>
>>Hmm, if you mention bcache, dm-cache and flashcache, I'm just wondering
>>whether they could be leveraged at the operations/best-practice level.
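
Just to make that operations-level idea concrete, here is a rough sketch
of what a compute node could do with flashcache on top of an attached
volume. The device paths, cache name and cache mode are assumptions for
illustration only, not something Nova or VMThunder does today.

    # Illustrative sketch only: wrap an iSCSI-attached base volume with a
    # flashcache device backed by a local SSD, so repeated image reads are
    # served from the local cache instead of the remote storage node.
    import subprocess

    def create_flashcache(cache_name, ssd_dev, origin_dev, mode="back"):
        # flashcache_create -p <mode> <name> <ssd_device> <origin_device>
        subprocess.check_call(
            ["flashcache_create", "-p", mode, cache_name, ssd_dev, origin_dev])
        return "/dev/mapper/%s" % cache_name

    # cached_dev = create_flashcache("img_cache", "/dev/sdc", "/dev/sdb")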
>>
>>btw, we are doing some work to make Glance use Cinder as a
>>unified block storage backend.
>>
>>>
>>>>I think P2P transferring/pre-caching sounds like a good way to go, as
>>>>I mentioned as well, but in this area I'd actually like to see
>>>>something like zero-copy + CoR. On one hand we can leverage the
>>>>capability of downloading image bits on demand via the zero-copy
>>>>approach; on the other hand we can avoid reading data from the remote
>>>>image every time by using CoR.
>>>
>>> Yes, on-demand transferring is what you mean by "zero-copy", and caching
>>> is something close to CoR. In fact, we are working on a kernel module
>>> called foolcache that realizes true CoR. See
>>> https://github.com/lihuiba/dm-foolcache.
>>>
>>
>>Yup. It's really interesting to me; I will take a look, thanks for
>>sharing.
>>
>>>
>>>
>>>
>>> National Key Laboratory for Parallel and Distributed
>>> Processing, College of Computer Science, National University of Defense
>>> Technology, Changsha, Hunan Province, P.R. China
>>> 410073
>>>
>>>
>>> At 2014-04-17 17:11:48,"Zhi Yan Liu" <lzy.dev at gmail.com> wrote:
>>>>On Thu, Apr 17, 2014 at 4:41 PM, lihuiba <magazine.lihuiba at 163.com>
>>>> wrote:
>>>>>>IMHO, zero-copy approach is better
>>>>> VMThunder's "on-demand transferring" is the same thing as your
>>>>> "zero-copy approach". VMThunder uses iSCSI as the transferring
>>>>> protocol, which is option #b of yours.
>>>>>
>>>>
>>>>IMO we'd better use a backend-storage-optimized approach to access
>>>>remote images from compute nodes instead of using iSCSI only. And from
>>>>my experience, iSCSI lacks stability under heavy I/O workload in
>>>>production environments; it can cause either the VM filesystem to be
>>>>marked read-only or a VM kernel panic.
>>>>
>>>>>
>>>>>>Under the #b approach, my prior experience from a previous similar
>>>>>>cloud deployment (not OpenStack) was that with 2 PC server storage
>>>>>>nodes (plain *local SAS disks*, without any storage backend) +
>>>>>>2-way/multi-path iSCSI + 1G network bandwidth, we could provision 500
>>>>>>VMs in a minute.
>>>>> Suppose booting one instance requires reading 300MB of data; then 500
>>>>> of them require 150GB. Each of the storage servers needs to send data
>>>>> at a rate of 150GB/2/60 = 1.25GB/s on average. This is absolutely a
>>>>> heavy burden even for high-end storage appliances. In production
>>>>> systems, this request (booting 500 VMs in one shot) will significantly
>>>>> disturb other running instances accessing the same storage nodes.
>>>>>
>>
>>btw, I don't think those numbers hold either, since remote image bits
>>could be loaded on demand instead of loading them all at boot time.
>>
>>zhiyan
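
For illustration only, with an assumed fraction rather than a measured
one: if on-demand loading means each VM reads only the blocks it actually
touches at boot, say roughly 30MB of a 300MB image, the aggregate transfer
for 500 VMs drops from 150GB to about 15GB, i.e. around 15GB/2/60 =
125MB/s per storage node instead of 1.25GB/s.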
>>
>>>>> VMThunder eliminates this problem by P2P transferring and
>>>>> on-compute-node caching. Even a PC server with a single 1Gb NIC (a
>>>>> true PC server!) can boot 500 VMs in a minute with ease. For the
>>>>> first time, VMThunder makes bulk provisioning of VMs practical for
>>>>> production cloud systems. This is the essential value of VMThunder.
>>>>>
>>>>
>>>>As I said, Nova already has an image caching mechanism, so in this
>>>>case P2P is just an approach that could be used for downloading or
>>>>preheating images for that cache.
>>>>
>>>>I think P2P transferring/pre-caching sounds like a good way to go, as
>>>>I mentioned as well, but in this area I'd actually like to see
>>>>something like zero-copy + CoR. On one hand we can leverage the
>>>>capability of downloading image bits on demand via the zero-copy
>>>>approach; on the other hand we can avoid reading data from the remote
>>>>image every time by using CoR.
>>>>
>>>>zhiyan
>>>>
>>>>>
>>>>>
>>>>>
>>>>> ===================================================
>>>>> From: Zhi Yan Liu <lzy.dev at gmail.com>
>>>>> Date: 2014-04-17 0:02 GMT+08:00
>>>>> Subject: Re: [openstack-dev] [Nova][blueprint] Accelerate the booting
>>>>> process of a number of vms via VMThunder
>>>>> To: "OpenStack Development Mailing List (not for usage questions)"
>>>>> <openstack-dev at lists.openstack.org>
>>>>>
>>>>>
>>>>>
>>>>> Hello Yongquan Fu,
>>>>>
>>>>> My thoughts:
>>>>>
>>>>> 1. Nova already supports an image caching mechanism. It caches the
>>>>> image on the compute host that a VM was previously provisioned from,
>>>>> so the next provisioning (booting the same image) doesn't need to
>>>>> transfer it again unless the cache manager has cleaned it up.
>>>>> 2. P2P transferring and prefetching are still based on a copy
>>>>> mechanism; IMHO, a zero-copy approach is better, although
>>>>> transferring/prefetching could be optimized by such an approach too.
>>>>> (I have not checked VMThunder's "on-demand transferring", but it is a
>>>>> kind of transferring as well, at least going by its literal meaning.)
>>>>> And btw, IMO, we have two ways to follow the zero-copy idea:
>>>>> a. when Nova and Glance use the same backend storage, we could use
>>>>> the storage's own CoW/snapshot capability to prepare the VM disk
>>>>> instead of copying/transferring image bits (through HTTP/network or
>>>>> local copy).
>>>>> b. without "unified" storage, we could attach a volume/LUN from the
>>>>> backend storage to the compute node as a base image, then do such a
>>>>> CoW/snapshot on it to prepare the root/ephemeral disk of the VM. This
>>>>> is just like boot-from-volume, the difference being that we do the
>>>>> CoW/snapshot on the Nova side instead of the Cinder/storage side.
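
To make option #b a bit more concrete, a rough sketch of the compute-node
side is below. The device path, target path and the use of qemu-img/qcow2
are assumptions for illustration; this is not how Nova prepares disks
today.

    # Illustrative sketch only: create a qcow2 root disk whose backing file
    # is the attached base-image LUN, so only the blocks the VM actually
    # reads or writes ever touch the remote LUN (CoW on the Nova side).
    import subprocess

    def prepare_root_disk(base_lun, disk_path, size="20G"):
        # qemu-img create -f qcow2 -o backing_file=<lun>,backing_fmt=raw ...
        subprocess.check_call(
            ["qemu-img", "create", "-f", "qcow2",
             "-o", "backing_file=%s,backing_fmt=raw" % base_lun,
             disk_path, size])

    # prepare_root_disk("/dev/sdb", "/var/lib/nova/instances/demo/disk")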
>>>>>
>>>>> For option #a, we have already got some progress:
>>>>> https://blueprints.launchpad.net/nova/+spec/image-multiple-location
>>>>> https://blueprints.launchpad.net/nova/+spec/rbd-clone-image-handler
>>>>> https://blueprints.launchpad.net/nova/+spec/vmware-clone-image-handler
>>>>>
>>>>> Under the #b approach, my prior experience from a previous similar
>>>>> cloud deployment (not OpenStack) was that with 2 PC server storage
>>>>> nodes (plain *local SAS disks*, without any storage backend) +
>>>>> 2-way/multi-path iSCSI + 1G network bandwidth, we could provision 500
>>>>> VMs in a minute.
>>>>>
>>>>> On the vmThunder topic, I think it sounds like a good idea; IMO P2P
>>>>> prefetching is a valuable way to optimize image transferring.
>>>>>
>>>>> zhiyan
>>>>>
>>>>> On Wed, Apr 16, 2014 at 9:14 PM, yongquan Fu <quanyongf at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>>
>>>>>>
>>>>>>  We would like to present an extension to the VM-booting
>>>>>> functionality of Nova for when a number of homogeneous VMs need to
>>>>>> be launched at the same time.
>>>>>>
>>>>>>
>>>>>>
>>>>>> The motivation for our work is to increase the speed of provisioning
>>>>>> VMs for large-scale scientific computing and big data processing. In
>>>>>> those cases, we often need to boot tens or hundreds of virtual
>>>>>> machine instances at the same time.
>>>>>>
>>>>>>
>>>>>>     Currently, under OpenStack, we found that creating a large
>>>>>> number of virtual machine instances is very time-consuming. The
>>>>>> reason is that the booting procedure is a centralized operation that
>>>>>> involves performance bottlenecks. Before a virtual machine can
>>>>>> actually be started, OpenStack either copies the image file (Swift)
>>>>>> or attaches the image volume (Cinder) from the storage server to the
>>>>>> compute node via the network. Booting a single VM needs to read a
>>>>>> large amount of image data from the image storage server, so creating
>>>>>> a large number of virtual machine instances causes a significant
>>>>>> workload on those servers. The servers become quite busy, or even
>>>>>> unavailable, during the deployment phase, and it takes a very long
>>>>>> time before the whole virtual machine cluster is usable.
>>>>>>
>>>>>>
>>>>>>
>>>>>>   Our extension is based on our work on VMThunder, a novel mechanism
>>>>>> for accelerating the deployment of large numbers of virtual machine
>>>>>> instances. It is written in Python and can be integrated with
>>>>>> OpenStack easily. VMThunder addresses the problem described above
>>>>>> through the following improvements: on-demand transferring
>>>>>> (network-attached storage), compute-node caching, P2P transferring,
>>>>>> and prefetching. VMThunder is a scalable and cost-effective
>>>>>> accelerator for bulk provisioning of virtual machines.
>>>>>>
>>>>>>
>>>>>>
>>>>>>   We hope to receive your feedback. Any comments are extremely
>>>>>> welcome. Thanks in advance.
>>>>>>
>>>>>>
>>>>>>
>>>>>> PS:
>>>>>>
>>>>>>
>>>>>>
>>>>>> VMThunder enhanced nova blueprint:
>>>>>> https://blueprints.launchpad.net/nova/+spec/thunderboost
>>>>>>
>>>>>>  VMThunder standalone project: https://launchpad.net/vmthunder;
>>>>>>
>>>>>>  VMThunder prototype: https://github.com/lihuiba/VMThunder
>>>>>>
>>>>>>  VMThunder etherpad: https://etherpad.openstack.org/p/vmThunder
>>>>>>
>>>>>>  VMThunder portal: http://www.vmthunder.org/
>>>>>>
>>>>>> VMThunder paper:
>>>>>> http://www.computer.org/csdl/trans/td/preprint/06719385.pdf
>>>>>>
>>>>>>
>>>>>>
>>>>>>   Regards
>>>>>>
>>>>>>
>>>>>>
>>>>>>   vmThunder development group
>>>>>>
>>>>>>   PDL
>>>>>>
>>>>>>   National University of Defense Technology
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Yongquan Fu
>>>>> PhD, Assistant Professor,
>>>>> National Key Laboratory for Parallel and Distributed
>>>>> Processing, College of Computer Science, National University of Defense
>>>>> Technology, Changsha, Hunan Province, P.R. China
>>>>> 410073
>>>>>
>>>>
>>>
>>>
>>
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>



More information about the OpenStack-dev mailing list