[openstack-dev] [Nova][blueprint] Accelerate the booting process of a number of vms via VMThunder

lihuiba magazine.lihuiba at 163.com
Fri Apr 18 14:52:28 UTC 2014


>btw, I see, but at the moment we have fixed it in the network interface
>device driver itself, rather than working around it by limiting or
>slowing down network traffic.
Which kind of driver is that: in the host kernel, in the guest kernel, or in OpenStack?





>Some work has already been done in Glance
>(https://blueprints.launchpad.net/glance/+spec/glance-cinder-driver ),
>but I'm sure more work still needs to be done. Some parts are still in
>draft, and some dependencies need to be resolved as well.
I have read the blueprints carefully, but I still have some doubts.
Will it store each image as a single volume in Cinder, or store all image files
in one shared volume (with a file system on the volume, of course)?
OpenStack already has support for converting an image to a volume and for
booting from a volume. Are these features similar to what this blueprint does?
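For reference, here is the existing flow I have in mind (image -> volume -> boot
from volume), sketched as far as I understand the current
python-cinderclient/novaclient; the credentials, image UUID and flavor ID below
are placeholders only:

# Minimal sketch of the existing image -> volume -> boot-from-volume flow.
# All credentials and IDs are placeholders, not real values.
from cinderclient.v1 import client as cinder_client
from novaclient.v1_1 import client as nova_client

AUTH = dict(username='demo', api_key='secret', project_id='demo',
            auth_url='http://controller:5000/v2.0')

cinder = cinder_client.Client(**AUTH)
nova = nova_client.Client(**AUTH)

# 1. Create a Cinder volume from a Glance image (Cinder copies the image
#    bits into the new volume).
vol = cinder.volumes.create(size=10, display_name='base-vol',
                            imageRef='IMAGE_UUID')

# 2. Boot an instance from that volume instead of from the image.
#    Mapping format: <volume_id>:<type>:<size>:<delete_on_terminate>.
bdm = {'vda': '%s:::0' % vol.id}
server = nova.servers.create('vm-from-volume', image=None, flavor='FLAVOR_ID',
                             block_device_mapping=bdm)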




Huiba Li


National Key Laboratory for Parallel and Distributed
Processing, College of Computer Science, National University of Defense
Technology, Changsha, Hunan Province, P.R. China
410073




At 2014-04-18 12:14:25,"Zhi Yan Liu" <lzy.dev at gmail.com> wrote:
>On Fri, Apr 18, 2014 at 10:53 AM, lihuiba <magazine.lihuiba at 163.com> wrote:
>>>It's not 100% true, in my case at least. We fixed this problem in the
>>>network interface driver; it was actually the driver that caused the
>>>kernel panics and read-only issues under heavy networking workload.
>>
>> Network traffic control could help. The point is to ensure no instance
>> is starved to death. Traffic control can be done with tc.
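To make this concrete, the kind of thing I have in mind on the storage node is
roughly the following (a sketch only; the interface name, rates and the
port-based classification are illustrative assumptions, not something VMThunder
ships):

# Illustration only: cap iSCSI (TCP port 3260) traffic on the storage node so
# that image serving cannot starve other traffic. Interface and rates are
# made-up example values.
import subprocess

DEV = 'eth0'          # storage-facing interface (assumption)
ISCSI_PORT = '3260'   # standard iSCSI target port

def sh(cmd):
    print('+ ' + cmd)
    subprocess.check_call(cmd.split())

# Root HTB qdisc; unclassified traffic falls into class 1:10.
sh('tc qdisc add dev %s root handle 1: htb default 10' % DEV)
sh('tc class add dev %s parent 1: classid 1:10 htb rate 800mbit ceil 1000mbit' % DEV)
# iSCSI class: guaranteed 200mbit, allowed to borrow up to 400mbit.
sh('tc class add dev %s parent 1: classid 1:20 htb rate 200mbit ceil 400mbit' % DEV)
# Outgoing image data comes from the iSCSI target, so match the source port.
sh('tc filter add dev %s protocol ip parent 1: prio 1 '
   'u32 match ip sport %s 0xffff flowid 1:20' % (DEV, ISCSI_PORT))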
>>
>
>btw, I see, but at the moment we have fixed it in the network interface
>device driver itself, rather than working around it by limiting or
>slowing down network traffic.
>
>>
>>
>>>btw, we are doing some work to make Glance use Cinder as a
>>>unified block storage backend.
>> That sounds interesting. Is there any more material on it?
>>
>
>Some work has already been done in Glance
>(https://blueprints.launchpad.net/glance/+spec/glance-cinder-driver ),
>but I'm sure more work still needs to be done. Some parts are still in
>draft, and some dependencies need to be resolved as well.
>
>>
>>
>> At 2014-04-18 06:05:23,"Zhi Yan Liu" <lzy.dev at gmail.com> wrote:
>>>Replied as inline comments.
>>>
>>>On Thu, Apr 17, 2014 at 9:33 PM, lihuiba <magazine.lihuiba at 163.com> wrote:
>>>>>IMO we'd better use a backend-storage-optimized approach to access
>>>>>remote images from the compute node instead of using iSCSI only. And
>>>>>from my experience, I'm sure iSCSI lacks stability under heavy I/O
>>>>>workload in a production environment; it can cause either the VM
>>>>>filesystem to be marked read-only or a VM kernel panic.
>>>>
>>>> Yes, in this situation, the problem lies in the backend storage, so no
>>>> other protocol will perform better. However, P2P transferring will
>>>> greatly reduce workload on the backend storage, so as to increase
>>>> responsiveness.
>>>>
>>>
>>>It's not 100% true, in my case at least. We fixed this problem in the
>>>network interface driver; it was actually the driver that caused the
>>>kernel panics and read-only issues under heavy networking workload.
>>>
>>>>
>>>>
>>>>>As I said, Nova already has an image caching mechanism, so in this
>>>>>case P2P is just an approach that could be used for downloading or
>>>>>preheating images for that cache.
>>>>
>>>> Nova's image caching is file-level, while VMThunder's is block-level.
>>>> And VMThunder is designed to work in conjunction with Cinder, not
>>>> Glance. VMThunder currently uses Facebook's flashcache to realize the
>>>> caching; dm-cache and bcache are also options for the future.
>>>>
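As a rough illustration of what such a block-level cache looks like at the
device-mapper layer, something like the following builds a cached device over
the image LUN with the upstream dm-cache target (device paths and sizes are
placeholders, and VMThunder itself currently uses flashcache, so this is only
an analogous sketch):

# Build a block-level cache over a slow (e.g. iSCSI-attached) device using the
# dm-cache target. All device paths are placeholders.
import subprocess

ORIGIN = '/dev/sdb'            # slow device, e.g. the attached image LUN
CACHE_DEV = '/dev/ssd/cache'   # fast local device holding cached blocks
META_DEV = '/dev/ssd/meta'     # small fast device for dm-cache metadata
BLOCK_SECTORS = 512            # cache block size: 512 * 512B = 256KB

def sectors(dev):
    # Device size in 512-byte sectors.
    return int(subprocess.check_output(['blockdev', '--getsz', dev]).strip())

table = '0 %d cache %s %s %s %d 1 writethrough default 0' % (
    sectors(ORIGIN), META_DEV, CACHE_DEV, ORIGIN, BLOCK_SECTORS)
# Reads populate the local cache; repeated reads are then served locally.
subprocess.check_call(['dmsetup', 'create', 'cached_image', '--table', table])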
>>>
>>>Hm, if you mention bcache, dm-cache and flashcache, I'm just wondering
>>>whether they could be leveraged at the operations/best-practice level.
>>>
>>>btw, we are doing some work to make Glance use Cinder as a
>>>unified block storage backend.
>>>
>>>>
>>>>>I think P2P transferring/pre-caching sounds like a good way to go, as
>>>>>I mentioned as well, but in this area I'd actually like to see
>>>>>something like zero-copy + CoR. On the one hand we can leverage the
>>>>>capability of downloading image bits on demand via the zero-copy
>>>>>approach; on the other hand we can avoid reading data from the remote
>>>>>image every time by using CoR.
>>>>
>>>> Yes, on-demand transferring is what you mean by "zero-copy", and caching
>>>> is something close to CoR. In fact, we are working on a kernel module
>>>> called foolcache that realizes true CoR. See
>>>> https://github.com/lihuiba/dm-foolcache.
>>>>
>>>
>>>Yup. And it's really interesting to me, will take a look, thanks for
>>> sharing.
>>>
>>>>
>>>>
>>>>
>>>> National Key Laboratory for Parallel and Distributed
>>>> Processing, College of Computer Science, National University of Defense
>>>> Technology, Changsha, Hunan Province, P.R. China
>>>> 410073
>>>>
>>>>
>>>> At 2014-04-17 17:11:48,"Zhi Yan Liu" <lzy.dev at gmail.com> wrote:
>>>>>On Thu, Apr 17, 2014 at 4:41 PM, lihuiba <magazine.lihuiba at 163.com>
>>>>> wrote:
>>>>>>>IMHO, zero-copy approach is better
>>>>>> VMThunder's "on-demand transferring" is the same thing as your
>>>>>> "zero-copy approach". VMThunder uses iSCSI as the transferring
>>>>>> protocol, which is option #b of yours.
>>>>>>
>>>>>
>>>>>IMO we'd better use a backend-storage-optimized approach to access
>>>>>remote images from the compute node instead of using iSCSI only. And
>>>>>from my experience, I'm sure iSCSI lacks stability under heavy I/O
>>>>>workload in a production environment; it can cause either the VM
>>>>>filesystem to be marked read-only or a VM kernel panic.
>>>>>
>>>>>>
>>>>>>>Under the #b approach, my former experience from our previous,
>>>>>>>similar cloud deployment (not OpenStack) was that with 2 PC-server
>>>>>>>storage nodes (plain *local SAS disks*, without any storage backend)
>>>>>>>+ 2-way/multi-path iSCSI + 1G network bandwidth, we could provision
>>>>>>>500 VMs in a minute.
>>>>>> Suppose booting one instance requires reading 300MB of data; then 500
>>>>>> of them require 150GB. Each of the two storage servers needs to send
>>>>>> data at a rate of 150GB/2/60s = 1.25GB/s on average. This is absolutely
>>>>>> a heavy burden even for high-end storage appliances. In production
>>>>>> systems, such a request (booting 500 VMs in one shot) would
>>>>>> significantly disturb other running instances accessing the same
>>>>>> storage nodes.
>>>>>>
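To spell out the arithmetic above (using 1GB = 1000MB, and assuming every image
byte really is read at boot, i.e. no on-demand loading):

# Back-of-the-envelope check of the numbers quoted above.
image_read_mb = 300.0     # data read per instance, MB
num_vms = 500
storage_nodes = 2
boot_window_s = 60.0      # "in a minute"

total_gb = image_read_mb * num_vms / 1000.0                 # 150 GB in total
per_node_gbps = total_gb / storage_nodes / boot_window_s    # 1.25 GB/s each
print('total: %.0f GB, per storage node: %.2f GB/s' % (total_gb, per_node_gbps))
# A 1Gb NIC tops out around 0.125 GB/s, an order of magnitude below this.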
>>>
>>>btw, I believe the case/numbers are not accurate either, since the
>>>remote image bits could be loaded on demand instead of loading them all
>>>at boot time.
>>>
>>>zhiyan
>>>
>>>>>> VMThunder eliminates this problem by P2P transferring and
>>>>>> on-compute-node caching. Even a PC server with a single 1Gb NIC (a
>>>>>> genuine PC server!) can boot 500 VMs in a minute with ease. For the
>>>>>> first time, VMThunder makes bulk provisioning of VMs practical for
>>>>>> production cloud systems. This is the essential value of VMThunder.
>>>>>>
>>>>>
>>>>>As I said, Nova already has an image caching mechanism, so in this
>>>>>case P2P is just an approach that could be used for downloading or
>>>>>preheating images for that cache.
>>>>>
>>>>>I think P2P transferring/pre-caching sounds like a good way to go, as
>>>>>I mentioned as well, but in this area I'd actually like to see
>>>>>something like zero-copy + CoR. On the one hand we can leverage the
>>>>>capability of downloading image bits on demand via the zero-copy
>>>>>approach; on the other hand we can avoid reading data from the remote
>>>>>image every time by using CoR.
>>>>>
>>>>>zhiyan
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ===================================================
>>>>>> From: Zhi Yan Liu <lzy.dev at gmail.com>
>>>>>> Date: 2014-04-17 0:02 GMT+08:00
>>>>>> Subject: Re: [openstack-dev] [Nova][blueprint] Accelerate the booting
>>>>>> process of a number of vms via VMThunder
>>>>>> To: "OpenStack Development Mailing List (not for usage questions)"
>>>>>> <openstack-dev at lists.openstack.org>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hello Yongquan Fu,
>>>>>>
>>>>>> My thoughts:
>>>>>>
>>>>>> 1. Nova already supports an image caching mechanism. It can cache the
>>>>>> image on a compute host that has provisioned a VM from it before, so
>>>>>> the next provisioning (booting the same image) doesn't need to
>>>>>> transfer it again, as long as the cache manager hasn't cleared it.
>>>>>> 2. P2P transferring and prefetching are still based on a copy
>>>>>> mechanism. IMHO, a zero-copy approach is better, even though
>>>>>> transferring/prefetching could be optimized by such an approach. (I
>>>>>> have not checked VMThunder's "on-demand transferring", but it is a
>>>>>> kind of transferring as well, at least from its literal meaning.)
>>>>>> And btw, IMO, we have two ways to follow the zero-copy idea:
>>>>>> a. when Nova and Glance use the same backend storage, we could use a
>>>>>> storage-specific CoW/snapshot approach to prepare the VM disk instead
>>>>>> of copying/transferring image bits (through HTTP/network or local copy).
>>>>>> b. without "unified" storage, we could attach a volume/LUN from the
>>>>>> backend storage to the compute node as a base image, then do such a
>>>>>> CoW/snapshot on it to prepare the root/ephemeral disk of the VM. This
>>>>>> is just like boot-from-volume, but the difference is that we do the
>>>>>> CoW/snapshot on the Nova side instead of the Cinder/storage side.
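If I understand option #b correctly, on the compute node it amounts to
something like the following (a sketch only; device paths, the CoW store and
the chunk size are placeholder assumptions):

# Option #b sketched with a plain device-mapper snapshot: the attached LUN is
# the shared, read-only base image; per-VM writes go to a local CoW device.
# All paths are placeholders.
import subprocess

BASE_LUN = '/dev/sdc'                # iSCSI-attached base image (read-only)
COW_STORE = '/dev/local_vg/vm1_cow'  # local device holding VM-private writes
VM_DISK = 'vm1_root'                 # result appears as /dev/mapper/vm1_root
CHUNK_SECTORS = 8                    # CoW chunk size: 8 * 512B = 4KB

def sectors(dev):
    return int(subprocess.check_output(['blockdev', '--getsz', dev]).strip())

# 'P' makes the snapshot persistent; unmodified blocks are read from the
# shared base LUN, modified blocks from the local CoW store.
table = '0 %d snapshot %s %s P %d' % (sectors(BASE_LUN), BASE_LUN, COW_STORE,
                                      CHUNK_SECTORS)
subprocess.check_call(['dmsetup', 'create', VM_DISK, '--table', table])
# /dev/mapper/vm1_root can then be given to the hypervisor as the VM root disk.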
>>>>>>
>>>>>> For option #a, we have already got some progress:
>>>>>> https://blueprints.launchpad.net/nova/+spec/image-multiple-location
>>>>>> https://blueprints.launchpad.net/nova/+spec/rbd-clone-image-handler
>>>>>> https://blueprints.launchpad.net/nova/+spec/vmware-clone-image-handler
>>>>>>
>>>>>> Under the #b approach, my former experience from our previous,
>>>>>> similar cloud deployment (not OpenStack) was that with 2 PC-server
>>>>>> storage nodes (plain *local SAS disks*, without any storage backend)
>>>>>> + 2-way/multi-path iSCSI + 1G network bandwidth, we could provision
>>>>>> 500 VMs in a minute.
>>>>>>
>>>>>> On the VMThunder topic, I think it sounds like a good idea; IMO P2P
>>>>>> and prefetching are valuable optimization approaches for image
>>>>>> transferring.
>>>>>>
>>>>>> zhiyan
>>>>>>
>>>>>> On Wed, Apr 16, 2014 at 9:14 PM, yongquan Fu <quanyongf at gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Dear all,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  We would like to present an extension to the VM-booting functionality
>>>>>>> of Nova for cases where a number of homogeneous VMs need to be launched
>>>>>>> at the same time.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> The motivation for our work is to increase the speed of provisioning
>>>>>>> VMs for large-scale scientific computing and big data processing. In
>>>>>>> those cases, we often need to boot tens or hundreds of virtual machine
>>>>>>> instances at the same time.
>>>>>>>
>>>>>>>
>>>>>>>     Currently, under OpenStack, we found that creating a large number
>>>>>>> of virtual machine instances is very time-consuming. The reason is that
>>>>>>> the booting procedure is a centralized operation that involves
>>>>>>> performance bottlenecks. Before a virtual machine can actually be
>>>>>>> started, OpenStack either copies the image file (Swift) or attaches the
>>>>>>> image volume (Cinder) from the storage server to the compute node via
>>>>>>> the network. Booting a single VM needs to read a large amount of image
>>>>>>> data from the image storage server, so creating a large number of
>>>>>>> virtual machine instances causes a significant workload on those
>>>>>>> servers. The servers become quite busy, or even unavailable, during the
>>>>>>> deployment phase, and it takes a very long time before the whole
>>>>>>> virtual machine cluster is usable.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>   Our extension is based on our work on VMThunder, a novel mechanism
>>>>>>> for accelerating the deployment of a large number of virtual machine
>>>>>>> instances. It is written in Python and can be integrated with OpenStack
>>>>>>> easily. VMThunder addresses the problem described above with the
>>>>>>> following improvements: on-demand transferring (network-attached
>>>>>>> storage), compute-node caching, P2P transferring and prefetching.
>>>>>>> VMThunder is a scalable and cost-effective accelerator for bulk
>>>>>>> provisioning of virtual machines.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>   We hope to receive your feedback. Any comments are extremely
>>>>>>> welcome. Thanks in advance.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> PS:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> VMThunder enhanced nova blueprint:
>>>>>>> https://blueprints.launchpad.net/nova/+spec/thunderboost
>>>>>>>
>>>>>>>  VMThunder standalone project: https://launchpad.net/vmthunder;
>>>>>>>
>>>>>>>  VMThunder prototype: https://github.com/lihuiba/VMThunder
>>>>>>>
>>>>>>>  VMThunder etherpad: https://etherpad.openstack.org/p/vmThunder
>>>>>>>
>>>>>>>  VMThunder portal: http://www.vmthunder.org/
>>>>>>>
>>>>>>> VMThunder paper:
>>>>>>> http://www.computer.org/csdl/trans/td/preprint/06719385.pdf
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>   Regards
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>   vmThunder development group
>>>>>>>
>>>>>>>   PDL
>>>>>>>
>>>>>>>   National University of Defense Technology
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Yongquan Fu
>>>>>> PhD, Assistant Professor,
>>>>>> National Key Laboratory for Parallel and Distributed
>>>>>> Processing, College of Computer Science, National University of Defense
>>>>>> Technology, Changsha, Hunan Province, P.R. China
>>>>>> 410073
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>