[openstack-dev] [Nova][blueprint] Accelerate the booting process of a number of vms via VMThunder

lihuiba magazine.lihuiba at 163.com
Fri Apr 18 02:53:08 UTC 2014


>It's not 100% true, at least in my case. We fixed this problem in the
>network interface driver; it was actually the driver that caused the kernel
>panic and read-only issues under heavy networking workload.
Network traffic control could help here. The point is to ensure that no instance
is starved to death; traffic control can be done with tc.
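
For example, a minimal sketch of such a cap (the interface name, rate and other
numbers below are placeholders for illustration, not values from VMThunder):

    # A minimal rate-cap sketch: use a Token Bucket Filter (tbf) qdisc so that
    # one node's image traffic cannot saturate the storage-facing interface.
    # Requires root and the iproute2 "tc" tool; run once per interface.
    import subprocess

    def cap_interface(dev="eth0", rate="800mbit", burst="128kb", latency="50ms"):
        subprocess.check_call([
            "tc", "qdisc", "add", "dev", dev, "root",
            "tbf", "rate", rate, "burst", burst, "latency", latency,
        ])

    if __name__ == "__main__":
        cap_interface()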

>btw, we are doing some work to make Glance integrate with Cinder as a
>unified block storage backend.
That sounds interesting. Are there any more materials on this?

At 2014-04-18 06:05:23,"Zhi Yan Liu" <lzy.dev at gmail.com> wrote:
>Replied as inline comments.
>
>On Thu, Apr 17, 2014 at 9:33 PM, lihuiba <magazine.lihuiba at 163.com> wrote:
>>>IMO we'd better use a backend-storage-optimized approach to access
>>>remote images from the compute node instead of using iSCSI only. And from
>>>my experience, I'm sure iSCSI lacks stability under heavy I/O
>>>workload in production environments; it can cause either the VM filesystem
>>>to be marked read-only or a VM kernel panic.
>>
>> Yes, in this situation the problem lies in the backend storage, so no other
>> protocol will perform better. However, P2P transferring will greatly reduce the
>> workload on the backend storage and thereby increase responsiveness.
>
>It's not 100% true, at least in my case. We fixed this problem in the
>network interface driver; it was actually the driver that caused the kernel
>panic and read-only issues under heavy networking workload.
>
>>
>>
>>>As I said, Nova currently already has an image caching mechanism, so in
>>>this case P2P is just an approach that could be used for downloading or
>>>preheating the image cache.
>>
>> Nova's image caching is file-level, while VMThunder's is block-level. Also,
>> VMThunder is designed to work in conjunction with Cinder, not Glance. VMThunder
>> currently uses Facebook's flashcache to implement caching; dm-cache and
>> bcache are also options for the future.
>>
>
>Hmm, if you mention bcache, dm-cache and flashcache, I'm just wondering whether
>they could be leveraged at the operations/best-practice level.
>
>btw, we are doing some work to make Glance integrate with Cinder as a
>unified block storage backend.
>
>>
>>>I think P2P transferring/pre-caching sounds a good way to go, as I
>>>mentioned as well, but in this area I'd actually like to see something
>>>like zero-copy + CoR. On the one hand we can leverage the capability of
>>>downloading image bits on demand via the zero-copy approach; on the other
>>>hand we can avoid reading data from the remote image every time thanks to
>>>CoR.
>>
>> Yes, on-demand transferring is what you mean by "zero-copy", and caching
>> is something close to CoR. In fact, we are working on a kernel module called
>> foolcache that realizes true CoR. See
>> https://github.com/lihuiba/dm-foolcache.
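
(dm-foolcache itself is a device-mapper target in the kernel; purely to
illustrate the copy-on-read idea in user space, a sketch could look like the
following, where the block size, paths and the in-memory block set are
simplifications, not part of the real implementation:)

    # Copy-on-read (CoR) illustration: a block is fetched from the remote/base
    # device only on its first read, persisted into a local cache file, and all
    # later reads of that block are served locally. A real CoR target would keep
    # a persistent bitmap of cached blocks; the in-memory set here is a shortcut.
    import os

    BLOCK = 4096  # bytes per block (placeholder)

    class CopyOnReadCache:
        def __init__(self, base_path, cache_path):
            self.base = open(base_path, "rb")
            mode = "r+b" if os.path.exists(cache_path) else "w+b"
            self.cache = open(cache_path, mode)
            self.cached = set()

        def read_block(self, index):
            offset = index * BLOCK
            if index not in self.cached:
                self.base.seek(offset)
                data = self.base.read(BLOCK)
                self.cache.seek(offset)
                self.cache.write(data)   # copy on first read
                self.cached.add(index)
                return data
            self.cache.seek(offset)
            return self.cache.read(BLOCK)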
>>
>
>Yup. And it's really interesting to me; I will take a look, thanks for sharing.
>
>>
>>
>>
>> National Key Laboratory for Parallel and Distributed
>> Processing, College of Computer Science, National University of Defense
>> Technology, Changsha, Hunan Province, P.R. China
>> 410073
>>
>>
>> At 2014-04-17 17:11:48,"Zhi Yan Liu" <lzy.dev at gmail.com> wrote:
>>>On Thu, Apr 17, 2014 at 4:41 PM, lihuiba <magazine.lihuiba at 163.com> wrote:
>>>>>IMHO, zero-copy approach is better
>>>> VMThunder's "on-demand transferring" is the same thing as your "zero-copy
>>>> approach". VMThunder uses iSCSI as the transferring protocol, which is your
>>>> option #b.
>>>>
>>>
>>>IMO we'd better use a backend-storage-optimized approach to access
>>>remote images from the compute node instead of using iSCSI only. And from
>>>my experience, I'm sure iSCSI lacks stability under heavy I/O
>>>workload in production environments; it can cause either the VM filesystem
>>>to be marked read-only or a VM kernel panic.
>>>
>>>>
>>>>>Under the #b approach, my prior experience from a previous similar
>>>>>cloud deployment (not OpenStack) was: with 2 PC storage server nodes
>>>>>(ordinary *local SAS disks*, without any storage backend) +
>>>>>2-way/multi-path iSCSI + 1G network bandwidth, we could provision 500
>>>>>VMs in a minute.
>>>> Suppose booting one instance requires reading 300MB of data; then 500 of them
>>>> require 150GB. Each storage server needs to send data at an average rate of
>>>> 150GB/2/60s = 1.25GB/s. This is absolutely a heavy burden even for high-end
>>>> storage appliances. In production systems, such a request (booting 500 VMs in
>>>> one shot) would significantly disturb other running instances accessing the
>>>> same storage nodes.
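
(Spelling out the arithmetic, using the same numbers as above:)

    # Back-of-the-envelope check with the numbers from the example above.
    image_gb = 0.3        # ~300MB read to boot one instance
    vms = 500
    storage_nodes = 2
    seconds = 60          # "500 VMs in a minute"

    total_gb = image_gb * vms                        # 150 GB in total
    per_node_rate = total_gb / storage_nodes / seconds
    print("each storage node must sustain about %.2f GB/s" % per_node_rate)  # ~1.25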
>>>>
>
>btw, I believe the case/numbers are not accurate either, since remote
>image bits could be loaded on demand instead of loading them all at boot
>time.
>
>zhiyan
>
>>>> VMThunder eliminates this problem by P2P transferring and on-compute-node
>>>> caching. Even a PC server with one 1Gb NIC (a true PC server!) can boot
>>>> 500 VMs in a minute with ease. For the first time, VMThunder makes bulk
>>>> provisioning of VMs practical for production cloud systems. This is the
>>>> essential value of VMThunder.
>>>>
>>>
>>>As I said, Nova currently already has an image caching mechanism, so in
>>>this case P2P is just an approach that could be used for downloading or
>>>preheating the image cache.
>>>
>>>I think P2P transferring/pre-caching sounds a good way to go, as I
>>>mentioned as well, but in this area I'd actually like to see something
>>>like zero-copy + CoR. On the one hand we can leverage the capability of
>>>downloading image bits on demand via the zero-copy approach; on the other
>>>hand we can avoid reading data from the remote image every time thanks to
>>>CoR.
>>>
>>>zhiyan
>>>
>>>>
>>>>
>>>>
>>>> ===================================================
>>>> From: Zhi Yan Liu <lzy.dev at gmail.com>
>>>> Date: 2014-04-17 0:02 GMT+08:00
>>>> Subject: Re: [openstack-dev] [Nova][blueprint] Accelerate the booting
>>>> process of a number of vms via VMThunder
>>>> To: "OpenStack Development Mailing List (not for usage questions)"
>>>> <openstack-dev at lists.openstack.org>
>>>>
>>>>
>>>>
>>>> Hello Yongquan Fu,
>>>>
>>>> My thoughts:
>>>>
>>>> 1. Nova already supports an image caching mechanism. It caches the image
>>>> on the compute host that a VM was previously provisioned from, so the next
>>>> provisioning (booting the same image) doesn't need to transfer it again,
>>>> unless the cache manager clears it up.
>>>> 2. P2P transferring and prefetching are still based on a copy mechanism;
>>>> IMHO, a zero-copy approach is better, even though transferring/prefetching
>>>> could be optimized by such an approach. (I have not checked the "on-demand
>>>> transferring" of VMThunder, but it is a kind of transferring as well, at
>>>> least going by its literal meaning.)
>>>> And btw, IMO, we have two ways to follow the zero-copy idea:
>>>> a. when Nova and Glance use the same backend storage, we could use a
>>>> storage-specific CoW/snapshot approach to prepare the VM disk instead of
>>>> copying/transferring image bits (through HTTP/network or local copy).
>>>> b. without "unified" storage, we could attach a volume/LUN to the compute
>>>> node from the backend storage as a base image, then do such a CoW/snapshot
>>>> on it to prepare the root/ephemeral disk of the VM. This is just like
>>>> boot-from-volume, but the difference is that we do the CoW/snapshot on the
>>>> Nova side instead of the Cinder/storage side.
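
A rough sketch of what option #b could look like on the compute node, assuming
the attached LUN appears as a local block device and a local qcow2 overlay
provides the CoW layer (the device path and instance path are placeholders):

    # Option #b sketch: the LUN attached from the backend storage serves as a
    # read-only base image; the VM's root disk is a thin local qcow2 overlay that
    # stores only the blocks the VM writes (CoW done on the compute/Nova side).
    import subprocess

    BASE_LUN = "/dev/disk/by-path/<iscsi-target-path>"          # placeholder
    ROOT_DISK = "/var/lib/nova/instances/<instance-uuid>/disk"  # placeholder

    subprocess.check_call([
        "qemu-img", "create", "-f", "qcow2",
        "-o", "backing_file=%s,backing_fmt=raw" % BASE_LUN,
        ROOT_DISK,
    ])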
>>>>
>>>> For option #a, we have already made some progress:
>>>> https://blueprints.launchpad.net/nova/+spec/image-multiple-location
>>>> https://blueprints.launchpad.net/nova/+spec/rbd-clone-image-handler
>>>> https://blueprints.launchpad.net/nova/+spec/vmware-clone-image-handler
>>>>
>>>> Under the #b approach, my prior experience from a previous similar
>>>> cloud deployment (not OpenStack) was: with 2 PC storage server nodes
>>>> (ordinary *local SAS disks*, without any storage backend) +
>>>> 2-way/multi-path iSCSI + 1G network bandwidth, we could provision 500
>>>> VMs in a minute.
>>>>
>>>> On the VMThunder topic, I think it sounds like a good idea; IMO P2P
>>>> transferring and prefetching are valuable optimizations for image transferring.
>>>>
>>>> zhiyan
>>>>
>>>> On Wed, Apr 16, 2014 at 9:14 PM, yongquan Fu <quanyongf at gmail.com> wrote:
>>>>>
>>>>> Dear all,
>>>>>
>>>>>
>>>>>
>>>>>  We would like to present an extension to the VM-booting functionality of
>>>>> Nova for when a number of homogeneous VMs need to be launched at the same
>>>>> time.
>>>>>
>>>>>
>>>>>
>>>>> The motivation for our work is to increase the speed of provisioning VMs for
>>>>> large-scale scientific computing and big data processing. In such cases, we
>>>>> often need to boot tens or hundreds of virtual machine instances at the same
>>>>> time.
>>>>>
>>>>>
>>>>>     Currently, under OpenStack, we have found that creating a large number of
>>>>> virtual machine instances is very time-consuming. The reason is that the
>>>>> booting procedure is a centralized operation that involves performance
>>>>> bottlenecks. Before a virtual machine can actually be started, OpenStack
>>>>> either copies the image file (Swift) or attaches the image volume (Cinder)
>>>>> from the storage server to the compute node via the network. Booting a single
>>>>> VM needs to read a large amount of image data from the image storage server,
>>>>> so creating a large number of virtual machine instances causes a significant
>>>>> workload on these servers. The servers become quite busy, or even unavailable,
>>>>> during the deployment phase, and it takes a very long time before the whole
>>>>> virtual machine cluster becomes usable.
>>>>>
>>>>>
>>>>>
>>>>>   Our extension is based on our work on VMThunder, a novel mechanism for
>>>>> accelerating the deployment of a large number of virtual machine instances.
>>>>> It is written in Python and can be integrated with OpenStack easily. VMThunder
>>>>> addresses the problem described above with the following improvements:
>>>>> on-demand transferring (network-attached storage), compute-node caching, P2P
>>>>> transferring, and prefetching. VMThunder is a scalable and cost-effective
>>>>> accelerator for bulk provisioning of virtual machines.
>>>>>
>>>>>
>>>>>
>>>>>   We hope to receive your feedback. Any comments are extremely welcome.
>>>>> Thanks in advance.
>>>>>
>>>>>
>>>>>
>>>>> PS:
>>>>>
>>>>>
>>>>>
>>>>> VMThunder enhanced nova blueprint:
>>>>> https://blueprints.launchpad.net/nova/+spec/thunderboost
>>>>>
>>>>>  VMThunder standalone project: https://launchpad.net/vmthunder;
>>>>>
>>>>>  VMThunder prototype: https://github.com/lihuiba/VMThunder
>>>>>
>>>>>  VMThunder etherpad: https://etherpad.openstack.org/p/vmThunder
>>>>>
>>>>>  VMThunder portal: http://www.vmthunder.org/
>>>>>
>>>>> VMThunder paper:
>>>>> http://www.computer.org/csdl/trans/td/preprint/06719385.pdf
>>>>>
>>>>>
>>>>>
>>>>>   Regards
>>>>>
>>>>>
>>>>>
>>>>>   vmThunder development group
>>>>>
>>>>>   PDL
>>>>>
>>>>>   National University of Defense Technology
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Yongquan Fu
>>>> PhD, Assistant Professor,
>>>> National Key Laboratory for Parallel and Distributed
>>>> Processing, College of Computer Science, National University of Defense
>>>> Technology, Changsha, Hunan Province, P.R. China
>>>> 410073
>>>>
>>>
>>
>>
>
>_______________________________________________
>OpenStack-dev mailing list
>OpenStack-dev at lists.openstack.org
>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev