[openstack-dev] [Nova][blueprint] Accelerate the booting process of a number of vms via VMThunder
lihuiba
magazine.lihuiba at 163.com
Thu Apr 17 13:33:12 UTC 2014
>IMO we'd better use a backend-storage-optimized approach to access
>remote images from compute nodes instead of using iSCSI only. And from
>my experience, I'm sure iSCSI lacks stability under heavy I/O
>workload in production environments; it can cause the VM filesystem
>to be marked read-only, or even a VM kernel panic.
Yes, in this situation the problem lies in the backend storage, so no other
protocol will perform better. However, P2P transferring greatly reduces the
workload on the backend storage, and thereby increases responsiveness.
>As I said, Nova already has an image caching mechanism, so in
>this case P2P is just an approach that could be used for downloading or
>preheating images for that cache.
Nova's image caching is file-level, while VMThunder's is block-level. And
VMThunder works in conjunction with Cinder, not Glance. VMThunder
currently uses Facebook's flashcache to implement caching; dm-cache and
bcache are also options for the future.
>I think P2P transferring/pre-caching sounds a good way to go, as I
>mentioned as well, but actually for this area I'd like to see something
>like zero-copy + CoR. On one hand we can leverage the capability of
>downloading image bits on demand via the zero-copy approach; on the
>other hand we can avoid reading data from the remote image every time
>by CoR.
Yes, on-demand transferring is what you mean by "zero-copy", and caching
is something close to CoR. In fact, we are working on a kernel module called
foolcache that realizes a true CoR. See https://github.com/lihuiba/dm-foolcache.
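To make the copy-on-read idea concrete, here is a minimal sketch (illustrative only; dm-foolcache works at the block-device layer in the kernel, and these names are invented for the example): each block is fetched from the remote image the first time it is read and written into a local cache, so later reads of the same block never touch the storage network.

```python
# Minimal copy-on-read (CoR) sketch. All names are illustrative; a real
# implementation operates on block devices in kernel space, not in Python.

class CopyOnReadImage:
    def __init__(self, remote_read, block_size=4096):
        self.remote_read = remote_read  # callable: block index -> bytes
        self.block_size = block_size
        self.local_cache = {}           # block index -> cached bytes
        self.remote_fetches = 0         # counter, for demonstration only

    def read_block(self, index):
        # First read of a block: fetch from the remote image, cache locally.
        if index not in self.local_cache:
            self.local_cache[index] = self.remote_read(index)
            self.remote_fetches += 1
        # Every subsequent read is served from the local cache.
        return self.local_cache[index]
```

Reading the same block twice performs only one remote fetch; combined with on-demand transfer, only the blocks a VM actually touches are ever moved over the network.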
National Key Laboratory for Parallel and Distributed
Processing, College of Computer Science, National University of Defense
Technology, Changsha, Hunan Province, P.R. China
410073
At 2014-04-17 17:11:48,"Zhi Yan Liu" <lzy.dev at gmail.com> wrote:
>On Thu, Apr 17, 2014 at 4:41 PM, lihuiba <magazine.lihuiba at 163.com> wrote:
>>>IMHO, zero-copy approach is better
>> VMThunder's "on-demand transferring" is the same thing as your "zero-copy
>> approach".
>> VMThunder uses iSCSI as the transferring protocol, which is option #b of
>> yours.
>>
>
>IMO we'd better use a backend-storage-optimized approach to access
>remote images from compute nodes instead of using iSCSI only. And from
>my experience, I'm sure iSCSI lacks stability under heavy I/O
>workload in production environments; it can cause the VM filesystem
>to be marked read-only, or even a VM kernel panic.
>
>>
>>>Under the #b approach, my former experience from our previous similar
>>>cloud deployment (not OpenStack) was that: with 2 PC storage server
>>>nodes (plain *local SAS disks*, without any storage backend) +
>>>2-way/multi-path iSCSI + 1G network bandwidth, we could provision 500
>>>VMs in a minute.
>> Suppose booting one instance requires reading 300MB of data; then 500 of
>> them require 150GB. Each of the storage servers needs to send data at a
>> rate of 150GB/2/60 = 1.25GB/s on average. This is a heavy burden even
>> for high-end storage appliances. In production systems, such a request
>> (booting 500 VMs in one shot) will significantly disturb other running
>> instances accessing the same storage nodes.
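A quick back-of-the-envelope check of the figures above (the 300 MB-per-boot read volume is the thread's own assumption):

```python
# Back-of-envelope check of the centralized-storage load discussed above.
vms = 500
read_per_vm_gb = 0.3           # ~300 MB of image data read per boot (assumed)
servers = 2                    # two storage nodes
window_s = 60                  # boot all VMs within one minute

total_gb = vms * read_per_vm_gb                  # 150 GB in total
per_server_rate = total_gb / servers / window_s  # GB/s each server must sustain
print(per_server_rate)                           # 1.25
```

For comparison, a single 1 Gb/s NIC moves at most about 0.125 GB/s, an order of magnitude short of that rate, which is why offloading reads to P2P peers and local caches matters.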
>>
>> VMThunder eliminates this problem by P2P transferring and on-compute-node
>> caching. Even a PC server with a single 1Gb NIC (a true commodity PC
>> server!) can boot 500 VMs in a minute with ease. For the first time,
>> VMThunder makes bulk provisioning of VMs practical for production cloud
>> systems. This is the essential value of VMThunder.
>>
>
>As I said, Nova already has an image caching mechanism, so in
>this case P2P is just an approach that could be used for downloading or
>preheating images for that cache.
>
>I think P2P transferring/pre-caching sounds a good way to go, as I
>mentioned as well, but actually for this area I'd like to see something
>like zero-copy + CoR. On one hand we can leverage the capability of
>downloading image bits on demand via the zero-copy approach; on the
>other hand we can avoid reading data from the remote image every time
>by CoR.
>
>zhiyan
>
>>
>>
>>
>> ===================================================
>> From: Zhi Yan Liu <lzy.dev at gmail.com>
>> Date: 2014-04-17 0:02 GMT+08:00
>> Subject: Re: [openstack-dev] [Nova][blueprint] Accelerate the booting
>> process of a number of vms via VMThunder
>> To: "OpenStack Development Mailing List (not for usage questions)"
>> <openstack-dev at lists.openstack.org>
>>
>>
>>
>> Hello Yongquan Fu,
>>
>> My thoughts:
>>
>> 1. Currently Nova already supports an image caching mechanism. It
>> caches the image on a compute host that a VM was previously provisioned
>> from, so the next provisioning (booting the same image) doesn't need to
>> transfer it again, unless the cache manager has cleared it.
>> 2. P2P transferring and prefetching are still based on a copy
>> mechanism; IMHO, a zero-copy approach is better, and even
>> transferring/prefetching could be optimized by such an approach. (I have
>> not checked the "on-demand transferring" of VMThunder, but it is a kind of
>> transferring as well, at least going by its literal meaning.)
>> And btw, IMO, we have two ways to follow the zero-copy idea:
>> a. when Nova and Glance use the same backend storage, we could use a
>> storage-specific CoW/snapshot approach to prepare the VM disk instead of
>> copying/transferring image bits (over HTTP/network or by local copy).
>> b. without "unified" storage, we could attach a volume/LUN to the compute
>> node from backend storage as a base image, then do such a CoW/snapshot
>> on it to prepare the root/ephemeral disk of the VM. This is just like
>> boot-from-volume, but the difference is that we do the CoW/snapshot on
>> the Nova side instead of the Cinder/storage side.
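Option #b can be sketched as a copy-on-write overlay over the attached base volume (a simplified model under stated assumptions; a real deployment would use something like a qcow2 overlay or a device-mapper snapshot target, and all names here are invented for illustration):

```python
# Minimal copy-on-write (CoW) sketch for option #b: reads fall through to
# the read-only shared base volume; writes land in a per-VM overlay, so
# the shared base image is never modified.

class CowDisk:
    def __init__(self, base_read):
        self.base_read = base_read  # callable: block index -> bytes
        self.overlay = {}           # this VM's privately written blocks

    def read_block(self, index):
        # Prefer the VM's own written data; otherwise read the base image.
        if index in self.overlay:
            return self.overlay[index]
        return self.base_read(index)

    def write_block(self, index, data):
        # Writes never touch the shared base volume.
        self.overlay[index] = data
```

Many VMs can then share one attached base volume, each holding only a small private overlay, which is what makes the snapshot approach cheap compared with copying the whole image per VM.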
>>
>> For option #a, we have already got some progress:
>> https://blueprints.launchpad.net/nova/+spec/image-multiple-location
>> https://blueprints.launchpad.net/nova/+spec/rbd-clone-image-handler
>> https://blueprints.launchpad.net/nova/+spec/vmware-clone-image-handler
>>
>> Under the #b approach, my former experience from our previous similar
>> cloud deployment (not OpenStack) was that: with 2 PC storage server
>> nodes (plain *local SAS disks*, without any storage backend) +
>> 2-way/multi-path iSCSI + 1G network bandwidth, we could provision 500
>> VMs in a minute.
>>
>> As for the VMThunder topic, I think it sounds like a good idea; IMO P2P
>> transferring and prefetching are valuable optimizations for image delivery.
>>
>> zhiyan
>>
>> On Wed, Apr 16, 2014 at 9:14 PM, yongquan Fu <quanyongf at gmail.com> wrote:
>>>
>>> Dear all,
>>>
>>>
>>>
>>> We would like to present an extension to the VM-booting functionality of
>>> Nova for cases where a number of homogeneous VMs need to be launched at
>>> the same time.
>>>
>>>
>>>
>>> The motivation for our work is to increase the speed of provisioning VMs
>>> for large-scale scientific computing and big data processing. In such
>>> cases, we often need to boot tens or hundreds of virtual machine
>>> instances at the same time.
>>>
>>>
>>> Currently, with OpenStack, we found that creating a large number of
>>> virtual machine instances is very time-consuming. The reason is that the
>>> booting procedure is a centralized operation with performance
>>> bottlenecks. Before a virtual machine can actually be started, OpenStack
>>> either copies the image file (Swift) or attaches the image volume
>>> (Cinder) from a storage server to the compute node via the network.
>>> Booting a single VM needs to read a large amount of image data from the
>>> image storage server, so creating a large number of virtual machine
>>> instances causes a significant workload on those servers. The servers
>>> become quite busy, even unavailable, during the deployment phase, and it
>>> takes a very long time before the whole virtual machine cluster is
>>> usable.
>>>
>>>
>>>
>>> Our extension is based on our work on VMThunder, a novel mechanism for
>>> accelerating the deployment of a large number of virtual machine
>>> instances. It is written in Python and can be integrated with OpenStack
>>> easily. VMThunder addresses the problem described above through the
>>> following improvements: on-demand transferring (network-attached
>>> storage), compute-node caching, P2P transferring, and prefetching.
>>> VMThunder is a scalable and cost-effective accelerator for bulk
>>> provisioning of virtual machines.
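To see why P2P transferring scales, consider an idealized fan-out model (a sketch only, not VMThunder's actual protocol): once a compute node holds the image it can serve peers, so coverage grows geometrically per round instead of being bounded by the seed server's bandwidth.

```python
# Idealized P2P distribution model: every node that already holds the
# image uploads it to `fanout` new peers per round. This is a sketch,
# not VMThunder's real protocol.

def rounds_to_serve(n_nodes, fanout=2):
    """Rounds until all n_nodes hold the image, starting from one seed,
    when every holder feeds `fanout` newcomers per round."""
    have = 1
    rounds = 0
    while have < n_nodes:
        have += have * fanout  # each holder serves `fanout` new peers
        rounds += 1
    return rounds

# With a centralized server, 500 transfers queue on one seed's bandwidth;
# with P2P and fanout 2, coverage triples each round.
print(rounds_to_serve(500))  # 6
```

Six rounds of parallel 300 MB transfers over 1 Gb links fit comfortably within a one-minute window, which is consistent with the 500-VMs-in-a-minute claim above.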
>>>
>>>
>>>
>>> We hope to receive your feedback. Any comments are extremely welcome.
>>> Thanks in advance.
>>>
>>>
>>>
>>> PS:
>>>
>>>
>>>
>>> VMThunder enhanced nova blueprint:
>>> https://blueprints.launchpad.net/nova/+spec/thunderboost
>>>
>>> VMThunder standalone project: https://launchpad.net/vmthunder;
>>>
>>> VMThunder prototype: https://github.com/lihuiba/VMThunder
>>>
>>> VMThunder etherpad: https://etherpad.openstack.org/p/vmThunder
>>>
>>> VMThunder portal: http://www.vmthunder.org/
>>>
>>> VMThunder paper:
>>> http://www.computer.org/csdl/trans/td/preprint/06719385.pdf
>>>
>>>
>>>
>>> Regards
>>>
>>>
>>>
>>> vmThunder development group
>>>
>>> PDL
>>>
>>> National University of Defense Technology
>>>
>>>
>>> _______________________________________________
>>> OpenStack-dev mailing list
>>> OpenStack-dev at lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>
>>
>>
>>
>> --
>> Yongquan Fu
>> PhD, Assistant Professor,
>> National Key Laboratory for Parallel and Distributed
>> Processing, College of Computer Science, National University of Defense
>> Technology, Changsha, Hunan Province, P.R. China
>> 410073
>>
>>
>