[openstack-dev] [Nova][blueprint] Accelerate the booting process of a number of vms via VMThunder
Sheng Bo Hou
sbhou at cn.ibm.com
Tue Apr 22 02:59:43 UTC 2014
I support the idea Huiba has proposed, and I am also thinking about how
to optimize large data transfers (for example, 100G in a short time).
I registered two blueprints in nova-specs: one for an image upload
plug-in that uploads images to glance
(https://review.openstack.org/#/c/84671/), the other for a data transfer
plug-in (https://review.openstack.org/#/c/87207/) for data migration among
nova nodes. Besides HTTP, I would like to see other transfer protocols,
such as FTP, BitTorrent, p2p, etc., implemented for data transfer in
OpenStack.
Data transfer has many use cases. I group them into two categories.
Please feel free to comment on them.
1. The machines are located in one network, e.g. one domain, one cluster,
etc. The characteristic is that the machines can access each other directly
via their IP addresses (VPN is beyond consideration). In this case, data can
be transferred via iSCSI, NFS, and definitely zero-copy, as Zhiyan mentioned.
2. The machines are located in different networks, e.g. two data centers,
two firewalls, etc. The characteristic is that the machines cannot access
each other directly via their IP addresses (VPN is beyond consideration).
The machines are isolated, so they cannot be connected with iSCSI, NFS, etc.
In this case, data has to go via protocols such as HTTP, FTP, p2p, etc.
I am not sure whether zero-copy can work in this case. Zhiyan, please
help me with this question.
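To make the two cases concrete, here is a minimal Python sketch of how a
caller might narrow down candidate transports from reachability alone. The
names `Reachability` and `candidate_transports` are hypothetical and are not
part of any OpenStack API; the protocol lists simply repeat the ones named
above.

```python
from enum import Enum


class Reachability(Enum):
    """Hypothetical labels for the two cases described above."""
    DIRECT = "direct"      # case 1: machines reach each other by IP address
    ISOLATED = "isolated"  # case 2: machines cannot reach each other directly


# Candidate transports per case, taken from the text above.
# Whether zero-copy can work in the isolated case is left open, so it is
# listed only for the direct case.
_CANDIDATES = {
    Reachability.DIRECT: ["iSCSI", "NFS", "zero-copy"],
    Reachability.ISOLATED: ["HTTP", "FTP", "p2p"],
}


def candidate_transports(reachability):
    """Return a copy of the candidate transport list for the given case."""
    return list(_CANDIDATES[reachability])


if __name__ == "__main__":
    for case in Reachability:
        print(case.value, "->", ", ".join(candidate_transports(case)))
```

This only narrows the choice; picking one transport from the list still
depends on the use case (number of sources and targets, data size, etc.).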
I guess that for data transfer, including image downloading, image
uploading, live migration, etc., OpenStack needs to take the above two
categories into account. It is hard to say that one protocol is better
than another, or that one approach prevails over another (BitTorrent is
very cool, but if there is only one source and only one target, it would
not be much faster than a direct FTP). The key is the use case (FYI:
http://amigotechnotes.wordpress.com/2013/12/23/file-transmission-with-different-sharing-solution-on-nas/
).
Jay Pipes has suggested we figure out a blueprint for a separate library
dedicated to data (byte) transfer, which could be put in oslo and used by
any project in need (hoping Jay can chime in :-)). Huiba, Zhiyan, everyone
else, do you think a blueprint for data transfer in oslo could work?
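As a starting point for discussion, a pluggable transfer library could look
roughly like the sketch below. Everything here is an assumption made for
illustration: `TransferDriver`, `register`, `get_driver` and
`LocalCopyDriver` are hypothetical names, not an existing oslo interface, and
the only driver shown is a plain local file copy standing in for a real
protocol.

```python
import abc
import os
import shutil

# Registry mapping a scheme name (e.g. "file") to a driver class.
_DRIVERS = {}


class TransferDriver(abc.ABC):
    """Hypothetical interface that each transfer plug-in would implement."""

    scheme = None  # short name used to look the driver up

    @abc.abstractmethod
    def transfer(self, source, destination):
        """Move the bytes at source to destination; return bytes written."""


def register(driver_cls):
    """Class decorator that adds a driver to the registry."""
    _DRIVERS[driver_cls.scheme] = driver_cls
    return driver_cls


def get_driver(scheme):
    """Instantiate the driver registered for scheme."""
    if scheme not in _DRIVERS:
        raise ValueError("no transfer driver registered for %r" % scheme)
    return _DRIVERS[scheme]()


@register
class LocalCopyDriver(TransferDriver):
    """Stand-in driver: copies a local file. Real plug-ins (HTTP, FTP,
    BitTorrent, ...) would implement the same method."""

    scheme = "file"

    def transfer(self, source, destination):
        shutil.copyfile(source, destination)
        return os.path.getsize(destination)
```

A caller would then write `get_driver("file").transfer(src, dst)` and could
switch protocols by changing only the scheme string, which is the main point
of keeping the byte transfer behind one interface.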
Best wishes,
Vincent Hou (侯胜博)
Staff Software Engineer, Open Standards and Open Source Team, Emerging
Technology Institute, IBM China Software Development Lab
Tel: 86-10-82450778 Fax: 86-10-82453660
Notes ID: Sheng Bo Hou/China/IBM at IBMCN E-mail: sbhou at cn.ibm.com
Address:3F Ring, Building 28 Zhongguancun Software Park, 8 Dongbeiwang
West Road, Haidian District, Beijing, P.R.C.100193
地址:北京市海淀区东北旺西路8号中关村软件园28号楼环宇大厦3层 邮编:100193
From: Zhi Yan Liu <lzy.dev at gmail.com>
Date: 2014/04/18 23:33
Reply-To: "OpenStack Development Mailing List (not for usage questions)"
<openstack-dev at lists.openstack.org>
To: "OpenStack Development Mailing List (not for usage questions)"
<openstack-dev at lists.openstack.org>
Subject: Re: [openstack-dev] [Nova][blueprint] Accelerate the booting process of a
number of vms via VMThunder
On Fri, Apr 18, 2014 at 10:52 PM, lihuiba <magazine.lihuiba at 163.com> wrote:
>>btw, I see, but at the moment we have fixed it in the network interface
>>device driver instead of working around it by limiting network traffic,
>>which slows things down.
> Which kind of driver, in host kernel, in guest kernel or in openstack?
>
In the compute host kernel; it is not related to OpenStack.
>
>
>>Some work has been done in Glance
>>(https://blueprints.launchpad.net/glance/+spec/glance-cinder-driver ),
>>but more work still needs to be done, I'm sure. Some things are still
>>being drafted, and some dependencies need to be resolved as well.
> I read the blueprints carefully, but still have some doubts.
> Will it store an image as a single volume in cinder?
Yes
> Or store all image files
> in one shared volume (with a file system on the volume, of course)?
> Openstack already has support to convert an image to a volume, and to boot
> from a volume. Are these features similar to this blueprint?
Not similar, but they could be leveraged for this case.
>
I would prefer to discuss these details on IRC. (I also read through all
of the VMThunder code earlier today, in my timezone, and I have some
questions as well.)
zhiyan
>
> Huiba Li
>
> National Key Laboratory for Parallel and Distributed
> Processing, College of Computer Science, National University of Defense
> Technology, Changsha, Hunan Province, P.R. China
> 410073
>
>
> At 2014-04-18 12:14:25,"Zhi Yan Liu" <lzy.dev at gmail.com> wrote:
>>On Fri, Apr 18, 2014 at 10:53 AM, lihuiba <magazine.lihuiba at 163.com> wrote:
>>>>It's not 100% true, in my case at least. We fixed this problem in the
>>>>network interface driver, which was actually what caused the kernel
>>>>panic and readonly issues under heavy networking workload.
>>>
>>> Network traffic control could help. The point is to ensure no instance
>>> is starved to death. Traffic control can be done with tc.
>>>
>>
>>btw, I see, but at the moment we have fixed it in the network interface
>>device driver instead of working around it by limiting network traffic,
>>which slows things down.
>>
>>>
>>>
>>>>btw, we are doing some work to make Glance integrate Cinder as a
>>>>unified block storage backend.
>>> That sounds interesting. Are there any more materials?
>>>
>>
>>Some work has been done in Glance
>>(https://blueprints.launchpad.net/glance/+spec/glance-cinder-driver ),
>>but more work still needs to be done, I'm sure. Some things are still
>>being drafted, and some dependencies need to be resolved as well.
>>
>>>
>>>
>>> At 2014-04-18 06:05:23,"Zhi Yan Liu" <lzy.dev at gmail.com> wrote:
>>>>Replied as inline comments.
>>>>
>>>>On Thu, Apr 17, 2014 at 9:33 PM, lihuiba <magazine.lihuiba at 163.com> wrote:
>>>>>>IMO we'd better use a backend-storage-optimized approach to access
>>>>>>the remote image from the compute node instead of using iSCSI only. And
>>>>>>from my experience, I'm sure iSCSI lacks stability under heavy I/O
>>>>>>workload in a production environment; it could cause either the VM
>>>>>>filesystem to be marked as readonly or a VM kernel panic.
>>>>>
>>>>> Yes, in this situation, the problem lies in the backend storage, so no
>>>>> other protocol will perform better. However, P2P transferring will
>>>>> greatly reduce the workload on the backend storage, and so increase
>>>>> responsiveness.
>>>>>
>>>>
>>>>It's not 100% true, in my case at least. We fixed this problem in the
>>>>network interface driver, which was actually what caused the kernel
>>>>panic and readonly issues under heavy networking workload.
>>>>
>>>>>
>>>>>
>>>>>>As I said, Nova currently already has an image caching mechanism, so
>>>>>>in this case P2P is just an approach that could be used for downloading
>>>>>>or preheating the image cache.
>>>>>
>>>>> Nova's image caching is file-level, while VMThunder's is block-level.
>>>>> And VMThunder is for working in conjunction with Cinder, not Glance.
>>>>> VMThunder currently uses Facebook's flashcache to realize caching, and
>>>>> dm-cache and bcache are also options in the future.
>>>>>
>>>>
>>>>Hm, if you mean bcache, dm-cache and flashcache, I'm just wondering
>>>>whether they could be leveraged at the operations/best-practice level.
>>>>
>>>>btw, we are doing some work to make Glance integrate Cinder as a
>>>>unified block storage backend.
>>>>
>>>>>
>>>>>>I think P2P transferring/pre-caching sounds like a good way to go, as I
>>>>>>mentioned as well, but for this area I'd actually like to see something
>>>>>>like zero-copy + CoR. On one hand we can leverage the capability of
>>>>>>downloading image bits on demand via the zero-copy approach; on the
>>>>>>other hand we can avoid reading data from the remote image every time
>>>>>>via CoR.
>>>>>
>>>>> Yes, on-demand transferring is what you mean by "zero-copy", and
>>>>> caching is something close to CoR. In fact, we are working on a kernel
>>>>> module called foolcache that realizes a true CoR. See
>>>>> https://github.com/lihuiba/dm-foolcache.
>>>>>
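The combination of on-demand transfer and copy-on-read (CoR) discussed here
can be sketched in a few lines of Python. This is an illustration only, not
foolcache or VMThunder code: `CopyOnReadImage`, `fetch_from_remote` and the
in-memory `REMOTE_IMAGE` are hypothetical stand-ins for a block device, a
network read and a remote image.

```python
# Stand-in for a remote image: 16 KiB of deterministic bytes.
REMOTE_IMAGE = bytes(range(256)) * 64


def fetch_from_remote(index, block_size):
    """Pretend network read: return one block of the remote image."""
    start = index * block_size
    return REMOTE_IMAGE[start:start + block_size]


class CopyOnReadImage:
    """Fetch a block only when it is first read, then keep a local copy."""

    def __init__(self, fetch_block, block_size=4096):
        self._fetch_block = fetch_block
        self._block_size = block_size
        self._cache = {}       # block index -> bytes held locally
        self.remote_reads = 0  # how many blocks went over the "network"

    def read_block(self, index):
        if index not in self._cache:
            # On-demand transfer: nothing is copied before it is needed.
            self._cache[index] = self._fetch_block(index, self._block_size)
            self.remote_reads += 1
        # Copy-on-read: later reads of the same block are served locally.
        return self._cache[index]


if __name__ == "__main__":
    image = CopyOnReadImage(fetch_from_remote)
    image.read_block(1)
    image.read_block(1)
    print("remote reads:", image.remote_reads)
```

Only the blocks a VM actually touches are transferred, and each of them is
transferred once per compute node, which is why the remote storage sees far
less traffic than with a full image copy.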
>>>>
>>>>Yup. And it's really interesting to me; I will take a look. Thanks for
>>>>sharing.
>>>>
>>>>>
>>>>>
>>>>>
>>>>> National Key Laboratory for Parallel and Distributed
>>>>> Processing, College of Computer Science, National University of Defense
>>>>> Technology, Changsha, Hunan Province, P.R. China
>>>>> 410073
>>>>>
>>>>>
>>>>> At 2014-04-17 17:11:48,"Zhi Yan Liu" <lzy.dev at gmail.com> wrote:
>>>>>>On Thu, Apr 17, 2014 at 4:41 PM, lihuiba <magazine.lihuiba at 163.com> wrote:
>>>>>>>>IMHO, zero-copy approach is better
>>>>>>> VMThunder's "on-demand transferring" is the same thing as your
>>>>>>> "zero-copy approach".
>>>>>>> VMThunder uses iSCSI as the transferring protocol, which is your
>>>>>>> option #b.
>>>>>>>
>>>>>>
>>>>>>IMO we'd better use a backend-storage-optimized approach to access
>>>>>>the remote image from the compute node instead of using iSCSI only. And
>>>>>>from my experience, I'm sure iSCSI lacks stability under heavy I/O
>>>>>>workload in a production environment; it could cause either the VM
>>>>>>filesystem to be marked as readonly or a VM kernel panic.
>>>>>>
>>>>>>>
>>>>>>>>Under the #b approach, my experience from our previous similar
>>>>>>>>Cloud deployment (not OpenStack) was that, with 2 PC server storage
>>>>>>>>nodes (general *local SAS disk*, without any storage backend) +
>>>>>>>>2-way/multi-path iSCSI + 1G network bandwidth, we could provision
>>>>>>>>500 VMs in a minute.
>>>>>>> Suppose booting one instance requires reading 300MB of data, so 500
>>>>>>> instances require 150GB. Each of the storage servers needs to send
>>>>>>> data at a rate of 150GB/2/60 = 1.25GB/s on average. This is absolutely
>>>>>>> a heavy burden even for high-end storage appliances. In production
>>>>>>> systems, this request (booting 500 VMs in one shot) will significantly
>>>>>>> disturb other running instances accessing the same storage nodes.
>>>>>>>
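The estimate above can be reproduced with a short Python sketch. The inputs
are the figures quoted in the message (300MB per boot, 500 VMs, 2 storage
nodes, 60 seconds); the conversion to Gbit/s and the function name
`per_node_rate` are additions for illustration, and decimal units
(1GB = 1000MB) are assumed.

```python
def per_node_rate(mb_per_boot, vm_count, storage_nodes, window_seconds):
    """Return (total GB, GB/s per storage node, Gbit/s per storage node)."""
    total_gb = mb_per_boot * vm_count / 1000          # decimal GB
    gb_per_s = total_gb / storage_nodes / window_seconds
    gbit_per_s = gb_per_s * 8                         # bytes -> bits
    return total_gb, gb_per_s, gbit_per_s


if __name__ == "__main__":
    total, rate_gb, rate_gbit = per_node_rate(300, 500, 2, 60)
    print("total data: %.0f GB" % total)
    print("per node:   %.2f GB/s (%.0f Gbit/s)" % (rate_gb, rate_gbit))
```

Under these assumptions each storage node would have to sustain 10 Gbit/s,
ten times a 1 Gbit/s link if that is what "1G network bandwidth" refers to.
The estimate holds only if every VM really reads the full 300MB at boot.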
>>>>
>>>>btw, I believe the case/numbers are not accurate either, since remote
>>>>image bits could be loaded on demand instead of loading them all at
>>>>boot stage.
>>>>
>>>>zhiyan
>>>>
>>>>>>> VMThunder eliminates this problem by P2P transferring and
>>>>>>> on-compute-node caching. Even a PC server with one 1Gb NIC (this is a
>>>>>>> true PC server!) can boot 500 VMs in a minute with ease. For the first
>>>>>>> time, VMThunder makes bulk provisioning of VMs practical for
>>>>>>> production cloud systems. This is the essential value of VMThunder.
>>>>>>>
>>>>>>
>>>>>>As I said, Nova currently already has an image caching mechanism, so
>>>>>>in this case P2P is just an approach that could be used for downloading
>>>>>>or preheating the image cache.
>>>>>>
>>>>>>I think P2P transferring/pre-caching sounds like a good way to go, as I
>>>>>>mentioned as well, but for this area I'd actually like to see something
>>>>>>like zero-copy + CoR. On one hand we can leverage the capability of
>>>>>>downloading image bits on demand via the zero-copy approach; on the
>>>>>>other hand we can avoid reading data from the remote image every time
>>>>>>via CoR.
>>>>>>
>>>>>>zhiyan
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ===================================================
>>>>>>> From: Zhi Yan Liu <lzy.dev at gmail.com>
>>>>>>> Date: 2014-04-17 0:02 GMT+08:00
>>>>>>> Subject: Re: [openstack-dev] [Nova][blueprint] Accelerate the booting
>>>>>>> process of a number of vms via VMThunder
>>>>>>> To: "OpenStack Development Mailing List (not for usage questions)"
>>>>>>> <openstack-dev at lists.openstack.org>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hello Yongquan Fu,
>>>>>>>
>>>>>>> My thoughts:
>>>>>>>
>>>>>>> 1. Currently Nova already supports an image caching mechanism. It
>>>>>>> can cache the image on a compute host where a VM was previously
>>>>>>> provisioned from it, so the next provisioning (booting the same
>>>>>>> image) doesn't need to transfer it again unless the cache manager
>>>>>>> has cleared it.
>>>>>>> 2. P2P transferring and prefetching are still based on a copy
>>>>>>> mechanism. IMHO, the zero-copy approach is better, and even
>>>>>>> transferring/prefetching could be optimized by such an approach. (I
>>>>>>> have not checked the "on-demand transferring" of VMThunder, but it is
>>>>>>> a kind of transferring as well, at least from its literal meaning.)
>>>>>>> And btw, IMO, we have two ways to follow the zero-copy idea:
>>>>>>> a. When Nova and Glance use the same backend storage, we could use
>>>>>>> the storage-specific CoW/snapshot approach to prepare the VM disk
>>>>>>> instead of copying/transferring image bits (through HTTP/network or
>>>>>>> local copy).
>>>>>>> b. Without "unified" storage, we could attach a volume/LUN from the
>>>>>>> backend storage to the compute node as a base image, then do such
>>>>>>> CoW/snapshot on it to prepare the root/ephemeral disk of the VM. This
>>>>>>> is just like boot-from-volume, but the difference is that we do
>>>>>>> CoW/snapshot on the Nova side instead of the Cinder/storage side.
>>>>>>>
>>>>>>> For option #a, we have already made some progress:
>>>>>>> https://blueprints.launchpad.net/nova/+spec/image-multiple-location
>>>>>>> https://blueprints.launchpad.net/nova/+spec/rbd-clone-image-handler
>>>>>>> https://blueprints.launchpad.net/nova/+spec/vmware-clone-image-handler
>>>>>>>
>>>>>>> Under the #b approach, my experience from our previous similar
>>>>>>> Cloud deployment (not OpenStack) was that, with 2 PC server storage
>>>>>>> nodes (general *local SAS disk*, without any storage backend) +
>>>>>>> 2-way/multi-path iSCSI + 1G network bandwidth, we could provision
>>>>>>> 500 VMs in a minute.
>>>>>>>
>>>>>>> On the VMThunder topic, I think it sounds like a good idea. IMO P2P
>>>>>>> and prefetching are valuable optimizations for image transferring.
>>>>>>>
>>>>>>> zhiyan
>>>>>>>
>>>>>>> On Wed, Apr 16, 2014 at 9:14 PM, yongquan Fu <quanyongf at gmail.com> wrote:
>>>>>>>>
>>>>>>>> Dear all,
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> We would like to present an extension to the vm-booting
>>>>>>>> functionality of Nova for when a number of homogeneous vms need to
>>>>>>>> be launched at the same time.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The motivation for our work is to increase the speed of provisioning
>>>>>>>> vms for large-scale scientific computing and big data processing. In
>>>>>>>> that case, we often need to boot tens or hundreds of virtual machine
>>>>>>>> instances at the same time.
>>>>>>>>
>>>>>>>>
>>>>>>>> Currently, under OpenStack, we found that creating a large number
>>>>>>>> of virtual machine instances is very time-consuming. The reason is
>>>>>>>> that the booting procedure is a centralized operation that involves
>>>>>>>> performance bottlenecks. Before a virtual machine can actually be
>>>>>>>> started, OpenStack either copies the image file (swift) or attaches
>>>>>>>> the image volume (cinder) from the storage server to the compute node
>>>>>>>> via the network. Booting a single VM needs to read a large amount of
>>>>>>>> image data from the image storage server, so creating a large number
>>>>>>>> of virtual machine instances causes a significant workload on the
>>>>>>>> servers. The servers become quite busy, even unavailable, during the
>>>>>>>> deployment phase. It takes a very long time before the whole virtual
>>>>>>>> machine cluster is usable.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Our extension is based on our work on VMThunder, a novel mechanism
>>>>>>>> for accelerating the deployment of large numbers of virtual machine
>>>>>>>> instances. It is written in Python and can be integrated with
>>>>>>>> OpenStack easily. VMThunder addresses the problem described above
>>>>>>>> with the following improvements: on-demand transferring (network
>>>>>>>> attached storage), compute node caching, P2P transferring and
>>>>>>>> prefetching. VMThunder is a scalable and cost-effective accelerator
>>>>>>>> for bulk provisioning of virtual machines.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> We hope to receive your feedback. Any comments are extremely
>>>>>>>> welcome. Thanks in advance.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> PS:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> VMThunder enhanced nova blueprint:
>>>>>>>> https://blueprints.launchpad.net/nova/+spec/thunderboost
>>>>>>>>
>>>>>>>> VMThunder standalone project: https://launchpad.net/vmthunder;
>>>>>>>>
>>>>>>>> VMThunder prototype: https://github.com/lihuiba/VMThunder
>>>>>>>>
>>>>>>>> VMThunder etherpad: https://etherpad.openstack.org/p/vmThunder
>>>>>>>>
>>>>>>>> VMThunder portal: http://www.vmthunder.org/
>>>>>>>>
>>>>>>>> VMThunder paper:
>>>>>>>> http://www.computer.org/csdl/trans/td/preprint/06719385.pdf
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> vmThunder development group
>>>>>>>>
>>>>>>>> PDL
>>>>>>>>
>>>>>>>> National University of Defense Technology
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> OpenStack-dev mailing list
>>>>>>>> OpenStack-dev at lists.openstack.org
>>>>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Yongquan Fu
>>>>>>> PhD, Assistant Professor,
>>>>>>> National Key Laboratory for Parallel and Distributed
>>>>>>> Processing, College of Computer Science, National University of
>>>>>>> Defense Technology, Changsha, Hunan Province, P.R. China
>>>>>>> 410073
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>