[openstack-dev] [Fuel][Docker] Master node bootstrapping issues

Aleksandr Didenko adidenko at mirantis.com
Mon May 12 11:29:13 UTC 2014


> 4) do lrzipping as weak as possible during the development phase and
> lrzip it strongly only when we do release

We already create the lrzip archive with compression level 2 (the range is
1-9 and the default is 7), so there is not much room for improvement here.
Here are some tests:

== Bare metal 12 cpus, 32 G RAM ==
level 1 decompression total time: 00:00:27.70
level 2 decompression total time: 00:00:29.64

== Virtual server 1 cpu, 1G RAM ==
level 1 decompression total time: 00:05:17.12
level 2 decompression total time: 00:05:22.86
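
For anyone who wants to repeat measurements like these, here is a rough
sketch of a timing wrapper. The helper name time_cmd and the archive name in
the usage comment are made up for illustration. It reports whole seconds
only, so it may differ from how the figures above were collected.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Fuel or lrzip): run a command and print
# its wall-clock time as HH:MM:SS, preserving the command's exit status.
time_cmd() {
    local label="$1"
    shift
    local start end elapsed
    local status=0

    start=$(date +%s)
    "$@" || status=$?
    end=$(date +%s)

    elapsed=$((end - start))
    printf '%s total time: %02d:%02d:%02d\n' "$label" \
        $((elapsed / 3600)) $((elapsed % 3600 / 60)) $((elapsed % 60))
    return "$status"
}

# Example usage (archive name is an assumption):
#   time_cmd "level 2 decompression" lrzuntar archive.tar.lrz
```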


I did some further research on this. Increasing RAM to 4G did not help much
either:

== Virtual server 1 cpu, 4G RAM ==
level 1 decompression total time: 00:05:08.70
level 2 decompression total time: 00:05:12.57

So it looks like the 'lrzuntar' command itself causes this problem on weak
hardware, because it uses "lrzcat | tar", which works very fast if we have
enough memory. If we extract the archive in two separate steps (lrzip -d &&
tar -xf) instead of a single "lrzuntar", the two steps take only ~1m30s in
total (compared to 5+ minutes with a single lrzuntar command). Here are some
test results for the lrzip+tar commands:

== Bare metal 12 cpus, 32 G RAM ==
level 2 decompression total time: 00:00:50.12

== Virtual server 1 cpu, 1G RAM ==
level 2 decompression total time: 00:01:35.95

So "lrzuntar" is faster on powerful hardware (00:29.64 vs 00:50.12 on bare
metal), while lrzip+tar is faster on weak hardware like VMs (01:35.95 vs
05:22.86 on the 1 cpu, 1G RAM server).

I suggest switching to lrzip+tar.
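
A minimal sketch of the two-step extraction, assuming lrzip and tar are on
the PATH. The function name extract_two_step, the temporary file name and the
destination argument are illustrative assumptions, not the actual Fuel
bootstrap code.

```shell
#!/usr/bin/env bash
# Sketch: extract a .tar.lrz archive in two separate steps (lrzip -d, then
# tar -xf) instead of piping lrzcat into tar as lrzuntar does.
# extract_two_step is a hypothetical name used only for this example.
extract_two_step() {
    local archive="$1"
    local dest="${2:-.}"
    local tmp_tar

    mkdir -p "$dest" || return 1
    tmp_tar=$(mktemp "${TMPDIR:-/tmp}/extract-two-step.XXXXXX") || return 1

    # Step 1: decompress to a plain tar file on disk. -o names the output
    # file and -f lets lrzip overwrite the empty file created by mktemp.
    # Step 2: unpack that tar file into the destination directory.
    if lrzip -d -f -o "$tmp_tar" "$archive" && tar -xf "$tmp_tar" -C "$dest"; then
        rm -f "$tmp_tar"
        return 0
    fi

    rm -f "$tmp_tar"
    return 1
}

# Example usage (paths are assumptions):
#   extract_two_step /path/to/archive.tar.lrz /path/to/destination
```

Note that this approach temporarily needs enough free disk space for the
uncompressed tar file, which the piped lrzuntar variant does not.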



On Sat, May 10, 2014 at 8:01 PM, Dmitry Borodaenko <dborodaenko at mirantis.com> wrote:

> FWIW 1GB works fine for me on my laptop; I run the master setup manually.
> So I'm against increasing the RAM requirement: we have better things to
> spend that RAM on.
> On May 10, 2014 1:37 AM, "Mike Scherbakov" <mscherbakov at mirantis.com>
> wrote:
>
>> It is not related to RAM or CPU. I run the installation on my Mac with 1 GB
>> of RAM for the master node, and experience the following:
>>
>>    - Yes, it needs time to bootstrap the admin node.
>>    - As soon as I get the message that the master node is installed, I
>>    immediately open 10.20.0.2:8000 and try to generate a diag snapshot,
>>    and it fails.
>>    - If I wait a few more minutes and try again, it succeeds.
>>
>> It actually seems to me that we simply still do not have
>> https://bugs.launchpad.net/fuel/+bug/1315865 fixed. I'll add more
>> details there, as well as logs.
>>
>> When I checked the logs, I saw:
>>
>>    - For about a minute, astute was not able to connect to MQ. Does that
>>    mean it is still started before MQ is ready?
>>    - shotgun -c /tmp/dump_config >> /var/log/dump.log 2>&1 && cat
>>    /var/www/nailgun/dump/last returned 1
>>
>> When I tried to run diag_snapshot a second time, the command above
>> succeeded with return code 0.
>>
>> So it obviously needs further debugging, and in my opinion, even if we need
>> to increase vCPU or RAM, it should be to no more than 2 vCPU / 2 GB.
>>
>> Vladimir, since you and Matt were changing the code that should run
>> containers in a certain order, I'm looking forward to hearing suggestions
>> from both of you on where and how we should hack it.
>>
>> Thanks,
>>
>>
>> On Sat, May 10, 2014 at 1:04 AM, Vladimir Kuklin <vkuklin at mirantis.com> wrote:
>>
>>> Hi all
>>>
>>> We are still experiencing some issues with master node bootstrapping
>>> after moving to container-based installation.
>>>
>>> First of all, these issues are related to our system tests. We run the
>>> master node on rather small nodes: only 1 GB of RAM and 1 virtual CPU.
>>> As we are using a strongly lrzipped archive, this does not seem to be
>>> enough and leads to timeouts during deployment of the master node.
>>>
>>> I have several suggestions:
>>>
>>> 1) Increase the amount of RAM for the master node to at least 8 gigabytes
>>> (or do some PCI virtual memory hotplug during master node bootstrapping)
>>> and add an additional vCPU for the master node.
>>> 2) Run system tests in a non-containerized environment (with the env
>>> variable PRODUCTION=prod set)
>>> 3) Split our system tests so that no more than 2 master nodes bootstrap
>>> simultaneously on a single hardware node.
>>> 4) do lrzipping as weak as possible during the development phase and
>>> lrzip it strongly only when we do release
>>> 5) increase bootstrap timeout for the master node in system tests
>>>
>>>
>>> Any input would be appreciated.
>>>
>>> --
>>> Yours Faithfully,
>>> Vladimir Kuklin,
>>> Fuel Library Tech Lead,
>>> Mirantis, Inc.
>>> +7 (495) 640-49-04
>>> +7 (926) 702-39-68
>>> Skype kuklinvv
>>> 45bk3, Vorontsovskaya Str.
>>> Moscow, Russia,
>>> www.mirantis.com <http://www.mirantis.ru/>
>>> www.mirantis.ru
>>> vkuklin at mirantis.com
>>>
>>> _______________________________________________
>>> OpenStack-dev mailing list
>>> OpenStack-dev at lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>>
>>
>>
>> --
>> Mike Scherbakov
>> #mihgen