[openstack-dev] [Fuel] Feature Freeze Exception Request: Task Based Deployment in Astute

Vladimir Kuklin vkuklin at mirantis.com
Tue Dec 1 20:01:00 UTC 2015


Hi, Folks

** Intro *

During Iteration 3 our Enhancements Team, as well as other folks, worked on
the feature called "Task Based Deployment with Astute". Here is a link to
its blueprint:
https://blueprints.launchpad.net/fuel/+spec/task-based-deployment-astute

The major implication of completing this feature is that our deployment
process will be drastically optimized, allowing us to decrease the
deployment time of typical clusters by at least 2.5 times (for BVT/CI
cases) and by an order of magnitude for 100-node clusters.

This is achieved by real parallelization of deployment task execution: we
no longer wait for a whole 'deployment group/role' to deploy, but only for
particular tasks to finish. For example, we could start the 'database'
task on secondary controllers as soon as the 'database' task is ready on
the first controller. As our deployment workflow contains only a small
number of synchronization points such as the 'database' task, we will be
able to run the majority of deployment tasks in parallel, shrinking
deployment time to "time-of-deployment-of-the-longest-node". This means
that our standard deployment case for development and testing will take 30
minutes on our CI servers, drastically improving developer and user
experience and shortening overall acceptance testing, bug reproduction
and so on.

This feature also allows one to use the 7.0 role-as-a-plugin feature much
more effectively, as the current split-services-with-plugins feature may
lead to a very suboptimal deployment flow which might take up to *6 hours*
even for the simplest HA cluster, while it would again take *30 minutes*
with the *Task-Based* approach. Also, when multi-roles were used, we ran
several tasks again for each role that used them, making deployment
suboptimal again.
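
To make the difference concrete, here is a minimal sketch in Python that
compares the two scheduling approaches on a toy graph. Everything in it is
an assumption made for illustration: the node names, the task names other
than 'database', the durations and the helper functions are hypothetical
and are not taken from Astute or Nailgun. It also assumes that tasks on one
node are chained by explicit dependencies.

```python
from collections import defaultdict

# Hypothetical task graph: (node, task) -> (duration in minutes, dependencies).
# All names and durations are invented for illustration only.
TASKS = {
    ("controller-1", "netconfig"): (2, []),
    ("controller-1", "database"): (5, [("controller-1", "netconfig")]),
    ("controller-1", "keystone"): (4, [("controller-1", "database")]),
    ("controller-2", "netconfig"): (2, []),
    # Cross-node dependency: wait only for 'database' on the first controller.
    ("controller-2", "database"): (5, [("controller-2", "netconfig"),
                                       ("controller-1", "database")]),
    ("controller-2", "keystone"): (4, [("controller-2", "database")]),
    ("compute-1", "netconfig"): (2, []),
    ("compute-1", "nova-compute"): (6, [("compute-1", "netconfig")]),
}


def finish_times(tasks):
    """Earliest finish time of each task when it starts as soon as
    all of its dependencies have finished."""
    done = {}

    def visit(key):
        if key not in done:
            duration, deps = tasks[key]
            start = max((visit(dep) for dep in deps), default=0)
            done[key] = start + duration
        return done[key]

    for key in tasks:
        visit(key)
    return done


def task_based_time(tasks):
    """Total time when only task-level dependencies are honoured."""
    return max(finish_times(tasks).values())


def group_based_time(tasks, groups):
    """Total time when groups deploy one after another and the nodes
    inside a group deploy in parallel."""
    total = 0
    for group in groups:
        per_node = defaultdict(int)
        for (node, _name), (duration, _deps) in tasks.items():
            if node in group:
                per_node[node] += duration
        total += max(per_node.values())
    return total


GROUPS = [["controller-1"], ["controller-2"], ["compute-1"]]
print(group_based_time(TASKS, GROUPS))  # group after group
print(task_based_time(TASKS))           # task-level synchronization only
```

With these made-up numbers the group-by-group run takes the sum of the
groups, while the task-based run is bounded by the longest dependency
chain, which is the effect described above.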


** Short List of Work Items*

As we started a little late, during Iteration 3 we worked on the design
and specification of this feature so that its introduction brings almost
zero chance of regression and so that it can be disabled. Here is the
summary.

So far we are introducing several pieces of code:
1. New version of the tasks format introducing cross-node dependencies
between tasks
2. Changes to Nailgun
  a. deduplication of tasks for roles [In Progress]
  b. support for the new tasks format [In Progress]
  c. new engine that generates an array of hashes of task info consumable
by the new Astute engine [In Progress]
3. Changes to Astute
  a. Task dependencies parser and visualizer [Ready for review]
  b. Deployment engine capable of graph traversal and reporting [Ready for
review]
  c. Async wrapper for shell-based tasks [Ready for review]
4. Changes to Fuel Library
  a. Add additional fields to existing Fuel Library deployment tasks for
cross-dependencies [In Progress]
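
As an illustration of item 2a above, here is a rough Python sketch of what
deduplicating tasks for a multi-role node could look like. The role names,
task lists and the function name are hypothetical and do not reflect the
actual Nailgun code.

```python
def tasks_for_node(roles, tasks_by_role):
    """Return the tasks for a multi-role node, each task only once,
    keeping the order in which tasks are first seen."""
    seen = set()
    result = []
    for role in roles:
        for task in tasks_by_role.get(role, []):
            if task not in seen:
                seen.add(task)
                result.append(task)
    return result


# Hypothetical role -> tasks mapping, for illustration only.
TASKS_BY_ROLE = {
    "controller": ["hiera", "globals", "netconfig", "database"],
    "cinder": ["hiera", "globals", "netconfig", "cinder-volume"],
}

print(tasks_for_node(["controller", "cinder"], TASKS_BY_ROLE))
```

A node holding both roles would then run the shared tasks once instead of
once per role.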

** Ensuring Minimal Regression Risk and Backward Compatibility*

As we worked on being backward-compatible from day one, this engine is
enabled ONLY when 2 requirements are met:

1. It is globally enabled in Nailgun settings.yaml
2. ALL tasks scheduled for deployment execution have version 2.0.0
The list of work items above may look rather long, but these changes are
isolated and granular and actually affect only the sequence in which tasks
are executed on the nodes. This means that there will be no difference in
how the resulting cluster functions. This feature can be safely disabled
if the user does not want to use it.
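
The two enabling requirements listed above could be expressed roughly as
follows. This is only a sketch: the settings key TASK_DEPLOY_ENABLED and the
'version' field name are assumptions made for the example, not the real
Nailgun names.

```python
REQUIRED_VERSION = "2.0.0"


def use_task_based_engine(settings, tasks):
    """Pick the new engine only if it is globally enabled AND every
    scheduled task declares the required version."""
    # 'TASK_DEPLOY_ENABLED' is a hypothetical settings key.
    globally_enabled = bool(settings.get("TASK_DEPLOY_ENABLED", False))
    # 'version' is a hypothetical task field name.
    all_new_format = all(
        task.get("version") == REQUIRED_VERSION for task in tasks
    )
    return globally_enabled and all_new_format


tasks = [
    {"id": "netconfig", "version": "2.0.0"},
    {"id": "database", "version": "2.0.0"},
]
print(use_task_based_engine({"TASK_DEPLOY_ENABLED": True}, tasks))
```

A single task in the old format, or the global switch being off, makes the
check fall back to the existing engine.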

But a user who does want to work with it can gain an enormous improvement
in speed, in their own engineering/development/testing velocity and in the
Fuel user experience.

** Additional Benefits of the Feature*

Moreover, this feature also improves how the following use cases are
addressed:

*1. When a user deploys a specific set of nodes or tasks*
It will be possible to introduce an additional flag for the deploy/task run
handler in Nailgun to pick up the dependencies of the specified tasks, even
if they are not currently present in the deployment graph. This means that
instead of running

*fuel nodes --node-id 2,3 --deploy*

and watching it fail because node-1 contains some of the tasks required by
nodes 2 and 3, the user will be able to specify an option that populates
the deployment flow with the needed tasks. No more

*fuel nodes --node-id 2 --tasks netconfig*  -> Fail, because you forgot to
specify some of the required tasks, e.g. hiera, globals.
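
Picking up the dependencies of the requested tasks amounts to a transitive
closure over the task graph. A minimal Python sketch follows; the REQUIRES
mapping is an assumption that only reuses the task names mentioned above,
and the real dependencies between these tasks may differ.

```python
# Hypothetical dependency map: task -> tasks it requires.
REQUIRES = {
    "hiera": [],
    "globals": ["hiera"],
    "netconfig": ["globals"],
}


def with_dependencies(requested, requires):
    """Return the requested tasks plus everything they transitively
    require, ordered so that each task comes after its dependencies."""
    ordered = []
    seen = set()

    def visit(task):
        if task in seen:
            return
        seen.add(task)
        for dep in requires.get(task, []):
            visit(dep)
        ordered.append(task)

    for task in requested:
        visit(task)
    return ordered


print(with_dependencies(["netconfig"], REQUIRES))
```

Asking only for netconfig would then also schedule hiera and globals ahead
of it instead of failing.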

*2. Post-deployment plugin installation*

This feature also makes post-deployment plugin installation much easier, as
plugin installation will take a matter of minutes instead of hours.

*3. Support for cluster re-deployment in some LCM cases*

Whenever a user changes settings on the nodes and triggers a full cluster
redeployment, or wants a tainted cluster to converge back to the previous
state deployed by Fuel, they will get the cluster back into an operational
state in 30 minutes.

*4. Better capabilities for separated services plugins*

The task-based approach allows one to deploy things with separate services
in much more flexible ways. E.g. one will not have to introduce 2 roles in
the plugin for the controller in order to detach keystone services, such as
pre-keystone-controller-tasks and post-keystone-controller-tasks. All that
is needed is to introduce a "skipped" keystone task for controllers, so
that keystone is deployed only on the node with the keystone role.

** Merge plan*

Merge Astute changes - ETA Dec 4th
Merge Nailgun changes - ETA Dec 4th
Prepare Fuel Library changes - ETA Dec 3rd
Test this feature on Scale Lab and against swarm - ETA SCF
Make the decision whether to enable the task-based deployment engine by
default - ETA SCF

** Summary*

This feature brings a lot of benefits for everyone. Its current
implementation introduces almost no chance of regression, as it will be
disabled by default and will require specific actions from a user to start
using it. In the meantime we will test this feature at Scale Lab and
against swarm and custom tests. By SCF we may decide whether to switch to
it based on the reported results. If that happens before SCF, we will be
able to significantly ramp up our development and bugfixing velocity.

-- 
Yours Faithfully,
Vladimir Kuklin,
Fuel Library Tech Lead,
Mirantis, Inc.
+7 (495) 640-49-04
+7 (926) 702-39-68
Skype kuklinvv
35bk3, Vorontsovskaya Str.
Moscow, Russia,
www.mirantis.com <http://www.mirantis.ru/>
www.mirantis.ru
vkuklin at mirantis.com