[openstack-dev] [Fuel] Feature Freeze Exception Request: Task Based Deployment in Astute
Igor Kalnitsky
ikalnitsky at mirantis.com
Wed Dec 2 18:10:52 UTC 2015
Hey folks,
As we decided at today's IRC meeting in #fuel-dev, the FFE is granted
on the following conditions (if I got them right):
* the feature is marked as experimental
* patches should be merged by the end of next week
Thanks,
igor
On Tue, Dec 1, 2015 at 10:01 PM, Vladimir Kuklin <vkuklin at mirantis.com> wrote:
> Hi, Folks
>
> * Intro
>
> During Iteration 3 our Enhancements Team as well as other folks worked on
> the feature called "Task Based Deployment with Astute". Here is a link to
> its blueprint:
> https://blueprints.launchpad.net/fuel/+spec/task-based-deployment-astute
>
> The major implication of completing this feature is that our deployment
> process will be drastically optimized, allowing us to decrease the
> deployment time of typical clusters by at least 2.5 times (for BVT/CI
> cases) and by an order of magnitude for 100-node clusters.
>
> This is achieved by real parallelization of deployment task execution: we
> do not wait for the whole 'deployment group/role' to deploy, but only for
> particular tasks to finish. For example, we could deploy the 'database'
> task on secondary controllers as soon as the 'database' task is ready on
> the first controller. As our deployment workflow contains only a small
> number of synchronization points such as the 'database' task, we will be
> able to deploy the majority of deployment tasks in parallel, shrinking
> deployment time to "time-of-deployment-of-the-longest-node". This actually
> means that our standard deployment case for development and testing will
> take 30 minutes on our CI servers, drastically improving developer and
> user experience, as well as shrinking the time of overall acceptance
> testing, bug reproduction and so on. This feature also allows one to use
> the 7.0 role-as-a-plugin feature much more effectively, as the current
> split-services-with-plugins feature may lead to a very suboptimal
> deployment flow which might take up to 6 hours even for the simplest HA
> cluster, while it would again take 30 minutes with the Task-Based approach.
> Also, when multi-roles were used, we ran several tasks for each role each
> time it was used, making deployment suboptimal again.
>
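To make the 'database' example above concrete, here is a minimal Python sketch of the scheduling idea. It is not Astute's engine (Astute is written in Ruby); the node names, task names and durations are made up for illustration, and the role-based baseline is a simplification that assumes secondaries start only after the primary controller has finished all of its tasks.

```python
from graphlib import TopologicalSorter

# Hypothetical data: task durations in minutes and three controller nodes.
DURATION = {"netconfig": 2, "database": 10, "keystone": 5}
NODES = ["primary", "secondary-1", "secondary-2"]


def build_graph():
    """Map each (node, task) pair to the (node, task) pairs it waits for."""
    graph = {}
    for node in NODES:
        # Within a node, tasks run in a fixed order.
        graph[(node, "netconfig")] = set()
        graph[(node, "database")] = {(node, "netconfig")}
        graph[(node, "keystone")] = {(node, "database")}
    # Cross-node synchronization point: secondaries wait only for the
    # 'database' task on the primary controller, not for the whole node.
    for node in NODES[1:]:
        graph[(node, "database")].add(("primary", "database"))
    return graph


def task_based_finish_time(graph):
    """Finish time when every task starts as soon as its dependencies are done."""
    finish = {}
    for key in TopologicalSorter(graph).static_order():
        start = max((finish[dep] for dep in graph[key]), default=0)
        finish[key] = start + DURATION[key[1]]
    return max(finish.values())


def role_based_finish_time():
    """Baseline: secondaries start only after the primary finished everything."""
    per_node = sum(DURATION.values())
    return per_node + per_node  # primary first, then all secondaries in parallel


if __name__ == "__main__":
    print("task-based:", task_based_finish_time(build_graph()), "min")
    print("role-based:", role_based_finish_time(), "min")
```

With these made-up numbers the secondaries overlap their 'netconfig' step with the primary's work and start 'database' as soon as the primary's 'database' is done, so the whole run is bounded by the longest dependency chain rather than by the sum of the groups.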
>
> * Short List of Work Items
>
> As we started a little late during Iteration 3, we worked on the design
> and specification of this feature so that its introduction brings almost
> zero chance of regression, with the ability to disable it. Here is the
> summary.
>
> So far we have introduced several pieces of code:
> 1. New version of the task format introducing cross-node dependencies
>    between tasks
> 2. Changes to Nailgun
>    a. Deduplication of tasks for roles [In Progress]
>    b. Support for the new task format [In Progress]
>    c. New engine that generates an array of hashes of task info consumable
>       by the new Astute engine [In Progress]
> 3. Changes to Astute
>    a. Task dependency parser and visualizer [Ready for review]
>    b. Deployment engine capable of graph traversal and reporting [Ready
>       for review]
>    c. Async wrapper for shell-based tasks [Ready for review]
> 4. Changes to Fuel Library
>    a. Additional fields in existing Fuel Library deployment tasks for
>       cross-dependencies [In Progress]
>
> * Ensuring Minimal Regression Risk and Backward Compatibility
>
> As we worked on being backward-compatible from day one, this engine is
> enabled ONLY when 2 requirements are met:
>
> 1. It is globally enabled in Nailgun settings.yaml
> 2. ALL tasks scheduled for deployment execution have version 2.0.0
>
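The two requirements above amount to a simple guard. A minimal Python sketch follows, assuming a hypothetical settings key `task_deploy_enabled` and a hypothetical `version` field on each task; neither name is taken from the actual Nailgun code.

```python
def use_task_based_engine(settings, tasks):
    """Return True only if both requirements from the list above hold."""
    # Requirement 1: globally enabled in settings (hypothetical key name).
    globally_enabled = bool(settings.get("task_deploy_enabled", False))
    # Requirement 2: every scheduled task declares version 2.0.0
    # (hypothetical field name; a task without a version does not qualify).
    all_v2 = all(task.get("version") == "2.0.0" for task in tasks)
    return globally_enabled and all_v2
```

A single task in the older format is enough to fall back to the existing engine, which is what keeps the change backward-compatible.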
> This list seems a little bit huge, but these changes are isolated and
> granular and actually affect only the sequence in which tasks are executed
> on the nodes. This means that there will be no difference in how the
> resulting cluster functions. This feature can be safely disabled if the
> user does not want to use it.
>
> But if the user wants to work with it, they can gain an enormous
> improvement in speed and in their own engineering/development/testing
> velocity, as well as in Fuel user experience.
>
> * Additional Pros of the Feature
>
> Moreover, this feature improves how the following use cases are
> addressed:
>
> 1. When a user deploys a specific set of nodes or tasks
>
> It will be possible to introduce an additional flag for the deploy/task
> run handler so that Nailgun picks up the dependencies of the specified
> tasks, even if they are not currently in place in the current deployment
> graph. This means that instead of running
>
>     fuel nodes --node-id 2,3 --deploy
>
> and seeing it fail because node-1 contains some of the tasks that are
> required by nodes 2 and 3, the user can relax, as they will be able to
> specify an option that populates the deployment flow with the needed
> tasks. No more
>
>     fuel nodes --node-id 2 --tasks netconfig
>
> failing because you forgot to specify some of the required tasks, e.g.
> hiera, globals.
>
> 2. Post-deployment plugin installation
>
> This feature also makes post-deployment plugin installation much easier,
> as plugin installation will happen in a matter of minutes instead of
> hours.
>
> 3. Support for cluster redeployment in some LCM cases
>
> Whenever a user changes settings on the nodes and triggers a full cluster
> redeployment, or wants a tainted cluster to converge back to the previous
> state deployed by Fuel, they will get the cluster back into an
> operational state in 30 minutes.
>
> 4. Better capabilities for separated services plugins
>
> The task-based approach allows one to deploy things with separate services
> in much more flexible ways. For example, one will not have to introduce 2
> roles in the plugin for the controller to detach keystone services, such
> as pre-keystone-controller-tasks and post-keystone-controller-tasks. All
> that is needed is to introduce a "skipped" keystone task for controllers
> so that keystone is deployed only on the node with the keystone role.
>
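For use case 1 above, "picking up dependencies of specified tasks" boils down to a transitive closure over the task graph. Here is a minimal Python sketch; the dependency map is hypothetical (the real relationships between netconfig, globals and hiera live in the Fuel Library task definitions), and the function is not part of Nailgun.

```python
def with_dependencies(requested, requires):
    """Return the requested tasks plus everything they transitively require."""
    selected = set()
    stack = list(requested)
    while stack:
        task = stack.pop()
        if task in selected:
            continue  # already handled; also protects against cycles
        selected.add(task)
        stack.extend(requires.get(task, ()))
    return selected


# Hypothetical dependency map, for illustration only.
REQUIRES = {
    "netconfig": ["globals"],
    "globals": ["hiera"],
    "hiera": [],
}

if __name__ == "__main__":
    print(sorted(with_dependencies(["netconfig"], REQUIRES)))
```

Asking only for netconfig then yields netconfig together with the tasks it needs, instead of failing because they were not listed by hand.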
> * Merge plan
>
> Merge Astute changes - ETA Dec 4th
> Merge Nailgun changes - ETA Dec 4th
> Prepare Fuel Library changes - ETA Dec 3rd
> Test this feature on Scale Lab and against swarm - ETA SCF
> Make decision whether to enable task-based deployment engine by default -
> SCF
>
> * Summary
>
> This feature brings a lot of benefits for everyone. Its current
> implementation introduces zero chance of regression, as it will be
> disabled by default and will require specific actions from a user to start
> using it. In the meantime we will test this feature at Scale Lab and
> against swarm and custom tests. By SCF we may decide whether to switch
> to it based on the reported results. If that happens before SCF, we will
> be able to significantly ramp up our development and bugfixing velocity.
>
> --
> Yours Faithfully,
> Vladimir Kuklin,
> Fuel Library Tech Lead,
> Mirantis, Inc.
> +7 (495) 640-49-04
> +7 (926) 702-39-68
> Skype kuklinvv
> 35bk3, Vorontsovskaya Str.
> Moscow, Russia,
> www.mirantis.com
> www.mirantis.ru
> vkuklin at mirantis.com