[User-committee] Feedback on Grizzly
Annie Cheng
anniec at yahoo-inc.com
Fri Apr 5 15:27:32 UTC 2013
Another important topic you point out, and one where we share the same concern, is the community's awareness of the upgrade path and the downtime needed during an upgrade. As OpenStack matures, our end users (the users who use the VMs) have higher SLA expectations for API availability. In elasticity use cases, where users expect to increase or decrease capacity by provisioning and deprovisioning VMs based on traffic and need, API uptime is extremely important.
If you can share more details on the DB migrations with the community (i.e., what was wrong and what the fix was), I think it will be a great learning experience for everyone. For the DB migration work you've done, do you plan to contribute your code upstream?
Thanks!
Annie
From: Annie Cheng <anniec at yahoo-inc.com>
Date: Friday, April 5, 2013 8:10 AM
To: "mvnwink at rackspace.com<mailto:mvnwink at rackspace.com>" <mvnwink at rackspace.com<mailto:mvnwink at rackspace.com>>, "user-committee at lists.openstack.org<mailto:user-committee at lists.openstack.org>" <user-committee at lists.openstack.org<mailto:user-committee at lists.openstack.org>>
Cc: "openstack-dev at yahoo-inc.com<mailto:openstack-dev at yahoo-inc.com>" <openstack-dev at yahoo-inc.com<mailto:openstack-dev at yahoo-inc.com>>, Perry Myers <pmyers at redhat.com<mailto:pmyers at redhat.com>>
Subject: Re: [User-committee] Feedback on Grizzly
Hi Matt,
>All in all, I wanted to reach back out to you to follow up from before, because I think this particular experience is an excellent highlight that there is often a disconnect between some of the changes that come through to trunk and use of the code at scale.
Very interested in what you're learning, and we can certainly feel your pain points from our experience here as well. Yahoo! is another company that will be deploying Grizzly at scale. My colleagues and I at Yahoo! would love to get together with whoever else you can drag into the room to learn about your experience, and, more importantly, to discuss whether the user committee can help us drive scale requirements into design (or, during design and implementation, make the community more aware of scale impact).
Another interesting topic would be whether a performance/scale lab could be made available to the community, where trunk code is launched nightly and the results are published, so we can catch perf/scale issues early rather than late. Having a perf/scale lab is standard practice at large-scale companies like Yahoo!. As OpenStack matures and is adopted by larger-scale operators, is this something we can consider to improve the reliability and scalability of OpenStack releases moving forward?
Annie
From: Matt Van Winkle <mvanwink at rackspace.com>
Date: April 5, 2013, 7:01:26 AM PDT
To: "user-committee at lists.openstack.org<mailto:user-committee at lists.openstack.org>" <user-committee at lists.openstack.org<mailto:user-committee at lists.openstack.org>>
Cc: Rainya Mosher <rainya.mosher at rackspace.com<mailto:rainya.mosher at rackspace.com>>, Paul Voccio <paul.voccio at rackspace.com<mailto:paul.voccio at rackspace.com>>, Gabe Westmaas <gabe.westmaas at rackspace.com<mailto:gabe.westmaas at rackspace.com>>
Subject: [User-committee] Feedback on Grizzly
Hello again, folks!
When I reached out a couple of weeks ago, I mentioned I was hoping that, along with being a large developer of OpenStack, Rackspace could also contribute to the committee's work as one of its largest users via our public cloud. We just found our first opportunity. This week we deployed an early release of Grizzly code to one of our data centers.
Going in, we knew there were quite a few database migrations. As we studied them, however, we found that the way they were written presented some challenges. Running them as-is would have meant extended downtime for the databases, given the size of our production data (row counts, etc.). That downtime is problematic because it translates to the public APIs being unavailable, something we aim to impact as little as possible during code deploys. Ultimately, we had to rewrite them ourselves to achieve the same outcomes with less DB unavailability. There is plenty of work the community can do, and the committee can help guide, around better ways to change database structure while maintaining as much uptime as possible. If you need more details, I'm happy to bring the folks who worked on the rewrite into the conversation. Both will actually be at the summit.
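(Matt doesn't describe the rewrite itself, but a common way to cut migration downtime is to apply data changes in small batches instead of one long, table-locking transaction. A minimal, purely illustrative sketch of that pattern follows; the table, column, and connection details are hypothetical and are not taken from the actual Grizzly migrations.)

# Illustrative sketch only: backfill a new column in small chunks so no
# single transaction holds a long table lock. Names and DSN are placeholders.
import time
from sqlalchemy import create_engine, text

BATCH_SIZE = 5000      # rows per chunk; tune to the write load you can tolerate
PAUSE_SECONDS = 0.5    # brief pause between chunks so replicas can keep up

engine = create_engine("mysql+pymysql://nova:secret@dbhost/nova")  # placeholder DSN

done = False
while not done:
    with engine.begin() as conn:            # one short transaction per chunk
        result = conn.execute(
            text(
                "UPDATE instances "
                "SET new_column = old_column "
                "WHERE new_column IS NULL "
                "LIMIT :batch"
            ),
            {"batch": BATCH_SIZE},
        )
        done = result.rowcount == 0         # no rows left to migrate
    if not done:
        time.sleep(PAUSE_SECONDS)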
The bigger surprise - and, full disclosure, we learned a lot about the things we aren't testing in our deployment pipeline - was the dramatic increase in network traffic following the deploy. The new table structures, increased metadata, and new queries in this version translated to roughly a 10x increase in the amount of data returned for some queries. Add to that the fact that compute nodes regularly query for certain information or perform a "check-in", and we saw a 3x (or more) increase in traffic on the management network for this particular DC (and it's a smaller one as our various deployments go). For now we have improved things slightly by turning off the following periodic tasks:
reboot_timeout
rescue_timeout
resize_confirm_window
Not running these has the potential to create some other issues (zombie instances and such), but that can be managed.
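(Matt doesn't say exactly how these were turned off. In nova these are DEFAULT-section configuration options, and, assuming the usual mechanism, setting them to 0 disables the corresponding periodic checks. A hedged nova.conf sketch, under that assumption:)

[DEFAULT]
# Assumption: a value of 0 disables the corresponding periodic check
# (automatic hard reboot, automatic unrescue, automatic resize confirm).
reboot_timeout = 0
rescue_timeout = 0
resize_confirm_window = 0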
It does look like the developers are already working on getting some of the queries updated:
https://review.openstack.org/#/c/26136/
https://review.openstack.org/#/c/26109/
All in all, I wanted to reach back out to you to follow up from before, because I think this particular experience highlights the disconnect that often exists between some of the changes that come through to trunk and use of the code at scale. Almost everyone who has dealt with the above will be in Oregon the week after next, so I'm happy to drag any and all of them into the mix to discuss further.
Thanks so much!
Matt
_______________________________________________
User-committee mailing list
User-committee at lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/user-committee