Open Stack

Mon Apr 11 02:52:05 UTC 2011

On Mon, Apr 11, 2011 at 1:43 PM, Elliot Murphy <elliot at canonical.com> wrote:
> Hi!

Thanks for CC'ing me on this Elliot.

>> Launchpad is *EXTREMELY* slow from here in Shanghai, and it should be
>> even worse from the center of China. Even doing a simple thing like "bzr
>> launchpad-login" can fail because of connectivity, and I can hardly
>> get a few KB/s when I do a clone of a bzr repo.
>>
>> I mostly don't mind so much bzr, even though starting has been really
>> annoying, and that I don't know much about advanced usage like I would
>> with Git. But what I welcome the most is the hosting on a platform that
>> has an acceptable speed from Asia, which really, isn't the case at all
>> for Launchpad. Also, the fact that Git doesn't do network connections
>> unless it's really needed is very welcome.

bzr shouldn't do network connections except when really needed
*either* : the world is big and networks are slow, so like other DVCS
the strong preference it has is to cache data locally and only talk on
the network when really needed.

I'm located in Oceania (specifically NZ) so I feel every bit of pain
that a high latency connection can cause.

We currently have a known bug with the Launchpad bzr codehosting
service: the time taken to set up a backend on the service for a push
(or pull) operation is about 3 seconds. We're working on fixing this,
but it's currently queued behind a critical wrap-up of the front and
middle end configuration we use, which has been causing massive delays
for users - I wrote about this in
https://lists.launchpad.net/launchpad-dev/msg06839.html. That's in
progress now and we should be fully migrated soon (I would give a
specific date, but as I'm not aware of all the logistics involved in
the datacentre, I can't predict all that well).

Beyond that, on performance for Launchpad the website, we've recently
driven our 99th percentile(*) for backend service time down to 2
seconds, and are working hard on driving it further down to 1 second.
The remaining problems that we need to solve to do this are ones we
now have a solid handle on [primarily poor queries due to a web stack
that was built with a different architecture in mind from ours] and
that we 'simply' need to roll out fixes for across the site. We do have
200 tables in Launchpad, so there is a fair amount of work remaining
to do this - but on the other hand, we're certainly happy to pick
specific bits that are slow to fix first.

Oh, and don't use the 'edge' servers - they were a sort of beta
testing environment and deliver consistently slower results than the
primary servers. We're phasing 'edge' out. Edge servers have 'edge' in
the URL.

> I am responsible for supporting the teams that develop and operate
> Launchpad and other tools and systems at Canonical that we provide as
> a high tech incubator of sorts for open source projects.
>
> We have heard complaints about slowness from China before, and about
> slowness of Launchpad even outside of China. Over the last few months
> we have made huge improvements to performance outside of China, and
> have begun thinking about options for speeding things up inside China.
> It would be fantastic to get some help from you with specific
> technical detail about what is most painfully slow (perhaps off list),
> and it would also be fantastic to get pointers to sites that you find
> have outstanding performance in China. I have CC'd Rob, who is the
> Launchpad technical architect; he has been the driver of our recent
> performance push.

We're desperately short of technical data on the slowness reported
from China *specifically*.
Things that we'd love to know: how long does the SSL handshake take
for you, do you suffer packet loss talking to our servers, and what's
the peak bandwidth you can get back to our servers?
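
For anyone who wants to collect the handshake number, here is a
minimal sketch in Python. It is not a Launchpad tool: the function
names, the default port and timeout, and the choice to report
min/median/max are all assumptions made for illustration. It times the
TCP connect and the SSL/TLS handshake separately, since a slow connect
points at latency while a slow handshake points at the extra round
trips. It does not measure packet loss or bandwidth.

```python
import socket
import ssl
import statistics
import time


def time_tls_handshake(host, port=443, timeout=30.0):
    """Open one connection and return (tcp_connect_s, tls_handshake_s).

    `host`, `port` and `timeout` are whatever the caller wants to probe;
    nothing here is specific to Launchpad.
    """
    context = ssl.create_default_context()
    start = time.perf_counter()
    raw = socket.create_connection((host, port), timeout=timeout)
    connected = time.perf_counter()
    try:
        # wrap_socket() on a connected socket performs the handshake.
        tls = context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
    done = time.perf_counter()
    tls.close()
    return connected - start, done - connected


def summarise(samples):
    """Reduce a list of timings (seconds) to min, median and max."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def probe(host, attempts=5):
    """Run several handshakes and summarise connect and handshake times."""
    results = [time_tls_handshake(host) for _ in range(attempts)]
    return {
        "tcp_connect": summarise([r[0] for r in results]),
        "tls_handshake": summarise([r[1] for r in results]),
    }
```

Calling probe() with the hostname you normally use (for example
probe("launchpad.net"), assuming that is the host you connect to) and
sending back the resulting dictionary would already tell us a lot.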

In terms of performance... there are a few things going on with LP
performance at the moment.
Firstly, on codehosting specifically:
 - we have this backend startup time issue I mentioned. We have a
patch but it's been unstable - to deploy it we need some testing time
and a better high-availability deployment of the codehosting service.
That's not hugely hard, and is pretty much the next thing in line after
these new mid-tier servers are live.
 - we have some analysis about performance of push and pull itself
which the bzr guys are working on, that will go live as soon as they
cut another release and we upgrade to bzr $thatversion

More generally:
 - we're considering an SSL frontend CDN with a node in Asia, but it's
not at the very top of the list for performance: we're fixing the
things that have the most impact - that affect everyone - before we
start segmenting and improving performance for just one subset of the
user base.
 - the time it takes to deliver the html/json for a page is a key
metric that we're driving down. Half of the Launchpad developers are
now in maintenance mode doing performance fixes and customer support.
I'm completely confident we'll continue to make massive strides on
this metric in the next 3-6 months. So far, we've dropped the peak
time - the time the slowest pages in Launchpad take to render - by 9
seconds (from a peak of 20 seconds).
 - We're bringing in a shared SSL session cache for the front-end
Apache servers in the next month or so.

Performance is an absolutely key characteristic for modern web services,
and I'm utterly dedicated to bringing Launchpad up to par with
the best of them: no one loves using a slow website. I live precisely
on the opposite side of the world from the Launchpad servers, so I'm
not going to be happy until it's fast enough for anyone in the world.

I hope you can make some time to correspond with me about the
technical details for the performance you're experiencing in Shanghai
- I've been trying to find a Launchpad user there who can help pin
down what's making things slow. It's my hope that there is something
fairly straightforward we can do to improve the performance of
Launchpad in Shanghai.

> To the open stack community in general, I'd like to say: GitHub
> absolutely rocks, nothing but love for them. But please know that we
> are delighted and proud to have open stack using Launchpad and bazaar,
> and I don't want you to leave without having a chance to make things
> perfect for you. In fact we are sending a Launchpad and bazaar
> developer to the upcoming openstack summit.

(*): We use the 99th percentile as a key metric rather than the mean,
because variation in the data set can leave a site with a low mean
service time still feeling slow to users. Our mean service time is
about 0.4 seconds, but that's entirely due to over 50% of our requests
being API requests that we can service trivially.
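
To make the footnote concrete, here is a small Python sketch with
made-up numbers (they are not Launchpad measurements): a majority of
very fast requests keeps the mean low while the 99th percentile still
exposes the slow pages. The nearest-rank percentile used here is one
common definition, chosen for simplicity; it is an assumption, not a
description of how our reports are computed.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty list of timings."""
    return sum(values) / len(values)


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value such that at least
    pct percent of the samples are less than or equal to it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical service times in seconds for 100 requests:
# 60 trivial API calls, 38 ordinary pages, 2 very slow pages.
samples = [0.05] * 60 + [0.8] * 38 + [12.0] * 2

print("mean:   %.3f s" % mean(samples))
print("median: %.2f s" % percentile(samples, 50))
print("p99:    %.2f s" % percentile(samples, 99))
```

With this data the mean stays well under a second even though one
request in fifty takes 12 seconds, which is exactly the experience the
mean hides and the 99th percentile reports.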

Cheers,
Rob

[Openstack] Moving code hosting to GitHub
