[openstack-dev] [Neutron] The three API server multi-worker process patches.

Zhongyue Luo zhongyue.nah at intel.com
Fri Nov 22 03:30:20 UTC 2013


Thanks, I'll give it a try.


On Fri, Nov 22, 2013 at 2:35 AM, Carl Baldwin <carl at ecbaldwin.net> wrote:

> Hello,
>
> Please tell me if your experience is similar to what I experienced:
>
> 1.  I would see *at most one* "MySQL server has gone away" error for
> each process that was spawned as an API worker.  I saw them within a
> minute of spawning the workers and then I did not see these errors
> anymore until I restarted the server and spawned new processes.
>
> 2.  I noted in patch set 7 the line of code that completely fixed this
> for me.  Please confirm that you have applied a patch that includes
> this fix.
>
>         https://review.openstack.org/#/c/37131/7/neutron/wsgi.py
>
> 3.  I did not change anything with pool_recycle or idle_interval in my
> config files.  All I did was set api_workers to the number of workers
> that I wanted to spawn.  The line of code with my comment in it above
> was sufficient for me.
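> For reference, that amounts to something like this in neutron.conf (a
> sketch; only the api_workers option name comes from the patch under
> review, the section and value shown here are illustrative):
>
>     [DEFAULT]
>     # Number of separate API worker processes to spawn
>     api_workers = 4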
>
> It could be that there is another cause for the errors that you're
> seeing.  For example, is there a max connections setting in mysql that
> might be exceeded when you spawn multiple workers?  More detail would
> be helpful.
>
> Cheers,
> Carl
>
> On Wed, Nov 20, 2013 at 7:40 PM, Zhongyue Luo <zhongyue.nah at intel.com>
> wrote:
> > Carl,
> >
> > By 2006 I mean the "MySQL server has gone away" error code.
> >
> > The error message was still appearing when idle_timeout is set to 1 and
> > the quantum API server did not work in my case.
> >
> > Could you perhaps share your conf file when applying this patch?
> >
> > Thanks.
> >
> >
> >
> > On Thu, Nov 21, 2013 at 3:34 AM, Carl Baldwin <carl at ecbaldwin.net> wrote:
> >>
> >> Hi, sorry for the delay in response.  I'm glad to look at it.
> >>
> >> Can you be more specific about the error?  Maybe paste the error you're
> >> seeing on paste.openstack.org?  I don't find any reference to "2006".
> >> Maybe I'm missing something.
> >>
> >> Also, is the patch that you applied the most recent?  With the final
> >> version of the patch it was no longer necessary for me to set
> >> pool_recycle or idle_interval.
> >>
> >> Thanks,
> >> Carl
> >>
> >> On Tue, Nov 19, 2013 at 7:14 PM, Zhongyue Luo <zhongyue.nah at intel.com>
> >> wrote:
> >> > Carl, Yingjun,
> >> >
> >> > I'm still getting the 2006 error even after setting idle_interval to 1.
> >> >
> >> > I applied the patch to the RDO havana dist on centos 6.4.
> >> >
> >> > Are there any other options I should be considering, such as min/max
> >> > pool size or use_tpool?
> >> >
> >> > Thanks.
> >> >
> >> >
> >> >
> >> > On Sat, Sep 7, 2013 at 3:33 AM, Baldwin, Carl (HPCS Neutron)
> >> > <carl.baldwin at hp.com> wrote:
> >> >>
> >> >> This pool_recycle parameter is already configurable using the
> >> >> idle_timeout configuration variable in neutron.conf.  I tested this
> >> >> with a value of 1 as suggested, and it did get rid of the "MySQL server
> >> >> has gone away" messages.
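> >> >> As a sketch, that configuration would look something like this (the
> >> >> section and exact option name may differ between releases):
> >> >>
> >> >>     [database]
> >> >>     # Connections idle longer than this many seconds are recycled;
> >> >>     # this maps onto SQLAlchemy's pool_recycle.
> >> >>     idle_timeout = 1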
> >> >>
> >> >> This is a great clue, but I think I would like a long-term solution
> >> >> that still allows the end user to configure this as they could before.
> >> >>
> >> >> I'm currently thinking along the lines of calling something like
> >> >> pool.dispose() in each child immediately after it is spawned.  I think
> >> >> this should invalidate all of the existing connections so that when a
> >> >> connection is checked out of the pool, a new one will be created fresh.
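> >> >> A rough sketch of that idea (illustrative only; the module-level engine
> >> >> and the spawn_worker() helper stand in for whatever the server actually
> >> >> uses):
> >> >>
> >> >>     import os
> >> >>
> >> >>     import sqlalchemy
> >> >>
> >> >>     _ENGINE = sqlalchemy.create_engine('mysql://user:pass@host/neutron')
> >> >>
> >> >>     def spawn_worker():
> >> >>         pid = os.fork()
> >> >>         if pid == 0:
> >> >>             # Child: throw away connections inherited from the parent
> >> >>             # so the pool opens fresh ones on the next checkout.
> >> >>             _ENGINE.pool.dispose()
> >> >>         return pid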
> >> >>
> >> >> Thoughts?  I'll be testing.  Hopefully, I'll have a fixed patch up
> >> >> soon.
> >> >>
> >> >> Cheers,
> >> >> Carl
> >> >>
> >> >> From:  Yingjun Li <liyingjun1988 at gmail.com>
> >> >> Reply-To:  OpenStack Development Mailing List
> >> >> <openstack-dev at lists.openstack.org>
> >> >> Date:  Thursday, September 5, 2013 8:28 PM
> >> >> To:  OpenStack Development Mailing List
> >> >> <openstack-dev at lists.openstack.org>
> >> >> Subject:  Re: [openstack-dev] [Neutron] The three API server
> >> >> multi-worker
> >> >> process patches.
> >> >>
> >> >>
> >> >> +1 for Carl's patch, and I have abandoned my patch.
> >> >>
> >> >> About the `MySQL server gone away` problem, I fixed it by setting
> >> >> 'pool_recycle' to 1 in db/api.py.
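> >> >> Roughly, the change was along these lines (a sketch, not the exact
> >> >> db/api.py diff):
> >> >>
> >> >>     import sqlalchemy
> >> >>
> >> >>     # Recycle pooled connections after 1 second so a stale connection
> >> >>     # left over from before the fork is never reused.
> >> >>     engine = sqlalchemy.create_engine('mysql://user:pass@host/neutron',
> >> >>                                       pool_recycle=1)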
> >> >>
> >> >> On Friday, September 6, 2013, Nachi Ueno wrote:
> >> >>
> >> >> Hi Folks
> >> >>
> >> >> We chose https://review.openstack.org/#/c/37131/ <-- this is the patch
> >> >> to go forward with.
> >> >> We are also discussing this on the patch itself.
> >> >>
> >> >> Best
> >> >> Nachi
> >> >>
> >> >>
> >> >>
> >> >> 2013/9/5 Baldwin, Carl (HPCS Neutron) <carl.baldwin at hp.com>:
> >> >> > Brian,
> >> >> >
> >> >> > As far as I know, no consensus was reached.
> >> >> >
> >> >> > A problem was discovered that happens when spawning multiple
> >> >> > processes.  The MySQL connection seems to "go away" after 10-60
> >> >> > seconds in my testing, causing a seemingly random API call to fail.
> >> >> > After that, it is okay.  This must be due to some interaction between
> >> >> > forking the process and the MySQL connection pool.  This needs to be
> >> >> > solved, but I haven't had the time to look into it this week.
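> >> >> > The suspect pattern, roughly (an illustrative sketch, not the actual
> >> >> > neutron code):
> >> >> >
> >> >> >     import os
> >> >> >
> >> >> >     import sqlalchemy
> >> >> >
> >> >> >     # The engine, and therefore its pooled MySQL sockets, is created
> >> >> >     # in the parent before the workers are forked...
> >> >> >     engine = sqlalchemy.create_engine('mysql://user:pass@host/neutron')
> >> >> >
> >> >> >     for _ in range(4):  # api_workers
> >> >> >         if os.fork() == 0:
> >> >> >             # ...so every child shares the same underlying sockets
> >> >> >             # with its siblings and the parent, which can surface as
> >> >> >             # "MySQL server has gone away" on an early checkout.
> >> >> >             break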
> >> >> >
> >> >> > I'm not sure if the other proposal suffers from this problem.
> >> >> >
> >> >> > Carl
> >> >> >
> >> >> > On 9/4/13 3:34 PM, "Brian Cline" <bcline at softlayer.com> wrote:
> >> >> >
> >> >> >>Was any consensus on this ever reached? It appears both reviews are
> >> >> >>still open. I'm partial to review 37131, as it attacks the problem more
> >> >> >>concisely and, as mentioned, combines the efforts of the two more
> >> >> >>effective patches. I would echo Carl's sentiment that it's an easy
> >> >> >>review, minus the few minor behaviors discussed on the review thread
> >> >> >>today.
> >> >> >>
> >> >> >>We feel very strongly about these making it into Havana -- being
> >> >> >>confined to a single neutron-server instance per cluster or region is
> >> >> >>a huge bottleneck; it is essentially the only controller process with
> >> >> >>massive CPU churn in environments with constant instance churn or
> >> >> >>sudden large batches of new instance requests.
> >> >> >>
> >> >> >>In Grizzly, this behavior caused addresses not to be issued to some
> >> >> >>instances during boot, because quantum-server thought the DHCP agents
> >> >> >>had timed out and were no longer available, when in reality they were
> >> >> >>just backlogged (waiting on quantum-server, it seemed).
> >> >> >>
> >> >> >>Is it realistically looking like this patch will be cut for h3?
> >> >> >>
> >> >> >>--
> >> >> >>Brian Cline
> >> >> >>Software Engineer III, Product Innovation
> >> >> >>
> >> >> >>SoftLayer, an IBM Company
> >> >> >>4849 Alpha Rd, Dallas, TX 75244
> >> >> >>214.782.7876 direct  |  bcline at softlayer.com
> >> >> >>
> >> >> >>
> >> >> >>-----Original Message-----
> >> >> >>From: Baldwin, Carl (HPCS Neutron) [mailto:carl.baldwin at hp.com]
> >> >> >>Sent: Wednesday, August 28, 2013 3:04 PM
> >> >> >>To: Mark McClain
> >> >> >>Cc: OpenStack Development Mailing List
> >> >> >>Subject: [openstack-dev] [Neutron] The three API server multi-worker
> >> >> >>process patches.
> >> >> >>
> >> >> >>All,
> >> >> >>
> >> >> >>We've known for a while now that some duplication of work happened
> >> >> >>with respect to adding multiple worker processes to the
> >> >> >>neutron-server.  A few mistakes were made, which led to three patches
> >> >> >>being done independently of each other.
> >> >> >>
> >> >> >>Can we settle on one and accept it?
> >> >> >>
> >> >> >>I have changed my patch at the suggestion of one of the other two
> >> >> >>authors, Peter Feiner, in an attempt to find common ground.  It now
> >> >> >>uses openstack common code and is therefore more concise than any of
> >> >> >>the original three, so it should be pretty easy to review.  I'll admit
> >> >> >>to some bias toward my own implementation, but most importantly, I
> >> >> >>would like one of these implementations to land and start seeing broad
> >> >> >>usage in the community sooner rather than later.
> >> >> >>
> >> >> >>Carl Baldwin
> >> >> >>
> >> >> >>PS Here are the two remaining patches.  The third has been abandoned.
> >> >> >>
> >> >> >>https://review.openstack.org/#/c/37131/
> >> >> >>https://review.openstack.org/#/c/36487/
> >> >> >>
> >> >> >>
> >> >
> >>
> >
>



-- 
*Intel SSG/STO/DCST/CIT*
880 Zixing Road, Zizhu Science Park, Minhang District, 200241, Shanghai,
China
+862161166500