Hi,

We are running a 5-node Galera cluster with HAProxy in front.  HAProxy is installed on the same node as Cinder, and its configuration sends all reads and writes to the first node (db1); the other four nodes are marked as backups.  As far as I can tell, the galera and MariaDB RPMs did not come from the RDO repository.


/etc/cinder/cinder.conf
[database]
connection = mysql+pymysql://cinder:xxxxxxxx@127.0.0.1/cinder


/etc/haproxy/haproxy.conf
listen galera 127.0.0.1:3306
    maxconn 10000
    mode tcp
    option tcpka
    option tcplog
    option mysql-check user haproxy
    server db1 10.252.173.54:3306 check maxconn 10000
    server db2 10.252.173.55:3306 check backup maxconn 10000
    server db3 10.252.173.56:3306 check backup maxconn 10000
    server db4 10.252.173.57:3306 check backup maxconn 10000
    server db5 10.252.173.58:3306 check backup maxconn 10000
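
To make the layout above explicit, here is a small illustrative Python sketch (not something taken from our deployment; the helper names parse_servers and active_servers are made up for this example).  It only parses the server lines shown above and reports which ones lack the "backup" keyword, i.e. which nodes HAProxy sends traffic to as long as they pass their health checks.

```python
# Illustrative only: parse the "server" lines of the listen block above
# and show which backends are primary (no "backup" keyword).
HAPROXY_LISTEN = """\
listen galera 127.0.0.1:3306
    maxconn 10000
    mode tcp
    option tcpka
    option tcplog
    option mysql-check user haproxy
    server db1 10.252.173.54:3306 check maxconn 10000
    server db2 10.252.173.55:3306 check backup maxconn 10000
    server db3 10.252.173.56:3306 check backup maxconn 10000
    server db4 10.252.173.57:3306 check backup maxconn 10000
    server db5 10.252.173.58:3306 check backup maxconn 10000
"""


def parse_servers(config_text):
    """Return one dict per 'server' line: name, address, backup flag."""
    servers = []
    for line in config_text.splitlines():
        tokens = line.split()
        if len(tokens) >= 3 and tokens[0] == "server":
            servers.append({
                "name": tokens[1],
                "address": tokens[2],
                "backup": "backup" in tokens[3:],
            })
    return servers


def active_servers(servers):
    """Names of servers that are not marked as backup."""
    return [s["name"] for s in servers if not s["backup"]]


servers = parse_servers(HAPROXY_LISTEN)
print(active_servers(servers))
```

With this configuration only db1 is non-backup, so every Cinder connection to 127.0.0.1:3306 lands on a single Galera node unless db1 fails its check.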


Name        : haproxy
Version     : 1.5.18
Release     : 7.el7
Architecture: x86_64
Install Date: Wed 09 Jan 2019 07:09:01 PM GMT
Group       : System Environment/Daemons
Size        : 2689838
License     : GPLv2+
Signature   : RSA/SHA256, Wed 25 Apr 2018 11:04:31 AM GMT, Key ID 24c6a8a7f4a80eb5
Source RPM  : haproxy-1.5.18-7.el7.src.rpm
Build Date  : Wed 11 Apr 2018 04:28:42 AM GMT
Build Host  : x86-01.bsys.centos.org
Relocations : (not relocatable)
Packager    : CentOS BuildSystem <http://bugs.centos.org>
Vendor      : CentOS
URL         : http://www.haproxy.org/
Summary     : TCP/HTTP proxy and load balancer for high availability environments


Name        : galera
Version     : 25.3.20
Release     : 1.rhel7.el7.centos
Architecture: x86_64
Install Date: Wed 09 Jan 2019 07:07:52 PM GMT
Group       : System Environment/Libraries
Size        : 36383325
License     : GPL-2.0
Signature   : DSA/SHA1, Tue 02 May 2017 04:20:52 PM GMT, Key ID cbcb082a1bb943db
Source RPM  : galera-25.3.20-1.rhel7.el7.centos.src.rpm
Build Date  : Thu 27 Apr 2017 12:58:55 PM GMT
Build Host  : centos70-x86-64
Relocations : (not relocatable)
Packager    : Codership Oy
Vendor      : Codership Oy
URL         : http://www.codership.com/
Summary     : Galera: a synchronous multi-master wsrep provider (replication engine)


Name        : MariaDB-server
Version     : 10.3.2
Release     : 1.el7.centos
Architecture: x86_64
Install Date: Wed 09 Jan 2019 07:08:11 PM GMT
Group       : Applications/Databases
Size        : 511538370
License     : GPLv2
Signature   : DSA/SHA1, Sat 07 Oct 2017 05:51:08 PM GMT, Key ID cbcb082a1bb943db
Source RPM  : MariaDB-server-10.3.2-1.el7.centos.src.rpm
Build Date  : Fri 06 Oct 2017 01:51:16 PM GMT
Build Host  : centos70-x86-64
Relocations : (not relocatable)
Vendor      : MariaDB Foundation
URL         : http://mariadb.org
Summary     : MariaDB: a very fast and robust SQL database server

Thanks

On Mon, Jan 14, 2019 at 8:23 AM Mike Bayer <mike_mp@zzzcomputing.com> wrote:
On Mon, Jan 14, 2019 at 7:38 AM Gorka Eguileor <geguileo@redhat.com> wrote:
>
> On 11/01, Brandon Caulder wrote:
> > Hi,
> >
> > The steps were...
> > - purge
> > - shutdown cinder-scheduler, cinder-api
> > - upgrade software
> > - restart cinder-volume
>
> Hi,
>
> You should not restart cinder volume services before doing the DB sync,
> otherwise the Cinder service is likely to fail.
>
> > - sync (upgrade fails and stops at v114)
> > - sync again (db upgrades to v117)
> > - restart cinder-volume
> > - stacktrace observed in volume.log
> >
>
> At this point this could be a DB issue:
>
> https://bugs.mysql.com/bug.php?id=67926
> https://jira.mariadb.org/browse/MDEV-10558

That's a scary issue.  Can the reporter please list which MySQL /
MariaDB version is running, and whether this is Galera/HA or a single node?


>
> Cheers,
> Gorka.
>
> > Thanks
> >
> > On Fri, Jan 11, 2019 at 7:23 AM Gorka Eguileor <geguileo@redhat.com> wrote:
> >
> > > On 10/01, Brandon Caulder wrote:
> > > > Hi Iain,
> > > >
> > > > > There are 424 rows in volumes, which drops to 185 after running
> > > > > cinder-manage db purge 1.  Restarting the volume service after the package
> > > > > upgrade and running sync again does not remediate the problem.  Running
> > > > > db sync a second time does bump the version up to 117, but the
> > > > > following appears in volume.log...
> > > >
> > > > http://paste.openstack.org/show/Gfbe94mSAqAzAp4Ycwlz/
> > > >
> > >
> > > Hi,
> > >
> > > If I understand correctly the steps were:
> > >
> > > - Run DB sync --> Fail
> > > - Run DB purge
> > > - Restart volume services
> > > - See the log error
> > > - Run DB sync --> version proceeds to 117
> > >
> > > If that is the case, could you restart the services again now that the
> > > migration has been moved to version 117?
> > >
> > > If the cinder-volume service is able to restart please run the online
> > > data migrations with the service running.
> > >
> > > Cheers,
> > > Gorka.
> > >
> > >
> > > > Thanks
> > > >
> > > > > On Thu, Jan 10, 2019 at 11:15 AM iain MacDonnell <iain.macdonnell@oracle.com> wrote:
> > > >
> > > > >
> > > > > Different issue, I believe (DB sync vs. online migrations) - it just
> > > > > happens that both pertain to shared targets.
> > > > >
> > > > > Brandon, might you have a very large number of rows in your volumes
> > > > > table? Have you been purging soft-deleted rows?
> > > > >
> > > > >      ~iain
> > > > >
> > > > >
> > > > > On 1/10/19 11:01 AM, Jay Bryant wrote:
> > > > > > Brandon,
> > > > > >
> > > > > > I am thinking you are hitting this bug:
> > > > > >
> > > > >
> > > > > > > https://bugs.launchpad.net/cinder/+bug/1806156
> > > > > >
> > > > > >
> > > > > > > I think you can work around it by retrying the migration with the volume
> > > > > > > service running.  You may, however, want to check with Iain MacDonnell
> > > > > > > as he has been looking at this for a while.
> > > > > >
> > > > > > Thanks!
> > > > > > Jay
> > > > > >
> > > > > >
> > > > > > On 1/10/2019 12:34 PM, Brandon Caulder wrote:
> > > > > >> Hi,
> > > > > >>
> > > > > > >> I am receiving the following error when performing an offline upgrade
> > > > > > >> of cinder from RDO openstack-cinder-1:11.1.0-1.el7 to
> > > > > > >> openstack-cinder-1:12.0.3-1.el7.
> > > > > >>
> > > > > >> # cinder-manage db version
> > > > > >> 105
> > > > > >>
> > > > > >> # cinder-manage --debug db sync
> > > > > > >> Error during database migration: (pymysql.err.OperationalError) (2013,
> > > > > > >> 'Lost connection to MySQL server during query') [SQL: u'UPDATE volumes
> > > > > > >> SET shared_targets=%(shared_targets)s'] [parameters:
> > > > > > >> {'shared_targets': 1}]
> > > > > >>
> > > > > >> # cinder-manage db version
> > > > > >> 114
> > > > > >>
> > > > > > >> The db version does not upgrade to Queens version 117.  Any help would
> > > > > > >> be appreciated.
> > > > > >>
> > > > > >> Thank you
> > > > > >
> > > > >
> > > > >
> > >
>