[OpenStack-Infra] translate-dev wildfly/zanata/trove issues
Ian Y. Choi
ianyrchoi at gmail.com
Mon Jan 16 04:15:31 UTC 2017
Thanks a lot for sharing this, Ian (Wienand - not me :) )!
I am copying the i18n mailing list and the Zanata dev team for their information
(just as Kato-san kindly did for the RST markup option on another mail thread).
Note that this issue is reproducible: I am able to recreate it on
translate-dev.
When I create a new version of openstack-manuals from the master branch
[1] (20,107,950 words), translate-dev.o.o stops responding to web
requests after roughly 3/4 of the total words have been processed, as
with [2-4]. Version creation with 2,610,150 words completed without any
problem.
If someone has a good idea for investigating this further, please share
it with us :)
[1]
https://translate-dev.openstack.org/iteration/view/openstack-manuals/master
[2]
https://translate-dev.openstack.org/iteration/view/openstack-manuals/stable-newton
[3]
https://translate-dev.openstack.org/iteration/view/openstack-manuals/test-version-creation
[4]
https://translate-dev.openstack.org/iteration/view/openstack-manuals/test-version-creation2
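
For anyone who wants to try reproducing this outside the web UI, below is a
rough sketch of how the same large version creation could be triggered with
zanata-cli (the zanata.xml contents, credential location, and project type
are assumptions about the usual openstack-manuals setup, so please check
them against zanata-cli's help before running):

--- version creation via CLI, sketch ---
# Sketch only: push the full openstack-manuals master sources to
# translate-dev to trigger the same large version-creation workload.
# Assumes a zanata.xml pointing at https://translate-dev.openstack.org/
# with project "openstack-manuals" and version "master", plus
# credentials in ~/.config/zanata.ini.
cd openstack-manuals
zanata-cli -B push --project-type podir
---
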
With many thanks,
/Ian
Ian Wienand wrote on 1/16/2017 12:03 PM:
> Hi,
>
> I was alerted to translate-dev performance issues today. Indeed, it
> seemed that things were going crazy with the java wildfly process
> sucking up all CPU.
>
> At first there didn't seem to be anything in the logs. Java was
> clearly going mad however, with the following threads going flat-out.
>
> --- gone crazy processes ---
> 14807 wildfly 20 0 9.892g 4.729g 34232 R 93.7 60.7 26:56.73 java
> 14804 wildfly 20 0 9.892g 4.729g 34232 R 92.4 60.7 26:51.91 java
> 14806 wildfly 20 0 9.892g 4.729g 34232 R 92.4 60.7 26:53.92 java
> 14808 wildfly 20 0 9.892g 4.729g 34232 R 92.4 60.7 26:53.97 java
> 14810 wildfly 20 0 9.892g 4.729g 34232 R 92.4 60.7 26:56.28 java
> 14809 wildfly 20 0 9.892g 4.729g 34232 R 92.1 60.7 26:57.74 java
> 14803 wildfly 20 0 9.892g 4.729g 34232 R 91.1 60.7 26:54.90 java
> 14805 wildfly 20 0 9.892g 4.729g 34232 R 90.4 60.7 26:52.44 java
> ---
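> 
> (A minimal sketch of how a per-thread listing like the above can be
> obtained; the pgrep filter is an assumption about how the wildfly java
> process is named on this host:)
> 
> --- per-thread top, sketch ---
> # Show the individual threads of the wildfly java process, sorted by CPU.
> top -H -p "$(pgrep -u wildfly -f java | head -1)"
> ---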
>
> Hoping to find an easy smoking gun, I made the java process dump its
> threads to see what these are doing:
>
> --- thread dump ---
>
> 14807 14804 14806 14808 14810 14809 14803 14805
> 0x39d7 0x39d4 0x39d6 0x39d8 0x39da 0x39d9 0x39d3 0x39d5
>
> "GC task thread#4 (ParallelGC)" os_prio=0 tid=0x00007fb2a8026000 nid=0x39d7 runnable
> "GC task thread#4 (ParallelGC)" os_prio=0 tid=0x00007fb2a8026000 nid=0x39d7 runnable
> "GC task thread#4 (ParallelGC)" os_prio=0 tid=0x00007fb2a8026000 nid=0x39d7 runnable
> "GC task thread#4 (ParallelGC)" os_prio=0 tid=0x00007fb2a8026000 nid=0x39d7 runnable
>
> "GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007fb2a8021000 nid=0x39d4 runnable
> "GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007fb2a8021000 nid=0x39d4 runnable
> "GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007fb2a8021000 nid=0x39d4 runnable
> "GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007fb2a8021000 nid=0x39d4 runnable
>
> "GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007fb2a8024800 nid=0x39d6 runnable
> "GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007fb2a8024800 nid=0x39d6 runnable
> "GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007fb2a8024800 nid=0x39d6 runnable
> "GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007fb2a8024800 nid=0x39d6 runnable
>
> "GC task thread#5 (ParallelGC)" os_prio=0 tid=0x00007fb2a8028000 nid=0x39d8 runnable
> "GC task thread#5 (ParallelGC)" os_prio=0 tid=0x00007fb2a8028000 nid=0x39d8 runnable
> "GC task thread#5 (ParallelGC)" os_prio=0 tid=0x00007fb2a8028000 nid=0x39d8 runnable
> "GC task thread#5 (ParallelGC)" os_prio=0 tid=0x00007fb2a8028000 nid=0x39d8 runnable
>
> "GC task thread#7 (ParallelGC)" os_prio=0 tid=0x00007fb2a802b800 nid=0x39da runnable
> "GC task thread#7 (ParallelGC)" os_prio=0 tid=0x00007fb2a802b800 nid=0x39da runnable
> "GC task thread#7 (ParallelGC)" os_prio=0 tid=0x00007fb2a802b800 nid=0x39da runnable
> "GC task thread#7 (ParallelGC)" os_prio=0 tid=0x00007fb2a802b800 nid=0x39da runnable
>
> "GC task thread#6 (ParallelGC)" os_prio=0 tid=0x00007fb2a8029800 nid=0x39d9 runnable
> "GC task thread#6 (ParallelGC)" os_prio=0 tid=0x00007fb2a8029800 nid=0x39d9 runnable
> "GC task thread#6 (ParallelGC)" os_prio=0 tid=0x00007fb2a8029800 nid=0x39d9 runnable
> "GC task thread#6 (ParallelGC)" os_prio=0 tid=0x00007fb2a8029800 nid=0x39d9 runnable
>
> "GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007fb2a801f000 nid=0x39d3 runnable
> "GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007fb2a801f000 nid=0x39d3 runnable
> "GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007fb2a801f000 nid=0x39d3 runnable
> "GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007fb2a801f000 nid=0x39d3 runnable
> ---
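> 
> (For reference, a minimal sketch of how a dump like this can be taken and
> how the decimal thread IDs reported by top map onto the hex nid values
> above; jstack from the JDK is assumed to be installed on the host:)
> 
> --- thread dump sketch ---
> # Ask the JVM to dump all thread stacks; jstack writes them to stdout,
> # while "kill -3 <pid>" would write them to the JVM's console log instead.
> JPID=$(pgrep -u wildfly -f java | head -1)
> sudo -u wildfly jstack "$JPID" > /tmp/wildfly-threads.txt
> 
> # top/ps report thread IDs in decimal, but the dump's nid= fields are hex,
> # so convert before searching, e.g. 14807 -> 0x39d7.
> for tid in 14807 14804 14806 14808 14810 14809 14803 14805; do
>     printf '0x%x\n' "$tid"
> done
> ---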
>
> Unfortunately, these all appear to be GC threads so there's nothing
> really obvious there.
>
> However, eventually in the log you start getting stuff like
>
> ---
> 2017-01-16T01:58:00,844Z SEVERE [javax.enterprise.resource.webcontainer.jsf.application] (default task-43) Error Rendering View[/project/project.xhtml]: javax.el.ELException: /WEB-INF/layout/project/settings-tab-languages.xhtml @117,88 rendered="#{not projectHome.hasLocaleAlias(locale)}": javax.persistence.PersistenceException: org.hibernate.HibernateException: Transaction was rolled back in a different thread!
> ...
> 2017-01-16T01:08:48,805Z WARN [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (pool-5-thread-1) SQL Error: 0, SQLState: null
> 2017-01-16T01:08:48,806Z ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (pool-5-thread-1) javax.resource.ResourceException: IJ000460: Error checking for a transaction
> 2017-01-16T01:08:48,813Z ERROR [org.hibernate.AssertionFailure] (pool-5-thread-1) HHH000099: an assertion failure occured (this may indicate a bug in Hibernate, but is more likely due to unsafe use of the session): org.hibernate.exception.GenericJDBCException: Could not open connection
> 2017-01-16T01:08:48,813Z WARN [com.arjuna.ats.jta] (pool-5-thread-1) ARJUNA016029: SynchronizationImple.afterCompletion - failed for org.hibernate.engine.transaction.synchronization.internal.RegisteredSynchronization at 3fc123aa with exception: org.hibernate.AssertionFailure: Exception releasing cache locks
> ---
>
> There are literally hundreds of lines of traceback across various
> exceptions, but at the bottom this really seems to point to mysql
> being at the root of all this?
>
> In this case, mysql is running in RAX's cloud db service. I logged
> into the RAX control-panel and everything for the DB was green. I
> used the same credentials as zanata to authenticate, and I can connect
> to the db from translate-dev.o.o and do a few basic things.
>
> ---
> MySQL [(none)]> USE zanata;
> Reading table information for completion of table and column names
> You can turn off this feature to get a quicker startup with -A
>
> Database changed
> MySQL [zanata]> show tables;
> +----------------------------------+
> | Tables_in_zanata |
> +----------------------------------+
> | Activity |
> | DATABASECHANGELOG |
> ...
> ---
>
> So it's not as simple as "db not working" :( I have copied the console
> log to [1] in case that makes any sense to zanata people.
>
> If this persists, and isn't a known Zanata issue, I think we might
> have to enable root access for the db via the API, which would allow
> us to do something like debug long-running queries. Otherwise, there
> might be something in the mysql logs available to RAX support people
> that would help identify the issue. OTOH, maybe it's as simple as
> upping the java heap or something.
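> 
> (A rough sketch of what those options could look like; the trove instance
> name, the privileges available after root-enable, and the wildfly config
> path are all assumptions:)
> 
> --- possible next steps, sketch ---
> # Enable the root account on the cloud database instance via the Trove API
> # (instance name/ID is a placeholder).
> trove root-enable <translate-dev-db-instance>
> 
> # Then, from translate-dev.o.o, look for long-running queries.
> mysql -h <db-host> -u root -p -e 'SHOW FULL PROCESSLIST;'
> 
> # Or give the JVM more headroom by raising -Xmx in the wildfly JVM options
> # (path is an assumption about how wildfly is configured on this host).
> grep -n 'Xmx' /etc/wildfly/standalone.conf
> ---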
>
> In the meantime, I have restarted wildfly, which seems to get things
> going for a while.
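> 
> (For the record, the restart is just the usual service restart; the service
> name is an assumption about how wildfly is deployed here:)
> 
> ---
> sudo service wildfly restart
> ---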
>
> -i
>
> [1] http://people.redhat.com/~iwienand/console.log.translate-dev-2017-01-16.log.gz
>
> _______________________________________________
> OpenStack-Infra mailing list
> OpenStack-Infra at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra