Re: [OpenStack-I18n] [OpenStack-Infra] translate-dev wildfly/zanata/trove issues
Thanks a lot for sharing this, Ian (Wienand, not me :) ). I am copying the i18n mailing list and the Zanata dev team for FYI purposes (as Kato-san kindly did for the RST markup option on another mail thread).

Note that this issue is reproducible: I am able to recreate it on translate-dev. When I create a new version of openstack-manuals from the master branch [1] (20,107,950 words), translate-dev.o.o stops giving any web responses after roughly 3/4 of the total words have been processed, as in [2-4]. Creating a version with 2,610,150 words was completely fine.

If someone has a good idea on how to investigate this further, please share it with us :)

[1] https://translate-dev.openstack.org/iteration/view/openstack-manuals/master
[2] https://translate-dev.openstack.org/iteration/view/openstack-manuals/stable-...
[3] https://translate-dev.openstack.org/iteration/view/openstack-manuals/test-ve...
[4] https://translate-dev.openstack.org/iteration/view/openstack-manuals/test-ve...

With many thanks,

/Ian

Ian Wienand wrote on 1/16/2017 12:03 PM:
Hi,
I was alerted to translate-dev performance issues today. Indeed, it seemed that things were going crazy with the java wildfly process sucking up all CPU.
At first there didn't seem to be anything in the logs. Java was clearly going mad, however, with the following threads going flat-out:
--- gone crazy processes ---
14807 wildfly  20   0 9.892g 4.729g 34232 R 93.7 60.7  26:56.73 java
14804 wildfly  20   0 9.892g 4.729g 34232 R 92.4 60.7  26:51.91 java
14806 wildfly  20   0 9.892g 4.729g 34232 R 92.4 60.7  26:53.92 java
14808 wildfly  20   0 9.892g 4.729g 34232 R 92.4 60.7  26:53.97 java
14810 wildfly  20   0 9.892g 4.729g 34232 R 92.4 60.7  26:56.28 java
14809 wildfly  20   0 9.892g 4.729g 34232 R 92.1 60.7  26:57.74 java
14803 wildfly  20   0 9.892g 4.729g 34232 R 91.1 60.7  26:54.90 java
14805 wildfly  20   0 9.892g 4.729g 34232 R 90.4 60.7  26:52.44 java
---
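(For reference, a per-thread view like that can be grabbed with top's thread mode or ps; the pgrep match below is just a guess at how to find the wildfly java pid on the host.)

---
# Per-thread CPU for the wildfly java process; adjust the pgrep match to
# however wildfly actually runs on the host.
top -H -b -n 1 -p "$(pgrep -u wildfly java | head -1)"

# Or the same via ps: one row per thread (LWP), sorted by CPU usage.
ps -Lo pid,lwp,pcpu,stat,comm -p "$(pgrep -u wildfly java | head -1)" --sort=-pcpu
---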
Hoping to find an easy smoking-gun, I made the java process dump its threads to see what these are doing.
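Roughly, that amounts to something like the following (the pid and paths are placeholders, not the exact commands used here):

---
# Get a thread dump; jstack needs to run as the owning user.  kill -3 <java-pid>
# also works and writes the dump to the JVM's console log instead.
sudo -u wildfly jstack <java-pid> > /tmp/threads.txt

# top reports thread ids in decimal, while the dump reports them as hex "nid"
# values, so convert and pull out the matching entries.
for tid in 14807 14804 14806 14808 14810 14809 14803 14805; do
    printf 'tid %s -> nid=0x%x\n' "$tid" "$tid"
    grep "nid=0x$(printf '%x' "$tid")" /tmp/threads.txt
done
---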
--- thread dump ---
top pid   hex nid
14807     0x39d7
14804     0x39d4
14806     0x39d6
14808     0x39d8
14810     0x39da
14809     0x39d9
14803     0x39d3
14805     0x39d5
"GC task thread#4 (ParallelGC)" os_prio=0 tid=0x00007fb2a8026000 nid=0x39d7 runnable "GC task thread#4 (ParallelGC)" os_prio=0 tid=0x00007fb2a8026000 nid=0x39d7 runnable "GC task thread#4 (ParallelGC)" os_prio=0 tid=0x00007fb2a8026000 nid=0x39d7 runnable "GC task thread#4 (ParallelGC)" os_prio=0 tid=0x00007fb2a8026000 nid=0x39d7 runnable
"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007fb2a8021000 nid=0x39d4 runnable "GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007fb2a8021000 nid=0x39d4 runnable "GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007fb2a8021000 nid=0x39d4 runnable "GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007fb2a8021000 nid=0x39d4 runnable
"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007fb2a8024800 nid=0x39d6 runnable "GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007fb2a8024800 nid=0x39d6 runnable "GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007fb2a8024800 nid=0x39d6 runnable "GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007fb2a8024800 nid=0x39d6 runnable
"GC task thread#5 (ParallelGC)" os_prio=0 tid=0x00007fb2a8028000 nid=0x39d8 runnable "GC task thread#5 (ParallelGC)" os_prio=0 tid=0x00007fb2a8028000 nid=0x39d8 runnable "GC task thread#5 (ParallelGC)" os_prio=0 tid=0x00007fb2a8028000 nid=0x39d8 runnable "GC task thread#5 (ParallelGC)" os_prio=0 tid=0x00007fb2a8028000 nid=0x39d8 runnable
"GC task thread#7 (ParallelGC)" os_prio=0 tid=0x00007fb2a802b800 nid=0x39da runnable "GC task thread#7 (ParallelGC)" os_prio=0 tid=0x00007fb2a802b800 nid=0x39da runnable "GC task thread#7 (ParallelGC)" os_prio=0 tid=0x00007fb2a802b800 nid=0x39da runnable "GC task thread#7 (ParallelGC)" os_prio=0 tid=0x00007fb2a802b800 nid=0x39da runnable
"GC task thread#6 (ParallelGC)" os_prio=0 tid=0x00007fb2a8029800 nid=0x39d9 runnable "GC task thread#6 (ParallelGC)" os_prio=0 tid=0x00007fb2a8029800 nid=0x39d9 runnable "GC task thread#6 (ParallelGC)" os_prio=0 tid=0x00007fb2a8029800 nid=0x39d9 runnable "GC task thread#6 (ParallelGC)" os_prio=0 tid=0x00007fb2a8029800 nid=0x39d9 runnable
"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007fb2a801f000 nid=0x39d3 runnable "GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007fb2a801f000 nid=0x39d3 runnable "GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007fb2a801f000 nid=0x39d3 runnable "GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007fb2a801f000 nid=0x39d3 runnable ---
Unfortunately, these all appear to be GC threads so there's nothing really obvious there.
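If all the hot threads are GC workers, the usual suspicion is that the heap is full and the JVM is stuck in back-to-back full collections; something like jstat against the same pid would confirm or rule that out (a sketch, not output we captured):

---
# Sample GC counters from the running JVM every 5 seconds (pid is a placeholder).
sudo -u wildfly jstat -gcutil <java-pid> 5000
# If the old generation (O) sits near 100% and FGC/FGCT climb on every sample,
# the process is spending its time in full GCs rather than doing real work.
---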
However, eventually in the log you start getting stuff like
---
2017-01-16T01:58:00,844Z SEVERE [javax.enterprise.resource.webcontainer.jsf.application] (default task-43) Error Rendering View[/project/project.xhtml]: javax.el.ELException: /WEB-INF/layout/project/settings-tab-languages.xhtml @117,88 rendered="#{not projectHome.hasLocaleAlias(locale)}": javax.persistence.PersistenceException: org.hibernate.HibernateException: Transaction was rolled back in a different thread!
...
2017-01-16T01:08:48,805Z WARN  [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (pool-5-thread-1) SQL Error: 0, SQLState: null
2017-01-16T01:08:48,806Z ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (pool-5-thread-1) javax.resource.ResourceException: IJ000460: Error checking for a transaction
2017-01-16T01:08:48,813Z ERROR [org.hibernate.AssertionFailure] (pool-5-thread-1) HHH000099: an assertion failure occured (this may indicate a bug in Hibernate, but is more likely due to unsafe use of the session): org.hibernate.exception.GenericJDBCException: Could not open connection
2017-01-16T01:08:48,813Z WARN  [com.arjuna.ats.jta] (pool-5-thread-1) ARJUNA016029: SynchronizationImple.afterCompletion - failed for org.hibernate.engine.transaction.synchronization.internal.RegisteredSynchronization@3fc123aa with exception: org.hibernate.AssertionFailure: Exception releasing cache locks
---
There are literally hundreds of lines of traceback across various exceptions, but at the bottom this really seems to point to mysql being at the root of all this?
In this case, mysql is running in RAX's cloud db service. I logged into the RAX control-panel and everything for the DB was green. I used the same credentials as zanata to authenticate and I can connect to the db via translate-dev.o.o and do a few basic things.
---
MySQL [(none)]> USE zanata;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MySQL [zanata]> show tables;
+----------------------------------+
| Tables_in_zanata                 |
+----------------------------------+
| Activity                         |
| DATABASECHANGELOG                |
...
---
So it's not as simple as "db not working" :( I have copied the console log to [1] in case that makes any sense to zanata people.
If this persists, and isn't a known Zanata issue, I think we might have to enable root access for the db via the API, which might allow us to do things like debug long-running queries. Otherwise, there might be something in the mysql logs available to RAX support people that would help identify the issue. OTOH, maybe it's as simple as upping the java heap or something.
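As a rough sketch of what that could look like (the instance name and db host below are placeholders, and RAX's Cloud Databases may not expose exactly the stock Trove workflow):

---
# Enable the database root user via the Trove CLI (placeholder instance name).
trove root-enable translate-dev-db        # prints a root username/password

# From translate-dev.o.o, look for long-running statements.
mysql -h <trove-host> -u root -p -e 'SHOW FULL PROCESSLIST;'

# If the root user has enough privileges, turn on the slow query log
# dynamically and lower the threshold so slow statements get recorded.
mysql -h <trove-host> -u root -p \
      -e "SET GLOBAL slow_query_log = 'ON'; SET GLOBAL long_query_time = 1;"
---

The java-heap option is usually just the -Xms/-Xmx values set via JAVA_OPTS in wildfly's bin/standalone.conf, or wherever our deployment sets them.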
In the meantime, I have restarted wildfly, which seems to get things going for a while.
-i
[1] http://people.redhat.com/~iwienand/console.log.translate-dev-2017-01-16.log....
_______________________________________________
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
On 01/16/2017 03:15 PM, Ian Y. Choi wrote:
Note that this issue is reproducible: I am able to recreate it on translate-dev. When I create a new version of openstack-manuals from the master branch [1] (20,107,950 words), translate-dev.o.o stops giving any web responses after roughly 3/4 of the total words have been processed, as in [2-4]. Creating a version with 2,610,150 words was completely fine.
One thing I noticed was that the trove db for translate-dev seems to be set to 2GB (I'm guessing this means working RAM), and this might be a bit tight? Maybe the remote mysql needs more than that, fails transactions and leaves things in a tricky state? Does anyone know what sort of demands are placed on the db?
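One way to sanity-check that from our side would be to ask the remote mysql what it is actually configured with and how close it runs to its limits; these are read-only queries, so they should be safe against the live instance (the db host is a placeholder):

---
# Run from translate-dev.o.o with the existing zanata credentials.
mysql -h <trove-host> -u zanata -p -e "
    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
    SHOW VARIABLES LIKE 'max_connections';
    SHOW GLOBAL STATUS LIKE 'Max_used_connections';
    SHOW GLOBAL STATUS LIKE 'Threads_connected';"
---

-i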