Our PO files contain location information (file names and line numbers) as well as untranslated strings. Dolph recently suggested that we import only the *translated* strings into projects, so I did some investigation and implementation. I don't expect this to have any negative impact but wanted to inform you and ask for feedback.

We will continue to push the full location information to transifex and leave it in the POT files that are stored in each repository. Thus if you download a file from transifex, you have all location information.

During the import from transifex into the OpenStack git repositories, our scripts remove the location information as well as any untranslated strings, which reduces the size of the imported files significantly. It also significantly reduces the size of each import change, since a line number change in the source will no longer cause lots of location information to be updated.

The gettext tools we use cope fine with this smaller PO file since it contains everything that is needed - just nothing more ;)

Also, it's easy to rebuild the full PO file from the data in the repository using the msgmerge command:

msgmerge PO-FILE POT-FILE -o FULL-PO-FILE

As a first step, I have made this change only for documentation projects like openstack-manuals:
https://review.openstack.org/176313

If this works as expected, I plan to do it for other projects as well.

Please review and tell me if you have any further ideas or if I overlooked something,
Andreas
-- 
Andreas Jaeger aj@{suse.com,opensuse.org} Twitter/Identica: jaegerandi
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nürnberg)
GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126
Nice work! This sounds great to me, and looks good at first glance. Do you have an example change that the modified script might produce?

On Wed, Apr 22, 2015 at 8:54 AM, Andreas Jaeger <aj@suse.com> wrote:
On 04/22/2015 04:01 PM, Dolph Mathews wrote:
Nice work! This sounds great to me, and looks good at first glance. Do you have an example change that the modified script might produce?
Let me push one to openstack-manuals:
https://review.openstack.org/176322

git shows for this one:

[testing 21b2aca] WIP: Test import
 31 files changed, 89452 insertions(+), 351463 deletions(-)
 rewrite doc/arch-design/locale/zh_CN.po (65%)
 rewrite doc/common/locale/fr.po (95%)
 rewrite doc/common/locale/ja.po (93%)
 rewrite doc/common/locale/pt_BR.po (96%)
 rewrite doc/common/locale/zh_CN.po (94%)
 rewrite doc/glossary/locale/de.po (93%)
 rewrite doc/glossary/locale/es.po (76%)
 rewrite doc/glossary/locale/fr.po (88%)
 rewrite doc/glossary/locale/ko_KR.po (88%)
 rewrite doc/glossary/locale/ru.po (96%)
 rewrite doc/glossary/locale/zh_CN.po (91%)
 rewrite doc/install-guide/locale/ko_KR.po (71%)
 rewrite doc/install-guide/locale/pt_BR.po (62%)
 rewrite doc/install-guide/locale/ru.po (61%)
 rewrite doc/install-guide/locale/zh_CN.po (62%)

Future changes will be far smaller; this is the big cleanup,
Andreas
Thank you for the information, Andreas. I'm glad to see that you keep the original POT files untouched. The location information in the POT files is quite useful for our translators. It can be regarded as "context" while translators translate. Some language teams even allocate their tasks by file: they search the whole translation resources by file name. Keeping location information in the POT files is really necessary.

As to PO files, I guess the location information is helpful when generating the translated documents. Have you ever tested building translated documents without locations?

I think the goal of reducing the size of PO files is to decrease the size of the whole manuals project, isn't it? Do you know what percentage of the whole manuals project the PO files make up now? Is it a big number?

Best regards
Ying Chun Guo (Daisy)

Andreas Jaeger <aj@suse.com> wrote on 2015/04/22 21:54:55:
From: Andreas Jaeger <aj@suse.com>
To: "openstack-i18n@lists.openstack.org" <openstack-i18n@lists.openstack.org>, dolph.mathews@gmail.com
Date: 2015/04/22 21:55
Subject: [Openstack-i18n] Decreasing size of PO files
_______________________________________________
Openstack-i18n mailing list
Openstack-i18n@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-i18n
On 04/23/2015 10:16 AM, Ying Chun Guo wrote:
Thank you for the information, Andreas. I'm glad to see that you keep the original POT files untouched. The location information in the POT files is quite useful for our translators. It can be regarded as "context" while translators translate. Some language teams even allocate their tasks by file: they search the whole translation resources by file name. Keeping location information in the POT files is really necessary.
As to PO files, I guess the location information is helpful when generating the translated documents. Have you ever tested building translated documents without locations?
Yes, I did - the "Test Import" change built these manuals without problems: https://review.openstack.org/#/c/176322/
I think the goal of reducing the size of PO files is to decrease the size of the whole manuals project, isn't it?
My main goal is to reduce the frequency and size of *changes*. I'd like to avoid changes like this one, where nearly all location information changed while only one or two strings were inserted:
https://review.openstack.org/#/c/174213/1/doc/glossary/locale/zh_CN.po

It should be a tiny change (5 lines or so), but it's a 2000-line change.
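The following small Python sketch illustrates why location comments inflate import changes. It diffs hypothetical PO content with difflib: a made-up glossary of 200 translated terms, not real OpenStack data.

The scenario inserts one new translated string near the top of the source, which shifts every later string down by 3 source lines. It then counts added plus removed diff lines, once with and once without '#:' comments.

```python
import difflib

# Hypothetical data: a made-up glossary, not real OpenStack content.


def make_po(entries, with_locations):
    """Render (source_line, msgid, msgstr) tuples as PO lines."""
    lines = []
    for src_line, msgid, msgstr in entries:
        if with_locations:
            lines.append(f"#: doc/glossary.xml:{src_line}")
        lines.append(f'msgid "{msgid}"')
        lines.append(f'msgstr "{msgstr}"')
        lines.append("")
    return lines


def changed_lines(old, new):
    """Count added plus removed lines in a unified diff of two line lists."""
    diff = difflib.unified_diff(old, new, lineterm="")
    return sum(
        1 for d in diff
        if d.startswith(("+", "-")) and not d.startswith(("+++", "---"))
    )


old_entries = [(10 * i, f"term {i}", f"Begriff {i}") for i in range(1, 201)]
# One new translated string near the top of the source shifts every
# later string down by 3 source lines.
new_entries = [(5, "new term", "neuer Begriff")] + [
    (line + 3, msgid, msgstr) for line, msgid, msgstr in old_entries
]

with_loc = changed_lines(make_po(old_entries, True), make_po(new_entries, True))
without_loc = changed_lines(make_po(old_entries, False), make_po(new_entries, False))
print("changed lines with locations:   ", with_loc)
print("changed lines without locations:", without_loc)
```

With location comments, every '#:' line changes because its line number shifts. Without them, the diff contains only the new entry. This mirrors the contrast above, although these numbers come from made-up data rather than from the real review.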
Do you know what percentage of the whole manuals project the PO files make up now? Is it a big number?
The changes are large, just see https://review.openstack.org/#/c/174213/ - it's 27000 lines changed. And that amount of changes needs to be downloaded and stored.

But decreasing overall size is also nice. Just looking at the Install Guide, we currently have 10005 lines as source.

Locales before my change:
  5262 locale/install-guide.pot
  8317 locale/ja.po
  8330 locale/ko_KR.po
  8325 locale/pt_BR.po
  8322 locale/ru.po
  8334 locale/zh_CN.po
 46890 total

Locales after my change:
  5282 locale/install-guide.pot
  6489 locale/ja.po
  4155 locale/ko_KR.po
  6908 locale/pt_BR.po
  5251 locale/ru.po
  6669 locale/zh_CN.po
 34754 total

If we notice that this does not work as expected, we can always revert - we will not lose anything, since we keep the location information in the POT file and in transifex and only remove it when importing into our projects.

A follow-up patch for all the other projects will be done once I've seen this in action for a few days. Dolph suggested this when we talked about keystone imports. Since I monitor the manuals myself, I wanted to test it there first,
Andreas
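A few lines of Python turn the Install Guide line counts above into percentages. The numbers are copied from the before/after lists.

The POT file is excluded from the per-file figures because it keeps its location information and is not stripped. Its 20-line growth presumably comes from an unrelated source update.

```python
# Line counts copied from the Install Guide numbers quoted above.
before = {"install-guide.pot": 5262, "ja.po": 8317, "ko_KR.po": 8330,
          "pt_BR.po": 8325, "ru.po": 8322, "zh_CN.po": 8334}
after = {"install-guide.pot": 5282, "ja.po": 6489, "ko_KR.po": 4155,
         "pt_BR.po": 6908, "ru.po": 5251, "zh_CN.po": 6669}


def reduction(old, new):
    """Percentage of lines removed when going from old to new."""
    return 100.0 * (old - new) / old


# Only the PO files are stripped; the POT file keeps its locations.
po_names = [name for name in before if name.endswith(".po")]
for name in po_names:
    print(f"{name:10} {before[name]:5} -> {after[name]:5}"
          f"  ({reduction(before[name], after[name]):.1f}% fewer lines)")

po_before = sum(before[name] for name in po_names)
po_after = sum(after[name] for name in po_names)
print(f"all PO files: {po_before} -> {po_after}"
      f"  ({reduction(po_before, po_after):.1f}% fewer lines)")
print("totals incl. POT:", sum(before.values()), "->", sum(after.values()))
```

For these counts, the PO files together shrink by roughly 29%, with ko_KR.po losing about half of its lines.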
Thank you for the clear explanation. Now I understand why you want to do that. The changes are fine with me. Thank you.

Best regards
Ying Chun Guo (Daisy)

Andreas Jaeger <aj@suse.com> wrote on 2015/04/23 17:44:16:

From: Andreas Jaeger <aj@suse.com>
To: Ying Chun Guo/China/IBM@IBMCN
Cc: dolph.mathews@gmail.com, "openstack-i18n@lists.openstack.org" <openstack-i18n@lists.openstack.org>
Date: 2015/04/23 17:44
Subject: Re: [Openstack-i18n] Decreasing size of PO files
participants (3)
- Andreas Jaeger
- Dolph Mathews
- Ying Chun Guo