Re: [Openstack-i18n] [Openstack-docs] On translations and ITS
(CC OpenStack-i18n - consider joining!)
A very nice lead-in discussion to our two sessions related to this at the summit :)
Right now, we have some custom code for step 1 that has been customised for the OpenStack docs (for example, it excludes screen elements ;)), giving some of the practical benefits of ITS. It's in tools/generatepot.
Looking forward to discussing more next week.
Regards,
Tom
On 02/11/13 05:34, Shaun McCance wrote:
I have some experience (read: bias) in translation tools, so I'm writing up a synopsis in the hope it will be useful for the documentation translation session at the summit next week.
Document translation generally follows a three-step process:
1) Segmentation: A program takes the XML files and breaks them up into chunks (often paragraphs) that can be individually translated and tracked. These are usually stored in either PO or XLIFF files, but in some systems they might be records in a database.
2) Translation: Translators translate those segments. They might edit the PO or XLIFF files directly. They might use a graphical front-end. They might do it through a web site that hides the files from them, but still presents the individual segments.
3) Merging: A program takes the translated segments, matches them up to the appropriate nodes in the source document, and writes a localized XML file.
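The three steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the sample document, the `segment` and `merge` helper names, and the in-memory translation dictionary are all hypothetical, and real tools store segments in PO or XLIFF files rather than a dict.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-paragraph source document.
SOURCE = "<article><para>Hello world</para><para>Goodbye</para></article>"


def segment(root):
    """Step 1: collect the text of each paragraph as one translatable segment."""
    return [para.text for para in root.iter("para") if para.text]


def merge(root, translations):
    """Step 3: write translated text back into the matching source nodes."""
    for para in root.iter("para"):
        if para.text in translations:
            para.text = translations[para.text]
    return root


root = ET.fromstring(SOURCE)
segments = segment(root)

# Step 2 happens outside the program: translators fill in this mapping.
translations = {"Hello world": "Bonjour le monde", "Goodbye": "Au revoir"}

localized = ET.tostring(merge(root, translations), encoding="unicode")
print(segments)
print(localized)
```

Keeping steps 1 and 3 as separate functions mirrors the point made in the thread: the translation step in the middle can be handed to any tool that reads and writes the segment format.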
Online tools like Transifex, Zanata, and Pootle are really about step 2, but they often include code for steps 1 and 3 to give you an all-in-one package. Unfortunately, to my knowledge, none of them use the W3C Internationalization Tag Set (ITS) to accomplish those steps. Luckily, they let you provide POT files and can give you PO files, which means you can plug your own code in for steps 1 and 3.
ITS is a W3C recommendation that provides a standard way to specify what parts of a document are translatable, what elements are inline, and various other things that are really critical for good segmentation. ITS 2.0 was released this week, and addresses a whole slew of other issues.
http://www.w3.org/TR/its20/
ITS lets you assert things about elements on a global level using XPath expressions. For example, let's say we don't want any of our screen elements to be translated. We could use a rule like this:
<its:translateRule translate="no" selector="//db:screen"/>
Magically, hundreds of messages will disappear from translators' view, allowing them time to have dinner with their families instead. You can also mark things locally. So for example, if we don't want to exclude all screen elements from translation, then on the ones we do want to exclude, we'd write this:
<screen its:translate="no">
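To make the interaction between the global rule and the local attribute concrete, here is a rough Python sketch. It is not how itstool or any other ITS processor is implemented: the `translatable_elements` helper and the sample DocBook fragment are hypothetical, only direct children of the root are examined, and inheritance is ignored. It does follow the ITS convention that local markup takes precedence over global rules.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
DB_NS = "http://docbook.org/ns/docbook"
NAMESPACES = {"db": DB_NS, "its": ITS_NS}

# Hypothetical DocBook fragment: one screen is marked locally, one is not.
DOC = f"""
<section xmlns="{DB_NS}" xmlns:its="{ITS_NS}">
  <para>Run the command.</para>
  <screen>$ nova list</screen>
  <screen its:translate="no">$ nova boot</screen>
</section>
"""


def translatable_elements(root, global_selectors):
    """Return the child elements of root that remain translatable.

    global_selectors holds the selector of each translate="no" global rule.
    """
    excluded = set()
    for selector in global_selectors:
        # ElementTree only accepts relative paths, so "//x" becomes ".//x".
        excluded.update(root.findall("." + selector, NAMESPACES))

    result = []
    for elem in root:
        local = elem.get(f"{{{ITS_NS}}}translate")
        # A local its:translate value wins; otherwise the global rule applies.
        if local == "no" or (local is None and elem in excluded):
            continue
        result.append(elem)
    return result


root = ET.fromstring(DOC)
print([e.text for e in translatable_elements(root, [])])
print([e.text for e in translatable_elements(root, ["//db:screen"])])
```

With no global rule, only the locally marked screen is hidden from translators; with the `//db:screen` rule, every screen is.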
You can also specify which elements are within text (inline), which are space-preserving, where there are references to external resources like images that have to be localized, and lots more.
Biased opinion: If you have an XML translation process that doesn't involve ITS, you're doing something wrong. (Disclosure: I was on the working group that created ITS 2.0, and I'm the developer of itstool.)
There are a number of tools that support ITS. Many of them work with PO or XLIFF files, so you can plug them into most online translation tools. I happen to be fond of my program, itstool, which supports PO files and has a number of extensions that have been useful for other open source projects like GNOME.
http://itstool.org/
If you want a workflow that uses XLIFF files, you should look into Okapi, a fantastic open source framework that supports ITS.
http://okapi.sourceforge.net/
I do have a dog in this race, so I'm trying not to be too pushy. But there are a lot of smart people who spent a lot of time figuring this stuff out. If you have non-ITS segmentation and merging code, you'll just end up chasing problems that have already been solved.
If it's not obvious, I love talking about this stuff. So feel free to ask me questions.
Thanks, Shaun
_______________________________________________ Openstack-docs mailing list Openstack-docs@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-docs
On Sat, 2013-11-02 at 09:39 +1100, Tom Fifield wrote:
Right now, we have some custom code for 1 that has been customised for the OpenStack docs (for example, it excludes screen elements ;)), giving some of the practical benefits of ITS. It's in tools/generatepot.
Thanks Tom. I didn't realize we actually had our own segmentation and merging tools in openstack-manuals. That makes it easier to decide on each piece of the puzzle separately.
Looking at the scripts, I see they're actually using xml2po for all the heavy lifting. xml2po was written around 2003/2004 by Danilo Šegan with help from Claude Paroz. I inherited maintainership of it around 2009. I tried to bend it to do things it wasn't designed for until I discovered ITS. I began rearchitecting xml2po on top of ITS in 2010. itstool is the result of that work.
Thank you for sharing, Shaun.
I'm the author of the slicing and merging scripts.
ITS looks like a useful standard, and ITS tools should be helpful. Can you share more about the usage of ITS tools? Which online translation tools support embedded ITS tools? Besides itstool.org, are there any other ITS tools that support PO files?
Regards
Ying Chun Guo (Daisy)
Shaun McCance <shaunm@gnome.org> wrote on 2013/11/02 09:59:13:
_______________________________________________ Openstack-i18n mailing list Openstack-i18n@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-i18n
On Mon, 2013-11-04 at 01:28 +0800, Ying Chun Guo wrote:
> Thank you for your sharing, Shaun.
> I'm the author of slicing and merging scripts.
>
> ITS looks like a useful standard and ITS tools should be helpful.
> Can you share more about the usage of ITS tools?
> Which on-line translation tools support embedded ITS tools?
> Besides itstool.org, are there any other ITS tools that support po
> files?
Here's some information about the tools that support ITS 2.0:
http://www.w3.org/International/its/wiki/ITS_Implementations
As for PO files, I'm not really sure. I was the only person on the working group that was actively working with them. Most of them use XLIFF instead. I believe Okapi actually has PO support, but they don't make a big deal of it. Okapi is primarily an XLIFF-based framework.
Using the itstool command line is very similar to using the xml2po command line. I realize our scripts don't actually use the xml2po command line. Instead, they're Python scripts that import the xml2po library and reimplement the wrapper code. I don't know the complete history of the scripts, so I can't speak to everything they do. But I think one of the reasons for them is to override how particular elements are handled, and the only way to do that in xml2po is to write a custom mode in Python.
Basically, itstool was written so that people wouldn't have to do what you did to accomplish what you accomplished. Here's the file you'd use to make literallayout, programlisting, and screen non-translatable.
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
           xmlns:db="http://docbook.org/ns/docbook"
           version="2.0">
  <its:translateRule translate="no"
                     selector="//db:literallayout | //db:programlisting | //db:screen"/>
</its:rules>
I'd be interested to hear what other problems the custom scripts are addressing. A lot of problems are solved by using ITS, but you might have come across issues I haven't.
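For readers who want to see what a rules file like the one above expresses, the following Python sketch reads it and lists the elements it would exclude in a small sample document. Everything outside the rules file itself is an assumption made for illustration: the `untranslatable_selectors` helper, the sample DocBook fragment, and the shortcut of splitting the selector on "|" (ElementTree does not support XPath unions, whereas a real ITS processor such as itstool evaluates the full expression).

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
NAMESPACES = {"db": "http://docbook.org/ns/docbook"}

# The rules file from the message above.
RULES = """
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
           xmlns:db="http://docbook.org/ns/docbook" version="2.0">
  <its:translateRule translate="no"
      selector="//db:literallayout | //db:programlisting | //db:screen"/>
</its:rules>
"""

# Hypothetical document to run the rules against.
DOC = """
<section xmlns="http://docbook.org/ns/docbook">
  <para>Text</para>
  <screen>$ ls</screen>
  <programlisting>print(1)</programlisting>
</section>
"""


def untranslatable_selectors(rules_xml):
    """Collect the selectors of every translate="no" rule in a rules file."""
    rules = ET.fromstring(rules_xml)
    selectors = []
    for rule in rules.iter(f"{{{ITS_NS}}}translateRule"):
        if rule.get("translate") == "no":
            # Simplification: treat the XPath union as a plain "|" list.
            selectors.extend(p.strip() for p in rule.get("selector").split("|"))
    return selectors


selectors = untranslatable_selectors(RULES)
doc = ET.fromstring(DOC)
excluded = [
    elem.tag.split("}")[1]  # strip the "{namespace}" prefix from the tag
    for selector in selectors
    for elem in doc.findall("." + selector, NAMESPACES)
]
print(selectors)
print(excluded)
```

The point of the sketch is that the exclusion list lives in a small declarative file rather than in custom Python code, which is the contrast with a custom xml2po mode drawn in the message.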
participants (3)
- Shaun McCance
- Tom Fifield
- Ying Chun Guo