(CC OpenStack-i18n - consider joining!)
A very nice lead-in discussion to our two sessions related to this at the summit :)
Right now, we have some custom code for step 1 (segmentation) that has been customised for the OpenStack docs (for example, it excludes screen elements ;)), giving some of the practical benefits of ITS. It's in tools/generatepot.
Looking forward to discussing more next week.
Regards,
Tom
On 02/11/13 05:34, Shaun McCance wrote:
I have some experience (read: bias) in translation tools, so I'm writing up a synopsis in the hope that it will be useful for the documentation translation session at the summit next week.
Document translation generally follows a three-step process:
1) Segmentation: A program takes the XML files and breaks them up into chunks (often paragraphs) that can be individually translated and tracked. These are usually stored in either PO or XLIFF files, but in some systems they might be records in a database.
2) Translation: Translators translate those segments. They might edit the PO or XLIFF files directly. They might use a graphical front-end. They might do it through a web site that hides the files from them, but still presents the individual segments.
3) Merging: A program takes the translated segments, matches them up to the appropriate nodes in the source document, and writes a localized XML file.
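To make the three steps concrete, here is a deliberately tiny round trip using Python's standard xml.etree.ElementTree. It is a sketch under stated assumptions, not how itstool or any online tool does it: the element names in SEGMENT_TAGS are hypothetical, segments are assumed to hold plain text only, and a dict stands in for the translated PO file.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document. This sketch assumes each <title>/<para>
# holds plain text only; real tools also handle inline child elements.
SOURCE = """<article>
  <title>Install</title>
  <para>Download the package.</para>
  <para>Run the installer.</para>
</article>"""

SEGMENT_TAGS = {"title", "para"}  # assumption: which elements become messages


def segment(root):
    """Step 1: collect one message per segmentable element, in document order."""
    return [el.text.strip() for el in root.iter()
            if el.tag in SEGMENT_TAGS and el.text and el.text.strip()]


def merge(root, translations):
    """Step 3: put each translation back on the node it came from.
    Untranslated messages fall back to the source text."""
    for el in root.iter():
        if el.tag in SEGMENT_TAGS and el.text:
            el.text = translations.get(el.text.strip(), el.text)
    return root


root = ET.fromstring(SOURCE)
messages = segment(root)
# Step 2 happens elsewhere (a PO editor or a web tool); this dict stands in
# for the translated PO file, keyed by msgid.
translations = {"Download the package.": "Laden Sie das Paket herunter."}
localized = merge(root, translations)
print(messages)
print(ET.tostring(localized, encoding="unicode"))
```

Hard-coding SEGMENT_TAGS like this is exactly the kind of per-project logic that ITS rules are meant to replace.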
Online tools like Transifex, Zanata, and Pootle are really about step 2, but they often include code for steps 1 and 3 to give you an all-in-one package. Unfortunately, to my knowledge, none of them use the W3C Internationalization Tag Set (ITS) to accomplish those steps. Luckily, they let you provide POT files and can give you PO files, which means you can plug your own code in for steps 1 and 3.
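Since these tools accept POT files and hand back PO files, the glue you write for steps 1 and 3 mostly has to produce and read that format correctly. Below is a minimal Python sketch of the POT-writing side. The (msgid, location) input and the XPath-like location strings are assumptions for illustration; here locations are written as extracted (#.) comments, which is one possible choice rather than a required convention.

```python
def po_escape(text):
    """Escape text for a PO string literal. Backslashes must be escaped first."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\t", "\\t"))


def make_pot(messages):
    """Build POT text from (msgid, location) pairs.

    Identical msgids are merged into one entry so translators see each string
    once; every location is kept as an extracted (#.) comment.
    """
    locations = {}  # dicts keep insertion order, so output follows the source
    for msgid, where in messages:
        locations.setdefault(msgid, []).append(where)

    # Header entry: empty msgid whose msgstr carries the charset declaration.
    lines = ['msgid ""', 'msgstr ""',
             '"Content-Type: text/plain; charset=UTF-8\\n"', ""]
    for msgid, wheres in locations.items():
        lines.extend(f"#. {w}" for w in wheres)
        lines.append(f'msgid "{po_escape(msgid)}"')
        lines.append('msgstr ""')
        lines.append("")
    return "\n".join(lines)


pot = make_pot([
    ('Click "OK" to continue.', "/article/para[1]"),
    ('Click "OK" to continue.', "/article/section/para[2]"),
    ("Line one\nLine two", "/article/para[2]"),
])
print(pot)
```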
ITS is a W3C recommendation that provides a standard way to specify what parts of a document are translatable, what elements are inline, and various other things that are really critical for good segmentation. ITS 2.0 was released this week, and addresses a whole slew of other issues.
ITS lets you assert things about elements on a global level using XPath expressions. For example, let's say we don't want any of our screen elements to be translated. We could use a rule like this:
<its:translateRule translate="no" selector="//db:screen"/>
Magically, hundreds of messages will disappear from translators' view, allowing them time to have dinner with their families instead. You can also mark things locally. For example, if we don't want to exclude every screen element from translation, we'd mark just the ones we do want to exclude like this:
<screen its:translate="no">
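How global rules and local attributes combine can be sketched in a few lines of Python. This follows, in simplified form, the ITS 2.0 precedence for elements: a local its:translate attribute beats global rules, later global rules beat earlier ones, and otherwise the value is inherited from the parent, defaulting to "yes". Two assumptions to flag: ElementTree supports only a subset of XPath, so the selector //db:screen is approximated by the path .//db:screen, and RULES is a hypothetical stand-in for parsing a real rules file (where the translateRule would sit inside an its:rules element with version="2.0" and the its and db namespaces declared).

```python
import xml.etree.ElementTree as ET

ITS = "http://www.w3.org/2005/11/its"
DB = "http://docbook.org/ns/docbook"
NS = {"its": ITS, "db": DB}

DOC = f"""<db:article xmlns:db="{DB}" xmlns:its="{ITS}">
  <db:para>Run this command:</db:para>
  <db:screen>nova list</db:screen>
  <db:screen its:translate="yes">Instance created</db:screen>
  <db:para its:translate="no">Keep <db:emphasis>as is</db:emphasis></db:para>
</db:article>"""

# Hypothetical stand-in for a parsed its:rules file: (path, translate value).
# ".//db:screen" approximates the ITS selector "//db:screen" using the
# limited XPath subset that ElementTree supports.
RULES = [(".//db:screen", "no")]


def resolve_translate(root, rules):
    """Return {element: bool}, deciding translatability for each element.

    Precedence (simplified): local its:translate attribute, then global
    rules (later rules override earlier ones), then the parent's value,
    with "yes" as the default at the root.
    """
    from_rules = {}
    for path, value in rules:
        for el in root.findall(path, NS):
            from_rules[el] = value

    result = {}

    def walk(el, inherited):
        value = from_rules.get(el, inherited)
        value = el.get(f"{{{ITS}}}translate", value)  # local markup wins
        result[el] = value == "yes"
        for child in el:
            walk(child, value)  # translate is inherited by child elements

    walk(root, "yes")
    return result


root = ET.fromstring(DOC)
for el, ok in resolve_translate(root, RULES).items():
    print(el.tag.split("}")[1], "translate" if ok else "skip")
```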
You can also specify which elements are within text (inline), which are space-preserving, where there are references to external resources like images that have to be localized, and lots more.
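Inline and space-preserving information is what lets a tool decide where one message ends and the next begins. The sketch below hard-codes both rule sets as Python sets (hypothetical stand-ins for its:withinTextRule and its:preserveSpaceRule), ignores namespaces and attributes, and keeps inline elements as markup inside their parent's message, roughly what translators see in PO files produced by ITS-based tools.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical rule sets standing in for parsed its:withinTextRule and
# its:preserveSpaceRule elements (namespaces and attributes are ignored).
WITHIN_TEXT = {"command", "emphasis", "link"}  # inline: part of the parent's text
PRESERVE_SPACE = {"literallayout", "programlisting", "screen"}

DOC = """<section>
  <para>Use the <command>nova boot</command>
        command to launch an <emphasis>instance</emphasis>.</para>
  <literallayout>First line
    indented second line</literallayout>
</section>"""


def inline_markup(el):
    """Serialize an element's content, keeping nested tags as markup."""
    parts = [escape(el.text or "")]
    for child in el:
        parts.append(f"<{child.tag}>{inline_markup(child)}</{child.tag}>")
        parts.append(escape(child.tail or ""))
    return "".join(parts)


def segment(el, messages):
    """Append one message per block element, in document order."""
    slot = len(messages)  # reserve this element's position before recursing
    parts = [escape(el.text or "")]
    for child in el:
        if child.tag in WITHIN_TEXT:
            parts.append(f"<{child.tag}>{inline_markup(child)}</{child.tag}>")
        else:
            segment(child, messages)  # a block child becomes its own message
        parts.append(escape(child.tail or ""))
    text = "".join(parts)
    if el.tag not in PRESERVE_SPACE:
        text = " ".join(text.split())  # collapse whitespace runs, trim ends
    if text.strip():
        messages.insert(slot, text)
    return messages


for msg in segment(ET.fromstring(DOC), []):
    print(repr(msg))
```

Without the inline rule, "nova boot" and "instance" would become separate, context-free messages; without the space rule, the indentation in the literal layout would be lost.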
Biased opinion: If you have an XML translation process that doesn't involve ITS, you're doing something wrong. (Disclosure: I was on the working group that created ITS 2.0, and I'm the developer of itstool.)
There are a number of tools that support ITS. Many of them work with PO or XLIFF files, so you can plug them into most online translation tools. I happen to be fond of my program, itstool, which supports PO files and has a number of extensions that have been useful for other open source projects like GNOME.
If you want a workflow that uses XLIFF files, you should look into Okapi, a fantastic open source framework that supports ITS.
I do have a dog in this race, so I'm trying not to be too pushy. But there are a lot of smart people who spent a lot of time figuring this stuff out. If you have non-ITS segmentation and merging code, you'll just end up chasing problems that have already been solved.
If it's not obvious, I love talking about this stuff. So feel free to ask me questions.
Thanks, Shaun
_______________________________________________ Openstack-docs mailing list Openstack-docs@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-docs