[Openstack-docs] On translations and ITS

Shaun McCance shaunm at gnome.org
Fri Nov 1 18:34:34 UTC 2013


I have some experience (read: bias) in translation tools, so I'm writing
up a synopsis in the hope that it will be useful for the documentation
translation session at the summit next week.

Document translation generally follows a three-step process:

1) Segmentation: A program takes the XML files and breaks them up into
chunks (often paragraphs) that can be individually translated and
tracked. These are usually stored in either PO or XLIFF files, but in
some systems they might be records in a database.

2) Translation: Translators translate those segments. They might edit
the PO or XLIFF files directly. They might use a graphical front-end.
They might do it through a web site that hides the files from them, but
still presents the individual segments.

3) Merging: A program takes the translated segments, matches them up to
the appropriate nodes in the source document, and writes a localized XML
file.
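To make steps 1 and 3 concrete, here is a minimal sketch using only Python's standard xml.etree.ElementTree. It is not how any particular tool works. The set of block elements is hard-coded (an ITS-based tool would derive it from rules), documents are assumed to have no XML namespaces and no nested blocks, and a plain dict stands in for the translated PO or XLIFF file from step 2. The function names are illustrative only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def inner_xml(elem):
    """Serialize an element's content (text plus child markup), not its own tag."""
    # ET.tostring() includes each child's tail text, so nothing is lost.
    parts = [escape(elem.text or "")]
    parts.extend(ET.tostring(child, encoding="unicode") for child in elem)
    return "".join(parts)


def set_inner_xml(elem, markup):
    """Replace an element's content, keeping its tag, attributes and tail."""
    wrapper = ET.fromstring(f"<wrap>{markup}</wrap>")
    for child in list(elem):
        elem.remove(child)
    elem.text = wrapper.text
    elem.extend(list(wrapper))


def segment(root, block_tags):
    """Step 1: one message per block element (assumes blocks are not nested)."""
    return [inner_xml(e) for e in root.iter() if e.tag in block_tags]


def merge(root, block_tags, catalog):
    """Step 3: swap in translations; untranslated blocks keep the source text."""
    blocks = [e for e in root.iter() if e.tag in block_tags]  # collect before editing
    for elem in blocks:
        translation = catalog.get(inner_xml(elem))
        if translation:
            set_inner_xml(elem, translation)
    return root


doc = ET.fromstring("<article><title>Install</title>"
                    "<para>Run <command>nova boot</command> now.</para></article>")
blocks = {"title", "para"}
print(segment(doc, blocks))
# ['Install', 'Run <command>nova boot</command> now.']

catalog = {  # stand-in for the output of step 2
    "Install": "Installation",
    "Run <command>nova boot</command> now.":
        "Lancez <command>nova boot</command> maintenant.",
}
print(ET.tostring(merge(doc, blocks, catalog), encoding="unicode"))
```

In this sketch the source text itself is the lookup key, just as the msgid is in a PO file, so an edited paragraph simply falls back to the original until someone retranslates it.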

Online tools like Transifex, Zanata, and Pootle are really about step 2,
but they often include code for steps 1 and 3 to give you an all-in-one
package. Unfortunately, to my knowledge, none of them use the W3C
Internationalization Tag Set (ITS) to accomplish those steps. Luckily,
they let you provide POT files and can give you PO files, which means
you can plug your own code in for steps 1 and 3.
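Since these services take POT files in and hand PO files back, your own steps 1 and 3 only need to speak that format. Below is a deliberately small Python sketch of both directions. The function names are made up for illustration. It assumes entries are separated by blank lines (as gettext tools write them), skips entries marked fuzzy, and ignores msgctxt, plural forms and source-reference comments. For real use, an existing PO library is the safer choice.

```python
import re

# PO string escapes (a subset of the C-style escapes gettext accepts).
ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\t": "\\t"}
UNESCAPES = {"\\": "\\", '"': '"', "n": "\n", "t": "\t"}


def po_quote(text):
    return '"' + "".join(ESCAPES.get(c, c) for c in text) + '"'


def po_unquote(token):
    body, out, i = token.strip()[1:-1], [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(UNESCAPES.get(body[i + 1], body[i + 1]))
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def write_pot(messages):
    """Turn segmented messages into a minimal POT file (header + empty msgstrs)."""
    lines = ['msgid ""', 'msgstr ""',
             '"Content-Type: text/plain; charset=UTF-8\\n"', ""]
    for msgid in dict.fromkeys(messages):  # msgids must be unique; keep first-seen order
        lines += ["msgid " + po_quote(msgid), 'msgstr ""', ""]
    return "\n".join(lines)


def read_po(text):
    """Return {msgid: msgstr} for translated, non-fuzzy entries."""
    catalog = {}
    for entry in re.split(r"\n\s*\n", text):
        fields, key, fuzzy = {}, None, False
        for line in entry.splitlines():
            line = line.strip()
            if line.startswith("#,") and "fuzzy" in line:
                fuzzy = True
            elif line.startswith(("msgid ", "msgstr ")):
                key, _, quoted = line.partition(" ")
                fields[key] = po_unquote(quoted)
            elif line.startswith('"') and key:
                fields[key] += po_unquote(line)  # continuation of a multi-line string
        # The header entry has an empty msgid, so it is skipped as well.
        if fields.get("msgid") and fields.get("msgstr") and not fuzzy:
            catalog[fields["msgid"]] = fields["msgstr"]
    return catalog


pot = write_pot(["Install", 'Say "hi"', "Install"])
po = ('msgid ""\nmsgstr ""\n"Language: fr\\n"\n\n'
      'msgid "Install"\nmsgstr "Installation"\n\n'
      '#, fuzzy\nmsgid "Old"\nmsgstr "Ancien"\n\n'
      'msgid ""\n"Two "\n"lines"\nmsgstr "Deux lignes"\n')
print(read_po(po))  # {'Install': 'Installation', 'Two lines': 'Deux lignes'}
```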

ITS is a W3C recommendation that provides a standard way to specify what
parts of a document are translatable, what elements are inline, and
various other things that are really critical for good segmentation. ITS
2.0 was released this week, and addresses a whole slew of other issues.

http://www.w3.org/TR/its20/

ITS lets you assert things about elements on a global level using XPath
expressions. For example, let's say we don't want any of our screen
elements to be translated. We could use a rule like this:

<its:translateRule translate="no" selector="//db:screen"/>

Magically, hundreds of messages will disappear from translators' view,
allowing them time to have dinner with their families instead. You can
also mark things locally. So for example, if we don't want to exclude
all screen elements from translation, then on the ones we do want to
exclude, we'd write this:

<screen its:translate="no">
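Here is a rough Python sketch of how a processor might combine a global translateRule with local its:translate attributes. It relies on ElementTree, whose XPath support is only a small subset, so it accepts just selectors of the form //prefix:name. Real implementations such as itstool or Okapi evaluate full XPath and cover every data category, including attributes, which ITS treats as non-translatable by default. The precedence encoded here is a simplified summary of ITS 2.0, so treat it as illustrative: a local its:translate beats a global rule, a global rule beats the value inherited from the parent element, and elements default to translate="yes". The function names and sample document are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
LOCAL_TRANSLATE = f"{{{ITS_NS}}}translate"  # the its:translate attribute


def load_translate_rules(rules_xml):
    """Return ([(selector, value), ...], prefix->namespace map) from its:rules."""
    # ElementTree drops xmlns declarations, so collect them via start-ns events.
    namespaces = dict(ns for _, ns in ET.iterparse(io.StringIO(rules_xml),
                                                   events=("start-ns",)))
    rules = ET.fromstring(rules_xml)
    found = [(r.get("selector"), r.get("translate"))
             for r in rules.iter(f"{{{ITS_NS}}}translateRule")]
    return found, namespaces


def translatable(root, rules, namespaces):
    """Map every element under root to True (translatable) or False."""
    selected = {}
    for selector, value in rules:  # later rules override earlier ones
        if not selector.startswith("//"):
            raise ValueError(f"selector not supported by this sketch: {selector}")
        # ".//x" searches below root; the root element itself is not matched.
        for elem in root.iterfind("." + selector, namespaces):
            selected[elem] = value

    result = {}

    def walk(elem, inherited):
        # local attribute > global rule > value inherited from the parent
        value = elem.get(LOCAL_TRANSLATE) or selected.get(elem) or inherited
        result[elem] = value == "yes"
        for child in elem:
            walk(child, value)

    walk(root, "yes")  # ITS default for elements
    return result


RULES = """<its:rules xmlns:its="http://www.w3.org/2005/11/its"
           xmlns:db="http://docbook.org/ns/docbook" version="2.0">
  <its:translateRule translate="no" selector="//db:screen"/>
</its:rules>"""

DOC = """<article xmlns="http://docbook.org/ns/docbook"
         xmlns:its="http://www.w3.org/2005/11/its">
  <para>Run this:</para>
  <screen>$ nova list</screen>
  <screen its:translate="yes">Sample output worth translating</screen>
  <para its:translate="no">Leave <emphasis>this</emphasis> alone.</para>
</article>"""

rules, namespaces = load_translate_rules(RULES)
doc = ET.fromstring(DOC)
flags = translatable(doc, rules, namespaces)
for elem in doc.iter():
    print(elem.tag.split("}")[1], flags[elem])
# article True / para True / screen False / screen True / para False / emphasis False
```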

You can also specify which elements are within text (inline), which are
space-preserving, where there are references to external resources like
images that have to be localized, and lots more.
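Inline and space-preserving information matters because it decides what a translator actually sees. The Python sketch below hard-codes what a processor might compute from hypothetical rules such as its:withinTextRule withinText="yes" on emphasis and command, and its:preserveSpaceRule space="preserve" on screen, then segments a namespace-free document accordingly. Inline elements stay inside the message, other elements become separate messages, and whitespace is collapsed unless space is preserved. It is a simplification: it does not handle an element that mixes text with non-inline children, which real tools typically cover with placeholders.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical results of evaluating ITS rules like
#   <its:withinTextRule withinText="yes" selector="//emphasis | //command"/>
#   <its:preserveSpaceRule space="preserve" selector="//screen"/>
WITHIN_TEXT = {"emphasis", "command"}
PRESERVE_SPACE = {"screen"}


def inner_xml(elem):
    """Element content as markup; ET.tostring() includes each child's tail."""
    parts = [escape(elem.text or "")]
    parts.extend(ET.tostring(child, encoding="unicode") for child in elem)
    return "".join(parts)


def segments(elem):
    """Collect messages, honoring within-text and preserve-space information."""
    if all(child.tag in WITHIN_TEXT for child in elem):
        # Only inline children (or none): the whole element is one message.
        message = inner_xml(elem)
        if elem.tag not in PRESERVE_SPACE:
            message = " ".join(message.split())  # collapse whitespace runs
        return [message] if message.strip() else []
    # Otherwise each child is segmented on its own.
    found = []
    for child in elem:
        found.extend(segments(child))
    return found


doc = ET.fromstring("""<section>
  <title>Boot an   instance</title>
  <para>Run
    <command>nova boot</command> with an <emphasis>image</emphasis>.</para>
  <screen>$ nova boot  --flavor 1
$ nova list</screen>
</section>""")

for message in segments(doc):
    print(repr(message))
```

The title and paragraph come out as single, whitespace-normalized messages with the inline markup intact, while the screen keeps its line break and double space.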

Biased opinion: If you have an XML translation process that doesn't
involve ITS, you're doing something wrong. (Disclosure: I was on the
working group that created ITS 2.0, and I'm the developer of itstool.)

There are a number of tools that support ITS. Many of them work with PO
or XLIFF files, so you can plug them into most online translation tools.
I happen to be fond of my program, itstool, which supports PO files and
has a number of extensions that have been useful for other open source
projects like GNOME. 

http://itstool.org/

If you want a workflow that uses XLIFF files, you should look into
Okapi, a fantastic open source framework that supports ITS.

http://okapi.sourceforge.net/

I do have a dog in this race, so I'm trying not to be too pushy. But
there are a lot of smart people who spent a lot of time figuring this
stuff out. If you have non-ITS segmentation and merging code, you'll
just end up chasing problems that have already been solved.

If it's not obvious, I love talking about this stuff. So feel free to
ask me questions.

Thanks,
Shaun