I Love It When Being Wrong Leads to New Discoveries

Up until recently I had assumed that anyone working as a professional translator would just take a document and... retype it. It never occurred to me that this wouldn’t work very well at all when the documents contain a lot of formatting, such as PDFs, webpages or even Word documents.

CAT (computer-assisted translation) tools give professional (or part-time) translators the ability to feed in a document (with all the lovely formatting), translate the content independently of the formatting (usually by splitting the content into individual sentences) and have the software build a new document with the translated content. Now it makes sense.
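To make that idea concrete, here is a minimal sketch in Python. It is my own illustration and not how any particular CAT tool is implemented: the document model (a list of formatting/text pairs), the `translations` lookup and the function names are all assumptions made up for the example, and the sentence splitter is deliberately naive.

```python
import re

# Hypothetical document model: each run pairs a formatting tag with its text.
document = [
    ("heading", "Welcome."),
    ("bold", "Read this first. Then sign below."),
]

# Stand-in for the translator's work: a lookup keyed by source sentence.
translations = {
    "Welcome.": "Benvenuti.",
    "Read this first.": "Leggi prima questo.",
    "Then sign below.": "Poi firma qui sotto.",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def translate_document(runs, memory):
    """Translate sentence by sentence, keeping each run's formatting tag."""
    result = []
    for style, text in runs:
        sentences = split_sentences(text)
        # Sentences with no translation yet fall back to the source text.
        translated = [memory.get(s, s) for s in sentences]
        result.append((style, " ".join(translated)))
    return result


translated_document = translate_document(document, translations)
for style, text in translated_document:
    print(f"[{style}] {text}")
```

The point is that the translator only ever sees plain sentences, while the formatting tags pass through untouched and get reattached when the new document is built.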

The tools do a lot more cool stuff on top of that, of course.

What Brought This On?

My lovely girlfriend, Dina, was fortunate enough to be born in Italy and grew up speaking both Italian and English. Recently she has been going through the process of becoming a professional translator.

Becoming a language translator is no easy feat. Apart from the obvious (that you have to know two languages very, very well), there are expensive and difficult exams to go through, and few of them are globally recognised. Even after completing all that, if you choose to be a freelance translator, the work itself is certainly not queueing up for you - at least not initially...

Getting Back to the "Making the World A Better Place" Thing?

Wikipedia has a lot of articles, and the English edition has the most. It absolutely blows my mind how much time, detail and information has been poured into Wikipedia - with no indication of slowing down.

Wikipedia relies on translators (amongst lots of other people, of course) to help. There is at least one tool available that's similar to a CAT tool, but for the most part translation is a time-consuming task.

Making it even more time-consuming is that wiki markup (the text-based formatting Wikipedia uses) isn’t understood by CAT tools - specifically, in this case, SmartCAT - and so translating pages whilst maintaining the formatting is difficult, especially when dealing with tables and references.
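One way to picture the problem, and a possible workaround, is to hide the markup behind placeholders before the text goes to the translator and put it back afterwards. The sketch below is only an illustration under my own assumptions: the `__TAG0__` placeholder format and the `protect`/`restore` names are invented for the example, and the pattern only covers `<ref>...</ref>` footnotes and simple, non-nested `{{templates}}`, nowhere near the full wiki markup.

```python
import re

# Matches <ref>...</ref> footnotes and simple, non-nested {{templates}}.
# Nested templates, self-closing refs and tables are NOT handled here.
MARKUP = re.compile(r"<ref[^>]*>.*?</ref>|\{\{[^{}]*\}\}", re.DOTALL)
PLACEHOLDER = re.compile(r"__TAG(\d+)__")


def protect(wikitext):
    """Swap markup for numbered placeholders; return the text and the saved markup."""
    saved = []

    def stash(match):
        saved.append(match.group(0))
        return f"__TAG{len(saved) - 1}__"

    return MARKUP.sub(stash, wikitext), saved


def restore(text, saved):
    """Put the saved markup back in place of its placeholder."""
    return PLACEHOLDER.sub(lambda m: saved[int(m.group(1))], text)


source = "Rome is the capital of Italy.<ref>Some source</ref> {{citation needed}}"
protected, saved = protect(source)
print(protected)

# The translator works on the protected text and leaves the placeholders alone.
translated = "Roma è la capitale d'Italia.__TAG0__ __TAG1__"
print(restore(translated, saved))
```

The translator can move a placeholder around within a sentence if the target language needs a different word order, and the reference or template follows it.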

I’m no good at human language translation, but I’m pretty fluent with the odd computer language. This is what inspired me to start a little side project to help give back:


It is still in the very early stages and there are a lot of minor tweaks to be made, but it’s a start. :)

The more information made available to greater audiences, the better off we all are. I would like to thank everyone who has invested their valuable time into helping the Wikipedia project. You are making the world a better place.

