Would you agree that translating a PDF with translation memories remains a challenge, even though powerful tools such as SDL Studio, MemoQ, SmartCAT and others can handle this type of file, both editable and non-editable? The output quality often leaves much to be desired: the translated text is difficult to format, and the CAT segmentation is not always correct.

To make the target document look right, the original file has to be converted to an editable format that a CAT tool can easily digest, such as MS Office (Word, Excel, PowerPoint, etc.). Precise formatting ensures correct segmentation, which is very important when you work with TMs. However, the effort this task takes made me look for other options.

Recently I tried Infix PDF Editor as an alternative to Word. This software makes it possible to edit PDF files directly and to extract text for translation in SDL Studio, MemoQ and other CAT tools. The second function looks very attractive because it lets you use existing TMs. However, as I learned the program while working with different PDF types, I came across several difficulties that keep me from calling it a universal solution for preparing PDFs for translation in CAT tools. Here I would like to share my experience and compare preparing and translating PDFs with MS Word and with Infix.

The PDFs I usually work with are drawings, product catalogs, and regular text files saved as PDF. Such documents come in either editable or non-editable format. All of them were prepared for translation in a CAT tool.

I. Preparing files in Infix

1. OCR in FineReader

Optional. Required only if the file arrives in a non-editable format.

2. Fonts

The fonts used in the original file may not be available on your computer. This generates numerous warnings and forces you either to replace the missing fonts with ones installed in the operating system or to download them from the Internet.

3. Text markup

Segmenting text blocks in Infix can be challenging. You have to make sure that the text stays within the borders of the marked-up blocks.

4. Export to XML or TXT

This step is required for translation in a CAT tool. When I tried to export a 100-page file to XML, I got an unknown error in SDL Studio that would not let me open the file. I had to export the text to TXT instead and spend extra time preparing it for correct display in Studio.
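If a similar error appears, one cheap first check is whether the exported file is well-formed XML at all. The sketch below uses only Python's standard library. The sample strings are made up for illustration and do not reflect the actual Infix export format, and a malformed file is only one possible cause: the Studio error described above was never identified.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses, else a (line, column, message) tuple."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return line, column, str(err)
    return None


# Hypothetical samples, not real Infix output.
sample_ok = "<doc><p>Hello</p></doc>"
sample_bad = "<doc><p>Hello</doc>"  # <p> is never closed

print(check_well_formed(sample_ok))
print(check_well_formed(sample_bad))
```

For a real export, the file contents would be read from disk and passed to check_well_formed. If the function reports a position, that line of the export is the place to inspect before blaming the CAT tool.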

5. Translation in CAT (Studio, MemoQ, etc.)

6. Import back to PDF

7. Alignment/checking of text blocks after import

The result looks quite good, even striking. However, I doubt many translators have the luxury of time and effort to enjoy such a result. There were some positive things, though.

The XML error occurred with a big 100-page file. Drawings (10 pages at most), however, I could export to XML with minimal time and effort, apparently because a drawing contains little text. Before that, to translate PDF drawings I had to either copy them into MS Word and place text boxes over the captions, or edit them in Photoshop or similar software. Both methods are quite time-consuming: with the first you have to handle every caption individually, and the second requires familiarity with Photoshop, GIMP or a similar program.

Considering that rates for translating drawings are objectively higher than those for PDF text files, and that there is no proper alternative (I do not consider original DWG drawings), Infix seems to be an option worth considering. It is also possible to export the text and send it to a translator who lacks the skills to work with drawings or PDF files.

Benefits and drawbacks of Infix

Benefits:

1. The format of the translation is almost identical to the original.

2. The file can be translated in a CAT tool.

Drawbacks:

1. Having to look through the file after translating in a CAT tool is a huge time eater. With a massive file, this needs to be taken into account.

2. Another problem is handling tables. Infix treats them as graphic objects, so the translated text has to be squeezed into the existing cells. Moreover, if text comes very close to a table border, the border can be recognized as underlining, and it may disappear if the text moves. The formatting is distorted.

3. Exporting huge texts to XML can cause problems in SDL Studio. I have no knowledge of XML, so this turned out to be an insurmountable obstacle for me. I had to export the text to TXT instead. That took extra time since I had to hide tags and segment the text manually, which is not feasible without solid MS Word skills.
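The manual work on the TXT export (separating tags from text and splitting the text into segments) can in principle be scripted. The following is a minimal sketch under stated assumptions, not a description of the real Infix export: it assumes tags look like <...> in angle brackets, and it uses a naive rule that a segment ends at a period, exclamation mark or question mark followed by whitespace. The function names and the sample line are hypothetical.

```python
import re

# Assumption: tags in the export look like <...>. The real format may differ.
TAG = re.compile(r"(<[^<>]+>)")
# Naive rule: a segment ends after ., ! or ? followed by whitespace.
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def tokenize(line):
    """Split a line into ("tag", s) and ("text", s) pieces, in original order."""
    pieces = []
    for part in TAG.split(line):
        if not part:
            continue
        kind = "tag" if TAG.fullmatch(part) else "text"
        pieces.append((kind, part))
    return pieces


def translatable_segments(line):
    """Return only the text pieces of a line, split into sentence-like segments."""
    segments = []
    for kind, part in tokenize(line):
        if kind == "text":
            segments.extend(s for s in SENTENCE_BREAK.split(part.strip()) if s)
    return segments


# Hypothetical sample line.
sample = "<b>Pump unit.</b> Check the valve! Max pressure: 10 bar"
print(tokenize(sample))
print(translatable_segments(sample))
```

The tags are kept as separate pieces rather than deleted, so they can be put back around the translated text before import. The sentence rule is deliberately crude: abbreviations such as "e.g." or "Max." would be split incorrectly, which is why the segmentation rules of the CAT tool itself remain the better choice when the file opens there.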

II. Preparing files in MS Word

1. OCR in ABBYY FineReader

Unlike in the Infix workflow, this step is mandatory here. At this point, I mark up the text blocks, run recognition and check the spelling to avoid losing meaningful pieces of text. After that, I export the text to Word as plain text (one of the settings in FR). I mark up the text manually because the blocks are not always recognized correctly in automatic mode. Plain text works best for me because FR cannot reproduce the original formatting as required.

2. Word

The main goal here is to reproduce the formatting, preferably manually, using styles. You can save considerable time preparing a file for translation if you create a Word template. This pays off when you have several files with a similar format. It also suits single files, since numbering, headings and indentation occur in every document. You apply the styles and adjust their formatting, which in the end is faster than formatting everything from scratch. Hotkeys make the process even faster.

As a result, you will have a document fully prepared for translation in a CAT tool. Its formatting will remain the same after export from SDL Studio (for example).

Benefits and drawbacks of Word

Benefits:

1. Easy to prepare

Drawbacks:

1. Time required for OCR and Word markup

2. Original formatting not always retained


Considering the time and difficulties I faced when preparing texts and booklets in Infix as compared with Word, I think it is more feasible to use Word for this purpose.

At the same time, for lack of a decent alternative for preparing drawings for CAT translation, I think it is better to prepare drawings in Infix.

Paul Filkin recently announced that the latest version of Infix can export to XLIFF. This is a meaningful breakthrough for the company. Nevertheless, text markup and checking after import will still be required, though perhaps to a lesser extent. I think I should try this new function as soon as I get the chance.

