Tips 'n' Tools for Translators


ABBYY, the software company behind FineReader, arguably the best OCR software package in town at the moment, is currently offering two of its products, FineReader 12 and PDF Transformer+, at a 25% discount until 31 December.

FineReader scans paper documents and images and turns them into digital files, which you can save in one of various Windows or Mac formats. Its optical character recognition ('OCR') is highly accurate and can be used on a host of source languages.

PDF Transformer+ is an easy-to-use program for editing PDF files and converting them into editable file formats like Microsoft Word, PowerPoint and OpenOffice Writer. It also lets you scan paper documents and create editable PDF files.

Both of these programs help translators transform documents that customers send them on paper or as PDFs into digital files that can be processed in common word-processing packages and then translated with CAT tools, for example.

To find out more about this offer, go to ABBYY's website. You can watch short videos there that show you what the programs can do, and you can download a trial version of each program if you want to try it out before buying.

This special offer has been running for a while now and is only valid until 31 December, so be quick! (I bought PDF Transformer+ recently and thoroughly recommend it; it does its job well, is easy to use and is great value for money in my opinion.)

Regards

Carl

PDF files are constantly being created by businesses and non-profit organisations to show colleagues, customers and other interested parties what material has been written or drawn and what its layout will be like once it's printed. Basically, they are exact images of documents and can be viewed on computers running on various operating systems, not just Microsoft Windows.

PDFs can either be created from other electronic file formats such as Word .docx files or they can be generated by a scanner. Depending on the settings used in the software, the resulting PDF files may or may not be searchable. PDFs made from electronic files normally contain real text already, whereas scanned ones only become searchable after a processing step called optical character recognition, or OCR for short, which lets individual words be found in them. If a file is searchable, it's usually quite easy to create an editable Word file from it; in Adobe Acrobat XI, for example, you just select these items in the 'File' menu to export the contents into a new Word document:

[Screenshot: save to Word format]

The 'tough nuts', in contrast, are the scanned images of paper documents we sometimes get sent, as it can take a lot of time and effort to create a reasonable editable text from these that can then be typed over and translated. To do this, you will need to use OCR software on the file in question to try and turn the image of the document into a set of legible and hopefully correctly rendered words. Sometimes this can work well, especially if you use high-quality programs such as Acrobat, ABBYY FineReader or Nuance OmniPage, which come with powerful character-recognition software. But things don't always go to plan, and the results of OCR'ing a scanned image can also be very disappointing, requiring copious editing – or even a completely different approach to creating a translatable file.

This is the situation you may also find yourself in if you ever get sent a PDF file that has been protected (i.e. 'secured') in some way – by a password, for example, meaning you can only open it or add comments to it if you enter the password first (providing you are authorised to do so). If you don't have the password, you won't be given the full right to use and process the file. This also means you won't be able to copy its contents and paste them into a blank Word file for translation. And what then?
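If you'd like to check up front whether a PDF has been secured, here is a rough sketch in Python. It relies on the fact that an encrypted PDF refers to an '/Encrypt' dictionary in its trailer, and it simply searches the raw bytes of the file for that name. The function names are my own invention, and this is only a heuristic: it can't tell you which restrictions apply, and the name could in theory turn up in an unprotected file as well.

```python
from pathlib import Path


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: encrypted PDFs reference an /Encrypt dictionary in the trailer."""
    # A file that has been saved incrementally can contain several trailers,
    # so the whole file is searched rather than just the last few bytes.
    return b"/Encrypt" in pdf_bytes


def file_looks_encrypted(path: str) -> bool:
    """Read a PDF from disk and apply the same check (hypothetical helper)."""
    return looks_encrypted(Path(path).read_bytes())
```

You would call `file_looks_encrypted("customer.pdf")` (the file name is just an example) before deciding how to process the file.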

Asking the customer for the password may be the obvious answer here, but if they don't have it themselves and are unable (or unwilling) to get it, what else should you do? Well, there are various suggestions about this on the internet, some of which I've tried out, but have you ever thought of using a simple work-around with a printer? That may be a faster and simpler way of getting round the password-protection issue.

If you are able to print the file out (this may not be allowed, depending on what properties the PDF has been given – see the screenshot below on how to access these in Adobe Acrobat XI), then do so using the best resolution and clearest print you can. Scan the printout and create a brand-new, multi-page PDF from it yourself. Most types of scanner software will let you do this, including the three programs mentioned above (Acrobat, FineReader and OmniPage).

[Screenshot: PDF properties]
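Behind the scenes, a secured PDF stores these permissions as a single whole number (the '/P' entry in the file's encryption dictionary), in which each bit switches one permission on or off. Here is a minimal sketch of how that number can be decoded, assuming the bit positions given for the standard security handler in the PDF specification. The function name and the sample value -44 are purely illustrative, and getting the /P value out of a real file is a job for a PDF library.

```python
# Bit positions (counted from 1) as defined for the standard security handler.
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "copy": 5,
    "annotate": 6,
}


def decode_permissions(p_value: int) -> dict:
    """Map the signed /P integer to a dictionary of named permissions."""
    return {
        name: bool(p_value & (1 << (bit - 1)))
        for name, bit in PERMISSION_BITS.items()
    }


# Illustrative value only: printing and copying allowed, editing not.
print(decode_permissions(-44))
# {'print': True, 'modify': False, 'copy': True, 'annotate': False}
```

If 'print' comes back as False, the printer workaround described above won't be available for that file.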

Before scanning, check the settings and tell the software to make the new PDF searchable; it will then OCR the file (don't forget to tell it which language it should recognise first, though). Once you've got the file, check it to see if the quality of the text is okay, and if it is, export the contents into a new Word file. Now you should find you have a Word document that is straightforward to translate. A little editing may be necessary, but not much (utilities like CodeZapper and TransTools Suite will help you tidy the file up if need be).
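Checking the quality of the recognised text is something you will mostly do by eye, but a quick automated check can flag a bad scan early on. The sketch below is my own rough heuristic, not a feature of any of the programs mentioned: it counts the words that mix letters with digits or stray symbols (typical OCR misreads such as 'c0ntract') and returns their share of all the words. The function name and the patterns are assumptions made for illustration.

```python
import re

# A "clean" token is either letters only (with inner hyphens or apostrophes)
# or a plain number such as 2015 or 1,5. Anything else counts as suspicious.
CLEAN = re.compile(r"^[^\W\d_]+(?:[-'’][^\W\d_]+)*$|^\d+(?:[.,]\d+)*$")
PUNCTUATION = ".,;:!?()[]\"'«»„“”"


def suspicious_ratio(text: str) -> float:
    """Return the share of tokens that look like OCR misreads (0.0 to 1.0)."""
    tokens = [t.strip(PUNCTUATION) for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    bad = sum(1 for t in tokens if not CLEAN.match(t))
    return bad / len(tokens)


print(suspicious_ratio("The contract was signed in 2015."))  # 0.0
print(suspicious_ratio("Th3 c0ntract w@s signed"))           # 0.75
```

What counts as an acceptable ratio depends on the document (part numbers and e-mail addresses will be flagged too), so treat the result as a hint to look more closely rather than as a verdict.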

Thanks to my German colleague Ludger Giebel for mentioning this idea.

Carl

Related links

- My earlier post on converting PDFs into a translation-friendly format using Wordfast Anywhere

- My earlier post on Acrobat XI and Acrobat Reader

- Kilgray, the maker of memoQ, on converting PDFs using various tools, including their own CAT tool

- Eric le Carre on translating PDFs using various free tools

Entries tagged as ocr

29-12-16

Special end-of-year offers on scanner and PDF software

16-03-16

Preparing PDF files for translation - how to handle a protected file
