ABBYY, the software company behind FineReader, arguably the best OCR software package in town at the moment, is currently offering two of its products, FineReader 12 and PDF Transformer+, at a 25% discount until 31 December.
FineReader scans paper documents and images and turns them into digital files, which you can save in one of various Windows or Mac formats. Its optical character recognition ('OCR') is highly accurate and can be used on a host of source languages.
PDF Transformer+ is an easy-to-use program for editing PDF files and converting them into editable file formats like Microsoft Word, PowerPoint and OpenOffice Writer. It also lets you scan paper documents and create editable PDF files.
Both of these programs help translators transform documents that customers send them on paper or as PDFs into digital files that can be processed in common word-processing packages and then translated with CAT tools, for example.
To find out more about this offer, click here to go to ABBYY's website. You can watch short videos there that show you what the programs can do and download a trial version of each program as well if you want to try them out before buying them.
This special offer has been running for a while now and is only valid until 31 December, so be quick! (I bought PDF Transformer+ recently and thoroughly recommend it; it does its job well, is easy to use and is great value for money in my opinion.)
PDF files are constantly being created by businesses and non-profit organisations to show colleagues, customers and other interested parties what material has been written or drawn and what its layout will be like once it's printed. Basically, they are exact images of documents and can be viewed on computers running on various operating systems, not just Microsoft Windows.
PDFs can either be created from other electronic file formats such as Word .docx files or they can be generated by a scanner. Depending on what settings have been made in the software, the PDF files that are created may or may not be searchable. If they are, then individual words can be found in them thanks to a processing step called optical character recognition, or OCR for short. It's usually quite easy to create an editable Word file thanks to this kind of data processing; in Adobe Acrobat XI, for example, you just select these items in the 'File' menu to export the contents into a new Word document:
The 'tough nuts', in contrast, are the scanned images of paper documents we sometimes get sent, as it can take a lot of time and effort to create a reasonable editable text from these that can then be typed over and translated. To do this, you will need to use OCR software on the file in question to try and turn the image of the document into a set of legible and hopefully correctly rendered words. Sometimes this can work well, especially if you use high-quality programs such as Acrobat, ABBYY FineReader or Nuance OmniPage, which come with powerful character-recognition software. But things don't always go to plan, and the results of OCR'ing a scanned image can also be very disappointing, requiring copious editing – or even a completely different approach to creating a translatable file.
This is the situation you may also find yourself in if you ever get sent a PDF file that has been protected (i.e. 'secured') in some way – by a password, for example, meaning you can only open it or add comments to it if you enter the password first (providing you are authorised to do so). If you don't have the password, you won't be given the full right to use and process the file. This also means you won't be able to copy its contents and paste them into a blank Word file for translation. And what then?
Asking the customer for the password may be the obvious answer here, but if they don't have it themselves and are unable (or unwilling) to get it, what else should you do? Well, there are various suggestions about this on the internet, some of which I've tried out, but have you ever thought of using a simple work-around with a printer? That may be a faster and simpler way of getting round the password-protection issue.
If you are able to print the file out (this may not be allowed, depending on what properties the PDF has been given; in Adobe Acrobat XI, you can check these under 'File' > 'Properties' on the 'Security' tab), then do so using the best resolution and clearest print you can. Scan the printout and create a brand-new, multi-page PDF from it yourself. Most types of scanner software will let you do this, including the three I've just mentioned.
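For anyone curious about what sits behind those document properties: in a protected PDF, the permissions are stored as a single whole number (the /P entry in the file's encryption settings), and each permission is one bit of that number. The following is only a minimal sketch in Python, assuming you have already found the /P value by some other means. The sample value -1852 and the function name describe_permissions are made up for illustration, and the bit positions are the ones given in the PDF specification.

```python
# Permission bits as defined for the standard security handler in the
# PDF specification. Bits are counted from 1 at the low-order end.
PRINT = 1 << 2      # bit 3: print the document
MODIFY = 1 << 3     # bit 4: change the contents
COPY = 1 << 4       # bit 5: copy or extract text and graphics
ANNOTATE = 1 << 5   # bit 6: add or modify comments and form fields


def describe_permissions(p: int) -> dict:
    """Decode a /P permissions integer (hypothetical helper for illustration)."""
    return {
        "print": bool(p & PRINT),
        "modify": bool(p & MODIFY),
        "copy": bool(p & COPY),
        "annotate": bool(p & ANNOTATE),
    }


# -1852 is an assumed sample value, not taken from any real file.
print(describe_permissions(-1852))
```

With this sample value, printing is allowed but copying is not, which is exactly the situation in which the printer work-around is worth a try.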
Before the scanner creates the new PDF file, check the settings and make sure the file will be searchable; the software will then OCR it (don't forget to tell it which language it should recognise first, though). Once you've got the file, check it to see if the quality of the text is okay, and if it is, export the contents into a new Word file. Now you should find you have a Word document that is straightforward to translate. A little editing may be necessary, but not much (utilities like CodeZapper and TransTools Suite will help you tidy the file up if need be).
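If you'd rather not judge the quality of the recognised text by eye alone, a rough check can be scripted. The sketch below is only a heuristic of my own, not something any of the programs mentioned here provide: it assumes you have pasted or exported the recognised text as plain text, and it simply counts how many of the "words" consist of letters only. The function name ocr_quality is hypothetical, and a text full of figures or product codes will score low even if the OCR was perfect.

```python
import re

# A token looks like a word if it consists only of letters, optionally
# joined by hyphens or apostrophes (e.g. "easy-to-use", "don't").
WORD_LIKE = re.compile(r"^[^\W\d_]+(?:[-'][^\W\d_]+)*$")


def ocr_quality(text: str) -> float:
    """Return the share of tokens that look like ordinary words (0.0 to 1.0)."""
    tokens = text.split()
    if not tokens:
        return 0.0
    # Strip surrounding punctuation before testing each token.
    good = sum(1 for t in tokens if WORD_LIKE.match(t.strip(".,;:!?()[]\"'")))
    return good / len(tokens)


clean = "Der Vertrag tritt am Montag in Kraft."
garbled = "D3r V€rtr@g tr1tt am M0ntag"
print(ocr_quality(clean), ocr_quality(garbled))
```

A clean sentence should score close to 1.0, while badly recognised text with digits and symbols inside the words scores much lower. Where you draw the line is a judgment call; the score only tells you which files deserve a closer look before you export them to Word.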
Thanks to my German colleague Ludger Giebel for mentioning this idea.
Carl
Related links
- My earlier post on converting PDFs into a translation-friendly format using Wordfast Anywhere
I'm a full-time German-to-English translator and proof-reader currently based in Germany. Click here to see my profile on LinkedIn and learn more about my activities. I often write posts there, so if you follow me, you'll get them automatically.
(Please refer to this link on my blog if you want to link up, and personalise your request so I can tell it's not just spam. I look forward to hearing from you.)
There are so many online dictionaries around these days that it's hard to know which ones to use. I've collated a good number of monolingual and bilingual resources I can recommend to translators working in English and German on my website. Click here to view them. A few dictionaries in other languages such as French and Danish are also included.
See this page of the website for links to English glossaries on business, politics, humanities and technical fields.
Links to patent-related terminology are also listed on a page of their own.
Online dictionary of the week: Tureng. This site actually offers four bilingual dictionaries (German - English, Turkish - English, Spanish - English and French - English) plus an English synonym dictionary. They're all free to use. I've often found the German-English one helpful.