By John Glynn
I wrote an article on Optical Character Recognition (OCR) in the July 2011 Bits'N'Bytes, which detailed how to convert an image file into a single-bit TIFF file using GIMP.
The conversion program tesseract requires a TIFF file to produce a text file.
This manual process works well, but can be a little tedious if you want to process a lot of image files to text.
The backend program tesseract has improved and now handles columns, so I have written a bash shell script that automates the whole process.
You can download the scripts here.
The script requires two programs: tesseract and convert. The convert command is part of the ImageMagick suite.
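To give a feel for how these two programs fit together, here is a minimal sketch of a batch conversion. It is not the MultiConvert script itself: the function name ocr_dir, the choice of .jpg and .png inputs, and the use of convert's -monochrome option to get a black-and-white TIFF are all my assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical sketch only; this is NOT the MultiConvert script.
# ocr_dir [DIR]: for every .jpg/.png in DIR (default ~/Convert),
# make a black-and-white TIFF with convert, then run tesseract on it.
ocr_dir() {
    local dir="${1:-$HOME/Convert}"
    local img base
    for img in "$dir"/*.jpg "$dir"/*.png; do
        [ -f "$img" ] || continue        # skip patterns that matched nothing
        base="${img%.*}"                 # file name without its extension
        # Assumption: -monochrome gives a TIFF that tesseract accepts.
        convert "$img" -monochrome "$base.tif" || continue
        # tesseract appends .txt to the output base name itself.
        tesseract "$base.tif" "$base"
    done
}
```

Calling ocr_dir with no argument would process the images in the Convert directory under your home directory and leave a .txt file beside each one. Depending on your tesseract version you may need different convert options, so treat the command line above as a starting point.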
1. Install tesseract and ImageMagick on your OS.
2. Make a directory called Convert directly under your home directory, e.g. /home/john/Convert
3. Download a copy of the shell script MultiConvert at a Club meeting from the ??? and place it in the /home/$USER/bin directory.
4. Make sure this program is executable!
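The directory and permission steps above can be sketched as a small helper. The function name setup_multiconvert and its two arguments are hypothetical; it assumes you already have a copy of MultiConvert somewhere on disk and that tesseract and ImageMagick are installed separately.

```shell
#!/bin/bash
# Hypothetical helper; the name and arguments are for illustration only.
# setup_multiconvert [HOME_DIR] [SCRIPT]
#   HOME_DIR  defaults to your home directory
#   SCRIPT    path to your downloaded copy of MultiConvert
setup_multiconvert() {
    local home_dir="${1:-$HOME}"
    local script="${2:-MultiConvert}"
    # Create the Convert and bin directories if they do not exist yet.
    mkdir -p "$home_dir/Convert" "$home_dir/bin" || return 1
    if [ ! -f "$script" ]; then
        echo "Cannot find $script; get a copy of the script first" >&2
        return 1
    fi
    # Put the script in bin and make it executable.
    cp "$script" "$home_dir/bin/MultiConvert" &&
        chmod +x "$home_dir/bin/MultiConvert"
}
```

For example, if the script was saved in your Downloads directory, you might run: setup_multiconvert "$HOME" "$HOME/Downloads/MultiConvert". Doing the same thing by hand with mkdir, cp and chmod +x works just as well.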