OCR for Linux

By John Glynn

I wrote an article on Optical Character Recognition (OCR) in the July 2011 Bits'N'Bytes, which detailed how to use the Gimp program to convert an image file into a single-bit TIFF file.

The conversion program tesseract requires a TIFF file as input to produce a text file.

This manual process works well, but it can be a little tedious if you want to convert a lot of image files to text.

The backend program tesseract has improved and now handles columns, so I have written a bash shell script that automates the whole process.
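To give a rough idea of what such automation involves, here is a minimal sketch. It is not the MultiConvert script itself; the function name ocr_dir, the choice of .jpg and .png inputs, and the use of ImageMagick's -monochrome option to get a black and white TIFF are all assumptions made for illustration.

```shell
# Hypothetical sketch only, not the author's MultiConvert script.
# ocr_dir DIR: turn every .jpg/.png image in DIR into a black and white
# TIFF with ImageMagick's convert, then run tesseract on that TIFF.
ocr_dir() {
    local dir img base
    dir=$1
    for img in "$dir"/*.jpg "$dir"/*.png; do
        [ -e "$img" ] || continue      # pattern matched no files, skip it
        base=${img%.*}                 # file name without its extension
        # Assumption: -monochrome is enough to get a usable TIFF;
        # some scans may need different ImageMagick options.
        convert "$img" -monochrome "$base.tif" || return 1
        # tesseract adds the .txt extension to the output name itself.
        tesseract "$base.tif" "$base" || return 1
    done
}
```

After loading the function into a shell, running `ocr_dir "$HOME/Convert"` would leave a .txt file beside each image in that directory, assuming both programs are installed.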

You can download the scripts here.

Installation:

The script requires two programs, tesseract and convert. The convert command is part of the ImageMagick suite.

1. Install tesseract and ImageMagick on your OS.
2. Make a directory called Convert directly under your home directory, e.g. /home/john/Convert
3. Download a copy of the shell script MultiConvert at a Club meeting from the ??? and place it in the /home/$USER/bin directory.
4. Make sure this program is executable!
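The installation steps above could be carried out from a terminal roughly as follows. This is a sketch under stated assumptions: the helper names check_tools and setup_dirs are made up for illustration, and the package names in the comment apply to Debian or Ubuntu style systems only.

```shell
# Assumption: Debian/Ubuntu package names; other distributions differ.
#   sudo apt install tesseract-ocr imagemagick

# check_tools NAME...: report any required command missing from the PATH.
# Returns 0 when every command is found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# setup_dirs HOME_DIR: create the Convert and bin directories and,
# if MultiConvert has already been copied into bin, make it executable.
setup_dirs() {
    home_dir=$1
    mkdir -p "$home_dir/Convert" "$home_dir/bin"
    if [ -f "$home_dir/bin/MultiConvert" ]; then
        chmod +x "$home_dir/bin/MultiConvert"
    fi
}
```

With both functions loaded, `check_tools tesseract convert && setup_dirs "$HOME"` checks for the two programs first and only then prepares the directories.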

 

Sunshine Coast Computer Club
Copyright © 2021 All Rights Reserved