
Re: The best OCR system is now available on Linux

__/ [ BearItAll ] on Thursday 15 June 2006 10:31 \__

> Roy Schestowitz wrote:
> 
>> __/ [ goposter@xxxxxxxxxx ] on Thursday 15 June 2006 09:12 \__
>> 
>>> What a difference a couple of years make...
>>> 
>>> When I first asked this company about Linux support, they were not
>>> very enthusiastic about it. Now they have got the "Penguin Fever" and
>>> they tell me that they are getting lots of requests from Linux users.
>>> 
>>> The current versions are 8.0 for Windows and 6.0 for Linux, but the
>>> plan is to jump to 8.0 Linux and from then on, continue with the same
>>> version number on both platforms.
>>> 
>>>    http://www.abbyy.com/sdk/?param=28804
>>> 
>>> -RFH
>> 
>> There are already some fine programs that are both Free and Open Source.
>> Frankly, having fallen into proprietary traps in the past, I would rather go
>> for one of the projects listed below.
>> 
>>         http://software.newsforge.com/article.pl?sid=05/12/15/1848236
>> 
>> OCR was once said to be a field with deficiencies on the GNU/Linux
>> platform. It is no longer quite the case though.
>> 
>> Best wishes,
>> 
>> Roy
>> 
> 
> The problem for me has been in finding one that is scriptable. So that it
> can be fitted into a system such as this,
> 
> scan -> ocr -> parse for document database -> put document into library.
> 
> They all seem bent on giving you a GUI, but that just slows the whole thing
> down and leaves no real chance of automation. Ten documents are fine using
> a GUI, but 20,000 isn't so much fun.
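
For what it's worth, once any command-line OCR tool is in place, that chain
reduces to a loop. Below is a rough sketch only. ocr_cmd is a hypothetical
stand-in (not a real program) for whichever tool takes an image path and
prints the recognised text; process_batch and the inbox/library layout are
assumptions as well, and the "parse for document database" step is left out.

```shell
#!/bin/sh
# process_batch INBOX LIBRARY
# Run every scanned image in INBOX through an OCR command and file the
# resulting text in LIBRARY. OCR_CMD is a placeholder: set it to any
# program that takes an image path and prints the recognised text.
OCR_CMD=${OCR_CMD:-ocr_cmd}   # hypothetical name, not a real tool

process_batch() {
    inbox=$1
    library=$2
    mkdir -p "$library" || return 1
    for image in "$inbox"/*; do
        [ -f "$image" ] || continue      # nothing to do if INBOX is empty
        base=$(basename "$image")
        out="$library/${base%.*}.txt"    # page1.tif -> page1.txt
        if "$OCR_CMD" "$image" > "$out"; then
            echo "filed: $out"
        else
            echo "failed: $image" >&2
            rm -f "$out"                 # do not keep partial output
        fi
    done
}
```

It would be called as, say, process_batch ~/scans/inbox ~/scans/library
from a cron job, with OCR_CMD pointing at the real tool.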

I was once stuck in this kind of situation. I had to process thousands of
images in datestamp-type subdirectories (several levels deep). While Pixie
Plus could handle pictures in batch mode, they all had to reside in the same
directory. So, with the help of the local LUG, I got a script that crawls the
directories and creates hard links to all the images in one place. I could
then modify everything in batch mode from the GUI (one effect at a time
though, as it is not scriptable).
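
A minimal sketch of what such a crawl-and-link script might look like. It is
not the original script; the function name flatten_links and the list of
extensions are assumptions for illustration. Hard links only work within a
single filesystem, and DEST is assumed to lie outside SRC.

```shell
#!/bin/sh
# flatten_links SRC DEST
# Hard-link every image found under SRC (at any depth) into the flat
# directory DEST. Hypothetical helper; the extensions are an assumption.
# SRC and DEST must be on the same filesystem for hard links to work.
flatten_links() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    find "$src" -type f \
        \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' \) |
    while IFS= read -r file; do
        # Encode the subdirectory path in the link name so that files
        # with the same name in different directories do not collide.
        rel=${file#"$src"/}
        name=$(printf '%s' "$rel" | tr '/' '_')
        ln "$file" "$dest/$name"
    done
}
```

For example, flatten_links ~/photos ~/photos-flat turns
~/photos/2006/06/15/shot.jpg into ~/photos-flat/2006_06_15_shot.jpg, which a
batch tool that insists on a single directory can then pick up.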

Without a doubt, command-line tools surpass many alternatives because they
are scriptable. Here, for instance, is a script that runs on my computer
every 10 minutes:

#!/bin/sh

export DISPLAY=localhost:0.0
   # set display (to make cron job work)


###################################
# Get current virtual desktop

import -window root ~/public_html/screen-temp.jpeg
   # capture display
mogrify -resize 25% -border 3 ~/public_html/screen-temp.jpeg
   # save to temporary file so as to avoid full-sized
   # image from being public for a second
convert ~/public_html/screen-temp.jpeg -font Bookman-DemiItalic \
   -pointsize 20 -fill gray -stroke white \
   -draw "text 40,20 '`date` - schestowitz.com'" \
   ~/public_html/screen-temp.jpeg
convert ~/public_html/screen-temp.jpeg -font Bookman-DemiItalic \
   -pointsize 20 -fill darkblue -stroke blue \
   -draw "text 42,22 '`date` - schestowitz.com'" \
   ~/public_html/screen-temp.jpeg
cp ~/public_html/screen.jpeg ~/public_html/screen-previous.jpeg
   # save the previous screenshot
mv ~/public_html/screen-temp.jpeg ~/public_html/screen.jpeg
   # publish the finished screenshot


###################################
# Grab pager with 8 virtual desktops

import -window root -crop 1000x120+1638+840 ~/public_html/pager-temp.jpeg
   # save to temporary file (unneeded)
mogrify -resize 75% -border 3 ~/public_html/pager-temp.jpeg
convert ~/public_html/pager-temp.jpeg -font Bookman-DemiItalic \
   -pointsize 16 -fill black -stroke black \
   -draw "text 205,18 '`date` - schestowitz.com'" \
   ~/public_html/pager-temp.jpeg
convert ~/public_html/pager-temp.jpeg -font Bookman-DemiItalic \
   -pointsize 16 -fill white -stroke blue \
   -draw "text 206,19 '`date` - schestowitz.com'" \
   ~/public_html/pager-temp.jpeg
cp ~/public_html/pager.jpeg ~/public_html/pager-previous.jpeg
   # save the previous pager screenshot
mv ~/public_html/pager-temp.jpeg ~/public_html/pager.jpeg
   # below are bits that write information to a simple text file

###################################
# Get some text stats

echo "Refresh cycle is currently set to 10 minutes" > ~/public_html/caption.txt
echo "" >> ~/public_html/caption.txt
echo "Image last captured on: " >> ~/public_html/caption.txt
date >> ~/public_html/caption.txt
echo "" >> ~/public_html/caption.txt
TERM=linux
export TERM
top -b -n 1 >> ~/public_html/caption.txt
   # echo "List of processes omitted" >> ~/public_html/caption.txt
echo "" >> ~/public_html/caption.txt
echo "______________________________________________" \
   >> ~/public_html/caption.txt
echo "Scripts set up by Roy Schestowitz, August 2005" \
   >> ~/public_html/caption.txt
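
For completeness, the 10-minute schedule could be set up with a crontab entry
along the following lines (edited with crontab -e). The script path is a
made-up example; the */10 step syntax is what Vixie-style cron, as found on
most GNU/Linux distributions, accepts.

```
# m    h  dom mon dow  command
*/10   *  *   *   *    $HOME/bin/screenshot.sh
```

Cron runs jobs with a minimal environment, which is why the script exports
DISPLAY itself before calling import.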


-- 
Roy S. Schestowitz      |    "Signature pending approval"
http://Schestowitz.com  |  SuSE GNU/Linux   ¦     PGP-Key: 0x74572E8E
  3:00pm  up 48 days 20:14,  11 users,  load average: 2.31, 2.21, 2.24
      http://iuron.com - help build a non-profit search engine
