
Re: PDF to text ?

  • Subject: Re: PDF to text ?
  • From: Roy Schestowitz <newsgroups@schestowitz.com>
  • Date: Wed, 15 Jun 2005 03:18:58 +0100
  • Newsgroups: alt.html
  • References: <ba2a7$42af1240$4396a33a$10193@TCSN.NET> <pan.2005.06.14.19.27.51.722335@tobyinkster.co.uk> <AAHre.95$S17.22332@monger.newsread.com>
  • User-agent: KNode/0.7.2
Jonathan N. Little wrote:

> Toby Inkster wrote:
>> J. Muir wrote:
>> 
>> 
>>>I have a pdf file that is set so I can't copy and paste the text. Does
>>>anyone know of a good Mac (OS 9.1) program that will convert a pdf file
>>>to text? I tried the Adobe online tool, but it didn't work.
>> 
>> 
>> A combination of the "ps2ascii" and "pdf2ps" tools which form part of
>> Ghostscript should do this. There is almost certainly a port for Mac OS
>> X. Not sure about OS 9 though.
>> 
> Although the text is probably in PDF format because the
> author did not want you to edit it, one can always do a screen
> capture and use an OCR program to convert it to text in a pinch: Textbridge
> for WinBoxes, and I am sure there is some comparable program for Mac.
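
The two-step Ghostscript route quoted above can be scripted. This is a minimal sketch, assuming the `pdf2ps` and `ps2ascii` wrapper scripts that ship with Ghostscript are on the PATH (e.g. on Mac OS X or Linux). The helper names and the convention of deriving the `.ps` and `.txt` names from the input file are hypothetical, chosen only for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def ghostscript_commands(pdf_path: str) -> list[list[str]]:
    """Build the two commands: PDF -> PostScript, then PostScript -> text.

    Output names are derived from the input name (a hypothetical
    convention used only in this sketch).
    """
    pdf = Path(pdf_path)
    ps = pdf.with_suffix(".ps")
    txt = pdf.with_suffix(".txt")
    return [
        ["pdf2ps", str(pdf), str(ps)],    # pdf2ps input.pdf output.ps
        ["ps2ascii", str(ps), str(txt)],  # ps2ascii input.ps output.txt
    ]


def pdf_to_text(pdf_path: str) -> str:
    """Run both steps and return the extracted text (needs Ghostscript)."""
    for tool in ("pdf2ps", "ps2ascii"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found; is Ghostscript installed?")
    for cmd in ghostscript_commands(pdf_path):
        subprocess.run(cmd, check=True)  # stop on the first failing step
    return Path(pdf_path).with_suffix(".txt").read_text(errors="replace")
```

Calling `pdf_to_text("report.pdf")` would leave `report.ps` and `report.txt` beside the input. If the PDF's text is stored only as images, this route yields little or nothing, and OCR is the remaining option.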

Some PDF files that I have come across did not allow text to be highlighted.
Even Acrobat Reader 7 (Linux) could not handle them. There is an option under
"File" for exporting the document as text. This usually does a tremendous job,
but the output is sometimes empty.

Screen captures are tedious, and they require fairly demanding software to do
the text recognition. OCR also struggles with equations, figures, etc., and
will not preserve information about structure (e.g. nested bullet points).

My advice to you is to get hold of the source. It will take you less time
than struggling with annoying PDFs. That's why some people prefer to send
the LaTeX source around: if you want to view it, you compile it on your own
machine as HTML (latex2html), as PostScript, as PDF, or whatever you fancy.
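
As a rough illustration of that workflow, the sketch below maps each output format to the usual command line: `latex2html` for HTML, `latex` followed by `dvips` for PostScript, and `pdflatex` for PDF. It assumes a TeX distribution (and latex2html) is installed. The helper names are hypothetical, and real documents often need two or more `latex` passes to resolve cross-references.

```python
import subprocess
from pathlib import Path


def build_commands(tex_file: str, fmt: str) -> list[list[str]]:
    """Return the command sequence that turns a .tex file into `fmt`."""
    tex = Path(tex_file)
    stem = tex.stem  # latex writes <stem>.dvi into the current directory
    recipes = {
        "html": [["latex2html", str(tex)]],
        "ps": [["latex", str(tex)], ["dvips", f"{stem}.dvi", "-o", f"{stem}.ps"]],
        "pdf": [["pdflatex", str(tex)]],
    }
    if fmt not in recipes:
        raise ValueError(f"unsupported format: {fmt!r}")
    return recipes[fmt]


def compile_latex(tex_file: str, fmt: str) -> None:
    """Run each step in order, stopping if any tool reports an error."""
    for cmd in build_commands(tex_file, fmt):
        subprocess.run(cmd, check=True)
```

For example, `compile_latex("paper.tex", "ps")` would run `latex paper.tex` and then `dvips paper.dvi -o paper.ps`.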

Roy

-- 
Roy S. Schestowitz
http://Schestowitz.com
