pdf to txt library?
Just popping in
Joined:
2008/11/12 8:44
From Germany
Posts: 161
Is there a simple library on AOS4 that can extract text from a PDF file in C++ programs?

Thanks,
nexus


Re: pdf to txt library?
Quite a regular
Joined:
2010/5/16 12:20
From Grimsby, UK
Posts: 950
@nexus

Are you trying to do this as a one-off, or are you coding something new?


Re: pdf to txt library?
Just can't stay away
Joined:
2009/5/1 18:57
From Czech Republic
Posts: 1337
@nexus

Quote:

Is there a simple library on AOS4 that can extract text from a PDF file in C++ programs?

AFAIK no. There is libharu on OS4Depot, but it can only write text into a PDF, not the other way round.

EDIT: Perhaps you can coax someone into porting Poppler for you: http://en.wikipedia.org/wiki/Poppler_%28software%29


Edited by trixie on 2010/11/23 13:22:26
_________________
The Rear Window blog

AmigaOne X5000 @ 2GHz / 4GB RAM / Radeon RX 560 / ESI Juli@ / AmigaOS 4.1 Final Edition
SAM440ep-flex @ 667MHz / 1GB RAM / Radeon 9250 / AmigaOS 4.1 Final Edition

Re: pdf to txt library?
Just popping in
Joined:
2008/11/12 8:44
From Germany
Posts: 161
Hm. OK, I see.

Poppler, yes. It's a fork of xpdf, and as we already have AmiPDF, which is an xpdf clone, I hoped that there might also be an appropriate library.

Thanks,
nexus


Re: pdf to txt library?
Just can't stay away
Joined:
2006/11/24 18:52
From Gloucestershire, UK.
Posts: 1172
@nexus

Not that I know of, but I had a similar problem converting PDFs for my Kindle. After hunting through Aminet and OS4Depot I found a couple of programs that let me do it:

pdftohtml: http://os4depot.net/download.php?file ... ext/convert/pdftohtml.lha

html2txt: http://uk.aminet.net/text/hyper/html2txt.lha

together with a little AmiBlitz program I wrote: http://www.am1ga.net/Left$

I then wrote the following DOS script to convert PDF to text, going through HTML:

.bra {
.ket }
.key filename

set title `left$ "{filename}" -4` ; removes the .pdf extension
if exists "${title}.txt"
echo "File already converted."
quit
endif

echo "Converting to html."
pdftohtml -q "{filename}"
echo "Converting to txt."
html2txt "${title}s.html" "${title}.txt" >nil:
;clean up the files created by pdftohtml and the variable created by set
delete "${title}.html" "${title}s.html" "${title}_ind.html" quiet
unset title
echo "Done."

Not very elegant, and the conversion is a bit dodgy with some PDFs, but it does the job.
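
For anyone who wants the same file naming inside a C++ program, here is a minimal sketch of what the `left$ ... -4` step and the `${title}s.html` / `${title}.txt` names in the script amount to. The function and struct names are hypothetical, made up for illustration. Unlike the script, the helper only strips the last four characters when they really are ".pdf", which is an assumption about what is wanted rather than what the script does.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: plays the role of `left$ "{filename}" -4` in the script,
// but only removes the suffix if it is ".pdf" (compared case-insensitively).
std::string strip_pdf_extension(const std::string& filename) {
    const std::string ext = ".pdf";
    if (filename.size() > ext.size()) {
        std::string tail = filename.substr(filename.size() - ext.size());
        for (char& c : tail) {
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        }
        if (tail == ext) {
            return filename.substr(0, filename.size() - ext.size());
        }
    }
    return filename;  // no ".pdf" suffix: leave the name untouched
}

// The two file names the script passes to html2txt.
struct ConversionNames {
    std::string html_body;  // "<title>s.html", the html2txt input
    std::string text;       // "<title>.txt", the html2txt output
};

ConversionNames conversion_names(const std::string& pdf) {
    const std::string title = strip_pdf_extension(pdf);
    return ConversionNames{title + "s.html", title + ".txt"};
}
```

Calling `conversion_names("book.pdf")` gives the names `books.html` and `book.txt`, matching what the script hands to html2txt.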

_________________
Amiga user since 1985
AOS4, A-EON, IBrowse & Alinea Betatester

Ps. I hate the new amigans website. <shudder>

Re: pdf to txt library?
Just popping in
Joined:
2008/11/12 8:44
From Germany
Posts: 161
@Severin

Hm, that's somewhat unsatisfying. I thought that at least the xpdf tools like pdftotext were available, but I've just found out that AmiPDF doesn't include them. With my question about the library, I wanted to avoid the inelegant approach of using an external tool to convert a PDF on disk and then reading the result back with a file reader.

Your suggestion showed me that it's even more inelegant, because one has to use a whole chain of external tools and hope that, in the end, the result is sufficiently "useful" for further processing.
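
For reference, the external-tool route described here might look roughly like this in C++. It is only a sketch under stated assumptions: `tool` stands for some hypothetical command-line converter that accepts an input PDF path and an output text path as its two arguments (no such tool is confirmed for AOS4 in this thread), and all function names are invented for illustration.

```cpp
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Read a whole file into a string; throws if the file cannot be opened.
std::string read_text_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Build the command line. The "<tool> <input> <output>" argument order is an
// assumption about the hypothetical converter, not a documented interface.
std::string build_command(const std::string& tool,
                          const std::string& pdf,
                          const std::string& txt) {
    return tool + " \"" + pdf + "\" \"" + txt + "\"";
}

// Run the converter, then load the text file it is expected to produce.
std::string pdf_to_text(const std::string& tool,
                        const std::string& pdf,
                        const std::string& txt) {
    const int status = std::system(build_command(tool, pdf, txt).c_str());
    if (status != 0) {
        throw std::runtime_error("converter reported an error");
    }
    return read_text_file(txt);
}
```

The weak points are the ones already named: the program depends on a tool being installed, needs a temporary file on disk, and has no control over how good the extracted text is.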

Anyway, thanks..
nexus





