pdf to txt library?

Is there a simple library on AOS4 that can extract text from a PDF file in C++ programs?

Thanks,
nexus

Re: pdf to txt library?

@nexus

Are you trying to do this as a one-off, or are you coding something new?

Re: pdf to txt library?

@nexus

Quote:

Is there a simple library on AOS4 that can extract text from a PDF file in C++ programs?

AFAIK no. There is libharu on OS4Depot, but it can only write text into a PDF, not the other way round.

EDIT: Perhaps you can coax someone into porting Poppler for you: http://en.wikipedia.org/wiki/Poppler_%28software%29


Edited by trixie on 2010/11/23 13:22:26
The Rear Window blog

AmigaOne X5000 @ 2GHz / 4GB RAM / Radeon RX 560 / ESI Juli@ / AmigaOS 4.1 Final Edition
SAM440ep-flex @ 667MHz / 1GB RAM / Radeon 9250 / AmigaOS 4.1 Final Edition

Re: pdf to txt library?

hm. ok, i see.

Poppler, yes. It's a fork of xpdf, and as we already have AmiPDF, which is an xpdf clone, I hoped that maybe there would also be an appropriate library.

thanks,
nexus

Re: pdf to txt library?

@nexus

Not that I know of, but I had a similar problem converting PDFs for my Kindle. After hunting through Aminet and OS4Depot I found a couple of programs that let me do it:

pdftohtml: http://os4depot.net/download.php?file ... ext/convert/pdftohtml.lha

html2txt: http://uk.aminet.net/text/hyper/html2txt.lha

together with a little AmiBlitz program I wrote: http://www.am1ga.net/Left$

I then wrote the following DOS script to convert PDF to text, going via HTML:

.bra {
.ket }
.key filename

; remove the .pdf extension
set title `left$ "{filename}" -4`
if exists "${title}.txt"
  echo "File already converted."
  quit
endif

echo "Converting to html."
pdftohtml -q "{filename}"
echo "Converting to txt."
html2txt "${title}s.html" "${title}.txt" >nil:
; clean up the files created by pdftohtml and the variable created by set
delete "${title}.html" "${title}s.html" "${title}_ind.html" quiet
unset title
echo "Done."

Not very elegant, and the conversion is a bit dodgy with some PDFs, but it does the job.

Amiga user since 1985
AOS4, A-EON, IBrowse & Alinea Betatester

Ps. I hate the new amigans website. <shudder>

Re: pdf to txt library?

@Severin

Hm, that's somewhat unsatisfying. I thought that at least the xpdf tools like pdftotext were available, but I've just found out that AmiPDF doesn't include them. With my question about the library, I wanted to avoid the inelegant approach of using an external tool to convert a PDF on disk and then reading the result back with a file reader.

Your suggestion showed me that it's even more inelegant, because one has to use a complete chain of external tools and hope that, in the end, the result is sufficiently "useful" for further processing.

Anyway, thanks..
nexus
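
For reference, the "external tool plus file reader" route described above could look roughly like this from C++. This is only a minimal sketch under assumptions: the helper names (quoted, build_command, read_file, pdf_to_text) are made up for illustration, the converter is assumed to accept "tool input output" on its command line, and nothing here is specific to AOS4 beyond standard C++.

```cpp
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>

// Wrap a path in double quotes so names containing spaces survive the shell.
// (Assumes the path itself contains no double quotes.)
std::string quoted(const std::string& path) {
    return "\"" + path + "\"";
}

// Build the converter command line. The "tool input output" argument order
// is an assumption; adjust it to whatever converter is actually installed.
std::string build_command(const std::string& tool,
                          const std::string& pdf,
                          const std::string& txt) {
    return tool + " " + quoted(pdf) + " " + quoted(txt);
}

// Read a whole file into 'out'. Returns false if the file cannot be opened.
bool read_file(const std::string& path, std::string& out) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return false;
    }
    std::ostringstream buffer;
    buffer << in.rdbuf();
    out = buffer.str();
    return true;
}

// Run the external converter, then load the text file it produced.
// Returns false if the tool reports an error or the output is missing.
bool pdf_to_text(const std::string& tool,
                 const std::string& pdf,
                 const std::string& txt,
                 std::string& out) {
    if (std::system(build_command(tool, pdf, txt).c_str()) != 0) {
        return false;
    }
    return read_file(txt, out);
}
```

A call would be something like pdf_to_text("pdftotext", "manual.pdf", "T:manual.txt", text), where the tool name and both paths are placeholders. It still depends on an external program being installed, so it does not remove the inelegance, it only hides it behind one function.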
