Text Extraction Problem
ICEsoft.org Forums: ICEfaces, ICEmobile, ICEpdf
Author Message
banu

Joined: 12/02/2010 00:00:00
Messages: 5
Offline


I am using the open-source (OS) version of ICEpdf and am having trouble extracting special characters from a page. When I extract text from the page, all of the text is extracted correctly except for special characters like "" or '.

The page itself renders fine; only text extraction has this problem.

Can someone help me fix this issue?
patrick.corless

Joined: 26/10/2004 00:00:00
Messages: 1097
Offline


Is there any chance you can post an example file?
banu

Joined: 12/02/2010 00:00:00
Messages: 5
Offline


I am attaching the extracted text and an image of the actual page.
[Thumb - page1.png]
 Filename page1.png [Disk] Download
 Description Text from this page
 Filesize 52 Kbytes
 Downloaded:  56 time(s)

 Filename extractedtext.txt [Disk] Download
 Description Text extracted
 Filesize 3 Kbytes
 Downloaded:  139 time(s)

patrick.corless

Joined: 26/10/2004 00:00:00
Messages: 1097
Offline


I can see the missing ' on "Berkeley (UCB) as part of UCB?s public ". Any chance you can post the original PDF file?
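For anyone checking their own extracted text for the same symptom, here is a minimal sketch that flags a `?` sitting directly between two letters, as in "UCB?s" above. It uses only the Java standard library and does not touch the ICEpdf API; the class name `ExtractionCheck` and the method `suspectOffsets` are hypothetical names made up for this illustration, and the "letter, ?, letter" rule is only a heuristic assumption about what a lost apostrophe tends to look like.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ICEpdf: scans already-extracted text
// for '?' characters that look like placeholders for a lost glyph.
final class ExtractionCheck {

    // Heuristic (an assumption): a '?' with a letter on both sides is
    // rarely genuine punctuation, so treat it as a suspect substitution.
    private static final Pattern SUSPECT =
            Pattern.compile("(?<=\\p{L})\\?(?=\\p{L})");

    // Returns the character offsets of every suspect '?' in the text.
    static List<Integer> suspectOffsets(String text) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = SUSPECT.matcher(text);
        while (m.find()) {
            offsets.add(m.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Sample line taken from the extracted text quoted in this thread.
        String line = "Berkeley (UCB) as part of UCB?s public ";
        System.out.println(suspectOffsets(line));
    }
}
```

A question mark at the end of a sentence is followed by a space or the end of the line, so it is not reported. The check only helps locate affected spots; it cannot tell which character was originally in the PDF.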
banu

Joined: 12/02/2010 00:00:00
Messages: 5
Offline


Thanks for your quick response. It also happens on page II of the attached PDF.
 Filename ICEfacesDevelopersGuide.pdf [Disk] Download
 Description pdf file
 Filesize 2279 Kbytes
 Downloaded:  61 time(s)

 Filename extractedtext.txt [Disk] Download
 Description extracted text file
 Filesize 1 Kbytes
 Downloaded:  37 time(s)

patrick.corless

Joined: 26/10/2004 00:00:00
Messages: 1097
Offline


I've created bug http://jira.icefaces.org/browse/PDF-167 for this issue. I've included a quick fix if you're able to compile from source. A more formal fix will be in the next release.
banu

Joined: 12/02/2010 00:00:00
Messages: 5
Offline


The fix works. Thank you very much for your help and quick response.