I have some PDFs, and I am trying to copy text from them in Acrobat Reader and paste it into an HTML form. Some of these files seem to use (I suspect) Unicode for text encoding, so when I paste into the HTML form (in Firefox) I get the little boxes with hex characters in them rather than readable text. The problem is not that the PDF has not been OCRed: when I try to do that in Acrobat Pro, it says it can't because the file already contains renderable text. Is there any way to deal with this? For example, could I add some sort of JavaScript to the form that would do the conversion?
It is quite possible that the text contains characters that get copied correctly but that your browser cannot display for lack of a suitable font. A PDF document may contain embedded fonts, so Adobe Reader displays the characters correctly, but a browser has no access to those fonts.
You can check whether this is the reason by trying to copy and paste the characters here (it might be useful info about the problem anyway). You could also download and install the Code200x fonts, which contain pretty much any character you can normally expect to encounter. (It is not guaranteed, but probable, that Firefox will be able to use those fonts automatically when needed.)
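One way to run that check locally is to list the code points of the pasted text. The following is a minimal sketch, not part of the original answer: the function name `describe_pasted_text` and the sample string are hypothetical, and it relies only on Python's standard `unicodedata` module. If the characters come back with ordinary names (Cyrillic letters, for instance), the copy itself probably worked and a missing font is the likelier cause; if they fall in a private-use range, that would suggest the PDF did not supply standard Unicode values, and installing fonts is unlikely to help.

```python
import unicodedata


def describe_pasted_text(text: str) -> list[str]:
    """Return one diagnostic line per character of the pasted text."""
    report = []
    for ch in text:
        category = unicodedata.category(ch)
        if category == "Co":
            # Private-use code point: it has no standard meaning, so no
            # general-purpose font can be expected to render it correctly.
            note = "private use (no standard meaning)"
        elif category == "Cn":
            note = "unassigned code point"
        else:
            # Assigned character: report its official Unicode name, with a
            # fallback for characters that have none (e.g. control codes).
            note = unicodedata.name(ch, "unnamed character")
        report.append(f"U+{ord(ch):04X} {note}")
    return report


if __name__ == "__main__":
    # Hypothetical sample: a Latin letter, a Cyrillic letter, and a
    # private-use character. Replace it with the text copied from the PDF.
    sample = "A\u0416\uE001"
    for line in describe_pasted_text(sample):
        print(line)
```

To use it, replace `sample` with the text copied from the PDF and compare the reported code points with what the page visibly shows.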
I have the same problem... Indeed it is explained here: http://forums.adobe.com/thread/915012
My solution was to convert the PDF to Word using Acrobat's export tool and then extract the information I needed from the Word file.
It's frustrating, but it works.
Another solution that I found is to convert the PDF to images (JPEG, PNG, etc.) and then run an OCR process on them.
You can export from Acrobat as JPEG, then open the JPEG in Acrobat (not Reader) and run the OCR tool. From there you should be able to copy and paste.
We had a similar problem trying to copy and paste Cyrillic text from a PDF file into Excel.
The easiest solution we found was to open the .pdf in a browser (Chrome, Mozilla or Opera) and copy and paste the text from there into Word or Excel.
It didn't work with IE, as expected.
I had the same problem, but I solved it by opening the PDF file in a web browser (Chrome in my case). Copying and pasting non-ASCII text works fine in Chrome.