How to cut-paste from PDF with non-ASCII encoding?

Posted 2020-03-17 03:34

I have some PDFs and I am trying to copy text from them in Acrobat Reader and paste it into an HTML form. Some of these files seem to use (I suspect) Unicode for text encoding, so when I paste into the HTML form (in Firefox) I get little boxes with hex digits in them rather than readable text. The problem is not that the PDF has not been OCRed: when I try to run OCR in Acrobat Pro, it says it can't because the file already contains renderable text. Is there any way to deal with this? For example, could I add some sort of JavaScript to the form that would do the conversion?

7 answers
地球回转人心会变
#2 · 2020-03-17 04:16

It is quite possible that the characters are copied correctly but your browser cannot display them because it lacks a suitable font. A PDF document may contain embedded fonts, so Adobe Reader displays the characters correctly, but a browser has no access to those fonts.

You can check whether this is the reason by trying to copy and paste the characters here (it might be useful info about the problem anyway). You could also download and install the Code200x fonts, which contain pretty much any character you can normally expect to encounter. (It is not guaranteed, but probable, that Firefox will be able to use those fonts automatically when needed.)
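
One way to run that check without guessing is to look at the code points of the pasted text. The following is only a minimal sketch in Python using the standard `unicodedata` module; the function names `describe_pasted_text` and `looks_like_font_problem` are made up for this illustration. It rests on the assumption that assigned, non-private-use code points point to a missing font, while private-use or unassigned code points suggest the PDF does not map its glyphs to real Unicode characters. It is a heuristic and cannot detect a PDF that maps glyphs to the wrong (but valid) characters.

```python
import unicodedata


def describe_pasted_text(text):
    """Return (char, code point, Unicode name, category) for each character."""
    rows = []
    for ch in text:
        rows.append((
            ch,
            f"U+{ord(ch):04X}",
            unicodedata.name(ch, "<unnamed>"),  # private-use chars have no name
            unicodedata.category(ch),
        ))
    return rows


def looks_like_font_problem(text):
    """Heuristic: True if no character is private-use (Co) or unassigned (Cn).

    True suggests the text is real Unicode and only a font is missing.
    False suggests the PDF produced code points with no standard meaning.
    """
    return all(unicodedata.category(ch) not in ("Co", "Cn") for ch in text)


if __name__ == "__main__":
    sample = "Жук"  # replace with the text pasted from the PDF
    for row in describe_pasted_text(sample):
        print(row)
    print("font problem likely:", looks_like_font_problem(sample))
```

If the check reports real characters (for example Cyrillic letters), installing a font that covers them should be enough. If it reports private-use code points, a font will not help and one of the export or OCR routes is the more likely fix.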

Fickle 薄情
#3 · 2020-03-17 04:19

I have the same problem. It is explained here: http://forums.adobe.com/thread/915012

My solution was to convert the PDF to Word using Acrobat's export tool and then extract the information I needed from the Word file.

It's frustrating, but it works.

Another solution I found is to convert the PDF to images (JPEG, PNG, etc.) and then run OCR on them.
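
For anyone who wants to script the image-plus-OCR route instead of doing it by hand, here is a rough sketch in Python. It is not what I used. It assumes the third-party packages `pdf2image` and `pytesseract` are installed, along with the Poppler and Tesseract programs they depend on, and that the Tesseract language data for your text is available. The function name `ocr_pdf_text` and the default `dpi` value are choices made for this example.

```python
def ocr_pdf_text(pdf_path, lang="eng", dpi=300):
    """Render each PDF page to an image and OCR it, ignoring the PDF's own text.

    Assumes pdf2image (needs Poppler) and pytesseract (needs Tesseract)
    are installed; `lang` is a Tesseract language code such as "eng" or "rus".
    """
    # Imported inside the function so this file still loads when the
    # optional OCR dependencies are missing.
    from pdf2image import convert_from_path
    import pytesseract

    # One PIL image per page; a higher dpi usually helps recognition
    # at the cost of speed and memory.
    pages = convert_from_path(pdf_path, dpi=dpi)

    # OCR every page and join the results with a blank line between pages.
    return "\n\n".join(
        pytesseract.image_to_string(page, lang=lang) for page in pages
    )
```

Calling `ocr_pdf_text("document.pdf", lang="rus")` on a hypothetical file would return the recognized text as one string, which can then be pasted or saved. Expect OCR mistakes, so proofread the result.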

姐就是有狂的资本
#4 · 2020-03-17 04:22

You can export from Acrobat as JPEG, then open the JPEG in Acrobat (not Reader) and run the OCR tool. From there you should be able to copy and paste.

smile是对你的礼貌
#5 · 2020-03-17 04:24
  1. Select the text in Acrobat.
  2. Right-click and select "Copy with formatting" from the context menu.
  3. Wait for the progress bar to process the text.
  4. Paste in the Word document.
疯言疯语
#6 · 2020-03-17 04:29

We had a similar problem trying to copy and paste Cyrillic text from a PDF file into Excel.

The easiest solution we found was to open the .pdf in a browser (Chrome, Mozilla or Opera) and copy and paste the text from there into Word or Excel.

It didn't work with IE, as expected.

forever°为你锁心
#7 · 2020-03-17 04:34

I had the same problem, but I solved it by opening the PDF file in a web browser (Chrome in my case). Copying and pasting non-ASCII text works fine in Chrome.
