How to cut-paste from PDF with non-ASCII encoding?

I have some PDFs and I am trying to copy text from them in Acrobat Reader and paste it into an HTML form. I suspect that some of these files use Unicode for text encoding, because when I paste into the HTML form (in Firefox) I get the little boxes with hex digits in them rather than readable text. The problem is not that the PDF has not been OCRed: when I try to run OCR in Acrobat Pro, it says it can't because the file already contains renderable text. Is there any way to deal with this? For example, could I add some sort of JavaScript to the form that would do the conversion?

Tags: pdf unicode acrobat

7 answers

地球回转人心会变

#2 · 2020-03-17 04:16

It is quite possible that the characters are copied correctly but your browser cannot display them because it lacks a suitable font. A PDF document may contain embedded fonts, so Adobe Reader displays the characters correctly, but a browser has no access to those fonts.

You can check whether this is the reason by trying to copy and paste the characters here (it might be useful info about the problem anyway). You could also download and install the Code200x fonts, which contain pretty much any character you can normally expect to encounter. (It is not guaranteed, but probable, that Firefox will be able to use those fonts automatically when needed.)
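If pasting here is not convenient, the same check can be done locally by listing the code points of the pasted text. The snippet below is only a rough sketch using Python's standard `unicodedata` module; the function name `describe_pasted_text` and the sample string are made up for illustration. The idea is that characters with ordinary Unicode names point to a missing font, while characters in a private-use category suggest the PDF's font maps its glyphs to private code points, in which case installing a font will probably not help.

```python
import unicodedata


def describe_pasted_text(text: str) -> list[dict]:
    """Return one record per character of the pasted text.

    Hypothetical helper for illustration; not part of any library.
    """
    report = []
    for ch in text:
        category = unicodedata.category(ch)
        report.append({
            "char": ch,
            "codepoint": f"U+{ord(ch):04X}",
            # Private-use and unassigned characters have no Unicode name.
            "name": unicodedata.name(ch, "<no name>"),
            # "Co" is the general category for private-use characters.
            "private_use": category == "Co",
            # "Cn" marks code points that are not assigned at all.
            "unassigned": category == "Cn",
        })
    return report


if __name__ == "__main__":
    # Sample input (an assumption): a Latin letter, a Cyrillic letter,
    # and a private-use character such as a broken PDF font might produce.
    sample = "A\u0436\ue000"
    for record in describe_pasted_text(sample):
        print(record["codepoint"], record["name"],
              "(private use)" if record["private_use"] else "")
```

Replace `sample` with the text copied from the PDF. If every character has a proper name, the copy itself worked and the boxes are a display (font) problem.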


Fickle 薄情

#3 · 2020-03-17 04:19

I have the same problem. It is explained here: http://forums.adobe.com/thread/915012

My solution was to convert the PDF to Word using Acrobat's export tool and then extract the information I needed from the Word file.

It's frustrating, but it works.

Another solution I found is to convert the PDF to images (JPEG, PNG, etc.) and then run OCR on them.


姐就是有狂的资本

#4 · 2020-03-17 04:22

You can export from Acrobat as JPEG, then open the JPEG in Acrobat (not Reader) and run the OCR tool. From there you should be able to copy and paste.


smile是对你的礼貌

#5 · 2020-03-17 04:24

1. Select the text in Acrobat.
2. Right-click and select "Copy with formatting" from the context menu.
3. Wait for the progress bar to finish processing the text.
4. Paste into a Word document.


疯言疯语

#6 · 2020-03-17 04:29

We had a similar problem trying to copy and paste Cyrillic text from a PDF file into Excel.

The easiest solution we found was to open the .pdf in a browser (Chrome, Mozilla or Opera) and copy and paste the text into Word or Excel from there.

It didn't work with IE, as expected.


forever°为你锁心

#7 · 2020-03-17 04:34

I had the same problem, but I solved it by opening the PDF file in a web browser (Chrome in my case). Copying and pasting non-ASCII text works fine from Chrome.

