When trying to OCR a PDF file, I got the error message "Acrobat could not perform recognition (OCR) on this page because This page contains renderable text".
I searched for what renderable text is and found an old post explaining that renderable text is vector-format shapes over an image (link).
When I copied the text from the file into a Word document, I only got illegible characters.
Is it possible to extract the font from a renderable-text PDF, or to extract the vector shapes and make a font from them?
PDF is not a "document" format; it is a format for rendering a printed page. It contains commands that say which graphical elements to put where on the page. Many of these elements are glyphs (elemental symbols that make up letters) taken from fonts. These glyphs may or may not (directly) correspond to a sequence of characters you can copy and paste, or put into a Word document. The PDF can contain additional tables for its fonts (for example a /ToUnicode CMap) that describe this correspondence.
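One such correspondence table is the /ToUnicode CMap, which maps the codes used to draw glyphs to Unicode text. The sketch below is a minimal, hypothetical parser (parse_bfchar and decode are illustrative names) that handles only single-code bfchar entries from a hand-written, uncompressed CMap. Real CMaps also use bfrange blocks and multi-byte codes and usually sit in compressed streams, so treat this as an illustration of the idea, not a working PDF text extractor.

```python
import re

# Hand-written example of the kind of /ToUnicode CMap a PDF can attach to a font.
# Assumption: one-byte codes and bfchar entries only (no bfrange, no codespace checks).
EXAMPLE_CMAP = """
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
3 beginbfchar
<01> <0048>
<02> <0069>
<03> <0021>
endbfchar
endcmap
"""

BFCHAR_BLOCK = re.compile(r"beginbfchar(.*?)endbfchar", re.S)
BFCHAR_PAIR = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")


def parse_bfchar(cmap_text: str) -> dict[int, str]:
    """Map glyph codes to Unicode strings using bfchar entries only."""
    mapping: dict[int, str] = {}
    for block in BFCHAR_BLOCK.findall(cmap_text):
        for src, dst in BFCHAR_PAIR.findall(block):
            # The destination is UTF-16BE in hex, so decode it as such.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def decode(codes: bytes, mapping: dict[int, str]) -> str:
    """Turn one-byte glyph codes into text; unknown codes become U+FFFD."""
    return "".join(mapping.get(code, "\ufffd") for code in codes)


table = parse_bfchar(EXAMPLE_CMAP)
print(decode(b"\x01\x02\x03", table))  # Hi!
```

Without such a table, the same three bytes carry no information about which characters they stand for.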
The graphical elements can also be images, for example an image of a scanned page.
So if Acrobat says "this page contains renderable text", it means "this page is not an image of a scanned page. It's a collection of glyphs. I cannot OCR those, because it's not an image".
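Roughly, the difference shows up in the page's content stream: text is drawn with the text-showing operators Tj, TJ, ' and ", while a scanned page is typically a single image painted with Do (or an inline BI ... EI image). The hypothetical classify helper below counts those operators in an already-decompressed content stream. It is a rough sketch, not how Acrobat actually decides: Do can also paint form XObjects, the tokenizer ignores nested parentheses, comments and inline image data, and real content streams are usually Flate-compressed and would need zlib.decompress first.

```python
import re
from collections import Counter

# Simplified tokenizer for a PDF content stream (assumption: no nested
# parentheses inside literal strings, no comments, no inline image data).
TOKEN = re.compile(
    rb"\((?:\\.|[^\\()])*\)"     # literal string
    rb"|<[0-9A-Fa-f\s]*>"        # hex string
    rb"|/[^\s/\[\]()<>{}%]+"     # name, e.g. /F1 or /Im0
    rb"|[\[\]]"                  # array delimiters
    rb"|[^\s/\[\]()<>{}%]+",     # numbers and operators
    re.S,
)

TEXT_OPS = {b"Tj", b"TJ", b"'", b'"'}   # text-showing operators
IMAGE_OPS = {b"Do", b"BI"}              # XObject painting, inline image start


def classify(content: bytes) -> Counter:
    """Count text-showing vs. XObject/inline-image operators."""
    counts: Counter = Counter()
    for tok in TOKEN.findall(content):
        # Strings are separate tokens, so "(Tj)" is not mistaken for an operator.
        if tok in TEXT_OPS:
            counts["text"] += 1
        elif tok in IMAGE_OPS:
            counts["xobject_or_image"] += 1
    return counts


scanned_page = b"q 612 0 0 792 0 0 cm /Im0 Do Q"
typed_page = b"BT /F1 12 Tf 72 720 Td (Hello) Tj [(W) 120 (orld)] TJ ET"
print(classify(scanned_page))
print(classify(typed_page))
```

A page like typed_page is what Acrobat calls "renderable text": there is no picture to run OCR on, only instructions to draw glyphs.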
When you copy the text into a Word document and the correspondence tables are missing, or the encoding is non-standard, the result is gibberish, because the computer has no way to know which glyph or glyph combination stands for which character.
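A toy illustration, assuming a hypothetical PDF generator that hands out one-byte codes in order of first use (starting at 0x21) and writes no /ToUnicode table. The codes only mean something through that private table; a reader without it can fall back on a standard encoding (Latin-1 here) and gets punctuation instead of text. The function name is illustrative.

```python
def encode_in_order_of_use(text: str, first_code: int = 0x21) -> tuple[bytes, dict[int, str]]:
    """Hypothetical generator: each new character gets the next free one-byte code."""
    table: dict[str, int] = {}
    codes = bytearray()
    for ch in text:
        if ch not in table:
            table[ch] = first_code + len(table)
        # bytearray.append raises ValueError past 0xFF; fine for a short toy string.
        codes.append(table[ch])
    # The inverse mapping is what a /ToUnicode table would have recorded.
    to_unicode = {code: ch for ch, code in table.items()}
    return bytes(codes), to_unicode


codes, to_unicode = encode_in_order_of_use("Hello")
# With the correspondence table, the text is recoverable.
print("".join(to_unicode[c] for c in codes))  # Hello
# Without it, guessing a standard encoding yields gibberish.
print(codes.decode("latin-1"))  # !"##$
```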
Is it possible to extract the font from a renderable-text PDF, or to extract the vector shapes and make a font from them?
Yes, it's easy to extract the font files. Have a look at, e.g., mutool extract from MuPDF. You still need an application that can deal with the extracted font file.
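A minimal sketch of scripting this from Python, assuming mutool (MuPDF's command-line tool) is installed and on PATH. mutool extract writes a PDF's embedded images and fonts into the current directory. The assumption that font files get a "font-" prefix (e.g. font-0012.ttf) reflects MuPDF's naming scheme as I understand it and may differ between versions, so check what your copy produces. build_extract_command and extract_fonts are hypothetical names.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def build_extract_command(pdf_path: str) -> list[str]:
    """Command line for MuPDF's `mutool extract` (images and fonts)."""
    return ["mutool", "extract", pdf_path]


def extract_fonts(pdf_path: str, out_dir: str) -> list[Path]:
    """Run mutool extract inside out_dir and return the files that look like fonts.

    Assumption: fonts are written with names starting with "font-";
    images are written under other names and are filtered out here.
    """
    if shutil.which("mutool") is None:
        raise FileNotFoundError("mutool (MuPDF) is not on PATH")
    pdf = str(Path(pdf_path).resolve())  # absolute path, because cwd changes below
    subprocess.run(build_extract_command(pdf), cwd=out_dir, check=True)
    return sorted(Path(out_dir).glob("font-*"))


if __name__ == "__main__" and len(sys.argv) == 2:
    out = Path("extracted_fonts")
    out.mkdir(exist_ok=True)
    for font in extract_fonts(sys.argv[1], str(out)):
        print(font.name)
```

Run it as python extract_fonts.py input.pdf; the font files land in extracted_fonts/ and can then be opened in whatever application handles that font format.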