How to Recognize Text in a Scanned PDF (OCR)

A scanned PDF is just a set of page images, so its text cannot be selected, searched, or copied until the file has been run through OCR (optical character recognition). After processing, an invisible text layer is added on top of each page image.

OCR analyses the pixel layout of a page and identifies characters. Recognition quality depends on scan clarity: a resolution of 200–300 DPI is sufficient, and the text should be straight and high-contrast. Skewed or overexposed pages are recognized less accurately.
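As a rough pre-check, the effective resolution of a scan can be estimated from the image width in pixels and the physical width of the page. The sketch below is only an illustration: the helper names `effective_dpi` and `is_sufficient_for_ocr` are hypothetical, the default page width assumes A4 (210 mm), and the 200 DPI threshold is taken from the guidance above.

```python
MM_PER_INCH = 25.4


def effective_dpi(pixel_width: int, page_width_mm: float = 210.0) -> float:
    """Estimate scan resolution from image width (pixels) and page width (mm).

    The default of 210 mm assumes an A4 page; pass another value for other sizes.
    """
    if pixel_width <= 0 or page_width_mm <= 0:
        raise ValueError("pixel_width and page_width_mm must be positive")
    page_width_inches = page_width_mm / MM_PER_INCH
    return pixel_width / page_width_inches


def is_sufficient_for_ocr(
    pixel_width: int,
    page_width_mm: float = 210.0,
    min_dpi: float = 200.0,
) -> bool:
    """Return True if the estimated resolution reaches the assumed minimum DPI."""
    return effective_dpi(pixel_width, page_width_mm) >= min_dpi
```

For example, an A4 page scanned 2480 pixels wide works out to roughly 300 DPI, while the same page at 1240 pixels wide is about 150 DPI and falls below the suggested range.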

After processing, you get a PDF with 'dual' content: the page image remains, and an invisible text layer is placed over it. This lets you search (Ctrl+F) and copy snippets, and it lets screen readers read the document aloud.

The tool supports Russian and English. For mixed-language documents, selecting the primary language is usually sufficient for accurate recognition.
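The engine behind this tool is not specified here. Purely as an illustration of how language selection works in a typical OCR engine, the sketch below builds a command line for the open-source Tesseract CLI, which accepts several languages joined with `+` (for example `rus+eng`) and can write a searchable PDF. The helper name `build_tesseract_command` and the file names are hypothetical, and listing the primary language first is an assumption of this sketch.

```python
from typing import List, Sequence


def build_tesseract_command(
    image_path: str,
    output_base: str,
    languages: Sequence[str],
) -> List[str]:
    """Build an argument list for the Tesseract CLI that outputs a searchable PDF.

    `languages` holds Tesseract language codes such as "rus" and "eng".
    This sketch assumes the primary language is listed first.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    lang_arg = "+".join(languages)  # e.g. "rus+eng"
    # Trailing "pdf" selects Tesseract's PDF output; the result is <output_base>.pdf
    return ["tesseract", image_path, output_base, "-l", lang_arg, "pdf"]
```

The returned list can be passed to `subprocess.run` on a machine where Tesseract and the relevant language data are installed, for instance `build_tesseract_command("page.png", "page_ocr", ["rus", "eng"])` for a mostly Russian page with some English.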

Run OCR