I want to extract data from a scanned PDF

Hi Team,

I want to extract text from a scanned PDF using “Read PDF with OCR” and write it out in the same layout as the PDF, but I cannot find a “Preserve Formatting” property like the one available under “Read PDF Text”.
So my question is: how can I extract a scanned PDF with the correct formatting?
Please help. Thanks.

Hi @Yogeshwar_Singh

Please have a look,


Hi @suraj.setty, thanks for the reply.
I know that, but my question is this: when we use “Read PDF Text”, the properties include a “Preserve Formatting” option, and when it is set to true the text is written out in the same layout as the PDF. When we use “Read PDF with OCR”, “Preserve Formatting” is not available. So what do I need to do to get the text of a scanned PDF in the same layout as the PDF?

Hi @Yogeshwar_Singh

Yes, there is no such property in “Read PDF with OCR”. You can try a different OCR engine and compare the variations in the output.


@Yogeshwar_Singh For a scanned PDF read with “Read PDF with OCR”, the layout of the output can vary from one PDF to another, and you cannot preserve the formatting.

Could you please explain the actual requirement? After reading the PDF, what are you going to do with the data?

Hi @Rahul_Unnikrishnan, thanks for the reply.
I want to use regex, and I only need the item data from the PDF.

@Yogeshwar_Singh OK. If you want to extract data with regex, do you need to preserve the format?

If you just want to extract data with regex, you don’t need to consider the formatting.

@Rahul_Unnikrishnan I have attached the PDF. The highlighted part, from “Marks & Nos.” to “TOTAL”, is what I need to extract.
7803136858_DELIVERY_DOCS.pdf (39.1 KB)

Using a regular expression, I want to extract only the item part of this PDF, from “Marks & Nos.” to “TOTAL”.
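For readers who want the regex route described above, here is a minimal sketch in Python rather than in a UiPath activity. It assumes the OCR output contains both markers spelled exactly as “Marks & Nos.” and “TOTAL”; the function name `extract_items` and the sample text are hypothetical and are not taken from the attached files.

```python
import re
from typing import Optional

START_MARKER = "Marks & Nos."
END_MARKER = "TOTAL"

# Lazy match between the two markers; DOTALL lets "." cross line breaks.
# re.escape is needed because "." in the start marker is a regex metacharacter.
ITEMS_PATTERN = re.compile(
    re.escape(START_MARKER) + r"(.*?)" + re.escape(END_MARKER),
    re.DOTALL,
)


def extract_items(ocr_text: str) -> Optional[str]:
    """Return the text between the markers, or None if either is missing."""
    match = ITEMS_PATTERN.search(ocr_text)
    return match.group(1).strip() if match else None


# Hypothetical OCR output, for illustration only.
sample = (
    "Invoice No 1\n"
    "Marks & Nos.   Description   Qty\n"
    "BOX 1   Widgets   10\n"
    "TOTAL   10\n"
)
print(extract_items(sample))
```

The match is case-sensitive and stops at the first “TOTAL” after the start marker, so an OCR misread of either marker makes the function return None instead of a partial result.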

If you are reading the PDF with OCR, what output are you getting? Can you share it here?

@Rahul_Unnikrishnan ok
Exraction omini.txt (1.8 KB)
I got this output using the OmniPage OCR engine.

I think you can use the Split method here to extract the data between “Marks & Nos.” and “TOTAL”:

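' VB.NET expression syntax (assumed); keep everything before the first "TOTAL"
Var1 = Split(strIn, "TOTAL")(0)
' Then keep what follows "Marks & Nos."
OutVal = Split(Var1, "Marks & Nos.")(1)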
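
The Split idea above can also be sketched with guards for missing markers. This is a Python illustration, not a UiPath expression. The helper names `between_markers` and `item_lines` and the sample text are hypothetical. It differs from the suggestion above in one respect: it looks for the start marker first, so a “TOTAL” that happens to appear earlier in the document does not cut the text short.

```python
from typing import List, Optional


def between_markers(
    text: str, start: str = "Marks & Nos.", end: str = "TOTAL"
) -> Optional[str]:
    """Return the text after `start` and before the next `end`, or None."""
    _, found_start, after_start = text.partition(start)
    if not found_start:
        return None  # start marker missing (for example, an OCR misread)
    body, found_end, _ = after_start.partition(end)
    if not found_end:
        return None  # end marker missing
    return body.strip()


def item_lines(text: str) -> List[str]:
    """Split the extracted block into non-empty, trimmed lines."""
    body = between_markers(text)
    if body is None:
        return []
    return [line.strip() for line in body.splitlines() if line.strip()]


# Hypothetical OCR output, for illustration only.
ocr_text = (
    "Shipper: ACME\n"
    "Marks & Nos. Description Qty\n"
    "CTN 1-5 Bolts 500\n"
    "CTN 6-9 Nuts 400\n"
    "TOTAL 900\n"
)
print(item_lines(ocr_text))
```

Working on one line per item makes it easier to apply a per-line regex afterwards, since OCR spacing tends to differ from line to line.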