Converting PDF to Text File

0bb4628e217fd43ac86ac9294 · December 26, 2023, 6:02am

Hi all,
I am new here. I am working on a process where I need to convert a PDF file of 5 to 7 pages into a text file.

I am using Tesseract OCR, but one or two pages carry a stamp, and the words in the stamp are also being extracted, which I don’t want.

Can anyone help me with this, please?

pavan_kumar5 · December 26, 2023, 6:10am

Hi @0bb4628e217fd43ac86ac9294 ,

Can you please share a sample PDF file so that I can try to suggest a solution?

Do you need to extract specific data from the PDF, or all of it?

Regards,
Pavan Kumar

0bb4628e217fd43ac86ac9294 · December 26, 2023, 6:11am

Hi @pavan_kumar5
We need to extract the entire data.

mkankatala · December 26, 2023, 6:11am

Hi @0bb4628e217fd43ac86ac9294

After extracting the data from the PDF file, store it in a String variable. Then use the Replace function to replace the stamp words with an empty string, and write the String variable's contents to the text file.
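To illustrate the idea outside of Studio, here is a minimal Python sketch of the same replace-then-write approach. It assumes the OCR output is already held in a string and that the stamp wording is static. The phrases in `STAMP_PHRASES` and the helper names `remove_stamp_text` and `save_text` are hypothetical, chosen only for this example.

```python
import re
from pathlib import Path

# Hypothetical stamp phrases: replace these with the words
# that actually appear in your stamp.
STAMP_PHRASES = ["RECEIVED", "APPROVED BY ACCOUNTS"]


def remove_stamp_text(ocr_text, stamp_phrases):
    """Return ocr_text with every stamp phrase removed (case-insensitive)."""
    cleaned = ocr_text
    for phrase in stamp_phrases:
        # re.escape makes the phrase match literally, not as a regex pattern.
        cleaned = re.sub(re.escape(phrase), "", cleaned, flags=re.IGNORECASE)
    # Collapse runs of spaces or tabs left behind by the removal.
    cleaned = re.sub(r"[ \t]{2,}", " ", cleaned)
    # Drop trailing whitespace on each line.
    return "\n".join(line.rstrip() for line in cleaned.splitlines())


def save_text(text, output_path):
    """Write the cleaned text to a UTF-8 text file."""
    Path(output_path).write_text(text, encoding="utf-8")


if __name__ == "__main__":
    sample = "Invoice No 123 RECEIVED\nTotal: 500 approved by accounts\n"
    save_text(remove_stamp_text(sample, STAMP_PHRASES), "output.txt")
```

This only works when the stamp text is recognized consistently. If OCR reads the stamp differently from page to page, the phrase list would need to cover those variants.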

Hope it helps!!

vrdabberu · December 26, 2023, 6:19am

Hi @0bb4628e217fd43ac86ac9294

When using Read PDF with OCR, store the output in a String variable. If the stamp words are static, you can use the Replace function to remove them and store the result in the same variable. After that, write the variable to a text file using the Write Text File activity.

Regards,

sanjay3 · December 26, 2023, 9:04am

Hi @0bb4628e217fd43ac86ac9294

In the properties of the Tesseract OCR activity there are two options called Allowed Characters and Denied Characters.

Hope this helps

Nguyen_Van_Luong1 · December 26, 2023, 12:00pm

Hi @0bb4628e217fd43ac86ac9294 ,
You can try

or

regards,
