Convert PDF to Text File

0bb4628e217fd43ac86ac9294 · December 28, 2023, 7:04am

Hi all,
I am stuck at one point; please help me.

I have a PDF file with 4 to 5 pages. I need to read all the pages and convert them to a .txt file. There is also a stamp in the PDF file.

Please help me

Yoichi · December 28, 2023, 7:12am

Hi,

How about using the ReadPdfText activity or the ReadPDFWithOcr activity?

Regards,

0bb4628e217fd43ac86ac9294 · December 28, 2023, 7:16am

Hi @Yoichi, thank you for the quick reply.
I used the ReadPDF activity, but the output is empty.
I also tried the Read PDF with OCR activity with the Tesseract OCR engine. I get output, but it is not what I expected.

Because there is a stamp at the end of the page, unwanted commas, full stops, numbers, and letters appear in the output.

mkankatala · December 28, 2023, 7:17am

Hi @0bb4628e217fd43ac86ac9294

You can use the Read PDF Text activity to read structured (text-based) PDF files.
You can use the Read PDF with OCR activity to read unstructured (scanned or image-based) PDF files.

The output of both activities is a String. You can then write that String to a text file with the Write Text File activity.
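
For anyone who wants to see the write step outside Studio, here is a minimal Python sketch of the same idea. It assumes the page text has already been extracted into strings by some reader or OCR step that is not shown. The function name `save_pages_as_text` and the file name `output.txt` are made up for illustration and are not part of UiPath.

```python
from pathlib import Path


def save_pages_as_text(page_texts, output_path):
    """Join per-page strings and write them to a UTF-8 text file.

    page_texts is assumed to be a list of strings, one per PDF page,
    produced by an earlier extraction step (not shown here).
    """
    # Separate pages with a blank line so page boundaries stay visible.
    combined = "\n\n".join(text.strip() for text in page_texts)
    Path(output_path).write_text(combined, encoding="utf-8")
    return combined


if __name__ == "__main__":
    # Placeholder strings standing in for real extraction output.
    pages = ["Text of page 1", "Text of page 2"]
    save_pages_as_text(pages, "output.txt")
```

In a workflow, the equivalent is simply passing the String output of the read activity to Write Text File.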

Hope it helps!!

Nguyen_Van_Luong1 · December 28, 2023, 7:18am

Hi @0bb4628e217fd43ac86ac9294 ,
You can try

or

Regards,

mkankatala · December 28, 2023, 7:19am

Each OCR engine extracts text in a different format. If the format from Tesseract OCR suits you, try changing the engine's Scale value in the Properties panel to a value between 0 and 5.

Change the scale on each run until you get the expected output.
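
If trying each scale by hand gets tedious, the comparison can be scripted. The Python sketch below is only a rough illustration: `run_ocr` is a hypothetical callable that takes a scale and returns the extracted text (it is not a UiPath or Tesseract API), and the scoring rule is an assumed heuristic that treats tokens made of two or more letters or digits as readable and everything else as noise.

```python
import re

# A token counts as readable if it is two or more letters or digits,
# optionally followed by one punctuation mark (e.g. "page," or "2023.").
READABLE_TOKEN = re.compile(r"^[A-Za-z0-9]{2,}[.,;:!?]?$")


def readable_ratio(text):
    """Return the fraction of whitespace-separated tokens that look readable."""
    tokens = text.split()
    if not tokens:
        return 0.0
    good = sum(1 for tok in tokens if READABLE_TOKEN.match(tok))
    return good / len(tokens)


def pick_best_scale(run_ocr, scales):
    """Run OCR once per scale and return the (scale, text) pair that scores highest.

    run_ocr is a hypothetical callable: scale -> extracted text.
    """
    results = [(scale, run_ocr(scale)) for scale in scales]
    return max(results, key=lambda item: readable_ratio(item[1]))


if __name__ == "__main__":
    # Made-up outputs standing in for real OCR runs at each scale.
    fake_outputs = {
        1: ", . 7 x ; q",
        2: "Invoice total , . 4",
        3: "Invoice total is 450",
    }
    best_scale, best_text = pick_best_scale(fake_outputs.get, [1, 2, 3])
    print(best_scale, best_text)
```

The heuristic is crude and will not recognize a stamp as such; it only helps rank runs so you can check the most promising output first.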

Hope it helps!!

Yoichi · December 28, 2023, 7:20am

It seems the PDF contains an image rather than text.

It may be better to use OmniPage OCR or a cloud OCR such as UiPathDocumentOCR, Google Cloud Vision OCR, or Azure Computer Vision OCR.

Regards,

0bb4628e217fd43ac86ac9294 · December 28, 2023, 7:32am

@Yoichi How do I get the API keys for OmniPage OCR or a cloud OCR such as UiPathDocumentOCR, Google Cloud Vision OCR, or Azure Computer Vision OCR?

Yoichi · December 28, 2023, 7:41am

Hi,

OmniPage OCR can be used without an API key. Please check the following document.

Google Cloud Vision OCR and Azure Computer Vision OCR require an API key. Please check the website of each service. (They can be used free of charge up to a certain amount.)
If you use Community Edition, UiPathDocumentOCR can also be used for free.

Regards,
