Read PDF file and get text

Both are showing the same output.

Is your PDF a scanned one? @Kuldeep_Pandey

I don't think it would give you incorrect output.

Please share the XAML file.

Hello @Kuldeep_Pandey

Are you using a native PDF or a scanned PDF?

If it's a native PDF, you can use the Read PDF Text activity. If it's a scanned PDF, you can try the Read PDF With OCR activity.

If it doesn't work with one OCR engine, try changing the OCR engine and check the output.

Thanks

Hi @Kuldeep_Pandey

If you need only the month and year, use Read PDF Text, and once all the text is in a string variable (str), use:

month = str.Split("Month",2,StringSplitOptions.None)(1).Trim.Split({Environment.NewLine," "},2,StringSplitOptions.None)(0)

Replace "Month" with "Year" and repeat the same to get the year.
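To see the Split logic outside Studio, here is a rough Python sketch of the same approach. The `sample` text and the `value_after` helper are hypothetical (the real PDF was never shared), and the separator handling also accepts a bare `\n`, which `Environment.NewLine` (`\r\n` on Windows) would not.

```python
import re

# HYPOTHETICAL sample standing in for the Read PDF Text output (str).
sample = "Unit /Plant   Portchester\r\nMonth   October\r\nYear   2022\r\n"


def value_after(text: str, label: str) -> str:
    """Hypothetical helper: first token after `label`, mirroring
    str.Split(label, 2, ...)(1).Trim.Split({NewLine, " "}, 2, ...)(0)."""
    # Split once on the label and keep the part after it. Like indexing (1)
    # in VB, this raises an error if the label is not in the text.
    rest = text.split(label, 1)[1].strip()
    # Cut at the first line break or space, whichever comes first.
    return re.split(r"\r\n|\n| ", rest, maxsplit=1)[0]


print(value_after(sample, "Month"))  # October
print(value_after(sample, "Year"))   # 2022
```

Note that splitting on "Month" uses the first occurrence in the text, so a word such as "Monthly" appearing earlier would throw the result off.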

Cheers

Do I have to put this in the Range property of the Read PDF activity?

Please tell me the properties of the OCR engine and the OCR activity.

Hi @Kuldeep_Pandey

In Read PDF Text, give the output variable as str. Then use an Assign activity: create a String variable for the left side and add this expression as the value.

In the formula, str is the String variable where the output of Read PDF Text is stored.

Range is for specifying which pages you want to read from the PDF.

Cheers


It's giving the same expression as the output, not the text.

HI @Kuldeep_Pandey

What is the split condition?

After the Read PDF Text activity, can you use a Write Text File activity to check whether the values are printing correctly?

Regards
Gokul

"Month = Data1.Split("Month",2,StringSplitOptions.None)(1).Trim.Split({Environment.NewLine," "},2,StringSplitOptions.None)(0)" is the condition,
and it's not giving the correct output in the Write Text File activity either.

Can you please share the sample input for the Read PDF Text activity? @Kuldeep_Pandey

I can't share the full PDF with you.

HI @Kuldeep_Pandey

Can you try a regex expression instead of Split?

System.Text.RegularExpressions.Regex.Match(Data1,"(?<=Month\s+)\S.*").Tostring

Output -> October
System.Text.RegularExpressions.Regex.Match(Data1,"(?<=Year\s+)\d+").Tostring

Output -> 2022
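The same two lookups can be tried in Python, with one caveat: .NET accepts a variable-width lookbehind such as `(?<=Month\s+)`, but Python's `re` module only allows fixed-width lookbehind, so this sketch matches the label normally and captures the value in a group instead. The sample text and the `first_value` helper are hypothetical.

```python
import re

# HYPOTHETICAL sample standing in for the Read PDF Text output (Data1).
sample = "Unit /Plant   Portchester\r\nMonth   October\r\nYear   2022\r\n"

# Label + whitespace is matched normally; only group 1 is returned, which is
# what the lookbehind achieves in the .NET version.
MONTH = re.compile(r"Month\s+(\S.*)")
YEAR = re.compile(r"Year\s+(\d+)")


def first_value(pattern, text: str) -> str:
    """Hypothetical helper: group 1 of the first match, or "" when there is
    no match (Regex.Match(...).ToString also gives "" in that case)."""
    m = pattern.search(text)
    # strip() drops trailing spaces and the "\r" that ".*" picks up on
    # Windows line endings ("." only stops at "\n").
    return m.group(1).strip() if m else ""


print(first_value(MONTH, sample))  # October
print(first_value(YEAR, sample))   # 2022
```

In .NET, `.` also matches `\r`, so if the PDF text uses Windows line endings, the `(?<=Month\s+)\S.*` result may carry a trailing `\r`; adding `.Trim` on the UiPath side may be worth it if the value looks off.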

Hi @Kuldeep_Pandey

Please remove the "Month =" part and the inverted commas, and give only the part starting from Data1 as the value.

Cheers

I want the unit as well, so what's the expression?

Can you type it as text here? @Kuldeep_Pandey

Unit /Plant

Hi @Kuldeep_Pandey

How about this expression?

System.Text.RegularExpressions.Regex.Match(Data1,"(?<=Unit\s\WPlant\s+)\S.*").Tostring

Output -> Portchester
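A Python sketch of the Unit/Plant lookup, again using a capture group in place of the variable-width lookbehind, with hypothetical sample lines. It also shows how strict `Unit\s\WPlant` is: it needs exactly one whitespace character and then exactly one non-word character (the `/`) between the two words.

```python
import re

# Capture-group version of (?<=Unit\s\WPlant\s+)\S.*
UNIT_PLANT = re.compile(r"Unit\s\WPlant\s+(\S.*)")


def unit_plant(text: str) -> str:
    """Hypothetical helper: the value after "Unit /Plant", or "" if absent."""
    m = UNIT_PLANT.search(text)
    return m.group(1).strip() if m else ""


# HYPOTHETICAL sample lines.
print(unit_plant("Unit /Plant   Portchester\r\nMonth   October\r\n"))  # Portchester
print(unit_plant("Unit/Plant Portchester"))  # prints an empty line: no space before "/"
```

If the PDF sometimes renders the label as `Unit/Plant` or `Unit / Plant`, a looser prefix such as `Unit\s*/\s*Plant\s+` (a suggestion, not from the thread) would cover those variants.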

Can you please explain this expression?

Check out the Images

You can play with the pattern at this link: https://regex101.com/
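For a text version of that explanation, here is a token-by-token breakdown of `(?<=Unit\s\WPlant\s+)\S.*`, written as a commented Python pattern using `re.VERBOSE`. As before, the lookbehind becomes a plain prefix plus a capture group, because Python rejects variable-width lookbehind; in .NET the prefix must be present but is not part of the returned match. The sample line is hypothetical.

```python
import re

UNIT_PLANT_VERBOSE = re.compile(
    r"""
    Unit    # the literal word "Unit"
    \s      # exactly one whitespace character (the space before "/")
    \W      # exactly one non-word character (the "/")
    Plant   # the literal word "Plant"
    \s+     # one or more whitespace characters before the value
    (       # .NET keeps everything above inside (?<= ... ): it must match,
            # but it is not returned; group 1 plays that role here
      \S    # the value starts with a non-whitespace character
      .*    # then the rest of the line ("." matches anything except "\n")
    )
    """,
    re.VERBOSE,
)

m = UNIT_PLANT_VERBOSE.search("Unit /Plant   Portchester\nMonth   October\n")
print(m.group(0))  # Unit /Plant   Portchester  (whole match, prefix included)
print(m.group(1))  # Portchester  (what the .NET lookbehind version returns)
```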