PDF: Get text activity selecting entire page

Hello All,

I am trying to develop a PDF automation and have about 5 types of PDF documents. For 1 of them, the Get Text activity cannot pick an individual text element; when I indicate the element, it selects the whole page instead.

The text can be copied directly; it is not handwritten or an image, it is digital text. But UiPath is not able to pull the text and I don’t know why. Can anyone help?

P.S.: The other 4 documents work fine and I can pick individual elements; only this document is a bit of a hassle.

Any help is appreciated.

Hey @Rajesh_Shet

I hope you have already set the accessibility settings recommended by UiPath for PDF scraping operations.

Also, just curious to know why you are going with UiPath UI automation instead of native PDF automation, which runs in the background with good accuracy.

Thanks
#nK

@Rajesh_Shet

Check below link for your reference

Hope this may help you

Thanks

Thanks so much @Nithinkrishna. Yes, I have set the recommended accessibility settings. I’m not sure; I was thinking of using Anchor Base with Get Text.

What would you suggest? Document Understanding is not an option. I am also a bit new to Regex.

Hey @Rajesh_Shet

Could you please show a PDF and the fields you want to extract? That will make it easier to suggest something.

Thanks
#nK

Thank you very much @Srini84, I will check and get back to you.

@Rajesh_Shet Did you try screen scraping? Also, give the CV (Computer Vision) activities a try.

Well, the file is an invoice containing order-related details. It’s actually confidential, so sorry to say I will not be able to share it.

This sounds interesting, I’ll give it a try for sure @ushu. Thank you so much.

Hey @Rajesh_Shet

It’s okay if you can’t share the original file.

But a simulated one should be fine!

Thanks
#nK

Sure, I’ll try to get that. Thanks for your amazing support though @Nithinkrishna

Hi @ushu, I think Computer Vision stores screenshots for development or analysis purposes; I saw a pop-up when I added the scope. As the documents are confidential, I think it is not a good match in this case.

Hi @Srini84, I checked the accessibility settings, but that doesn’t seem to work. It still highlights the whole page as one element.

Hi @Nithinkrishna, by simulated, what exactly do I need to share? I do not have a dummy data file; even the samples provided contain actual data and are confidential.

Hey @Rajesh_Shet

Just open your PDF with Word → Update with dummy values → Save as PDF and please share

Thanks
#nK

Hello @Rajesh_Shet ,

Did you try the Read PDF Text or Read PDF With OCR activities?

I think you can use one of these and split the output with Regex to extract the values, provided the files are of a standard format, meaning the labels are stable.
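
To illustrate the idea, here is a rough sketch in Python rather than UiPath activities. It assumes the PDF text has already been read into a string and that each value sits on the same line as a stable label. The sample text, the label names, and the `extract_fields` helper are all made up for illustration; they are not taken from the actual invoice.

```python
import re

# Hypothetical text, standing in for the output of a PDF text-reading step.
sample_text = """Invoice Number: INV-0001
Order Date: 2021-01-15
Total Amount: 1,250.00
"""

# Hypothetical labels; replace with the labels that really appear in the document.
LABELS = ["Invoice Number", "Order Date", "Total Amount"]


def extract_fields(text, labels):
    """Return a dict mapping each label to the text that follows it on the same line."""
    fields = {}
    for label in labels:
        # Match the label, an optional colon, then capture the rest of the line.
        pattern = re.escape(label) + r"[ \t]*:?[ \t]*(.*)"
        match = re.search(pattern, text)
        # A label that is not found is recorded as None instead of raising an error.
        fields[label] = match.group(1).strip() if match else None
    return fields


result = extract_fields(sample_text, LABELS)
print(result)
```

The same pattern strings should carry over to a Regex-based step in a workflow more or less unchanged, but that would need to be checked against the real document text.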

Hey @Nithinkrishna,

Yes, but sorry to say, I cannot transfer any files. It’s on the client’s system, and moving or uploading files is not allowed.

Yes, I think this is the only way. Right now I’m using Excel: I copy the PDF text, paste it as values (paste special) using a hotkey, loop through each row, and use string manipulation to get the value.

I’m a bit new to Regex, so I thought I’d ask to see whether there are any other methods.
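
For anyone equally new to Regex, the row-by-row string manipulation described here can also be done directly on the copied text, without the Excel step. Below is a minimal sketch in Python (not UiPath code), assuming each row looks like `Label: value`. The sample rows and the `parse_label_lines` helper are hypothetical.

```python
def parse_label_lines(text):
    """Split each 'Label: value' line at the first colon and return a dict."""
    fields = {}
    for line in text.splitlines():
        # partition() splits at the first colon only, so values such as times stay intact.
        label, sep, value = line.partition(":")
        # Skip blank lines and lines that have no colon or no label before it.
        if sep and label.strip():
            fields[label.strip()] = value.strip()
    return fields


# Hypothetical rows, standing in for text copied out of the PDF.
sample_rows = "Invoice Number: INV-0001\n\nOrder Time: 10:30\nNo label on this line\n"
print(parse_label_lines(sample_rows))
```

This only works when labels and values share a line; anything less regular would probably need Regex or another approach.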

Thanks for the help @Rahul_Unnikrishnan

Yes, I think Regex can work better in your scenario.
