PDF: Get text activity selecting entire page

Rajesh_Shet · May 2, 2022, 5:59pm

Hello All,

I am trying to develop a PDF automation and have about 5 types of PDF documents. For one of them, the Get Text activity cannot pick an individual text element; when I indicate the element, it selects the whole page instead.

The text can be copied directly; it is not handwritten or an image, it is digital. But UiPath is not able to pull the text and I don’t know why. Can anyone help?

P.S.: The other 4 documents work fine and I can pick individual elements; only this document is a bit of a hassle.

Any help is appreciated.

Nithinkrishna · May 2, 2022, 6:02pm

Hey @Rajesh_Shet

I hope you have already set the accessibility settings that UiPath recommends for PDF scraping operations.

Also, just curious to know why you are going with UI automation instead of native PDF automation, which runs in the background with good accuracy.

Thanks
#nK

Srini84 · May 2, 2022, 6:14pm

@Rajesh_Shet

Check below link for your reference

Hope this may help you

Thanks

Rajesh_Shet · May 2, 2022, 6:28pm

Thanks so much @Nithinkrishna, yes, I have set the recommended accessibility settings. I’m not sure; I was thinking of using Anchor Base and Get Text.

What would you suggest? Document Understanding is not an option. I am also a bit new to Regex.

Nithinkrishna · May 2, 2022, 6:29pm

Hey @Rajesh_Shet

Could you please share a PDF and show which fields you need to extract? Then it will be easier to suggest an approach.

Thanks
#nK

Rajesh_Shet · May 2, 2022, 6:29pm

Thank you very much @Srini84, I will check and get back to you.

ushu · May 2, 2022, 6:31pm

@Rajesh_Shet Did you try screen scraping? Also, give the CV (Computer Vision) activities a try.

Rajesh_Shet · May 2, 2022, 6:31pm

Umm, well, the file is an invoice containing order-related details. It’s confidential, so sorry to say I will not be able to share it.

Rajesh_Shet · May 2, 2022, 6:32pm

This sounds interesting, I’ll give it a try for sure @ushu. Thank you so much.

Nithinkrishna · May 2, 2022, 6:32pm

Hey @Rajesh_Shet

It’s okay if you can’t share the original file.

A simulated one should be fine!

Thanks
#nK

Rajesh_Shet · May 2, 2022, 6:46pm

Sure, I’ll try to get that. Thanks for your amazing support though @Nithinkrishna

Rajesh_Shet · May 3, 2022, 3:02pm

Hi @ushu, I think Computer Vision stores screenshots for development or analysis purposes; I saw a pop-up when I added the scope. As the documents are confidential, I think it is not a good match in this case.

Rajesh_Shet · May 3, 2022, 3:04pm

Hi @Srini84, I checked the accessibility settings, but that doesn’t seem to work. It still highlights the whole page as a single element.

Rajesh_Shet · May 3, 2022, 3:06pm

Hi @Nithinkrishna, what exactly do I need to share as a simulated file? I do not have a dummy data file; even the samples provided contain actual data and are confidential.

Nithinkrishna · May 3, 2022, 3:17pm

Hey @Rajesh_Shet

Just open your PDF with Word → update it with dummy values → save it as a PDF, and please share that.

Thanks
#nK

Rahul_Unnikrishnan · May 3, 2022, 4:08pm

Hello @Rajesh_Shet ,

Did you try the Read PDF Text or Read PDF With OCR activities?

I think you can use one of these and then split the output with Regex to extract the values, if the files are in a standard format, meaning the labels are stable.
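To illustrate the idea, here is a minimal sketch of label-based extraction with Regex. It is written in Python only for illustration (a UiPath workflow would use its own regex activities or VB.NET expressions), and the sample text, the label names, and the `extract_fields` helper are all assumptions, not taken from the actual invoice.

```python
import re

# Hypothetical text standing in for the string returned by a PDF text-reading step.
pdf_text = """Invoice Number: INV-1001
Order Date: 02/05/2022
Total Amount: 1,250.00
"""

# Assumed label names; replace them with the stable labels in the real document.
LABELS = ["Invoice Number", "Order Date", "Total Amount"]


def extract_fields(text, labels):
    """Return {label: value} for every label found as 'Label: value' on one line."""
    fields = {}
    for label in labels:
        # re.escape protects any special characters in the label.
        # The value is everything after the colon up to the end of that line.
        pattern = rf"^{re.escape(label)}[ \t]*:[ \t]*(.+?)[ \t]*$"
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            fields[label] = match.group(1)
    return fields


fields = extract_fields(pdf_text, LABELS)
for label, value in fields.items():
    print(f"{label} -> {value}")
```

This only works while the labels stay the same from file to file; a label missing from the text is simply left out of the result, so it can be checked for afterwards.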

Rajesh_Shet · May 4, 2022, 3:26am

Hey @Nithinkrishna,

Yes, but sorry to say, I cannot transfer any files. They are on the client’s system, and moving or uploading files is not allowed.

Rajesh_Shet · May 4, 2022, 3:30am

Yes, I think this is the only way. Right now I’m using Excel: I copy the PDF text, paste it as values using a hotkey, loop through each row, and use string manipulation to get the value.

I’m a bit new to Regex, so I thought I would ask whether there are any other methods.
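For comparison, the row-by-row string manipulation described above looks roughly like the sketch below. It is in Python purely for illustration, and the sample rows, the label names, and the `value_after_label` helper are hypothetical.

```python
def value_after_label(rows, label):
    """Scan rows in order and return the text following the first row that starts with label."""
    for row in rows:
        if row.strip().startswith(label):
            # partition() splits at the first occurrence of the label;
            # index 2 holds whatever comes after it.
            remainder = row.partition(label)[2]
            # Drop a separating colon and surrounding whitespace, if present.
            return remainder.lstrip(" :\t").strip()
    return None  # label not found in any row


# Hypothetical rows, standing in for the pasted lines looped over in Excel.
rows = ["Invoice Number: INV-1001", "Order Date 02/05/2022", ""]

invoice_number = value_after_label(rows, "Invoice Number")
order_date = value_after_label(rows, "Order Date")
missing = value_after_label(rows, "Customer Name")
```

The same logic runs directly on the text split into lines, so it does not depend on the Excel copy-and-paste step.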

Thanks for the help @Rahul_Unnikrishnan

Rahul_Unnikrishnan · May 4, 2022, 3:33am

Yes, I think Regex can work better in your scenario.

system · June 7, 2022, 12:44pm

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
