Get text from PDF that spans two rows in the same cell

RPA_Dev09 · July 2, 2020, 9:53am

Hi guys, I am having an issue where I need to get text from a PDF. It gets the first row in the cell, but the information extends to the second row. I used: System.Text.RegularExpressions.Regex.Match(DataFound, "(?<=Customer:).*(?=Company:)").Value
But for some reason the other information is not extracted.

Thanks in advance

Pratik_Wavhal · July 2, 2020, 10:21am

Hi @RPA_Dev09

Could you share a screenshot showing the actual input and what you are trying to do?

Happy Automation

Best Regards
Er Pratik Wavhal

RPA_Dev09 · July 2, 2020, 11:15am

Hi @Pratik_Wavhal

Please see below a screenshot similar to the PDF file I am working with. "System Developer" is not being returned by the regex mentioned previously.

Regards
RPA_Dev09

RPA_Dev09 · July 2, 2020, 11:21am

Ignore the “Name” on “Company Name:”

Pratik_Wavhal · July 2, 2020, 11:31am

Hi @RPA_Dev09

Below is the Screenshot for the same

Hope this may help to solve your issue
Mark as solution if this helps you and like it

Happy Automation

Best Regards
Er Pratik Wavhal

RPA_Dev09 · July 2, 2020, 11:38am

I am getting blank results.

Pratik_Wavhal · July 2, 2020, 11:45am

Hi @RPA_Dev09

Have you set the Singleline flag, as below?
If yes, then it should work.
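
A minimal sketch of why the flag matters, written in Python rather than UiPath: by default `.` does not match a line break, so `.*` between two labels cannot span rows. Python's `re.DOTALL` plays the same role as .NET's `RegexOptions.Singleline`. The sample text and the company name are assumptions about what the extracted PDF text might look like, not the actual file.

```python
import re

# Hypothetical text, assumed to resemble what a PDF text reader returns
# when a cell value wraps onto a second row before the next label.
data_found = "Customer: John Smith X\nSystem Developer\nCompany: Example Ltd\n"

pattern = r"(?<=Customer:).*(?=Company:)"

# Default mode: "." stops at the line break, so the lookahead for
# "Company:" is never satisfied and there is no match at all.
default_match = re.search(pattern, data_found)

# DOTALL mode: "." also matches "\n", so the match spans both rows.
dotall_match = re.search(pattern, data_found, re.DOTALL)

print(default_match)               # None
print(repr(dotall_match.group()))  # ' John Smith X\nSystem Developer\n'
```

With this assumed layout, the default mode returns nothing at all, which may explain blank results when the flag is not set.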

Hope this may help to solve your issue
Mark as solution if this helps you and like it

Happy Automation

Best Regards
Er Pratik Wavhal

RPA_Dev09 · July 2, 2020, 12:08pm

I am using an Assign activity. Does that mean I need to install the Regex package?

Pratik_Wavhal · July 2, 2020, 12:24pm

Hi @RPA_Dev09

Yes, because it gives you the UI for setting the flag, as I have shown above.

So for that, use the “Matches” activity.

Hope this may help to solve your issue
Mark as solution if this helps you and like it

Happy Automation

Best Regards
Er Pratik Wavhal

RPA_Dev09 · July 2, 2020, 12:51pm

OK, thanks. Which package did you install? I am getting errors on System.Text.RegularExpressions.Regex by Microsoft.

Thanks in advance.

Pratik_Wavhal · July 2, 2020, 12:59pm

Hi @RPA_Dev09

UiPath.Core.Activities.Matches

Hope this may help to solve your issue
Mark as solution if this helps you and like it

Happy Automation

Best Regards
Er Pratik Wavhal

RPA_Dev09 · July 2, 2020, 1:34pm

Thanks @Pratik_Wavhal, however I am not getting the correct results. Please share the sequence you used to test on your side with the example PDF screenshot.

indra · July 2, 2020, 1:49pm

@RPA_Dev09 Share the extracted example string and the output you are expecting from it.

bcorrea · July 2, 2020, 2:02pm

Please create your topic for this…

RPA_Dev09 · July 2, 2020, 2:19pm

I can’t share the exact PDF due to company policy, but the attached screenshot is similar to the PDF I am reading. I am using the Read PDF Text activity. I want to get “John Smith X System Developer”. But using System.Text.RegularExpressions.Regex.Match(DataFound, "(?<=Customer:).*(?=Company:)").Value I am only getting “John Smith X”. I hope the information provided is clear.

Thanks

RPA_Dev09 · July 2, 2020, 2:21pm

“DataFound” being the output of the ReadPDFText activity.
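
To make the expected result concrete, here is a rough Python sketch of the extraction (not the UiPath workflow itself). The inline `(?s)` flag is also accepted by .NET regex patterns, where it has the same effect as `RegexOptions.Singleline`. The helper name `extract_customer`, both sample layouts, and the company name are hypothetical; the real Read PDF Text output may differ.

```python
import re

def extract_customer(data_found: str) -> str:
    """Return the text between 'Customer:' and 'Company:' on one line."""
    # (?s) lets "." match line breaks; ".*?" stops at the first "Company:".
    match = re.search(r"(?s)(?<=Customer:).*?(?=Company:)", data_found)
    if match is None:
        return ""
    # Collapse line breaks and repeated spaces into single spaces.
    return " ".join(match.group().split())

# Hypothetical layout A: the wrapped row comes before the next label.
layout_a = "Customer: John Smith X\r\nSystem Developer\r\nCompany: Example Ltd"

# Hypothetical layout B: two columns read row by row, so the wrapped row
# lands after "Company:" and lies outside the matched span.
layout_b = "Customer: John Smith X Company: Example Ltd\r\nSystem Developer"

print(extract_customer(layout_a))  # John Smith X System Developer
print(extract_customer(layout_b))  # John Smith X
```

If the real text looks like layout B, no flag will pull in the second row with this pattern, so it is worth writing DataFound to a log or text file and checking where “System Developer” actually lands.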

Pratik_Wavhal · July 3, 2020, 7:55am

Hi @RPA_Dev09

You are working with the original PDF, so the format can be preserved when reading it.
In my case, you only shared an image of the data. When I work on it with OCR screen scraping, the data does not come out in the same format as in the image. It gets scrambled and the output comes out on a single line, as I have shown below.

So I typed the data into the Regex editor myself, in the same format as in the image you shared, and then applied the regex to it. That is how it worked for me, as I showed in earlier posts.

That is how I got the output I showed you. I can only build a workflow if I have the PDF.
I hope that makes sense.

Happy Automation

Best Regards
Er Pratik Wavhal

RPA_Dev09 · July 3, 2020, 9:08am

Hi @Pratik_Wavhal

I got what you said. I have recreated a PDF file similar to the one I am working with. I tested it and still get the same results as mentioned before. I tried to upload the file but am restricted. Please use the Google Drive link to get the file.

Thanks in advance.

bcorrea · July 3, 2020, 3:06pm

In your case it may be easier to automate a PDF reader instead of reading the file as text:

RPA_Dev09 · July 3, 2020, 3:14pm

Hi @bcorrea

Which activity did you use for this?
