Trouble extracting PDF text due to selector failure

Hi guys, I would really appreciate it if someone could help me with this problem.

Basically, I’m trying to extract the same piece of information from several identical PDF forms. In the picture below, I’m trying to extract “PS4” as the platform type.

[Image 1]

Here’s what my selector originally looked like.

These are the two changes I made so that this sequence would work for other identical forms.

This is the next form I’m trying to extract the platform type from. I closed the PS4 form and opened this Xbox form instead.

[Image 4]

I executed the sequence and it worked the first time!

[Image 5]

Then I closed the Xbox form and reopened the PS4 form to try it again. However, this popped up.

I opened the selector viewer and found the selector had been invalidated, even though it worked before and I haven’t changed it at all since. It’s as if my selector is unstable.

[Image 7]

Can someone please tell me what’s going on? I’ve watched so many selector and PDF extraction videos and I still can’t troubleshoot this issue. What am I doing wrong?

Looking forward to your responses!
Kind regards,
Ben.

Hi @Spacecats7

Do you have the Read PDF Text activity?

Thanks

Hi Ashwin. Yes, I do. Would it be a better option? In any case, I feel like this selector is a really fundamental thing I need to get the hang of. Can you spot what I’m doing wrong?

@Spacecats7 Use the Read PDF Text activity and apply a regular expression to get the value Xbox.

Hi @Spacecats7

Instead of relying on the selector, you can use the Read PDF Text activity. Store its output in a String variable, then use a regex to get the value (Xbox in this case).

The selector itself is fine, but set the title attribute to * (the wildcard).
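To illustrate the idea behind that advice, here is a minimal Python sketch of wildcard matching. It assumes the window title contains the PDF file name, which would explain why a selector captured on one form stops validating on another. The titles below are hypothetical, and Python's fnmatchcase is only a stand-in for the selector engine's own wildcard handling.

```python
from fnmatch import fnmatchcase

# Hypothetical window titles; the real ones depend on the viewer and file names.
titles = [
    "PS4 form.pdf - Adobe Acrobat Reader DC",
    "Xbox form.pdf - Adobe Acrobat Reader DC",
]


def title_matches(pattern: str, title: str) -> bool:
    """Return True if the window title fits the wildcard pattern."""
    return fnmatchcase(title, pattern)


fixed = "PS4 form.pdf - Adobe Acrobat Reader DC"  # title captured from one form
wildcard = "*"                                    # matches any title
partial = "* - Adobe Acrobat Reader DC"           # wildcard only for the file name

for t in titles:
    print(t, title_matches(fixed, t), title_matches(wildcard, t), title_matches(partial, t))
```

A fixed title only fits the form it was captured from, while either wildcard pattern fits both.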

Thanks
Ashwin.S

Hi @Spacecats7,

If possible, I recommend using Read PDF Text. You will get the text as a string in a variable.
In that case, you can use regex (https://regex101.com/) or Substring to get the value you need.

For Regex, an example of the variable:

Platform = System.Text.RegularExpressions.Regex.Match(pdftxt, "Platform:\s*(\w+)").Groups(1).Value

Both methods are more robust than Get Text, which needs the PDF to be open.
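For anyone who wants to try the idea outside UiPath, here is a rough Python sketch of the same capture-group approach. The sample text is hypothetical and stands in for the Read PDF Text output, and the pattern Platform:\s*(\w+) assumes the value is a single word directly after the label.

```python
import re

# Hypothetical text, standing in for the Read PDF Text output.
pdf_text = "Order form\r\nPlatform: PS4\r\nQuantity: 1\r\n"


def extract_platform(text: str) -> str:
    """Return the word following 'Platform:', or an empty string if absent."""
    match = re.search(r"Platform:\s*(\w+)", text)
    return match.group(1) if match else ""


print(extract_platform(pdf_text))
```

Returning an empty string when nothing matches is just one choice for this sketch; a workflow might prefer to raise an error instead.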

I hope it helps.

Vasile.

Hi all! Thanks for your swift responses.

I looked into regex. How exactly can I implement regex in UiPath? Is the Matches activity the best way? @wasea how can I apply the example regex you provided in your reply?

I’ve attached my flow below. I’m able to extract “Platform: PS4” via the Matches activity, but now I want to trim it down to just the “PS4” part. Could someone edit my xaml file and send it back to me? I’ve read on other threads that I need to use the Assign activity to pick out group 1 in order to extract the “PS4” part, but I’m getting tripped up by the variable types and the steps involved.
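For readers following along, the difference between the full match and group 1 can be sketched in Python as below. The sample text and variable names are hypothetical. In UiPath the analogous step would presumably be an Assign that reads .Groups(1).Value from the first match, as in the expression earlier in the thread.

```python
import re

# Hypothetical text containing two forms, standing in for the extracted PDF text.
pdf_text = "Form A\nPlatform: PS4\nForm B\nPlatform: Xbox\n"

pattern = re.compile(r"Platform:\s*(\w+)")
matches = list(pattern.finditer(pdf_text))  # one match object per occurrence

first = matches[0]
whole = first.group(0)     # the full match, label included
platform = first.group(1)  # only the part captured by the parentheses

all_platforms = [m.group(1) for m in matches]
print(whole, "|", platform, "|", all_platforms)
```

Group 0 is always the entire matched text, and numbered groups start at 1, so asking for group 1 drops the “Platform:” label.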

Attempt to extract.xaml (6.3 KB)

Thanks everyone!

Hi @Spacecats7,

Check my screenshots below:

And:

Let me know if this is what you’re looking for.

Vasile.

Hi Vasile,

Thanks for your input. After some tinkering with the regex you provided, I managed to get it to work.

Thank you!

@Spacecats7, that’s great. I didn’t know about regex until two months ago, and now I prefer it to Substring and similar approaches.
I am testing the RegEx here:
