Extract specific text within a string

Hello everyone,

Is there a way to extract certain text (which may vary) that is always surrounded by standard text? The text comes from a PDF file, is extracted through OCR, and is stored in a string variable from which I want to extract the data.

Here are some examples (this is in Spanish):
- Por medio de la presente Yo, Juan Perez (this is a name; it is one of the values I want) residente de un país…
- …como resultado de dicho siniestro
/n/n (the OCR does not bring the line breaks)
Reclamo: AUTO-12345-2019 (I want this part)/n
Poliza:…
- …efectuados por Aseguradora Pais, S.A. en el Taller, S.A. (I want this part) al vehiculo de mi propiedad…

Thank you all

There are several ways to do it, but they may not be easy to learn on your own without a programming background. Here is one learning site for you.

@askPWC I would recommend using three separate regexes to extract the text you want. I generally find regex to be the best tool for extracting text when it immediately follows a standard text.
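To make the "three separate regexes" idea concrete, here is a minimal sketch in Python (the same patterns should carry over to .NET regex). The sample text is a hypothetical reconstruction from the fragments in the question, the function and field names are made up for illustration, and I am assuming the third value wanted is "Taller, S.A." rather than the insurer's name.

```python
import re

# Hypothetical sample assembled from the fragments in the question;
# real OCR output will differ.
SAMPLE = (
    "Por medio de la presente Yo, Juan Perez residente de un país "
    "como resultado de dicho siniestro\n\n"
    "Reclamo: AUTO-12345-2019\n"
    "Poliza: ...\n"
    "efectuados por Aseguradora Pais, S.A. en el Taller, S.A. "
    "al vehiculo de mi propiedad"
)

# One pattern per field; each captures the variable part between fixed text.
PATTERNS = {
    "name": r"presente Yo,\s*(.+?)\s*residente de",
    "claim": r"Reclamo:\s*(\S+)",
    "shop": r"en el\s+(.+?)\s+al vehiculo",
}

def extract_fields(text):
    """Return a dict of field name -> captured text (None when not found)."""
    result = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        result[field] = match.group(1) if match else None
    return result

fields = extract_fields(SAMPLE)
```

Each pattern anchors on the standard text and captures only the part that varies, so a missing field simply comes back as None instead of raising an error.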

Hi, yes, that was my idea, but I have no knowledge of regex.

Thanks

Hi, and in your view, what would be the best way to extract the text?
I have a low level of programming experience.

Thanks :slight_smile:

I would use string manipulation, but before that you need to detect your patterns, like in:
presente Yo, Juan Perez
FullTextVariable.IndexOf("Yo, ") would give you the position where "Yo, " begins (the name starts 4 characters after that), but what will identify the position right AFTER the name?
Once you have those answered, you can find the name using the Substring function…

OK, I have what goes before and after, but then how do I apply the Substring function?

In the first example, what goes before is “present I” and what goes after is “resident of a”.

Thanks :slight_smile:

@askPWC - you can learn regex to a passable level within a few hours. I would recommend reading through https://www.regular-expressions.info/ which does a great job of explaining regex, then using Regex Storm (a .NET regex tester) to test everything (they have a decent quick-reference on that site as well).

I would usually not use Substring, as I find it inefficient; in my experience regex works faster and with less code. There are use cases for everything of course though :slight_smile:

To get you started, you can identify what comes before using a regex "positive lookbehind", (?<=present I), and what comes after with a regex "positive lookahead", (?=resident of a). Regex will then pull out whatever is sandwiched between them. Note that the marker text goes inside the parentheses without quotation marks; otherwise the quotes would be matched literally. If you want everything in between, the expression would look like: (?<=present I).+(?=resident of a)
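As a rough illustration of the lookbehind/lookahead sandwich, here is a small Python sketch (the pattern syntax is the same in .NET regex). The helper name `between` and the English sample sentence are assumptions made for the example; with the real OCR text the markers would be the Spanish phrases.

```python
import re

def between(text, before, after):
    """Return the text between two fixed markers, or None if not found."""
    # re.escape keeps characters such as '.' in the markers literal.
    pattern = "(?<=" + re.escape(before) + ").+?(?=" + re.escape(after) + ")"
    # DOTALL lets the match span line breaks left in the OCR text.
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(0).strip() if match else None

# Hypothetical sample using the markers mentioned in this thread.
sample = "By means of the present I Juan Perez resident of a country"
name = between(sample, "present I", "resident of a")
```

The lazy `.+?` stops at the first occurrence of the closing marker, which is usually what you want when the same phrase can appear again later in the document.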

For that case, in an Assign activity with, say, a txtName variable, it would be:
txtName = FullTextVariable.Substring(FullTextVariable.IndexOf("present I") + "present I".Length, FullTextVariable.IndexOf("resident of a") - FullTextVariable.IndexOf("present I") - "present I".Length).Trim
or something very close to it.
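For comparison, the same index-based idea can be sketched in Python. This is only an illustration under assumptions: the helper name `substring_between` and the sample sentence are hypothetical, and Python slices take a start and end position where .NET Substring takes a start and a length.

```python
def substring_between(text, before, after):
    """Index-based extraction: find both markers, then slice between them."""
    start = text.find(before)
    if start == -1:
        return None  # opening marker not present
    start += len(before)  # skip past the opening marker itself
    end = text.find(after, start)  # search only after the opening marker
    if end == -1:
        return None  # closing marker not present
    return text[start:end].strip()

# Hypothetical sample using the markers mentioned in this thread.
sample = "By means of the present I Juan Perez resident of a country"
txt_name = substring_between(sample, "present I", "resident of a")
```

Checking for -1 before slicing matters: a missing marker would otherwise silently produce the wrong text here, and in .NET it would make Substring throw an exception.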

I do agree that once you learn regex, the expression will be easier to read and create, BUT I can assure you that if you are going to use the expression only a few times, Substring is much faster, because a regex has to be "compiled" before it is used. Also, you are giving him an expression, but he most likely will not know what to do with it…

Main.xaml (4.9 KB)

You can try this.
