Is there a way to extract text that varies but is always surrounded by standard text? The text comes from a PDF file, is extracted through OCR, and is stored in a string variable from which I want to extract the data.
Here are some examples (the text is in Spanish):
-Por medio de la presente Yo, Juan Perez (a name; this is one of the values I want) residente de un país…
-…como resultado de dicho siniestro
\n\n (the OCR does not bring over the line breaks)
Reclamo: AUTO-12345-2019 (this is the part I want)\n
Poliza:…
-…efectuados por Aseguradora Pais, S.A. en el Taller, S.A. (this is the part I want) al vehiculo de mi propiedad…
There are several ways to do it, but they may not be easy to learn on your own without a programming background. Here is one learning site for you.
@askPWC I would recommend using three separate regexes to extract the text you want. I generally find regex to be the best tool for extracting text when it immediately follows a standard piece of text.
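As a rough illustration of the "three separate regexes" idea, here is a minimal Python sketch; the same patterns should carry over to .NET regex. The sample string is assembled from the fragments in the question with the inline annotations removed, and the anchor words, the field names, and the assumption that the workshop name is exactly "Taller, S.A." are all guesses made for illustration. Real OCR output will likely need looser patterns.

```python
import re

# Sample text assembled from the fragments in the question, without the
# inline annotations; real OCR output will differ.
text = (
    "Por medio de la presente Yo, Juan Perez residente de un país ... "
    "como resultado de dicho siniestro\n\n"
    "Reclamo: AUTO-12345-2019\n"
    "Poliza: ... "
    "efectuados por Aseguradora Pais, S.A. en el Taller, S.A. "
    "al vehiculo de mi propiedad"
)

# One pattern per field; each captures the variable part in group 1.
# The surrounding anchor words are assumptions taken from the examples.
PATTERNS = {
    "name": r"presente Yo,\s*(.+?)\s+residente",
    "claim": r"Reclamo:\s*(\S+)",
    "workshop": r"efectuados por .+? en el\s+(.+?)\s+al vehiculo",
}


def extract_fields(source):
    """Return a dict of field name -> captured text, or None if not found."""
    result = {}
    for field, pattern in PATTERNS.items():
        # DOTALL lets '.' cross line breaks, in case the OCR splits a phrase.
        match = re.search(pattern, source, flags=re.DOTALL)
        result[field] = match.group(1) if match else None
    return result


fields = extract_fields(text)
print(fields)
```

Keeping one pattern per field means a failed match in one place (for example, an OCR misread of "Reclamo") leaves the other fields unaffected.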
I would use string manipulation, but before that you need to identify your patterns, as in:
presente Yo, Juan Perez
FullTextVariable.IndexOf("Yo, ") would give you the position right before the name, but what will identify the position right AFTER that name?
Once you have answered that, you can find the name using the Substring function…
@askPWC - you can learn regex to a passable level within a few hours. I would recommend reading through https://www.regular-expressions.info/, which does a great job of explaining regex, and then using .NET Regex Tester - Regex Storm to test everything (that site has a decent quick reference as well).
I would usually not use Substring, as it is inefficient; whenever I have used it, I've found that regex works faster and with less code. There are use cases for everything, of course.
To get you started, you can identify what comes before with a regex "positive lookbehind", (?<=presente Yo, ), and what comes after with a regex "positive lookahead", (?= residente de). Regex will then pull out whatever is sandwiched between them. If you want everything in between, the expression would look like: (?<=presente Yo, ).+?(?= residente de)
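To make the lookbehind/lookahead idea concrete, here is a small Python sketch. The anchor strings are assumptions based on the first example in the question. One caveat: Python requires a fixed-width lookbehind, while .NET allows variable-width ones, so a literal anchor like this works in both.

```python
import re

text = "Por medio de la presente Yo, Juan Perez residente de un país"

# (?<=...) asserts what precedes the match; (?=...) asserts what follows.
# Neither anchor is included in the result. The lazy .+? stops at the
# first occurrence of the closing anchor.
name_pattern = re.compile(r"(?<=presente Yo, ).+?(?= residente de)")

match = name_pattern.search(text)
name = match.group(0) if match else None
print(name)
```

Because the anchors are only asserted and never consumed, the match itself is the value, with no capture group needed.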
For that case, in an Assign activity with, say, a txtName variable, it would be:
txtName = FullTextVariable.Substring(FullTextVariable.IndexOf("presente Yo, ") + "presente Yo, ".Length, FullTextVariable.IndexOf(" residente de") - FullTextVariable.IndexOf("presente Yo, ") - "presente Yo, ".Length)
or something very close to it
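The IndexOf/Substring approach can also be sketched as a small helper, shown here in Python with find and slicing. The function name text_between and the marker strings are hypothetical, chosen for illustration. The detail that is easy to miss is that the start position has to skip the whole leading marker, and that a missing marker should be handled rather than sliced blindly.

```python
def text_between(source, before, after):
    """Return the text between two markers, or None if either is missing."""
    start = source.find(before)
    if start == -1:
        return None
    start += len(before)             # skip past the leading marker itself
    end = source.find(after, start)  # search only after the leading marker
    if end == -1:
        return None
    return source[start:end].strip()


text = "Por medio de la presente Yo, Juan Perez residente de un país"
name = text_between(text, "presente Yo, ", " residente de")
print(name)
```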
I do agree that once you learn regex, the expression will be easier to read and create, BUT I can assure you that if you are going to use the expression only a few times, Substring is much faster, because a regex has to be "compiled" before it is used. Also, you are giving him an expression, but he most likely would not know what to do with it…