Extracting text between two Str Delimiters

Hi @PAD

Moving on to the next issue. I am extracting a phone number, similar to extracting the email in the first step, but in this second step I don’t have an “@” to play with.

I have split the string after the third occurrence of the word “Mobile”, so my outputs either start with a phone number (“05650456654 random text…”) or are just “random text…” when there is no number present.

The first thing I need to do is check whether there is a number present at index 6 of the string and return a Boolean. Please advise.

The second step will be to extract the phone number. I can do this by splitting the string again with a " " delimiter and then using index(0); this will be fine.
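For anyone following along, the two steps described above could be sketched roughly like this. It is written in Python purely for illustration (the thread itself does not show any code), and the function names `has_digit_at` and `extract_phone` are made up for this example. It assumes the phone number, when present, is the first space-delimited token of the string.

```python
from typing import Optional


def has_digit_at(text: str, index: int = 0) -> bool:
    """Return True if `text` is long enough and the character at `index` is a digit."""
    return len(text) > index and text[index].isdigit()


def extract_phone(text: str) -> Optional[str]:
    """Return the leading space-delimited token if the text starts with a digit.

    Assumption: the number, when present, is the first token of the string.
    """
    stripped = text.lstrip()
    if not has_digit_at(stripped, 0):
        return None
    # Split on a single space and keep the first piece, as described in the post.
    return stripped.split(" ")[0]
```

`has_digit_at(text, 6)` gives the Boolean check at index 6 that the post asks for, and `extract_phone` returns `None` when the string starts with random text instead of a number.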

Thanks

It’s OK, I have done it. Just a lot of string splitting. The joys of unstructured data/text!!

A simpler approach would be to use ABBYY FlexiCapture Engine and Studio to create templates. Then you won’t need to do string manipulation and can get the field values directly in Excel under your specified column names :slight_smile:

Don’t tell me this now :slight_smile:

Will look into that.

I’m learning some stuff, which is the most important thing. And I will be glad when it’s done!

It’s a time vs cost scenario. If your business needs the automation fast, the better approach is ABBYY. If you have the leisure of time… then of course Regex :slight_smile:

Apart from the matter of time, what might affect our choice of a solution is the fact that ABBYY FlexiCapture requires a separate license (to the best of my knowledge) :grinning:

@MarkC1500, I am very glad that you have managed to combine my suggestions into something that works for you! As the problem with extracting text between two string delimiters seems to be solved now, would you mark any of my posts that you found useful as a solution? That will (among others, of course :wink:) help in future searches on this forum :slight_smile:

I’ve got FineReader but not FlexiCapture.

Won’t be paying for anything at this time.

Of course I will mark your posts.

No worries, if you have any more issues, just let us know - personally, I will be eager to help if I can.

I’m sure I will very soon. :smile:

:smile: Looks like you have come a long way since yesterday, so you will be more than fine!

@PAD

I’m looking right back to the beginning to see if I can get a better read from the pdf in the first place.

I have noticed the original text is structured with paragraphs etc. After removing all special characters, it goes into one line. Any way to fix this?
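One possible way to keep the paragraph structure is to clean the text line by line instead of as one string. The sketch below is in Python for illustration only. The function name `clean_keep_lines` and the default allowed character set are assumptions, not something taken from the actual workflow.

```python
import re


def clean_keep_lines(text: str, allowed: str = "A-Za-z0-9 ") -> str:
    """Replace disallowed characters with spaces, one line at a time.

    `allowed` is the inside of a regex character class (an assumption for
    this sketch). Line breaks are preserved; empty lines are dropped.
    """
    disallowed = re.compile(f"[^{allowed}]+")
    cleaned = []
    for line in text.splitlines():
        line = disallowed.sub(" ", line)          # swap special characters for a space
        line = re.sub(r" +", " ", line).strip()   # collapse repeated spaces
        if line:
            cleaned.append(line)
    return "\n".join(cleaned)
```

Because the newline characters are never passed through the filter, each original line stays on its own line after cleaning. Extra characters can be allowed by extending `allowed`, for example `"A-Za-z0-9: "` to keep colons.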

What text structure would you want to get? I understand that you are just extracting data from this text, so do you need these paragraphs? What structure were you getting previously? What was the issue with it?

To be honest, the structure probably won’t affect the end result. It is just easier to read while I am diagnosing each issue. And I may try to use the special characters to help me get some of the data I need, as they produce extra breaks. I have run into a problem on one piece of data where the data I need is always between two words, but there is other data there too that could be before or after the text that I want, with no fixed string length or string count to use to pinpoint my data.

What “other data” are you looking for? Is there any way you can distinguish it from the text it is before or after? If we can somehow pinpoint that “core” your data precedes or follows… Or do you just know it from the text analysis not available to a robot? Can you give an example of such text (it can certainly be something made up, just to present the pattern)?

“Title deed number 355566 “text I want” 5557 Area”

The text I want is always between “title deed number” and “area”.

Now there may or may not be numbers (or text) either side of the text I want, a random amount of spaces between data on different pdf extracts etc. My data may also contain numbers.
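As a starting point, everything between the two delimiters can be captured with a lazy, case-insensitive regex. This is only a sketch in Python (the helper name `text_between` is made up for illustration), and it does not separate the wanted text from the extra numbers or text around it. That part still needs a rule of its own.

```python
import re
from typing import Optional


def text_between(text: str, start: str, end: str) -> Optional[str]:
    """Return the text between the first `start` and the next `end`, or None.

    Matching ignores case and tolerates any amount of whitespace (including
    line breaks) next to the delimiters.
    """
    pattern = re.escape(start) + r"\s*(.*?)\s*" + re.escape(end)
    match = re.search(pattern, text, flags=re.IGNORECASE | re.DOTALL)
    return match.group(1) if match else None
```

The `(.*?)` group is lazy, so it stops at the first occurrence of the end delimiter after the start delimiter. One caveat: if the wanted text itself contains the end word, the match will stop early.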

I’m looking at the data with the special characters (Arabic) and it will give me many more options to split the data where I want.

I copied the special characters to use in the matches process, but it doesn’t like them.

Ok, I have added : into my allowed characters. This may help.

OK, as for Arabic, here is what might help - in case you still haven’t installed it:

As for the regex, give me a moment… but just to confirm - in the given example, would you want to extract “5557” together with the text you want or not (as you have stated: “The text I want is always between ‘title deed number’ and ‘area’”)? If not, is the unwanted part that precedes “Area” always going to be one string not separated by spaces (no matter whether it contains digits with letters or only one of them)? Is the unwanted part right after the expression “Title deed number” always going to be such a string with no spaces too? Your string example:
“Title deed number 355566 “text I want” 5557 Area”

Thanks

I would just want “the text I want”, and unfortunately what follows could be a number or text and could be more than one string!

But I think the : will help me. Working on it now

Not sure why this has started happening,

This part of the sequence finds whether there is data in the txt (pdf) or not. If not, it moves the file. But for some reason it now continues to extract further data from that file in later sequences?!