Extracting text between two Str Delimiters

@MarkC1500, I am very glad that you have managed to combine my suggestions into something that works for you! As the problem with extracting text between two string delimiters now seems to be solved, would you mark whichever of my posts you found useful as the solution? That will help (among other things, of course :wink:) with future searches on this forum :slight_smile:

I’ve got FineReader but not FlexiCapture.

Won’t be paying for anything at this time.

Of course I will mark your posts.

No worries; if you have any more issues, just let us know. Personally, I will be eager to help if I can.

I’m sure I will very soon. :smile:

:smile: Looks like you have come a long way since yesterday, so you will be more than fine!


@PAD

I’m going right back to the beginning to see if I can get a better read of the PDF in the first place.

I have noticed the original text is structured with paragraphs etc. After removing all special characters, it all goes onto one line. Any way to fix this?
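A minimal Python sketch of one way to keep the paragraph structure, assuming the clean-up is a regex that deletes everything outside an allowed set: put whitespace (which includes line breaks) in the allowed set, and only collapse spaces and tabs within each line. The function name `strip_special_keep_lines` and its `keep` parameter are hypothetical; `\w` is used so that Arabic letters and digits count as allowed too.

```python
import re


def strip_special_keep_lines(text: str, keep: str = "") -> str:
    # Delete every character that is not a word character (Unicode-aware, so
    # Arabic letters and digits survive), whitespace, or one of `keep`.
    cleaned = re.sub(r"[^\w\s" + re.escape(keep) + r"]", "", text)
    # Work line by line so the line breaks themselves are preserved,
    # collapsing only runs of spaces/tabs inside each line.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in cleaned.splitlines()]
    # Drop lines that became empty after the clean-up.
    return "\n".join(line for line in lines if line)
```

For example, `strip_special_keep_lines(raw_text, keep=":")` would also keep colons, which are handy as field delimiters.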

What text structure would you want to get? I understand that you are just extracting data from this text, so do you need these paragraphs? What structure were you getting previously? What was the issue with it?

To be honest, the structure probably won’t affect the end result; it’s just easier to read while I am diagnosing each issue. I may also try to use the special characters to help me get some of the data I need, as they produce extra breaks. I have run into a problem with one piece of data: the data I need is always between two words, but there is other data there too, which could come before or after the text I want, and there is no fixed string length or string count I can use to pinpoint my data.

What “other data” are you looking at? Is there any way you can distinguish it from the text it comes before or after? If we can somehow pinpoint that “core” your data precedes or follows… Or do you just know it from your own reading of the text, which is not available to a robot? Can you give an example of such text (it can surely be something made up, just to show the pattern)?

“Title deed number 355566 “text I want” 5557 Area”

The text I want is always between “Title deed number” and “Area”.

Now, there may or may not be numbers (or text) on either side of the text I want, a random number of spaces between the data in different PDF extracts, etc. My data may also contain numbers.
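Since there is no fixed length or count to rely on, one option is to anchor purely on the two delimiter phrases. A minimal Python sketch (the helper name `extract_between` is hypothetical): it tolerates any number of spaces or line breaks inside and around the delimiters, ignores case, and uses a lazy `(.*?)` so the capture stops at the first “Area” after the start. Note that it returns everything between the delimiters, including any surrounding numbers; trimming those needs an extra rule about what they look like.

```python
import re
from typing import Optional


def extract_between(text: str, start: str, end: str) -> Optional[str]:
    # Escape each delimiter word so it is matched literally, and allow
    # any amount of whitespace (including line breaks) between the words.
    start_pat = r"\s+".join(re.escape(w) for w in start.split())
    end_pat = r"\s+".join(re.escape(w) for w in end.split())
    # Lazy (.*?) stops at the FIRST end delimiter after the start one.
    pattern = start_pat + r"\s*(.*?)\s*" + end_pat
    match = re.search(pattern, text, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        return None
    # Normalise the random spacing inside the captured text.
    return " ".join(match.group(1).split())
```

The same pattern should carry over to a .NET-based Matches step, since lazy quantifiers, `\s` and the IgnoreCase/Singleline options exist there too; that carry-over is an assumption worth testing on real extracts.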

I’m looking at the data with the special characters (Arabic) and it will give me many more options to split the data where I want.

I copied the special characters to use in the Matches process, but it doesn’t like them.
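One possible reason a pasted delimiter is rejected (an assumption, not confirmed in this thread) is that it contains regex metacharacters such as `(` or `.`, or invisible bidirectional control marks, which can travel along with copied Arabic text. A minimal Python sketch that strips those marks and escapes the delimiter so it is matched literally; `find_after` and `BIDI_CONTROLS` are hypothetical names.

```python
import re
from typing import Optional

# Invisible bidirectional control characters: LRM/RLM (U+200E, U+200F),
# embeddings/overrides (U+202A..U+202E) and isolates (U+2066..U+2069).
BIDI_CONTROLS = re.compile("[\u200e\u200f\u202a-\u202e\u2066-\u2069]")


def find_after(text: str, delimiter: str) -> Optional[str]:
    """Return the first space-free token after `delimiter`, or None."""
    clean_text = BIDI_CONTROLS.sub("", text)
    # re.escape makes (, ), ., + etc. in the pasted delimiter literal.
    literal = re.escape(BIDI_CONTROLS.sub("", delimiter))
    m = re.search(literal + r"\s*(\S+)", clean_text)
    return m.group(1) if m else None
```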

OK, I have added : to my allowed characters. This may help.

OK, as for Arabic, here is something that might help, in case you haven’t installed it yet:

As for the regex, give me a moment. But just to confirm: in the given example, would you want to extract “5557” together with the text you want, or not (you have stated that “the text I want is always between ‘Title deed number’ and ‘Area’”)? If not, is the unwanted part that precedes “Area” always going to be a single string with no spaces (whether it contains digits, letters, or both)? And is the unwanted part right after the expression “Title deed number” always going to be such a no-space string too?
Your string example:
“Title deed number 355566 “text I want” 5557 Area”
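If both answers were yes, the unwanted parts could be skipped as single space-free tokens. A minimal Python sketch under exactly that assumption (the names `PATTERN` and `extract_core` are hypothetical): `\S+` consumes one token after “Title deed number” and one before “Area”, and the lazy group keeps whatever is left in between. It breaks as soon as either unwanted part contains a space.

```python
import re
from typing import Optional

# Assumes exactly one space-free token (digits, letters or both) right after
# "Title deed number" and exactly one right before "Area".
PATTERN = re.compile(
    r"Title\s+deed\s+number\s+\S+\s+(.*?)\s+\S+\s+Area",
    re.IGNORECASE | re.DOTALL,
)


def extract_core(text: str) -> Optional[str]:
    m = PATTERN.search(text)
    return m.group(1) if m else None
```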

Thanks

I would just want “the text I want”, and unfortunately what follows could be a number or text, and could be more than one string!

But I think the : will help me. Working on it now.

Not sure why this has started happening:

This part of the sequence checks whether there is data in the TXT (extracted from the PDF) or not. If not, it moves the file. But for some reason it has now started extracting further data from that file in later sequences?!

What further data does it extract? Do you mean that “Name Arabic:” is not working as a delimiter any more?

After name I’m then extracting email address, phone number, project name, property number, and some more. Each with its own difficulties.

But the name extraction is fine. It’s just that in the sequence above, which is the first sequence, if there is no name, I want to move the file and move on to the next one. But it seems to move the file and still take the data and put it into Excel.

And what is your condition in the IF? That GoodPDForNot? Also, it may start making sense for you either to post each issue separately, so that others can keep track and help you, or to send them to me via private message :smile:

Haha.

If that text doesn’t exist in the TXT file, then there is no data in it, and I want to move the file to another folder without processing it any further.

From the most recent changes, you mentioned adding a colon to your allowed characters; perhaps something about that is affecting your condition… Would it work for you to rewrite the condition as 'PDFText.Contains("SECOND PARTY") And PDFText.Contains("BUYER") And PDFText.Contains("NameEnglish")'? That assumes "NameEnglish" really is the text you are looking for; if it is actually the name of one of your string variables, put it without the quotation marks. Please see the syntax advised below:
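For reference, the same “all markers must be present” check written as a minimal Python sketch; `is_good_pdf` and `REQUIRED_MARKERS` are hypothetical names, and "NameEnglish" is treated as literal text. Like `Contains`, the `in` test is case-sensitive and exact, so any difference in spacing or capitalisation in the extracted text makes it return False.

```python
# Hypothetical marker list mirroring the three Contains(...) calls.
REQUIRED_MARKERS = ("SECOND PARTY", "BUYER", "NameEnglish")


def is_good_pdf(pdf_text: str) -> bool:
    # True only if every marker appears somewhere in the extracted text.
    return all(marker in pdf_text for marker in REQUIRED_MARKERS)
```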

Hi @PAD

Annoyingly, this became very easy once I had put the colons in. All done now apart from the move-file problem.

The sequence does detect whether there is data or not and moves the file accordingly, but then it continues the workflow with that file, extracting data and putting it into Excel. I want to move the file and then stop processing it.
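The usual fix is to make sure the extraction steps cannot run for a file once it has been moved: either place them in the Else branch of the IF, or end the Then branch with a step that skips straight to the next loop iteration. A minimal Python sketch of the second pattern, assuming the files are TXT extracts processed one by one in a loop and that "has data" means the marker test from the condition above; `process_folder`, the folder arguments and the default marker list are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List, Sequence


def process_folder(
    src: Path,
    no_data_dir: Path,
    markers: Sequence[str] = ("SECOND PARTY", "BUYER"),
) -> List[str]:
    """Move files without data aside; return names of files that were processed."""
    processed: List[str] = []
    no_data_dir.mkdir(parents=True, exist_ok=True)
    # sorted() materialises the file list, so moving files inside the loop is safe.
    for txt_file in sorted(src.glob("*.txt")):
        text = txt_file.read_text(encoding="utf-8")
        if not all(marker in text for marker in markers):
            # No data: move the file to the other folder...
            shutil.move(str(txt_file), str(no_data_dir / txt_file.name))
            # ...and skip every remaining step for this file.
            continue
        # Placeholder for the real extraction and Excel steps (hypothetical).
        processed.append(txt_file.name)
    return processed
```

The key line is `continue`: without it (or an equivalent Else branch), the moved file's text, already read into a variable, still flows into the later steps.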