Extracting text between two Str Delimiters

Hi.

I have 20,000 PDFs that I need to extract data from. Unfortunately the UI elements are all over the place, so I can't just scrape the data, which is very annoying.

I have used “Read PDF Text” and then removed all unwanted characters (as there was Arabic in the PDFs).

I now have a String with the full text from the PDF.

I need to extract the text in the string that lies between the words “SECOND PARTY THE BUYERS NameEnglish” and “Name Arabic”.

The data I need is consistently between these two strings.

Thank you

Hi @MarkC1500

If your data always sits between those two delimiters, you can create an array of strings and assign it the result of splitting your string, as in the expression below:

yourArrayOfStrings = yourPdfText.Split({"SECOND PARTY THE BUYERS NameEnglish", "Name Arabic"}, StringSplitOptions.None)

yourArrayOfStrings will then hold 3 elements, provided each delimiter occurs exactly once. The one you are interested in is the second, i.e. yourArrayOfStrings(1).
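
The split above can be sketched as a small standalone VB.NET program. This is only an illustration: the sample text and the module name are made up, and the real input would be the output of “Read PDF Text”.

```vb
Imports System

Module SplitSketch
    Sub Main()
        ' Hypothetical sample text standing in for the real PDF output.
        Dim yourPdfText As String =
            "header SECOND PARTY THE BUYERS NameEnglish ABC XYZ Name Arabic footer"

        Dim delimiters As String() = {"SECOND PARTY THE BUYERS NameEnglish", "Name Arabic"}
        Dim parts As String() = yourPdfText.Split(delimiters, StringSplitOptions.None)

        ' With one occurrence of each delimiter:
        ' parts(0) = text before, parts(1) = text between, parts(2) = text after.
        If parts.Length >= 3 Then
            Console.WriteLine(parts(1).Trim())
        Else
            Console.WriteLine("Delimiters not found")
        End If
    End Sub
End Module
```

Note that Split cuts at every occurrence of either delimiter, in whatever order they appear. If “Name Arabic” also occurs earlier in the text, index 1 will no longer be the piece you want.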

Regards,
Reda

strTest="Name english some text Name arabic"
strExtract=strTest.Substring(strTest.IndexOf("Name english")+12,strTest.IndexOf("Name arabic")-strTest.IndexOf("Name english")-12)

strExtract → “some text”
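
A more defensive variant of the same IndexOf/Substring idea might look like the sketch below. Substring throws an ArgumentOutOfRangeException when the computed length is negative, which happens if the end marker is missing or occurs before the start marker. The helper name TextBetween and the sample string are hypothetical, and ignoring case is an assumption about the data.

```vb
Imports System

Module BetweenSketch
    ' Hypothetical helper: returns the text between startMarker and the first
    ' endMarker that follows it, or Nothing if either marker is not found.
    Function TextBetween(source As String, startMarker As String, endMarker As String) As String
        Dim startIndex As Integer = source.IndexOf(startMarker, StringComparison.OrdinalIgnoreCase)
        If startIndex < 0 Then Return Nothing

        ' Move past the start marker, then search for the end marker only from there on.
        startIndex += startMarker.Length
        Dim endIndex As Integer = source.IndexOf(endMarker, startIndex, StringComparison.OrdinalIgnoreCase)
        If endIndex < 0 Then Return Nothing

        Return source.Substring(startIndex, endIndex - startIndex).Trim()
    End Function

    Sub Main()
        ' The end marker also appears before the start marker here on purpose.
        Dim strTest As String = "Name arabic first Name english some text Name arabic"
        Console.WriteLine(TextBetween(strTest, "Name english", "Name arabic"))
    End Sub
End Module
```

Using startMarker.Length instead of a hard-coded 12 means the expression keeps working if the marker text changes.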

Hi @c.ciprian @reda

I get this error with the second option:

Source: Assign

Message: Length cannot be less than zero.
Parameter name: length

Exception Type: System.ArgumentOutOfRangeException

The first option splits after the first occurrence of “Name Arabic” and finishes at the second occurrence of “Name Arabic” in my pdftxt string.

Your guidance would be appreciated.

Thanks again

Is the combination of “SECOND PARTY THE BUYERS NameEnglish” and “Name Arabic” unique around the string that you want to extract?

Because in that case you will want to use RegEx.
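
As a rough sketch of what the RegEx route could look like in VB.NET: the pattern below uses a lookbehind and a lookahead so that only the text between the two markers is matched. The sample text is invented, and the pattern assumes the markers appear exactly as written, including case and spacing.

```vb
Imports System
Imports System.Text.RegularExpressions

Module RegexSketch
    Sub Main()
        ' Hypothetical sample text standing in for the real PDF output.
        Dim pdfText As String =
            "Name Arabic intro SECOND PARTY THE BUYERS NameEnglish ABC XYZ Name Arabic rest"

        ' (?<=...) requires the start marker just before the match,
        ' (.*?) takes as little text as possible,
        ' (?=...) requires the end marker just after the match.
        Dim pattern As String = "(?<=SECOND PARTY THE BUYERS NameEnglish)(.*?)(?=Name Arabic)"

        ' Singleline lets "." match line breaks, in case the name wraps across lines.
        Dim m As Match = Regex.Match(pdfText, pattern, RegexOptions.Singleline)

        If m.Success Then
            Console.WriteLine(m.Value.Trim())
        Else
            Console.WriteLine("No match")
        End If
    End Sub
End Module
```

Because the match has to start right after the start marker, an earlier “Name Arabic” elsewhere in the text does not affect the result.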

Hey,
I have generated a code by filling in a form on the web, and I want to write that code to an Excel file. The code appears in a popup window. Can anyone help me?

Hi @MarkC1500,
How about the solution below, where the whole input string was “SECOND PARTY THE BUYERS NameEnglish ABC XYZ Name Arabic” and the result I got was “ABC XYZ”?


It is the solution adapted from the one proposed here:

Please find the workflow in here:
string in between.xaml (7.5 KB)

Hi,
I am generating a code from a registration page. When I fill in all the data in the form, I get a popup with the code. Can anyone tell me how to write that code to an Excel file?
Please help me as soon as possible.

Hi @Mudita123

Have you created a separate thread for this?

No, how do I create one?

Hi @Mudita123,
Just go to the right category (most likely “Rookies”) and create a post with “New Topic” - make sure it has a good title and that you provide enough info for people to help you asap :slight_smile:

I have created it.
Thanks

no problem, @Mudita123

Can you take a look at the topic named “Extracting data from web into excel”? Please help me out!

Been Hijacked a Little bit :man_shrugging:

Thank you for your help guys.

@PAD I used your solution in the end. It needed some tweaking and exception handling for files with no data etc. That’s the “Name” extracted…

I have set up another workflow to extract the email address. This is all fine.

But now I have one For Each in workflow 1 for the name and one For Each in workflow 2 for the email. How and when do I append all this data to the spreadsheet? Do I just need to nest the For Each loops inside each other?

Thank you

@MarkC1500
I would add the values to your Excel file separately: first within the Name scope and then within the Address scope.

Sorry, but how is this done?

I’m also wary of this in case it doesn’t match the email to the correct name, for example if no data is extracted for a file. I’m not sure.

Also, if I do one after the other with append and the process fails halfway through, I’ll have names and no emails. If I then start again, it will redo the whole thing but start putting emails in where it left off.

If that all makes sense
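
One way to avoid the mismatch described above, sketched here under assumptions, is to extract both values for a file in a single pass and store them as one row, so a name and its email can never drift apart. The column names, file names and sample values are all hypothetical; in the real workflow they would come from the two extraction steps run on the same PDF text.

```vb
Imports System
Imports System.Data

Module PairingSketch
    Sub Main()
        ' One table row per PDF keeps the name and email for a file together.
        Dim results As New DataTable()
        results.Columns.Add("File", GetType(String))
        results.Columns.Add("Name", GetType(String))
        results.Columns.Add("Email", GetType(String))

        ' Hypothetical extraction results, one entry per PDF.
        ' The second file has no data and still gets its own row.
        Dim extracted As String()() = {
            New String() {"a.pdf", "ABC XYZ", "abc@example.com"},
            New String() {"b.pdf", "", ""}
        }

        For Each item As String() In extracted
            results.Rows.Add(item(0), item(1), item(2))
        Next

        For Each row As DataRow In results.Rows
            Console.WriteLine("{0} | {1} | {2}", row("File"), row("Name"), row("Email"))
        Next
    End Sub
End Module
```

Keeping the file name in each row also shows which files were already processed if the run fails partway, so a restart could skip them instead of starting from scratch.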

I will do it in one sequence for now until I have sorted out all the other issues.

The next problem is that the email address I’m after is not consistently surrounded by the same text!
I’m not sure extracting the text surrounding “@” will work either, because it could be the 1st, 2nd or 3rd @ symbol that I’m after.

I’m not sure there is a workaround for inconsistent data?
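
A possible starting point, offered only as a sketch: list every email-like string in the text together with its position, then decide on a rule for picking the right one (for example, the first one after the buyer's name). The pattern below is a deliberately simple approximation of an email address, and the sample text is made up.

```vb
Imports System
Imports System.Text.RegularExpressions

Module EmailSketch
    Sub Main()
        ' Hypothetical sample text containing several addresses.
        Dim pdfText As String =
            "Seller: seller@example.com Buyer contact buyer@example.org Agent agent@example.net"

        ' Simplified email pattern; it will not cover every valid address form.
        Dim pattern As String = "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
        Dim matches As MatchCollection = Regex.Matches(pdfText, pattern)

        ' Print each candidate with its index and character position in the text.
        For i As Integer = 0 To matches.Count - 1
            Console.WriteLine("{0}: {1} at position {2}", i, matches(i).Value, matches(i).Index)
        Next
    End Sub
End Module
```

The position of each match can be compared with the position of a known landmark, such as the buyer's name marker, to choose between the 1st, 2nd or 3rd address.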
