I have 20,000 PDFs that I need to extract data from. Unfortunately the UI elements are all over the place, so I can't just scrape the data, which is very annoying.
I have used “Read PDF Text” and then removed all unwanted characters (as there was Arabic in the PDFs).
I now have a String with the full text from the PDF.
I need to extract the text in the string that lies between the words “SECOND PARTY THE BUYERS NameEnglish” and “Name Arabic”.
The data I need is consistently between these two strings.
strTest="Name english some text Name arabic"
strExtract=strTest.Substring(strTest.IndexOf("Name english")+12,strTest.IndexOf("Name arabic")-strTest.IndexOf("Name english")-12)
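The same idea can be written without the hard-coded length of 12 and with a guard for documents where a marker is missing. This is only a minimal sketch in Python, used here for illustration; `extract_between` is a hypothetical helper name, not a UiPath activity.

```python
def extract_between(text, start_marker, end_marker):
    """Return the trimmed text between two markers, or None if either is missing."""
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)  # skip past the start marker itself
    end = text.find(end_marker, start)  # only look for the end marker after the start
    if end == -1:
        return None
    return text[start:end].strip()


full_text = "SECOND PARTY THE BUYERS NameEnglish ABC XYZ Name Arabic"
print(extract_between(full_text, "SECOND PARTY THE BUYERS NameEnglish", "Name Arabic"))
# prints: ABC XYZ
```

Using the marker's own length instead of a literal number means the expression keeps working if the marker text changes, and returning None gives the caller a simple way to flag files with no data.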
Hi @MarkC1500,
How about the solution below, where the whole input string was “SECOND PARTY THE BUYERS NameEnglish ABC XYZ Name Arabic”, and the result I got was “ABC XYZ”.
Hi,
I am generating a code from a registration page. After I fill in all the data in the form, I get a popup with the code. Can anyone tell me how to write that code to an Excel file?
Please help me as soon as possible.
Hi @Mudita123,
Just go to the right category (most likely “Rookies”) and create a post with “New Topic”. Make sure it has a good title and that you provide enough info for people to help you ASAP.
@PAD I used your solution in the end. I needed to do some tweaking and add exceptions for files with no data etc. That’s the “Name” extracted…
I have set up another workflow to extract the email address. This is all fine.
But now I have one For Each in workflow 1 for the name and one For Each in workflow 2 for the email. How and when do I append all this data to the spreadsheet? Do I need to just nest the For Each loops inside each other?
Also, if I run one after the other with append and the process fails halfway through, I’ll have names and no emails. If I then start again, it will do the whole thing again but start putting emails in where it left off.
I will do it in one sequence for now until I have sorted all the other issues.
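One way to avoid both the nesting and the half-finished spreadsheet is a single loop that extracts every field for a file and appends one complete row before moving on. The following is a hedged sketch in Python rather than a UiPath workflow; `load_done`, `process_all`, and the extractor arguments are hypothetical names, and a CSV file stands in for the spreadsheet.

```python
import csv
import os


def load_done(csv_path):
    """Return the set of file names already written to the output CSV."""
    if not os.path.exists(csv_path):
        return set()
    with open(csv_path, newline="", encoding="utf-8") as f:
        return {row[0] for row in csv.reader(f) if row}


def process_all(texts_by_file, csv_path, extract_name, extract_email):
    """Append one (file, name, email) row per file, skipping files already done."""
    done = load_done(csv_path)
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for file_name, text in texts_by_file.items():
            if file_name in done:
                continue  # this file was written during an earlier run
            name = extract_name(text) or ""    # blank cell when nothing was found
            email = extract_email(text) or ""
            writer.writerow([file_name, name, email])
            f.flush()  # push each finished row out so a crash loses little work
```

Because the file name is stored in the first column, a restart after a failure picks up at the first unprocessed file, and a name is never written without the email that belongs to it.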
Next problem is, the email address I’m after is not consistently surrounded by the same text!!
Even if it is possible, I’m not sure extracting the text surrounding “@” will work either, because it could be the 1st, 2nd or 3rd @ symbol I’m after.
I’m not sure there is a workaround for inconsistent data?
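When the surrounding text is inconsistent, one possible workaround is to match the shape of an email address itself, collect every candidate, and then pick the right one by a rule. The sketch below is in Python and rests on an assumption the thread does not confirm: that the unwanted addresses (for example a seller's address repeated in every file) are known in advance and can be excluded. `find_emails` and `pick_email` are hypothetical names, and the pattern is deliberately simple rather than a full address validator.

```python
import re

# Simple pattern for typical addresses; it is not a complete RFC 5322 validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text):
    """Return every email-like substring in the order it appears."""
    return EMAIL_RE.findall(text)


def pick_email(text, ignore=()):
    """Return the first address not in `ignore` (case-insensitive), or None."""
    ignored = {addr.lower() for addr in ignore}
    for addr in find_emails(text):
        if addr.lower() not in ignored:
            return addr
    return None


sample = "Seller: sales@example.com Buyer contact buyer@example.org ref agent@example.net"
print(find_emails(sample))
print(pick_email(sample, ignore=["sales@example.com"]))
```

If the wanted address is always at a fixed position among the matches, indexing into the list from `find_emails` would do instead. Note that if cleaning the text removed the whitespace around an address, the pattern may swallow neighbouring letters, so it is worth checking a handful of results by hand.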