Hi,
I need to extract data from PDF files depending on the engine model name. I am able to do this for a single file, but not for multiple files. Please help.
Sequence pdf3-.zip (7.5 KB)
Thanks & Regards,
Lakshmi
Hello @lakshmi.mp
Can you share screenshots of the two PDFs here?
Also, is the format static? If yes, you can use the Read PDF Text activity and use regex to fetch the required data. Otherwise, you can open each PDF with Use Application/Browser and then use Get Text to extract the required data.
Hi @Rahul_Unnikrishnan ,
The PDF files are not static. Each file contains a model number, and the extraction I need to do depends on that number. The model numbers and their extraction details are stored in a switch case.
I have attached my workflow above, please take a look at it.
Depending on a condition, the files will be moved into a folder.
Thanks,
Lakshmi
test.txt (399 Bytes)
Here I have attached the text file of PDF page 1, from which I need to extract the engine model, TTSN, TFSN and TSSN.
Thanks ,
Lakshmi
Please try using arrayVar = Directory.GetFiles("folderPath")
Then use a For Each to iterate over the array. Each item will be a file path; use Read PDF With OCR and pass the item as the input.
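The suggestion above can be sketched in plain VB.NET as follows. This is only an illustration under assumptions: the demo folder and the file names are made up so the snippet can run on its own, and the `*.pdf` filter is an addition that was not part of the original suggestion.

```vbnet
Imports System
Imports System.IO

Module IteratePdfFiles
    Sub Main()
        ' Hypothetical demo folder, created in the temp directory so the sketch runs anywhere.
        Dim folderPath As String = Path.Combine(Path.GetTempPath(), "pdf_demo")
        Directory.CreateDirectory(folderPath)
        File.WriteAllText(Path.Combine(folderPath, "engine1.pdf"), "")
        File.WriteAllText(Path.Combine(folderPath, "engine2.pdf"), "")
        File.WriteAllText(Path.Combine(folderPath, "notes.txt"), "")

        ' GetFiles returns full file paths; the search pattern skips non-PDF files.
        Dim pdfFiles As String() = Directory.GetFiles(folderPath, "*.pdf")

        For Each pdfPath As String In pdfFiles
            ' In the workflow, pdfPath is what would be passed to the PDF-reading activity.
            Console.WriteLine(Path.GetFileName(pdfPath))
        Next
    End Sub
End Module
```

In a workflow, the same idea would typically be an Assign for the array followed by a For Each activity, with the extraction logic for a single file moved inside the loop body.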
Hi,
Can anyone help me extract the values from the above text file? I tried using a lookahead, but I can only extract the engine model and TTSN values; I am not able to extract the TFSN and TSSN values.
Thanks,
Lakshmi
Hi @lakshmi.mp ,
As you have mentioned, there are different files. Does that mean the format is different in each of them? If so, do we have the set of different formats that are to be expected, or is it unknown or indefinite?
Once we have the above details, we could try to understand the pattern required for the extraction and use it. Do we have anything in the text file that is constant or is relative to the engine model values?
It would also help if you could highlight the values that need to be extracted.
Hi @supermanPunch ,
The values highlighted below need to be extracted:
test.txt (399 Bytes)
Thanks,
Lakshmi
Hi @supermanPunch
Seq pdf3.zip (5.9 KB)
This is my workflow. Please take a look at it.
If the format is going to be the same throughout all PDF files, then we have the below expression for the Engine Model :
(?<=Engine Model\n).*
System.Text.RegularExpressions.Regex.Match(pdfText,"(?<=Engine Model\n).*",System.Text.RegularExpressions.RegexOptions.IgnoreCase).Value.ToString
TTSN, TFSN, TSSN :
(?<=TTSN\s)(.*)TFSN(.*)TSSN(.*)
We could use one expression to extract the TTSN, TFSN and TSSN values as groups.
However, we do see that there are multiple values for these fields. If multiple values are required, then we would need to use Matches instead of Match, then iterate and fetch the values.
To extract the first match, we could use Regex.Match in the following way:
TTSN :
System.Text.RegularExpressions.Regex.Match(pdfText,"(?<=TTSN\s)(.*)TFSN(.*)TSSN(.*)",System.Text.RegularExpressions.RegexOptions.IgnoreCase).Groups(1).Value.ToString
TFSN :
System.Text.RegularExpressions.Regex.Match(pdfText,"(?<=TTSN\s)(.*)TFSN(.*)TSSN(.*)",System.Text.RegularExpressions.RegexOptions.IgnoreCase).Groups(2).Value.ToString
TSSN :
System.Text.RegularExpressions.Regex.Match(pdfText,"(?<=TTSN\s)(.*)TFSN(.*)TSSN(.*)",System.Text.RegularExpressions.RegexOptions.IgnoreCase).Groups(3).Value.ToString
Check the above expressions and let us know if it doesn’t work.
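As a rough illustration of the difference between Match and Matches with this pattern, here is a small sketch. The sample text and its numbers are invented, since the real layout of test.txt is not shown in the thread; only the pattern comes from the post above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ExtractCounters
    Sub Main()
        ' Hypothetical sample text; the real PDF text may be laid out differently.
        Dim pdfText As String = "TTSN 1200 TFSN 300 TSSN 45" & vbCrLf &
                                "TTSN 1500 TFSN 410 TSSN 60"
        Dim pattern As String = "(?<=TTSN\s)(.*)TFSN(.*)TSSN(.*)"

        ' First occurrence only: run the regex once and read all three groups from it.
        Dim first As Match = Regex.Match(pdfText, pattern, RegexOptions.IgnoreCase)
        If first.Success Then
            ' Trim removes the spaces (and any carriage return) captured around each value.
            Console.WriteLine("TTSN=" & first.Groups(1).Value.Trim())
            Console.WriteLine("TFSN=" & first.Groups(2).Value.Trim())
            Console.WriteLine("TSSN=" & first.Groups(3).Value.Trim())
        End If

        ' Every occurrence: Matches returns a collection that can be iterated.
        For Each m As Match In Regex.Matches(pdfText, pattern, RegexOptions.IgnoreCase)
            Console.WriteLine(m.Groups(1).Value.Trim() & " / " &
                              m.Groups(2).Value.Trim() & " / " &
                              m.Groups(3).Value.Trim())
        Next
    End Sub
End Module
```

Storing the Match in a variable and reading Groups(1) to Groups(3) from it avoids evaluating the same expression three times. The Trim calls matter because (.*) also captures the spaces around each value.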
Hi @supermanPunch ,
I am not able to extract the engine model. In the regex builder the value is highlighted, but in the workflow the result is blank.
Sequence.zip (1.8 KB)
The workflow has been attached, please take a look at it.
Thanks,
Lakshmi
A small modification to the Expression :
(?<=Engine Model\r?\n).*
Could you check with the above expression and let me know if it works.
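One plausible reason for the difference between the regex builder and the workflow is that the text read from the PDF uses Windows line endings (\r\n), so the character directly before \n is \r and a lookbehind ending in a bare \n never matches. The sketch below shows this on a made-up pdfText; the sample content is an assumption, not the actual file.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LineEndingDemo
    Sub Main()
        ' Hypothetical text with Windows (CRLF) line endings.
        Dim pdfText As String = "Engine Model" & vbCrLf & "AB6L-3AZ" & vbCrLf & "Next field"

        ' Bare \n: the label is followed by \r\n, so the lookbehind fails and Value is empty.
        Dim strict As String = Regex.Match(pdfText, "(?<=Engine Model\n).*",
                                           RegexOptions.IgnoreCase).Value

        ' \r?\n: accepts both LF and CRLF after the label.
        Dim tolerant As String = Regex.Match(pdfText, "(?<=Engine Model\r?\n).*",
                                             RegexOptions.IgnoreCase).Value

        Console.WriteLine("strict   : [" & strict & "]")
        ' In .NET, "." matches \r, so the raw value ends with a carriage return; Trim removes it.
        Console.WriteLine("tolerant : [" & tolerant.Trim() & "]")
    End Sub
End Module
```

Because .* stops only at \n, the value captured with CRLF text still carries a trailing \r, so calling Trim on the result is a sensible precaution before comparing it in a switch case.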
Hi @supermanPunch ,
The above expression works for 2 files but not for this file. I need to extract only AB6L-3AZ, but it returns AB6L-3AZ Build Spec.
test.txt (286 Bytes)
Can we pass 2 regular expressions to extract a single word? Please help.
thanks,
Lakshmi
Hi,
Can we pass 2 regular expressions to extract a single word? I am facing difficulty in extracting the engine model. Please help.
thanks,
Lakshmi
@lakshmi.mp, if only a single word after Engine Model needs to be extracted, could you split the extracted value on spaces and take only the first element?
We can do it like below:
Split(extractedValue)(0).ToString.Trim
Could you check it in this way ?
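For illustration, here is the same idea as a standalone snippet, using the value reported earlier in the thread. It uses String.Split with an explicit space character instead of the VB Split function from the expression above; that substitution is a choice made for this sketch.

```vbnet
Imports System

Module FirstWordDemo
    Sub Main()
        ' Value reported in the thread for the problem file.
        Dim extractedValue As String = "AB6L-3AZ Build Spec"

        ' Trim first so a leading space or trailing carriage return cannot
        ' end up in (or empty out) the first element.
        Dim firstWord As String = extractedValue.Trim().Split(" "c)(0)

        Console.WriteLine(firstWord)   ' prints AB6L-3AZ
    End Sub
End Module
```

Trimming before the split is the only behavioral difference from trimming afterwards: if the extracted value started with a space, splitting first would give an empty first element.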
Hi @supermanPunch ,
Thanks,
Lakshmi
Hi,
(?<=Engine Model\r?\n).* => This expression works for all files except 2 files.
(?<=Engine Model ).* => This expression works for 2 files. How can I combine the 2 expressions to extract a single word?
Please help.
Regards,
Lakshmi
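One possible way to combine the two expressions, offered as a sketch rather than a confirmed fix: .NET regex allows variable-length lookbehind, so \s+ can stand in for either a space or a line break after the label, and \S+ then stops at the first whitespace. The sample strings below are assumptions about the layouts described in the thread, and the model names XY9-100 and CD7-22B are invented.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CombinedPatternDemo
    Sub Main()
        ' Hypothetical samples: value on the next line (CRLF, with extra words),
        ' value on the next line (LF only), and value on the same line as the label.
        Dim samples As String() = {
            "Engine Model" & vbCrLf & "AB6L-3AZ Build Spec",
            "Engine Model" & vbLf & "XY9-100",
            "Engine Model CD7-22B"
        }

        ' \s+ covers a space, LF or CRLF after the label; \S+ takes one word only.
        Dim pattern As String = "(?<=Engine Model\s+)\S+"

        For Each sampleText As String In samples
            Dim m As Match = Regex.Match(sampleText, pattern, RegexOptions.IgnoreCase)
            If m.Success Then
                Console.WriteLine(m.Value)
            Else
                Console.WriteLine("(no match)")
            End If
        Next
    End Sub
End Module
```

As a single workflow expression this would be System.Text.RegularExpressions.Regex.Match(pdfText,"(?<=Engine Model\s+)\S+",System.Text.RegularExpressions.RegexOptions.IgnoreCase).Value. It assumes the model name never contains a space and that the first "Engine Model" label in the text is the one of interest; if either assumption fails for some file, the pattern would need adjusting.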