I have a PDF file, BDA_Bulletin_20210501.pdf (8.1 MB), that contains multiple CPV codes. Each CPV code has its own section, so whenever I fetch a CPV code I also need to fetch the address that appears in that section.
Can anyone help me out?
wasea (Vasile), May 6, 2021, 11:59pm
Hello @chaithanya_kumar_M ,
A high-level idea could be:
Use Read PDF Text, which gives you a string variable.
Split that string by CPV number or something similar.
The split creates an array of strings, each of which should, I believe, contain the information you need.
For each string, add some activities to check whether a specific CPV code exists.
If it exists, create some rules to extract the address data, using Regex or other string operations.
You can put everything in a DataTable, then export it as an Excel file at the end (Write Range).
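Outside UiPath, the steps above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the sample text, the labels "CPV principal:" and "Adresse principale:", and the idea that the address follows the CPV code inside the same chunk are guesses based on the search terms mentioned in this thread, and extract_cpv_addresses is a hypothetical helper name. The real bulletin layout may differ.

```python
import re

# Hypothetical sample standing in for the output of Read PDF Text.
# The labels and layout are assumptions; the real bulletin may differ.
sample_text = """\
Section 1
CPV principal: 72212322
Adresse principale: Rue Exemple 1, 1000 Bruxelles
Section 2
CPV principal: 45000000
Description only, no address in this section
"""

CPV_RE = re.compile(r"CPV principal:\s*(\d{8})")
ADDRESS_RE = re.compile(r"Adresse principale:\s*(.+)")


def extract_cpv_addresses(text):
    """Return one (cpv_code, address_or_None) pair per CPV marker."""
    rows = []
    matches = list(CPV_RE.finditer(text))
    for i, m in enumerate(matches):
        # A chunk runs from this CPV marker to the next one (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunk = text[m.end():end]
        addr = ADDRESS_RE.search(chunk)
        rows.append((m.group(1), addr.group(1).strip() if addr else None))
    return rows


rows = extract_cpv_addresses(sample_text)
for code, address in rows:
    print(code, address)
```

If the address sits before the CPV code in your sections, the chunk boundaries would have to be moved accordingly (for example, splitting on a section heading instead of on the CPV marker).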
I hope it helps.
Vasile.
@chaithanya_kumar_M - Is this related to this post?
@Sudha_Jha - Here you go…Regex_SJ.zip (8.5 MB)
Run this workflow and you will see CPV code “72212322” extracted in the output.
Also, please let us know where you want to fetch the address from. A screenshot would be helpful…
Hi @prasath17
Yes, indeed, but there we are fetching only one value. The actual task is to fetch all the CPV codes:
Fetch the CPV codes (we need to search the entire PDF file, since it contains multiple CPV codes).
For each CPV code we fetch, we also need to fetch the address from the same section.
Please check the screenshot.
When I use "0 or more" I get an error.
Regards,
Chaithanya
Hi @wasea
Thanks for the suggestion. I will try it and let you know.
Regards,
Chaithanya
Hi @chaithanya_kumar_M ,
Also, it might help to take a look at this topic:
17 use-cases of extracting tables from PDF with UiPath Studio.
[UiPath extract Tables from PDF (use case) (PDF table)]
0:00 ​ Intro
1:10 ​ Install PDF Activities
2:00 ​ GitHub free code for all the files
2:20 ​ Logic of general workflow
4:40 ​ File 1 simple PDF
9:50 ​ File 2 PDF with a column with multiple lines
20:10 ​ File 3 PDF with a column with multiple words ON the LAST column
27:00 ​ File 5 PDF with a column with multiple words ON inside column (2 columns)
31:40 ​ File 6 PDF …
Hope it helps!
Best regards,
Marius
chaithanya_kumar_M:
Fetch the CPV codes (we need to search the entire PDF file, since it contains multiple CPV codes).
For each CPV code we fetch, we also need to fetch the address from the same section.
Please check the screenshot
Sorry, it's still not clear…
When I searched for “cpv principal” I got 66 hits in the attached PDF. Would you like to fetch all of them?
When I searched for “adresse principale” I got 29 hits in the attached PDF, and only 5 or 6 of them have the CPV code on the same page, as shown below. Is this what you would like to extract?
It would be better if you provided samples from the attached PDF and a brief description of the requirement, so that we can help.
Hi @prasath17
Each CPV code is published in up to three languages (EN, NL, FR), but not always in all three. So my idea is to fetch all the CPV codes and addresses, then delete the duplicates.
@chaithanya_kumar_M - Please find the starter help here…
Build DataTable
Output is Dt ==> DataTable variable (with a column named "CPV Code", which is used in the last Assign)
Read your PDF and store the output in StrInput
Matches activity
Input is StrInput
Pattern used = "(?<=CPV principal:.+)\d{8}"
Output is IEnRegex
Assign
Dt = (From m In IEnRegex.Cast(Of Match)
Select dt.Rows.Add(m.toString)).CopyToDataTable
Assign
DtUnique = dt.DefaultView.ToTable(True,"CPV Code")
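For anyone who wants to test the pattern outside UiPath, here is a rough Python equivalent of the Matches and de-duplication steps. It is a sketch, not the workflow itself: the sample string is made up, and because Python's re module only accepts fixed-width look-behind, the look-behind from the pattern above is replaced with a capture group.

```python
import re

# Made-up sample text; in the workflow this would be StrInput from Read PDF Text.
str_input = (
    "CPV principal: 72212322\n"
    "Code CPV principal: 72212322\n"
    "CPV principal: 45000000\n"
)

# Capture the 8 digits that follow "CPV principal:" on the same line.
codes = re.findall(r"CPV principal:.+?(\d{8})", str_input)

# Drop duplicates while keeping first-seen order,
# similar in spirit to DefaultView.ToTable(True, "CPV Code").
unique_codes = list(dict.fromkeys(codes))

print(unique_codes)
```

The same de-duplication idea extends to (code, address) pairs, although an address printed in several languages may not compare as equal text.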
Here is the output:
Output.xlsx (9.9 KB)
Hope this helps…