I have a PDF file and I need a regular expression to extract the above data from the table. Can anyone help me?
Hi @Abhirami_Ashok_Kumar,
Which values do you need to get?
Or do you need the whole table?
I need to get all of the data.
Could you try the Read PDF Text activity with Preserve Formatting enabled and check the output text? Alternatively, post the text here so we can suggest an appropriate pattern.
Use the Read PDF Text activity and then apply a regex to its output, which makes this easier. Alternatively, share the text that the Read PDF Text activity returns.
@Abhirami_Ashok_Kumar Follow these steps:
- Read the PDF with the Read PDF Text activity, with Preserve Formatting set to True.
- Extract the table rows with a regex into a match collection.
- Iterate over each match in the collection and split it on runs of multiple spaces to get the individual values.
- Store the data in your preferred data type.
You can use this regex to get all the lines:
==> \d{1,}\s{1,}\w{1,}-\w{1,}\s{1,}\w{1,}\s{1,}\d{1,}\s{1,}\d{1,}(.*)\d
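The steps above can be sketched outside UiPath as follows. This is a minimal Python illustration, not workflow code: the sample text and the names `SAMPLE_TEXT`, `ROW_PATTERN`, and `extract_rows` are hypothetical, since the actual Read PDF Text output was not posted in the thread. It assumes the columns are separated by two or more spaces.

```python
import re

# Hypothetical sample; the real Read PDF Text output was not shared in the thread.
SAMPLE_TEXT = """SlNo   Name     Description   Qty   Cost
1      AB-123   Widget        2     10.50
2      CD-456   Gadget        1     5.00
"""

# The row regex from the post, unchanged.
ROW_PATTERN = re.compile(
    r"\d{1,}\s{1,}\w{1,}-\w{1,}\s{1,}\w{1,}\s{1,}\d{1,}\s{1,}\d{1,}(.*)\d"
)


def extract_rows(text):
    """Return one list of cell values per table row matched by ROW_PATTERN."""
    rows = []
    for match in ROW_PATTERN.finditer(text):
        # group(0) is the whole matched row; split it on runs of 2+ spaces.
        rows.append(re.split(r"\s{2,}", match.group(0).strip()))
    return rows


rows = extract_rows(SAMPLE_TEXT)
for row in rows:
    print(row)
```

The header line is skipped because the pattern requires each row to start with a digit. If a cell value can itself contain two consecutive spaces, splitting on whitespace runs will break it apart, and capture groups are the safer choice.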
Let me know if you have any further issues.
Regards,
Dev
You want to extract the table from the PDF and then write it to Excel,
is that right?
You can read the PDF with OCR, generate a data table,
and write the data to Excel.
Can you share your file?
I will test it.
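=> Use an Assign activity with the expression below.
Matches = System.Text.RegularExpressions.Regex.Matches(strvar,"(\d+)\s+(\S+)\s+([A-Za-z]+)\s+(\d+)\s+([\d.]+)")
Note: Matches is of data type System.Text.RegularExpressions.MatchCollection, which is what Regex.Matches returns. Do not append .ToString, because that turns the collection into a plain string that cannot be iterated.
=> Use a For Each loop to iterate through Matches, with the item variable match of type System.Text.RegularExpressions.Match.
=> Use Assign activities to store the values in variables as below. Groups(0) is the whole matched row, so the captured columns start at Groups(1).
Assign: SLNO = match.Groups(1).Value
Assign: NAME = match.Groups(2).Value
Assign: Description = match.Groups(3).Value
Assign: Quantity = match.Groups(4).Value
Assign: Cost = match.Groups(5).Value
=> Print the values using a Log Message or Message Box activity.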
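To check how the capture groups in this pattern line up with the columns before building the workflow, the same regex can be tried in a short script. The following is a minimal Python sketch rather than UiPath code: the sample rows and the names `FIELD_PATTERN` and `parse_records` are hypothetical, because the real PDF text was not shared. For this pattern, Python's `re` module and .NET regex use the same syntax.

```python
import re

# Same pattern as in the Assign expression: five capture groups, one per column.
FIELD_PATTERN = re.compile(r"(\d+)\s+(\S+)\s+([A-Za-z]+)\s+(\d+)\s+([\d.]+)")


def parse_records(text):
    """Return one dict per table row, keyed by the column names used in the post."""
    records = []
    for match in FIELD_PATTERN.finditer(text):
        # groups() returns capture groups 1..5; group 0 (the whole row) is excluded.
        sl_no, name, description, quantity, cost = match.groups()
        records.append({
            "SLNO": sl_no,
            "NAME": name,
            "Description": description,
            "Quantity": quantity,
            "Cost": cost,
        })
    return records


# Hypothetical rows standing in for the Read PDF Text output.
sample = "1 AB-123 Widget 2 10.50\n2 CD-456 Gadget 1 5.00"
for record in parse_records(sample):
    print(record)
```

Note that `([A-Za-z]+)` only matches a one-word description made of letters, so rows whose description contains spaces or digits would not match and the pattern would need adjusting.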