I have a PDF file and I need a regular expression to extract the data from the table shown above. Can anyone help me?

Hi @Abhirami_Ashok_Kumar ,
Which values do you need to get?
Do you need the whole table?

I need to get all of the data.

Hi @Abhirami_Ashok_Kumar ,

Could you try the Read PDF Text activity with Preserve Formatting enabled and check the output text? Alternatively, post that text here so we can suggest an appropriate pattern.

Hi @Abhirami_Ashok_Kumar

Use the Read PDF Text activity and apply a regex to its output, which will be easier. Alternatively, share the text you get after the Read PDF Text activity.

@Abhirami_Ashok_Kumar Follow these steps:

  1. Read the PDF with the Read PDF Text activity, with Preserve Formatting set to True.
  2. Use a regex to get the table rows into a collection of matches.
  3. Iterate over each record in that collection and split it on multiple spaces to get the individual values.
  4. Store the data in your preferred data type.

You can use this regex to get all the lines-

==> \d{1,}\s{1,}\w{1,}-\w{1,}\s{1,}\w{1,}\s{1,}\d{1,}\s{1,}\d{1,}(.*)\d
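The match-then-split steps can be checked outside UiPath before building the workflow. The sketch below uses Python only as a quick test bench; the pattern is the one quoted above, while `SAMPLE_TEXT` and `extract_rows` are hypothetical, since the actual Read PDF Text output was not posted. It assumes each row looks like a serial number, a hyphenated name, a one-word description, a quantity and a cost, separated by two or more spaces.

```python
import re

# Pattern quoted in the post above, unchanged.
ROW_PATTERN = re.compile(
    r"\d{1,}\s{1,}\w{1,}-\w{1,}\s{1,}\w{1,}\s{1,}\d{1,}\s{1,}\d{1,}(.*)\d"
)

# Hypothetical stand-in for the Read PDF Text output (Preserve Formatting on).
# The real table layout is an assumption.
SAMPLE_TEXT = """\
SlNo    Name      Description    Quantity    Cost
1       AB-101    Bolt           10          2.50
2       CD-202    Nut            25          1.75
"""


def extract_rows(text):
    """Return one list of cell values per table row found in text."""
    rows = []
    for match in ROW_PATTERN.finditer(text):
        # group(0) is the whole matched row; the single capture group
        # only holds the tail of the last column, so it is not used here.
        line = match.group(0)
        # Split on runs of two or more whitespace characters.
        rows.append(re.split(r"\s{2,}", line.strip()))
    return rows


rows = extract_rows(SAMPLE_TEXT)
for row in rows:
    print(row)
```

The header line is skipped because it contains no digits. If a cell can contain a single space, splitting on two or more spaces keeps that cell intact, but the pattern itself would need adjusting to match such a row.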

Let me know if you have any further issues.


You want to get the table from the PDF and then write it to Excel,
is that right?
You can read the PDF with OCR, generate a data table,
and write the data to Excel.
Can you share your file?
I will test it.

Hi @Abhirami_Ashok_Kumar

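=> Use an Assign activity with the expression below.

Matches = System.Text.RegularExpressions.Regex.Matches(strvar, "(\d+)\s+(\S+)\s+([A-Za-z]+)\s+(\d+)\s+([\d.]+)")

Note: Matches is of data type System.Text.RegularExpressions.MatchCollection. Do not append .ToString to the expression, because that yields a String instead of the collection of matches.
=> Use a For Each loop to iterate through Matches, with TypeArgument set to System.Text.RegularExpressions.Match and match as the item variable.
=> Inside the loop, use Assign activities to store the captured groups in variables as below. Groups(0) is the whole matched row, so the captured values start at Groups(1).

   Assign: SLNO = match.Groups(1).Value
   Assign: NAME = match.Groups(2).Value
   Assign: Description = match.Groups(3).Value
   Assign: Quantity = match.Groups(4).Value
   Assign: Cost = match.Groups(5).Value

=> Print the values using Log Message or Message Box.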
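To see which capture group feeds which variable, the same pattern can be tried on sample text outside UiPath. This is a minimal Python sketch rather than the workflow itself: `SAMPLE_TEXT`, `FIELDS` and `parse_rows` are hypothetical names, and the row layout is an assumption because the real PDF text was not shared. For this particular pattern, Python's `re` module and .NET's `Regex` are expected to behave the same way.

```python
import re

# Same pattern as in the Assign expression above.
PATTERN = re.compile(r"(\d+)\s+(\S+)\s+([A-Za-z]+)\s+(\d+)\s+([\d.]+)")

# Variable names from the post, in capture-group order (groups 1 to 5).
FIELDS = ("SLNO", "NAME", "Description", "Quantity", "Cost")

# Hypothetical stand-in for strvar; the real table layout is an assumption.
SAMPLE_TEXT = """\
SlNo    Name      Description    Quantity    Cost
1       AB-101    Bolt           10          2.50
2       CD-202    Nut            25          1.75
"""


def parse_rows(text):
    """Return one dict per matched row, keyed by the field names above."""
    # m.groups() returns capture groups 1..5 and leaves out group 0,
    # which is the whole matched row.
    return [dict(zip(FIELDS, m.groups())) for m in PATTERN.finditer(text)]


for record in parse_rows(SAMPLE_TEXT):
    print(record)
```

Note that `([A-Za-z]+)` only accepts a one-word description made of letters; a description containing spaces or digits would need a wider group.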