Get a number from PDF files based on matching text

Hi everyone.

I have many PDF files in one folder. I want to go through the lines of each PDF file, one file at a time, and get the number that follows the text " Tổng cộng tiền thanh toán: " ("Total payment amount: ").

If a line contains that text, the number should be extracted into an Excel file.

Thank you so much!

hi @trunghai,

I suggest using a regex pattern to extract that number:

Number = System.Text.RegularExpressions.Regex.Match(text,"(?<=Text:).+?(\n)").Value

  1. Instead of Text, put "Tổng cộng tiền thanh toán: " or a String variable holding it.
  2. text is your String variable containing the whole PDF text.
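To show what the lookbehind pattern is doing, here is a minimal sketch in Python rather than a UiPath expression (Python's `re` module accepts the same `(?<=...)` syntax for a fixed label). The function name `extract_after_label` and the sample text are made up for this illustration.

```python
import re

def extract_after_label(text, label):
    """Return the rest of the line that follows `label`, or None if the label is absent."""
    # (?<=...) is a lookbehind: the label must come right before the match,
    # but it is not included in the matched value.
    # [^\r\n]+ then takes everything up to the end of that line.
    pattern = "(?<=" + re.escape(label) + r")[^\r\n]+"
    match = re.search(pattern, text)
    return match.group(0).strip() if match else None

# Hypothetical text, standing in for the output of Read PDF Text.
sample = "Invoice 001\nTổng cộng tiền thanh toán: 1.234.567\nThank you\n"
print(extract_after_label(sample, "Tổng cộng tiền thanh toán: "))
```

If the label is not in the text, the function returns None instead of an empty match, which is worth checking before writing anything to Excel.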

Hi @samir

Sorry, but could you please help me with a workflow or a step-by-step guide for this case?

Thank you, bro!

Follow these steps.
1. Use a For Each activity to iterate over all the PDF files one by one.
2. Inside the For Each activity, use a Read PDF Text activity and set its output to a variable, PDFText.
3. Then use an Assign activity:
Number = System.Text.RegularExpressions.Regex.Match(PDFText, "(?<=Tổng cộng tiền thanh toán:).+").Value
4. Use this variable to write the text to the Excel sheet.
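The four steps above can be sketched end to end as follows. This is a Python illustration under stated assumptions, not the UiPath workflow itself: the PDF texts are passed in as already-extracted strings (what Read PDF Text would return), a CSV buffer stands in for the Excel sheet, and the names `collect_totals` and `write_rows` are hypothetical.

```python
import csv
import io
import re

LABEL = "Tổng cộng tiền thanh toán:"

def collect_totals(pdf_texts):
    """Return (file name, number after LABEL) pairs; '' when the label is missing."""
    rows = []
    for name, text in pdf_texts.items():  # step 1: one file at a time
        # step 3: lookbehind on the label, then the rest of that line
        match = re.search("(?<=" + re.escape(LABEL) + r")[^\r\n]+", text)
        rows.append((name, match.group(0).strip() if match else ""))
    return rows

def write_rows(rows, out):
    """Step 4: one row per file (CSV is only a stand-in for the Excel sheet)."""
    writer = csv.writer(out)
    writer.writerow(["File", "Total"])
    writer.writerows(rows)

# Hypothetical extracted texts (step 2 would produce these).
texts = {
    "a.pdf": "Tổng cộng tiền thanh toán: 100.000\n",
    "b.pdf": "No total on this page\n",
}
buffer = io.StringIO()
write_rows(collect_totals(texts), buffer)
print(buffer.getvalue())
```

Keeping the file name next to each number makes it easy to spot which PDFs had no matching line.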


Yeah, sure @trunghai,

  1. Use a For Each like this:
    For Each item In Directory.GetFiles(fullPath)
    where fullPath (String variable) holds the full path of the folder containing all the PDFs.
  2. Use a Read PDF Text activity and pass item as the file name.
  3. Use the Assign from above:
    Number = System.Text.RegularExpressions.Regex.Match(text, "(?<=Tổng cộng tiền thanh toán: ).+?(\n)").Value
    where Number is a String variable.
  4. If the number comes back with surrounding whitespace, use Trim on it:
    Number = Number.Trim
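As a small illustration of why the Trim in step 4 matters, the sketch below runs the same pattern in Python (used here only as a stand-in for the UiPath expression) against a made-up text with Windows-style line endings. Because the pattern ends in `(\n)`, the matched value still carries the trailing space and line break.

```python
import re

PATTERN = r"(?<=Tổng cộng tiền thanh toán: ).+?(\n)"

# Hypothetical PDF text; real output from Read PDF Text may differ.
text = "Tổng cộng tiền thanh toán: 1.234.567 \r\nNext line\r\n"

raw = re.search(PATTERN, text).group(0)
print(repr(raw))      # the match still includes the space and the line break

number = raw.strip()  # counterpart of Number.Trim
print(number)
```

One consequence of requiring `\n` in the pattern: a total on the very last line, with no line break after it, would not match at all.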

Sorry, bro, I have a problem, as shown in the picture below.

I could select the lines in the PDF file one by one, but I don’t understand the error below.

Hi Bro @samir and @monika.c

I have built my workflow, as in the attached file…

Please check my xaml file and help me correct the next step.

Thank you very much.
Invoice Number PDF.xaml (8.7 KB)

check this attached workflow
Invoice Number PDF.xaml (14.0 KB)



Hey, I just corrected it. It’ll work now, but I’ve added an activity called Select Folder, so at the start of the workflow you need to select the folder.
Have a look here:
run this workflow and you’ll get the idea.

Invoice Number PDF.xaml (7.8 KB)

Note: You don’t need to read the PDF line by line. Read PDF Text gives you the whole PDF text as a single string.
If you’re expecting more than one number, let me know.
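For the case of more than one number in the same text, here is a hedged sketch of the idea in Python (in .NET the counterpart of collecting every match would be Regex.Matches rather than Regex.Match). The function name `extract_all_after_label` and the two-page sample text are assumptions for illustration.

```python
import re

def extract_all_after_label(text, label):
    """Return every value that follows `label`, in order of appearance."""
    pattern = "(?<=" + re.escape(label) + r")[^\r\n]+"
    # With no capture groups, findall returns the full text of each match.
    return [value.strip() for value in re.findall(pattern, text)]

# Hypothetical two-page invoice returned as one string.
whole_text = (
    "Page 1\nTổng cộng tiền thanh toán: 100.000\n"
    "Page 2\nTổng cộng tiền thanh toán: 250.500\n"
)
print(extract_all_after_label(whole_text, "Tổng cộng tiền thanh toán: "))
```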


Hi bros @samir & @monika.c

Thank you so much, bros.

But there seems to be a problem with my PDF file… so I could not get the number from it.

Please check the attached file, which includes my xaml file.

Thank you! (562.8 KB)

hey man @trunghai,

check this out. (574.2 KB)


Hi Bro @samir

There might be a problem with the version.

How do I fix this?

Hi Bro @samir

Sorry, I have fixed the version problem.

Hi Bro @samir

Thank you, bro… I understand now :slight_smile:

If I want to get the number “123.456.789” from the string “(Total): 123.456.789”,

what is the expression?

Hey @trunghai, you want “123.456.789” from the string “(Total): 123.456.789”, right? Correct me if I’m wrong.

Then simply use the Replace method on that string.
Let’s take Str (String variable) = "(Total): 123.456.789".
Use one Assign:
Str = Str.Replace("(Total): ", "")
This replaces "(Total): " with an empty string ("").


Hi bro @samir

Thank you… but what if I want to check the string and remove every character except digits and “.”?

How do I do this?
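One possible approach, sketched in Python as an illustration only: replace every character that is not a digit or a dot with nothing, using the character class `[^0-9.]`. In a UiPath Assign the same pattern could presumably be passed to System.Text.RegularExpressions.Regex.Replace; the function name `keep_digits_and_dots` is made up for the example.

```python
import re

def keep_digits_and_dots(value):
    """Drop every character that is not an ASCII digit or a dot."""
    return re.sub(r"[^0-9.]", "", value)

print(keep_digits_and_dots("(Total): 123.456.789"))
```

Note that any stray dot elsewhere in the string (for example a full stop at the end of a sentence) is kept as well, so this works best on a string that has already been narrowed down to the line with the number.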

Hi Bro @monika.c

Thank you so much for the xaml file.

Could you please share a sample project that writes data from several PDF files into an Excel file?

I want to extract 3 fields from the PDF files and fill them into the Excel file.

Please check the attached file.

Thank you so much!
PDF to (562.8 KB)

Which 3 fields do you want to extract?


Hi Bro @monika.c

Please check the picture in the folder… I have highlighted the fields.

Thank you so much!

check this attached workflow…
Sample (562.1 KB)
