How to use string manipulation

Hi,
Please help me find a regex expression for the values below:
QUANTITY - 100
DESCRIPTION - Clear front covers
UNIT PRICE - $0.49
LINE TOTAL - $49.0

output.txt (803 Bytes)

Hi,

Can you try the following sample?

Sample20211111-5.zip (3.0 KB)

Regards,

Please explain how to build the regex expression.

Hi,

To write a regex pattern, we first need to find the rule the text follows.
In this case, each line starts with a number. We can write it as ^\d+
Another characteristic is that there are two prices with a dollar sign at the end of the line.
We can write them as \$[.\d]+\s+\$[.\d]+
The remaining part is the DESCRIPTION, which we can match lazily with .*? (as few characters as possible, so it stops right before the prices).
So we can write the whole line as the following pattern.

 ^\d+\s+.*?\s+\$[.\d]+\s+\$[.\d]+

Then, to extract each item, we use named groups as follows.

"^(?<QUANTITY>\d+)\s+(?<DESCRIPTION>.*?)\s+(?<UNIT_PRICE>\$[.\d]+)\s+(?<LINE_TOTAL>\$[.\d]+)"
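To check the named-group pattern outside UiPath, here is a minimal Python sketch. Note that Python spells a named group `(?P<name>...)` where .NET uses `(?<name>...)`; the rest of the pattern is unchanged. The single invoice line is an assumed layout built from the values in the question, since the contents of output.txt are not shown.

```python
import re

# Same pattern as the .NET version, but with Python's (?P<name>...) syntax.
LINE_PATTERN = re.compile(
    r"^(?P<QUANTITY>\d+)\s+(?P<DESCRIPTION>.*?)\s+"
    r"(?P<UNIT_PRICE>\$[.\d]+)\s+(?P<LINE_TOTAL>\$[.\d]+)"
)

# Assumed layout of one extracted invoice line (output.txt itself is not shown).
line = "100 Clear front covers $0.49 $49.0"

m = LINE_PATTERN.match(line)
if m:
    # groupdict() maps each group name to the text it captured.
    print(m.groupdict())
```

Because DESCRIPTION is lazy, it grows only until the rest of the pattern (whitespace plus two dollar amounts) can match, so it captures "Clear front covers" without swallowing the prices.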

Regards,


Hi Yoichi,
Your solution is working for my process.

Thanks & Regards
Priyanka


Hi,

I am trying to create a separate variable to get the value of the quantity:
Var_Quantity= System.Text.RegularExpressions.Regex.Match( Var_ReadPDF,"^(?<QUANTITY>\d+)\s").ToString
but I am unable to get the value.
Please correct me.

Regards

Hi,

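It would be the following, for example.

Var_Quantity = System.Text.RegularExpressions.Regex.Match(Var_ReadPDF,"^\d+(?= )",System.Text.RegularExpressions.RegexOptions.Multiline ).Value

The Multiline option is the key part: without it, ^ matches only at the very start of the whole text, not at the start of each line. Regex.Match returns a Match object, so .Value gives the matched string.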
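The effect of the Multiline option can be checked with a short Python sketch. Here `re.MULTILINE` plays the role of `RegexOptions.Multiline` and `.group()` plays the role of `.Value`; the sample text, including the "INVOICE 1234" header line, is an assumption standing in for `Var_ReadPDF`.

```python
import re

# Assumed multi-line text standing in for the Var_ReadPDF string.
text = "INVOICE 1234\n100 Clear front covers $0.49 $49.0\n"

# Without MULTILINE, ^ anchors only at the start of the whole string,
# which begins with "INVOICE", so no match is found.
without_flag = re.search(r"^\d+(?= )", text)

# With MULTILINE, ^ also matches right after each newline, so the
# quantity at the start of the second line is found. (?= ) requires a
# following space without including it in the match.
with_flag = re.search(r"^\d+(?= )", text, re.MULTILINE)

print(without_flag)
print(with_flag.group() if with_flag else None)
```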

Regards,

Hello

Is there any course that explains how to write regular expressions or select strings?

Regards
Aditya

Hello

Check out this string manipulation megapost.

How do I get the values separately?
e.g. description, unit price, line total

Hi,

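How do I get the values separately?

In the sample above, we can get each value separately as follows, where item is each Match returned by Regex.Matches (for example, the loop variable of a For Each activity).
Does this work for you?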

item.Groups("QUANTITY").Value
item.Groups("DESCRIPTION").Value
item.Groups("UNIT_PRICE").Value
item.Groups("LINE_TOTAL").Value
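The same idea of looping over every match and reading each group by name can be sketched in Python, where `finditer` stands in for `Regex.Matches`, the loop variable stands in for `item`, and `item.group("NAME")` stands in for `item.Groups("NAME").Value`. The invoice text is hypothetical: only the first line item comes from this thread.

```python
import re

# Named-group pattern in Python syntax; MULTILINE lets ^ match at the
# start of every line of the extracted text.
PATTERN = re.compile(
    r"^(?P<QUANTITY>\d+)\s+(?P<DESCRIPTION>.*?)\s+"
    r"(?P<UNIT_PRICE>\$[.\d]+)\s+(?P<LINE_TOTAL>\$[.\d]+)",
    re.MULTILINE,
)

# Hypothetical invoice text; the header line does not start with a digit,
# so it is skipped.
text = (
    "QTY DESCRIPTION UNIT PRICE LINE TOTAL\n"
    "100 Clear front covers $0.49 $49.0\n"
    "3 Staplers $4.50 $13.50\n"
)

rows = []
for item in PATTERN.finditer(text):  # one iteration per matching line
    rows.append((
        item.group("QUANTITY"),
        item.group("DESCRIPTION"),
        item.group("UNIT_PRICE"),
        item.group("LINE_TOTAL"),
    ))

for row in rows:
    print(row)
```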

Regards,


Hi Yoichi,
The item.Groups(...).Value solution is working.

Thanks & Regards

Hi,
These values are only for one PDF document. How can I extract the values from the remaining PDFs? The PDF format is fixed, but the values are different. How can I automate this?
Note that no selector is used here.

Regards
Priyanka

Test.zip (5.0 KB)
Invoices.zip (331.8 KB)
invoice_template.xlsx (10.5 KB)
1. Extract the data from the PDF and enter it in the downloaded Excel file.
2. Extract the data from the downloaded PDF based on the following conditions:
a. Quantity should be greater than or equal to 2.
b. Unit price should be greater than or equal to 2.
c. Line total should be greater than or equal to 100.
d. Due date should be later than 01-April-2019.
e. Payment terms should be "Due on receipt".
3. Enter the extracted data in invoice_template.xlsx.
4. Rename the invoice_template.xlsx file to "Output.xlsx".
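Conditions a to c can be sketched as a filter on the captured strings, assuming they apply to each line item (the task wording does not say this explicitly). Conditions d and e concern invoice-level fields (due date, payment terms) that the line-item regex does not capture, so they are left out. The rows and the helper names `to_amount` and `keep_line` are hypothetical.

```python
from decimal import Decimal

def to_amount(text: str) -> Decimal:
    """Hypothetical helper: turn a captured price such as "$1,250.00" into a Decimal."""
    return Decimal(text.replace("$", "").replace(",", ""))

def keep_line(quantity: str, unit_price: str, line_total: str) -> bool:
    """Hypothetical helper: conditions a-c, applied to one line item."""
    return (
        int(quantity) >= 2
        and to_amount(unit_price) >= 2
        and to_amount(line_total) >= 100
    )

# Hypothetical captured rows: (QUANTITY, DESCRIPTION, UNIT_PRICE, LINE_TOTAL).
rows = [
    ("100", "Clear front covers", "$0.49", "$49.0"),
    ("40", "Desk lamps", "$12.50", "$500.00"),
    ("1", "Office chair", "$1,250.00", "$1,250.00"),
]

selected = [r for r in rows if keep_line(r[0], r[2], r[3])]
print(selected)
```

Decimal is used instead of float so that amounts compare exactly; stripping "$" and "," first is needed because the regex captures them as part of the price text.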

Hi,

These values are only for one PDF document. How can I extract the values from the remaining PDFs? The PDF format is fixed, but the values are different.

It's because some of the amounts contain a thousands separator (,).
The following expression works for at least the six PDF files you shared. Can you try this?

System.Text.RegularExpressions.Regex.Matches(strData,"^(?<QUANTITY>\d+)\s+(?<DESCRIPTION>.*?)\s+(?<UNIT_PRICE>\$[.,\d]+)\s+(?<LINE_TOTAL>\$[.,\d]+)",System.Text.RegularExpressions.RegexOptions.Multiline)
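Why adding "," to the character class matters can be shown with a small Python sketch comparing the price part of both patterns. The price pair is an assumed example containing a thousands separator.

```python
import re

OLD = re.compile(r"\$[.\d]+\s+\$[.\d]+")     # dot and digits only
NEW = re.compile(r"\$[.,\d]+\s+\$[.,\d]+")   # comma added to the character class

# Assumed price pair with thousands separators.
prices = "$1,250.00 $2,500.00"

# OLD stops at the first "," (it is not in [.\d]), so the whole pair cannot match.
print(OLD.fullmatch(prices))
# NEW accepts the commas, so both amounts match in full.
print(NEW.fullmatch(prices))
```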

Regards,

Yes, I tried this expression, but it returns the same values for the remaining invoices as well.

Hi,

Can you elaborate on your issue?
It seems to work without problems in my environment.

invoice02

invoice03

Regards,

Hi,
Only the last invoice's values are written into the Excel template. I want to write the values of all invoices into the same Excel template.
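A common cause of this symptom (an assumption, since the workflow is not shown) is that the output table is recreated, or written starting at the same cell, inside the per-invoice loop, so each write overwrites the previous one. The sketch below shows the alternative: collect the rows from every invoice into one list created before the loop, then write once afterwards. CSV via Python's `csv` module stands in for the Excel template here, and the invoice names and texts are hypothetical.

```python
import csv
import io
import re

PATTERN = re.compile(
    r"^(?P<QUANTITY>\d+)\s+(?P<DESCRIPTION>.*?)\s+"
    r"(?P<UNIT_PRICE>\$[.,\d]+)\s+(?P<LINE_TOTAL>\$[.,\d]+)",
    re.MULTILINE,
)

def collect_rows(invoice_texts):
    """Gather line items from every invoice into ONE list created outside the loop."""
    all_rows = []
    for name, text in invoice_texts:
        for m in PATTERN.finditer(text):
            all_rows.append([name, m.group("QUANTITY"), m.group("DESCRIPTION"),
                             m.group("UNIT_PRICE"), m.group("LINE_TOTAL")])
    return all_rows

# Hypothetical extracted texts for two invoices.
invoices = [
    ("invoice01", "100 Clear front covers $0.49 $49.0\n"),
    ("invoice02", "2 Desk lamps $12.50 $25.00\n"),
]

rows = collect_rows(invoices)

# Write everything once, after all invoices are processed.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["INVOICE", "QUANTITY", "DESCRIPTION", "UNIT_PRICE", "LINE_TOTAL"])
writer.writerows(rows)
print(out.getvalue())
```

Writing inside the loop also works, as long as each write appends after the last used row instead of starting from the same position every time.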

Regards
Priyanka

Instead of the $ sign, the rupee sign is written.