Hi,
Please help me find a regex for the values below:
QUANTITY-100
DESCRIPTION- Clear front covers
UNIT PRICE- $0.49
LINE TOTAL- $49.0
output.txt (803 Bytes)
Please explain how to build the regex.
Hi,
To write a regex pattern, we first need to find the rule the text follows.
In this case, each line begins with a number. We can write that as ^\d+
Another characteristic is that each line ends with two prices, each with a dollar sign.
We can write them as \$[.\d]+\s+\$[.\d]+
The remaining part is the DESCRIPTION, which we can match as .*?
So we can write this line as the following pattern.
^\d+\s+.*?\s+\$[.\d]+\s+\$[.\d]+
Then, to extract each item, we use named groups as follows.
"^(?<QUANTITY>\d+)\s+(?<DESCRIPTION>.*?)\s+(?<UNIT_PRICE>\$[.\d]+)\s+(?<LINE_TOTAL>\$[.\d]+)"
Regards,
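As a rough illustration of how the named groups pick the line apart, here is a minimal sketch in Python rather than UiPath's VB.NET. Python spells named groups as (?P<NAME>...) where .NET uses (?<NAME>...); the rest of the pattern is unchanged. The sample line is an assumption assembled from the values in the question, not the actual content of output.txt.

```python
import re

# Same pattern as above; only the named-group spelling differs from .NET.
LINE_PATTERN = re.compile(
    r"^(?P<QUANTITY>\d+)\s+"
    r"(?P<DESCRIPTION>.*?)\s+"
    r"(?P<UNIT_PRICE>\$[.\d]+)\s+"
    r"(?P<LINE_TOTAL>\$[.\d]+)"
)

# Hypothetical line built from the values in the question.
sample_line = "100 Clear front covers $0.49 $49.0"

match = LINE_PATTERN.search(sample_line)
# groupdict() maps each group name to the text it captured.
fields = match.groupdict() if match else {}
print(fields)
```

The lazy .*? for the description stops as soon as the two dollar amounts can match after it, which is why spaces inside the description are not a problem.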
Hi Yoichi
Your solution is working for my process.
Thanks & Regards
Priyanka
Hi,
I am trying to create a separate variable to hold the quantity value:
Var_Quantity = System.Text.RegularExpressions.Regex.Match(Var_ReadPDF, "^(?<QUANTITY>\d+)\s").ToString
but I am unable to get the value.
Please correct me.
Regards
Hi,
It would be the following, for example.
Var_Quantity = System.Text.RegularExpressions.Regex.Match(Var_ReadPDF, "^\d+(?= )", System.Text.RegularExpressions.RegexOptions.Multiline).Value
Regards,
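To show why the Multiline option matters here, a small sketch in Python (re.MULTILINE plays the role of RegexOptions.Multiline). The text below is a made-up stand-in for Var_ReadPDF, assuming the item line is not the first line of the PDF text.

```python
import re

# Hypothetical PDF text: a header line followed by one item line.
pdf_text = (
    "QUANTITY DESCRIPTION UNIT PRICE LINE TOTAL\n"
    "100 Clear front covers $0.49 $49.0"
)

# Without the flag, ^ only matches at the very start of the whole text.
without_flag = re.search(r"^\d+(?= )", pdf_text)

# With the flag, ^ matches at the start of every line.
with_flag = re.search(r"^\d+(?= )", pdf_text, re.MULTILINE)

# The lookahead (?= ) requires a following space but does not capture it.
quantity = with_flag.group(0) if with_flag else None
print(without_flag, quantity)
```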
Hello
Is there any course that explains how to write regex or select strings?
Regards
Aditya
Hello
Check out this string manipulation megapost.
How do I get the separate values?
E.g. description, unit price, line total.
Hi,
how to get separate values ?
In the above sample, we can get each value separately as follows.
Does this work for you?
item.Groups("QUANTITY").Value
item.Groups("DESCRIPTION").Value
item.Groups("UNIT_PRICE").Value
item.Groups("LINE_TOTAL").Value
Regards,
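For reference, the same idea sketched in Python: loop over every match and read each named group from the current item. The second item line is invented for illustration only, and the group syntax is Python's (?P<NAME>...) rather than .NET's (?<NAME>...).

```python
import re

LINE_PATTERN = re.compile(
    r"^(?P<QUANTITY>\d+)\s+(?P<DESCRIPTION>.*?)\s+"
    r"(?P<UNIT_PRICE>\$[.\d]+)\s+(?P<LINE_TOTAL>\$[.\d]+)",
    re.MULTILINE,
)

# Hypothetical text with two item lines (the second one is made up).
pdf_text = (
    "100 Clear front covers $0.49 $49.0\n"
    "5 Binding combs $2.00 $10.00\n"
)

rows = []
for item in LINE_PATTERN.finditer(pdf_text):
    # item.group("NAME") corresponds to item.Groups("NAME").Value in .NET.
    rows.append((
        item.group("QUANTITY"),
        item.group("DESCRIPTION"),
        item.group("UNIT_PRICE"),
        item.group("LINE_TOTAL"),
    ))

for row in rows:
    print(row)
```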
Hi Yoichi
item.Groups("QUANTITY").Value
item.Groups("DESCRIPTION").Value
item.Groups("UNIT_PRICE").Value
item.Groups("LINE_TOTAL").Value
This solution is working.
thanks & regards
Hi,
These values are for only one PDF document. I want to extract the values from the remaining PDFs as well; the PDF format is fixed but the values differ. How can I automate this,
given that no selector is used here?
Regards
Priyanka
Test.zip (5.0 KB)
Invoices.zip (331.8 KB)
invoice_template.xlsx (10.5 KB)
1 Extract the data from the PDF and enter them in the downloaded Excel file.
2 Extract the data from the downloaded PDF based on the following condition:
a. Quantity should be greater than or equal to 2.
b. Unit price should be greater than or equal to 2.
c. Line total should be greater than or equal to 100.
d. Due date should be greater than 01-April-2019.
e. Payment term should be due on receipt.
3 Enter the extracted data in the invoice_template.xlsx.
4 Rename the invoice_template.xlsx file to “Output.xlsx”
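The conditions in step 2 can be sketched as a single filter function. This is only an illustration in Python, assuming the values have already been extracted and converted to numbers and dates; the InvoiceLine type, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class InvoiceLine:
    # Hypothetical container for values already extracted from one PDF.
    quantity: int
    unit_price: Decimal
    line_total: Decimal
    due_date: date
    payment_terms: str


CUTOFF = date(2019, 4, 1)


def meets_conditions(line: InvoiceLine) -> bool:
    """Return True only if all of conditions a to e hold."""
    return (
        line.quantity >= 2
        and line.unit_price >= Decimal("2")
        and line.line_total >= Decimal("100")
        and line.due_date > CUTOFF
        and line.payment_terms.strip().lower() == "due on receipt"
    )


# Made-up examples: the first passes, the second fails on unit price.
keep = InvoiceLine(50, Decimal("2.50"), Decimal("125.00"), date(2019, 5, 15), "Due on receipt")
drop = InvoiceLine(100, Decimal("0.49"), Decimal("49.00"), date(2019, 5, 15), "Due on receipt")
print(meets_conditions(keep), meets_conditions(drop))
```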
Hi,
These values are for only one PDF document. I want to extract the values from the remaining PDFs as well; the PDF format is fixed but the values differ.
It's because there is a thousands separator (,) in some of the amounts.
The following expression will work for 6 pdf files you shared, at least. Can you try this?
System.Text.RegularExpressions.Regex.Matches(strData,"^(?<QUANTITY>\d+)\s+(?<DESCRIPTION>.*?)\s+(?<UNIT_PRICE>\$[.,\d]+)\s+(?<LINE_TOTAL>\$[.,\d]+)",System.Text.RegularExpressions.RegexOptions.Multiline)
Regards,
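A small Python sketch of what the extra comma in the character class changes (named groups use Python's (?P<NAME>...) spelling). The sample line is made up; the to_decimal helper is a hypothetical addition for turning the matched text into a number.

```python
import re
from decimal import Decimal

HEAD = r"^(?P<QUANTITY>\d+)\s+(?P<DESCRIPTION>.*?)\s+"
# Original amounts: digits and dots only.
OLD = re.compile(HEAD + r"(?P<UNIT_PRICE>\$[.\d]+)\s+(?P<LINE_TOTAL>\$[.\d]+)", re.MULTILINE)
# Revised amounts: digits, dots and commas.
NEW = re.compile(HEAD + r"(?P<UNIT_PRICE>\$[.,\d]+)\s+(?P<LINE_TOTAL>\$[.,\d]+)", re.MULTILINE)

# Hypothetical line whose total contains a thousands separator.
line = "2000 Widget $0.75 $1,500.00"

old_total = OLD.search(line).group("LINE_TOTAL")  # cut off at the comma
new_total = NEW.search(line).group("LINE_TOTAL")  # full amount


def to_decimal(price: str) -> Decimal:
    """Strip the dollar sign and thousands separators before converting."""
    return Decimal(price.lstrip("$").replace(",", ""))


print(old_total, new_total, to_decimal(new_total))
```

Note that the old pattern does not fail on this line; it silently captures a truncated total, which is easy to miss.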
Yes, I tried this expression, but it returns the same values for the remaining invoices as well.
Hi,
Can you elaborate on your issue?
It seems to work without problems in my environment.
invoice02
invoice03
Regards,
Hi,
Only the last invoice's values are written into the Excel template. I want to write the values from all invoices into the same Excel template.
Regards
Priyanka
Also, it writes a rupee sign instead of the $ sign.
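One possible cause of seeing only the last invoice, offered as a guess since the workflow itself is not shown, is that each loop iteration writes to the same place and overwrites the previous result. A minimal Python sketch of the alternative: collect the rows from every invoice first, then write them all once. The invoice names and texts are made up, and an in-memory CSV stands in for the Excel template.

```python
import csv
import io
import re

LINE_PATTERN = re.compile(
    r"^(?P<QUANTITY>\d+)\s+(?P<DESCRIPTION>.*?)\s+"
    r"(?P<UNIT_PRICE>\$[.,\d]+)\s+(?P<LINE_TOTAL>\$[.,\d]+)",
    re.MULTILINE,
)

# Hypothetical extracted text, keyed by invoice name.
invoices = {
    "invoice01": "100 Clear front covers $0.49 $49.0\n",
    "invoice02": "2000 Widget $0.75 $1,500.00\n",
}

# Accumulate rows across all invoices instead of replacing them each time.
all_rows = []
for name, text in invoices.items():
    for item in LINE_PATTERN.finditer(text):
        all_rows.append([
            name,
            item.group("QUANTITY"),
            item.group("DESCRIPTION"),
            item.group("UNIT_PRICE"),
            item.group("LINE_TOTAL"),
        ])

# Write everything in one pass, after the loop has finished.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["INVOICE", "QUANTITY", "DESCRIPTION", "UNIT_PRICE", "LINE_TOTAL"])
writer.writerows(all_rows)
print(buffer.getvalue())
```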