Extracting Information from Invoices - Multiple Lines

Hi Team,

Hope you are all well. I need your help. I have dozens of PDFs in almost the same format, and I would like to extract the following fields into an Excel file. The challenge is that there are multiple line items for the same MPR. Please find the screenshot attached.

  1. MPR
  2. Invoice date
  3. consumption (kWh)

Please let me know your thoughts. Any workflow or links would be really helpful, as I am new to this field.

Thanks,
Sri

Hi,

I think there are a couple of ways to achieve this:

  1. Get the text with the Read PDF Text activity and extract each item using Regex.
    As your data seems regular, there is a good chance this will work. (However, it depends on the structure of the PDF.)

  2. Use the Document Understanding framework.

Regards,

Hi @Yoichi ,

Thank you so much. Is it possible to get a workflow, please? I am not sure how to apply the Regex, because when I copy the text extracted from the PDF to test it, I get the following:
MPR Meter Serial Number Meter Date Reading Date Reading Conversion Calorific Consumption Price Value
Units Factor Value (kWh) (p/kWh) (£)
73113801 E160K0640519D7 M 01/04/2020 25379M 31/03/2020 24768M 1.03013 38.9 6801 1.4601 99.30
73113801 E160K0640519D7 M 30/04/2020 37175M 01/04/2020 25379M 1.03013 38.9 131303 1.4601 1917.16.

So I want to pick up the MPR, Date and Consumption, but I am not sure how to go about it.

Can you please help?

Thanks,
Sr

Hi,

Hope the following helps you.

Sample20210301-3.zip (2.8 KB)

Regex pattern (with the Multiline option):

"^(?<MPR>\d+)\s*?(?<MeterSerialNumber>[A-Z0-9]+)\s*?(?<MeterUnits>\w+)\s*?(?<Date1>[\d/]+)\s*?(?<Reading1>[A-Z0-9]+)\s*?(?<Date2>[\d/]+)\s*?(?<Reading2>[A-Z0-9]+)\s*?(?<ConversionFactor>[.\d]+)\s*?(?<CalorificValue>[.\d]+)\s*?(?<Consumption>[.\d]+)"
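For anyone who wants to try the pattern outside UiPath, here is a minimal sketch in Python. It assumes the extracted text looks like the two sample rows posted above, and it rewrites the named groups as `(?P<name>...)`, since Python's `re` module does not accept the .NET `(?<name>...)` form. Taking `Date1` as the date to report is my assumption; it is simply the first date on each row.

```python
import re

# Sample text copied from the extracted PDF output posted earlier in the thread.
text = """MPR Meter Serial Number Meter Date Reading Date Reading Conversion Calorific Consumption Price Value
Units Factor Value (kWh) (p/kWh) (£)
73113801 E160K0640519D7 M 01/04/2020 25379M 31/03/2020 24768M 1.03013 38.9 6801 1.4601 99.30
73113801 E160K0640519D7 M 30/04/2020 37175M 01/04/2020 25379M 1.03013 38.9 131303 1.4601 1917.16
"""

# Same pattern as above, only with Python's (?P<name>...) group syntax.
ROW = re.compile(
    r"^(?P<MPR>\d+)\s*?(?P<MeterSerialNumber>[A-Z0-9]+)\s*?(?P<MeterUnits>\w+)"
    r"\s*?(?P<Date1>[\d/]+)\s*?(?P<Reading1>[A-Z0-9]+)"
    r"\s*?(?P<Date2>[\d/]+)\s*?(?P<Reading2>[A-Z0-9]+)"
    r"\s*?(?P<ConversionFactor>[.\d]+)\s*?(?P<CalorificValue>[.\d]+)"
    r"\s*?(?P<Consumption>[.\d]+)",
    re.MULTILINE,  # lets ^ match at the start of every line
)

# One tuple per line item; header lines do not start with digits, so they are skipped.
rows = [(m["MPR"], m["Date1"], m["Consumption"]) for m in ROW.finditer(text)]
print(rows)
```

Each match corresponds to one line item, so multiple rows for the same MPR come out as separate tuples.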

Regards,

Hi @Yoichi ,

Thank you very much, this is really helpful. I will try to build on it, and if I get stuck I will reach out to you.

Cheers,
Sri

Hi @Yoichi
Thanks for all the information. Do you recommend using the Form Extractor?

Hi @Yoichi ,

Thank you for the support. Is it possible to export the result to Excel?
Can you please guide me?

Thanks,

Hi,

Hope the following helps you.

Sample20210301-3v2.zip (9.6 KB)
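The attached workflow is not reproduced here. As a rough sketch of the same idea outside UiPath, the extracted rows can be written to a CSV file, which Excel opens directly. The column headers and the `write_rows` helper are illustrative names of my own, not something taken from the attachment.

```python
import csv
import io

# Rows as (MPR, date, consumption), e.g. built from the regex matches.
rows = [
    ("73113801", "01/04/2020", "6801"),
    ("73113801", "30/04/2020", "131303"),
]

def write_rows(handle, rows):
    """Write a header line plus one CSV line per extracted row."""
    writer = csv.writer(handle)
    writer.writerow(["MPR", "Date", "Consumption (kWh)"])
    writer.writerows(rows)

# Written to an in-memory buffer here so the result can be inspected.
buffer = io.StringIO()
write_rows(buffer, rows)
csv_text = buffer.getvalue()
print(csv_text)
```

To produce a real file, open it with `open("output.csv", "w", newline="")` and pass the handle to `write_rows` instead of the buffer.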

Regards,

Hi @Yoichi ,

Thank you so much. This is a perfect solution. A couple of questions:

  1. Do I need to create a template.xlsx before running the workflow?
  2. I have lots of PDFs with almost the same structure. How can I apply this flow to the larger sample?
  3. I would also like to sum the values under Consumption, grouped by MPR. Is it possible to do this in UiPath?

Please let me know,
Thanks,
Sri

Hi,

Do I need to create a template.xlsx before running the workflow?

It’s not always necessary. We can build the DataTable using the Build Data Table activity or a .NET class, etc.
However, I think a template file makes the workflow easier to maintain.

I have lots of PDFs with almost the same structure. How can I apply this flow to the larger sample?

Basically, we need to wrap the above sample in a loop over the files. A rough outline follows. (This workflow won’t run as-is because it is only a sketch.)

Sample20210301-3v2-2.zip (9.9 KB)

In practice, we also need to add error handling, file management (moving processed files), etc.
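To make the loop, error handling and file move concrete, here is a small Python sketch. It is not the attached workflow. It assumes the PDF text has already been saved as .txt files, and `process_folder`, `first_word` and the folder names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

def process_folder(inbox, done, extract):
    """Run extract(text) on every .txt file in inbox; move successes to done."""
    inbox, done = Path(inbox), Path(done)
    done.mkdir(parents=True, exist_ok=True)
    results, failures = [], []
    for path in sorted(inbox.glob("*.txt")):
        try:
            results.extend(extract(path.read_text(encoding="utf-8")))
            shutil.move(str(path), str(done / path.name))  # file management
        except Exception as exc:  # one bad file should not stop the batch
            failures.append((path.name, str(exc)))
    return results, failures

def first_word(text):
    # Placeholder extractor; raises on empty input to exercise the error path.
    if not text.strip():
        raise ValueError("no text")
    return [text.split()[0]]

with tempfile.TemporaryDirectory() as tmp:
    inbox = Path(tmp) / "inbox"
    inbox.mkdir()
    (inbox / "a.txt").write_text("73113801 rest of line", encoding="utf-8")
    (inbox / "b.txt").write_text("", encoding="utf-8")
    results, failures = process_folder(inbox, Path(tmp) / "done", first_word)
    moved = sorted(p.name for p in (Path(tmp) / "done").iterdir())

print(results, failures, moved)
```

Files that fail stay in the inbox folder and are reported in `failures`, so they can be checked by hand and rerun.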

I would also like to sum the values under Consumption, grouped by MPR. Is it possible to do this in UiPath?

There are some topics on group sums in the UiPath forum. The following search results should help you.

https://forum.uipath.com/search?q=group%20sum
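For reference, the grouping logic itself is small. The sketch below shows it in Python rather than in UiPath activities, using the two sample rows from earlier in the thread; the tuple layout (MPR, date, consumption) is an assumption.

```python
from collections import defaultdict

# (MPR, date, consumption) rows, as extracted earlier in the thread.
rows = [
    ("73113801", "01/04/2020", "6801"),
    ("73113801", "30/04/2020", "131303"),
]

# Sum consumption per MPR; values are parsed as floats because they arrive as text.
totals = defaultdict(float)
for mpr, _date, consumption in rows:
    totals[mpr] += float(consumption)

for mpr, total in totals.items():
    print(mpr, total)
```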

Regards,

Hi @Yoichi ,

Thank you so much. Really appreciate all your help. I will go through it.

Cheers,
Sri

Hi @Yoichi ,

Can you please help me with the regex for this? The fields are Invoice No, Supply Period, Customer No, Contract No, MPR and consumption.

GRSFSDFDE REFDINE GETALS LTD Invoice No 2667915
BSFATS ROAD Supply Period 01/05/2020 to 31/05/2020
SOUTHFIELD Invoice Date 04/06/2020
BRAVESEND Invoice Total £2,666.16 due on 25/06/2020
DA11 9BG Balance BF £3,141.91 due from earlier invoices
Balance Outstanding £5,808.07

Customer No G3232 Premises Supplied
Contract No 67232 BSFATS ROAD, SOUTHFLEET, BRAVESEND, AD11 9ZY
Site Ref AD119DC003

Present Read Previous Read

MPR Meter Serial Number Meter Date Reading Date Reading Conversion Calorific Consumption Price Value
Units Factor Value (kWh) (p/kWh) (£)
73113801 E160K0640519D7 M 01/05/2020 37581M 30/04/2020 37175M 1.03013 39.0 4531 1.4331 64.93
73113801 E160K0640519D7 M 31/05/2020 47179M 01/05/2020 37581M 1.03013 39.0 107111 1.4331 1535.01

thank you

@srinivas_pradeep - Yes, we can certainly help…

Could you please show us the progress you have made so far in fetching the details you asked about above?

Hi @prasath17 ,

Please find attached the output based on the previous workflows.

@srinivas_pradeep - Sorry, that is not what I asked… What effort have you put into fetching the data from your question? I see @Yoichi already provided a Regex pattern similar to what you are asking for…

So what effort have you put into adjusting the Regex?

Hi @srinivas_pradeep, please click on the links below for the Regex patterns:

Invoice #

Supply Period

Customer No

Contract No

Question: The above values are single, but the MPR and Consumption values will be multiple? Is that right? Just making sure…

Hope this helps…
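The linked patterns are not shown in this thread. As a rough illustration only, single-value header fields like these can usually be picked up with one short pattern per label. The patterns below are my own guesses based on the sample text posted above, written for Python's `re` module, and may need adjusting for other invoices.

```python
import re

# Header lines copied from the sample invoice text posted above.
header = """GRSFSDFDE REFDINE GETALS LTD Invoice No 2667915
BSFATS ROAD Supply Period 01/05/2020 to 31/05/2020
SOUTHFIELD Invoice Date 04/06/2020
Customer No G3232 Premises Supplied
Contract No 67232 BSFATS ROAD, SOUTHFLEET, BRAVESEND, AD11 9ZY
"""

# One pattern per field; group 1 captures the value that follows the label.
PATTERNS = {
    "InvoiceNo": r"Invoice No\s+(\d+)",
    "SupplyPeriod": r"Supply Period\s+(\d{2}/\d{2}/\d{4} to \d{2}/\d{2}/\d{4})",
    "CustomerNo": r"Customer No\s+(\w+)",
    "ContractNo": r"Contract No\s+(\w+)",
}

fields = {}
for name, pattern in PATTERNS.items():
    match = re.search(pattern, header)
    fields[name] = match.group(1) if match else None  # None when the label is missing

print(fields)
```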

Hi @prasath17 ,

Sorry, I didn't get your question earlier. Please find attached the flow I tried. It is not only Regex; it also involves extracting the text from the PDF and then applying the Regex. I ended up with an execution error. I tried changing the Regex as mentioned by @Yoichi but got stuck. Since I am new to this field, I was looking for immediate advice.
Main.xaml (11.4 KB)

Thank you @prasath17. I will go through it.
Yes, the MPR and Consumption can be multiple. Thanks.
Also, can you please suggest the best way to learn Regex? Thanks.

@srinivas_pradeep - For the Regex… please refer to this post…

I will share the Regex in a min

Hi @srinivas_pradeep - Please check this link for whole table extraction…

You can print these values, as already shown by Yoichi…
