Hope you are all well. I need your help. I have dozens of PDFs in an almost identical format, and I would like to extract the following into an Excel file. The challenge is that there are multiple line items for the same MPR. Please find the screenshot attached.
I think there are a few ways to achieve this, such as the following.
Get the text data using the Read PDF Text activity and extract each item using Regex.
As your data seems regular, there is a good chance this will work well. (However, it depends on the structure of the PDF.)
Thank you so much. Is it possible to get a workflow, please? I am not sure how to apply the Regex, because when I copied the text extracted from the PDF to test the Regex, it came out like this:
MPR Meter Serial Number Meter Date Reading Date Reading Conversion Calorific Consumption Price Value
Units Factor Value (kWh) (p/kWh) (£)
73113801 E160K0640519D7 M 01/04/2020 25379M 31/03/2020 24768M 1.03013 38.9 6801 1.4601 99.30
73113801 E160K0640519D7 M 30/04/2020 37175M 01/04/2020 25379M 1.03013 38.9 131303 1.4601 1917.16.
So if I want to pick up the MPR, Date and Consumption, I am not sure how to go about it.
Do I need to create a template.xlsx before running the workflow?
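For anyone testing the pattern outside the workflow, here is a minimal sketch in Python of how a regex could pick MPR, Date and Consumption from each line item in the text above. It assumes every line item has the same twelve space-separated columns as the sample, and that "Date" means the first (present read) date; the names `LINE_ITEM` and `extract_line_items` are made up for this illustration and are not part of any UiPath activity.

```python
import re

# One line item = 12 whitespace-separated columns, as in the sample text.
# Assumption: "Date" is the first (present read) date on the line.
LINE_ITEM = re.compile(
    r"^(?P<mpr>\d+)\s+"                  # MPR
    r"\S+\s+\S+\s+"                      # meter serial number, meter units
    r"(?P<date>\d{2}/\d{2}/\d{4})\s+"    # present read date
    r"\S+\s+"                            # present reading
    r"\d{2}/\d{2}/\d{4}\s+"              # previous read date
    r"\S+\s+"                            # previous reading
    r"[\d.]+\s+[\d.]+\s+"                # conversion factor, calorific value
    r"(?P<consumption>\d+)\s+"           # consumption (kWh)
    r"[\d.]+\s+[\d.,]+"                  # price, value
)

def extract_line_items(text):
    """Return (mpr, date, consumption) for every line that looks like a line item."""
    rows = []
    for line in text.splitlines():
        match = LINE_ITEM.match(line.strip())
        if match:  # header lines and other text simply do not match
            rows.append((match["mpr"], match["date"], int(match["consumption"])))
    return rows

sample = """MPR Meter Serial Number Meter Date Reading Date Reading Conversion Calorific Consumption Price Value
73113801 E160K0640519D7 M 01/04/2020 25379M 31/03/2020 24768M 1.03013 38.9 6801 1.4601 99.30
73113801 E160K0640519D7 M 30/04/2020 37175M 01/04/2020 25379M 1.03013 38.9 131303 1.4601 1917.16."""

rows = extract_line_items(sample)
for row in rows:
    print(row)
```

Matching line by line means that several line items for the same MPR each produce their own row. The same pattern should be usable in a .NET regex with minor syntax changes (for example `(?<mpr>...)` instead of `(?P<mpr>...)`), but that is untested here.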
It’s not always necessary. We can build the DataTable using the Build DataTable activity, a .NET class, etc.
However, I think a template file makes the workflow easier to maintain.
I have lots of PDFs with almost the same structure. How can I apply this flow to the larger sample?
Basically, we need to add a loop around the above sample. A rough image is the following. (This workflow won’t run as-is because it is just a sample.)
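To illustrate the "loop outside the sample" idea in code form, here is a rough Python sketch. It assumes the text of each PDF has already been extracted (for example by Read PDF Text) and is available as a mapping from file name to text; the file names, `rows_from_text` and `write_all` are hypothetical. It writes CSV rather than a real .xlsx, since CSV needs no extra library and opens in Excel.

```python
import csv
import io
import re

# Same column layout assumption as the sample line items in this thread.
ROW = re.compile(
    r"^(\d+)\s+\S+\s+\S+\s+"            # MPR, serial number, meter units
    r"(\d{2}/\d{2}/\d{4})\s+\S+\s+"     # present read date, present reading
    r"\d{2}/\d{2}/\d{4}\s+\S+\s+"       # previous read date, previous reading
    r"[\d.]+\s+[\d.]+\s+"               # conversion factor, calorific value
    r"(\d+)\s"                          # consumption (kWh)
)

def rows_from_text(source_name, text):
    """Yield one output row per line item found in one document's text."""
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            yield [source_name, match.group(1), match.group(2), match.group(3)]

def write_all(texts_by_file, out):
    """Outer loop: one header row, then the line items of every document."""
    writer = csv.writer(out)
    writer.writerow(["File", "MPR", "Date", "Consumption"])
    for name, text in texts_by_file.items():
        writer.writerows(rows_from_text(name, text))

# Hypothetical input: already-extracted text, keyed by file name.
texts = {
    "april.pdf": "Present Read Previous Read\n"
                 "73113801 E160K0640519D7 M 01/04/2020 25379M 31/03/2020 24768M 1.03013 38.9 6801 1.4601 99.30",
    "may.pdf": "73113801 E160K0640519D7 M 01/05/2020 37581M 30/04/2020 37175M 1.03013 39.0 4531 1.4331 64.93\n"
               "73113801 E160K0640519D7 M 31/05/2020 47179M 01/05/2020 37581M 1.03013 39.0 107111 1.4331 1535.01",
}

buffer = io.StringIO()  # stands in for an open output file
write_all(texts, buffer)
print(buffer.getvalue())
```

Keeping the file name in the first column makes it easier to trace a row back to its PDF when one document does not follow the expected layout.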
Can you please help me with the regex for this? The fields are Invoice No, Supply Period, Customer No, Contract No, MPR and consumption.
GRSFSDFDE REFDINE GETALS LTD Invoice No 2667915
BSFATS ROAD Supply Period 01/05/2020 to 31/05/2020
SOUTHFIELD Invoice Date 04/06/2020
BRAVESEND Invoice Total £2,666.16 due on 25/06/2020
DA11 9BG Balance BF £3,141.91 due from earlier invoices
Balance Outstanding £5,808.07
Customer No G3232 Premises Supplied
Contract No 67232 BSFATS ROAD, SOUTHFLEET, BRAVESEND, AD11 9ZY
Site Ref AD119DC003
Present Read Previous Read
MPR Meter Serial Number Meter Date Reading Date Reading Conversion Calorific Consumption Price Value
Units Factor Value (kWh) (p/kWh) (£)
73113801 E160K0640519D7 M 01/05/2020 37581M 30/04/2020 37175M 1.03013 39.0 4531 1.4331 64.93
73113801 E160K0640519D7 M 31/05/2020 47179M 01/05/2020 37581M 1.03013 39.0 107111 1.4331 1535.01
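As a starting point for the header fields, here is a minimal Python sketch that anchors each value on its label as it appears in the text above. It assumes each label occurs once per invoice and that the value follows it on the same line; `HEADER_PATTERNS` and `extract_header` are names invented for this example. MPR and consumption repeat per line item, so they would need a separate per-line pattern.

```python
import re

# One pattern per header field, anchored on the label text from the invoice.
HEADER_PATTERNS = {
    "invoice_no": r"Invoice No\s+(\d+)",
    "supply_period": r"Supply Period\s+(\d{2}/\d{2}/\d{4}\s+to\s+\d{2}/\d{2}/\d{4})",
    "customer_no": r"Customer No\s+(\S+)",
    "contract_no": r"Contract No\s+(\d+)",
}

def extract_header(text):
    """Return a dict of header fields; a field is None when its label is not found."""
    result = {}
    for name, pattern in HEADER_PATTERNS.items():
        match = re.search(pattern, text)
        result[name] = match.group(1) if match else None
    return result

sample = """GRSFSDFDE REFDINE GETALS LTD Invoice No 2667915
BSFATS ROAD Supply Period 01/05/2020 to 31/05/2020
SOUTHFIELD Invoice Date 04/06/2020
Customer No G3232 Premises Supplied
Contract No 67232 BSFATS ROAD, SOUTHFLEET, BRAVESEND, AD11 9ZY"""

header = extract_header(sample)
print(header)
```

Returning None for a missing field, rather than raising an error, lets a batch run continue and makes the incomplete invoices easy to spot afterwards.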
@srinivas_pradeep - Sorry, that is not what I asked… What effort have you put into fetching the data from your question? I see @Yoichi already provided a Regex pattern similar to what you asked for…
So what effort have you put into adjusting the Regex?
Sorry, I didn't get your question earlier. Please find attached the flow which I tried. It involves not only the regex but also the extraction from the PDF before the regex is applied. I ended up with an execution error. I tried changing the regex as mentioned by @Yoichi but got stuck. Since I am new to this field, I was seeking immediate advice. Main.xaml (11.4 KB)
Thank you @prasath17. Will go through it.
Yes, the MPR and consumption can be multiple. Thanks.
Also, can you please suggest the best way to learn Regex? Thanks.