Automation of text extract from unstructured pdf invoices, containing fields that use more than 1 field name on the invoices, and using extracted text data, to create a new maintenance edit file in a desktop application that was installed through web

Hi guys, I have been trying to figure out this automation for a good 2 years probably now, and cannot seem to get it down if my life depended on it.

I am trying to use ocr on pdf unstructured invoices, to extract key fields then open a web browser downloaded application I use for work that is downloaded and used as a desktop application on my pc, travel to the maintenance file entry section of the application, specify whether the truck or trailer had work done to it by intelligently using the data from the invoice to figure out if a truck or trailer was worked on, and choose between opening a new entry for a truck or trailer, once it has been detected and the new form field is opened I want to enter the extracted information into the corresponding form fields as text and also choose the correct drop down menu items that correspond with the extracted values from the invoices. Once all form fields are chosen, and text has been entered in, I want to save the maintenance entry. After the original save I then want to move to the file tab of the form page that is open, and upload the pdf invoice that was used to extract the data, save and close, and do this for all pdf files located in the folder. After the maintenance log has been entered and saved I would like that pdf file in the folder to be moved to a “completed folder” and any instances that has errors or were unable to process correctly be moved to an “error” folder

The fields go as follows and some fields go by multiple names on the invoices

1.Description - issue - complaint - work requested

  1. Fleet ID (drop down menu or text search)- truck # tractor # trailer # vin # - unit (the unit numbers of the invoices will sometimes go by the last 4 numbers of the vin number, and the application unit numbers have a letter in front of them for some units, and they will not appear on the invoice that way so the field names won’t be 100% matching sometimes)

  2. Odometer - mileage

  3. Reference # work order # invoice #

  4. Date

  5. Category (is there a way to use intelligence to have the automation choose an option from a drop down menu that matches the work that was completed on the invoice?)

  6. Comments section (I want the shop info to appear here)

  7. Total price

  8. Work completed section - this will sometimes be shown as list items of a table, and sometimes will be from line text from technician comments on the invoice depending on the structure of the invoice and how it was made up

If someone could PLEASE help me all with this i will be FOREVER grateful, I have over 10,000 invoices to input and this is the only way i will ever get them caught up

@Ethan_Sloan

Welcome to the community

Did you try docoument understanding?

Of if you are on latest…you have generative extractors as well

Cheers

Hi @Ethan_Sloan , welcome to the community!

I am guessing the work order and invoice are generated by an ERP system and then you want to pull that information into Power BI Desktop, followed by publishing to the Power BI service?

All information you are looking for should be already available in the ERP database. You can connect PBI to the database, fetch the information dynamically using DirectQuery, edit the data using the PowerQuery editor, create the visuals and finally publish the file to PBI service.

Why I suggest this route? - you mentioned 6. Category is not available on the invoice. That information should be available in the work order table of the database.

You can loop through all invoice pdfs (For Each File in Folder), grab the work order number and invoice number, write to an Excel file and use that Excel file as a related table in the PBI desktop to filter your data imported from the ERP database.

If this is not possible, then you can look at other less reliable, less accurate options.

The invoices actually have all come from emails, from shops that we have used for on the road maintenance for our trucks and trailers. I have a document understanding project I set up and marked the key values to extract from each pdf which has a decent success rate of pulling the correct data, the issue I’m running into and questions that still remain, are

  1. how to use the document understanding model to loop through each pdf file 1 by 1, and after each data field that is extracted how I could then open the application ran on my desktop (pro transport) and have it then add a new truck maintenance, or trailer maintenance file, based off of the Tractor #, Last 4 digits of tractor VIN number, Trailer #, or last 4 numbers of vin number of the trailer that has been extracted off of the invoice. In the application you have to select whether the maintenance log will be for a truck or trailer, and I’m not sure how to get it to automate the creation of the correct format (truck / trailer) when creating the new maintenance file input for the new form to input the data into.

also another issue I’m having, is on some of the invoices there are structured tables that list what work was done, and what parts were used, on some the only explanation of work that was done comes as a line item of “technician notes” both of these could potentially be used in the work completed section of the application form i would be filling in but it is dependent on what information each one has. for instance, if the tech notes section just has info similar to “job completed” and not actually tells what was done, then i would want to use the table data to list the work done, but if there is no table data then i would want to use the technician comments section notating the work completed. i don’t know how to map both of those into that section, depending on which would have the correct information that was extracted about the job and have it choose the correct option automatically.

I don’t even need every automated, if i could just get the extraction model, to extract the correct fields into an excel worksheet and add a new row for each file so i can take the data from that and use the AI clipboard it would make my life easier, but also along with that i would need it to be able to properly choose from a drop down list that not all the matching unit number information would be readily available to choose from since the invoices would only have partial unit numbers on them compared to what the unit numbers show as in our system

THIS IS SUCH A HEAD ACHE. I can’t even begin to count the WEEKS and possibly months of time I’ve wasted on trying to get this to work to accomplish the tasks that I am needing to.

i tried attaching a video to give a better understanding of what I’m trying to accomplish but unfortunately was not able

Hi @Ethan_Sloan , this is definitely something you need to tackle batch by batch.

It seems you have more power in this process than you think, because you are the one paying the bills! You need to let the repair shops know the details you’d need on an invoice. Basically, give you the details or the invoice doesn’t get paid.

Identify a shop that contributes the lion’s share of the repair bills. Create a master data file (Excel) and enter the details of that shop - shop name, supplier code, address etc etc. Use "Digitize Document" activity to extract information from the pdf and “Classify” the document and identify the shop. Create a template for that shop to process their invoices. Rinse and repeat for all shops.

@Ethan_Sloan do you definitely need the data to be automatically inserted into your desktop app? or an excel/csv/sql/txt output would also be acceptable?

If you need to insert it inside the desktop app, could you maybe share the name and specs of this app?

Do you have a UiPath developer team, this shouldnt be too hard for them to help with?
It sounds like you aren’t sure how to build automations in general?

I do not have a developer team, i actually have the trial pro version. I have put together a couple simple automations in the past but nothing extensive like this. honestly if they wouldn’t go into the app that would be fine, because some of the information has changed, like some truck numbers now are in the system as the last 4 of the VIN number instead of the old truck number, and some trailer numbers don’t match what we have in the system also, with using the last 4 of the vin for them. it wouldn’t do a whole lot of good to be able to autopopulate into the actual app form itself, unless i could somehow make a prebuilt form, to include the drop down menus and button pushes and automate the selections, along with inputting the correct information and also be able to have it decipher between what unit ( truck or trailer) it should be making the new maintenance edit for, with also matching the unit number, or vin number on the invoice, to whatever unit it matches closest to in our list of trucks and trailers.

so what would honestly be the absolute best scenario is my eyes,

.1 extract the correct fields, for each form field with them being completely un structured, and going by a multitude of different field names or anchors… for work completed sometimes it would be through the lines listing all the parts and work, and other times it will be completely missing a table with the info, and you will have to go off of the tech comments, or spots of explanation throughout the document of what was done.

  1. auto populate the extracted information, either into an excel table based on the field, and create a new row for each pdf. or autopopulate it into the form itself, and automatically have this whole task completed( scanning file, extracting information, opening correct option for new maintenance (truck or trailer add new option) choose the corresponding unit in the application with the unit on the invoice that work was done on, enter correct information from extraction into correct fields in app form, and summarize it as much as possible. automate unchecking " send to driver deductions if check, uncheck sent to accounts payable if selected", choose correct date by either changing text or choosing from the drop down menu, entering odometer if available, reference number and amount also, and filling the comments box with the shop info, and work completed as either lines from a table, or comments from the technician.

  2. after that would be completed, the tab uptop, the save option would have to be hit, and then you would need to move to the files tab, and upload the pdf that the invoice was extracted from. save and close, then on to the next.

Sound like a headache? IT IS

i tried uploading a video to try and show you guys what im dealing with, but its way to large, not sure how i could get it posted either. but would def bring more light onto the situation at hand

My advice is to get one.
You are going to need a paid licence because the trial is just a trial and you cannot use a community licence for this.
It sounds like its out of your depth and I don’t think someone can give you the level of detail you require you need to do this yourself (plus your trial will still run out).

I’d also advise you to remove some of those screenshots as they contain customer data and contact info. Whilst not personal so I think technically not against any laws, if I were a customer and saw my work orders posted on forums I wouldnt be happy.

1 Like

I <cut> these screenshots @Ethan_Sloan. Feel free to repost with the data blurred out