How are you?
I have a question, as I am trying to automate some CSR work.
I need to extract details such as Item Description, Quantity and Price from a PDF file. Can you please advise how Intelligent OCR can help with this? I am trying to use it, but I have not been able to build the workflow.
If you have any other suggestions on how to extract these details, please let me know.
Below is an example of how the data appears in the PDF; the layout may vary.
I think you can use Intelligent OCR for this. It includes Position Based Extractors, which you can easily configure to capture data in grids if the document format is always the same.
First, define the document type and its fields using the Taxonomy Manager, and classify the document. Next, use the Position Based Extractor if the structure is always the same. You can also try the Machine Learning Extractor to see how well it extracts your fields…
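To make the idea behind a position-based extractor concrete, here is a minimal Python sketch. It is not UiPath's implementation: it assumes the OCR step has already produced words with bounding boxes expressed as fractions of the page size, and that a hypothetical template maps each field to a fixed rectangle. The `Word` class, `TEMPLATE` and `extract_by_position` are illustrative names only.

```python
from dataclasses import dataclass

@dataclass
class Word:
    """One OCR word; coordinates are fractions of page width/height (assumed)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

# Hypothetical template: field name -> (x0, y0, x1, y1) rectangle on the page.
# It is only valid while every document shares the same layout.
TEMPLATE = {
    "po_number": (0.60, 0.05, 0.95, 0.10),
    "date": (0.60, 0.10, 0.95, 0.15),
}

def in_region(word, region):
    """True if the centre of the word lies inside the rectangle."""
    x0, y0, x1, y1 = region
    cx = word.x + word.w / 2
    cy = word.y + word.h / 2
    return x0 <= cx <= x1 and y0 <= cy <= y1

def extract_by_position(words, template):
    """Collect, for each field, the words whose centres fall inside its rectangle."""
    result = {}
    for field, region in template.items():
        hits = [wd for wd in words if in_region(wd, region)]
        # Reading order: top to bottom, then left to right.
        hits.sort(key=lambda wd: (round(wd.y, 2), wd.x))
        result[field] = " ".join(wd.text for wd in hits)
    return result

# Hypothetical OCR output for one page.
words = [
    Word("Invoice", 0.05, 0.05, 0.10, 0.02),
    Word("PO-4711", 0.70, 0.06, 0.10, 0.02),
    Word("14/05/2019", 0.70, 0.11, 0.12, 0.02),
]
print(extract_by_position(words, TEMPLATE))
```

This also shows why the approach only suits a fixed layout: if a field moves outside its rectangle on another PDF, it is simply not found.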
For how to build the workflow, you can refer to these sample workflows and create your own…
@Lahiru.Fernando
Thanks. I have two questions:
1- How do I use the Taxonomy Manager? I have read about it in the portal, but I am not clear on what needs to be done after it creates the JSON file. Can you please explain in detail?
2- If extraction is position based, it might be a problem, because different PDFs may have different formats. In that case, how will it pick up the data?
You can get a much better understanding of how to use Intelligent OCR and build workflows if you follow the 2019.04 updates course in the UiPath Academy. It explains well how to use the Taxonomy Manager and how to build Intelligent OCR workflows…
For your second question: if the position changes from time to time along with the format, I would suggest trying the Machine Learning Extractor, the Regex Based Extractor, or a combination of these…
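For the Regex Based Extractor route, the sketch below shows the underlying idea in plain Python with the standard `re` module; it is not the UiPath activity itself. It assumes the PDF text has already been extracted, that the PO number and date follow labels like `PO Number:` and `Date:`, and that each table row ends with an integer quantity and a price with two decimals. The patterns, the `SAMPLE` text and `extract_fields` are hypothetical and would need adjusting to real documents.

```python
import re

# Hypothetical patterns: adjust labels and number formats to your own PDFs.
PO_RE = re.compile(r"\bPO\s*(?:No\.?|Number)\s*[:#]?\s*(\S+)", re.IGNORECASE)
DATE_RE = re.compile(r"\bDate\s*:?\s*(\d{1,2}/\d{1,2}/\d{4})", re.IGNORECASE)
# One table row on a single line: description, integer quantity, price with two decimals.
# [ \t]+ (not \s+) keeps a match from spilling across line breaks.
LINE_RE = re.compile(
    r"^(?P<desc>.+?)[ \t]+(?P<qty>\d+)[ \t]+(?P<price>\d+\.\d{2})$", re.MULTILINE
)

def extract_fields(text):
    """Return the PO number, date and table rows found in the extracted text."""
    po = PO_RE.search(text)
    date = DATE_RE.search(text)
    items = [
        {"description": m["desc"], "quantity": int(m["qty"]), "price": float(m["price"])}
        for m in LINE_RE.finditer(text)
    ]
    return {
        "po_number": po.group(1) if po else None,
        "date": date.group(1) if date else None,
        "items": items,
    }

# Hypothetical text as it might come out of the PDF.
SAMPLE = "\n".join([
    "PO Number: 4500012345",
    "Date: 14/05/2019",
    "Item Description Quantity Price",
    "Steel bolts M8 100 12.50",
    "Washers 200 3.20",
    "Total 15.70",
])
print(extract_fields(SAMPLE))
```

Because the patterns key on labels and row shape rather than page coordinates, they tolerate layout shifts better than a fixed template, but rows that wrap onto two lines or prices with thousands separators would need extra handling.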
Go through the course first to get a better understanding…
@Lahiru.Fernando
This course is really helpful, but I cannot work out how many times we need to define the data for the extractor before it picks it up, i.e. how this can be resolved. Also, the output does not contain the data I selected in the PDF.
Kindly let me know if my question is not clear.
The Machine Learning Extractor uses a pre-trained model that does not adjust to the manual corrections submitted by users, so at the moment it does not learn from them.
Thanks for the response.
Could you please advise on my question: how should I proceed with this?
I need to extract the PO number, the date and the data in the table. How can I extract all of this data into an Excel sheet?
Without a few real-life documents, I cannot propose a solution. Try looking into the Regex Based Extractor and the Position Based Extractor if you have just a few variations of where or how the data appears.
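Once the fields are extracted, getting them into a spreadsheet is mostly a matter of flattening the header fields and the table into rows. As a swapped-in illustration (not the UiPath Excel activities), this minimal Python sketch writes CSV, which Excel opens directly. It assumes the extraction step returns a PO number, a date and a list of item dictionaries; `to_rows`, `write_csv` and the sample values are hypothetical.

```python
import csv
import io

def to_rows(po_number, date, items):
    """Flatten header fields plus table items into one row per line item."""
    rows = [["PO Number", "Date", "Item Description", "Quantity", "Price"]]
    for item in items:
        # Repeat the PO number and date on every row so each row stands alone.
        rows.append([po_number, date, item["description"], item["quantity"], item["price"]])
    return rows

def write_csv(rows, stream):
    """Write all rows to a text stream in CSV format."""
    csv.writer(stream).writerows(rows)

# Hypothetical output of the extraction step.
items = [{"description": "Steel bolts M8", "quantity": 100, "price": 12.5}]
buf = io.StringIO()
write_csv(to_rows("4500012345", "14/05/2019", items), buf)
print(buf.getvalue())
```

To write a real file instead of an in-memory buffer, pass a stream opened with `open("po_export.csv", "w", newline="")`; the `newline=""` argument avoids blank lines between rows on Windows.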
Hi Ioana,
I am facing an issue with the ML Extractor: it is able to fetch all the details of an invoice, but it cannot fetch the invoice quantity details, which are in a table.
Initially I thought it could be an issue with the PDF, but it does not work for any PDF.
Is this a known issue?