Is there any API that converts PDF purchase orders to an Excel file? Or any PDF-to-Excel converter software? The PDF format is a bit complicated, and the data is semi-structured or unstructured. It's not tabular data. Each line item has 5 columns: Number, Desc, Qty, UOM, Unit price. But in the PDF there are no clear lines or borders separating these columns. Also, a line item may span multiple lines: for example, the Desc column is 2 lines for the 1st line item and 3 lines for the 2nd line item.
I have tried the UiPath custom activity PDF to Excel with API Feat Systems - API function 'SautinSoft', but I am not getting the output in columns as desired.
I believe the best approach to this is string manipulation. I've worked with similar files, and the best solution I have found is to use the Split method of the String class to separate the different values and then assign them to columns according to their index. Unstructured PDFs are a pain to work with. I usually ask the buyers to provide a structured data format, like XML or JSON, which makes this much easier to do, and more accurate as well.
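To illustrate the split-and-index idea, here is a minimal sketch in Python (the same logic can be written with String.Split in a UiPath workflow). It is only an illustration under some assumptions that are not from the original post: the text has already been extracted from the PDF, the first row of each line item starts with the item number and ends with Qty, UOM and Unit price separated by whitespace, and any following rows that do not match that pattern are continuation lines of Desc. The function names `parse_line_items` and `to_csv` are hypothetical.

```python
import csv
import io

COLUMNS = ["Number", "Desc", "Qty", "UOM", "Unit price"]


def _is_number(token):
    """Return True if the token parses as a number (thousands commas allowed)."""
    try:
        float(token.replace(",", ""))
        return True
    except ValueError:
        return False


def parse_line_items(text):
    """Split extracted PO text into line items.

    Assumption: a row that starts a line item looks like
    '<number> <description words...> <qty> <uom> <unit price>'.
    Rows that do not match are treated as extra description lines
    belonging to the previous item.
    """
    items = []
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue  # skip blank rows
        starts_item = (
            len(tokens) >= 5
            and tokens[0].isdigit()
            and _is_number(tokens[-3])
            and _is_number(tokens[-1])
        )
        if starts_item:
            # Index from the right: the last three tokens are the fixed columns,
            # everything between the number and them is the description.
            items.append({
                "Number": tokens[0],
                "Desc": " ".join(tokens[1:-3]),
                "Qty": tokens[-3],
                "UOM": tokens[-2],
                "Unit price": tokens[-1],
            })
        elif items:
            # Continuation row: append to the previous item's description.
            items[-1]["Desc"] += " " + " ".join(tokens)
    return items


def to_csv(items):
    """Render the parsed items as CSV text, which Excel can open directly."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()
```

Indexing from the right is deliberate: the description has a variable number of words, while Qty, UOM and Unit price are always the last three values on the first row. A description continuation row that happens to look like a full item row would be misread, so real PO layouts will likely need extra checks.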
Thanks, rpa4, for the inputs.
We have one more requirement: another PO arrives as a scanned PDF, which needs to be converted to structured Excel output.
We have tried the Read PDF with OCR activity with both Microsoft OCR and Tesseract OCR, but the output text does not contain the values we expect; we need the output for the line item table in the PO.
I have attached a sample with line items.
Please support.
Scanned PDF.pdf (155.5 KB)