PDF to XML in UiPath?

Hi all,

Does a PDF that was converted from an Excel file still contain the XML and tags, given that it is a structured file?

If yes, how do I retrieve them from UiPath?

@Singh7633

-Open the PDF file in a text editor, such as Notepad or a dedicated PDF text viewer, and check if you can find any structured XML or tags.
-Use UiPath activities to extract text from the PDF and examine the content. For example, you can use the Read PDF Text activity to extract text from the PDF file.

Sequence:
Read PDF Text activity (output: pdfText)
Log Message activity (input: pdfText)

-If the PDF appears to be image-based (no selectable text), you may need to use OCR (Optical Character Recognition) to extract the text. UiPath offers OCR options such as the Read PDF With OCR activity, or screen scraping with the OCR method, that can help with this.
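
The first check above (looking for structure or tags in the raw file) can also be scripted. This is only a rough sketch in Python, not a UiPath activity: it scans the raw bytes for the markers a tagged PDF normally carries (`/StructTreeRoot` and `/MarkInfo`) and for an embedded XMP metadata packet. The function name and the sample bytes are made up for illustration. Note that a negative result is not conclusive, because PDF 1.5+ files may store these objects inside compressed object streams where a plain byte search cannot see them.

```python
import re


def find_structure_markers(pdf_bytes: bytes) -> dict:
    """Report which structure-related markers appear in the raw PDF bytes.

    Hypothetical helper for illustration; it does not parse the PDF.
    """
    markers = {
        "StructTreeRoot": rb"/StructTreeRoot\b",  # root of the tag tree in a tagged PDF
        "MarkInfo": rb"/MarkInfo\b",              # catalog entry that flags a tagged PDF
        "XMP metadata": rb"<x:xmpmeta",           # embedded XML metadata packet
    }
    return {
        name: re.search(pattern, pdf_bytes) is not None
        for name, pattern in markers.items()
    }


# Made-up fragment standing in for a real file's contents.
sample = (
    b"%PDF-1.7\n1 0 obj\n"
    b"<< /Type /Catalog /MarkInfo << /Marked true >> /StructTreeRoot 5 0 R >>\n"
    b"endobj\n"
)
print(find_structure_markers(sample))

# For a real file (path is a placeholder):
# with open("report.pdf", "rb") as f:
#     print(find_structure_markers(f.read()))
```

Even when these markers are present, they describe PDF structure elements (and document metadata), not the original Excel workbook XML.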

Cheers…!

Hi @Singh7633

Maybe this thread will help you.

Regards,

Thanks, I already tried deserializing it, but that returns an error on line 1.

I don’t need OCR, as the file is structured and I can read it correctly with Read PDF Text.

Now, since it is a table, I wanted to see if I could retrieve the XML and access the column contents via tags instead of regex.

Since it comes from Excel, shouldn’t the tags always be there?
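
A likely explanation for the line-1 error, sketched in Python rather than UiPath: text extracted from a PDF is plain text, not XML, so any XML parser rejects it at the very first line. The sample text and the helper name below are invented for illustration, and the assumption is that UiPath's XML deserialization fails for the same reason.

```python
import xml.etree.ElementTree as ET


def try_parse_xml(text: str):
    """Return None if text is well-formed XML, else the (line, column) of the error."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as err:
        return err.position  # tuple: (line, column)


# Made-up example of what a text extraction of a table might look like.
extracted = "Name Quantity Price\nWidget 2 9.99\n"
print(try_parse_xml(extracted))   # error position, on line 1

# Actual XML parses without error.
print(try_parse_xml("<row><name>Widget</name></row>"))
```

If that is what is happening, the tags are not being lost during reading: exporting to PDF produces page content rather than the workbook's XML, so there is no XML for the parser to find in the extracted text.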