Pdf to xml in UiPath?

Singh7633 · December 5, 2023, 10:21am

Hi all,

If a PDF has been converted from an Excel file, does it still contain the XML and tags, since it is a structured file?

If yes, how do I retrieve them from UiPath?

Dilli_Reddy · December 5, 2023, 1:10pm

- Open the PDF file in a text editor, such as Notepad or a dedicated PDF text viewer, and check if you can find any structured XML or tags.
- Use UiPath activities to extract text from the PDF and examine the content. For example, you can use the Read PDF Text activity to extract text from the PDF file.

Sequence:
Read PDF Text activity (output: pdfText)
Log Message activity (input: pdfText)

- If the PDF appears to be image-based (no selectable text), you may need to use OCR (Optical Character Recognition) to extract the text. UiPath options such as Screen Scraping or the Read PDF With OCR activity can help with this.
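
The first check above (looking inside the file for tags) can also be scripted. This is a minimal sketch in Python, not a UiPath activity. It assumes that a tagged PDF advertises its structure through the `/StructTreeRoot` and `/MarkInfo` entries of the document catalog; the function names are made up for illustration. Treat a negative result as inconclusive, because newer PDFs can store the catalog inside compressed object streams where a plain byte search will not see it.

```python
from pathlib import Path

# Catalog entries that usually indicate a Tagged PDF (assumption of this sketch).
TAG_MARKERS = (b"/StructTreeRoot", b"/MarkInfo")


def find_tag_markers(pdf_bytes: bytes) -> list[str]:
    """Return the tagged-PDF markers that are visible in the raw bytes."""
    return [m.decode("ascii") for m in TAG_MARKERS if m in pdf_bytes]


def looks_tagged(path: str) -> bool:
    """Heuristic check: True if the file at `path` shows any marker."""
    return bool(find_tag_markers(Path(path).read_bytes()))
```

For example, `looks_tagged("report.pdf")` (a hypothetical file name) returns `True` when either marker is found. Note that even in a tagged PDF the structure is stored as PDF objects rather than as an XML document, so the raw file is not something an XML parser can read directly.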

Cheers…!

Parvathy · December 5, 2023, 1:15pm

Hi @Singh7633

Maybe this thread will help you.

Regards,

Singh7633 · December 5, 2023, 2:02pm

Thanks, I already tried deserializing it, but that returns an error on line 1.

I don’t need OCR, as the file is structured and I can read it correctly with Read PDF Text.

Since the content is a table, I wanted to see if I could retrieve the XML and access the column contents via tags instead of regex.

Since the file comes from Excel, shouldn’t the tags always be there?
