Hi,
I need help with the PDF activities. I want to extract the hyperlinks attached to some text and images. I have tried multiple times, but the PDF activities only extract the visible text. Please help if anyone has faced the same issue.
I have seen someone on the forum copy the file (PDF to text) and then search for hyperlinks, but it didn't work for me.
First, use UiPath’s OCR capabilities (e.g., the “Read PDF with OCR” activity) to extract text from the PDF. Make sure to select an OCR engine that provides the best results for your specific PDF, as OCR accuracy can vary.
Find Hyperlinks in Extracted Text: After extracting the text, you can use regular expressions or string manipulation to search for patterns that represent hyperlinks.
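To make the regex step concrete, here is a minimal sketch in Python (used only for illustration, it is not a UiPath activity). The pattern and the `extract_urls` helper are my own assumptions, not something from UiPath; the same pattern idea should carry over to whatever regex step you use in the workflow. It only finds URLs that appear as visible text.

```python
import re

# Assumed pattern: anything starting with http://, https:// or www.
# up to the next whitespace, quote, or closing bracket.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>\"')\]]+", re.IGNORECASE)


def extract_urls(text: str) -> list[str]:
    """Return the unique URLs found in the text, in order of appearance."""
    urls = []
    for match in URL_PATTERN.finditer(text):
        # Drop sentence punctuation that got glued to the end of the URL.
        url = match.group(0).rstrip(".,;:!?")
        if url not in urls:
            urls.append(url)
    return urls
```

For example, calling `extract_urls` on the string returned by the PDF read step gives a de-duplicated list of links that you can loop over.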
I assume you need to extract only the hyperlinks from the PDF. If so, you can convert the PDF data into JSON and traverse each node that needs to be extracted from the PDF.
Thank you, everyone, for the prompt replies. Actually, I can't use any website to convert to another format such as JSON, as proposed by one of the members, due to client limitations. Otherwise, I will try the OCR engines. Hopefully this helps.
Actually, this is the second part of the process, in which we use regex to extract hyperlinks from the text. When reading the PDF file, it only extracts the visible text, not the hyperlinks on the images in the PDF.
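One possible explanation: links on images are normally stored in the PDF as link annotations with a `/URI` action, not as page text, so neither text extraction nor OCR will see them. Below is a rough Python sketch of the "copy PDF to text and search" idea mentioned earlier in the thread. It runs locally, so no website is involved. The `find_uri_actions` name is hypothetical, and the approach assumes the annotation objects are stored uncompressed in the file; if the PDF keeps them in compressed object streams, this scan will return nothing and a proper PDF library would be needed.

```python
import re

# Link annotations usually carry an action like: /A << /S /URI /URI (https://...) >>
# This pattern captures the parenthesised string after the /URI key.
URI_ENTRY = re.compile(rb"/URI\s*\(((?:\\.|[^\\()])*)\)")


def find_uri_actions(pdf_bytes: bytes) -> list[str]:
    """Scan raw PDF bytes for /URI entries (uncompressed objects only)."""
    links = []
    for match in URI_ENTRY.finditer(pdf_bytes):
        raw = match.group(1)
        # Undo the backslash escapes PDF uses for ( ) and \ inside strings.
        raw = re.sub(rb"\\([()\\])", rb"\1", raw)
        links.append(raw.decode("latin-1"))
    return links
```

To try it, read the file in binary mode, for example `find_uri_actions(open(path, "rb").read())` with `path` pointing at your PDF, and check whether the image links show up.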