Convert colored text in a PDF file to hyperlinks

Manikandan_Ckj · October 24, 2023, 8:48am

I have a PDF file in which colored text needs to be converted into hyperlinks.
Scenario:
A. Identify the blue text that contains the keyword "section" (the section reference is dynamic) and ignore any section keyword that is not in blue. The format of the section keyword is not consistent; sometimes it appears inside brackets.
B. In some cases a module keyword appears in front of the section keyword. We then have to work out which module the section belongs to and link only the text in blue.
C. In some cases the section keyword is not there at all.

How can this be achieved in UiPath?
Which activities can be used?

Expecting an early response.

@loginerror @Vibhor.Shrivastava @Palaniyappan @Lahiru.Fernando @mukeshkala @RAKESH_KUMAR_BEHERA

mukeshkala · October 24, 2023, 9:04am

Read the PDF and get the text.
You can then use regular expressions (regex) to identify URLs in the text.

Here’s a simple regex pattern that can help you match URLs:

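import re

text = "Here is a sample text with a URL: https://www.example.com and another one http://google.com"

# Match http:// or https:// URLs, or hosts starting with "www." (dot escaped so it matches literally)
url_pattern = r"https?://\S+|www\.\S+"
urls = re.findall(url_pattern, text)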

This regex pattern will match URLs that start with “http://” or “https://” and those that start with “www.”

You can modify the pattern to suit your specific requirements, but this should work for most common cases.
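
The same regex approach could be pointed at the section keywords from the question instead of URLs. The sketch below is only an illustration: the text formats it handles ("Section 4.2", "(Section 4.2)", "Module 3 Section 4.2") are assumptions about what the PDF contains, and `parse_section_ref` is a hypothetical helper name, not a UiPath or library function.

```python
import re

# Assumed formats: an optional "Module <id>" prefix, then "Section <number>",
# with the section part optionally wrapped in brackets.
SECTION_RE = re.compile(
    r"(?:module\s+(?P<module>[\w.]+)\s*[,:\-]?\s*)?"   # optional module prefix
    r"\(?\s*section\s+(?P<section>\d+(?:\.\d+)*)\s*\)?",  # section number, brackets optional
    re.IGNORECASE,
)


def parse_section_ref(text):
    """Return the module and section found in text, or None if there is no section keyword."""
    match = SECTION_RE.search(text)
    if match is None:
        return None
    return {"module": match.group("module"), "section": match.group("section")}


for sample in ["See (Section 4.2) for details", "Module 3 Section 4.2", "plain blue text"]:
    print(sample, "->", parse_section_ref(sample))
```

For scenario C, where the section keyword is missing, the helper returns None and the blue color alone would have to decide whether the text becomes a link.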

Manikandan_Ckj · December 22, 2023, 2:51pm

Solution of the Day:

import fitz  # PyMuPDF
from openpyxl import Workbook

def extract_blue_text(pdf_path):
    doc = fitz.open(pdf_path)
    blue_text = []

    for page_num in range(doc.page_count):
        page = doc.load_page(page_num)
        blocks = page.get_text("dict")["blocks"]

        for b in blocks:
            # Image blocks have no "lines" key, so skip them
            for l in b.get("lines", []):
                for s in l["spans"]:
                    color = s["color"]
                    # Span colors are sRGB integers; 255 == 0x0000FF, pure blue
                    if color == 255:
                        blue_text.append(s["text"])

    doc.close()
    return blue_text

def main():
    pdf_path = "color.pdf"  # replace with the PDF input file path
    blue_text = extract_blue_text(pdf_path)

    # Print each blue span for a quick check
    for text in blue_text:
        print(text)

    # Write the spans to an Excel workbook (the Output folder must already exist)
    pdf_output_path = "Output/bluetext.xlsx"
    wb = Workbook()
    ws = wb.active
    ws.append(["Extracted Blue Text"])  # Header
    for text in blue_text:
        ws.append([text])
    wb.save(pdf_output_path)

if __name__ == "__main__":
    main()
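
The check for exactly 255 only catches pure blue (0x0000FF). If the PDF uses a different shade, a tolerance test on the color components may work better. The sketch below is a rough illustration that runs on a hand-built dictionary shaped like the result of `page.get_text("dict")`; the thresholds `min_blue` and `margin`, the helper names, and the sample data are all assumptions, not values from the original PDF.

```python
def int_to_rgb(color):
    """Split an sRGB integer such as 0x0000FF into (r, g, b) components."""
    return (color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF


def is_blue(color, min_blue=150, margin=60):
    """Treat a color as blue when the blue channel is strong and clearly dominant.

    min_blue and margin are assumed thresholds; tune them for the actual document.
    """
    r, g, b = int_to_rgb(color)
    return b >= min_blue and b - max(r, g) >= margin


def blue_spans(page_dict):
    """Return (text, bbox) for every non-empty blue span in a get_text("dict") result."""
    found = []
    for block in page_dict.get("blocks", []):
        for line in block.get("lines", []):  # image blocks have no "lines"
            for span in line.get("spans", []):
                if is_blue(span["color"]) and span["text"].strip():
                    found.append((span["text"], tuple(span["bbox"])))
    return found


# Hypothetical sample data standing in for one page of a real PDF
sample_page = {
    "blocks": [
        {"type": 1},  # image block, ignored
        {"type": 0, "lines": [{"spans": [
            {"text": "See ", "color": 0x000000, "bbox": (40.0, 100.0, 72.0, 112.0)},
            {"text": "Section 4.2", "color": 0x1F4E9F, "bbox": (72.0, 100.0, 130.0, 112.0)},
        ]}]},
    ]
}

print(blue_spans(sample_page))
```

Keeping the bounding box next to the text matters for the hyperlink step: each rectangle marks where a link would need to be placed on the page, for example through PyMuPDF's `Page.insert_link`, which takes a link dictionary containing a "from" rectangle.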
