We use PDF versions of CAD drawings. Generally, a relief valve has a specific symbol in the drawings and is denoted by a tag number (e.g., PSV-XXXX). Each relief valve usually has data located near its symbol (e.g., set pressure, valve size). Is there a way to automatically search through several pages of PDFs and return each relief valve number with its associated data? When I do this manually, I usually do a text search for “PSV”. That search sometimes returns a relief valve (which I want) and sometimes a pipeline associated with the valve (which I don’t want).
I am not sure whether this will work for CAD drawing PDFs, but UiPath can extract text from a PDF. Try scraping the data with the Read PDF with OCR activity, with the Tesseract OCR engine placed inside it. The output of Read PDF with OCR is a String variable. Use the Write Text File activity to write that variable to a text file that you can open in Notepad.
Then check whether the text was extracted properly.
If it was, we can use regular expressions to get the required data from the extracted text.
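To give an idea of what the regular expression step could look like, here is a rough sketch in Python rather than UiPath. It assumes the text has already been extracted into a string. The tag format (PSV- followed by digits), the pipeline line-number format (2"-PSV-1001-A1), and the "SET ... PSIG" annotation are all assumptions made for illustration, so adjust the patterns to whatever your drawings actually contain. The function name find_relief_valves and the sample text are hypothetical.

```python
import re

# Assumed tag format: "PSV-" followed by digits, standing on its own.
# The lookarounds reject tags embedded in a longer hyphenated string,
# such as a pipeline line number (hypothetical example: 2"-PSV-1001-A1).
TAG_RE = re.compile(r'(?<![\w"\-])PSV-(\d+)(?![\w\-])')

# Assumed annotation format for set pressure, e.g. "SET 150 PSIG" or "SET @ 75 PSIG".
SET_RE = re.compile(
    r'SET\s*(?:PRESSURE)?\s*[@:=]?\s*(\d+(?:\.\d+)?)\s*PSIG',
    re.IGNORECASE,
)


def find_relief_valves(text, window=80):
    """Return standalone PSV tags with the first set pressure found
    in the `window` characters that follow each tag (None if absent)."""
    results = []
    for match in TAG_RE.finditer(text):
        nearby = text[match.end():match.end() + window]
        pressure = SET_RE.search(nearby)
        results.append({
            "tag": match.group(0),
            "set_pressure_psig": float(pressure.group(1)) if pressure else None,
        })
    return results


# Made-up sample of extracted text, for illustration only.
sample = (
    'PSV-1001 SET 150 PSIG 1.5"x2"\n'
    '2"-PSV-1001-A1 TO FLARE\n'
    'PSV-2002\n'
    'SET @ 75 PSIG'
)

for valve in find_relief_valves(sample):
    print(valve)
```

Keep in mind that "nearby" here means nearby in the extracted text, which may not match what is nearby on the drawing. If a valve has no data after it, the search window can also pick up the next valve's set pressure, so spot-check the results against a few drawings.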
Hope it helps!!