Extract Text after underlined first word

Rupendankhara · May 31, 2019, 10:01pm

I want to create the process described below. Currently I am not able to separate the text elements, so all the text gets extracted instead of just one line.

1. Open the PDF (sample above).
2. Create an Excel file with the same file name as shown at the bottom of the page.
3. Extract specific information (sample data extraction file shown below):
   - Extract the bold text and put it in a column in Excel.
   - Extract all text after the underlined text and put it in separate rows under the bold text.
4. Perform the same task for all the files in the same folder.

packiaa · June 1, 2019, 1:51am

Is the pdf in image format?
Did you use Get PDF Text activity or Get Text activity?

Rupendankhara · June 1, 2019, 4:02am

@packiaa The file is not in image format. Yes, I have used the Get PDF activity, but it returns all the words. What I am trying to do is get a specific line. For example, in the image attached to my question, I am trying to extract all the text written after the words MATERIAL AND SELECTION. I am looking for a solution that lets me extract only the text I specify.

wasea · June 1, 2019, 10:32pm

@Rupendankhara, can you provide a copy of the pdf file? From my experience with pdf files and UiPath, using Acrobat Reader 2018 is better than using Acrobat Reader 2019.
Related to this article:

Rupendankhara · June 1, 2019, 10:46pm

Please see the image in the first post on this page. It's attached!

wasea · June 1, 2019, 11:00pm

With a PDF file from you as a test, I can try different settings (such as the ones in the link I shared in my first post) to see which method reads the file best.
Even if you can't read the file line by line, you can take all the text and apply some 'substring' operations to get the text and the line you want.

In short, as I understand your request, the workflow should be:

1. Read the PDF file and save the output in a variable.
2. Use an Assign activity to store the text you want to extract in a variable (using substring or RegEx).
3. Use Excel Application Scope with Write Range to put the data in the Excel file you want.
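
As a rough illustration of step 2, here is a minimal Python sketch of both the substring and the RegEx approach. The sample text and the exact layout after the heading are assumptions for illustration, not taken from the actual PDF; in UiPath the same idea would be written inside an Assign activity.

```python
import re

# Hypothetical text, standing in for the variable filled by the PDF read step.
pdf_text = """AUTO GALLERY
MATERIAL AND SELECTION: Polished concrete with sealer
WALLS: Painted drywall
"""

heading = "MATERIAL AND SELECTION"

# Substring approach: locate the heading, then take the rest of that line.
start = pdf_text.find(heading)
if start == -1:
    after_heading = ""
else:
    start += len(heading)
    end = pdf_text.find("\n", start)
    if end == -1:
        end = len(pdf_text)
    # Drop the colon and surrounding spaces that follow the heading.
    after_heading = pdf_text[start:end].lstrip(" :").strip()

# RegEx approach: capture everything after the heading up to the line end.
match = re.search(re.escape(heading) + r"\s*:?\s*(.*)", pdf_text)
after_heading_re = match.group(1).strip() if match else ""

print(after_heading)
print(after_heading_re)
```

Both variables end up holding only the text that follows the heading on that line, which is the value that would then go to Write Range.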

Rupendankhara · June 1, 2019, 11:10pm

Hey wasea,

Will you be able to do it from the attached PDF? It is similar for all the others. I want to extract the bold text, then extract the text after the underline.

I am new to UiPath and coding. Would you be able to help explain how Regex will help?

Thank you,

wasea · June 1, 2019, 11:14pm

Well, attached in the first post is a .jpg file, not a .pdf file.

Rupendankhara · June 1, 2019, 11:18pm

Let me send it to you then.

Rupendankhara · June 1, 2019, 11:25pm

Please use this for reference.

RTA - 001 AUTO GALLERY for blog.pdf (334.6 KB)

Rupendankhara · June 4, 2019, 1:50pm

@wasea any luck with a solution?

wasea · June 5, 2019, 10:17pm

Hi @Rupendankhara,

Please check this workflow: PDF.zip (341.4 KB)

It can certainly be optimized, but it is just one idea for how to deal with that PDF.

I used “For each” to get all the files from a specific folder.
I open each file one by one (Start Process)
I read the first and the last row (Last row is the ImageName or so)
Get Text activity to get Flooring data
Get Text to get Treads Data
Get Text to get Walls Data
Get Text to get Ceiling Data
Excel application scope to add data to Excel row by row (it can be optimized with DataTable for sure).
Kill Process to close the PDF.
There are 2 folders: 1 for Invoices/ 1 for Excel file.
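
To illustrate the DataTable remark above: instead of writing to Excel row by row, the values can be collected first and written in one pass. The Python sketch below shows the idea with CSV output; the file names and field values are made up, and in UiPath the equivalent would be a DataTable filled inside the loop and a single Write Range at the end.

```python
import csv
import io

# Hypothetical extracted values, one dict per PDF; in the workflow these
# would come from the Get Text steps above.
extracted = [
    {"File": "file_a.pdf", "Flooring": "Concrete", "Treads": "Oak",
     "Walls": "Drywall", "Ceiling": "Painted"},
    {"File": "file_b.pdf", "Flooring": "Tile", "Treads": "Steel",
     "Walls": "Brick", "Ceiling": "Exposed"},
]

columns = ["File", "Flooring", "Treads", "Walls", "Ceiling"]

# Build the whole table in memory, then write it in one pass.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
writer.writeheader()
writer.writerows(extracted)

csv_text = buffer.getvalue()
print(csv_text)
```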

It might work also with RegEx (https://regex101.com/) to extract specific words from the pdf, or using substring function over the entire file.

I hope it helps to give you some ideas.

Vasile.

Rupendankhara · June 6, 2019, 3:19am

@wasea I loved it. It extracts all the required fields, as I specified. Thank you so much for your help. I cannot thank you enough.

What do you use for extracting specific text? TREAD, WALLS, and CEILING can all be different words in the next file; for example, they could be COUNTERTOP, FLOOR, ROOF. The workflow you have given is very specific to this file.

Would it be possible to extract the data in the same way? The extraction criterion would be " : ": make the word in front of the " : " the column, and put the text under it after that.

The PDFs are ever changing. See the example below. It doesn't have the same headings (text).
pdf7-3.pdf (868 Bytes)

The only criteria here would be BOLD, CAPS, UNDERLINE and ":". I do not know how to go about it. Let me know if any part of this is unclear.

wasea · June 6, 2019, 12:27pm

Hi @Rupendankhara,

As you can see in the solution that I sent, I created a lot of variables in order to get the required text. You can change the variable names as you like.

Unfortunately, at this moment, I'm not aware of a way to extract only the BOLD or underlined words. For CAPS words, RegEx can be used to extract the data.
What I did are only some examples of how to extract data; you just need to extend them to get the data you require.
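
A possible starting point for the CAPS case, sketched in Python: treat any line that starts with an upper-case heading followed by ":" as a column name, and collect the lines under it as its values. The sample text and the exact heading pattern are assumptions; bold and underline are not visible in plain extracted text, so the sketch relies only on caps and the colon.

```python
import re

# Hypothetical page text; real headings and values will differ per file.
page_text = """WINE CELLAR
COUNTERTOP:
Honed granite
Eased edge
FLOOR:
Sealed slate tile
ROOF: Standing seam metal
"""

# Assumption: a heading is upper-case letters, digits, spaces and a few
# punctuation marks, followed by a colon.
HEADING = re.compile(r"^([A-Z][A-Z0-9 &/\-]*?)\s*:\s*(.*)$")

def parse_sections(text):
    """Map each 'HEADING:' to the list of text lines that follow it."""
    sections = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        m = HEADING.match(line)
        if m:
            current = m.group(1)
            sections[current] = []
            # Text on the same line as the heading also belongs to it.
            if m.group(2):
                sections[current].append(m.group(2))
        elif current is not None:
            sections[current].append(line)
    return sections

sections = parse_sections(page_text)
for heading, lines in sections.items():
    print(heading, "->", lines)
```

Lines before the first heading (such as the title) are ignored, and the headings are discovered from the text rather than hard-coded, so the same logic should carry over to files with different titles as long as the caps-plus-colon pattern holds.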

By the way, your pdf file “pdf7-3.pdf” appears to be empty.

Vasile.

Rupendankhara · June 6, 2019, 1:11pm

Hey, thank you for all you have done. Let me send you that file again; see below. The files will have different text and titles; the only things that stay uniform are the caps and the text after " : ". See if you can find a way to extract using those patterns. Thanks again.
RTA - 007 WINE CELLAR 008 RED WINE STORAGE 009 WHITE WINE STORAGE 010 WINE BUTLER-2.pdf (3.1 MB)

Rupendankhara · June 8, 2019, 4:22am

Hi @wasea,

Can you generate a search for this example using regex, with an expression like the one shown below? (image: REgex expression for Bold)
