I want to create a process as shown below. Currently I am not able to separate the text elements, as shown below. Therefore all the text gets extracted instead of just one line.
@packiaa The file is not in image format. Yes, I have used the Get PDF activity, but it gets all the words. What I am trying to do is get a specific line. For example, if you look at the image attached to my question, I am trying to extract all the text written after the words MATERIAL AND SELECTION. I would like a solution that lets me get only the text I mentioned.
@Rupendankhara, can you provide a copy of the PDF file? From my experience with PDF files and UiPath, using Acrobat Reader 2018 works better than using Acrobat Reader 2019.
Related to this article:
Using a pdf file from you as a test, I can try to use different settings (as the one in the link that I shared in my first post) to see which is the best method to read the file.
Even if you can’t take it line by line, you can take all the text and apply some ‘substring’ operations to get the text and the line you want.
In short, as I understand your request, I believe the workflow should be like:
Read the PDF file and save the output in a variable.
Use an Assign activity to store the text you want to extract in a variable (using substring or RegEx).
Use Excel Application Scope with Write Range to put the data in the Excel file you want.
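To make the substring/RegEx step concrete, here is a minimal sketch in Python rather than a UiPath workflow. It assumes the PDF text has already been read into a string, and the helper names `text_after_marker` and `to_csv`, as well as the sample text, are made up for illustration. It writes CSV text instead of an Excel range, so treat it as an outline of the logic only.

```python
import csv
import io
import re


def text_after_marker(full_text: str, marker: str) -> str:
    """Return everything after the first occurrence of marker, trimmed."""
    match = re.search(re.escape(marker), full_text)
    if match is None:
        return ""  # marker not found in this document
    return full_text[match.end():].strip()


def to_csv(rows) -> str:
    """Serialize rows to CSV text (stand-in for writing a spreadsheet range)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical text, standing in for the output of the PDF read step.
sample = "PROJECT NOTES\nMATERIAL AND SELECTION\nTREAD: Oak\nWALLS: Painted drywall\n"

extracted = text_after_marker(sample, "MATERIAL AND SELECTION")
print(extracted)
print(to_csv([["Section", "Text"], ["MATERIAL AND SELECTION", extracted]]))
```

The same idea should carry over to an Assign activity: find the marker, keep what follows it, then pass the result to the write step.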
@wasea I loved it. It extracts all the required fields, as I specified. Thank you so much for your help. I can’t thank you enough.
What do you use for extracting specific text? TREAD, WALLS, and CEILING can all be different words in the next file; for example, they could be COUNTERTOP, FLOOR, ROOF. The code you have given is very specific to this file.
Would it be possible to extract the data in the same way then? The extraction criterion would be " : ": make the word in front of the " : " the column, and the text under it the value.
The PDFs are ever-changing. See the example below. It doesn’t have the same headings (text). pdf7-3.pdf (868 Bytes)
The only criteria here would be BOLD, CAPS, UNDERLINE and " : ". I would not know how to go about it. Let me know if you don’t understand any part of it.
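A rough sketch of that " : " idea, written in Python for illustration (it is not a UiPath workflow). It assumes headings are all-caps words followed by a colon, and that the value is either on the same line or on the lines underneath. The function name `parse_sections`, the pattern, and the sample text are assumptions, not something taken from the actual PDFs.

```python
import re

# Assumed heading shape: an all-caps label, optional spaces, then a colon.
HEADING = re.compile(r"^\s*([A-Z][A-Z0-9 /&-]*?)\s*:\s*(.*)$")


def parse_sections(text: str) -> dict:
    """Map each 'HEADING:' to the text on the same line and the lines below it."""
    sections = {}
    current = None
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            current = match.group(1)
            # Keep any text that follows the colon on the heading line.
            sections[current] = [match.group(2)] if match.group(2) else []
        elif current is not None and line.strip():
            # A non-heading line belongs to the most recent heading.
            sections[current].append(line.strip())
    return {name: " ".join(parts) for name, parts in sections.items()}


# Hypothetical text with headings that differ from the first file.
sample = "COUNTERTOP:\nGranite, polished\nFLOOR: Tile\nROOF :\nMetal\nstanding seam\n"
result = parse_sections(sample)
print(result)
```

Because nothing is keyed to specific words like TREAD or WALLS, the dictionary keys can become the column names regardless of which headings a given file uses.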
As you can see in the solution that I’ve sent, I’ve created a lot of variables in order to get the required text. You can change the variable names, as you want.
Unfortunately, at this moment, I’m not aware of how to extract only the BOLD or Underline words. For CAPS words, REGEX can be used to extract the data.
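For the CAPS case, a minimal sketch of such a pattern, shown in Python for illustration. It assumes a "CAPS word" means two or more consecutive uppercase letters; the name `caps_words` and the sample string are hypothetical. Bold and underline are not visible in plain extracted text, so this only covers the capitalization criterion.

```python
import re

# Two or more uppercase letters bounded by word boundaries.
CAPS_WORD = re.compile(r"\b[A-Z]{2,}\b")


def caps_words(text: str) -> list:
    """Return every all-caps word found in text, in order of appearance."""
    return CAPS_WORD.findall(text)


found = caps_words("TREAD: Oak, WALLS: Paint, A ceiling")
print(found)
```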
What I did are only some examples of how to extract data; you just need to enhance them to get the required data.
By the way, your pdf file “pdf7-3.pdf” appears to be empty.