Extract table structure from PDF

tsc1989 · October 13, 2019, 2:00am

Hi, I’m having trouble extracting data from a PDF table. I tried the suggestions posted here but couldn’t solve the issue. Need help!

My data is confidential, so I’ll explain using the sample file from another thread: https://global.discourse-cdn.com/uipath/original/3X/0/8/08d920acd8924b1c5153f06859df13f22f60cb3b.pdf (see the table on page 2). My PDF table is similar, with a few more columns. Using “Read PDF Text” I get a string in which rows are separated by newlines and columns are separated by spaces. I can use Split String to separate the rows on the newline, but I can’t separate the columns because there is no pattern to split them on. In the table above, imagine that some driver names are 2 words long while others are 3 or 4, and that car names follow no fixed pattern either. Now imagine another column called “Team” whose entries are also 2 to 4 words long. Since a space appears both between columns and inside values, I think I need some other way to split the columns to get the data into a DataTable.
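To make the problem concrete, here is a small Python sketch (Python is used only for illustration; the thread itself works in UiPath/VB.NET). The rows are invented, not copied from the sample PDF. Splitting on every single space yields a different number of fields per row, so the fields can’t be mapped back to the three columns.

```python
# Hypothetical rows in the shape described above: position, driver, car.
# The values are invented for illustration, not copied from the sample PDF.
rows = [
    "1 John Smith Alpha GT",
    "2 Anna Maria de Souza Beta Sport Turbo",
]
EXPECTED_COLUMNS = 3  # position, driver, car

# Naive approach: treat every single space as a column separator.
field_counts = [len(row.split(" ")) for row in rows]

for row, count in zip(rows, field_counts):
    print(f"{count} fields (expected {EXPECTED_COLUMNS}): {row.split(' ')}")
```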

I also tried getting the robot to open the PDF and scrape the data, but Data Scraping gives me weird output. I’m thinking of using a loop with Get Text, changing the selector so it points to different cells of the PDF. However, in UI Explorer I can’t find any attributes that point to individual cells of the PDF table, and I don’t know how to explore the PDF structure in more detail.

Please help! Thank you

sushildarveshi · October 17, 2019, 1:54am

I am new to UiPath. After trying multiple ways, I believe the logic below will work. I haven’t tested it completely yet; it’s still in progress. It’s a very crude approach, though.

1. Get PDF Text
2. Write the text to a .txt file
3. Open the .txt file
4. Copy the table (not sure how to identify the start and end of the table)
5. Paste it into Excel
6. In Excel, on the Data tab, use Text to Columns with “Fixed width” as the option
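Step 6 can only work if every column starts at the same character position on every line. A minimal Python sketch of what a fixed-width split does, assuming perfectly aligned text; the sample lines and the STARTS offsets are hypothetical:

```python
def split_fixed_width(line, starts):
    """Cut `line` at the zero-based column start offsets in `starts`,
    like Excel's Text to Columns > Fixed width, and trim each piece."""
    cuts = [0, *starts, len(line)]
    return [line[a:b].strip() for a, b in zip(cuts, cuts[1:])]


# Hypothetical, perfectly aligned lines: position in 4 chars, driver in 24, then car.
lines = [
    f"{'1':<4}{'John Smith':<24}Alpha GT",
    f"{'2':<4}{'Anna Maria de Souza':<24}Beta Sport Turbo",
]
STARTS = [4, 28]  # assumed column start offsets, read off the aligned text

table = [split_fixed_width(line, STARTS) for line in lines]
for row in table:
    print(row)
```

If the text is not aligned like this, the same offsets cut through the middle of names, which is why this step depends on how the PDF text was extracted.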

Please share if you get something working that is easier.

tsc1989 · October 20, 2019, 1:04am

Hi @sushildarveshi, thanks for the suggestions, but unfortunately it doesn’t work.

Let me explain what happens; maybe then you or someone else can help me more easily.

I followed your sequence:
1. Get PDF Text: OK
2. Write text to txt file: OK, but the column structure is already gone (screenshot below)
3. Open txt file: tried this manually
4. Copy table: done manually to test
5. Paste in Excel: done, but everything gets pasted into the first column
6. Fixed width: unable to split into columns (screenshot below)
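One likely explanation for step 6 failing (an assumption, since the screenshots are not available): Read PDF Text separates columns with a single space, so no character position is blank on every line and a fixed-width split has nowhere to cut. The Python sketch below is my own heuristic, not a UiPath or Excel feature: it looks for shared blank “gutters” of at least `min_gap` spaces. The sample lines are invented.

```python
def column_starts(lines, min_gap=2):
    """Guess fixed-width column start offsets from shared blank "gutters".

    A gutter is a run of at least `min_gap` character positions that are a
    space on every line. Each returned offset is the first position after a
    gutter. An empty list means the text has no shared blank column, so a
    fixed-width split has nowhere to cut."""
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    blank = [all(row[i] == " " for row in padded) for i in range(width)]

    starts, run = [], 0
    for i, is_blank in enumerate(blank):
        if is_blank:
            run += 1
            continue
        if run >= min_gap and i - run > 0:  # skip blanks at the very start
            starts.append(i)
        run = 0
    return starts


# Hypothetical text laid out in aligned columns (what fixed width needs)
aligned = [
    f"{'1':<4}{'John Smith':<24}Alpha GT",
    f"{'2':<4}{'Anna Maria de Souza':<24}Beta Sport Turbo",
]
# Hypothetical text where columns are separated by a single space
single_spaced = [
    "1 John Smith Alpha GT",
    "2 Anna Maria de Souza Beta Sport Turbo",
]

print(column_starts(aligned))        # offsets where columns begin
print(column_starts(single_spaced))  # [] -> nothing for fixed width to cut on
```

Requiring at least two blanks avoids treating the single space inside “John Smith” as a column break when it happens to line up with a space in another row.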

Screenshot after writing to the text file (some info censored because it is confidential):

Screenshot when trying to split by fixed width:

Any other suggestions, please?

sushildarveshi · October 20, 2019, 2:26am

I was facing the same problem, but I got it working, though not the way I described in my earlier post.
Here is what I am doing.

Read the PDF text and store it in a string variable, say output.
To extract, say, the PO number from the text:

(output.Split({"P/O Number :"}, StringSplitOptions.None)(1).Trim).Split({Environment.NewLine}, StringSplitOptions.None)(0).Trim

Here “P/O Number :” is the label in your PDF that identifies the value.
The final index 0 (after the split on Environment.NewLine) ensures the text is taken from the same line as P/O Number. If you replace this 0 with 1, it extracts the next line instead.
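For readers outside UiPath, the same extraction can be sketched in Python. This is only a translation of the VB.NET expression above; value_after_label and the sample text are hypothetical. splitlines() handles both Windows "\r\n" (what Environment.NewLine is on Windows) and "\n" line endings.

```python
def value_after_label(text, label, line_offset=0):
    """Sketch of the VB.NET expression above:
    text.Split({label})(1).Trim.Split({NewLine})(line_offset).Trim
    line_offset=0 -> same line as the label, 1 -> the next line."""
    # Split(...)(1): the text between the first and second occurrence of the label
    after_label = text.split(label)[1].strip()
    # Split on line breaks and pick the requested line
    return after_label.splitlines()[line_offset].strip()


# Hypothetical text as returned by Read PDF Text
sample = "Invoice\r\nP/O Number : 4500012345\r\nVendor : ACME Ltd\r\n"

print(value_after_label(sample, "P/O Number :"))     # same line
print(value_after_label(sample, "P/O Number :", 1))  # next line
```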
If you want to read multiple lines one by one, replace 0 with a variable v and use a Do While loop that increases v by 1 on each pass.
To end the loop, check for some identifier: my table had “****” at the end, so I used that to detect where the table ended.
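The loop described above, sketched in Python under the assumption that the table starts right after a known label (the hypothetical “Item Details” here) and ends at a line containing “****”. The guard on the line count is an addition so the loop also stops if the marker is missing.

```python
def table_lines(text, start_label, end_marker="****"):
    """Collect the lines after `start_label` until a line containing `end_marker`.

    Mirrors the Do While idea: read line v, then v + 1, ... and stop at the
    end marker (or when the text runs out, so the loop cannot run forever)."""
    lines = text.split(start_label)[1].strip().splitlines()
    rows = []
    v = 0
    while v < len(lines) and end_marker not in lines[v]:
        rows.append(lines[v].strip())
        v += 1
    return rows


# Hypothetical PDF text: table rows between a label and a "****" marker
sample = "Header\nItem Details\n10 Widget 5\n20 Gadget 3\n****\nFooter\n"
print(table_lines(sample, "Item Details"))
```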

Once each line is extracted, you can take part of it with Substring, e.g. Substring(5, 3), where 5 is the zero-based start index (so the sixth character) and 3 is the number of characters.
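.NET’s Substring(startIndex, length) counts from 0, which is easy to get wrong. A small Python sketch of the same semantics (the helper name substring is hypothetical):

```python
def substring(s, start, length):
    """Python equivalent of .NET String.Substring(startIndex, length).

    `start` is zero-based, so substring(s, 5, 3) returns the 6th, 7th and 8th
    characters. Like .NET (which throws ArgumentOutOfRangeException), it
    refuses ranges that fall outside the string instead of silently truncating."""
    if start < 0 or length < 0 or start + length > len(s):
        raise ValueError("start/length outside the string")
    return s[start:start + length]


print(substring("ABCDEFGHIJ", 5, 3))  # FGH
```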

Hope this helps.

sushildarveshi · October 20, 2019, 2:36am

But since the car and driver values can be multiple words, I don’t know how to identify which text belongs to which column. Please post the solution if you solve it; I need it too. If I figure it out, I’ll post the solution as well.
