Urgent: scraping tables from PDFs

Hello everyone. I have a data table, and I want to pass variables into one of its columns so that I can apply a changeable regex and extract the values.
Note: does anyone have a solution for unstructured PDFs? The data inside the table is not stable, so I had to fall back on regex, but I find the changing data hard to scrape because it is dynamic rather than static: the number of items is not constant.
I want to scrape the table into a CSV file. JoeyTribbiani_01102020_281092.pdf (25.7 KB) MonicaGeller_09052020_87654.pdf (15.9 KB)


This can be done using regex. Do as follows:

  1. Read the file using the PDF to text activity (let's say the output is mypdf).
  2. Remove the header details, since we are only focusing on the table. To remove them, use this pattern: "(.INVOICE.\sInvoice.\sInvoice.\sDue.*)"
  3. To extract the table we will use a simple method: build a CSV string.
  4. Identify the table headers. Use an Assign activity with mypdf = Regex.Replace(mypdf,"(?<=ID)(\s{1})|(?<=DESCRIPTION)(\s{1})|(?<=QTY)(\s{1})|(?<=PRICE)(\s{1})","!")
  5. We will use "!" as our delimiter.
  6. Identify the ID column. Use an Assign activity with mypdf = Regex.Replace(mypdf,"(?<=^\d{2})(\s{1})","!")
  7. mypdf = Regex.Replace(mypdf,"(?<=\d{1,}.\d{2})(\b \b)|(\b \b)(?=\d{1,}.\d{2})","!") This identifies the Total and Price columns.
  8. mypdf = Regex.Replace(mypdf,"((\s{1})(?=\d{1,}!.\d{1,}..\d{2}))","!") This identifies the Item column.
  9. mypdf = Regex.Replace(mypdf,"(\s)(?=[^A-Z]{3})","") This identifies line breaks, removes them, and combines the wrapped text into a single line.
  10. Now use the Generate Data Table activity: pass in the CSV string you created, use the first record as the header, and set the delimiter chosen in step 5.

For this entire flow you need String Split and Regex Replace. To build it, use the Assign and Generate Data Table activities.

Please test it on at least 30 invoices and harden the patterns. The patterns above are not well tested, but it can be done using regex.

Finally, I highly recommend using the UiPath Document Understanding framework. It's a lot easier.
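If it helps to see the delimiter idea outside of UiPath, here is a rough Python sketch of the same approach: insert "!" between columns with regex, then parse the result as delimited text and check the column counts. The sample text, the column layout, and the function names `to_delimited`, `parse_table` and `bad_rows` are all made up for illustration, and the patterns are simplified versions rather than the exact .NET patterns from the steps above, so treat it as a starting point and not a tested solution for the attached invoices.

```python
import csv
import io
import re

# Hypothetical text standing in for the PDF-to-text output.
# Real invoices will be laid out differently.
sample = (
    "ID DESCRIPTION QTY PRICE TOTAL\n"
    "01 Blue widget 2 10.00 20.00\n"
    "02 Large red widget 1 5.50 5.50\n"
)

def to_delimited(text, sep="!"):
    """Insert `sep` between columns (assumed layout: ID, text, qty, price, total)."""
    # Header row: replace the space that follows each known header name.
    text = re.sub(r"(?<=ID) |(?<=DESCRIPTION) |(?<=QTY) |(?<=PRICE) ", sep, text)
    # ID column: the space right after a two-digit ID at the start of a line.
    text = re.sub(r"^(\d{2}) ", r"\g<1>" + sep, text, flags=re.MULTILINE)
    # Last three columns: integer quantity, then two amounts with two decimals.
    text = re.sub(
        r" (\d+) (\d+\.\d{2}) (\d+\.\d{2})$",
        sep + r"\g<1>" + sep + r"\g<2>" + sep + r"\g<3>",
        text,
        flags=re.MULTILINE,
    )
    return text

def parse_table(text, sep="!"):
    """Parse the delimited text into rows, with the first row as the header."""
    return list(csv.reader(io.StringIO(to_delimited(text, sep)), delimiter=sep))

def bad_rows(rows):
    """Return indexes of rows whose column count differs from the header's."""
    width = len(rows[0])
    return [i for i, row in enumerate(rows) if len(row) != width]

rows = parse_table(sample)
for row in rows:
    print(row)
print("rows with wrong column count:", bad_rows(rows))
```

Running something like `bad_rows` over a batch of invoices is a cheap way to spot where a pattern breaks, for example when a description wraps onto a second line, which this sketch does not handle.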


Thanks for your support. Could you please share the workflow with me? Thank you.

Hello Islam,
In this video, I have 17 use cases for extracting tables from PDFs and writing the data to Excel:

Your type of PDF is covered at this timestamp:
1:17:10 File 19 PDF with multiple pages and columns with multiple lines


Cristian Negulescu