PDF Data Scraping

Hi,
I want to scrape data from multiple PDFs that contain data as shown in the screenshot. Which process should I follow to scrape the data and put it in an Excel sheet?
image

Hi @PDcoder, use the Read PDF Text activity and you will get the entire data as a string.
Then use the Split function: first split the data by new line, then split each line using a space or ":" and assign that data.
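The two-step split described above can be sketched outside UiPath in Python, purely to illustrate the logic. The function name `parse_key_values` and the sample text (including the GSTIN value) are made up for this example, and it assumes every field sits on its own line as `key: value`.

```python
def parse_key_values(text):
    """Split extracted text into lines, then split each line at the first colon."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip lines that carry no key/value pair
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


# Hypothetical text, standing in for what a PDF text extraction might return.
sample = "Name: ABC Traders\nGSTIN: 22AAAAA0000A1Z5\nInvoice No: 1001"
print(parse_key_values(sample))
```

Splitting at the first colon only keeps values that themselves contain a colon in one piece.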


@PDcoder are all the PDFs in the same format? If yes, you can simply go with Scrape Relative:
- Initially, use a Build Data Table activity with the fields you want.
- Then get the files with Directory.GetFiles("path of the folder", "*.pdf").
- Use a For Each activity, give the array variable returned by Directory.GetFiles as the input, and change the TypeArgument to String.
- Inside the body, use image-type recording, go to Scrape Relative, and scrape all the values you want.
- At last, use an Add Data Row activity, give the scraped variables inside the array, and give the data table as the input.
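The overall shape of the steps above (list the PDFs, pull the fields from each one, append one row per file) can be sketched in Python to show the flow. This is not UiPath code: `build_rows`, `write_table`, and the `extract_fields` callback are hypothetical names, and the actual scraping step is left to whatever function you pass in. The table is written as CSV, which Excel can open.

```python
import csv
from pathlib import Path


def build_rows(folder, extract_fields, columns):
    """Return one row per PDF in `folder`, in file-name order.

    `extract_fields` is a caller-supplied function (hypothetical here) that
    takes a file path and returns a dict of column name -> scraped value.
    """
    rows = []
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        fields = extract_fields(pdf_path)
        # Missing columns become empty cells instead of raising an error.
        rows.append([fields.get(col, "") for col in columns])
    return rows


def write_table(rows, columns, out_path):
    """Write the header and rows to a CSV file."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(columns)
        writer.writerows(rows)
```

Keeping the extraction step as a parameter mirrors the workflow above, where the loop and the data table stay the same and only the scraping inside the loop body depends on the PDF layout.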


Hi @kalyanDev,
How do I get only the value after the ":"?
Can you please explain with an example in UiPath?
Thanks in advance :slight_smile: .

Hi @PDcoder

Use the Matches activity

and give the regular expression as (?<=GSTIN:).*
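To see what that pattern captures, here is a small Python sketch of the same lookbehind. The function name `value_after_gstin` and the sample GSTIN value are made up for illustration, and it assumes the label appears exactly as `GSTIN:` with no space before the colon.

```python
import re

# Lookbehind: match starts right after the literal "GSTIN:" and runs to end of line.
GSTIN_VALUE = re.compile(r"(?<=GSTIN:).*")


def value_after_gstin(text):
    """Return the text following 'GSTIN:' on its line, or None if the label is absent."""
    match = GSTIN_VALUE.search(text)
    return match.group(0).strip() if match else None


print(value_after_gstin("Name: ABC Traders\nGSTIN: 22AAAAA0000A1Z5\nCity: Pune"))
```

Because `.` does not match a line break by default, the match stops at the end of the GSTIN line. The captured text still carries the leading space after the colon, so it needs trimming.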

Check this and let us know
Thanks
Ashwin.S

Hi @AshwinS2

Hope this might be helpful to you: Anchor Base

cheers :smiley:

Happy learning :smiley:


Use the split activity or variable.Split(":"…ToCharArray)(1). Try this and let me know.
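One thing to watch with taking element (1) after a split on ":" is that a value which itself contains a colon gets cut short. A minimal Python sketch of the idea, with a hypothetical helper name `value_after_colon`, limits the split to the first colon:

```python
def value_after_colon(line):
    """Return everything after the first colon, trimmed; empty string if there is no colon."""
    parts = line.split(":", 1)  # split once, so later colons stay in the value
    return parts[1].strip() if len(parts) == 2 else ""


print(value_after_colon("GSTIN: 22AAAAA0000A1Z5"))
print(value_after_colon("Time: 10:30"))
```

With an unlimited split, the second line would yield only the part before the second colon.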