How to scrape data into columns


#1

Hi. I am scraping unstructured data from a PDF. I can already separate it into rows by newline, but I also want to separate each row into 4 columns, including Details, Date, and Amount (I don’t need column headers).

I thought I could split each row into columns by character position, e.g. the first 100 characters into the ‘Details’ column and the last 10 characters into the ‘Amount’ column.
But I’m not sure how to go about it. If you have any other approaches, I’m open to suggestions.
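A minimal sketch of the fixed-width idea, written in Python rather than as a UiPath workflow so the logic is easy to test. The widths (100 characters for Details, 10 for Amount) come from the question; treating whatever sits between them as the Date is an assumption, as is the sample row.

```python
def split_fixed_width(line, details_width=100, amount_width=10):
    """Split one row into [details, middle, amount] by character position.

    Assumes every row is laid out at the same character positions and that
    amount_width > 0 (line[-0:] would return the whole line).
    """
    details = line[:details_width]              # first N characters
    middle = line[details_width:-amount_width]  # everything in between (assumed: Date)
    amount = line[-amount_width:]               # last M characters
    return [details.strip(), middle.strip(), amount.strip()]


# Hypothetical row padded to fixed widths: 100 + 12 + 10 characters.
row = "Grocery store".ljust(100) + "12/03/2020".ljust(12) + "45.90".rjust(10)
print(split_fixed_width(row))  # ['Grocery store', '12/03/2020', '45.90']
```

This only works if every row lines up at the same character positions, which OCR output does not guarantee.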

Here is a sample of the data:

This is how I would like the output to look:

Thank you xoxo


#2

Scrape the individual elements into 4 different variables and push them into one object.

Then push all the objects into an array.

Use the Build Data Table activity to build a table, and then write it to Excel: https://activities.uipath.com/docs/build-data-table
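The “one object per row, all objects in one array, then a table” approach can be sketched in Python as below. The dict keys reuse the three column names from the question; the sample values are made up, and a CSV written to an in-memory buffer stands in for the Build Data Table and Excel-writing steps.

```python
import csv
import io

# Each scraped row becomes one dict ("object"); all dicts go into one list ("array").
rows = []
for details, date, amount in [
    ("Grocery store", "12/03/2020", "45.90"),  # hypothetical sample values
    ("Petrol", "13/03/2020", "60.00"),
]:
    rows.append({"Details": details, "Date": date, "Amount": amount})

# Write the list of dicts out as a table; a CSV stands in for the Excel file here.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Details", "Date", "Amount"])
writer.writerows(rows)  # no header row, as the question asks
print(buffer.getvalue())
```

To write a real file, open it with `open("output.csv", "w", newline="")` instead of using `io.StringIO`.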


#3

@bala_subramanyam But I used OCR to read the PDF, so all the text comes out together in one block.


#4

Try splitting the string on tabs; maybe that will help.
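Splitting on tabs only helps if the tab characters survived extraction; OCR text may contain spaces instead. A small Python sketch that splits one row on `\t` and reports when the column count doesn’t match. `EXPECTED_COLUMNS = 3` (Details, Date, Amount) and the sample rows are assumptions.

```python
EXPECTED_COLUMNS = 3  # assumed: Details, Date, Amount


def split_on_tabs(line):
    """Split one row on tab characters; return None if the tab layout was lost."""
    parts = [part.strip() for part in line.split("\t")]
    if len(parts) != EXPECTED_COLUMNS:
        # No tabs (or extra tabs) in the text, so the split does not line up.
        return None
    return parts


print(split_on_tabs("Grocery store\t12/03/2020\t45.90"))  # ['Grocery store', '12/03/2020', '45.90']
print(split_on_tabs("Grocery store 12/03/2020 45.90"))    # None
```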


#5

@bala_subramanyam
sd.xaml (11.0 KB)
What I have done so far:

  1. Get OCR Text - get all text from the page
  2. Assign - get text position to start from
  3. Assign - get text position to end
  4. Assign - get all the lines between start and end
  5. Assign - split by new lines and put into an array
  6. For Each - loop through to print each line
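The steps above can be sketched in Python. The start and end markers (`"TRANSACTIONS"`, `"TOTAL"`) and the sample OCR text are hypothetical; replace them with whatever text brackets the rows in your PDF.

```python
def extract_lines(full_text, start_marker, end_marker):
    """Return the non-empty lines between two marker strings.

    str.index raises ValueError if a marker is not found.
    """
    start = full_text.index(start_marker) + len(start_marker)  # text position to start from
    end = full_text.index(end_marker, start)                   # text position to end
    section = full_text[start:end]                             # all the lines in between
    return [line for line in section.splitlines() if line.strip()]  # split by new lines


# Hypothetical OCR output.
ocr_text = "Header\nTRANSACTIONS\nGrocery store 12/03/2020 45.90\nPetrol 13/03/2020 60.00\nTOTAL\n105.90"
for line in extract_lines(ocr_text, "TRANSACTIONS", "TOTAL"):  # loop through each line
    print(line)
```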

When I try to split each line further, the Split function doesn’t work on the array, nor inside the For Each.


#6

Read all the text and put it in a variable (text). Try splitting it on new lines with “text.Split(new String() {Environment.NewLine}, StringSplitOptions.None)”, and then split each line on tabs.
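The two-level split (first into rows, then each row into cells) looks like this as a Python sketch. One difference worth noting: Python’s `str.splitlines()` breaks on both `\r\n` and `\n`, whereas splitting on Environment.NewLine (which is `\r\n` on Windows) will not split text whose lines end in a bare `\n`. The sample text is made up.

```python
def text_to_table(text):
    """Split text into rows on line breaks, then each row into cells on tabs."""
    table = []
    for line in text.splitlines():  # first split: one entry per row
        if not line.strip():
            continue                # skip blank lines
        cells = line.split("\t")    # second split: one entry per column
        table.append([cell.strip() for cell in cells])
    return table


sample = "Grocery store\t12/03/2020\t45.90\r\nPetrol\t13/03/2020\t60.00"
print(text_to_table(sample))
# [['Grocery store', '12/03/2020', '45.90'], ['Petrol', '13/03/2020', '60.00']]
```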


#7

@reyaz I tried using an Assign:
text = extract.Split(new String() {Environment.NewLine})
It gives an error until I change the type argument of text to String[].
After I change text to String[], I get the error: split is not a function of System.Array. I’m also not sure whether my split-with-tab code is correct.
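The System.Array error comes from calling Split on the whole array instead of on one string. Python has the same distinction, so here is a sketch of it: a list of lines has no `split` method, and each line has to be split individually, which in the workflow would mean splitting the current item inside the For Each. If the For Each item is typed as Object rather than String, that may also be why Split seems unavailable there; setting the loop’s type argument to String is worth checking. The sample text is made up.

```python
text = "Grocery store\t12/03/2020\t45.90\nPetrol\t13/03/2020\t60.00"
lines = text.split("\n")  # a list of strings, comparable to String[] in the workflow

# Calling split on the list itself fails, like Split on System.Array.
try:
    lines.split("\t")
except AttributeError as err:
    print("error:", err)

# Split each element (each line) on its own instead, as you would inside the loop.
rows = []
for line in lines:
    rows.append(line.split("\t"))
print(rows)
```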


#8

I got the answer. While iterating over the rows in the For Each, specify the number of characters you want for each column, store that part in another variable, and add it to another column. But it is mostly hard-coded rather than dynamic.
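To make the split less hard-coded, one option is to match each row with a regular expression instead of fixed character counts. This Python sketch assumes, hypothetically, that each row is free-text details followed by a dd/mm/yyyy date and an amount with two decimals at the end; the pattern would need to be adapted to the real layout. A similar pattern should work with .NET’s Regex class, which writes named groups as `(?<name>...)` instead of Python’s `(?P<name>...)`.

```python
import re

# Assumed (hypothetical) row layout: details, then a dd/mm/yyyy date,
# then an amount with two decimals at the end of the line.
ROW_PATTERN = re.compile(
    r"^(?P<details>.+?)\s+(?P<date>\d{2}/\d{2}/\d{4})\s+(?P<amount>-?[\d,]*\d\.\d{2})\s*$"
)


def parse_row(line):
    """Return [details, date, amount], or None if the line doesn't match the layout."""
    match = ROW_PATTERN.match(line)
    if match is None:
        return None
    return [match.group("details"), match.group("date"), match.group("amount")]


print(parse_row("Grocery store 12/03/2020 45.90"))         # ['Grocery store', '12/03/2020', '45.90']
print(parse_row("Card payment   01/04/2020   1,250.00"))   # ['Card payment', '01/04/2020', '1,250.00']
```

Rows that return None can be logged and checked by hand instead of silently landing in the wrong columns.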