Scraping a lot of Word documents

Hello, I am trying to learn how to scrape Word documents for specific strings.
Basically, what I need to do is open a Word document and find a specific string (for example: "Price:120eur").
The idea is that the robot would find all the prices in different Word documents and then paste them into Excel/CSV. Right now I'm just learning the basics, so this task should help me a lot to learn further.

Thank you for your help!

OK, so what help do you require?

@Tomas1

You can use the Word Automation activities pack to achieve this. The ReadText activity reads the entire document and converts it into a string. By applying regex and a few string operations, you can get all the occurrences of Price and their values.
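To illustrate the regex step outside of UiPath, here is a minimal Python sketch. The pattern is an assumption based on the "Price:120eur" example (a number followed by a currency code), and `find_prices` is a hypothetical helper name, so adjust both to the real documents. A similar pattern could be tried in the regex-based activity in UiPath.

```python
import re

# Assumed format: "Price:" followed by optional spaces, a number, and a
# currency code such as "eur". Adjust the pattern to the real documents.
PRICE_PATTERN = re.compile(r"Price:\s*(\d+(?:[.,]\d+)?)\s*([A-Za-z]+)")

def find_prices(text):
    """Return every (amount, currency) pair found in the text."""
    return PRICE_PATTERN.findall(text)

sample = "Item A Price:120eur, item B Price: 99.50 eur"
print(find_prices(sample))  # [('120', 'eur'), ('99.50', 'eur')]
```

Because the pattern returns every match, it covers documents that contain more than one price.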

Alternatively, if the position of the prices is fixed in every document, then screen scraping would also work here.

If possible, share a screenshot of a document, and we can possibly help you with exact pointers.

  1. How can I open one document at a time? For example, how can I feed 10 documents into the robot at once?
  2. I already know how to open them, read the whole text, and add it to a string variable.
  3. I don't know how to make the program search for the keyword "Price:" and then copy the whole string, such as "Price:120eur".
  4. The Excel part is not an issue.

I am really thankful that you are taking your time to help me.

(1) First, you need to place all the documents in one folder. Have a look at this, use it as an example, and read each file inside the loop.
(3) You can use a regular expression to handle the search and extraction.
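As a rough illustration of point (1) outside of UiPath, the folder loop could look like the Python sketch below. It relies on the fact that a .docx file is a zip archive with its main body in word/document.xml. The helper names `read_docx_text` and `read_all_documents` are made up for this example, and stripping tags with a regex is only good enough for keyword search, not for preserving layout.

```python
import re
import zipfile
from html import unescape
from pathlib import Path

def read_docx_text(path):
    """Rough text extraction from one .docx file (layout is lost)."""
    with zipfile.ZipFile(path) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")
    xml = re.sub(r"</w:p>", "\n", xml)   # paragraph ends become newlines
    text = re.sub(r"<[^>]+>", "", xml)   # drop the remaining tags
    return unescape(text)                # turn &amp; and friends back into characters

def read_all_documents(folder):
    """Return {file name: text} for every .docx file in the folder."""
    files = sorted(Path(folder).glob("*.docx"))
    return {p.name: read_docx_text(p) for p in files}
```

Calling `read_all_documents` with the folder that holds the documents gives one string per file, which can then be searched with a regular expression.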

@Tomas1,

  1. Assign arrstr = Directory.GetFiles("Word docs path")
  2. For each item in arrstr:
    a) Use a Word Application Scope.
    b) Read the Word file using Read Text and store the result in a String variable (str).
    c) Assign RequiredPrice = str.Substring(str.IndexOf("Price: ") + "Price: ".Length).Split(" ".ToCharArray)(0)
    d) Write the result into the Excel file.
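For comparison, here is a loose Python analogue of the substring approach above, plus a CSV step. It is a sketch, not the UiPath workflow itself: `price_after_marker` and `rows_to_csv` are hypothetical names, and the sample texts are invented. Two things to keep in mind: the expression only picks up the first "Price: " in a document, and in .NET IndexOf returns -1 when the marker is missing, so a guard is worth adding before taking the substring. The marker here also assumes a space after the colon, which the "Price:120eur" example does not have.

```python
import csv
import io

def price_after_marker(text, marker="Price: "):
    """Return the token that follows the first marker, or None if absent."""
    start = text.find(marker)
    if start == -1:                      # marker missing: nothing to extract
        return None
    rest = text[start + len(marker):]
    tokens = rest.split()                # split on whitespace
    return tokens[0] if tokens else None

def rows_to_csv(rows):
    """Render (file name, price) rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["file", "price"])
    writer.writerows(rows)
    return buffer.getvalue()

# Invented sample texts standing in for the text read from each document.
texts = {"a.docx": "Total Price: 120eur today", "b.docx": "no price here"}
rows = [(name, price_after_marker(text)) for name, text in texts.items()]
print(rows_to_csv(rows))
```

If a document can hold several prices, a regex that returns all matches is the safer choice.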

Thanks, it worked perfectly, just how I wanted it to. Although I used the "Matches" activity with regex. :stuck_out_tongue:

Hi @Tomas1,

If the values follow a constant format, you can use regex and get the data easily.