Scanning Google Spreadsheet in Web and convert info to CSV

Hello everyone,

Every week our company receives an e-mail that contains information about hotel allotments in a Google Spreadsheet format that looks like this:

We want to build an automation that takes this info and converts it into a csv with the following format:

image

I need to scan every colored cell inside the tables, and depending on the color of the cell, the field “Action” will have different values, so for exemple, if the color of the cell is red, the value of the action should be 0. I tried using file explorer and there doesn’t seem to be any color attribute related to the cells that I can use.

I have a separate data table with the hotel codes like so:

|Sun Villa|1040281|3152059|
|Standard Beach Bungalow (Beach Villa - 2022)|1040281|1274883|
|Superior Beach Bungalow (Deluxe Beach Villa - 2022)|1040281|1274884

I guess my first question would be how do I know dynamically how many tables are in the google spreadsheet file? Or should I copy the info from the google spreadsheet to an excel file and try to go from there?

Is this even an appropriate use case for RPA?

I apologize for posting such a broad question, but I really don’t know how I should procede.

Thank you in advance.

Hey Andre,

There are a couple of ways to go about this depending on what selectors are available for the sheet. I do think it is a good use case for RPA.

The first option I would try, is the Find Children activity. This should help you determine the number of tables in the file. Then you could use a for each over the output. As for determining the cell color, using the get attribute activity on each cell, could help you identify the color.

Alternatively, you could find the selector for each table and increment the IDX value to find each table. Using an element exists with the selector. When the element no long exists, you have identified the number of tables. The same approach could be used to get the cell colors.

Interesting problem to solve but I certainly think it is possible. It is hard to help without access to the sheet to test selectors with it. There could also be a package that helps integrate with google.

May I share the link with you privately Thomas?

Hey Andre,

I got a link in my email that seems to be working. Let me check it out and get back to you.

Thank you Thomas, truly. :slight_smile:

Hey Anrdre,

I put together part of the project. First to extract the months from the website, then using the Class attribute from the Get attribute activity to identify the cell color.

I imagine using the extract structured data to extract all the from the website. Then you can use a for each row to iterate through the rows and using a dynamic selector inside the get attribute to extract each cell color.

Attached is the example. Let me know if you have questions. ScrapeGoogleSheet.zip (803.8 KB)

Thank you so much Thomas, let me have a look at the code and I’ll get back to you!

So, I’ve taken a look at the sequences you made Thomas. I’m struggling now on how to get the color information from the cells in the context of the day and the hotel.

So take the following table for instance:

When I scan the first day of the table and get the color attribute of the cell (October 1st), it needs to be in the context of the hotel and the day it’s being scanned. Can you have multiple anchors for a selector? For instance can you have an anchor for the hotel (Beach Villa) and the day of the month (1) and then increment the number of days? You’d need to know the number of days in the month beforehand so you’d know when to stop the cycle.

Here’s what I have so far.

VILLA RESORTS_STOP SALES.zip (216.9 KB)

Hey Andre,

Check out this updated code. ScrapeGoogleSheet.zip (10.5 KB)

Also, check the TestOutput.csv This was generated from running the code. You will need to add more to the switch cases to account for all the varieties of “Class”

This 3 loops nest. The first outer loop to iterate through the entire scraped table. The second loop to iterate through the next 4 rows after a month is found. The last loop to iterate through each column of those rows to get the cell colors. You may need to add an outloop to iterate through the 3 tabs at the top.

This was an interest project. Hopefully, this helps get you to a solution. Let me know if you have questions.

Hello Thomas,

I’m so sorry for getting back to you so late. I really appreciate your input but we decided to go a different route and execute the transformations using excel files.

So, in that regard, I need to extract the tables with the date and color information from the three sheets named “SUN”, “ROYAL” and “PARADISE”. Keep in mind that the number of tables in any sheet could vary.

Here’s and image of the sheet “ROAYL” of the excel file:

How I want the extracted data to look like:

I’ll attach the input excel file, as well as how the output files should look like.

If the mods deem it appropriate, I’ll create a separate thread to address this issue.

Thank you.

hotelMasterFileInput.xlsx (200.1 KB)
sheetParadiseOutput.xlsx (20.0 KB)
sheetSunOutput.xlsx (24.0 KB)
sheetRoyalOutput.xlsx (16.0 KB)