I’m new to UiPath and currently trying to scrape data from Google Patents. So far I’ve gotten most of the data I need by constructing selectors based on HTML IDs and classes. But one table from which I need to pull data doesn’t have a unique ID or a distinct class.
See this page for an example:
I need to get all the data from the first column of the table under the “Cited By” h3 header. As you can see, the table itself and all the links share the same class as most other links on the page. So far I haven’t been able to target this table specifically and I’m all out of ideas. The goal is to get each patent number in this table into a single string in UiPath…
The header above the table, which says “Cited By”, has an ID as you can tell. So I tried using an anchor base to look for the element directly beneath it, but I couldn’t get it to work and I don’t know if it’s the right kind of solution.
Any help would be greatly appreciated, whether that’s a full solution or just a pointer in the right direction about which activities I should use.
A rough check gave the following:
Data scraping is a good starting option: it retrieves all “Cited By” items by publication number (including the first two and the “Family to Family Citations” group).
However, I would suggest building this retrieval on the Find Children activity instead, as that can be made more reliable.
In general, we are interested in the div that follows the h3 “Cited By (x)”.
That div can then be scraped at a more detailed level.
At first glance I don’t see any major blockers for the retrieval.
If you need more help, just let us know. Happy automation!
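To illustrate the idea of targeting the div relative to the h3, here is a minimal Python sketch, written outside UiPath purely to show the logic. The sample markup, the "citedBy" id value, and the function name first_column_after_heading are assumptions made for the example; the real page structure may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup. The real Google Patents structure and
# the "citedBy" id value are assumptions made for this sketch.
SAMPLE = """
<section>
  <h3 id="citedBy">Cited By (2)</h3>
  <div class="table">
    <div class="tr"><span class="td"><a>US1111111A</a></span><span class="td">2001-01-01</span></div>
    <div class="tr"><span class="td"><a>US2222222B2</a></span><span class="td">2002-02-02</span></div>
  </div>
  <h3 id="similarDocuments">Similar Documents</h3>
  <div class="table">
    <div class="tr"><span class="td"><a>US9999999A</a></span></div>
  </div>
</section>
"""


def first_column_after_heading(root, heading_id):
    """Return the first-column text of each row in the element that
    directly follows the heading with the given id."""
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            if child.get("id") == heading_id and i + 1 < len(children):
                table = children[i + 1]  # the sibling right after the heading
                result = []
                for row in table:  # direct children of the div, i.e. the rows
                    cells = list(row)
                    if cells:
                        result.append("".join(cells[0].itertext()).strip())
                return result
    return []


root = ET.fromstring(SAMPLE)
patent_numbers = first_column_after_heading(root, "citedBy")
joined = ";".join(patent_numbers)  # one string holding every patent number
print(joined)
```

In UiPath terms, the loop over the table's direct children roughly corresponds to what Find Children would return for the div, and the join corresponds to building the single output string.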
I tried using data scraping to get the citations, and it works in some cases.
See the attached file: it’s a working solution, at least for some patents. With another patent, though, it doesn’t seem to work.
A patent which works (the one in the file):
A patent for which the solution doesn’t work:
I rebuilt the data scraping activity based on the patent that didn’t work, and the selector ended up being the same. The only difference was a minor one in the “idx” parts, so I removed the idx parts from the selectors. I then got results from both pages, but the problem was that the scraping didn’t stop at the end of the “Cited By” table; it kept going through all the links that follow it, including those under “Similar Documents”.
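One way to think about the "doesn't stop at the table" problem is to bound the collection by headings: start at the "Cited By" h3 and stop at the next h3. The Python sketch below only illustrates that logic outside UiPath; the markup, the "citedBy" id value, and the class name SectionLinkCollector are assumptions made for the example.

```python
from html.parser import HTMLParser

# Hypothetical, simplified markup. The ids and structure are assumptions.
SAMPLE = """
<h3 id="citedBy">Cited By (2)</h3>
<div><a href="/patent/US1111111A">US1111111A</a>
     <a href="/patent/US2222222B2">US2222222B2</a></div>
<h3 id="similarDocuments">Similar Documents</h3>
<div><a href="/patent/US9999999A">US9999999A</a></div>
"""


class SectionLinkCollector(HTMLParser):
    """Collect link texts that appear after the h3 with the given id
    and before the next h3."""

    def __init__(self, heading_id):
        super().__init__()
        self.heading_id = heading_id
        self.in_section = False
        self.in_link = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            # The target heading opens the section; any other h3 closes it.
            self.in_section = dict(attrs).get("id") == self.heading_id
        elif tag == "a" and self.in_section:
            self.in_link = True
            self.links.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.links[-1] += data


collector = SectionLinkCollector("citedBy")
collector.feed(SAMPLE)
cited_by = [text.strip() for text in collector.links]
print(cited_by)
```

Because collection is switched off at the next h3, the links under "Similar Documents" are never picked up, even though they share the same tag and class as the ones that are wanted.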
Could you explain a bit more how I would go about using the “find children” activity? Would I find all the children of the div in question, meaning the rows of the table inside that div? I’m thinking that such a solution would still require me to point out the div following the H3 in some way, which I don’t know how to do.
Or should I use the find children activity to find all divs on the page, and then somehow point out the div following the H3?
I’m not really following you on how to apply the “find children activity”.
Sorry for the late reply, I’m in the middle of an exam week.
I want to start by thanking you for taking the time and giving me a solution to this problem. I would have been very unlikely to figure this out myself. Super appreciated!
When I opened the file, all of the “find children” activities came up as missing. Perhaps I did not have the same packages installed as you did (I’m on an academic alliance version of UiPath).
However, by looking at the xaml file in Notepad I was able to recreate the activities with the same parameters you had used. This had the extra advantage of making me go through the logical structure in depth, which was very educational.
So big thanks for helping me solve the problem as well as giving me a lot of insights into how to do scraping based on relative elements!