Hello,
First off, part of the challenge is working out whether the PDF is a standard one or not… so here are a few warnings before you run this:
- If you can get the unit number from somewhere else, use that one.
- I had to re-save the PDF as a PDF (weird, right?). Otherwise I lost the data that let me determine the checkbox status.
- The PDF has to be opened in Chrome, because I have no PDF client on my PC… lol
- You may need to re-scrape the area.
How did I come to this conclusion? Here is how.
I screen-scraped the PDF in Chrome, and this is the result:
El START I] STOP VERlFY
START [3 STOP :1 VERlFY
I] START STOP C] VERlFY
[j START [1 STOP VERlFY
START I: STOP [1 VERlFY
[I START [:1 STOP C] VERlFY
Cl START I: STOP I: VERlFY
START III STOP CI VERlFY
El START I] STOP :1 VERlFY
El START [3 STOP El VERlFY
If you take a closer look, a checked checkbox does not produce any weird string. In other words, of the three possible conditions (START, STOP, VERIFY), the one that is not preceded by a stray string is the one that is checked… please compare the list with the actual dummy PDF that @badita uploaded.
From there it was easy: split the entire text into lines, then split each line on " " and check whether the resulting array has 5 or 6 items. A length of 6 means none of the boxes was selected; a length of 5 means one was selected.
If START was selected, there is no string before the word, so "START" sits at index 0 of the split array. If STOP was selected, "STOP" sits at index 2 (the weird string is 0, START is 1 and STOP is 2). Otherwise it has to be VERIFY. Check START first, because a checked START also leaves STOP at index 2.
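The rule above can be sketched in a few lines of Python. This is only an illustration of the logic, not the workflow from the attached zip; the names `normalize`, `checked_option` and `parse_scrape` are made up for this example, and it assumes the scraper always misreads "VERIFY" as "VERlFY", as in the output shown above.

```python
def normalize(token):
    # The scrape reads "VERIFY" as "VERlFY" (lowercase L), so map it back.
    return token.upper().replace("VERLFY", "VERIFY")


def checked_option(line):
    """Return "START", "STOP" or "VERIFY" for the checked box, or None if no box is checked."""
    tokens = [normalize(t) for t in line.split(" ") if t]
    if len(tokens) == 6:
        # Three labels plus three stray strings: nothing is checked.
        return None
    if len(tokens) != 5:
        raise ValueError("unexpected line, re-scrape the area: " + repr(line))
    # Exactly one label has no stray string in front of it.
    if tokens[0] == "START":
        return "START"
    if tokens[2] == "STOP":
        return "STOP"
    return "VERIFY"


def parse_scrape(text):
    # One result per non-empty scraped line.
    return [checked_option(line) for line in text.splitlines() if line.strip()]


# A few lines copied from the scrape result above.
sample = """El START I] STOP VERlFY
START [3 STOP :1 VERlFY
I] START STOP C] VERlFY
[I START [:1 STOP C] VERlFY"""

if __name__ == "__main__":
    for line, result in zip(sample.splitlines(), parse_scrape(sample)):
        print(line, "->", result)
```

Raising an error on any other token count is a deliberate choice here: if the scrape area is slightly off, the line usually splits differently, and it is safer to stop than to guess.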
As for the rest, building a data table and appending rows to it, you can find information here in the forum.
This is the result:
Hope this helps,
13.RPA Challenge - PDF Scrapping.zip (1.0 MB)