Trimming a string extracted from a web table

Hello everyone,
I need some clarification. I'm trying to extract the component details for a particular model type from the portal "MANN-FILTER Online Catalog Europe - Vehicles Air Filter Oil Filter Fuel Filter Cabin Filter OFF-HIGHWAY APPLICATIONS AG-CHEM (AGCO) Ag/Go-Gator 1004". Using the native Citrix recording options, I was able to extract the text below:
Model type Engine code kW HP Year of manufacture
1004 504 V8
 Air Filter  Oil Filter  Fuel Filter  Cabin Filter  Other filters
use C 29 939 use PFU 19 226 use WK /16 x for for x 950WD 940/2
OE for no. OE
702455 no. OE 600776 no. 600393 Hydraulics"
Next, I'm trying to find a pattern I can apply a Trim/Split function to, so that I can extract the model type table details below:
 Air Filter  Oil Filter  Fuel Filter  Cabin Filter  Other filters
C 29 939 PFU 19 226x WK 950/16 x WD 940/2
C 17 149 W1294 WA 923/3
Using an Assign activity, I tried Div.ToString().Split("use".ToArray, StringSplitOptions.RemoveEmptyEntries) to remove the "use" string from the extracted values, but it gives the result below.
“Model type Engine code kW HP Year of manufacture
1004 504 V8
 Air Filter  Oil Filter  Fuel Filter  Cabin Filter  Other filters
use C 29 939 use PFU 19 226 use WK /16 x for for x 950WD 940/2
OE for no. OE
702455 no. OE 600776 no. 600393 Hydraulics”
Using the above function, I get the details below:
“Model type Engine code kW HP Year of manufacture
1004 504 V8
 Air Filter  Oil Filter  Fuel Filter  Cabin Filter  Other filters
use C 29 939 use PFU 19 226 use WK /16 x for for x 950WD 940/2
OE for no. OE
702455 no. OE 600776 no. 600393 Hydraulics” “Model type Engine code kW HP Year of manufacture
1004 378 V6
 Air Filter  Oil Filter  Fuel Filter  Cabin Filter  Other filters
use C 21 317 use PFU 19 226 use WK /16 x for for x 950WD 940/2
OE for no. OE
720258 no. OE 600776 no. 600393 Hydraulics” “Model type Engine code kW HP Year of manufacture
3004 Cummins 555 1985 →
 Air Filter  Oil Filter  Fuel Filter  Cabin Filter  Other filters
C 29 939 WP 1270 WP 962/3 x
Secondary flow” “Model type Engine code kW HP Year of manufacture
3004 Cummins V903C 1985 →
 Air Filter  Oil Filter  Fuel Filter  Cabin Filter  Other filters
C 29 939 W 1294 WP 962/3 x
Primary flow” “Model type Engine code kW HP Year of manufacture
1254 Cummins QSB5.9 205 280 2000 →
 Air Filter  Oil Filter  Fuel Filter  Cabin Filter  Other filters
PU 9006 x
W 950/18 WH 980/3
Hydraulics” “Model type Engine code kW HP Year of manufacture
663 (Sprayer) CUMMINS 6BTA5.9
 Air Filter  Oil Filter  Fuel Filter  Cabin Filter  Other filters
use C 1281 W 950/18 723 for use use WK WD 13 006 x
OE for for no. OE
609374 no. OE 608990 no. 709935 Hydraulics” “Model type Engine code kW HP Year of manufacture
664 (Sprayer) CUMMINS 6BTA5.9
 Air Filter  Oil Filter  Fuel Filter  Cabin Filter  Other filters
use C 1281 W 950/18 723 for use use WK WD 13 006 x
OE for for no. OE
609374 no. OE 608990 no. 709935 Hydraulics” “Model type Engine code kW HP Year of manufacture
664 (Sprayer) CUMMINS 6BTA5.9
 Air Filter  Oil Filter  Fuel Filter  Cabin Filter  Other filters
use C 1281 W 950/18 723 for use use WK WD 13 006 x
OE for for no. OE
609374 no. OE 608990 no. 709935 Hydraulics”
I'd appreciate any help with this.
Thanks in advance.
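One issue with the Split call above: in .NET, "use".ToArray converts the string into the character array {'u', 's', 'e'}, and String.Split with a character array splits on any one of those characters rather than on the word "use". Passing a string array instead, e.g. Div.ToString().Split({"use"}, StringSplitOptions.RemoveEmptyEntries), splits on the whole word. The sketch below is a Python illustration of the two behaviours (not UiPath code), using a hypothetical sample line:

```python
import re

# Hypothetical sample line; "Hydraulics" contains the letters u and s.
line = "use C 29 939 for Hydraulics"

# Like Split("use".ToArray, RemoveEmptyEntries): each of u, s, e is a separator.
by_chars = [p for p in re.split("[use]", line) if p]

# Like Split({"use"}, RemoveEmptyEntries): only the whole substring "use" splits
# (pieces are also trimmed here for readability).
by_word = [p.strip() for p in line.split("use") if p.strip()]

print(by_chars)  # [' C 29 939 for Hydra', 'lic']
print(by_word)   # ['C 29 939 for Hydraulics']
```

Splitting on the word still leaves the other noise ("for", "OE no. ...") in place, so Split alone is unlikely to produce the clean table.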

Hi @rayabhijit2017,
Check out the Regex (regular expression) functions. They might be an easier way to work with this.


Hi Pawel Wozniak,
Thanks for your response. OK, sure.


Hello Pawel Wozniak,

I tried using Regex.Replace(Div.ToString, "[string]", "") to remove the unnecessary strings from the extracted Div below. However, besides removing the particular string, it also removes the matching letters elsewhere in the extracted text.

 Air Filter  Oil Filter  Fuel Filter  Cabin Filter  Other filters
C 29 939 W 1294 WP 962/3 x
Primary
Accessories flow Accessories
C 17 149
Accessories

Please let me know if you have any idea of a regex pattern that finds only the values under the Air Filter, Oil Filter, Fuel Filter, Cabin Filter and Other filters table columns, or any other way to get the required data.
Thanks in advance.
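Assuming the actual pattern was something like "[use]" (the exact string isn't shown), the letters disappear because square brackets in a regex define a character class: "[use]" matches any single u, s or e anywhere in the text, not the word "use". Removing whole words needs alternation with word boundaries, such as \b(?:use|for)\b, and .NET's Regex.Replace treats both patterns the same way as Python does here. A Python sketch on a hypothetical sample line; treating "use" and "for" as the noise words is an assumption based on the scraped samples:

```python
import re

# Hypothetical sample line taken from the scraped text.
text = "use C 29 939 for Hydraulics"

# "[use]" is a character class: it deletes every u, s and e, wherever they occur.
by_class = re.sub("[use]", "", text)

# Alternation inside word boundaries deletes only the whole words "use" and "for".
by_words = re.sub(r"\b(?:use|for)\b", "", text)
cleaned = " ".join(by_words.split())  # collapse the leftover spaces

print(repr(by_class))  # ' C 29 939 for Hydralic'
print(cleaned)         # C 29 939 Hydraulics
```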

Hi @rayabhijit2017, if you could give the exact string (the entire text) and what needs to be extracted, I can try it with regex.

Hello prasath17,
OK, sure. Thanks for your response, and sorry for the late reply. Below are a couple of samples of screen-scraped text extracted from the https://catalog.mann-filter.com/EU/eng/vehicle/MANN-FILTER%20Katalog%20Europa/Vehicles/OFF-HIGHWAY%20APPLICATIONS/AG-CHEM%20(AGCO)/Ag~Go-Gator/1004%20378%20V6%20(T00000000830709) portal, covering the Air Filter, Oil Filter, Fuel Filter, Cabin Filter and Other filters of different model types.

Model type Engine code kW HP Year of manufacture
1274C (Sprayer) Caterpillar C9
 Air Filter  Oil Filter  Fuel Filter  Cabin Filter  Other filters
WD 13 008 x
Hydraulics
VIN No. Add to favorites Feedback Back

Model type Engine code kW HP Year of manufacture
1004 504 V8
 Air Filter  Oil Filter  Fuel Filter  Cabin Filter  Other filters
use C 29 939 for use PFU 19 226
OE for x use WK 950/16 x WD 940/2 no. OE for OE use
702455 no. 600776 no. 600393 Hydraulics for OE no. 700266
Accessories

Model type Engine code kW HP Year of manufacture
1004 378 V6
 Air Filter  Oil Filter  Fuel Filter  Cabin Filter  Other filters
use C 21 317 for use PFU 19 226
OE for x use WK 950/16 x WD 940/2 for no. OE use
720258 no. OE 600776 no. 600393 Hydraulics for OE no. 700266

From the screen-scraped text above, we need to extract only the component names listed under Air Filter, Oil Filter, Fuel Filter, Cabin Filter and Other filters, e.g. WD 13 008 x, C 29 939, PFU 19 226, etc. Please let me know if you have any questions. Thanks in advance.
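As a sketch of the regex route, the Python function below pulls part-number-like tokens out of the text that follows the filter header. The pattern (1 to 3 capital letters, a number with an optional second 3-digit group, an optional "/n" suffix and an optional trailing " x") is hypothetical, inferred only from the samples in this thread rather than from any documented MANN-FILTER format. It returns values in reading order only and cannot tell which filter column each value came from.

```python
import re

# Hypothetical part-number pattern inferred from the samples in this thread:
# 1-3 capitals, a number, an optional 3-digit group, an optional "/n" suffix,
# and an optional trailing " x".
PART_NO = re.compile(r"\b[A-Z]{1,3} ?\d{1,4}(?: \d{3})?(?:/\d+)?\b(?: x\b)?")
HEADER = "Other filters"


def extract_part_numbers(scraped: str) -> list[str]:
    """Return part-number-like tokens found after the filter header line."""
    # Search only below the header so tokens such as "V8" in the model line are skipped.
    _, found, body = scraped.partition(HEADER)
    if not found:
        return []
    return PART_NO.findall(body)


sample = """Model type Engine code kW HP Year of manufacture
1004 504 V8
 Air Filter  Oil Filter  Fuel Filter  Cabin Filter  Other filters
use C 29 939 for use PFU 19 226
OE for x use WK 950/16 x WD 940/2 no. OE for OE use
702455 no. 600776 no. 600393 Hydraulics for OE no. 700266
Accessories"""

print(extract_part_numbers(sample))
```

With the 1004 504 V8 sample, this returns C 29 939, PFU 19 226, WK 950/16 x and WD 940/2; the OE numbers are skipped because they are not preceded by capital letters on the same token.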

@rayabhijit2017, sorry for the delayed response. I gave it a try, but found that the extracted values can't be mapped to the right filters.

For example, in the first set I can extract WD 13 008 x, which according to the website belongs to "Other filters", right? But the scraped text doesn't show that, because the other filter columns are empty in this case. Similarly, in each case some column values are blank, which makes a regex solution too complex, if not impossible.

Hello prasath17,
No problem, thanks for trying it out. Yes, that's right. If a regex could extract the strings that follow the "Air Filter Oil Filter Fuel Filter Cabin Filter Other filters" header and match WD/C/PFU/WK-style values, along with the blank entries (as received from screen scraping), then it might be possible. Any ideas?

@rayabhijit2017, a more reliable way to scrape this table is the FindChildren activity.
Use this selector, which uniquely identifies the result table:

<html app='chrome.exe' title='MANN-FILTER Online Catalog Europe - Vehicles Air Filter Oil Filter Fuel Filter Cabin Filter OFF-HIGHWAY APPLICATIONS AG-CHEM (AGCO) Ag/Go-Gator 1004' />
<webctrl tag='DIV' class='partsTable' />

FindChildren returns a set of UiElements (the columns) that can be passed to the GetVisibleText activity, which scrapes each column's text. You can then parse the text easily, without complex regex logic.
Alternatively, use the Data Scraping feature to get the results into a structured DataTable for further processing.
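Scraping column by column keeps each value attached to its header, including columns that are empty. A minimal Python sketch of the parsing step, assuming (hypothetically) that GetVisibleText returns each column's header on the first line followed by part numbers and their "use for OE no. ..." / "Accessories" lines; the parse_columns name and the part-number pattern are illustrative only:

```python
import re

# Hypothetical part-number pattern, same shape as seen in the thread's samples.
PART_NO = re.compile(r"[A-Z]{1,3} ?\d{1,4}(?: \d{3})?(?:/\d+)?(?: x)?")


def parse_columns(column_texts: list[str]) -> dict[str, list[str]]:
    """Map each column's header (first line) to the part numbers listed below it."""
    table: dict[str, list[str]] = {}
    for text in column_texts:
        lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
        if not lines:
            continue
        header, rest = lines[0], lines[1:]
        # A line counts as a part number only if the whole line matches the pattern;
        # "use for OE no. ...", "Accessories" and "Hydraulics" lines are dropped.
        table[header] = [ln for ln in rest if PART_NO.fullmatch(ln)]
    return table


# Hypothetical GetVisibleText results, one string per column.
columns = [
    "Air Filter",
    "Oil Filter",
    "Fuel Filter",
    "Cabin Filter",
    "Other filters\nWD 13 008 x\nHydraulics",
]
print(parse_columns(columns))
```

Because each column is parsed on its own, an empty Air Filter column yields an empty list instead of WD 13 008 x being attributed to the wrong filter.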

Hi Gheorghe,
Thanks for your response. As suggested, I tried passing the FindChildren UiElements to a For Each activity, followed by a Get Attribute activity with "aaname" and GetVisibleText on each element in each column (the values need to be stored in Excel along with the header values), but I'm getting messages like "Unable to cast object of type 'UiPath.Core.UiElement' to type 'System.Collections.Generic.IEnumerable`1[UiPath.Core.UiElement]'."
I also tried the Data Scraping feature to store them in a DataTable, with an If condition to skip the header values, but I'm having trouble with the table header elements being indexed at 0 instead of 1.
Please let me know if I’m missing anything.
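DataTable rows in .NET are zero-based: Rows(0) is the first row and Rows(1) the second, so if data scraping puts the header text into the first data row, it sits at index 0. Instead of relying on a fixed index, one option is to drop any row that just repeats the column names. A Python sketch of that check, with hypothetical rows and a hypothetical drop_header_rows helper:

```python
def drop_header_rows(columns: list[str], rows: list[list[str]]) -> list[list[str]]:
    """Remove rows whose cells simply repeat the column names."""
    return [row for row in rows if [cell.strip() for cell in row] != columns]


# Hypothetical scraped table: the header text landed in the first row (index 0).
columns = ["Air Filter", "Oil Filter"]
rows = [["Air Filter", "Oil Filter"], ["C 29 939", "W 1294"]]

print(rows[0])  # index 0 is the first row, here the repeated header
print(drop_header_rows(columns, rows))
```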

Hi All,

Using Extract structured text and a Log Message activity with dtAirFilter.Rows(1).Item("Air Filter").ToString, I'm able to get the details below:
Air Filter
C 29 939
use for OE no. 702455
Accessories
C 17 149
use for OE no. 702463
Accessories
But I'm only looking for the Air Filter part numbers of the model type, as shown below:
Air Filter
C 29 939
C 17 149

Any ideas on how to approach this? Thanks in advance!
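The logged text mixes part numbers with "use for OE no. ..." and "Accessories" lines. Assuming those are the only kinds of noise (true for the samples shown here, not verified against the whole catalog), a simple option is to drop them line by line. A Python sketch with a hypothetical strip_noise helper, whose output matches the desired Air Filter text:

```python
def strip_noise(cell_text: str) -> str:
    """Keep the header and part-number lines, dropping the assumed noise lines."""
    kept = []
    for line in cell_text.splitlines():
        line = line.strip()
        # Assumption: noise lines are blank, start with "use " or read "Accessories".
        if not line or line.startswith("use ") or line == "Accessories":
            continue
        kept.append(line)
    return "\n".join(kept)


logged = """Air Filter
C 29 939
use for OE no. 702455
Accessories
C 17 149
use for OE no. 702463
Accessories"""

print(strip_noise(logged))
```

A denylist like this is easy to read but breaks if the catalog adds new kinds of annotation lines; matching part numbers explicitly is the stricter alternative.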