I have an HTML file (think of it as an invoice) from which I need to extract a lot of information, such as company code, company name, payment type, date, etc. I am able to extract the data, but it takes a long time for a single file. Can you please suggest the best way to make the extraction faster?
Hi, please see the screenshot below: the fields highlighted in green are the ones we need to extract. FYI, the data is fairly structured (if you look closely). Assume the file contains 150 such records.
If scraping is taking too much time and the selectors contain table attributes (Table/TD, table column, or table row), you can pass a dynamic selector, increment the row or column index, and read each value that way.
That way there is no need to inspect each element and find its DomPath.
Since the data is only text, the extraction should be fairly quick.
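Here is a minimal sketch of the same row/column indexing idea outside a selector-based tool. It assumes the invoice fields sit in a plain, well-formed HTML table with no nested tables. It uses only Python's standard `html.parser`; the `TableGrid` class, the `cell` helper, and the sample markup are made up for illustration and do not come from the original file.

```python
from html.parser import HTMLParser


class TableGrid(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # list of rows, each a list of cell strings
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def cell(rows, row, col):
    """Return the cell at a 1-based (row, col) position."""
    return rows[row - 1][col - 1]


# Hypothetical sample; a real invoice file will look different.
sample = """
<table>
  <tr><td>Company code</td><td>C001</td></tr>
  <tr><td>Company name</td><td>Acme Ltd</td></tr>
  <tr><td>Payment type</td><td>Wire</td></tr>
</table>
"""

parser = TableGrid()
parser.feed(sample)
parser.close()

# Walk the rows by index instead of locating each element separately.
values = [cell(parser.rows, r, 2) for r in range(1, len(parser.rows) + 1)]
print(values)
```

The file is parsed once and every value is then a list lookup, which is usually much cheaper than resolving one selector per field.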
I have a similar, though slightly different, question. I am processing HTML files that reside on my computer, and I need to extract data from a grid within each file. What is the best way to pull this out? Should I use web scraping? Should I iterate through the source HTML (60K lines)? The grid is a small subset of the information in the HTML document, so once I have pulled that data, I would like to end the processing.
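One possible approach, sketched under assumptions: if the grid is an HTML `<table>` that can be recognised by an `id` attribute, the file can be fed to a parser line by line and the loop can stop as soon as that table closes, so the rest of the 60K lines is never parsed. The names `GridExtractor` and `extract_grid`, the id `"grid"`, and the sample lines are hypothetical; only Python's standard `html.parser` is used.

```python
from html.parser import HTMLParser


class GridExtractor(HTMLParser):
    """Collects the rows of one table, identified by its id attribute."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0      # > 0 while inside the target table (counts nesting)
        self.done = False   # set once the target table has been closed
        self.rows = []
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "table":
            if self.depth:
                self.depth += 1
            elif dict(attrs).get("id") == self.table_id:
                self.depth = 1
        elif self.depth == 1:
            if tag == "tr":
                self.rows.append([])
            elif tag in ("td", "th") and self.rows:
                self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "table":
            self.depth -= 1
            if self.depth == 0:
                self.done = True
        elif tag in ("td", "th") and self.depth == 1 and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def extract_grid(lines, table_id):
    """Feed lines to the parser and stop reading once the grid is complete."""
    parser = GridExtractor(table_id)
    for line in lines:
        parser.feed(line)
        if parser.done:
            break
    return parser.rows


# Hypothetical stand-in for the lines of a local HTML file.
sample_lines = [
    "<html><body><p>header text</p>\n",
    '<table id="grid">\n',
    "<tr><td>INV-1</td><td>100.00</td></tr>\n",
    "<tr><td>INV-2</td><td>250.50</td></tr>\n",
    "</table>\n",
    "<p>many further lines that never get parsed</p>\n",
    "</body></html>\n",
]

grid = extract_grid(sample_lines, "grid")
print(grid)
```

For a file on disk, an open file object can be passed in place of the list, for example `with open(path, encoding="utf-8") as f: rows = extract_grid(f, "grid")`, where `path` and the encoding are assumptions about the file.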