I have a bunch of PDFs and want to extract a table from each of them, but data scraping is not working.
When I write the data to a text file, the headers appear 3 times, because the PDF has 3 pages and each page has the headers. How can I remove these headers?
I have to create a separate Excel file for every PDF.
Can you suggest an approach?
What’s the issue when you say that data scraping isn’t working?
Do you mean that your datatable (the one in which you store the extracted data) has the headers in it as rows, because every page starts with them?
You can go ahead and build the datatable from all the extracted data, then use lookup datatable on it with the header value as the lookup value. You'll get a row index returned; then simply remove that row using
You can add logic that runs this lookup datatable sequence in a loop, with a counter equal to the number of pages in the PDF.
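If it helps to see the idea outside the workflow designer, here is a rough Python sketch of the same logic. It assumes the table has already been extracted into a list of rows, with the header row repeated at the top of each page. The function name `drop_repeated_headers` and the sample rows are made up for illustration, and unlike the counter-based loop it keeps the first header and drops only the repeats.

```python
def drop_repeated_headers(rows):
    """Keep the first row as the header and drop later rows that repeat it."""
    if not rows:
        return []

    def normalize(row):
        # Compare cells case-insensitively and ignore surrounding whitespace,
        # since extracted text often differs slightly from page to page.
        return [str(cell).strip().lower() for cell in row]

    header = rows[0]
    header_key = normalize(header)
    body = [row for row in rows[1:] if normalize(row) != header_key]
    return [header] + body


# Hypothetical rows from a 3-page PDF where each page starts with the header.
rows = [
    ["Item", "Qty", "Price"],
    ["Pen", "2", "1.50"],
    ["Item", "Qty", "Price"],
    ["Ink", "1", "4.00"],
    [" item ", "QTY", "Price"],
    ["Pad", "3", "2.25"],
]

cleaned = drop_repeated_headers(rows)
for row in cleaned:
    print(row)
```

Running this once per PDF and writing each cleaned result to its own file would cover the one-file-per-PDF requirement. How the file is written (Excel or otherwise) is left out here because it depends on the tool you use.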