Problem with scraping after first PDF processed

Hello,

I have multiple PDFs and I'm gathering data from tables in those PDFs. The PDFs are very similar but not the same, especially the selectors. For example, in one PDF the table has an index selector:

In another PDF it's just a table without any index:
<role='table' />

Anyway, that's not the problem: I added an If condition with Element Exists that checks whether "idx=2" exists, and this works fine. I mention it in case you can point me to a better solution (I still haven't tested all the PDFs, and more conditions may be needed).

The problem is that scraping the first PDF works fine and the data is scraped correctly, but when the next PDF is opened and scraped, the result gets messed up. For example, it selects the whole document, or it scrapes the table but with the column order reversed (the first column comes last and the last comes first). I cannot build any logic around it when the scraping is this unpredictable.

Could you help me find a solution? The tables always have the same headers and the same number of columns, but I don't know whether I can use anchors for this. Scraping works fine, but only for the first PDF (no matter which one); the following ones are the problem. Example:

PDFs A and B
Scraped first: A - Scraping Correctly
Scraped second: B - Scraping wrongly

Scraped first: B - Scraping Correctly
Scraped second: A - Scraping wrongly

Thank you in advance community!

Without knowing details I can only guess…

It could be linked to variable initialization: when you scrape the first PDF all variables are empty, but when you scrape the second one they still contain values from the previous scrape.

Cheers

Hello @AndRewDev,

Maybe you could use a regex to get the data from the PDFs. It is more dynamic and more manageable :slight_smile:

You read the PDF, save it as text variable, and after that you can extract exact data from the PDF.
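A minimal sketch of that idea in Python, purely for illustration. The sample text, the row layout (name, integer quantity, decimal price) and the names `pdf_text` and `ROW_PATTERN` are all assumptions, not taken from the actual PDFs; a real document would need its own pattern.

```python
import re

# Hypothetical text, standing in for the output of a "read PDF text" step.
pdf_text = """Invoice No: INV-1001
Item        Qty   Price
Widget      2     9.50
Gadget      10    3.25
"""

# Assumed row layout: a one-word name, an integer quantity, a decimal price.
ROW_PATTERN = re.compile(
    r"^(?P<item>\w+)\s+(?P<qty>\d+)\s+(?P<price>\d+\.\d{2})$",
    re.MULTILINE,
)

# Each match becomes one dictionary keyed by the named groups.
rows = [m.groupdict() for m in ROW_PATTERN.finditer(pdf_text)]

for row in rows:
    print(row["item"], row["qty"], row["price"])
```

This only works when the extracted text follows a regular layout; if the rows come out in an inconsistent form, a pattern like this will not match reliably.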

You can use this website to test the regex if you are interested:

Regards! :slight_smile:

Hi @AndRewDev

As @Angel_Llull suggested, regex will be more helpful for extracting the data dynamically.

For an example, have a look at the link

Regards
Sudharsan

Thank you guys! Unfortunately I tried regex, but the output has no consistent pattern.

The problem with the scraping was the DataTable: I had to delete all of its columns after each loop iteration (after each single PDF is read).
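The same idea, sketched in Python rather than in the original workflow tool. `SimpleTable`, `scrape` and `process` are hypothetical names made up for this illustration; the point is only that the table reused across iterations is emptied (columns and rows) before the next PDF is read.

```python
from typing import Dict, List


class SimpleTable:
    """Hypothetical stand-in for a data table: column names plus rows."""

    def __init__(self) -> None:
        self.columns: List[str] = []
        self.rows: List[Dict[str, str]] = []

    def load(self, scraped: List[Dict[str, str]]) -> None:
        # Add any column not seen yet, then append the scraped rows.
        for row in scraped:
            for name in row:
                if name not in self.columns:
                    self.columns.append(name)
            self.rows.append(row)

    def reset(self) -> None:
        # Drop all columns and rows so the next PDF starts from an empty table.
        self.columns.clear()
        self.rows.clear()


def scrape(pdf_name: str) -> List[Dict[str, str]]:
    # Assumed placeholder for the real scraping step.
    return [{"Item": f"{pdf_name}-widget", "Qty": "1"}]


def process(pdf_names: List[str]) -> Dict[str, List[Dict[str, str]]]:
    table = SimpleTable()
    results: Dict[str, List[Dict[str, str]]] = {}
    for name in pdf_names:
        table.load(scrape(name))
        results[name] = list(table.rows)  # copy the rows before clearing
        table.reset()  # the fix: clear the table after each PDF
    return results


results = process(["A", "B"])
```

Without the `reset()` call, the rows and columns from PDF A would still be in the table when PDF B is loaded, which matches the "only the first PDF works" symptom.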

Problem solved, Thank you!
