Problem with scraping after first PDF processed

AndRewDev · October 15, 2021, 7:00am

Hello,

I have multiple PDFs and I'm gathering data from tables in those PDFs. The PDFs are very similar but not the same, especially their selectors. For example, in one PDF the table has an index selector:

In another PDF it's just a table without any index:
<role='table' />

Anyway, that's not the problem: I have added an If condition with Element Exists that checks whether "idx=2" exists, and this works fine. I mention it in case you can point me to a better solution (I still haven't tested all the PDFs, and more conditions may be needed).
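The "try the specific selector, fall back to the generic one" check described above can be sketched as an ordered list of candidates. This is a minimal Python illustration, not UiPath code: `pick_selector` is a hypothetical helper, the `exists` callable stands in for an Element Exists check, and the selector strings are assumed examples.

```python
from typing import Callable, Optional, Sequence

def pick_selector(candidates: Sequence[str], exists: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate selector that `exists` accepts, or None if none match."""
    for selector in candidates:
        if exists(selector):
            return selector
    return None

# Hypothetical selectors: the most specific one (with an index) is tried first,
# then the generic table selector without any index.
CANDIDATES = [
    "<role='table' idx='2' />",
    "<role='table' />",
]

if __name__ == "__main__":
    # Stand-in for an Element Exists check against the currently open PDF.
    present_in_pdf_b = {"<role='table' />"}
    print(pick_selector(CANDIDATES, lambda s: s in present_in_pdf_b))
```

Keeping the candidates in one ordered list means a new PDF variant needs one more entry rather than another nested If.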

The problem is that when I scrape the first PDF, everything is fine and the data is scraped correctly, but when the process opens and scrapes the next PDF, things get messed up. For example, it selects the whole document, or it scrapes the table but with the column order reversed (the first column ends up last and the last ends up first). I can't find any logic to it, since the scraping is unpredictable. Could you help me find a solution? The tables always have the same headers and number of columns, but I don't know whether I can use any anchors for them. Scraping works fine, but only for the first PDF (no matter which one); the following ones are the problem. Example:

PDFs A and B
Scraped first: A - Scraping Correctly
Scraped second: B - Scraping wrongly

Scraped first: B - Scraping Correctly
Scraped second: A - Scraping wrongly
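Since the tables always have the same headers, one possible safeguard against reversed columns is to treat the header row as the anchor and reorder columns by name rather than by position. A minimal Python sketch, assuming the scraped table arrives as a list of rows with the header row first; `EXPECTED_HEADERS` and `reorder_by_header` are hypothetical names chosen for illustration.

```python
from typing import List, Sequence

# Hypothetical header names; replace with the headers your tables actually use.
EXPECTED_HEADERS = ["Date", "Description", "Amount"]

def reorder_by_header(rows: List[List[str]], expected: Sequence[str]) -> List[List[str]]:
    """Reorder columns to follow `expected`, using the first row as the header row.

    Raises ValueError if any expected header is missing from the scraped header row.
    """
    header = [cell.strip() for cell in rows[0]]
    missing = [name for name in expected if name not in header]
    if missing:
        raise ValueError(f"headers not found: {missing}")
    # Position of each expected header inside the scraped header row.
    positions = [header.index(name) for name in expected]
    return [[row[i] for i in positions] for row in rows]

if __name__ == "__main__":
    # A table scraped with its column order reversed.
    scraped = [
        ["Amount", "Description", "Date"],
        ["10.00", "Pens", "2021-10-01"],
    ]
    for row in reorder_by_header(scraped, EXPECTED_HEADERS):
        print(row)
```

This does not fix whatever causes the reversal; it only makes the output independent of the scraped column order, and the ValueError flags cases where the whole document was captured instead of the table.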

Thank you in advance community!

J0ska · October 15, 2021, 7:14am

Without knowing details I can only guess…

It could be linked to variable initialization: when you scrape the first PDF, all the variables are empty, but when you scrape the second one, they still contain values from the previous scrape.
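The initialization issue described above can be reproduced in a few lines. This is a generic Python illustration rather than UiPath code, and both function names are hypothetical: a container created once outside the loop keeps the first PDF's data, while one created inside the loop starts empty for every PDF.

```python
from typing import Dict, List

def scrape_all_buggy(pdfs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    results: Dict[str, List[str]] = {}
    rows: List[str] = []            # created once, outside the loop
    for name, table in pdfs.items():
        rows.extend(table)          # later PDFs are appended to earlier PDFs' rows
        results[name] = list(rows)
    return results

def scrape_all_fixed(pdfs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    results: Dict[str, List[str]] = {}
    for name, table in pdfs.items():
        rows: List[str] = []        # fresh, empty container for every PDF
        rows.extend(table)
        results[name] = rows
    return results

if __name__ == "__main__":
    pdfs = {"A": ["a1", "a2"], "B": ["b1"]}
    print(scrape_all_buggy(pdfs)["B"])   # still contains A's rows
    print(scrape_all_fixed(pdfs)["B"])   # only B's rows
```

The same reasoning applies to any variable that is assigned once before the loop and reused across iterations.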

Cheers

Angel_Llull · October 15, 2021, 9:38am

Hello @AndRewDev,

Maybe you should use regex to get the data from the PDFs. It is more dynamic and more manageable.

You read the PDF, save the result in a text variable, and then you can extract the exact data you need from it.
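As a rough sketch of the read-then-extract approach, assuming the PDF text has already been read into a string (in UiPath, for example, with the Read PDF Text activity), a multiline regex can pull out one tuple per table row. The line format and the pattern below are hypothetical, and this only works when the extracted text actually follows a consistent layout.

```python
import re
from typing import List, Tuple

# Hypothetical row format: "<date> <description> <amount>", e.g. "2021-10-01 Pens 10.00".
# MULTILINE makes ^ and $ match at every line; [ \t]+ keeps a match on a single line.
LINE_PATTERN = re.compile(
    r"^(\d{4}-\d{2}-\d{2})[ \t]+(.+?)[ \t]+(\d+\.\d{2})$", re.MULTILINE
)

def extract_rows(pdf_text: str) -> List[Tuple[str, str, str]]:
    """Return (date, description, amount) tuples for every line matching the pattern."""
    return LINE_PATTERN.findall(pdf_text)

if __name__ == "__main__":
    sample = """Invoice 123
Date Description Amount
2021-10-01 Pens 10.00
2021-10-02 Printer paper 25.50
Total 35.50"""
    for row in extract_rows(sample):
        print(row)
```

Header, title, and total lines are skipped simply because they do not start with a date.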

If you're interested, you can use this website to test the regex:

Regards!

Sudharsan_Ka · October 15, 2021, 9:55am

Hi @AndRewDev

As @Angel_Llull suggested, regex will be more helpful for extracting the data dynamically.

For an example, have a look at the link.

Regards
Sudharsan

AndRewDev · October 15, 2021, 9:58am

Thank you guys! Unfortunately, I tried regex, but the output has no consistent pattern.

The scraping problem was caused by the DataTable: I had to delete all of its columns after each loop iteration (each single PDF read).

Problem solved, Thank you!
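One plausible way the reused DataTable could produce the symptoms above is sketched below. This is a simplified Python stand-in, not UiPath's DataTable: `Table`, `scrape_into`, and `process_pdfs` are hypothetical, and the "only add missing columns" behavior is an assumption made for illustration. When the table is not cleared, the second PDF's rows land under columns created by the first PDF (and next to its old rows); clearing the columns after each PDF, as in the fix above, avoids that.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Table:
    """Tiny stand-in for a DataTable: column names plus rows of values."""
    columns: List[str] = field(default_factory=list)
    rows: List[List[str]] = field(default_factory=list)

    def clear(self) -> None:
        """Drop all columns and rows, like deleting every column after each PDF."""
        self.columns.clear()
        self.rows.clear()

def scrape_into(table: Table, header: List[str], rows: List[List[str]]) -> None:
    # Assumed behavior: only missing columns are added, so columns left over from
    # a previous PDF keep their old order and new rows may no longer line up.
    for name in header:
        if name not in table.columns:
            table.columns.append(name)
    table.rows.extend(rows)

def process_pdfs(pdfs: Dict[str, Tuple[List[str], List[List[str]]]],
                 reset_each_time: bool = True) -> Dict[str, Table]:
    table = Table()
    outputs: Dict[str, Table] = {}
    for name, (header, rows) in pdfs.items():
        scrape_into(table, header, rows)
        # Snapshot the table as it looks after processing this PDF.
        outputs[name] = Table(list(table.columns), [list(r) for r in table.rows])
        if reset_each_time:
            table.clear()
    return outputs

if __name__ == "__main__":
    pdfs = {
        "A": (["Date", "Amount"], [["2021-10-01", "10.00"]]),
        "B": (["Amount", "Date"], [["25.50", "2021-10-02"]]),
    }
    print(process_pdfs(pdfs, reset_each_time=False)["B"])  # stale columns, mixed rows
    print(process_pdfs(pdfs, reset_each_time=True)["B"])   # clean table for B
```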

system · October 18, 2021, 9:58am

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
