I am trying to scrape invoice PDFs using the Read PDF activity and capture the required field values from the scraped data. Scraping itself works fine for any PDF type, but the order in which the text is scraped is not consistent.
See the attached Invoice.zip (12.2 KB), which contains the PDF and its scraped data in a txt file.
For a multi-line field, the text is sometimes scraped field by field, and sometimes the first line of every field is scraped before the second line of any field. For example, this two-line table header:
| Unit \n Price | Total \n Amount |
Is scraped as:
and sometimes it is scraped as:
Is there a way to keep the scraped data consistent?
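
To make the two orderings concrete, here is a minimal Python sketch (not UiPath code) that builds both token orders for a known set of multi-line headers and reports which one a scraped string matches. The names `fieldwise`, `linewise`, `detect_layout` and `HEADERS` are made up for illustration, and it assumes the header labels are known in advance and that every line of a label comes through intact.

```python
from itertools import zip_longest

# Hypothetical header definition: one inner list per field, one entry per line.
HEADERS = [["Unit", "Price"], ["Total", "Amount"]]


def fieldwise(headers):
    """Words in the order produced when each field is scraped as a whole."""
    text = " ".join(line for header in headers for line in header)
    return text.split()


def linewise(headers):
    """Words in the order produced when line 1 of every field is scraped,
    then line 2 of every field, and so on."""
    rows = zip_longest(*headers)  # row i holds line i of each field
    text = " ".join(line for row in rows for line in row if line is not None)
    return text.split()


def detect_layout(scraped_text, headers):
    """Return 'fieldwise', 'linewise' or 'unknown' for a scraped header."""
    words = scraped_text.split()  # split on any whitespace, including newlines
    if words == fieldwise(headers):
        return "fieldwise"
    if words == linewise(headers):
        return "linewise"
    return "unknown"
```

Once the layout is known, the same field names could be assigned in either case, for instance by always mapping the result back to the labels in `HEADERS`. This only covers the header row; data rows would need their own handling.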