Native scraping method for PDF

Indiroy · February 5, 2019, 10:02am

Hi there!
I’m trying to get information from a PDF file.

“Read PDF” works, but the problem is that it gives me unstructured data (a mix of numbers and letters).
“Get text” captures by block, not by specific field (which is not helpful for me).
Screen scraper’s methods:
“Full text” works the same as “Read PDF”: no structure, mixes numbers and letters.
“Microsoft OCR” does not recognize the text.
“Google OCR” recognizes it badly.
“Native” recognizes the text very well; however, when I’m trying to output the data through “message box” or “write line”, it gives me an empty field.

The question is: how can I output data from a PDF file through the native scraping method?

P.S. In some forum topics I saw Regex and split on spaces mentioned. What are those methods?

sandipauti · February 5, 2019, 10:16am

@Indiroy - you can use the image scraping method to scrape data from the PDF.

PAD · February 5, 2019, 10:18am

Hi @Indiroy
You have raised quite a few issues here. Just to answer the part of your question from the postscript: a regex (regular expression) is a sequence of characters that defines a search pattern. Usually this pattern is used by string-searching algorithms for “find” or “find and replace” operations on strings, or for input validation (quoting Wikipedia). In practice, you can find out more and test this on websites like the ones below (though you can surely google many more than that):

One application of regex can be found, for example, here:
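
For a quick feel of the two techniques asked about in the P.S., here is a minimal sketch in Python (not UiPath code). The sample string, the field labels and the variable names are all made up for illustration; the text you get from your own PDF will look different, so the patterns would need adjusting.

```python
import re

# Hypothetical text, standing in for the unstructured string
# that a PDF read step might return.
raw_text = "Invoice No: 48213 Date: 05/02/2019 Total: 1,250.00 EUR"

# "Split on spaces": cut the string into tokens at every run of whitespace,
# then pick a value by its position in the resulting list.
tokens = raw_text.split()

# Regex: describe the shape of the value and search for it,
# so the result does not depend on the token position.
invoice_match = re.search(r"Invoice No:\s*(\d+)", raw_text)
total_match = re.search(r"Total:\s*([\d,]+\.\d{2})", raw_text)

# A failed search returns None, so guard before reading the captured group.
invoice_no = invoice_match.group(1) if invoice_match else None
total = total_match.group(1) if total_match else None

print(tokens[2])
print(invoice_no)
print(total)
```

Splitting is simple but breaks as soon as the layout shifts by one word, while a regex anchored on a label such as "Total:" tends to survive small layout changes. UiPath Studio runs on .NET, where the corresponding building blocks are `String.Split` and the `Regex` class from `System.Text.RegularExpressions`.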
