How to stop extraction if a line in a pdf form has no data

chlmy · November 10, 2020, 7:06am

How do I extract data from a PDF form only when it has been filled in, and stop the extraction when it is empty? For example, I want to extract the details of the person in the form and stop if those details are missing.

Maria2na · November 10, 2020, 7:36am

When handling data from documents, there are various ways to get it:

1. Open the PDF and scrape the data; based on what you get, continue scraping the rest.
2. Use Read PDF, then use a regex to match the needed field; based on that match, continue extracting the rest of the fields.
3. Use Document Understanding and handle the data from its results. The document seems to be a fixed form, so FormExtractor should be very easy to configure.
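
To make the regex option more concrete, here is a rough sketch of the logic in Python rather than as a UiPath workflow. It assumes the PDF text has already been read into a string and that each field sits on its own line as "Label: value"; the field list, that layout, and the function name `extract_fields` are all assumptions for illustration, not something taken from the actual form.

```python
import re

# Hypothetical field labels; assumes each one appears as "Label: value" on its own line.
FIELDS = ["Name", "IC", "Nationality", "DOB", "Address"]


def extract_fields(text):
    """Return a dict of field values, or None if the first field is empty or missing."""
    result = {}
    for label in FIELDS:
        # [ \t]* rather than \s* so the match cannot spill over into the next line.
        pattern = rf"^{re.escape(label)}[ \t]*:[ \t]*(.*)$"
        match = re.search(pattern, text, re.MULTILINE)
        value = match.group(1).strip() if match else ""
        if label == FIELDS[0] and not value:
            return None  # the form looks empty, so stop extracting here
        result[label] = value
    return result
```

The check is made on the first field only, on the assumption that a form without a name is an empty form. If any missing field should stop the extraction, drop the `label == FIELDS[0]` condition.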

chlmy · November 10, 2020, 7:45am

Hi,

For method 1, I only need to scrape specific data: Name, IC, Nationality, DOB and Address.
I am using method 2, but without regex: I use specific array indices to extract the needed data.
For method 3, Document Understanding has a limit on the total file size of the PDFs (mine have 4 pages each), and I cannot extract empty fields.
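
For the index-based approach, one possible way to stop at the first empty line is sketched below in Python (not as a UiPath workflow). The line positions in `FIELD_LINES` and the name `extract_by_index` are hypothetical; the sketch assumes the page text is split into lines and that each value sits at a fixed position.

```python
# Hypothetical positions of each value after splitting the page text into lines.
FIELD_LINES = {"Name": 0, "IC": 1, "Nationality": 2, "DOB": 3, "Address": 4}


def extract_by_index(text):
    """Collect values by line position and stop at the first empty or missing line."""
    lines = text.splitlines()
    person = {}
    for label, index in FIELD_LINES.items():
        # Guard the index so a short page yields an empty value instead of an error.
        value = lines[index].strip() if index < len(lines) else ""
        if not value:
            break  # no data on this line: stop extracting
        person[label] = value
    return person
```

Fixed positions only hold if the text reader keeps blank lines in place. If it drops them, every later index shifts, which may be a reason to match on labels instead.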
