Extracting data(number) from different types of document - Digitize document

Mateusz_Koper · March 9, 2021, 5:29pm

Hi,
I want to build a process for extracting a specific tax number from different types of scanned documents.

What is the best approach for extracting the number from the documents? The task is a bit complex because in each document the number is placed in a different location, with a different label. I’ve used the taxonomy to create document types, then Digitize Document, and classified each document into the appropriate category based on a keyword. Only the extraction part is left now, so any ideas on what the best option for extraction would be?

Thanks in advance

prasath17 · March 9, 2021, 6:46pm

@Mateusz_Koper - We can try the Regex Based Extractor to extract the tax number, if it follows a certain pattern. Please see the example below…
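
To illustrate the regex idea outside of Studio, here is a rough Python sketch of a single pattern that tolerates several labels and number layouts. The label list and the 10-digit format are assumptions made up for this example, not taken from the actual documents, and `extract_tax_number` is a hypothetical helper name. The same pattern idea could probably be adapted for the extractor's configuration, but that would need checking against the real scans.

```python
import re

# Hypothetical labels; replace with the ones that appear on the real documents.
LABELS = ["Tax ID", "Tax No", "Tax Number", "VAT No"]

# Assumed format: 10 digits, optionally grouped with hyphens or spaces.
NUMBER = r"\d(?:[ -]?\d){9}"

PATTERN = re.compile(
    r"(?:" + "|".join(re.escape(label) for label in LABELS) + r")"
    r"\s*[.:#]?\s*"               # optional separator after the label
    r"(" + NUMBER + r")(?!\d)",   # capture the number, reject longer digit runs
    re.IGNORECASE,
)


def extract_tax_number(text):
    """Return the first labelled tax number as bare digits, or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    # Strip the grouping characters so every document yields the same shape.
    return re.sub(r"[ -]", "", match.group(1))


if __name__ == "__main__":
    print(extract_tax_number("Seller Tax No: 123-456-78-90"))
    print(extract_tax_number("tax id 987 654 32 10"))
    print(extract_tax_number("Invoice total 1234567890"))
```

Anchoring on the label rather than on the position is what makes this work when the number sits in a different place in each document type. A text without any of the known labels returns `None`, which can be used to flag the document for a manual check.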

Mateusz_Koper · March 11, 2021, 9:28am

Thanks, I’m already using regex.
So how should I handle the extracted data next? Is it necessary to use the “Present Validation Station” activity and accept the extracted data manually, or can I, for example, write everything to Excel directly, without any manual user intervention?
