Document Understanding - Issue with the incoming input files

Hi,

I am processing input files of the same format using Document Understanding.

However, at times I receive an input file with a blank page in between the pages of the actual format.

Example: the actual input file has 4 pages, but at times I receive 5 pages, with page 2 blank.

The extraction then fetches the wrong data.

How can this be fixed?

Thanks.

Hi @krishbcd ,

Could you let us know which extractors are used?

Also, is a classifier used?

Intelligent Keyword Classifier and Form Extractor.

The extraction is set up using an input file with 6 pages, with matching info provided for each page.

But the second file has a blank page at page 2, so it has 7 pages in total and the data is not extracted correctly.

When I change the extractor to use 7 pages, it works for the second file, but then the first file with 6 pages is not extracted properly.

Is there any way to use anchor-based matching instead of page-number matching info?

@krishbcd

One way is to preprocess the PDF before sending it for extraction.

You can use a For Each loop to read each page of the PDF, check whether the page is empty, and send only the non-empty pages to extraction, so that the extraction is never affected by a blank page.

To check, first use Get PDF Page Count to get the page count, then loop through the pages using Enumerable.Range(1,pdfcount).ToArray

Inside the loop, use Read PDF Text on a specific page, passing currentitem.ToString as the range.

Say the output is stored in str. Then use an If condition with Not str.Trim.Equals(String.Empty), so that it is true only for pages that contain text.

On the Then side, save the page number to a list, say list1 of type List(Of String): use Append Item to List and add currentitem.ToString

After the loop, use Extract PDF Page Range and give the range as String.Join(",",list1)

This will produce a PDF that contains only the pages with data.
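
To make the logic of the steps above easier to follow, here is a minimal sketch in Python rather than in UiPath activities. It only illustrates the filtering idea: it assumes the text of each page has already been read into a list, and the names non_empty_page_range and page_texts are made up for this example. It keeps the 1-based numbers of the pages whose trimmed text is not empty and joins them with commas, which is the same kind of string that String.Join(",",list1) produces for the page range.

```python
from typing import List


def non_empty_page_range(page_texts: List[str]) -> str:
    """Return a comma-separated list of 1-based page numbers that contain text.

    page_texts is a hypothetical input: one string per page, in page order,
    as if each page had been read separately.
    """
    kept = []
    for page_number, text in enumerate(page_texts, start=1):
        # Same idea as Not str.Trim.Equals(String.Empty):
        # keep the page only if something remains after trimming whitespace.
        if text.strip():
            kept.append(str(page_number))
    return ",".join(kept)


if __name__ == "__main__":
    # A 5-page file where page 2 is blank.
    pages = ["Invoice header", "   \n", "Line items", "Totals", "Signature"]
    print(non_empty_page_range(pages))  # prints 1,3,4,5
```

With the blank page filtered out, the remaining pages line up with the original 4-page layout again, so the page-based matching info in the extractor does not need to change.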

cheers