Hey everyone!
I am trying to build a process that:
- Takes 10-K filing PDFs, which can be anywhere from 50 to 500+ pages
- Splits each PDF into smaller files of up to 100 pages
- For each split file, uses Digitize Document with a For Each to go through each page searching for keywords
- For pages where the keywords are found, uses Extract PDF Page Range to get a 2-page document and runs it through Extract Document Data (with the Generative extractor) to grab the paragraph needed
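
In case it helps to see the logic outside of Studio, here is a rough Python sketch of the page arithmetic behind those steps. It is only an illustration: the real workflow uses UiPath activities, the function names are made up, and taking "the hit page plus the next page" as the 2-page range is an assumption.

```python
from typing import List, Sequence, Tuple


def split_ranges(total_pages: int, max_pages: int) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) ranges of at most max_pages pages."""
    if total_pages < 1 or max_pages < 1:
        raise ValueError("total_pages and max_pages must be positive")
    return [
        (start, min(start + max_pages - 1, total_pages))
        for start in range(1, total_pages + 1, max_pages)
    ]


def pages_with_keywords(page_texts: Sequence[str], keywords: Sequence[str]) -> List[int]:
    """Return 1-based page numbers whose text contains any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if any(k in text.lower() for k in lowered)
    ]


def two_page_range(page: int, total_pages: int) -> Tuple[int, int]:
    """Assumed rule: the hit page plus the following page, clamped to the last page."""
    if not 1 <= page <= total_pages:
        raise ValueError("page is outside the document")
    return (page, min(page + 1, total_pages))


if __name__ == "__main__":
    # A 293-page filing split with a 150-page cap.
    print(split_ranges(293, 150))
    # Hypothetical page texts standing in for the digitized output.
    texts = ["Cover page", "Item 1A. Risk Factors", "Item 7. MD&A"]
    for hit in pages_with_keywords(texts, ["risk factors"]):
        print(hit, two_page_range(hit, len(texts)))
```

With a 150-page cap, a 293-page file comes out as pages 1-150 and 151-293.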
The first file has 293 pages, which I split into 2 files: pages 1-150 and pages 151-293. The 1-150 file processes fine, but on the second one Digitize Document throws the following error: “Digitize Document: 890 0 R : Indirect reference was not dereferenced.”
The “890” number changes depending on how I split the PDF (150 max pages, 100 max pages, etc.).
I’ve tried a bunch of other ways to process these documents, but I can’t find the root cause of this problem. Any ideas?
exceptionDetails:
RemoteException wrapping System.InvalidOperationException: 890 0 R : Indirect reference was not dereferenced.
- at .()
- at .( , Predicate`1 )
- at .[]( , Predicate`1 )
- at .( , )
- at .( , )
- at .(Int32 , )
- at .(Int32 , )
- at UiPath.DocumentUnderstanding.Digitizer.Pdf.Docotic.DocoticPdfDocument.<>c__DisplayClass25_0.b__0()
- at UiPath.DocumentUnderstanding.Digitizer.Pdf.Docotic.DocoticExceptionHelper.HandleDocoticExceptions[T](Func`1 func)
- at UiPath.DocumentUnderstanding.Digitizer.Digitization.Preprocessing.PdfDigitizationDocument.GetPageStreamThreadSafe(Int32 pageNumber)
- at UiPath.DocumentUnderstanding.Digitizer.Digitization.Preprocessing.PdfDigitizationDocument.GetPage(Int32 pageNumber)
- at UiPath.IntelligentOCR.Activities.Digitization.DigitizationActivityScheduler.ScheduleProcessingTask[T](Func`1 func, CancellationToken token)
- at UiPath.DocumentUnderstanding.Digitizer.Digitization.PageDigitizer.ProcessPage(IDigitizationDocument digitizationDocument, Int32 pageNumber, IOcrEngine ocrEngine, Boolean shouldApplyOcr, DigitizationSettings settings, String contentId, CancellationTokenSource source)
- at UiPath.DocumentUnderstanding.Digitizer.Digitization.DocumentDigitizer.GetPages(Content content, DigitizationSettings settings, IOcrEngine ocrEngine, CancellationToken token)
- at UiPath.DocumentUnderstanding.Digitizer.Digitization.DocumentDigitizer.Digitize(Content content, DigitizationSettings settings, IOcrEngine ocrEngine, CancellationToken token)
- at UiPath.IntelligentOCR.Digitization.IntelligentOcrDigitizer.Digitize(Content content, IOcrEngine ocrEngine, ApplyOcrOnPdf applyOcrOnPdf, Boolean detectCheckboxes, IDigitizationScheduler scheduler, IDigitizerTelemetryService telemetryService, CancellationToken token)
- at UiPath.IntelligentOCR.Activities.Digitization.DigitizeDocument.ExecuteAsync(NativeActivityContext context, CancellationToken cancellationToken)
- at UiPath.Shared.Activities.AsyncTaskNativeImplementation.BookmarkResumptionCallback(NativeActivityContext context, Object value)
- at UiPath.Shared.Activities.AsyncTaskNativeActivity.BookmarkResumptionCallback(NativeActivityContext context, Bookmark bookmark, Object value)
- at System.Activities.Runtime.BookmarkCallbackWrapper.Invoke(NativeActivityContext context, Bookmark bookmark, Object value)
- at System.Activities.Runtime.BookmarkWorkItem.Execute(ActivityExecutor executor, BookmarkManager bookmarkManager)