Hi,
I have a DU project that handles one PDF at a time and a single document type called "statements". Some PDFs now contain multiple statements in one file. There is a keyword that marks the end of each statement.
Opt 1
One way to handle this is to split the PDF by keyword [PDF activity].
Opt 2
I see a split document option under the classifier. I tried it with the Intelligent Keyword Classifier and gave the keyword as “summary”, but it could not detect the file at all and the result was empty.
Can someone tell me whether a DU classifier best suits my need? If yes, how can I achieve it?
@phoenixacademy,
I would suggest trying opt 1, because you have a fixed rule for identifying the statements, and it will also save your AI units.
Hey @phoenixacademy,
You can handle multiple statements in a single PDF by using the Split Document option in the Classify Document Scope with a suitable classifier, such as the Intelligent Keyword Classifier or the Regex Based Classifier. Configure the keyword that marks the boundary, then map it in the taxonomy so the documents are split correctly. If splitting fails, first validate that the keyword is being reliably extracted by OCR or digitization.
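To make the "validate the keyword" step concrete, here is a minimal Python sketch of the check. It assumes you have already exported the digitized text as one string per page (for example by writing it to a log or text file); the function names and the sample pages are hypothetical and are not part of any UiPath API.

```python
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so OCR line breaks do not hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()

def pages_with_keyword(page_texts, keyword):
    """Return the 1-based page numbers whose text contains the keyword."""
    needle = normalize(keyword)
    return [
        page_no
        for page_no, text in enumerate(page_texts, start=1)
        if needle in normalize(text)
    ]

# Hypothetical digitized text, one string per page.
pages = [
    "Account  Statement\nJan",
    "Transactions ...",
    "Account\nSUMMARY total",
    "Account Statement Feb",
    "Summary",
]

print(pages_with_keyword(pages, "summary"))
```

If the list comes back empty for a keyword you can see in the PDF, the problem is probably in digitization (OCR quality, spelling, or a keyword split across lines) rather than in the classifier settings.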
Is it possible to retrain a model built using a Document Understanding modern folder?
Hi @phoenixacademy
You cannot directly retrain models built with a Document Understanding modern folder. Use split document by keyword to separate the PDFs, then build a new model with the updated data. Apply the DU classifier after splitting to improve accuracy, reduce errors, and save time.
If helpful, mark as solution. Happy automation with UiPath
Classifiers are meant to detect document types, not to physically split pages unless the classifier supports it.
The Split Document option in the Classify Document Scope works only if the classifier can recognize where each document begins/ends.
The Intelligent Keyword Classifier works page by page:
If a page contains a keyword, it classifies it as that document type.
But if your keyword is at the end of the statement, DU will classify only that last page; it will not automatically “group” the previous pages up to the next keyword.
That’s why your result was empty or incomplete: DU didn’t find a page that matched the keyword as a starting page.
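The page-by-page behavior described above can be illustrated with a toy model in Python. This is only a sketch of the idea, not the actual Intelligent Keyword Classifier; the function name, the sample pages, and the "statements" label are assumptions made for the example.

```python
def classify_pages(page_texts, keyword, doc_type="statements"):
    """Toy page-level classifier: label a page only if it contains the keyword."""
    needle = keyword.lower()
    return [doc_type if needle in text.lower() else None for text in page_texts]

# Hypothetical PDF with two statements; the keyword sits on each last page.
pages = [
    "Statement A page 1",
    "Statement A page 2",
    "Statement A summary",
    "Statement B page 1",
    "Statement B summary",
]

labels = classify_pages(pages, "summary")
print(labels)  # [None, None, 'statements', None, 'statements']
```

Only pages 3 and 5 get a label. Nothing in this page-level view ties pages 1 and 2 to page 3, which is why an end-of-statement keyword alone does not produce complete documents.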
Recommended:
Use the PDF Split by Keyword activity before sending documents to DU.
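For reference, the splitting rule itself is simple. Below is a minimal Python sketch of it, assuming the page text is already available as one string per page. The function name and sample data are hypothetical; actually reading the text and writing the page ranges out as separate PDFs would be done by the PDF activity or a PDF library and is not shown here.

```python
def split_on_end_keyword(page_texts, keyword):
    """Return 1-based inclusive (first_page, last_page) ranges.

    A range is closed on every page that contains the end keyword.
    """
    ranges = []
    start = 1
    needle = keyword.lower()
    for page_no, text in enumerate(page_texts, start=1):
        if needle in text.lower():
            ranges.append((start, page_no))
            start = page_no + 1
    # Keep trailing pages that never reached an end keyword as their own range.
    if start <= len(page_texts):
        ranges.append((start, len(page_texts)))
    return ranges

pages = [
    "Statement A page 1",
    "Statement A page 2",
    "Statement A summary",
    "Statement B page 1",
    "Statement B summary",
]

print(split_on_end_keyword(pages, "summary"))  # [(1, 3), (4, 5)]
```

Each range can then be saved as its own PDF and sent through the existing single-statement DU workflow unchanged. Treating leftover pages as a final range is a design choice in this sketch; you may prefer to flag them for manual review instead.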