When processing multiple document types in the same execution, using the Document Understanding framework triggered by a queue, what is a good approach when one of the documents fails but the others are processed successfully?
One possible solution that came to mind was to send only the part of the document that failed for reprocessing as a new queue item, but I'm not sure this is the best solution for high-volume cases, or for maintenance and process auditing.
I didn't find any use cases in the documentation, so if anyone has faced this situation, it would be really good to understand how it can be handled!
To start with, when something fails because of a system issue, it is always good to retry before marking it as an exception.
So ideally, adding it back to the queue to reprocess it is a good approach.
Also, keep in mind that AI Center involves SKUs/AI units that get consumed. If we are extracting information from multiple documents in the same transaction and the bot fails after extracting one or two, and the exception is unknown, we can retry. But to keep it maintainable, it is better if the process can continue from where it stopped, or reuse the document data already extracted instead of extracting it again, which saves AI units. Generally we achieve this by saving the extracted data immediately to a shared location, so that on a retry we can first check whether some data has already been extracted and then proceed accordingly.
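A minimal sketch of that check-before-extract idea, written in Python just to illustrate the pattern (in a real project this would be UiPath activities; the cache folder path and the helper names here are assumptions for illustration only):

```python
import json
from pathlib import Path

# Hypothetical shared location where each transaction caches its extraction results.
CACHE_DIR = Path(r"\\shared\du_extraction_cache")

def cache_path(transaction_id: str, document_id: str) -> Path:
    return CACHE_DIR / transaction_id / f"{document_id}.json"

def load_cached_extraction(transaction_id: str, document_id: str):
    """Return previously extracted data if this document was already processed."""
    path = cache_path(transaction_id, document_id)
    if path.exists():
        return json.loads(path.read_text())
    return None

def save_extraction(transaction_id: str, document_id: str, data: dict) -> None:
    """Persist extraction results immediately, before any post-processing runs."""
    path = cache_path(transaction_id, document_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data))

def process_document(transaction_id: str, document_id: str, extract_fn):
    """On retry, reuse cached data so AI units are not consumed twice."""
    data = load_cached_extraction(transaction_id, document_id)
    if data is None:
        data = extract_fn(document_id)  # the expensive DU extraction call
        save_extraction(transaction_id, document_id, data)
    return data
```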
We had the same scenario in one of our projects, where we had to deal with 60+ document types, and these were huge files.
In our case, we programmed the bot to deal with failures differently depending on the type of failure. For example:
If it was a failure that occurred while trying to connect to the ML model, we retried that activity.
If it was a failure in digitization, depending on the error, we retried the activity.
We also checked whether the error was due to an issue in the logic (an unhandled exception) or something wrong with the file itself (corrupted data or file), and retried accordingly.
So you basically need to identify the scenarios where a retry makes sense and attempt it for those, while the rest can go out as alerts to business users.
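A rough sketch of that decision logic, assuming hypothetical exception categories and a generic retry helper (names like MlSkillConnectionError and DigitizationError are illustrative, not actual UiPath APIs):

```python
import time

# Illustrative exception categories; map your actual activity exceptions onto these.
class MlSkillConnectionError(Exception): ...
class DigitizationError(Exception): ...
class CorruptedFileError(Exception): ...

def with_retry(action, attempts=3, delay_seconds=30):
    """Retry transient failures a few times before giving up."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except (MlSkillConnectionError, DigitizationError):
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)

def handle_document(document, classify, extract, notify_business):
    try:
        doc_type = with_retry(lambda: classify(document))          # transient errors retried
        return with_retry(lambda: extract(document, doc_type))
    except CorruptedFileError as err:
        # File-level problems cannot be fixed by retrying: alert the business users.
        notify_business(document, str(err))
    except Exception:
        # Unhandled/logic errors: surface them so the code can be fixed; do not retry blindly.
        raise
```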
I think I understood the part about retrying mapped situations, but the unhandled part is not very clear to me.
In the unhandled case, how are you going to reprocess just the document type that failed? Does the business need to resubmit only the small part that failed, once you understand what caused the exception? I think this is the confusion in my mind.
Well, if it is failing because the file is corrupted or there is some issue with the file, then yes, the business team needs to submit the fixed file again.
However, if it is just a specific part of the file failing → we need to see what is causing the failure. Ideally, it should not fail like that after you have the classification and the extraction data unless it’s some unhandled issue in the code.
Do you have a specific issue that happened and that you are trying to resolve?
I don't have a specific issue yet because I'm in the development phase, so I'm just trying to understand how to do this part of exception handling, since it's the first time I'm working with different types of documents.
My big difficulty is with unhandled situations, for example something in the logic that is not covered and could not be mapped beforehand. This kind of situation is most common when you first launch the project.
So it's not clear how I'm going to reprocess only the small part that failed while minimizing costs. For example, if I have already extracted the data and it fails in post-processing (an unhandled BRE, for example), it should not be necessary to extract it again because I should already have it, right? I just need to reprocess and make that specific part go from the reprocessing point until the end.
Not sure if I'm explaining this clearly; tell me if you don't get what I'm trying to say.
Agreed, this can happen, and it has happened many times on my projects too. We had an approach like this:
For the items that get classified and extracted but fail to move into post-processing for some reason -
We maintained a log table with all the extracted information. We had a separate simple process to handle all the items that failed due to unhandled exceptions.
Note: we used a queue item for each document to be processed.
We identified all the failed queue items, got the data from the log table, performed the remaining steps that had not happened, and pushed the item to the post-processing step. The post-processing stage gets its data from another queue, which contains all the information extracted from the document.
If some error occurs in post-processing → we retry the queue item that feeds the data to that process. However, if it keeps failing due to an application exception, we fix that unhandled exception and redo the queue item. So it is just the post-processing job that gets executed, not the DU part.
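A rough sketch of that recovery process, under the assumption that the extracted data was logged per queue item (the orchestrator/log-table helper names here are illustrative, not the actual project code):

```python
# Hypothetical recovery dispatcher: re-feeds failed items into post-processing
# using the extraction data that was already logged, so DU never runs twice.

def recover_failed_items(orchestrator, log_table, postprocessing_queue):
    # 1. Find the queue items that failed after classification/extraction succeeded.
    failed_items = orchestrator.get_queue_items(status="Failed")

    for item in failed_items:
        document_id = item.reference

        # 2. Fetch the data the DU workflow already logged for this document.
        extracted = log_table.get(document_id)
        if extracted is None:
            # Nothing was logged, so this document really does need the full DU run again.
            orchestrator.requeue(item)
            continue

        # 3. Push only the post-processing work, carrying the saved extraction results.
        postprocessing_queue.add_item(
            reference=document_id,
            payload={
                "document_type": extracted["document_type"],
                "fields": extracted["fields"],
            },
        )
```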
So basically we have an architecture with three levels:
Pre-processing - collecting documents and feeding them into a queue
DU workflow - consumes that queue, classifies and extracts the data from the document mentioned in each queue item, and writes the extracted data into a second queue
Post-processing - consumes the second queue and performs post-processing activities such as updating business applications, etc.
With this split, we can easily build supporting processes around it to handle unhandled exceptions and push those items back through the appropriate stage of the original workflow.
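A minimal sketch of that three-level split with the two queues made explicit (the stage functions, queue objects, and parameter names are assumptions for illustration, not the actual workflows):

```python
# Stage 1: pre-processing dispatcher - collect documents and feed the DU queue.
def dispatcher(documents, du_queue):
    for doc in documents:
        du_queue.add_item(reference=doc.id, payload={"file_path": doc.path})

# Stage 2: DU performer - classify/extract each queued document, then feed the
# results into the post-processing queue (the expensive AI-unit work lives here).
def du_performer(du_queue, postprocessing_queue, classify, extract):
    for item in du_queue:
        doc_type = classify(item.payload["file_path"])
        fields = extract(item.payload["file_path"], doc_type)
        postprocessing_queue.add_item(
            reference=item.reference,
            payload={"document_type": doc_type, "fields": fields},
        )

# Stage 3: post-processing performer - only business-system updates, no DU calls,
# so retrying or fixing this stage never consumes AI units again.
def postprocessing_performer(postprocessing_queue, update_target_system):
    for item in postprocessing_queue:
        update_target_system(item.payload["document_type"], item.payload["fields"])
```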
Thanks @Lahiru.Fernando, I understand your approach. It's very similar to @Anil_G's idea of having the extracted data stored so you can use it later if you need to reprocess. This way you avoid the reprocessing costs generated by unhandled errors.
All this discussion made me think about why in DU we process a big document containing different document types in the same execution, instead of processing them one by one as transaction items. That scenario is possible, but then we would have a lot of transactions and jobs running at the same time. Still, it seems easier to handle exceptions that way than with one big document.
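A small sketch of that alternative, splitting the classification results of one big file into separate transaction items so a failure affects only one item (the shape of the classification result here is an assumption, not the actual DU output format):

```python
# Hypothetical: after classifying one big file, enqueue each detected document
# type/section as its own transaction.
def enqueue_per_document_type(file_path, classification_results, queue):
    for section in classification_results:
        queue.add_item(
            reference=f"{file_path}#{section['document_type']}",
            payload={
                "file_path": file_path,
                "document_type": section["document_type"],
                "page_range": section["page_range"],
            },
        )
```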
A big document also adds complexity to the classification approach sometimes.