I’m working on a small project where I want a bot to extract details from bus, train or flight tickets. I need data such as the date, departure, arrival and amount.
The challenge I am facing is that the ticket format differs depending on the travel operator or on how the ticket was booked.
I’m getting correct output only for the file format that I added as a template in the Form Extractor. For other files it does not work as expected: I don’t get any error in my workflow, but the output is null.
Also, in the Present Validation Station the bot is not identifying the document type, even though it is able to classify the document correctly.
Please note that I’m not using invoices here; I’m using travel tickets.
At first glance, we cannot tell whether there is a finite number of ticket formats. If there is, then maybe we could use a Regex Extractor for each ticket type.
But if the ticket formats are not fixed, meaning there can be any number of them, the Regex Extractor would not be much help unless the values to extract are always preceded by common keywords across all the different tickets.
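To illustrate the "common keywords" condition, here is a minimal Python sketch using plain `re`, not the UiPath Regex Extractor itself. It assumes (hypothetically) that tickets label their fields "Date:", "From:", "To:" and "Amount:"; those labels and the sample texts are invented for illustration. A ticket that uses different labels simply gets None for every field, much like the null output described above.

```python
import re

# Hypothetical shared labels: only works if EVERY ticket format uses them.
COMMON_PATTERNS = {
    "date": re.compile(r"Date:\s*(\d{2}[/-]\d{2}[/-]\d{4})"),
    "departure": re.compile(r"From:\s*([A-Za-z ]+?)\s*$", re.MULTILINE),
    "arrival": re.compile(r"To:\s*([A-Za-z ]+?)\s*$", re.MULTILINE),
    "amount": re.compile(r"Amount:\s*(?:INR|Rs\.?)?\s*([\d,]+(?:\.\d{2})?)"),
}


def extract_fields(text):
    """Return field -> value, using None when a label is not found."""
    result = {}
    for field, pattern in COMMON_PATTERNS.items():
        match = pattern.search(text)
        result[field] = match.group(1).strip() if match else None
    return result


# A ticket that uses the shared labels is extracted correctly.
print(extract_fields("Date: 12/03/2024\nFrom: Pune\nTo: Delhi\nAmount: INR 4,500.00\n"))
# A ticket with different labels yields None for every field.
print(extract_fields("Journey Date 12-03-2024\nOrigin PNQ\nDestination DEL\nTotal Fare 4500\n"))
```

As soon as one operator writes "Origin" instead of "From:", the shared patterns stop matching, which is why per-format rules or a trained model are needed when keywords vary.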
In that case, we would rely on a Document Understanding ML model: label the available ticket formats using the labelling feature, generate the dataset, train the model, deploy it as an ML Skill, and then use it for extraction in the ML Extractor.
Also, I do not think the Invoices model will be of use here. If you could provide a few more details about the ticket formats, maybe we could help you further. Mainly the following details:
Yes, the regular expressions are not working as expected.
I’m currently using digital documents only. Maybe in the future I will try image formats.
The keywords are not the same across tickets; they differ for every format.
For example, in flight tickets the keywords differ between IndiGo and SpiceJet tickets, and there are many more airlines/operators like these. This is the challenge I’m facing.
Could you please share some more information about training on the dataset and deploying it as an ML Skill for the ML Extractor? Any reference would help.
Option 1 - use the Form Extractor.
Before the Form Extractor, you need to classify the document types using the Classification Trainer and the Classify Document Scope. Then define a template for each document type in the Form Extractor.
Option 2 - train on your documents in AI Center. For this you can use the Document Understanding out-of-the-box (OOTB) model.
Given your scope of work, I think you can try the Classify Document Scope together with the Data Extraction Scope, using the Form Extractor.
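The classify-then-extract idea in Option 1 can be sketched in plain Python to show why classification has to happen first: each ticket is routed to the one set of field rules built for its format. This is only an analogy for the UiPath flow, not the Form Extractor API. The keyword markers ("IndiGo", "SpiceJet"), field labels and sample ticket are hypothetical and would need to match your real layouts.

```python
import re

# Hypothetical classifier rules: a keyword that identifies each ticket format.
CLASSIFIER_KEYWORDS = {
    "indigo": "IndiGo",
    "spicejet": "SpiceJet",
}

# Hypothetical per-format "templates"; the labels are invented for illustration.
TEMPLATES = {
    "indigo": {
        "date": r"Date of Journey:\s*(\S+)",
        "departure": r"Departure:\s*(\w+)",
        "arrival": r"Arrival:\s*(\w+)",
        "amount": r"Total Fare:\s*([\d,.]+)",
    },
    "spicejet": {
        "date": r"Travel Date\s+(\S+)",
        "departure": r"Origin\s+(\w+)",
        "arrival": r"Destination\s+(\w+)",
        "amount": r"Amount Paid\s+([\d,.]+)",
    },
}


def classify(text):
    """Return the first document type whose keyword appears, else None."""
    for doc_type, keyword in CLASSIFIER_KEYWORDS.items():
        if keyword in text:
            return doc_type
    return None


def extract(text):
    """Classify the ticket, then apply only that format's field patterns."""
    doc_type = classify(text)
    if doc_type is None:
        # No matching type means no template is applied, so nothing is extracted.
        return None, {}
    fields = {}
    for field, pattern in TEMPLATES[doc_type].items():
        match = re.search(pattern, text)
        fields[field] = match.group(1) if match else None
    return doc_type, fields


ticket = "SpiceJet E-Ticket\nTravel Date 05-04-2024\nOrigin DEL\nDestination BOM\nAmount Paid 6,210.00\n"
print(extract(ticket))
```

Adding a new operator means adding one classifier keyword and one template, which mirrors adding a document type plus a Form Extractor template for it.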