I have 20 input files, all of them purchase orders, but in different file types: some POs are JPG files, some are scanned PDFs, and some are image PDFs.
Do I need to classify them into three types in the taxonomy? All the files are mostly semi-structured.
Also, I want to confirm whether taxonomy classification is based on the content type or on the file extension. I am confused.
When I tried to classify a few files using the Intelligent Keyword Classifier, I got the error below.
If all your documents are purchase orders, classification can be skipped.
For extraction, use the ML Extractor.
Here you can get the endpoints
In the taxonomy, you only have to create one document type, for example:
Group → Semi-Structured
Category → Finance (this is up to you)
Document type → Purchase Orders, and then add the fields you want to extract
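As a rough illustration only (this is not the actual UiPath taxonomy file format), the hierarchy above can be modeled as nested data to show that JPG, scanned PDF, and image PDF purchase orders all fall under the same single document type. The field names below are hypothetical placeholders; use the fields you actually need to extract.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentType:
    name: str
    fields: list[str] = field(default_factory=list)


@dataclass
class Category:
    name: str
    document_types: list[DocumentType] = field(default_factory=list)


@dataclass
class Group:
    name: str
    categories: list[Category] = field(default_factory=list)


# Hypothetical field names, for illustration only.
purchase_order = DocumentType(
    name="Purchase Orders",
    fields=["PO Number", "PO Date", "Vendor Name", "Total Amount"],
)

# One group, one category, one document type: the file format of each PO
# (JPG, scanned PDF, image PDF) does not appear anywhere in this structure.
taxonomy = Group(
    name="Semi-Structured",
    categories=[Category(name="Finance", document_types=[purchase_order])],
)


def document_type_names(group: Group) -> list[str]:
    """List every document type defined under a group."""
    return [dt.name for cat in group.categories for dt in cat.document_types]


print(document_type_names(taxonomy))  # ['Purchase Orders']
```

With only one document type defined, there is nothing for a classifier to choose between, which matches the advice that classification can be skipped.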
Regarding file extensions:
These are the file types supported in DU, and you don't have to specify the file type anywhere: .png, .gif, .jpe, .jpg, .jpeg, .tiff, .tif, .bmp, and .pdf.
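If you want to sanity-check a folder before sending it to DU, a minimal sketch like the following filters input files by the extensions listed above. The file names are hypothetical, and the check looks only at the extension; it does not (and per the answer, need not) distinguish a scanned PDF from any other PDF, since every supported file goes to the same Purchase Orders document type.

```python
from pathlib import Path

# Extensions listed above as supported by Document Understanding.
SUPPORTED_EXTENSIONS = {
    ".png", ".gif", ".jpe", ".jpg", ".jpeg", ".tiff", ".tif", ".bmp", ".pdf",
}


def is_supported(path: str) -> bool:
    """Return True if the file's extension (case-insensitive) is in the supported list."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(paths: list[str]) -> tuple[list[str], list[str]]:
    """Split paths into (supported, unsupported), preserving input order."""
    supported = [p for p in paths if is_supported(p)]
    unsupported = [p for p in paths if not is_supported(p)]
    return supported, unsupported


# Hypothetical input file names.
files = ["po_001.JPG", "po_002.pdf", "po_003.tif", "po_004.docx"]
ok, skipped = split_by_support(files)
print(ok)       # ['po_001.JPG', 'po_002.pdf', 'po_003.tif']
print(skipped)  # ['po_004.docx']
```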