We are running a PoC that extracts data from unstructured documents using IXP. The PDFs average 6 pages each. It seems to be going well, but we are looking for benchmarks on two metrics: the percentage of documents with all fields accurate, and the percentage of all targeted fields accurate. Our testing produces 77% and 93%, respectively. What is your experience?
If you are using LLMs for extraction, try different models to see whether accuracy improves. In our tests, Gemini gave the best results.
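
To compare models fairly, it helps to score each one on the same labeled set using the two metrics from the question. Here is a rough sketch of how that could be computed. The function name `extraction_accuracy`, the dict-per-document layout, and the exact-match comparison are all assumptions for illustration and are not part of IXP; in practice you would probably normalize values (dates, whitespace, currency) before comparing.

```python
from typing import Any, Dict, List, Tuple


def extraction_accuracy(
    predicted: List[Dict[str, Any]],
    expected: List[Dict[str, Any]],
) -> Tuple[float, float]:
    """Return (document-level accuracy, field-level accuracy) as fractions.

    Hypothetical helper: each document is a dict of field name -> value,
    and a field counts as accurate only on an exact match (an assumption).
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")

    docs_all_correct = 0
    fields_correct = 0
    fields_total = 0

    for pred, truth in zip(predicted, expected):
        # Count targeted fields that were extracted with the expected value.
        correct = sum(
            1 for name, value in truth.items()
            if name in pred and pred[name] == value
        )
        fields_correct += correct
        fields_total += len(truth)
        # A document counts only if every targeted field is accurate.
        if correct == len(truth):
            docs_all_correct += 1

    doc_accuracy = docs_all_correct / len(expected) if expected else 0.0
    field_accuracy = fields_correct / fields_total if fields_total else 0.0
    return doc_accuracy, field_accuracy


if __name__ == "__main__":
    # Toy data: two documents, two targeted fields each, one wrong value.
    expected = [
        {"invoice_no": "A-1", "total": "100.00"},
        {"invoice_no": "B-2", "total": "250.00"},
    ]
    predicted = [
        {"invoice_no": "A-1", "total": "100.00"},
        {"invoice_no": "B-2", "total": "205.00"},
    ]
    doc_acc, field_acc = extraction_accuracy(predicted, expected)
    print(f"{doc_acc:.0%} of docs, {field_acc:.0%} of fields")
    # prints "50% of docs, 75% of fields"
```

Running this once per model over the same ground truth gives numbers that are directly comparable to the 77% and 93% above. Note that the document-level figure will always be the lower of the two, since a single wrong field fails the whole document.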