Not working in pdf?

Antonio_Campos1 · July 4, 2024, 9:36am

Hello everyone. For testing purposes, I created a document mimicking a receipt in Writer and exported it to PDF on my computer. When I run Document Understanding (DU) on this document, no values are extracted, although DU classifies the document correctly. I am using the regex extractor. The Present Validation Station does not show any extracted data. I have run several tests with the same result: does the PDF need to be created with a specific program? Thank you.

Let me know if you need any further assistance!

supermanPunch · July 4, 2024, 9:49am

Hi @Antonio_Campos1 ,

Could you check the document text output produced by digitization? If it contains data, the text is being read correctly, and the issue is more likely in how the regex expressions are written or how the extractor is set up.

Could you share the configuration of the Data Extraction Scope activity, along with the regex expressions used?

If possible, please also provide the extracted text so that we can evaluate the regex against it.
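
As a rough illustration of that check, here is a minimal Python sketch that tests a pattern against digitized text outside of the workflow. The sample text, the field labels, and the helper `check_pattern` are all hypothetical, since the real digitization output is not shown here. Python's `re` syntax is also only an approximation of the .NET regex flavor that UiPath uses.

```python
import re

# Hypothetical digitized text; stands in for the real digitization output.
sample_text = "RECEIPT\nDate: 04/07/2024\nTotal: 27,80 EUR\n"


def check_pattern(text, pattern):
    """Return every match of pattern in text; an empty list means no match."""
    return [m.group(0) for m in re.finditer(pattern, text)]


# Step 1: did digitization produce any text at all?
has_text = bool(sample_text.strip())

# Step 2: does each expression match that text?
date_matches = check_pattern(sample_text, r"Date:\s*\d{2}/\d{2}/\d{4}")
total_matches = check_pattern(sample_text, r"Total:\s*\d+,\d{2}")

print(has_text, date_matches, total_matches)
```

If the text is present and the patterns match here but nothing shows up in the validation station, the extractor configuration is the next place to look.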

Antonio_Campos1 · July 4, 2024, 10:44am

Thank you!!!
So, the receipt is below:
receipt.pdf (27.8 KB)
Document text after digitization:
saída.txt (507 Bytes)

Classification is OK:
Captura de ecrã 2024-07-04 113547

Data Extraction Scope:

Regex extractor (based on chatgpt4o and test in edit

Present Validation Station:

thank you so much!!

Antonio_Campos1 · July 4, 2024, 11:12am

But now it is working…
Why?

As far as I can tell, a capture has to be indicated in the editor when constructing the regex. That is what I did, and now…
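
For anyone hitting the same problem, here is a minimal Python sketch of the difference a capture group makes. It assumes the extractor returns the captured part of the match rather than the whole match, which is what the behavior above suggests but is not confirmed in this thread. The sample text and the group name `total` are made up for illustration, and Python's `re` is only a stand-in for the .NET regex flavor used by UiPath.

```python
import re

# Hypothetical line from a digitized receipt.
sample_text = "Total: 27,80 EUR"

# No capture group: the pattern matches, but nothing marks which part is the value.
without_group = re.search(r"Total:\s*\d+,\d{2}", sample_text)

# Named capture group: the label anchors the match, the group isolates the value.
with_group = re.search(r"Total:\s*(?P<total>\d+,\d{2})", sample_text)

whole_match = without_group.group(0)       # label and value together
captured_groups = without_group.groups()   # empty tuple, nothing was captured
captured_value = with_group.group("total") # value only

print(whole_match, captured_groups, captured_value)
```

Both patterns match the same text; only the second one tells the caller which part of the match is the field value.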

thank you so much!!!

system · July 7, 2024, 11:13am

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
