Data extraction not working in PDF?

Hello everyone. For testing purposes, I created a document that mimics a receipt in Writer and exported it to PDF. When I run Document Understanding (DU) on this document, no values are extracted, although DU classifies the document correctly. I am using a regex-based extractor. The Present Validation Station does not show any extracted data. After several tests, I am wondering: does the PDF need to be created with a specific program? Thank you.



Hi @Antonio_Campos1 ,

Could you check the document text output you get after digitization? If it contains data, the text is being read correctly, and the issue is more likely in how the regex expressions are written or how the extractor is set up.

Could you share the configuration of the Data Extraction Scope activity, and the regex expressions you used?

If possible, please also provide the extracted text so that we can evaluate the regex against it.
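
A quick way to run this kind of check outside the workflow is to test each expression against the digitized text directly. The snippet below is only a rough sketch: the sample text and the field patterns are made up for illustration (the real content of saída.txt is not shown here), and it uses Python's `re` module, whose syntax can differ in small ways from the regex engine used by the extractor.

```python
import re

# Hypothetical digitized text; stands in for the real output of digitization.
SAMPLE_TEXT = """RECEIPT
Receipt No: 2024-0042
Date: 04/07/2024
Total: 123.45 EUR
"""

# Hypothetical patterns, one per field we hope to find in the text.
PATTERNS = {
    "receipt_number": r"Receipt No:\s*\S+",
    "date": r"Date:\s*\d{2}/\d{2}/\d{4}",
    "total": r"Total:\s*\d+[.,]\d{2}",
}


def check_patterns(text, patterns):
    """Return, per field, the full matched text or None if the pattern did not match."""
    results = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, text)
        results[field] = match.group(0) if match else None
    return results


if __name__ == "__main__":
    for field, matched in check_patterns(SAMPLE_TEXT, PATTERNS).items():
        print(f"{field}: {matched!r}")
```

If every field comes back as `None`, the expressions do not fit the digitized text; if they do match, the problem is more likely in the extractor setup.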

thank you!!!
So … the receipt is below:
receipt.pdf (27.8 KB)
Document text after digitization:
saída.txt (507 Bytes)

Classification is OK:
Captura de ecrã 2024-07-04 113547

Data Extraction Scope:


Regex extractor (based on ChatGPT-4o and tested in the editor):

Present Validation Station:

:frowning:
thank you so much!!

but working…
why?


Because, as far as I can tell, there must be an indication of what to capture in the editor when constructing the regex. That is what I have done and now…
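
To illustrate what "an indication of capture" means in plain regex terms, here is a minimal sketch in Python. It is not the extractor's own behavior, just the general difference between a pattern with and without a capture group; the receipt line and the `extract` helper are hypothetical.

```python
import re

# Hypothetical line from a digitized receipt.
LINE = "Total: 123.45 EUR"


def extract(pattern, text):
    """Return (whole match, tuple of captured groups), or None if nothing matches."""
    match = re.search(pattern, text)
    return (match.group(0), match.groups()) if match else None


# Same pattern twice: the second one wraps the value in parentheses (a capture group).
without_capture = extract(r"Total:\s*\d+\.\d{2}", LINE)
with_capture = extract(r"Total:\s*(\d+\.\d{2})", LINE)

if __name__ == "__main__":
    print(without_capture)
    print(with_capture)
```

Both patterns match the line, but only the second one marks which part is the value. Without the parentheses the list of captured groups is empty, so a tool that reads the value from a capture has nothing to return.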

thank you so much!!!
