Need support for PDF Data extraction

Hello, I have created two separate robots to extract data from PDF invoices. Each one extracts data from invoices in a different language, and each works fine on its own. But when I try to extract the data from all invoices at once in one robot, I get this message.


I need to fix this, as I receive invoices in different languages.
Thanks

Hi @mohamed.saty2012 ,

I believe it might be because not all of the languages are taken into consideration.

Could you let us know what configuration was used in the separate workflows to handle the different languages? Or what method was used for the extraction?

I am using Regex. Check the workflow:
Main.xaml (16.8 KB)

@mohamed.saty2012 ,

Could you let us know whether you have checked the regex expressions against invoices in both languages, or only against one language?

Was a similar datatable created in the other workflow/bot, with a different regex?

Yes, as I mentioned before, I have created two separate workflows, one for English and one for Arabic. Both work fine, but not together. All the regexes have been checked.

@mohamed.saty2012 ,

Could you check whether adding an If activity like the one below helps? (I could not test it, as there was no data.)
image

I believe the error occurs because the regex used for the Arabic invoice doesn't produce any matches when an English invoice is passed in, and vice versa. So handling like the above may be needed.
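To illustrate the point outside UiPath, here is a minimal Python sketch of the same guard: take the first match when the regex finds one, otherwise fall back to an empty string. The patterns and the sample text are made up for illustration and are not taken from Main.xaml.

```python
import re

def first_match_or_empty(pattern: str, text: str) -> str:
    """Return the first regex match in text, or "" when nothing matches."""
    match = re.search(pattern, text)
    return match.group(0).strip() if match else ""

# Hypothetical patterns and text, for illustration only.
english_pattern = r"(?<=Invoice No: )\S+"
arabic_pattern = r"(?<=رقم الفاتورة: )\S+"
sample_text = "Invoice No: INV-001"

# The English pattern matches the English text.
print(first_match_or_empty(english_pattern, sample_text))

# The Arabic pattern finds nothing here, so the guard returns "".
# Without the guard, re.findall(arabic_pattern, sample_text)[0]
# would raise an IndexError because the result list is empty.
print(repr(first_match_or_empty(arabic_pattern, sample_text)))
```

In the workflow, the equivalent is an If activity that checks whether the match collection has any items before reading its first element.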

Can I send you two different invoices to check?

@mohamed.saty2012 ,

Yes. If not confidential, you can send here.

Egyptian Holand.pdf (61.0 KB)
Invoice3.pdf (54.0 KB)
The first one is Arabic, the second is English. Thanks for your great support, I really appreciate it.

@mohamed.saty2012 ,

When checked with the input PDFs you provided, we get the output in the manner below:
image

This is great. Can you also add another result, as in the screenshot?


This is the output file:
output.txt (2.8 KB)

Also, can I know what was missing in my sequence?

@mohamed.saty2012 ,

As shown in the screenshot provided earlier, I have added an If activity to check whether the regex expression for the document produces any matches. If it does, I add the matched value; if not, I assign an empty string.
image

For the new field value to be extracted, I found the regex below to be working:

.*(?=\s?اجمالي المبلغ \(ج\.م\))

Updated in the Build Data Table activity:

Also added the new column in the output DT:
image

Also added the value in the Add Data Row activity:
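For anyone who wants to try the lookahead regex above outside the workflow, here is a small Python sketch. The regex is copied from the post unchanged; the sample lines are hypothetical and only imitate how such text might appear, they are not taken from the attached PDFs.

```python
import re

# Regex from the post above, unchanged: capture everything on the line
# that comes before the Arabic label for "total amount (EGP)".
TOTAL_PATTERN = r".*(?=\s?اجمالي المبلغ \(ج\.م\))"

def extract_total(text: str) -> str:
    """Return the text preceding the label, or "" if the label is absent."""
    match = re.search(TOTAL_PATTERN, text)
    return match.group(0).strip() if match else ""

# Hypothetical sample lines, for illustration only.
arabic_line = "1500.00 اجمالي المبلغ (ج.م)"
english_line = "Total: 250.00"

print(extract_total(arabic_line))          # value in front of the label
print(repr(extract_total(english_line)))   # label not present, so ""
```

The lookahead `(?=...)` checks that the label follows without including it in the match, which is why only the part before the label is returned.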

Thanks, bro. Can you send the workflow to me, please? :pray: :heart:

@mohamed.saty2012 ,

Check the updated workflow below:
Main (1).xaml (20.3 KB)

I would encourage you to check the screenshots and correct your workflow accordingly, as the package versions used in my environment may sometimes cause conflicts in yours.

Hello,

The error message you’re encountering when trying to extract data from invoices with different languages in a single robot is likely related to the language-specific settings or configurations within your robot. To resolve this issue and enable your robot to handle multiple languages, you can consider the following steps:

  1. Language Detection: Make sure your data extraction process includes language detection for each invoice. This can help your robot determine the language of the invoice and apply the appropriate language-specific rules and configurations.
  2. Multilingual OCR: If your invoices contain text in various languages, you should use Optical Character Recognition (OCR) software that supports multiple languages. OCR tools like Tesseract are capable of recognizing text in various languages. Ensure that your robot uses an OCR engine that’s equipped to handle the languages present in your invoices.
  3. Language-Specific Rules: Define language-specific extraction rules and configurations. For example, create separate templates or rules for each language that your invoices might be in. These rules should account for variations in layout and language-specific data extraction patterns.
  4. Conditional Processing: In your robot, implement conditional logic that checks the language of each invoice and applies the corresponding language-specific rule set. This allows your robot to adapt to different languages on the fly.
  5. Testing and Validation: Test your robot with invoices in various languages to ensure it correctly identifies the language, applies the appropriate rules, and extracts the data accurately.
  6. Error Handling: Implement error handling and logging to identify and address any issues that may arise during the data extraction process.

By following these steps, you should be able to create a more versatile and adaptable robot that can handle invoices with different languages effectively. If you encounter any specific issues or error messages, please provide more details, and I’d be happy to offer further assistance.
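As a rough sketch of steps 1, 3 and 4 above (language detection, language-specific rules, conditional processing), the Python snippet below guesses the language by counting Arabic versus Latin letters and then applies the matching rule set. The field name, the patterns and the sample texts are all assumptions made for illustration; a real workflow would use its own fields and regexes.

```python
import re

ARABIC_LETTERS = re.compile(r"[\u0600-\u06FF]")
LATIN_LETTERS = re.compile(r"[A-Za-z]")

def detect_language(text: str) -> str:
    """Very rough guess: "ar" if Arabic letters outnumber Latin ones, else "en"."""
    arabic = len(ARABIC_LETTERS.findall(text))
    latin = len(LATIN_LETTERS.findall(text))
    return "ar" if arabic > latin else "en"

# Hypothetical per-language rule sets (field name -> regex).
RULES = {
    "en": {"total": r"(?<=Total: )[\d.,]+"},
    "ar": {"total": r"[\d.,]+(?=\s?اجمالي المبلغ)"},
}

def extract_fields(text: str) -> dict:
    """Apply the rule set for the detected language; missing fields become ""."""
    rules = RULES[detect_language(text)]
    row = {}
    for field, pattern in rules.items():
        match = re.search(pattern, text)
        row[field] = match.group(0) if match else ""
    return row

# Hypothetical sample texts, for illustration only.
print(extract_fields("Invoice Total: 250.00"))
print(extract_fields("1500.00 اجمالي المبلغ (ج.م)"))
```

Counting characters is a crude heuristic; it is used here only because it needs no extra libraries, and mixed-language invoices may need something more robust.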

Thanks for your support.
