Hi Team,
@Lahiru.Fernando @loginerror
I need to extract data from native PDFs, scanned PDFs, and Word and Excel documents. The layout and content differ depending on the format and the document.
Fields to extract:
Invoice Number
Invoice Date
PO Number
PO Date
Total Amount
Is there an easy way to implement this use case?
Using Document Understanding (DU), we can extract data from PDF documents, but what about Word and Excel files? Without DU, I’ve been able to extract the text using the “Read PDF Text” and “Read PDF with OCR” activities, and then pull out the fields with regular expressions.
Do you have any suggestions for automating this type of task?
Thank you.
Hi @Naveen_Chaganti
For the Word file:
1. Install the “UiPath.Word.Activities” package.
2. Use the “Read Text” activity from the Word activities to get the document text.
3. Then extract the fields with regex, the same way you have done for the PDFs.
For the Excel file:
1. Install the UiPath.Excel.Activities package.
2. Use the “Excel Application Scope” activity to open and work with Excel files.
3. Use activities such as “Read Range” to extract data from specific Excel sheets.
4. Use the “Read Cell” activity to extract data from specific cells in an Excel sheet.
5. Then extract the fields with regex, the same way you have done for the other formats.
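A minimal sketch of that regex step, written in Python rather than as a UiPath workflow so it can be tried on its own. It assumes the text has already been pulled out with “Read PDF Text” / “Read PDF with OCR”, the Word “Read Text” activity, or a “Read Range” result joined into one string. The field keys, label variants, and date/amount formats in the hypothetical `FIELD_PATTERNS` dictionary are illustrative assumptions, not taken from real vendor documents. These basic constructs should largely carry over to .NET regex (for example in UiPath’s Matches activity), but test them there before relying on them.

```python
import re

# Hypothetical label variants per field (assumptions, not from real vendor documents).
# Every vendor format that uses a new label needs another alternative added here.
FIELD_PATTERNS = {
    "invoice_number": r"\bInvoice\s*(?:No\.?|Number|#)\s*[:\-]?\s*([A-Z0-9][A-Z0-9\-/]*)",
    "invoice_date": r"\bInvoice\s*Date\s*[:\-]?\s*(\d{1,2}[/.\-]\d{1,2}[/.\-]\d{2,4})",
    "po_number": r"\b(?:PO|Purchase\s*Order)\s*(?:No\.?|Number|#)\s*[:\-]?\s*([A-Z0-9][A-Z0-9\-/]*)",
    "po_date": r"\b(?:PO|Purchase\s*Order)\s*Date\s*[:\-]?\s*(\d{1,2}[/.\-]\d{1,2}[/.\-]\d{2,4})",
    "total_amount": r"\b(?:Total\s*Amount|Grand\s*Total|Total\s*Due)\s*[:\-]?\s*[$€£]?\s*([\d,]+\.\d{2})",
}


def extract_fields(text):
    """Return the first match for each field, or None when no label variant matches."""
    results = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        results[field] = match.group(1) if match else None
    return results


if __name__ == "__main__":
    # Hypothetical text, as it might come back from a PDF, Word, or Excel read activity.
    sample = (
        "Invoice No: INV-2023-001\n"
        "Invoice Date: 15/03/2023\n"
        "PO Number: PO-7781\n"
        "PO Date: 01/03/2023\n"
        "Total Amount: $1,250.00\n"
    )
    print(extract_fields(sample))
```

Every new vendor label means another alternative in these patterns, which is why a purely rule-based approach becomes hard to maintain across many formats.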
Hi @Kartheek_Battu
Thank you for your response.
As I mentioned in my previous post, I am able to extract the fields separately for each document type. Since there are more than 100 formats, the keywords vary for each document and vendor. Is there a better way to extract the data in this scenario?
You can cover all of these document types with Document Understanding by training the ML Extractor.
Yes, you can go with the ML Extractor whatever formats you have, by training the ML model on them. The more PDFs you train it on, the better the output will be; even though the keywords vary between documents, it will still extract the data.
Based on the variations in your documents, you can keep training it to improve the output.
@Naveen_Chaganti
@Praveen_Mudhiraj @Kartheek_Battu
Is the ML extractor capable of extracting data from both Word and Excel document types?
To extract data from Word and Excel documents in UiPath, you can use other activities, such as “Read Text” from the Word package and the activities in the Excel package. I’m not sure whether the ML Extractor works for these formats: I have used the ML Extractor, but not with Word or Excel files. For those, we can go with the Word “Read Text” activity and the Excel activities.
@Naveen_Chaganti
@Naveen_Chaganti
UiPath’s ML Extractor in Document Understanding is primarily designed for extracting data from semi-structured documents, such as invoices in PDF form. For Word and Excel documents, it’s more common to use specialized activities like “Read Text” and “Read Range” to extract the text content.
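The split described in this reply (PDFs through Document Understanding, Word and Excel files through their own activities) can be sketched as a simple dispatch on file extension. This is a minimal Python sketch, assuming routing by extension is enough; the route labels in the hypothetical `ROUTES` dictionary and the `manual_review` fallback are illustrative names, not UiPath activities. In a real workflow each label would correspond to something like a Switch case or an invoked sub-workflow.

```python
from pathlib import Path

# Hypothetical route labels: each stands for a branch of the workflow
# (e.g. a Switch case or an invoked sub-workflow), not a UiPath activity name.
ROUTES = {
    ".pdf": "document_understanding",  # native and scanned PDFs: DU digitization + ML Extractor
    ".docx": "word_read_text",         # Word "Read Text", then regex
    ".doc": "word_read_text",
    ".xlsx": "excel_read_range",       # Excel "Read Range" / "Read Cell", then regex
    ".xls": "excel_read_range",
}


def choose_route(file_path):
    """Pick an extraction branch from the file extension (case-insensitive)."""
    suffix = Path(file_path).suffix.lower()
    # Anything unrecognised (or without an extension) goes to a hypothetical manual-review queue.
    return ROUTES.get(suffix, "manual_review")


if __name__ == "__main__":
    for name in ["ACME_001.PDF", "vendor_po.docx", "invoices_march.xlsx", "notes.txt"]:
        print(name, "->", choose_route(name))
```

Keeping the routing separate from the extraction means the regex-based Word/Excel branches and the trained ML Extractor branch can evolve independently.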