What PDF extraction approach would be best for getting product details from a PDF with multi-valued fields?

I have a PDF containing details for several different products. I extracted the text from the PDF and used regex to get product details such as title, price, and description.

Could you please let me know of other approaches to get these fields and put them in an Excel sheet?
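For reference, the text-plus-regex approach described above can be sketched outside UiPath in plain Python. This is only a minimal sketch under stated assumptions: the `Title:` / `Price:` / `Description:` layout, the sample text, and the helper names `extract_products` and `to_csv` are all hypothetical and not taken from the actual PDF. It shows how one pattern applied with `finditer` returns every product in the text rather than only the first, and writes the rows as CSV, which Excel can open.

```python
import csv
import io
import re

# Assumed (hypothetical) layout: each product is a block of three labelled lines.
PRODUCT_PATTERN = re.compile(
    r"Title:\s*(?P<title>.+?)\s*\n"
    r"Price:\s*(?P<price>[\d.,]+)\s*\n"
    r"Description:\s*(?P<description>.+?)\s*(?:\n|$)"
)


def extract_products(text):
    """Return one dict per product block found in the extracted text."""
    return [match.groupdict() for match in PRODUCT_PATTERN.finditer(text)]


def to_csv(products):
    """Serialize the product dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "price", "description"])
    writer.writeheader()
    writer.writerows(products)
    return buffer.getvalue()


# Made-up sample standing in for the text extracted from the PDF.
sample_text = (
    "Title: Widget A\nPrice: 19.99\nDescription: A small widget\n"
    "\n"
    "Title: Widget B\nPrice: 5.00\nDescription: A smaller widget\n"
)

products = extract_products(sample_text)
csv_text = to_csv(products)
```

A pattern like this only works while every product follows the same text layout, which is the main limitation of the regex route when layouts vary from page to page.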

Try using a Document Understanding model, because I don’t think the normal PDF operations and activities would help much here.

But there are a number of products spread across a number of pages in the same PDF.
In Document Understanding, how do I get every product title, or any other field, by selecting just 1 or 2?

Don’t create Regular fields; just create Column fields and map them in around 10 documents.

Apologies, I didn’t get it. Can you share a walkthrough or a video?
So far I have used Taxonomy for some invoice projects available in the RPA challenges (you might be aware of them).

@Aakash_Singh_Rawat I think @adiijaiin is talking about labelling documents in Document Manager (which is inside Action Center). You can check the documentation here:


In this documentation you can see that there are 2 types of fields that you can label: Regular fields and Column fields.

If you want, you can also find some instructions here: Training UiPath Document Understanding ML Models - Data Manager - Part 1 | RPA - YouTube

I just found that these features are only available in the Enterprise edition. Can the same thing be done from Studio alone, using the Form Extractor (Manage Template) in the Data Extraction Scope with the Taxonomy approach?

@Aakash_Singh_Rawat You are right, Action Center is not available with the Community license. Document Manager is more complete than the Form Extractor. Using Action Center also enables you to use the labelled data to train a machine learning model.

If you have a commercial email address, you can try the Enterprise trial to check all the possibilities.

