Extracting tables with varying number of items from pdf using Document Understanding

shrey.shah · March 11, 2022, 7:40am

I am using Document Understanding (Form Extractor) to extract tables from PDF files. The number of items in the table varies: for example, some PDFs have tables that contain 6 items, whereas others have tables with only 1 item.

So if I create a template based on the 6-item PDF, the table in a file that has only 1 item is not extracted properly:

4 items.xlsx (9.3 KB)

1 item.xlsx (9.0 KB)

In the above Excel files, “4 items.xlsx” is the data extracted from the PDF on which the template was created, so it is extracted properly. But for the other PDF, which contains only 1 item, the extraction is wrong: some of the headers are not extracted.

Is there a solution to this? Can I use an anchor here, or is that not possible?

Thank you for your time and help!

suraj.setty · March 11, 2022, 7:44am

Hi @shrey.shah

Can you try the “Intelligent Form Extractor” and train it on the document that has multiple line items (6 items in your case)?

Then check the output for both files.

Thanks.

shrey.shah · March 11, 2022, 8:58am

@suraj.setty I tried the Intelligent Form Extractor, but it still extracts incorrectly!

suraj.setty · March 11, 2022, 9:02am

Hi,

Did you try the ML Extractor, providing the API key and endpoint?

Please find the link for endpoints

Thanks.

shrey.shah · March 11, 2022, 9:06am

@suraj.setty Hi, as I mentioned in the question, all the PDFs have multiple pages, and the ML Extractor has a limit of 2 pages and 4 MB. So it won't accept PDF files with more than 2 pages.

suraj.setty · March 11, 2022, 9:21am

Hi @shrey.shah

Yes, it's limited on the Community plan.

If possible, you can request an “Enterprise Trial” and try extracting with the ML Extractor.

shrey.shah · March 11, 2022, 9:42am

I tried the ML Extractor on an invoice that had only 1 page, but it gives the same problem there too, @suraj.setty.

suraj.setty · March 11, 2022, 1:10pm

If you have an Enterprise plan, you can use a combination of AI Center and Document Understanding for more accurate results.

shrey.shah · March 14, 2022, 4:32am

@suraj.setty AI Center is used for training custom ML models, right? Do I really need the Enterprise version for that?

suraj.setty · March 14, 2022, 8:50am

Hi @shrey.shah

Yes, an Enterprise license is required to train an ML model.

Thanks.
