Dear Forum Team, I am facing an issue. I have multiple PDF files that contain data in both text and tabular format. The tabular data can also span multiple pages, for example in a bank statement. The page number is also not fixed, meaning the tabular data may start on page 3 or maybe on page …

Hi @anand.t You can convert the PDF to Word and then grab the tables from it using the “Index of the table”; each table has a unique index value. In this scenario, a table that spans many pages is not an issue. You can grab the table easily. Thank you

Hi @anand.t, you can try the approach in this thread: Convert PDF Datatable to Excel - Build - UiPath Community Forum. If your table has multiple headers, this first approach may not work because of the way the table is read and the values are separated using string manipulation. If it is a stand…

Thanks @jeevith and @Rakesh_Sampath. I have 20-page PDFs in which some pages have only text and some pages have data in tabular format. The page index of the tabular data is not fixed. For example, suppose pages 1 to 5 are only text and pages 6-10 are only tabular data. This index is not fixed. This i…
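One way to cope with table pages whose position is not fixed is to scan the extracted text of every page and keep only the pages that contain table-like lines. The following is a minimal sketch in Python (not the VB.NET used elsewhere in this thread), assuming the text of each page has already been extracted into a list of strings. The row pattern (a date at the start of the line and an amount at the end) and the names `ROW_PATTERN` and `find_table_pages` are hypothetical and would need to be adapted to the actual statement layout.

```python
import re

# Hypothetical row pattern: a line that starts with a date such as 23/01/2021
# and ends with an amount such as 1,250.00. Adjust it to the real layout.
ROW_PATTERN = re.compile(r"^\d{2}/\d{2}/\d{4}\s+.+\s+-?[\d,]+\.\d{2}$")


def find_table_pages(page_texts, min_rows=3):
    """Return 1-based page numbers that contain at least min_rows table-like lines."""
    table_pages = []
    for page_number, text in enumerate(page_texts, start=1):
        # Count the lines on this page that look like table rows.
        rows = [line for line in text.splitlines() if ROW_PATTERN.match(line.strip())]
        if len(rows) >= min_rows:
            table_pages.append(page_number)
    return table_pages
```

The `min_rows` threshold is a design choice to avoid treating a page as tabular just because a single sentence happens to start with a date.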

You can open any PDF in Word. One thing you need to check before anything else is whether the PDF contains rich text or scanned data (images). The mentioned approaches can only extract data if the PDF contains rich text, not physically scanned or software-scanned images as part of the PDF content…
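A rough way to run that check is to look at how much text comes back from each page: pages that yield little or no text are likely scanned images and would need OCR first. This is a small Python sketch under that assumption; the function name `pages_without_text` and the `min_chars` threshold are hypothetical, and the per-page text is assumed to come from whatever PDF text-reading step is already in place.

```python
def pages_without_text(page_texts, min_chars=20):
    """Return 1-based page numbers whose extracted text is shorter than min_chars."""
    return [
        page_number
        for page_number, text in enumerate(page_texts, start=1)
        if len(text.strip()) < min_chars  # whitespace-only pages count as empty
    ]
```

If this returns most or all pages of a document, string-based table extraction is unlikely to work on it as-is.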

Hello Anand, in this video I have 17 use cases for extracting tables from PDF and writing the data to Excel, and I also have examples with multiple pages: [UiPath extract Tables from PDF (use case) (PDF table)] 45:50 File 10 PDF with multiple columns that have multiple lines + multiple page…

Thank you so very much for putting this tutorial together. I watched it many times and learned a lot from applying the technique to different scenarios. My case is very similar to your case 15, but my process didn’t seem to pick up the correct rows with the various cases of 2nd and 3rd rows. Here is a samp…

Dim rows As String() = strtmp.Split(New String() {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries) Dim patternRowStart As String = " [1] \d{5}" Dim patternColumn2 As String = " - " Dim patternColumn3 As String = "-[^ ]" Dim currentRow As String = "" Dim col1 As String = "" Dim col2 A…
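The general idea behind a row-start pattern can be sketched as follows: split the text into lines, start a new row whenever a line matches the pattern, and append every other line to the current row so that 2nd and 3rd lines are not lost. This is an illustrative Python sketch, not a reconstruction of the truncated VB.NET snippet above. It assumes that a row begins with a five-digit ID, and the names `ROW_START` and `group_rows` are hypothetical.

```python
import re

# Assumption: a new row starts with a five-digit ID. Illustrative only.
ROW_START = re.compile(r"^\d{5}\b")


def group_rows(text):
    """Merge continuation lines into the row that precedes them."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip empty lines
        if ROW_START.match(line):
            rows.append(line)  # a new row begins here
        elif rows:
            rows[-1] += " " + line  # 2nd or 3rd line of the current row
        # Lines before the first row start (e.g. headers) are ignored.
    return rows
```

For example, `group_rows("10001 Widget - blue\nextra wide\n10002 Gadget - red")` gives two rows, with "extra wide" attached to the first. Column splitting can then be applied to each merged row.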

PDF Table extraction


anand.t (Anand T) January 23, 2021, 6:21am 6

Hi All,

The above approach fails for some PDF files. Can I use Document Understanding to pull the data from multiple PDF pages, e.g. a bank statement, when the table page is not fixed? Is this possible with DU, or should I go with ABBYY FlexiCapture?

Need expert advice here.

See the example in this thread of mine.

Regards
Anand

Data Extraciton from PDF tables

Data table extraction by pdf

PDF Data Scraping Fail
