DOCUMENT UNDERSTANDING SPLIT

  1. I have a PDF file that contains a table in a fixed format. From it I have to extract two rows, identified by a column value that is static. I am using the ML Extractor. The challenge is that every time I have to extract the whole table and then pick out the two values, because the position of those two fields in the table is not fixed. Is there a way to extract the two column values without extracting the table?

  2. I have a PDF with 50 pages, each containing the same table. The challenge is that each of the 50 pages contains a country name, and I have to extract the table according to the country name. Is there a way to achieve this?

@Ritaman_Baral

  1. It is better to extract the table and then get the required values.
  2. One way is to try the multi-value option, with the "required identifier" option unchecked, in the Taxonomy Manager.
  3. For the second one, select the country name as a column as well and extract the data, so that each row is appended with its country. Then we can filter the data or group it by country and separate it as needed.
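The filter/group step from point 3 can be sketched outside the workflow as follows. This is only an illustration in Python, assuming each extracted row already carries its country as a column; the column name `Country`, the helper `group_rows_by_country`, and the sample rows are made up for the example and are not part of any UiPath API.

```python
from collections import defaultdict


def group_rows_by_country(rows, country_key="Country"):
    """Group extracted table rows by the value of their country column."""
    grouped = defaultdict(list)
    for row in rows:
        # Normalise whitespace so "India " and "India" land in the same group.
        country = str(row.get(country_key, "")).strip()
        grouped[country].append(row)
    return dict(grouped)


# Hypothetical rows, as they might look once the country is extracted as a column.
rows = [
    {"Country": "India", "Item": "A", "Amount": "10"},
    {"Country": "France", "Item": "A", "Amount": "7"},
    {"Country": "India ", "Item": "B", "Amount": "3"},
]

by_country = group_rows_by_country(rows)
india_rows = by_country["India"]  # only the rows tagged with India
```

Inside a workflow the same idea would apply to the extracted DataTable, for example by filtering on the country column.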

Cheers

Hi @Ritaman_Baral,

Some additional questions:

  1. Are the PDFs digital, or could they be a mixture of digital and scanned?

  2. If the PDFs are digital, could you let us know the format of the table and the format of the rows that you want to extract?

  3. For the 50-page PDF, how is the country name presented? Is it always in a fixed naming format?

We are asking these questions to analyse whether Document Understanding is needed, or whether this could be done using string manipulation/regex operations.
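To illustrate what the regex route could look like, here is a rough Python sketch. It assumes the page text has already been read from a digital PDF (scanned pages would need OCR first) and that each wanted row starts with its static label on a single line. The function name `extract_labelled_values`, the labels, and the sample text are all hypothetical.

```python
import re


def extract_labelled_values(page_text, labels):
    """Return {label: rest of the line} for each label found at the start of a line."""
    found = {}
    for label in labels:
        # The label is static, so escape it and capture whatever follows on the same line.
        pattern = re.compile(
            r"^[ \t]*" + re.escape(label) + r"[ \t]+(.+?)[ \t]*$",
            re.MULTILINE,
        )
        match = pattern.search(page_text)
        if match:
            found[label] = match.group(1)
    return found


# Made-up page text; the row order can change without affecting the result.
sample_text = """Country: India
Item        Amount
Net Total   1,250.00
Freight     40.00
Grand Total 1,290.00
"""

values = extract_labelled_values(sample_text, ["Net Total", "Grand Total"])
```

Because the match is keyed on the label rather than on the row position, it does not matter where the two rows sit in the table.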

  1. Mixture.
  3. The country name position is fixed. It is on the left-hand side.