Unable to convert PDF files to DataTable

Here is the PDF file:
merchant_settlement_Cheque_20240307_BPI.pdf (314.1 KB)

I want to extract the details from that PDF into a DataTable.
I tried using regex but was unable to accomplish it.
Any help will be appreciated.

Expected Output:

TIA!

Hi @joscares ,

Could you let us know whether the original PDF data will also be in the same format?

On first analysis, each page appears to have only four values, each on its own line. From this pattern, we should be able to extract the values line by line, since each line corresponds to one field.
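To illustrate the line-per-field pattern, here is a minimal Python sketch (not the workflow's actual logic). It assumes every non-empty line of the extracted text is one value and every four consecutive values form one record. The sample text and its values are hypothetical placeholders, since the real PDF contents aren't shown here.

```python
def group_lines(text: str, values_per_record: int = 4) -> list[list[str]]:
    """Split text into non-empty lines and group them into fixed-size records.

    Assumption (hypothetical): one value per line, four values per record.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % values_per_record != 0:
        # Leftover lines mean the one-value-per-line assumption doesn't hold.
        raise ValueError(
            f"{len(lines)} lines is not a multiple of {values_per_record}"
        )
    return [
        lines[i : i + values_per_record]
        for i in range(0, len(lines), values_per_record)
    ]


# Hypothetical sample text; real values come from the PDF text extraction.
sample = """
03/07/2024
REF-001
Cheque
1,000.00
"""
print(group_lines(sample))
# [['03/07/2024', 'REF-001', 'Cheque', '1,000.00']]
```

The length check is deliberately strict: if a page ever yields a number of lines that isn't a multiple of four, it fails loudly instead of silently shifting values into the wrong columns.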

We can provide the data extraction logic if this assumption is correct; otherwise, we would need more details about how the data is formatted in the PDF.

Yep, the PDF provided is in the correct and final format.

@joscares ,

Could you check the workflow below:

PDF_Regex_Data Extraction.zip (156.0 KB)

There are some additional assumptions made, with the date value acting as the delimiter for the next set of values. An additional dummy date value is appended to the end of the text so that the regex extracts all the values, including the last set.
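A minimal Python sketch of a date-delimited regex with a dummy end date, assuming each set starts with a date in MM/DD/YYYY format. The date format, the dummy value `01/01/1900`, and the sample text are hypothetical; the actual workflow's pattern may differ.

```python
import re

# Hypothetical date format (MM/DD/YYYY); adjust it to match the real PDF.
DATE = r"\d{2}/\d{2}/\d{4}"
# A set starts at a date and runs lazily up to, but not including, the next date.
SET_PATTERN = re.compile(rf"{DATE}[\s\S]*?(?={DATE})")
# Hypothetical dummy value; any string matching DATE works.
DUMMY_DATE = "01/01/1900"


def extract_sets(text: str) -> list[str]:
    """Return one raw string per set of values, using dates as delimiters."""
    # Append the dummy date so the last real set also has a following date.
    padded = text + "\n" + DUMMY_DATE
    return SET_PATTERN.findall(padded)


# Hypothetical extracted text with two sets of four values each.
sample = "03/07/2024\nREF-001\nCheque\n1,000.00\n03/08/2024\nREF-002\nCheque\n2,500.00"
for raw_set in extract_sets(sample):
    print(repr(raw_set))
```

The lookahead `(?=...)` checks for the next date without consuming it, so that date is still available as the start of the following match.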

Once the matches are retrieved, we split each one on new lines, which yields the four expected values after data cleaning (Trim, then Trim with *). We then transform the extracted data into the required DataTable format.
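The split-and-clean step can be sketched in Python as follows, assuming each matched set is a newline-separated string. The column names and the sample sets are hypothetical (the real columns come from the expected output), and a list of dicts stands in for the DataTable.

```python
# Hypothetical column names: the real ones come from the expected output table.
COLUMNS = ("Date", "Reference", "Description", "Amount")


def clean_set(raw_set: str) -> list[str]:
    """Split one matched set on new lines and clean each value.

    Mirrors "Trim + Trim with *": strip whitespace, strip any leading or
    trailing '*' characters, strip again, then drop lines that end up empty.
    """
    values = [part.strip().strip("*").strip() for part in raw_set.splitlines()]
    return [value for value in values if value]


def to_rows(raw_sets: list[str], columns: tuple[str, ...] = COLUMNS) -> list[dict[str, str]]:
    """Turn matched sets into table rows (one dict per row, keyed by column)."""
    rows = []
    for raw_set in raw_sets:
        values = clean_set(raw_set)
        if len(values) != len(columns):
            # A set that doesn't yield exactly four values breaks the assumption.
            raise ValueError(f"expected {len(columns)} values, got {values!r}")
        rows.append(dict(zip(columns, values)))
    return rows


# Hypothetical matched sets, as a date-delimited regex might return them.
raw_sets = [
    "03/07/2024\n  REF-001 \n*Cheque*\n1,000.00\n",
    "03/08/2024\nREF-002\n** Cheque **\n2,500.00\n",
]
for row in to_rows(raw_sets):
    print(row)
```

In the UiPath workflow itself, each row would be added to the DataTable instead (for example with the Add Data Row activity).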

Let us know if the workflow does not work for your data.


What is the main reason you need to add a date at the end of the text from the PDF? Or does it add data at the end of each item?

It works btw

@joscares ,

As mentioned earlier, we add the dummy value so that the text conforms to the pattern used for data retrieval.

From the result below, we can see that the last set of values is not retrieved with the regex pattern used.

But adding the dummy date at the end allows us to retrieve it as well.

Note that, as shown in the result, the last date value (the dummy one) is not itself retrieved.
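The effect of the dummy value can be checked with a small, self-contained Python sketch, under the same hypothetical assumptions (an MM/DD/YYYY date starts each set; the sample values are made up). Without the dummy date only the first set matches; with it, both sets match, and the dummy itself never appears in the output. The last pattern is a design alternative, not what the workflow does: letting the lookahead also accept the end of the text (`\Z`) removes the need for a dummy value.

```python
import re

DATE = r"\d{2}/\d{2}/\d{4}"  # hypothetical MM/DD/YYYY format
SET_PATTERN = re.compile(rf"{DATE}[\s\S]*?(?={DATE})")
# Alternative (not used in the workflow): also stop at the end of the text.
SET_OR_END_PATTERN = re.compile(rf"{DATE}[\s\S]*?(?={DATE}|\Z)")

text = "03/07/2024\nREF-001\nCheque\n1,000.00\n03/08/2024\nREF-002\nCheque\n2,500.00"

without_dummy = SET_PATTERN.findall(text)
with_dummy = SET_PATTERN.findall(text + "\n01/01/1900")
end_anchored = SET_OR_END_PATTERN.findall(text)

print(len(without_dummy))  # 1: the last set has no following date to stop at
print(len(with_dummy))     # 2: the dummy date closes the last set
print(any("01/01/1900" in s for s in with_dummy))  # False: the dummy is never captured
print(len(end_anchored))   # 2: no dummy value needed
```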
