Extract table from PDF with missing row values


I want to extract this table from the PDF:
1124816797(Sample).pdf (27.1 KB)

I want to do this with regex or string manipulation.

I was able to extract the table as a string, and the output is:

06.12.2021 088 DHL PAKET (GK) bis 1 kg 16 ST C1 3,25 52,00
DHL PAKET (GK) bis 3 kg 10 ST C1 3,35 33,50
Service GoGreen 26 ST C1 0,02 0,52
Maut- und CO2-Zuschlag 26 ST C1 0,11 2,86
08.12.2021 088 DHL PAKET (GK) bis 3 kg 1 ST C1 3,35 3,35
Service GoGreen 1 ST C1 0,02 0,02
Sperrgut 1 ST C1 20,00 20,00
Maut- und CO2-Zuschlag 1 ST C1 0,11 0,11

Anzahl Sendungen 27
Anzahl Services 28

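Now there are a few missing values in between: the date and the number after it appear only on the first row of each group. How can I generate the data table in the proper format, as shown in this screenshot, and convert it to an Excel file?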

Hello @Prem_Kumar_S,

What are you using to extract the table? Extract Data Table?

If so, you already have a DataTable, so you can use the Write Range activity.

Another way is to use the Document Understanding activities. :slight_smile:

I don't have the data table yet; that is just a screenshot of the output I want.

I want to do this with regex or string manipulation.

Hi @Prem_Kumar_S

Do you need to extract only these colored values?

Regards
Gokul

Hi @Prem_Kumar_S

If yes, here is a workflow that uses a regex expression:

RegexPDF.xaml (7.4 KB)

Regards
Gokul
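The attached workflow is not reproduced in the thread, so the following is only a rough sketch of one way to do the parsing with regex, written in Python for brevity (the same pattern syntax should carry over to System.Text.RegularExpressions.Regex). It assumes every table row ends with quantity, unit, a letter-plus-digit code, unit price and total, and that a row without a leading date and number belongs to the group started by the last row that had them. The column names are hypothetical, since the real headers are not shown in the thread.

```python
import re

# Sample text copied from the question above.
SAMPLE = """06.12.2021 088 DHL PAKET (GK) bis 1 kg 16 ST C1 3,25 52,00
DHL PAKET (GK) bis 3 kg 10 ST C1 3,35 33,50
Service GoGreen 26 ST C1 0,02 0,52
Maut- und CO2-Zuschlag 26 ST C1 0,11 2,86
08.12.2021 088 DHL PAKET (GK) bis 3 kg 1 ST C1 3,35 3,35
Service GoGreen 1 ST C1 0,02 0,02
Sperrgut 1 ST C1 20,00 20,00
Maut- und CO2-Zuschlag 1 ST C1 0,11 0,11

Anzahl Sendungen 27
Anzahl Services 28
"""

# Optional "date number" prefix, then a lazy description, then the fixed tail.
# Group names are illustrative assumptions, not the real column headers.
ROW = re.compile(
    r"(?:(?P<date>\d{2}\.\d{2}\.\d{4})\s+(?P<code>\d+)\s+)?"
    r"(?P<desc>.+?)\s+(?P<qty>\d+)\s+(?P<unit>[A-Z]+)\s+(?P<flag>[A-Z]\d)\s+"
    r"(?P<price>\d[\d.]*,\d{2})\s+(?P<total>\d[\d.]*,\d{2})"
)


def parse_rows(text):
    """Return one tuple per table row, carrying the last seen date/code forward."""
    rows = []
    last_date, last_code = "", ""
    for line in text.splitlines():
        m = ROW.fullmatch(line.strip())
        if m is None:
            continue  # blank lines and summary lines such as "Anzahl Sendungen 27"
        if m.group("date"):
            last_date, last_code = m.group("date"), m.group("code")
        rows.append((last_date, last_code, m.group("desc"), m.group("qty"),
                     m.group("unit"), m.group("flag"),
                     m.group("price"), m.group("total")))
    return rows


for row in parse_rows(SAMPLE):
    print(row)
```

In a UiPath workflow, the equivalent would presumably be a For Each over the lines that adds each parsed row to a DataTable (for example with Add Data Row), followed by Write Range to produce the Excel file.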

There's a table on the 2nd page which looks like this.

I want to extract it without using DU (Document Understanding).

Thank you.

Hi @Prem_Kumar_S

Try this expression:

System.Text.RegularExpressions.Regex.Match(readPDF,"(?s)Aufträge aktueller Zeitraum:(.*?)Summe\s").Groups(1).ToString.Trim

Regards
Gokul
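To make the idea behind that expression easier to try out, here is a small Python sketch of the same pattern: dot-matches-newline mode, a lazy capture between the two marker strings, then a trim. The markers come from the expression above; the sample page text and the function name `extract_block` are made up for illustration, because the real second page is not shown in the thread.

```python
import re

# Hypothetical page text; only the two marker strings are taken from the thread.
SAMPLE_PAGE = (
    "Seite 2\n"
    "Aufträge aktueller Zeitraum:\n"
    "08.12.2021 088 DHL PAKET (GK) bis 3 kg 1 ST C1 3,35 3,35\n"
    "Service GoGreen 1 ST C1 0,02 0,02\n"
    "Summe 3,37\n"
)


def extract_block(text, start="Aufträge aktueller Zeitraum:", end="Summe"):
    """Return the trimmed text between the start marker and the first 'end + whitespace'."""
    pattern = re.escape(start) + r"(.*?)" + re.escape(end) + r"\s"
    # re.DOTALL plays the role of the inline (?s) flag: '.' also matches newlines.
    m = re.search(pattern, text, re.DOTALL)
    return m.group(1).strip() if m else ""


print(extract_block(SAMPLE_PAGE))
```

The lazy `(.*?)` stops at the first occurrence of the end marker, so if the word "Summe" can also appear inside a row description, a more specific end marker would be needed.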
