Need to extract only column values using regex

Hi,
I have the following sample text in table layout (extracted from a PDF), from which I need to extract only the values of the third column (Beschreibung). The extracted content is plain text, not a table.

Nr.     Bauteilgruppe                   Beschreibung
1       Ausrüstung                      Fahrzeugschlüssel m. Fernb. (2x Batterien leer) - --- - erneuern
2       Tür vorn links                  Tür - Dellen - sanft instandsetzen
3       Instrumententafel               Ablagefachdeckel (Sitzt schräg) - abweichend Auslieferungszustand -
                                        einstellen/justieren

I think my only option is to use regex?
Any suggestions?


Hi,

First, use a Filter Data Table activity to keep only the third column.

Then use a For Each Row activity and, inside it, apply a regex activity to each row.

Hope this helps :)

Sorry, the text above is plain text only, extracted from a PDF in table layout. It's not a table.
I need to get the values under “Beschreibung”.

Hey,
Maybe you could replace every run of more than one space with the character ‘;’ and create a CSV file?
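
A minimal sketch of that idea, written in Python rather than as a UiPath workflow. The function name `spaces_to_semicolons` is hypothetical, and the sketch assumes that columns are always separated by at least two spaces and that ‘;’ never occurs in the data:

```python
import re


def spaces_to_semicolons(text):
    """Replace every run of two or more spaces with a single ';' per line."""
    lines = text.splitlines()
    # Single spaces inside a cell (e.g. "Tür vorn links") are left untouched.
    return "\n".join(re.sub(r" {2,}", ";", line.rstrip()) for line in lines)


sample = (
    "Nr.     Bauteilgruppe                   Beschreibung\n"
    "2       Tür vorn links                  Tür - Dellen - sanft instandsetzen"
)
print(spaces_to_semicolons(sample))
```

Note that a wrapped line that starts with spaces ends up with a leading ‘;’ and fewer fields than the other lines, so it needs separate handling before the result is a valid CSV.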

In general, we would try to split the text into its columns so that we can parse it as CSV into a DataTable.

But we cannot simply split on single spaces, because the cell values contain spaces themselves. Substituting runs of multiple spaces handles most lines, except the last line, which is a wrapped continuation of row 3 and needs a correction.

Provided we can rely on the assumption that the character | never occurs in the text, we can do the following:

  • Substitute runs of multiple spaces with | (Regex.Replace)
  • Split the text on line breaks (Split, Regex.Split)
  • Find the lines that contain only one | and insert the missing | at the beginning of each (LINQ)

Feed the corrected text lines to a Generate Data Table activity and treat them as CSV data.

It returns a DataTable, which we can process further, e.g. with Filter Data Table, keeping only the column Beschreibung.
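
The steps above can be sketched outside UiPath as well. The following Python version is only an illustration under the same assumptions (columns are separated by at least two spaces, | never occurs in the text, and a wrapped line contains exactly one | after substitution). The names `to_pipe_lines` and `column_values` are hypothetical, and the `csv` module stands in for the Generate Data Table activity:

```python
import csv
import io
import re

SAMPLE = (
    "Nr.     Bauteilgruppe                   Beschreibung\n"
    "1       Ausrüstung                      Fahrzeugschlüssel m. Fernb. (2x Batterien leer) - --- - erneuern\n"
    "2       Tür vorn links                  Tür - Dellen - sanft instandsetzen\n"
    "3       Instrumententafel               Ablagefachdeckel (Sitzt schräg) - abweichend Auslieferungszustand -\n"
    "                                        einstellen/justieren\n"
)


def to_pipe_lines(text):
    """Turn space-aligned text into pipe-delimited lines with three fields each."""
    fixed = []
    # Split the text on line breaks.
    for line in re.split(r"\r?\n", text):
        if not line.strip():
            continue  # skip blank lines
        # Substitute every run of two or more spaces with a single "|".
        line = re.sub(r" {2,}", "|", line.rstrip())
        # A wrapped line has only one "|", so insert the missing one at the start.
        if line.count("|") == 1:
            line = "|" + line
        fixed.append(line)
    return fixed


def column_values(text, column="Beschreibung"):
    """Parse the corrected lines as CSV and return the values of one column."""
    csv_text = "\n".join(to_pipe_lines(text))
    reader = csv.DictReader(io.StringIO(csv_text), delimiter="|", quoting=csv.QUOTE_NONE)
    return [row[column] for row in reader]


for value in column_values(SAMPLE):
    print(value)
```

With this sample, the wrapped text "einstellen/justieren" comes back as its own row with empty Nr. and Bauteilgruppe fields. Whether to append it to the previous row's Beschreibung is a separate decision that the steps above do not cover.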


Understood.

In that case I can think of 2 options.

  1. Try to convert the text into a DataTable by using the Generate Data Table activity (Output Data Table does the reverse and writes a DataTable to text).
    Here is the documentation for this activity; it also has an example.
  2. An alternative is to use Document Understanding, which is a bit more work but very powerful and a very good skill to master. UiPath Academy offers a great course on this.

Hi, thanks for your reply.
Could you please provide an example (or the code)?
It would be very helpful.

Thanks.

Find some starter samples:

Getting the lines that need to be corrected (we can loop over them with a For Each):

Generate Data Table:

Just start the implementation; we are sure you will get most of it done by yourself.

