How to Split and store DataTable?

Hello guys!
I have a DataTable like this (DT and Excel):


Every “________ no preferential origin ____" means it’s an entry and what I need to do is to store the values on variables or directly to colunms.
For ex:
The first line I need the number 2347402.
Second line I need the value 20 which is the Quantity and 8,81 which is the Price.
After getting the values I need, I have to “jump” to the next entry. This entry is marked for another "
no preferential origin ______” at the end and I’ll need to get the values again and store in the second line and so forth.

In the end, what I need is something like this table:
image

How could I do that?

Thanks.

Hi @vip.thsantos ,

Could you provide us the Sample Input Excel files ?

Try to use Text to Datatable activity.

That format is awful. If it were me, I would look at getting a better source file. Don’t automate bad processes.

I totally agree, but this “awfull” source is created by SAP and this is pratically impossible to change haha!

Test.xlsx (10.5 KB)
That’s the file I need to work on.

@vip.thsantos , The row data to be extracted and formatted starts from Row 16 ? If so, is it always row 16 ?

No, it’s not always 16, so…
What I thought was:
The first step I look for the info “______ no preferential origin _____”.
After that, every entry is between this info and 5 rows Up.
For ex.:
Found “no pref origin” at line 10.
The entry is 9 to 5. Then I apply the split. However it should be applied.
That would be some way to start a positioning. I’m not sure if it works or even how it does, but maybe is an idea.

Assuming those 6 lines ending “________ no preferential origin ____” keep the same format and represent one data record I would use regex for extraction.

Cheers

@vip.thsantos , Could you maybe also provide the Expected Output for the Input provided.

You could Populate the Output with 2-3 rows as specified in the Screenshot above

However, I do think that we do have some unwanted lines in between the data. Please do confirm if that is the case :

For Solutioning, we should start applying regex and find a pattern that fits your unstructured data. If the Structure pattern is unique, we should be able to achieve your output requriement.

Yes. These lines won’t be used.
They are there because they are some kind of “header” on the very source documment (which is a .pdf file), so it comes either way.

The pattern I can see is:
→ To find “no preferential origin”.
→ The full entry is between the 5 lines Up.

If we can use this pattern, I think the rest of the unwanted information wouldn’t be a problem.

The Output Excel doesn’t have a pattern to follow, so basically I need to get the information separetedely (the SPACE is the char that split the info from each other) and store in Colunm A, the othr B, etc.

@vip.thsantos , We cannot use just the space, we need a more definite pattern to be used.

Confirming on the Output or Providing the Expected Output would help us prepare the Required data.

If we consider space as the delimiter, then we won’t have a proper data. From the Screenshot below, we see each of the words separated with space, but most likely it is a single sentence or single value :
image

The reason I thought on SPACE is because this info you highlighted isn’t necessary too.
We won’t use it and the rest of them is separeted by a space.
The Output could be:
ColunmA: 2347402
ColB: 20
ColC: 8,81
ColD: Here I would mutiply 20 * 8,81
ColE: 870830
ColF: 4019722237416
And the next register would be the same, but line 2.

@vip.thsantos ,

Could you Check the Below Workflow :
DT_Split_Store.zip (11.3 KB)

The workflow doesn’t provide a Complete solution, meaning there is still modifications in the Regex Expression to be made proper for identifying the right targets.
The Regex Expression used is the below, It is aimed on caputuring only the Numbers, However, it doesn’t work very well :

^\s*(\d+)\s+(\d+).*?(\d+).*?([\d.,]+).*?(\d+).*?([\d.,]+).*?(\d+).*?(\d+).*?(\d+).*?(\d+).*?(\d+).*?(\d+).*?(\d+)

The Reason is that the Last row doesn’t behave in the same pattern as that of other rows.

In order to fix that, we would have to understand what are the ColE and ColF values for the Last row and it’s pattern.

I was trying to do some tests here and I now identify all the “no pref…” entries.
I set the Match configuration and for each match it writes to a Text File this match. With that configuration I could see that it finds all the entries correctly.
Now, what I need is to work on my DataTable, or my excel or text file, based on that entry.
If it’s in the line 10, I need to find the values and split all the 5 up lines.
:frowning: