How to Split and store DataTable?

vip.thsantos · April 28, 2022, 1:51pm

Hello guys!
I have a DataTable like this (DT and Excel):

Every “________ no preferential origin ____" means it’s an entry and what I need to do is to store the values on variables or directly to colunms.
For ex:
The first line I need the number 2347402.
Second line I need the value 20 which is the Quantity and 8,81 which is the Price.
After getting the values I need, I have to “jump” to the next entry. This entry is marked for another " no preferential origin ______” at the end and I’ll need to get the values again and store in the second line and so forth.

In the end, what I need is something like this table:

How could I do that?

Thanks.

supermanPunch · April 28, 2022, 1:55pm

Hi @vip.thsantos ,

Could you provide us the Sample Input Excel files ?

Rahul_Unnikrishnan · April 28, 2022, 1:57pm

Try to use Text to Datatable activity.

postwick · April 28, 2022, 2:04pm

That format is awful. If it were me, I would look at getting a better source file. Don’t automate bad processes.

vip.thsantos · April 28, 2022, 2:07pm

I totally agree, but this “awfull” source is created by SAP and this is pratically impossible to change haha!

vip.thsantos · April 28, 2022, 2:12pm

Test.xlsx (10.5 KB)
That’s the file I need to work on.

supermanPunch · April 28, 2022, 2:20pm

@vip.thsantos , The row data to be extracted and formatted starts from Row 16 ? If so, is it always row 16 ?

vip.thsantos · April 28, 2022, 2:40pm

No, it’s not always 16, so…
What I thought was:
The first step I look for the info “______ no preferential origin _____”.
After that, every entry is between this info and 5 rows Up.
For ex.:
Found “no pref origin” at line 10.
The entry is 9 to 5. Then I apply the split. However it should be applied.
That would be some way to start a positioning. I’m not sure if it works or even how it does, but maybe is an idea.

J0ska · April 28, 2022, 2:42pm

Assuming those 6 lines ending “________ no preferential origin ____” keep the same format and represent one data record I would use regex for extraction.

Cheers

supermanPunch · April 28, 2022, 2:49pm

@vip.thsantos , Could you maybe also provide the Expected Output for the Input provided.

You could Populate the Output with 2-3 rows as specified in the Screenshot above

However, I do think that we do have some unwanted lines in between the data. Please do confirm if that is the case :

For Solutioning, we should start applying regex and find a pattern that fits your unstructured data. If the Structure pattern is unique, we should be able to achieve your output requriement.

vip.thsantos · April 28, 2022, 3:00pm

Yes. These lines won’t be used.
They are there because they are some kind of “header” on the very source documment (which is a .pdf file), so it comes either way.

The pattern I can see is:
→ To find “no preferential origin”.
→ The full entry is between the 5 lines Up.

If we can use this pattern, I think the rest of the unwanted information wouldn’t be a problem.

vip.thsantos · April 28, 2022, 3:04pm

The Output Excel doesn’t have a pattern to follow, so basically I need to get the information separetedely (the SPACE is the char that split the info from each other) and store in Colunm A, the othr B, etc.

supermanPunch · April 28, 2022, 3:14pm

@vip.thsantos , We cannot use just the space, we need a more definite pattern to be used.

Confirming on the Output or Providing the Expected Output would help us prepare the Required data.

If we consider space as the delimiter, then we won’t have a proper data. From the Screenshot below, we see each of the words separated with space, but most likely it is a single sentence or single value :

vip.thsantos · April 28, 2022, 3:22pm

The reason I thought on SPACE is because this info you highlighted isn’t necessary too.
We won’t use it and the rest of them is separeted by a space.
The Output could be:
ColunmA: 2347402
ColB: 20
ColC: 8,81
ColD: Here I would mutiply 20 * 8,81
ColE: 870830
ColF: 4019722237416
And the next register would be the same, but line 2.

supermanPunch · April 28, 2022, 3:43pm

@vip.thsantos ,

Could you Check the Below Workflow :
DT_Split_Store.zip (11.3 KB)

The workflow doesn’t provide a Complete solution, meaning there is still modifications in the Regex Expression to be made proper for identifying the right targets.
The Regex Expression used is the below, It is aimed on caputuring only the Numbers, However, it doesn’t work very well :

^\s*(\d+)\s+(\d+).*?(\d+).*?([\d.,]+).*?(\d+).*?([\d.,]+).*?(\d+).*?(\d+).*?(\d+).*?(\d+).*?(\d+).*?(\d+).*?(\d+)

The Reason is that the Last row doesn’t behave in the same pattern as that of other rows.

In order to fix that, we would have to understand what are the ColE and ColF values for the Last row and it’s pattern.

vip.thsantos · April 28, 2022, 7:22pm

I was trying to do some tests here and I now identify all the “no pref…” entries.
I set the Match configuration and for each match it writes to a Text File this match. With that configuration I could see that it finds all the entries correctly.
Now, what I need is to work on my DataTable, or my excel or text file, based on that entry.
If it’s in the line 10, I need to find the values and split all the 5 up lines.

Topic		Replies	Views
Convert DataTable row into Excel Cells Studio studio , question , activities_panel	20	425	September 1, 2023
Splitting by delimiter in Datatable for each row Studio datatable , excel , regex	22	3835	September 12, 2020
Extract value from DataTable when condition is satisfied Help	20	5270	May 18, 2018
Split cells in excel Help activities	7	3458	July 15, 2022
Generate An Data Table Using Text String Studio studio , question , activities_panel	5	762	January 29, 2024

How to Split and store DataTable?

Related topics