Every “________ no preferential origin ____" means it’s an entry and what I need to do is to store the values on variables or directly to colunms.
For ex:
The first line I need the number 2347402.
Second line I need the value 20 which is the Quantity and 8,81 which is the Price.
After getting the values I need, I have to “jump” to the next entry. This entry is marked for another " no preferential origin ______” at the end and I’ll need to get the values again and store in the second line and so forth.
In the end, what I need is something like this table:
No, it’s not always 16, so…
What I thought was:
The first step I look for the info “______ no preferential origin _____”.
After that, every entry is between this info and 5 rows Up.
For ex.:
Found “no pref origin” at line 10.
The entry is 9 to 5. Then I apply the split. However it should be applied.
That would be some way to start a positioning. I’m not sure if it works or even how it does, but maybe is an idea.
Assuming those 6 lines ending “________ no preferential origin ____” keep the same format and represent one data record I would use regex for extraction.
For Solutioning, we should start applying regex and find a pattern that fits your unstructured data. If the Structure pattern is unique, we should be able to achieve your output requriement.
Yes. These lines won’t be used.
They are there because they are some kind of “header” on the very source documment (which is a .pdf file), so it comes either way.
The pattern I can see is:
→ To find “no preferential origin”.
→ The full entry is between the 5 lines Up.
If we can use this pattern, I think the rest of the unwanted information wouldn’t be a problem.
The Output Excel doesn’t have a pattern to follow, so basically I need to get the information separetedely (the SPACE is the char that split the info from each other) and store in Colunm A, the othr B, etc.
@vip.thsantos , We cannot use just the space, we need a more definite pattern to be used.
Confirming on the Output or Providing the Expected Output would help us prepare the Required data.
If we consider space as the delimiter, then we won’t have a proper data. From the Screenshot below, we see each of the words separated with space, but most likely it is a single sentence or single value :
The reason I thought on SPACE is because this info you highlighted isn’t necessary too.
We won’t use it and the rest of them is separeted by a space.
The Output could be:
ColunmA: 2347402
ColB: 20
ColC: 8,81
ColD: Here I would mutiply 20 * 8,81
ColE: 870830
ColF: 4019722237416
And the next register would be the same, but line 2.
The workflow doesn’t provide a Complete solution, meaning there is still modifications in the Regex Expression to be made proper for identifying the right targets.
The Regex Expression used is the below, It is aimed on caputuring only the Numbers, However, it doesn’t work very well :
I was trying to do some tests here and I now identify all the “no pref…” entries.
I set the Match configuration and for each match it writes to a Text File this match. With that configuration I could see that it finds all the entries correctly.
Now, what I need is to work on my DataTable, or my excel or text file, based on that entry.
If it’s in the line 10, I need to find the values and split all the 5 up lines.