Hi folks! I have a unique question when processing data - see attached workbook and image below: (NOTE: I added the values in dt2 other than column headings).
This solution is reading scraped data that an end-user has labeled and fed into a document intelligence tool.
The goal here is to match the key values in dt1 to the column headings in dt2 and write the corresponding value tied to the key. My current solution it to open a nested for each row loop, and if the key value in dt1 matches the dt2.columns.columnname in the current iteration - write to that cell.
The problem here is that I need to write to dt2 which is a blank datatable with only column headings. When I try to iterate down to the next row to print my matches, it throws an error.
How can I dynamically add data rows to this table? Please note that I already tried to ArrayRow property in the Add Data Row activity but I do NOT want to hard code the fields because they will vary based on the document being processed.
@joshgregory10 Will all the Columns in DT1 be present in DT2 ? If So, you can use Transpose Datatable Activity which might be a Lot easier to Shift the Structure and get the Output you need. Try it and let me know if you face any Difficulties.
In dt1, the values were extrapolated to dt2 and formed the column names. I call this an âunpivotâ. Once I did this, I need to loop through dt1 and any time the Key in dt1 matches the column name in dt2, write that value.
Each block of kvps in dt1 are extracted from a document intelligence tool so they need to be contained in one row within dt2 - they cannot be separated.
Is there a way to bunch together the identifiers in dt2 into a DataRow and print that DataRow in dt2? Note that I need to be able to have a dynamic number of columns, I cannot hard code the fields.
@joshgregory10
have a look on the yellow Block. currently your structure is varying. Can you just check at your end if this was a copy paste issue or if our data really comes in an unorderd structure.
In general you do sometig like a block parsing and later transpose it to this other form. Blockparsing can be done with markers. For an initial idea hava look to a not so far away case, as block parsing was used
Find below the updated table - I anticipated this issue would occur so I created a field called âIdentifierâ in dt1 where it would link every kvp to a unique value.
Refer to dt2 - I completed two rows manually. Identifier field from dt1 will form âprimary keyâ of the row, and the remaining kvps will print if the column name matches. This is why I put emphasis on match the column name. The column name will always match the âKeyâ field in dt1 since the values originated from there.
I was thinking tuples (identifier, key, value) but not sure how to implement this in dt2 to get the output I need without hard coding the name of the columns and predefining the fields in the dT. As mentioned, I need this to be dynamic since users will change the number of keys they extract from the document.
Yes - key/value pairs can be unordered (I am retrieving from Azure Cognitive Services via REST API so the response varies). However, if you see below, I append a document identifier to keep the kvps bunched together to form the row. Just need to make sure it is writing the key to the correct column in dt2.
used the distinctKeys List and create datacolumns in a second empty Datatable
order as retrieved
Logic:
Group the rows by ID (GroupBy)
Create from the Grouped Rows a Dictionary of Keys / Values
Create From the Key List a dictionary (Value is nothing)
Find out with An Except which Keys are Missing for the full DataRow Structure and create a Dictionary with Keys and Empty Values for missing items
Merge It with dict A (Now we have a dictionary with KVPs from Group with the values and Filled Up with KVP (empty Values) for the missing ones, but not in the right column order structure
Iterate over the Key List, Fetch the Values from unordered Dictionary and create an array from it
Use this array for adding a new datarow for this Group
copy it to datatable
Linq:
(From d In dtData1.AsEnumerable
Group d By k=d(0).toString Into grp=Group
Let dictA = grp.ToDictionary(Of String, Object)(Function (x) x(1).toString, Function (x) x(2))
Let dictK = Keys.ToDictionary(Of String, Object)(Function (x) x, Function (x) Nothing)
Let dictD = dictK.Keys.Except(dictA.Keys).ToDictionary(Of String, Object)(Function (x) x, Function (x) Nothing)
Let dictM = dictA.Union(dictD).ToDictionary(Of String, Object)(Function (x) x.Key, Function (x) x.Value)
Let arrRow = Keys.Select(Function (i) dictM(i)).toArray
Select newrow = dtData2.Rows.Add(arrRow)
Select newrow).CopyToDataTable
I would not suggest to rely on such a long statement as long you donât feel that you can maintain it. But you can map some parts to essential UiPAth Activities and deconstruct the statement. Maybe this can help you.
Hi there - thanks for this! Can we print the âIDâ field to the dt2 output as well in the first column? I need to join this to another dT. So the final output in dt2 will be the labels as column headings + the ID field.
Thanks @ppr - I tried to follow the LINQ query to pull it out myself but was unable. If you are using the identifier group the data, wouldnât it just be a matter of adding that field to the final output in the newRow?
@joshgregory10
Due A Bug in UiPath Studio I lost much time and was blocked to rework on it. However the Statement requires a map to more essential Activities. Otherwise the ability of maintenance will be lost. The LINQ was good for Quick prototyping but doesnt have longer any reason to stay longer.
I will check on another machine about the studio Bug and will revert back to you
@joshgregory10
Find this updated version no handling the ID. The LINQ was except the GroupBy decomposed into essential UiPath Activities. With DataTable.NewRow we got an empty row, that was fillable by accessing with the Key Info from dt1.
Thanks for this! I ran this but throwing an error - âAdd Data Row: Cannot set column âIDâ. The value violates the MaxLength limit of this column.â in the last activity. Did you have the same?
One more issue I ran into. The overall project I am working on is extracting data from a JSON file originating from Microsoft Form Recognizer. In order to get the âValueâ field used in dt1, I am doing the following:
This loop is iterating through the JSON result page (resultPage.AnalyzeResults) and as shown in âAdd Data Rowâ activity, writing file.ToString (âIDâ), label.Key (âKeyâ), label.Value.Text (âValueâ) to dt1.
The problem now is when Form Recognizer fails to find a âValueâ during extraction, it writes âNullâ in the value property within JSON file. This throws an error (Object reference not set to instance of object) because the value is null in Add Data Row.
**Question: How can I check if an object is null and write a string (ex. âNullâ) so that my âAdd Data Rowâ activity will not error? I tried to convert label.Value.ToString.IsEmptyorNull but you cannot do that with objects⌠**
@joshgregory10
I am Not Sure if i got you. For Testing If a string is null or empty you can use String.isnullorempty(youtstringvar) and will Return true or false.
Can you please give a Feedback If my Demo Code integrated by you IS working as expected and also handle the ID now