I have a task where I need to scrape a table from a citrix application, but where data scraping is not permitted. I then need to convert this scraped text into a datatable.
The table is of the format 4 columns and a variable number of rows. Unfortunately I cannot provide a screenshot due to sensitive content.
From checking other posts on this forum I have gathered that I need to split the string into an array, build the datatable and then use Add DataRow within a For Each. I have managed to get the scraped information into a datatable of 1 column and 4 rows by creating an array using the following assign:
Array = ScrapedText.Split(Environment.NewLine.ToCharArray)
But this returns an array of 1 column and 4 rows because there is only one split condition.
In summary, what I currently have is:
Column1
Test Test Test Test
Test Test Test Test
Test Test Test Test
Test Test Test Test
What I need the output to be:
Col1 Col2 Col3 Col4
Test Test Test Test
Test Test Test Test
Test Test Test Test
Test Test Test Test
One other issue is that the information in each column, due to formatting, is not always separated by 1 space - normally 5+ spaces. How can I get the data into the format I need?
1 Like
Hi @PeterHeyburn,
Greatā¦!
So buddy you have an option āGenerate data tableā activity to convert the structured format of text to a datatable. Then you can use add datarow and invoke method together for mentioning the new column name
Cheersā¦!
You can follow the advancement of Computer Vision, however I donāt think there is an Extract Structured Data yet.
Assuming that all your columns are only one word, then you can replace the space with a delimiter character like a comma. But, since you say there could be multiple spaces between columns, then you can convert them to a single space or comma.
Here is an example:
System.Text.RegularExpressions.Regex.Replace(ScrapedText, " +", ",")
After you have it delimited, then you can simply use Write CSV or Generate Data Table. (Generate Data Table is sometimes glitchy though)
You can also attach headers for columns as well if needed by using concatenation. headers+System.Environment.Newline+System.Text.RegularExpressions.Regex.Replace(ScrapedText, " +", ",")
I hope this helps.
Regards.
Thank you for your reply!
This is quite close to my desired outcome, however I find that because some of the columns can have more than one word, I am getting something like 8 columns instead of 4. I have tried setting the column separator to " " (two spaces) so that the string will not be split by only one space but that is not working!
Thank you for replying!
Unfortunately because of the environment I am working in, I cannot make use of Computer Vision here.
As you mentioned, not all columns are one word so although your solution works nicely it does not quite give me my desired output. I do not actually need to write this to a file, I just need to be able to reliably get the value of the 4th column in each row. I will be using the value here inside a āFind text positionā to reliably select an item.
Alternatively, since the value I need is always the value before the new line, is there a way I can read all text from the new line back until a space is reached? Apologies if that doesnāt make sense.
xxxx xxxxx xxxxxxx 12345
xxxx xxxxx xxxxxxx 23456
xxxx xxxxx xxxxxxx 34567
I only need the values in bold.
When I said you can use Write CSV, I meant you can follow that with Read CSV to get it back to a Data Table. But, like we suggested, Generate Data Table is an option if it works.
If your columns are always separated by more than one space, then you can first replace the single space with a different character (any character), then replace the multiple spaces with a comma, then replace that character back to a space. If this is an option, let me know if you need help finding the solution on how to replace only the single space.
If all you want to do is get the last value in the row, then you can split by the newline, and use LINQ to create an array of only those last values.
Here is an example of that:
ScrapedText.Trim.Split(System.Environment.Newline(0)).Select(Function(x) x.Trim.Split(" "c).Last)
So maybe that will work, if I got the syntax right. Essentially, it splits the text by newline to create an array of each row, then selects the last item for each row by splitting by a space, to create an array of only the last items. Then, you can generate a table or run them through a loop, or whatever you need.
Regards.
This works really nicely, thank you!