Help - I need to fetch only the first and second occurrence of a word in a dt and all rows in between them

Hi (I'm not sure I managed to post this in the correct section of the forum, so please move it to the correct location if needed).

I have been given the task of pulling build logs from Azure DevOps (mainly from two pages; one is a pipeline page that contains the description of the PR (pull request) and a set of flags) and then inserting the build log into another system. Until now this has been done manually, but we are going to automate the task as it can be time-consuming.

My first task is to scrape the pipeline page, which works fine. I collect mainly 3 pieces of information:
Description
URL behind the description field
Flag (that contains either “xxxxx Publish Success xxm xxs” or “xxxxx Publish Skipped”)

So I basically end up with a dt (DataTable) with these 3 columns.

To speed things up and remove everything I do not need, I run a Filter Data Table. All I am left with is items that contain the text “*Hotfix - Merged PR” and either a “xxxxx Publish Success xxm xxs” or a “xxxxx Publish Skipped” flag.

I have been testing and testing, trying to figure out how I can solve the next bit.

Goal: I want to keep the first occurrence of “xxxxx Publish Success xxm xxs” and all the items after it, up to and including the second occurrence of the “xxxxx Publish Success xxm xxs” flag.

I am not sure I can accomplish this at all with the Filter Data Table activity. Something tells me that this should probably be done with LINQ, or maybe a For Each, but unfortunately I do not speak LINQ.

I hope there are some souls out there who have some good ideas.

XML from the scraping:

<extract>
	<row exact='1'>
		<webctrl tag='a' />
	</row>
	<column exact='1' name='Desc' attr='text' name2='URL' attr2='href'>
		<webctrl tag='a' />
		<webctrl tag='td' idx='2' />
		<webctrl tag='div' idx='1' />
		<webctrl tag='div' idx='1' />
		<webctrl tag='span' idx='1' />
	</column>
	<column exact='1' name='Flag' attr='text'>
		<webctrl tag='a' />
		<webctrl tag='td' idx='3' />
		<webctrl tag='div' idx='1' />
		<webctrl tag='div' idx='1' />
		<webctrl tag='span' idx='1' />
		<webctrl tag='span' idx='1' />
		<webctrl tag='span' idx='1' />
		<webctrl tag='span' idx='1' />
		<webctrl tag='span' idx='9' />
		<webctrl tag='span' idx='1' />
		<webctrl tag='a' idx='1' />
		<webctrl tag='svg' idx='1' />
	</column>
</extract>

Filter Data Table (screenshot)

I have started to use a For Each Row in Data Table, but I am not sure if this is the correct approach, or if I should just use the Filter Data Table activity, or if there might be some LINQ activity.

This is basically the output from the datatable, so I want to keep the first and second occurrence and everything in between.

There might be a third and fourth occurrence of “Success”, but I do not need or want those.

Any advice or ideas are helpful to me, thanks.

As a general alternative, also check whether the information can be retrieved via API calls instead of scraping.

As far as we understood your request, have a look at the following strategy:

  • retrieve the row index of each row containing the trigger text - xxxx Publish Success xxm xxs
  • then calculate the segment to fetch
    the rows can then be fetched with the help of Take and Skip

yourDataTableVar.AsEnumerable.Skip(5).Take(3) … ToList / CopyToDataTable…
will fetch rows 6, 7, 8 (row index 5, 6, 7)

I shall try to elaborate a little bit more.

In my case, row index 0 contains the first occurrence of “Success”, so I want to keep all the data on:
Row index 0, 1, 2, 3, 4

My second occurrence of “Success” is on row index 4, so that will be the last row I need. So basically, in this particular scenario, everything from row index 0 to 4 should be kept.

I currently do not do anything with my row indexes, I’m just printing them, as I am not sure how I should approach this.

The first occurrence of “Success” could just as well be on row index 12 and the second occurrence on row index 17, in which case I would want to keep 12, 13, 14, 15, 16 and 17.

I do not really have a strategy for which method I should use: whether it is even possible to accomplish this with the “Filter Data Table” activity, whether I can use some “For Each Row”, or whether it must be done with LINQ (and I have no idea how I should write the LINQ select, unless someone has a clever LINQ statement?).

The result can either be written back to the same data table or sent to another table, whichever is the better approach.

Did this answer your question? If not, please let me know and I shall try to explain again :slight_smile:

Not sure.

Taking your addition in particular, keeping row index 0 to 4 (or 12 to 17) can be done e.g. with:
Assign Activity:
dtFiltered = dtOrig.AsEnumerable.Skip(0).Take(5).CopyToDataTable

Assign Activity:
dtFiltered = dtOrig.AsEnumerable.Skip(12).Take(6).CopyToDataTable

So it is about Row Index calculation.

You can do it with For Each Row, using the index output:

  • an If statement that adds each matching row index to a list

Or
For Each Row with a trigger that marks when to start fetching rows and when to stop

Or with LINQ

Assign Activity:
arrIndex = Enumerable.Range(0, yourDTVar.Rows.Count).Where(Function (x) yourDTVar.Rows(x)("ColName").ToString.Contains("YourTriggerToken")).ToArray

Feel free to share some sample data with us, and we can help you further with the solution approach based on it.
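To make the "trigger when to start and when to stop" variant a bit more concrete, here is a minimal sketch written as a plain VB.NET console module rather than as UiPath activities. The table contents, the column name "Flag" and the trigger text "Publish Success" are assumptions made up for illustration; inside a workflow the same logic would sit in a For Each Row activity with an If.

```vb
Imports System
Imports System.Data

Module KeepBetweenWithLoop
    Sub Main()
        ' Hypothetical sample data; "Flag" stands in for the scraped flag column.
        Dim dtOrig As New DataTable()
        dtOrig.Columns.Add("Flag", GetType(String))
        For Each flag As String In {"Publish Skipped", "Publish Success", "Publish Skipped",
                                    "Publish Success", "Publish Success"}
            dtOrig.Rows.Add(flag)
        Next

        ' Clone copies the column structure only, not the rows.
        Dim dtFiltered As DataTable = dtOrig.Clone()
        Dim hits As Integer = 0

        For Each row As DataRow In dtOrig.Rows
            ' Count how many trigger rows have been seen so far.
            If row("Flag").ToString().Contains("Publish Success") Then hits += 1
            ' From the first occurrence onwards, keep the row.
            If hits >= 1 Then dtFiltered.ImportRow(row)
            ' Stop right after the second occurrence has been kept.
            If hits = 2 Then Exit For
        Next

        Console.WriteLine(dtFiltered.Rows.Count)
    End Sub
End Module
```

A counter is used instead of two Boolean flags so that a third or fourth occurrence is never reached: the loop exits as soon as the second one has been imported.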

You have come up with some good approaches, which I really appreciate.

If I use this approach, is it possible to use variables that I have collected?

Skip(12): I understand this as the first occurrence I collect. Is it somehow possible for me to have Skip(VarFirstOccurance)?

Take(6): I understand this as the number of positions I want to keep up to the second occurrence, plus one additional position: 17-12+1=6. Can I again use a variable as input here, like Take(VarSecondOccurance), that I have already pre-calculated?

If you can just answer this last one, I want to give it a try and see if I manage to reach my goal :slight_smile:

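Yes, the Skip and Take calls can be made dynamic by passing variables.

Skip - number of rows to omit
Take - number of rows to fetch

Row index = 0-based
Skip/Take are counts: taking 1 row = 1, skipping 4 rows = 4 (i.e. row index 0 to 3 are skipped). So Skip can use the first occurrence's row index directly, while Take needs the count (second index - first index + 1), not the second index itself.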
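As a rough illustration of Skip and Take with variables, the sketch below combines the index array with a computed count. It is a standalone VB.NET module, not a UiPath workflow; the 20-row sample table, the column name "Flag", the trigger text and the variable names firstIdx and takeCount are all assumptions chosen to mirror the 12/17 example from this thread.

```vb
Imports System
Imports System.Data
Imports System.Linq

Module SliceWithVariables
    Sub Main()
        ' Hypothetical 20-row table; row index 12 and 17 carry the trigger text.
        Dim dtOrig As New DataTable()
        dtOrig.Columns.Add("Flag", GetType(String))
        For i As Integer = 0 To 19
            dtOrig.Rows.Add(If(i = 12 OrElse i = 17, "Publish Success", "Publish Skipped"))
        Next

        ' 0-based indexes of every row that contains the trigger text.
        Dim arrIndex As Integer() = Enumerable.Range(0, dtOrig.Rows.Count).
            Where(Function(x) dtOrig.Rows(x)("Flag").ToString().Contains("Publish Success")).
            ToArray()

        ' Only slice when there really are at least two occurrences.
        If arrIndex.Length >= 2 Then
            Dim firstIdx As Integer = arrIndex(0)                   ' rows to skip
            Dim takeCount As Integer = arrIndex(1) - firstIdx + 1   ' rows to keep, both ends included
            Dim dtFiltered As DataTable = dtOrig.AsEnumerable().
                Skip(firstIdx).Take(takeCount).CopyToDataTable()
            Console.WriteLine(dtFiltered.Rows.Count)
        End If
    End Sub
End Module
```

In an Assign activity the last step would shrink to a single expression along the lines of dtOrig.AsEnumerable.Skip(firstIdx).Take(takeCount).CopyToDataTable, with the two Int32 variables assigned beforehand. The length check matters because CopyToDataTable throws when the sequence contains no rows.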
Also have a look at TakeWhile / SkipWhile in the linked resources (e.g. LINQSamples).
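For comparison, a SkipWhile / TakeWhile version might look like the sketch below, which avoids working with row indexes altogether. This is only one possible way to combine the two operators, not necessarily what the linked samples show; the sample rows, the column name "Flag", the trigger text and the helper isHit are assumptions for illustration.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Linq

Module SliceWithSkipWhile
    Sub Main()
        ' Hypothetical sample data; "Flag" stands in for the scraped flag column.
        Dim dtOrig As New DataTable()
        dtOrig.Columns.Add("Flag", GetType(String))
        For Each flag As String In {"Publish Skipped", "Publish Success", "Publish Skipped",
                                    "Publish Skipped", "Publish Success", "Publish Success"}
            dtOrig.Rows.Add(flag)
        Next

        ' Hypothetical helper: True when a row carries the trigger text.
        Dim isHit As Func(Of DataRow, Boolean) =
            Function(r) r("Flag").ToString().Contains("Publish Success")

        ' Drop everything before the first occurrence.
        Dim fromFirst As List(Of DataRow) =
            dtOrig.AsEnumerable().SkipWhile(Function(r) Not isHit(r)).ToList()

        ' Number of rows strictly between the first and the second occurrence.
        Dim betweenCount As Integer =
            fromFirst.Skip(1).TakeWhile(Function(r) Not isHit(r)).Count()

        If fromFirst.Count > 0 Then
            ' First occurrence + rows in between + second occurrence.
            Dim dtFiltered As DataTable = fromFirst.Take(betweenCount + 2).CopyToDataTable()
            Console.WriteLine(dtFiltered.Rows.Count)
        End If
    End Sub
End Module
```

One thing to be aware of: if there is no second occurrence, this version keeps everything from the first occurrence to the end of the table, so an extra check may be wanted depending on how that case should be handled.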
