I have to extract data table from unstructured excel sheet

I have to extract data table from unstructured excel sheet (Note don’t know range where we have to extract but I have to extract after “Information Required for New Cell” and end upto data table but don’t know end point …please help

@shaik.muktharvalli1

  1. Use find/replace activity and search for “Information required for new cell”
  2. Use the output to know the start of table
  3. After reading find a reliable value for identifying last row and remove the rows after that

Cheers

Hello @shaik.muktharvalli1
Try this

  • Use “Lookup Range” and search the “Information Required for New Cell” and store value in FindCell Vraibale name-----> Output will be A17

  • Use regex to get the Row number from the cell

FindRowNumber= System.Text.RegularExpressions.Regex.Match(FindCell,"\d+")   ---------> Output = 17
  • Use Read range activity to read the data and range should be as below
"A"+(cint(FindRowNumber)+1)  ----------> Reads data from A18

Hi @shaik.muktharvalli1

One way to do is: You want to get cell no. of B column where cell contains or equals “Information Required fopr New cell”.
Then you can provide start cell location in your read range activity.

To do so use following:
Assign an int variable startCell

  1. For each row in datatable (whole DT which you have read already), set index variable to i in for loop
  2. Inside loop, check if currentRow(1).ToString.Contains(“your string to break dt on”)
  3. Inside if, set startCell = index+2 and break from loop
  4. Now you can use startCell in an assign activity after loop as: dt = dt.AsEnumerable.Skip(startCell).CopyToDataTable

@shaik.muktharvalli1

1)The text Information Required for New Cell and continuing until the end without specifying a specific endpoint.

input_datatable = excelWorksheet.UsedRange.Rows.OfType(Of DataRow).SkipWhile(Function(row) Not row.ItemArray.Any(Function(cell) cell.ToString.Contains("Information Required for New Cell"))).TakeWhile(Function(row) Not row.ItemArray.All(Function(cell) String.IsNullOrWhiteSpace(cell.ToString))).CopyToDataTable()

Replace YourDataTableVar with the variable name you want to assign to the extracted DataTable, and excelWorksheet with the appropriate Excel worksheet variable. This one-liner will find the starting point after the specified text and continue extracting data until it encounters a row with all empty cells.

not understand please send sample workflow

send sample workflow for understanding please

One more efficient way to do is: without a for loop you can get index of startCell using AsEnumerable as below:

startCell = dt.Rows.IndexOf(dt.AsEnumerable.Where(Function(x) x(“column name”).ToString.Equals(“Information required…”))(0))

newDt = dt.AsEnumerable.Skip(startCell).CopyToDataTable

Column Name? which column name i have to mentioned here

Looks like your Value “Information required…” is present in column B, you can provide exact column name or index 1.

@Yoichi please help

thanks
shaik

@Anil_G , please send one sample workflow

@shaik.muktharvalli1

Please send a sample excel

cheers

Hi @shaik.muktharvalli1

Please find the below xaml for your reference

BlankProcess13.zip (168.2 KB)

Hope this helps!!

Hi @shaik.muktharvalli1

please find below approach and let me if it is helpful.

  1. Read Excel / Csv and store the data into Datatable (dt)
  2. int_startIndex = dt.Rows.IndexOf(dt.AsEnumerable.FirstOrDefault(function(x)x(0).ToString.Contains(“Information Required for New Cell”)))
  3. dt = dt.AsEnumerable.Skip(int_startIndex +1).CopyToDataTable
  4. int_lastIndex = dt.Rows.IndexOf(dt.AsEnumerable.FirstOrDefault(function(x)x(0).ToString.Contains(“Information End”)))
  5. dt = dt.AsEnumerable.Take (int_lastIndex).CopyToDataTable

@Siva_krishna_Thumma , but in my side last index name will dynamically change , then what i have to do?

thanks
shaik

@lrtetala , I am getting whole data from excel but I have to get only data table

Hey,

as per the screenshot you mentioned. you can check the last index of table like below.

int_lastIndex = dt.Rows.IndexOf(dt.AsEnumerable.LastOrDefault(function(x)
not stringIsNullOrEmpty (x(0).ToString) and
not stringIsNullOrEmpty (x(1).ToString) and
not stringIsNullOrEmpty (x(2).ToString) and
not stringIsNullOrEmpty (x(3).ToString) and
not stringIsNullOrEmpty (x(4).ToString) and
not stringIsNullOrEmpty (x(5).ToString) and
not stringIsNullOrEmpty (x(10).ToString) and
not stringIsNullOrEmpty (x(11).ToString)
))