Extract Data from a Word Table to a DataTable or Excel

Hello,

I have a table (alongside other text) in a Word document, and from it I need to get the number of unique facilities, the number of beds, and the number of facilities with a cost > 2000, and then load these into Excel.
For example, from the screenshots below, the number of facilities is 2, the number of beds is 150, and the number of facilities with a cost > 2K is 1. The input table can have any number of rows, but my output will be one row. There will never be more than one table in the Word document. I am looking for some guidance on how to approach this requirement.
[Screenshot: sample input table]
In some cases the table can come in a different format, as shown below:
[Screenshot: alternative table format]

Hi @arunasan … There are a couple of options suggested in this post … Could you please take a look?


Hi @prasath17 ,

I tried that approach from Balareva, but I am getting an error that requires a specific version of his packages, and I don't see that version in my Community Edition. I have reached out to him as well.

Hi @arunasan

Here is the community manage and package…

Regards
Balamurugan.S

Thank you @balupad14 and @prasath17.
I am trying to get the count of unique facilities from this table using these statements:
Uniq_Facility = ExtractDataTable.DefaultView.ToTable(True, "Facility Name")
Uniq_Facility.Rows.Count

I am getting a count of 3, but I want the count to be 2, because the row after FN1 also belongs to FN1. How do I achieve this?

[Screenshot: input table]

The output should look like this:

[Screenshot: expected output]
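A minimal Python sketch of one way to get 2 instead of 3. It assumes (this is not confirmed in the thread) that the continuation row under FN1 arrives with an empty Facility Name, as merged Word cells often do: carry the last non-empty name down, then count the distinct names. The function names `fill_down` and `count_unique_facilities` are hypothetical; in UiPath the same logic would be written in VB.NET.

```python
def fill_down(names):
    """Replace blank names with the most recent non-blank name.

    Assumption: rows belonging to the facility above arrive with an empty name.
    """
    filled = []
    last = ""
    for name in names:
        name = name.strip()
        if name:
            last = name
        filled.append(last)
    return filled


def count_unique_facilities(names):
    """Count distinct, non-empty facility names after filling blanks downward."""
    return len({n for n in fill_down(names) if n})


# Hypothetical column values: the second row belongs to FN1 but has no name.
facility_column = ["FN1", "", "FN2"]
print(count_unique_facilities(facility_column))  # 2
```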

I tried this:
(From p In ExtractDataTable.AsEnumerable
Group p By FN=p.Item("Facility Name").ToString Into grp=Group
Let ra= New Object() {FN, grp.Sum(Function (r) CInt(r("Number of Beds").toString.Trim))}
Select BedCount.Rows.Add(ra)).CopyToDataTable()
but I am getting this error:
Object reference not set to an instance of an object.
I am not sure where I went wrong. Could someone please take a look?

@arunasan
This runtime error often comes from an uninitialized DataTable, in this case BedCount.

Check the implementation on these points:

  • Is BedCount correctly initialized?
  • Are all needed DataColumns configured before BedCount is used in the LINQ query?

@ppr, thank you for responding.
I initialized BedCount as a DataTable and assigned the query result to BedCount. Is that correct?
I am not sure I understand your second point. Could you please explain?


[Screenshot: Assign activity]

Try adding an Assign activity before the Assign shown:
BedCount = ExtractDataTable.Clone


Thank you so much @ppr. This solved the error, but my output comes out as FName1,248 instead of FName1,124. My expected output is given below.
Input:
[Screenshot: input table]
Expected Output:
No. of Facilities : 1
No. of Beds : 124
Facilities >2K : 1

@arunasan
At first glance:
your groups are defined by FN,
and the query sums up the beds,
so FN1: 124 + 124 = 248.

So the LINQ is doing what it was implemented to do.

Maybe what you intend to achieve is something different.
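To illustrate the point above, here is a small Python sketch of the grouping logic (not the VB.NET LINQ itself): grouping by facility and summing counts the bed number once per row, while taking the maximum per group returns it once. It assumes, as the 248 suggests, that every row of a facility repeats the facility's total bed count; `group_beds` is a hypothetical helper.

```python
from collections import defaultdict


def group_beds(rows, agg):
    """Group (facility, beds) pairs by facility and aggregate each group's beds with agg."""
    groups = defaultdict(list)
    for facility, beds in rows:
        groups[facility].append(beds)
    return {facility: agg(values) for facility, values in groups.items()}


# Hypothetical input: FName1 appears on two rows, each repeating 124 beds.
rows = [("FName1", 124), ("FName1", 124)]
print(group_beds(rows, sum))  # {'FName1': 248}
print(group_beds(rows, max))  # {'FName1': 124}
```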

@ppr Could you please let me know how I can update the query so that it adds up the bed count for distinct facilities? I have given an example of my input for more than one facility and the expected output.

@ppr
I updated my query like this:
(From p In ExtractDataTable.AsEnumerable()
Group p By FName=p.Item("FacilityName").ToString Into grp=Group
Let ra= New Object() {FName, grp.Max(Function(r) CInt(r("Beds").toString.Trim)),grp.Sum(Function(r) Convert.ToDouble(r("TotalCost").toString.Replace("$","").Trim))}
Select BedCount.Rows.Add(ra)).CopyToDataTable()
But the output looks like this:
FName1,124,2906.32
FName2,120,1859.19
Should I loop over each row to get the expected output, or is there a more efficient way?

@arunasan

Your expected output (the Excel row 2,224,1) cannot be derived from the last LINQ query,
e.g. the LINQ fetches more columns than are shown in the Excel sample.

This makes it harder to understand the input, the computation, and the output.

You could help us with this part.

In case it helps: if you want the number (count) of group members, e.g. for an output column, it can be retrieved with grp.Count.
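For illustration, the per-group member count that grp.Count provides corresponds to something like this Python sketch (`group_sizes` is a hypothetical name and the facility list is sample data):

```python
from collections import Counter


def group_sizes(facility_names):
    """Number of rows (group members) per facility, analogous to grp.Count."""
    return dict(Counter(facility_names))


print(group_sizes(["FName1", "FName1", "FName2"]))  # {'FName1': 2, 'FName2': 1}
```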

@ppr
Just so I understand: are you saying that, starting from my current actual output, I should write a LINQ query to get my expected output?

I guess my computation method (using LINQ) needs to be corrected. My requirement is shown below. I want to achieve this output with the given input.

Referring to the last screenshot, the following LINQ (taken from your post):

(From p In ExtractDataTable.AsEnumerable()
Group p By FName=p.Item("FacilityName").ToString Into grp=Group
Let ra= New Object() {FName, grp.Max(Function(r) CInt(r("Beds").toString.Trim)),grp.Sum(Function(r) Convert.ToDouble(r("TotalCost").toString.Replace("$","").Trim))}
Select BedCount.Rows.Add(ra)).CopyToDataTable()

and the reported output:
FName1,124,2906.32
FName2,120,1859.19

is reasonable and as expected (2 groups, 2 rows, max, sums), as it is derivable.

Compared with the expected output:
2,224,1

the computation rules are unclear and cannot be confirmed as derivable:

  • 1 row, although there are two groups
  • 224: the derivable computation could be FN1 max beds + FN2 max beds = 124 + 120, but that gives 244, not 224
  • 1: the derivable computation could be counting the groups whose cost sum is > 2000

The 2,224,1 looks a little like it is computed from the result of:
FName1,124,2906.32
FName2,120,1859.19
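A quick Python check of that reading, assuming the expected row means (number of groups, sum of the per-group max beds, number of groups whose cost sum exceeds 2000). Under that assumption the bed total comes out as 244, not 224. `summarize_groups` is a hypothetical name.

```python
# Grouped rows as reported in the thread: (facility, max beds, total cost).
grouped = [
    ("FName1", 124, 2906.32),
    ("FName2", 120, 1859.19),
]


def summarize_groups(grouped, cost_threshold=2000):
    """Reduce per-facility rows to (no. of facilities, total beds, facilities over threshold)."""
    no_of_facilities = len(grouped)
    total_beds = sum(beds for _, beds, _ in grouped)
    over_threshold = sum(1 for _, _, cost in grouped if cost > cost_threshold)
    return no_of_facilities, total_beds, over_threshold


print(summarize_groups(grouped))  # (2, 244, 1)
```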

Maybe you can do the following:

dtGroupResult1 =
(From p In ExtractDataTable.AsEnumerable()
Group p By FName=p.Item("FacilityName").ToString Into grp=Group
Let ra= New Object() {FName, grp.Max(Function(r) CInt(r("Beds").toString.Trim)),grp.Sum(Function(r) Convert.ToDouble(r("TotalCost").toString.Replace("$","").Trim))}
Select BedCount.Rows.Add(ra)).CopyToDataTable()

Prepare an empty DataTable (dtGroupResult2) with the column structure (NoOfFac, Beds, FacsOver2k).
Collect the different pieces of information from dtGroupResult1:

  • NoOfFac = dtGroupResult1.Rows.Count
  • Beds = dtGroupResult1.AsEnumerable.Sum(Function (x) Convert.ToDouble(x(ColNameOrIndex)))
    • Note: the result will be 244, not 224
  • FacsOver2k = dtGroupResult1.AsEnumerable.Where(Function (x) Convert.ToDouble(x(ColNameOrIndex)) > 2000).Count

With these results, dtGroupResult2 can be populated.
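The two steps can be sketched in Python as an illustration of the logic rather than a UiPath implementation. The raw rows are hypothetical strings chosen to reproduce the reported group totals; `group_facilities` (standing in for dtGroupResult1) and `summarize` (for dtGroupResult2) are hypothetical names, and the cost parsing only strips "$", as the LINQ does.

```python
def parse_cost(text):
    """Strip the '$' sign and surrounding whitespace, then convert to float."""
    return float(text.replace("$", "").strip())


def group_facilities(rows):
    """Step 1 (like dtGroupResult1): one row per facility with max beds and summed cost."""
    groups = {}
    for name, beds, cost in rows:
        key = name.strip()
        beds_value = int(beds.strip())
        cost_value = parse_cost(cost)
        if key in groups:
            prev_beds, prev_cost = groups[key]
            groups[key] = (max(prev_beds, beds_value), prev_cost + cost_value)
        else:
            groups[key] = (beds_value, cost_value)
    # Round the cost sums to cents to hide floating-point noise.
    return [(name, beds, round(cost, 2)) for name, (beds, cost) in groups.items()]


def summarize(group_rows, threshold=2000):
    """Step 2 (like dtGroupResult2): a single row (NoOfFac, Beds, FacsOver2k)."""
    return (
        len(group_rows),
        sum(beds for _, beds, _ in group_rows),
        sum(1 for _, _, cost in group_rows if cost > threshold),
    )


# Hypothetical raw rows as strings, the way they might come out of the Word table.
raw_rows = [
    ("FName1", "124", "$1500.00"),
    ("FName1", "124", "$1406.32"),
    ("FName2", "120", "$1859.19"),
]

step1 = group_facilities(raw_rows)
print(step1)             # [('FName1', 124, 2906.32), ('FName2', 120, 1859.19)]
print(summarize(step1))  # (2, 244, 1)
```

Taking the max rather than the sum of beds per facility is what avoids the 248 double count seen earlier in the thread.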

Maybe this helps for your next steps.


@ppr This worked perfectly. Thank you so much.
