I want to find duplicate words from column

Hi… facing issue while adding keywords to dictionary because of duplicate city names like India and Pakistan both have city name Hyderabad etc. So can we have solution if we ignore or highlight any duplicate before adding to dictionary.
Note: there will be multiple keywords just providing sample table.
Expand Table

India IN,Mumbai,Delhi,Mysore,Surat,Hyderabad,Kolkata,Bangalore
Iraq IQ,BAGHDAD,BASRA,NAFJAF,KIRKUK,KARBALA,NASIRIYAS,AMARA
China CN,SHANGHAI,beijing,chongqing,guangzhou,tianjin,shenzhen,hangzhou,wuhan
France FRA,Paris,Lyon,Nice,Nantes,Reims,Lille
England UK,London,Manchester,Bristol,Liverpool,Southampton
Russia Moscow,Kazan,Omsk,Volgograd,Krasnoyarsk,Samara
Afghanistan AF,Kabul,kandahar,jalalabad,kandahar,herat,ghazni,khanabad
Zimbabwe Harare, Bulawayo, Manicaland, Midlands, Mashonaland West, Masvingo
Germany Berlin,Munich,Hamburg,stuttgart,essen,Leipzig
Morocco MA,Casablanca,Rabat,fez,Tangier,Agadir,kenitra,oujda,meknes
Pakistan Karachi,Lahore,Islamabad,Hyderabad,Faisalabad,Peshawar,Multan,Bahawalpur,Sargodha
Jamaica JM,Kingston,Portmore,Montego,saint catherine,mandeville,maypen,holdharbour
Mali bamako,gao,sikasso,kalabancoro,koutiala,segou,kayes,kati,mopti,niono
Iran Hamadan, Ilam,Karaj, Tehran, East Azerbaijan, Semnan, Birjand, Bushehr
Spain Madrid,Barcelona,Seville,bilbao.valencia,cadiz,Girona

Hi,

How about using Tuple as the following?

We can use pair of country name and city name as key of dictionary.

Or

Another approach is to concatenate country name and city name as key.

Regards,

No we cannot concatenate or join city country because after adding to dictionary i want to search each keyword in dictionary ( city names) in string to find matching keywords which will return country name of matched keyword.

I have one linq for adding this data to dictionary. But problem is if we getting duplicates as i mentioned Hyderabad or anything else it will provide error( key already added to dictionary).

Hence i want to first remove or perform some action on that duplicate keywords. Before using that linq to add into dictionary.
Attaching linq query please check…I am getting error if i create variable of dictionary (Tuple,, ) for that linq getting error.
grafik

Basically I want to check…if any keyword from list exist in input string then it should return country name.

Example:
Input string -“AFZAL KHAN 9345 Shalimar Street 78A Kurla Mumbai IN”
Output-
INDIA for keyword IN and Mumbai.

Note- It should not extract AF keyword from AFZAL for Afghanistan country.

Hi,

I understood your requirement.

If it’s no problem to remove duplicated city, the following will work. (Please rewrite to query style if needed)

 dtTemp = dt.AsEnumerable.SelectMany(Function(r) r(1).ToString.Split({","c},StringSplitoptions.RemoveEmptyEntries).Select(Function(s) dt.Clone.LoadDataRow({r(0).ToString,s},False))).CopyToDataTAble

Then

dict = dtTemp.AsEnumerable.GroupBy(Function(r) r(1).ToString).ToDictionary(Function(g) g.Key,Function(g) g.First.item(0).ToString)

However, it might be better to use Dictionary<String, String[]> as the following, because It keeps all information.

dictArray = dtTemp.AsEnumerable.GroupBy(Function(r) r(1).ToString).ToDictionary(Function(g) g.Key,Function(g) g.Select(Function(r) r(0).ToString).ToArray)

Sample20211228-2.zip (7.9 KB)

Regards,

1 Like

Thank you…just small requirement can we separate cities column into 2 column where first column contains only single word city or keyword name and other column contain two or than two word city or keyword name.

E.g.

If cities/ keyword are:

India IN,Navi Mumbai,Delhi,Mysore,Jammu and Kashmir,Surat,Hyderabad,West Bengal,Bangalore
Iraq IQ,BAGHDAD City,BASRA-al-kafa,NAFJAF,KIRKUK,KARBALA,NASIRIYAS,AMARA
China CN,SHANGHAI,beijing,chongqing,guangzhou,tianjin,shenzhen,hangzhou,wuhan
France FRA,Paris,Lyon,Nice,Nantes,Reims,Lille
England UK,London,Manchester,Bristol,Liverpool,Southampton
Russia Moscow,Kazan,Omsk,Volgograd,Krasnoyarsk,Samara

Then for e.g.:
For india and iraq

|India|IN,Delhi,Mysore,Surat,Hyderabad,Bangalore|Navi Mumbai,Jammu and Kashmir,West Bengal|
|—|—|
|Iraq|IQ,BASRA-Al-kafa,NAFJAF,KIRKUK,KARBALA,NASIRIYAS,AMARA|BAGHDAD City|

Colunm 1 contain for india- IN delhi mysore surat etc and column 2 will be Navi Mumbai, Jammu and Kashmir , West Bengal

For column1 iraq: IQ, BASRA-AL-kafa, NAFJAF etc
And column 2 iraq: BAGHDAD City

Hi,

Can you try the following?

img20211229-2

dtResult = dt.AsEnumerable.Select(Function(r) dtResult.LoadDataRow({r(0).ToString,String.Join(",",r(1).ToString.Split({","c},StringSplitoptions.RemoveEmptyEntries).Where(Function(s) not s.Trim.Contains(" "))),String.Join(",",r(1).ToString.Split({","c},StringSplitoptions.RemoveEmptyEntries).Where(Function(s) s.Trim.Contains(" ")))},False)).CopyToDataTable

Sample20211228-2v2.zip (14.1 KB)

Regards,

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.