StringA: “The main civil works Contract No. 4/WSD/17 commenced on 12 February 2018. Works were about 77% completed.”
StringB: “Works were about 22.0% completed. Mainlaying for reclaimed water in Sheung Shui and Fanling areas was in progress. The main civil works Contract No. 4/WSD/17 commenced on 12 February 2018.”
I want to extract the common part of the two strings, that is: “The main civil works Contract No. 4/WSD/17 commenced on 12 February 2018.”
I searched the forum, but it seems no similar topic has been raised before. Any thoughts?
Thanks, Yoichi! I deployed this package and it worked successfully until I used these two texts as input:
s1 = 9198WC Implementation of Water Approved by F.C. 239.700 end 2016 end 2019 Contract No. 10/WSD/16 commenced on 23 December 2016.
s2 = account the impact of the COVID-19 pandemic affecting steel fabrication in the Mainland and the delivery of hangar door from the UK. Contract No. 10/WSD/16 commenced on 23 December 2016. Construction of pressure management and district metering installations was completed in October 2020. The project was
I expected the output to be
equal part = Contract No. 10/WSD/16 commenced on 23 December 2016.
but it was not. Instead, it shows the following:
DELETE, account the impact of the COVID-19 pandemic affecting steel fabrication in the Mainland and the delivery of hangar door from the UK. Contract No. 10/WSD/16 commenced on 23 December 2016. Construction of pressure management and district metering installations was completed in October 2020. The project was
and
INSERT, 9198WC Implementation of Water Approved by F.C. 239.700 end 2016 end 2019 Contract No. 10/WSD/16 commenced on 23 December 2016.
I tried swapping s1 and s2, but the problem still exists. Did I do anything wrong?
What I want is “The main civil works Contract No. 4/WSD/17 commenced on 12 February 2018.”
The custom activity proposed by @Yoichi actually worked 90% of the time. I just found one case that did not work and would like to know whether any tricks are needed.
@22:50pm: I further tried reducing the length of s2 to the following:
s2 = account the impact of the COVID-19 pandemic affecting steel fabrication in the Mainland and the delivery of hangar door from the UK. Contract No. 10/WSD/16 commenced on 23 December 2016. Construction of pressure management and
With this shorter input, the result seems correct. Is there any limit on the length of the input variables for text comparison?
It seems to be a matter of the algorithm. The activity uses a mainstream diff algorithm, and it judges that the cost of deleting everything and inserting everything is lower than the cost of keeping some words.
There are some online text comparison sites, and they return the same result.
If you need to extract the above sentence, you should use another algorithm…
So, I just wrote some sample code. Hope the following helps you.
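As a minimal Python sketch of one such alternative (not necessarily the approach used in the sample mentioned above), the longest common substring of two strings can be found with dynamic programming instead of a diff. The function name `longest_common_substring` is hypothetical, and the inputs are the s1 and s2 texts from this thread.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest run of characters that appears in both a and b.

    Hypothetical helper for illustration only. It compares character by
    character, so it runs in O(len(a) * len(b)) time.
    """
    best_len = 0   # length of the best match found so far
    best_end = 0   # index in `a` just past the end of the best match
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at the previous characters.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


s1 = ("9198WC Implementation of Water Approved by F.C. 239.700 end 2016 "
      "end 2019 Contract No. 10/WSD/16 commenced on 23 December 2016.")
s2 = ("account the impact of the COVID-19 pandemic affecting steel "
      "fabrication in the Mainland and the delivery of hangar door from "
      "the UK. Contract No. 10/WSD/16 commenced on 23 December 2016. "
      "Construction of pressure management and district metering "
      "installations was completed in October 2020. The project was")

# The raw match can start with a shared space, so trim it before use.
print(longest_common_substring(s1, s2).strip())
```

Unlike a diff, this always returns the single longest shared run, however much unrelated text surrounds it. If the two texts share several separate sentences, only the longest one is returned.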