I am trying to build a regular expression to match and extract company names from their public statements. However, this seems too difficult, and I have no idea how to match all of them. Is a regular expression a good way to do this?
Here are some examples of the company names:
SUNRISE Industrials, Inc.
Oceanic Health Food Holdings Company Limited
Forest International INC
Zenith Bioscience Limited - B
Y.Z.Q. Electrics (Hu Bei) Co., Ltd
Rivercross Bridge Limited
Emerald Smart Technology Inc., Co., Ltd.
Mountainview Aluminium International Holdings Limited
Thanks for your reply! However, the robot has to download the statements from the websites itself and read the PDF files to extract the company name. I am sorry, but your answer does not apply to this situation.
Is there any other static text or pattern before or after the name that can be used as an anchor?
If there is only the name, then any name is possible and there is no pattern to match.
A few cases might be catchable, but not 100%: a suffix such as "Ltd." can be used, but a company can have any number of words in front of it, so it is better to anchor the regex on something around the name.
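To illustrate the suffix idea, here is a minimal Python sketch that only checks whether a line ends with a common legal suffix. The suffix list and the optional trailing share-class marker (as in "Limited - B") are assumptions taken from the examples in the question, and the helper name `looks_like_company_name` is made up for illustration. It will miss names without such a suffix and can accept unrelated lines that happen to end in one.

```python
import re

# Assumed suffix list, based only on the examples in the question.
# An optional " - X" tail covers names like "Zenith Bioscience Limited - B".
SUFFIX_AT_END = re.compile(
    r"\b(?:Inc|Co|Ltd|Limited|Company|Corp|Corporation)\.?"
    r"(?:\s+-\s+[A-Z])?\s*$",
    re.IGNORECASE,
)


def looks_like_company_name(line: str) -> bool:
    """Return True if the line ends with one of the assumed legal suffixes."""
    return bool(SUFFIX_AT_END.search(line.strip()))


examples = [
    "SUNRISE Industrials, Inc.",
    "Oceanic Health Food Holdings Company Limited",
    "Forest International INC",
    "Zenith Bioscience Limited - B",
    "Y.Z.Q. Electrics (Hu Bei) Co., Ltd",
    "Rivercross Bridge Limited",
    "Emerald Smart Technology Inc., Co., Ltd.",
    "Mountainview Aluminium International Holdings Limited",
]

for name in examples:
    print(name, "->", looks_like_company_name(name))
```

This only tells you whether a line looks like a company name; it does not find where the name starts inside a longer line, which is why an anchor around the name is more reliable.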
I think the word “sponsor” appears before the name in the extracted text, but they are on different lines. Is it possible to extract the name using the word “sponsor” as an anchor?
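A regex can cross line breaks, so this should be possible. Below is a minimal Python sketch, assuming the company name is the first non-empty line after the line containing "sponsor". The sample text and the function name `extract_sponsor_name` are hypothetical; the real layout of the text extracted from the PDF may differ and the pattern would need adjusting.

```python
import re
from typing import Optional

# Find the word "sponsor", skip the rest of that line, skip any blank
# lines, then capture the next non-empty line as the candidate name.
SPONSOR_RE = re.compile(r"\bsponsor\b[^\n]*\n\s*([^\n]+)", re.IGNORECASE)


def extract_sponsor_name(text: str) -> Optional[str]:
    """Return the first non-empty line after the 'sponsor' line, or None."""
    match = SPONSOR_RE.search(text)
    return match.group(1).strip() if match else None


# Hypothetical extracted text, only to show the layout being assumed.
sample = "Announcement\nSole Sponsor\n\nRivercross Bridge Limited\nOther text"

print(extract_sponsor_name(sample))
```

If the name can sit on the same line as "sponsor" (for example after a colon), the pattern would have to capture the remainder of that line instead of the following one.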