Building RegEx for different types of Company names

I am trying to build a regular expression to match and extract company names from their public statements. However, this seems quite difficult, and I have no idea how to match all of them. Is it a good idea to use a regular expression for this?

Here are the examples of the company names:

  1. SUNRISE Industrials, Inc.
  2. Oceanic Health Food Holdings Company Limited
  3. Forest International INC
  4. Zenith Bioscience Limited - B
  5. Y.Z.Q. Electrics (Hu Bei) Co., Ltd
  6. Rivercross Bridge Limited
  7. Emerald Smart Technology Inc., Co., Ltd.
  8. Mountainview Aluminium International Holdings Limited
  9. Vivid Pharmaceuticals. - B
  10. Juniper Hydrogen Energy Equipment Co., Ltd.
  11. Golden Material Technology Co., Ltd.
  12. Skyline Technology Limited
  13. WellnessWay Inc.
  14. Fertility Hospital Management Group Limited
  15. Golden Peak Gold Mining Co., Ltd.
  16. Sunrise Technology Co., Ltd.
  17. Harmony Biotherapeutics, Inc. - B
  18. Summit Box Holdings Limited
  19. Pearl REFIRE Group Limited
  20. Starlight Energy Holdings, Inc.
  21. Evergreen Group Co., Ltd.
  22. ABC Electronics Co., Ltd.

No, the use case doesn’t look suitable for Regex. You can load all the names into a List(Of String) and check List.Contains(CurrentName).
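For illustration, the known-list lookup suggested here could be sketched as follows. This is a Python sketch rather than the List(Of String) code from the reply, and the sample names, `normalize` and `is_known_company` are hypothetical helpers. It assumes the full set of names is available up front.

```python
# Hypothetical list of known names; in practice it would be loaded
# from a file or a database (an assumption, not part of the thread).
KNOWN_NAMES = [
    "SUNRISE Industrials, Inc.",
    "Forest International INC",
    "Rivercross Bridge Limited",
]


def normalize(name: str) -> str:
    """Collapse runs of whitespace and ignore case."""
    return " ".join(name.split()).casefold()


# A set gives fast membership checks, like List.Contains on a small list.
KNOWN = {normalize(n) for n in KNOWN_NAMES}


def is_known_company(current_name: str) -> bool:
    """Return True if the candidate matches a known name after normalizing."""
    return normalize(current_name) in KNOWN
```

This only works when the names are known in advance; it cannot discover a name that is not already in the list.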

Thanks for your reply! However, the robot has to download the statements from websites itself and read the PDF files to extract the company name. I am sorry, but your answer does not apply to this situation.

I am sorry as well that I couldn’t read your mind :grinning:

Nvm thanks for offering help :blush:

@Komom

Is there any other static text or pattern before or after the name that can be used as an anchor?

If there is only the name itself, then any name is possible and there is no pattern to rely on.

A few might be matchable through suffixes such as "Ltd.", but again not 100%, since a company can have any number of words in front of the suffix. So anything around the name is better to use in the regex.

cheers
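As a rough illustration of the suffix idea, here is a Python sketch that keeps only lines ending in a common legal suffix, optionally followed by a share-class marker such as " - B". The suffix list and the `candidate_names` helper are assumptions based on the sample names in the question. It is a heuristic and not a complete solution: "Vivid Pharmaceuticals. - B" has no suffix and is missed.

```python
import re

# Legal suffixes seen in the sample names (an assumed, incomplete list).
SUFFIX = r"(?:Co\.,\s*Ltd\.?|Limited|Ltd\.?|Inc\.?)"
# Optional share-class marker such as " - B".
SHARE_CLASS = r"(?:\s*-\s*[A-Z])?"

# The whole line must end with a suffix, optionally followed by the marker.
LINE_RE = re.compile(rf".+?\b{SUFFIX}{SHARE_CLASS}", re.IGNORECASE)


def candidate_names(text: str) -> list[str]:
    """Return the lines that look like company names, in document order."""
    names = []
    for line in text.splitlines():
        stripped = line.strip()
        if LINE_RE.fullmatch(stripped):
            names.append(stripped)
    return names
```

Any ordinary sentence that happens to end in "Limited" or "Inc." would also be picked up, which is why an anchor around the name is the more reliable option.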

I think the word “sponsor” should appear before the name in the extracted text, but they are on different lines. Is it possible to extract the name using the word “sponsor” as an anchor?

@Komom

Yes, we can. Show some samples so we can work out how to extract it. It is better to find an ending pattern as well, like a dot or a new line.

Cheers

Joint Sponsors [REDACTED]
=~ i.
=>»
Goldman BofA SECURITIES HATONG

The above text is an example. Can I extract only the fourth line?

@Komom

Yes, you can. Here is an example:

(?<=Sponsors(.*\r?\n){3}).*

In RegexOptions, use Multiline.

Cheers
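For anyone trying the same idea outside .NET: the pattern above relies on a variable-length lookbehind, which Python's built-in `re` module does not support. A rough equivalent, sketched here in Python with a capture group in place of the lookbehind, could look like this. The `line_after_sponsors` helper and its `lines_to_skip` parameter are hypothetical names.

```python
import re
from typing import Optional


def line_after_sponsors(text: str, lines_to_skip: int = 3) -> Optional[str]:
    """Return the line found `lines_to_skip` line breaks after "Sponsors".

    With the default of 3, the rest of the "Sponsors" line and the next
    two lines are skipped, and the fourth line is captured.
    """
    # Swapped-in technique: consume the skipped lines, then capture the
    # target line in group 1 (no lookbehind needed).
    pattern = re.compile(r"Sponsors(?:.*\r?\n){%d}(.*)" % lines_to_skip)
    match = pattern.search(text)
    if match is None:
        return None
    # strip() drops a trailing "\r" left over from Windows line endings.
    return match.group(1).strip()


sample = "Joint Sponsors [REDACTED]\n=~ i.\n=>»\nGoldman BofA SECURITIES HATONG\n"
print(line_after_sponsors(sample))
```

The number of lines to skip is fixed, so this assumes every statement has the same number of lines between the "Sponsors" line and the name.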
