Help with City Name project - "Correct spelling and Filter abbreviations"

I’m trying to make a little program that filters user input to make sure the city name is the full, correct spelling for that city.
It’s tough because every database spells cities differently. Some might have
St. Paul, while others have Saint Paul.

What I’ve done so far:

  • I get the user’s input city and state, and do a Google search for it.

  • I find the Wikipedia link for that city and click on it.

  • I scrape “City name, State” from the top of the Wikipedia page.

  • I split the scraped text into an array {St Paul, MN}.

  • I get rid of the state and any abbreviations like “St.”

How do I get rid of the state?
Also, I’m going to have to replace ‘St’ with ‘Saint’.

So I’m not sure whether I should split the string on ", " or on just a space " ".

Then I have to get it all back into one variable, “Saint Paul”.

I may need some ongoing help with this, if you would kindly stay with me
for the day. I have most of it done, just a few tricky things I’ve been trying to work out.

Thank you for your help.

Just to clarify what you’re looking for is:

  1. User inputs city, state abbreviation
  2. Search wikipedia for city, state
  3. Pull out city name ONLY for an output
  4. If output is an abbreviation, change it to the full name

You mention different databases for the spelling of the city, then say you’re using Wikipedia (which is essentially a “database” too), and then you change its output after the fact? Why not use a database you’ve found that spells it the way you want?

So this is actually for work, where the user could be any one of thousands of employees entering a city in order to ship something to a customer. They also enter the zip code. The program checks our database for the zip code, then checks that the city names match.

I’m running into a problem when the user inputs an abbreviation, because none of the cities are abbreviated in the database. It’s a really large database, and I don’t have any access to it, so I’m using Wikipedia. (I was using Google, but the selectors seem to change every time I put in a different city.)

It also corrects common spelling mistakes by searching in Google first. So if I search for
Miani FL, Google pulls up the results for Miami, Florida.

Some cities come back abbreviated no matter what I’ve tried, like St. George, UT, and St. Louis Park, MN, so I am adding a regex to replace St or St. with Saint. If any other issues arise, I’ll have to deal with them in the future. For now, the error we got back from the logs was: “St. Louis Park is not the city that pulled up for the zip code. We pulled up Saint Louis Park.”

I’m also fixing a common file that can be used in many other processes, so it’s also checking the spelling.

Right now I’m trying to figure out the most effective way of splitting this string into an array:
{St. Louis Park, MN}
I need to get rid of the state on the end.
I also need to eventually check if the first array index is St.
so that it can be replaced with Saint.

So I think I have most of it.
What I’m struggling with now is that I have just the city name array, without the state,
and it’s of some unknown length.
After I replace St. with Saint, I need it to go back into one string
in a variable that can be sent out.

How do I get my array of unknown length into one variable with the spaces intact, like “Saint Paul”?

Also, how do I share my XAML file here?

I’d lobby to get read-only access to that database, as that is the only way to truly sanitize the input to ensure it’ll work.

Changing St or St. to Saint is one way, but there are always more abbreviations than you’ll be able to think of, so it is a fragile solution. For example, do you need to show Fort Myers, FL or Ft Myers, FL?

As for your second question, I’d do the following: Assign CityName (str variable) = WikipediaResult.Split(","c)(0) to only keep the city.
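To make the idea concrete, here is a minimal sketch of the same "keep everything before the first comma" step, written in Python only for illustration. The function name `city_only` is made up for this example, and it assumes the scraped text always looks like "City, State".

```python
def city_only(scraped: str) -> str:
    """Return the part of 'City, State' before the first comma."""
    # split(",") breaks the text at every comma; index 0 is the city part.
    # strip() removes any stray spaces around it.
    return scraped.split(",")[0].strip()


print(city_only("St. Louis Park, MN"))
```

If the text has no comma at all, the whole trimmed string comes back unchanged, so this step is safe to run on input that is already just a city name.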

Not sure how to do the St → Saint properly yet though. Does Wikipedia always include a period? My first thought was to split CityName by " " and check the first result to see if it starts with St, but if the city name is Stockton, you’d get incorrect results. If it ALWAYS contains a period, then you could do the same but check whether the first string is St. and, if it is, change the first string in the array to Saint, then join all strings in the array into a single string again.
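Roughly what I have in mind, as a Python sketch rather than the actual workflow. The function name `expand_leading_saint` is hypothetical, and it assumes the abbreviation always appears with a period as the first word.

```python
def expand_leading_saint(city: str) -> str:
    """Replace a leading 'St.' token with 'Saint'; leave everything else alone."""
    words = city.split()  # split on whitespace into an array of words
    # Compare the whole first word, not just its first two letters,
    # so that Stockton is not touched.
    if words and words[0].lower() == "st.":
        words[0] = "Saint"
    return " ".join(words)  # put the words back into one string


print(expand_leading_saint("St. Louis Park"))
print(expand_leading_saint("Stockton"))
```

Note that under this assumption "St Paul" without the period is left unchanged, which is exactly why it matters whether Wikipedia always includes the period.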

Just click on Upload file in your reply and either drag + drop the xaml, or navigate to it in the dialog box

Another good option is to use the USPS address API. You can pass the 5 digit zip code then retrieve the city info from there.

https://www.usps.com/business/web-tools-apis/address-information-api.htm

CityAbbrConversion.xaml (22.4 KB)

They are not going to give access to the DB unless this fails at least a few times, I’m pretty sure.
They have so much work anyway, it’s just not going to happen any time soon.

So this will have to do for now.
My problem is that I have the array splitCityArray, and
I’m trying to put it into one string, out_City,
but it only takes the first index of the array.

I need it to join all the array elements together with spaces,
like Salt Lake City or
Saint Paul.
The array size is variable.

It has to do with your last if statement splitCityArray.Count > 1 AND splitCityArray(0).ToUpper.Contains("ST")

When the statement is false and the input city has spaces in the name, it only grabs the first portion of the array. Since wikipedia is showing it as saint paul (not St paul) it doesn’t satisfy the AND criteria in your if statement. Therefore, it is only grabbing the 1st value in the array.

Your code as written works with sn louis park as the input because wikipedia shows it as St. Louis Park.

You’ll have to add another elseif statement IF splitCityArray.Count > 1 to join together the array for cities that have a count >1, but don’t start with the letters ST.
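Here is the branch order I mean, sketched in Python instead of the workflow’s If activities. The function name `build_city` is made up; `split_city_array` stands in for your splitCityArray, and the "contains ST" test is copied from your condition as-is. In VB.NET the join itself would be String.Join(" ", splitCityArray).

```python
def build_city(split_city_array: list[str]) -> str:
    """Join the city words back into one string, expanding a leading ST word."""
    if not split_city_array:
        return ""
    if len(split_city_array) > 1 and "ST" in split_city_array[0].upper():
        # First branch: multi-word city whose first word contains ST.
        return " ".join(["Saint"] + split_city_array[1:])
    elif len(split_city_array) > 1:
        # Second branch: multi-word city with no ST, just join every word.
        return " ".join(split_city_array)
    # Final else: single-word city, nothing to join.
    return split_city_array[0]


print(build_city(["Salt", "Lake", "City"]))
print(build_city(["Sterling", "City"]))
```

The join works for any array length, so Salt Lake City comes back whole. The second call shows the weakness of the ST check: Sterling City turns into Saint City.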

Keep in mind, as I mentioned above, that this solution will fail if the city starts with the letters ST. For example, something called Sterling City (no idea if that’s a real city or not) would fail with the code as written.

CityAbbrConversion.xaml (22.6 KB)

Okay, I did what you said, but I have to filter for ST first, because I’m not sure how I would do that once it’s no longer in the array. So I don’t want to join the array until I’ve filtered that.
I use the regex to replace ST with Saint.
I’m running it now.

Here you go. I made a couple of other small changes first because it wasn’t working properly for me. Any changes I made have a comment activity explaining what I did.

I didn’t put in a comment activity for the nested if though. However, it should be obvious enough to see it as it is the last IF activity.

CityAbbrConversion.xaml (29.3 KB)

Okay, I just separated them into those two If statements. I’m not going to nest them unless
you have a really good reason why I should.
Now my problem is that
Staton, CA
also starts with ST,
so in the end it becomes
Saintnton, CA

I don’t know if you’re any good at regular expressions… I’m not… lol

I think I’ve got it all working. I’m just inputting some city names to see if I can find any bugs before I turn it in.

CityAbbrConversion.xaml (22.4 KB)

You want it nested because it should evaluate the first expression, then the second expression, before applying the final ELSE expression. If you separate the if statements then it is applying it as first expression, first else, second expression which is not what you want.

The ElseIf/Nested If also helps stop the Saintnton problem a little bit as well. Only cities that start with ST and contain more than one word in the city name would get changed. So it would stay Staton, CA. However, if it was Staton City, CA then it would get changed to Saint City.

EDIT: I’m so-so with regex, but not sure that you need it here?
EDIT2: Just looked at your new file uploaded, sorry. You don’t need to nest, it is just easier to read if you do as it is more straight forward. Right now you are still nesting them as far as order of operations goes, it is just spread out and harder to read that way.

As for regex I see what you’re looking for now. You want an expression that will search for ST or ST. followed by a space/blank character, correct?
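If that is what you’re after, something along these lines might do it. This is a sketch in Python’s `re` module, not tested inside the workflow; the function name `expand_saint` and the choice to match at any word boundary (so West St. Paul is handled too) are my own assumptions. The same pattern should carry over to a .NET Regex.Replace with the ignore-case option.

```python
import re

# \b       : word boundary, so the ST must start a word
# St       : the letters S and T (case-insensitive via the flag)
# \.?      : an optional literal period
# \s+      : one or more whitespace characters, which is what
#            keeps Staton and Stockton from matching
SAINT_PATTERN = re.compile(r"\bSt\.?\s+", re.IGNORECASE)


def expand_saint(city: str) -> str:
    """Replace 'St' or 'St.' followed by whitespace with 'Saint '."""
    return SAINT_PATTERN.sub("Saint ", city)


print(expand_saint("St. Louis Park"))
print(expand_saint("Staton"))
```

Because the pattern requires whitespace right after St or St., Staton and Staton City are both left alone, which avoids the Saintnton result. Run on the joined string, this would also remove the need to filter before joining.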

I completely remade this, because while I was doing something else (searching for zip codes) I came across the USPS zip code finder. So now it searches the zip code input, matches it with what’s in the USPS database, then separates the city from the state and makes sure the states also match. It works perfectly.
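For anyone landing here later, the comparison step looks roughly like the Python sketch below. The USPS lookup itself is not shown, the function name `matches_lookup` is made up for illustration, and it assumes the lookup hands back text shaped like "CITY, ST".

```python
def matches_lookup(user_city: str, user_state: str, looked_up: str) -> bool:
    """Compare the user's city and state with a 'CITY, ST' lookup result."""
    # partition(",") splits at the first comma into (city, ",", state).
    city, _, state = looked_up.partition(",")
    # casefold() makes the comparison case-insensitive; strip() drops spaces.
    same_city = city.strip().casefold() == user_city.strip().casefold()
    same_state = state.strip().casefold() == user_state.strip().casefold()
    return same_city and same_state


# The sample strings below are illustrative, not real lookup output.
print(matches_lookup("Saint Paul", "MN", "SAINT PAUL, MN"))
print(matches_lookup("St. Paul", "MN", "SAINT PAUL, MN"))
```

The second call returns False, which is the case where the workflow should prefer the looked-up city name over what the user typed.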