Find specific text and paste it into a form field (based on MS Access)

Hello, I’m having trouble with the following use case:

I need to find a specific piece of text on a website (it’s actually a mail in protonmail.com) and paste it into a specific field in a form (which is based on Microsoft Access).

For example:

“Hello, my name is blah blah, my new address is: Rockefellercenter. I moved there three weeks ago.”

And I want the bot to extract only “Rockefellercenter” and paste it into the form next to the “Address” label, then extract “12345” and paste it into “Postleitzahl” (postal code), etc.

Can someone help me? I can extract the whole text, but I don’t know how to extract a specific part of it… :frowning:

Thank you very much for your help!

Named Entity Recognition (NER) is a challenging task in Natural Language Processing (read more about that here). If you know that the email follows a fixed structure, which I doubt, you can try regular expressions. However, no one is stopping your customers from using any syntax they like, and deterministic approaches will only take you so far.

In any case, you might want to double-check the extracted address, e.g. using the services offered by Deutsche Post (https://www.deutschepost.de/en/d/deutsche-post-direkt/consumer-adressen.html), or any other (open) data source.

Thank you for your fast reply, @redlynx82.

Yes, NER is really challenging, that’s exactly what I feared. :no_mouth:

What if the data is structured? e.g.:

Address: Rockefellercenter
City: New York
Postal Code: 12345

Can’t one search for a keyword or text such as “Address:”, then copy the text next to it, save it in a variable, and paste the value from the variable into the form?
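For the structured case, the keyword idea could look roughly like the sketch below. It is written in Python only to show the logic; the function name `extract_fields` is made up for illustration, and the three labels are simply the ones from the example above. In a UiPath workflow the same regular expression could be reused with whatever regex activity is available.

```python
import re

# Labels come from the example mail; anything else is ignored.
FIELD_PATTERN = re.compile(
    r"^[ \t]*(Address|City|Postal Code)[ \t]*:[ \t]*(.+?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)

def extract_fields(mail_body: str) -> dict:
    """Return a {label: value} mapping for every 'Label: value' line found."""
    fields = {}
    for label, value in FIELD_PATTERN.findall(mail_body):
        # Normalise the label's capitalisation so lookups are predictable.
        fields[label.title()] = value
    return fields

mail_body = """Hello,
here are my new details:
Address: Rockefellercenter
City: New York
Postal Code: 12345
"""

fields = extract_fields(mail_body)
print(fields["Address"])      # value to type into the "Address" field
print(fields["Postal Code"])  # value to type into the "Postleitzahl" field
```

Each value ends up in a dictionary, so it can be saved in a variable and typed into the matching form field afterwards.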

You can, but as I said, a deterministic approach will cover defined cases only. Imagine this:

Now, what happens if someone decides to use “Street” instead of “Address”? What if they misspell it as “Adress”? What if they leave out the keyword entirely?
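The first two problems can be softened a little, though not solved, by mapping labels through a synonym table and accepting near misses. The sketch below uses Python's standard `difflib` for the fuzzy part; the synonym table, the function name `canonical_field` and the cutoff of 0.8 are assumptions chosen for illustration. It does nothing for the third problem, a mail with no keyword at all.

```python
import difflib

# Hypothetical synonym table: form field -> labels people might write.
LABEL_SYNONYMS = {
    "Address": ["address", "street", "road"],
    "City": ["city", "town"],
    "Postal Code": ["postal code", "zip", "zip code", "postcode"],
}

# Flat lookup from every known label spelling to its form field.
KNOWN_LABELS = {
    alias: field
    for field, aliases in LABEL_SYNONYMS.items()
    for alias in aliases
}

def canonical_field(raw_label: str, cutoff: float = 0.8):
    """Map a label as written in the mail to a form field, or None if unknown."""
    candidate = raw_label.strip().lower()
    matches = difflib.get_close_matches(
        candidate, list(KNOWN_LABELS), n=1, cutoff=cutoff
    )
    return KNOWN_LABELS[matches[0]] if matches else None

print(canonical_field("Adress"))  # misspelled label
print(canonical_field("Street"))  # synonym
print(canonical_field("Phone"))   # not an address label
```

Every new synonym or spelling still has to be anticipated or be close enough to a known one, which is exactly the limit of the deterministic route.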

I would recommend one of the following approaches:

  1. Deterministic, but with master data. Assuming that you already have an address database for all countries, the best approach is to look up the most likely entry. To make it work regardless of the individual elements of an address, I would use Bags of Words (one for the email body, another one for the database entries) and a Vector Space Model (VSM) representing your data. [Here’s](https://www.theorycrafter.org/quipu/the-magic-behind-fuzzy-database-search/) a potential approach (disclaimer: I am the author).
  2. NER. You don’t have to start from scratch; most ML frameworks already have you covered. For example, Google AutoML offers address recognition and validation out of the box, and integrating Google Cloud Services into UiPath is simple thanks to native REST support.
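To make option 1 more concrete, here is a minimal sketch of the bag-of-words idea with plain cosine similarity. It is not the implementation from the linked article; the helper names and the three database entries are made up, and a real setup would likely add term weighting (e.g. TF-IDF) and a proper master data source.

```python
import math
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def best_match(mail_body: str, addresses: list) -> str:
    """Return the database entry whose bag of words is closest to the mail."""
    mail_bag = bag_of_words(mail_body)
    return max(
        addresses,
        key=lambda entry: cosine_similarity(mail_bag, bag_of_words(entry)),
    )

# Hypothetical master data; in practice this comes from your address database.
addresses = [
    "Rockefellercenter 12345 New York",
    "Broadway 10007 New York",
    "Hauptstrasse 12345 Berlin",
]

mail = ("Hello, my name is blah blah, my new address is: "
        "Rockefellercenter, 12345 New York. I moved there three weeks ago.")

print(best_match(mail, addresses))
```

Because the whole mail is compared against each entry, it does not matter whether the sender wrote “Address”, “Street” or no label at all, as long as the address itself exists in the database.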

Hi redlynx82. I have a custom-trained named entity recognition model in Google Cloud. Any advice on how to take a variable (a text string that I OCR’d) and get the NER predictions back using an HTTP request? I can’t seem to figure it out, but it would be wonderful if I could. Thank you.