Data extraction from a PDF invoice

Hi.

I have an invoice PDF that contains the seller's and the buyer's addresses, and I need to extract them into two separate values.

The text is "…the Seller known as Donald Duck with a mailing address of Bridgeroad 5, 34234, Austria agrees to sell the following item described as Town House
to a Buyer known as Daisy Duck with a mailing address of Bumbyroad 44, 23567, Austria for the purchase price of $100000 (US Dollars). "

For the seller, I use the word "agrees" as the end marker, and everything works:
Value: sellerInfo.Substring(
sellerInfo.IndexOf("of") + "of".Length,
sellerInfo.IndexOf("agrees") - sellerInfo.IndexOf("of") - "of".Length
).Trim()
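To see why the seller expression works, here is a minimal Python sketch of the same Substring/IndexOf arithmetic. It uses Python rather than UiPath's VB.NET so it can run standalone, and the helper name `between_first` and the sample `TEXT` are illustrative. Like .NET `String.IndexOf`, `str.find` returns the position of the *first* occurrence. The first "of" in the sentence is the seller's "address of", so the seller case works. A buyer expression run on the full text would start from that same seller "of".

```python
# Sample sentence from the question (illustrative copy).
TEXT = (
    "the Seller known as Donald Duck with a mailing address of "
    "Bridgeroad 5, 34234, Austria agrees to sell the following item "
    "described as Town House\n"
    "to a Buyer known as Daisy Duck with a mailing address of "
    "Bumbyroad 44, 23567, Austria for the purchase price of $100000 (US Dollars). "
)


def between_first(text: str, start_marker: str, end_marker: str) -> str:
    """Mirror of text.Substring(IndexOf(start) + len, IndexOf(end) - IndexOf(start) - len).Trim()."""
    start = text.find(start_marker) + len(start_marker)
    length = text.find(end_marker) - text.find(start_marker) - len(start_marker)
    # Python slicing returns "" for a negative length instead of raising,
    # whereas .NET Substring throws ArgumentOutOfRangeException.
    return text[start:start + length].strip()


# The first "of" belongs to the seller, so this isolates the seller address.
seller = between_first(TEXT, "of", "purchase" if False else "agrees")
# Same first "of", so the "buyer" result spans from the seller address to "for the".
naive_buyer = between_first(TEXT, "of", "purchase")
```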

But for the buyer, the output is a blank value. The code is:
If(buyerInfo.Contains("of") And buyerInfo.Contains("purchase"),
buyerInfo.Substring(
buyerInfo.IndexOf("of") + "of".Length,
Math.Max(buyerInfo.IndexOf("purchase") - buyerInfo.IndexOf("of") - "of".Length, 0)
).Trim(),
"")

And if I reuse the seller code, modified for the buyer, I get the error "startIndex cannot be larger than length of string":

buyerInfo.Substring(
buyerInfo.IndexOf("of") + "of".Length,
buyerInfo.IndexOf("for the purchase") - sellerInfo.IndexOf("of") - "of".Length
).Trim()
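A minimal Python sketch of a safer pattern follows. It assumes the text holds the whole sentence. It first locates an anchor word ("Buyer"), then searches for the start and end markers only after that anchor, and stops if any marker is missing. In .NET, the same idea uses the `String.IndexOf(value, startIndex)` overload. The missing-marker check matters. Suppose `buyerInfo` does not contain "of", for example because it is empty. Then `IndexOf` returns -1, the start index becomes 1, and `Substring` on an empty string throws "startIndex cannot be larger than length of string". The `If(...Contains...)` version would simply return "". That is one possible explanation for both symptoms, so it may be worth checking what `buyerInfo` actually contains. Note also that the last expression subtracts `sellerInfo.IndexOf("of")` from a position in `buyerInfo`, which mixes positions from two different strings. The helper name `between_after` is hypothetical.

```python
from typing import Optional

# Sample sentence from the question (illustrative copy).
TEXT = (
    "the Seller known as Donald Duck with a mailing address of "
    "Bridgeroad 5, 34234, Austria agrees to sell the following item "
    "described as Town House\n"
    "to a Buyer known as Daisy Duck with a mailing address of "
    "Bumbyroad 44, 23567, Austria for the purchase price of $100000 (US Dollars). "
)


def between_after(text: str, anchor: str, start_marker: str, end_marker: str) -> Optional[str]:
    """Return the trimmed text between start_marker and end_marker, searching
    only after anchor. Returns None if any of the three markers is missing."""
    anchor_pos = text.find(anchor)
    if anchor_pos == -1:
        return None
    start = text.find(start_marker, anchor_pos)  # like IndexOf(value, startIndex)
    if start == -1:
        return None
    start += len(start_marker)
    end = text.find(end_marker, start)  # the end marker must follow the start marker
    if end == -1:
        return None
    return text[start:end].strip()


buyer = between_after(TEXT, "Buyer", "mailing address of", "for the purchase")
seller = between_after(TEXT, "Seller", "mailing address of", "agrees")
missing = between_after("", "Buyer", "mailing address of", "for the purchase")
```

Here `buyer` is "Bumbyroad 44, 23567, Austria", `seller` is "Bridgeroad 5, 34234, Austria", and `missing` is `None` rather than an exception.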

What am I doing wrong?

Thank you!

@miuku,

I would say string manipulation is not a reliable solution here. Use a RegEx approach for this kind of problem.

See this megapost for a RegEx reference:
Regex help tutorial MEGAPOST – Making your first Regex post, Reusable Regex Patterns, Regex Troubleshooting, Sample Workflow and more - News / Tutorials - UiPath Community Forum
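Here is a minimal Python sketch of the RegEx approach. It assumes every invoice uses exactly this sentence layout, and the pattern is illustrative rather than taken from the linked megapost. Python writes named groups as `(?P<name>...)`, while .NET, which UiPath uses, writes them as `(?<name>...)`, so the pattern would need that change in a UiPath workflow.

```python
import re

# Sample sentence from the question (illustrative copy).
TEXT = (
    "the Seller known as Donald Duck with a mailing address of "
    "Bridgeroad 5, 34234, Austria agrees to sell the following item "
    "described as Town House\n"
    "to a Buyer known as Daisy Duck with a mailing address of "
    "Bumbyroad 44, 23567, Austria for the purchase price of $100000 (US Dollars). "
)

# Hypothetical pattern for this one layout: role word, "known as", a name,
# "with a mailing address of", then the address up to the phrase that follows
# it ("agrees" for the seller, "for the purchase" for the buyer).
PATTERN = re.compile(
    r"(?P<role>Seller|Buyer) known as (?P<name>.+?) with a mailing address of "
    r"(?P<address>.+?)\s+(?:agrees|for the purchase)"
)

# One match per party, keyed by role.
addresses = {m.group("role"): m.group("address") for m in PATTERN.finditer(TEXT)}
```

The lazy `.+?` quantifiers stop at the first following marker, so each address ends right before "agrees" or "for the purchase".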

Thanks,
Ashok


Hi @miuku

Welcome to the Community!
You can use Split expressions like these.

This will give you the seller address:
SellerAddress: textVar.Split({"Seller"},StringSplitOptions.None)(1).Split({"mailing address of"},StringSplitOptions.None)(1).Split({"agrees"},StringSplitOptions.None)(0).Trim

This will give you the buyer address:
BuyerAddress: textVar.Split({"Buyer"},StringSplitOptions.None)(1).Split({"mailing address of"},StringSplitOptions.None)(1).Split({"for the purchase"},StringSplitOptions.None)(0).Trim
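The Split chains translate almost one-to-one into Python, which makes it easy to check each step outside UiPath. This is an illustrative sketch, and the function names are hypothetical. .NET `String.Split` with `StringSplitOptions.None` splits on every occurrence, like `str.split(sep)`, and the index `(1)` becomes `[1]`. If a marker is missing, .NET throws `IndexOutOfRangeException` and Python raises `IndexError`, so both fail loudly instead of returning a wrong address.

```python
# Sample sentence from the question (illustrative copy).
TEXT = (
    "the Seller known as Donald Duck with a mailing address of "
    "Bridgeroad 5, 34234, Austria agrees to sell the following item "
    "described as Town House\n"
    "to a Buyer known as Daisy Duck with a mailing address of "
    "Bumbyroad 44, 23567, Austria for the purchase price of $100000 (US Dollars). "
)


def seller_address(text: str) -> str:
    # Split({"Seller"})(1) -> Split({"mailing address of"})(1) -> Split({"agrees"})(0) -> Trim
    return (text.split("Seller")[1]
                .split("mailing address of")[1]
                .split("agrees")[0]
                .strip())


def buyer_address(text: str) -> str:
    # Split({"Buyer"})(1) -> Split({"mailing address of"})(1) -> Split({"for the purchase"})(0) -> Trim
    return (text.split("Buyer")[1]
                .split("mailing address of")[1]
                .split("for the purchase")[0]
                .strip())


print(seller_address(TEXT))  # Bridgeroad 5, 34234, Austria
print(buyer_address(TEXT))   # Bumbyroad 44, 23567, Austria
```

Splitting on "Buyer" first discards the seller part of the text. Because of that, the later "mailing address of" split lands on the buyer's occurrence rather than the seller's.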

Thanks
Happy Automation!

Thank you, I'll have to work through this one. I think I need to create new variables, because it still shows errors.

Unfortunately, I already have a lot of code for this, so changing everything to RegEx would be a lot of work.

Hi @miuku

Here's a sample XAML. You can use the output variable of your Read PDF activity instead of textVar in the expressions.
AddressExtractio.xaml (7.4 KB)

You might have to replace the curly double quotes with straight ones if you copy the expressions exactly as posted.

