How to extract an Address data type field from Document Manager or taxonomy as multiple lines

Hi Guys,

We see that the taxonomy has a data type called Address, and in Document Manager it is shown as a multi-line string. But when the address entity/label is extracted, it is returned as a single-line string. Is there a way to fetch it the same way it appears in the document, as multiple lines?

Thanks,
Aravind

Hi @aravindbalineni123

If the address has a definite format, the Regex-Based Extractor can return a multi-line address, provided the regex pattern is designed to capture the line breaks.
Otherwise, post-processing is the best way.

Hope this helps!

Actually, the address is being extracted properly, but it's returned as a single line. For example, the address in the receipt document is like below:
The Taj Mahal Palace,
Apollo Bandar, Colaba,
Mumbai, Maharashtra 400001.

The output string returned is the above lines concatenated into one: The Taj Mahal Palace, Apollo Bandar, Colaba, Mumbai, Maharashtra 400001.

But I want the data to be returned or post-processed the same way it appears in the document, since the multi-line option is enabled in Document Manager. I want to know if there is a way to keep the lines separated.

I think this is only possible by using the Regex-Based Extractor. For example, for the text you mentioned above, you can extract it with the regex "(?<=below:)\n.*\n.*\n.*" and the output will be multi-line. It may have some cons, so please look into it.
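For anyone who wants to try the idea outside UiPath, here is a minimal Python sketch. It reads the pattern above as `(?<=below:)\n.*\n.*\n.*` and assumes the OCR text still contains its line breaks; the `text` sample and the variable names are made up for illustration. UiPath uses the .NET regex engine rather than Python's, so treat this only as a way to see what the pattern captures.

```python
import re

# Sample text, assuming the OCR output keeps the original line breaks.
text = (
    "The address in the receipt document is like below:\n"
    "The Taj Mahal Palace,\n"
    "Apollo Bandar, Colaba,\n"
    "Mumbai, Maharashtra 400001.\n"
)

# Lookbehind anchors on "below:", then exactly three lines are captured.
# "." does not match "\n" by default, so each ".*" stops at the end of a line.
pattern = re.compile(r"(?<=below:)\n.*\n.*\n.*")

match = pattern.search(text)
address = match.group(0).strip() if match else ""
address_lines = address.split("\n") if address else []

for line in address_lines:
    print(line)
```

One of the cons is visible here: the pattern hard-codes both the anchor word and the number of lines, so an address with two or four lines, or a different label before it, will not be captured correctly.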

Sharing my point of view, hope this helps
cheers

Yeah, but the format is never the same; that's the reason I opted for the Document Understanding extractor. We deal with invoices and receipts from many countries.

The address can be extracted from the unformatted fields in extractionresults.
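If the extraction results also give you the individual words with their positions on the page, one possible post-processing step is to group words that sit at roughly the same height back into lines. The sketch below is only an illustration of that idea in Python: the `Word` class, its `x`/`y` fields, and the `y_tolerance` value are hypothetical and are not a UiPath API, so you would need to map them to whatever your extraction results actually expose.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    # Hypothetical structure: one extracted word and its top-left position.
    text: str
    x: float
    y: float


def group_into_lines(words: List[Word], y_tolerance: float = 5.0) -> str:
    """Rebuild a multi-line string by grouping words with similar y values."""
    lines = []  # each entry is [reference_y, [words on that line]]
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        if lines and abs(word.y - lines[-1][0]) <= y_tolerance:
            lines[-1][1].append(word)
        else:
            lines.append([word.y, [word]])
    # Within each line, order words left to right before joining.
    return "\n".join(
        " ".join(w.text for w in sorted(line_words, key=lambda w: w.x))
        for _, line_words in lines
    )


# Made-up positions for the example address.
words = [
    Word("The", 10, 100), Word("Taj", 40, 101), Word("Mahal", 70, 100),
    Word("Palace,", 120, 99),
    Word("Apollo", 10, 120), Word("Bandar,", 60, 121), Word("Colaba,", 120, 120),
    Word("Mumbai,", 10, 140), Word("Maharashtra", 70, 140), Word("400001.", 160, 141),
]

multiline_address = group_into_lines(words)
print(multiline_address)
```

Because this relies on layout rather than on a fixed text pattern, it does not depend on the address format, which matters when the documents come from different countries. The tolerance would need tuning to the units and font sizes of your documents.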
