In the taxonomy the field's data type is Address, and in Document Manager it is a multi-line string. But when the address entity/label is extracted, it is returned as a single-line string. Is there a way to get the value back the same way it was extracted, as multiple lines?
If the address has a definite format, the Regex-Based Extractor can return a multi-line address, provided the regex pattern is designed to capture the line breaks.
Otherwise, postprocessing is the best way.
Actually, the address is being extracted properly, but it is returned as a single line. For example, the address in the receipt document is like below:
The Taj Mahal Palace,
Apollo Bandar, Colaba,
Mumbai, Maharashtra 400001.
The output string returned is the above lines concatenated: The Taj Mahal Palace, Apollo Bandar, Colaba, Mumbai, Maharashtra 400001.
But I want the data to be returned, or postprocessed, in the same way it was extracted, since the multi-line option is enabled in Document Manager. I want to know if there is a way to keep the lines separated.
I think this is only possible with the Regex-Based Extractor. For example, for the text you mentioned above, you can extract it with the regex “(?<=below:)\n.*\n.*\n.*” and the output will be multi-line. It may have some drawbacks, so please look into it.
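To illustrate how a pattern of this shape keeps the line breaks, here is a minimal sketch in Python rather than in the extractor itself. It assumes the pattern is the lookbehind on "below:" followed by three lines, and the sample text (including the trailing "Total" line) is made up for the demonstration. The extractor uses the .NET regex engine, so behaviour there may differ in details.

```python
import re

# Sample text mirroring the receipt wording from this thread.
# The trailing "Total" line is hypothetical, added to show where the match stops.
text = (
    "The address in the receipt document is like below:\n"
    "The Taj Mahal Palace,\n"
    "Apollo Bandar, Colaba,\n"
    "Mumbai, Maharashtra 400001.\n"
    "Total: 1200"
)

# The lookbehind anchors the match right after "below:".
# "." does not match "\n" by default, so each ".*" stops at the end of its line
# and the explicit "\n" tokens are what carry the line breaks into the result.
pattern = re.compile(r"(?<=below:)\n.*\n.*\n.*")

match = pattern.search(text)
# Drop the leading newline that the pattern captures before the first line.
address = match.group(0).lstrip("\n") if match else None
print(address)
```

The fixed count of three lines is the main drawback: an address with two or four lines would be cut short or would pull in unrelated text.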
Yeah, but the format is never the same, which is why I opted for the Document Understanding extractor. We deal with cross-country invoices and receipts.
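When the format varies like this, one possible postprocessing approach is to rebuild the lines from word positions instead of from the text. The sketch below is in Python and only shows the idea. It assumes the extraction result gives a position for each word of the address; the `Word` class, its `top`/`left` fields, the coordinates and the `tolerance` value are all hypothetical and are not the Document Understanding API.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    top: float   # vertical position of the word box (hypothetical units)
    left: float  # horizontal position of the word box


def rebuild_lines(words, tolerance=5.0):
    """Group words whose vertical positions are close, then join each group."""
    lines = []  # each entry: [top of the line's first word, list of words]
    for word in sorted(words, key=lambda w: (w.top, w.left)):
        if lines and abs(word.top - lines[-1][0]) <= tolerance:
            lines[-1][1].append(word)          # same visual line
        else:
            lines.append([word.top, [word]])   # start a new line
    # Within a line, restore reading order from left to right.
    return "\n".join(
        " ".join(w.text for w in sorted(group, key=lambda w: w.left))
        for _, group in lines
    )


# Made-up coordinates for the address from this thread.
words = [
    Word("The", 100, 10), Word("Taj", 100, 40),
    Word("Mahal", 101, 70), Word("Palace,", 100, 120),
    Word("Apollo", 120, 10), Word("Bandar,", 121, 60), Word("Colaba,", 120, 120),
    Word("Mumbai,", 140, 10), Word("Maharashtra", 140, 70), Word("400001.", 141, 160),
]
result = rebuild_lines(words)
print(result)
```

This does not depend on the address format, only on the layout, but the tolerance would need tuning to the coordinate units and font sizes of the real documents.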