I have this data, and I want to extract the address from it. I was thinking of using regex-based extraction, as I think it is the best way to extract exactly the data I need.
There are two scenarios:
First document: the layout is the same, but the address field is positioned below, and there is a bare minimum of 5 words to classify the document.
Second document: the layout is the same, but the address field is on the right side, with the same 5 words to classify the document.
For both documents only these same 5 words are available, and they are enough for the document to be identified as an Aadhaar card style document. The issue is that the address field does not get identified, because its position changes.
A third issue is that the position of the images is not fixed: they may be slightly tilted, or the address area may be zoomed in or out a little along the same axis.
For these reasons the Form Extractor is out of the question, and so is the ML Extractor, which also does not return the data properly (I tried the IDCards endpoint for it).
So the solution I came up with was to use regex and extract the data after the label 'Address'.
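A minimal sketch of the label-anchored idea in Python (the extractor itself may use a different regex engine). It assumes the address ends at the first standalone 6-digit number, taken to be the PIN code, and it tolerates a few OCR variants of the label. `LABEL`, `extract_address` and the sample text are hypothetical, made up for illustration.

```python
import re
from typing import Optional

# Hypothetical label pattern: accepts "Address:", "Address :", "ADDRESS"
# and the OCR misspelling "Adress" (the second "d" is optional).
LABEL = r"add?ress\s*[:;]?\s*"

# Capture everything after the label up to and including the first
# standalone 6-digit number, assumed here to be the PIN code.
# re.DOTALL lets "." cross the line breaks OCR puts inside the address.
ADDRESS_RE = re.compile(LABEL + r"(?P<address>.*?\b\d{6}\b)",
                        re.IGNORECASE | re.DOTALL)


def extract_address(ocr_text: str) -> Optional[str]:
    """Return the address text after the label, or None if nothing matches."""
    match = ADDRESS_RE.search(ocr_text)
    if match is None:
        return None
    # Join the OCR lines and collapse repeated whitespace into single spaces.
    return " ".join(match.group("address").split())


sample = "Name: Ram Kumar\nAddress: 12 MG Road,\nBengaluru, Karnataka - 560001\nVID: 1234"
print(extract_address(sample))  # 12 MG Road, Bengaluru, Karnataka - 560001
```

Because the match is anchored on the label text rather than on a position, it does not care whether the address block sits below or to the right, as long as the OCR keeps the label and the address in reading order.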
But the main issue I'm getting, for some of these documents, is with the OCR itself: there is some garbage text in between. I also checked the text-only view, and it shows the same issue; the garbage data is there too.
My guess is that this may be caused by the Hindi address printed next to the English one.
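If the garbage really comes from the Hindi address, one option is to drop Devanagari characters from the OCR text before running the regex. This is a rough Python sketch that assumes the OCR returns the Hindi as actual Devanagari characters (Unicode block U+0900 to U+097F); if the engine instead misreads it as Latin letters, this filter will not help. `strip_devanagari` and the sample text are hypothetical.

```python
import re

# Devanagari Unicode block (U+0900-U+097F) plus the zero-width (non-)joiners
# that often appear inside Hindi words.
DEVANAGARI_RE = re.compile(r"[\u0900-\u097F\u200C\u200D]+")
# A line is kept only if something word-like survives the cleanup.
HAS_WORD_RE = re.compile(r"\w")


def strip_devanagari(ocr_text: str) -> str:
    """Remove Devanagari text and drop lines left empty or punctuation-only."""
    cleaned_lines = []
    for line in ocr_text.splitlines():
        # Replace Hindi runs with a space, then normalise whitespace.
        line = " ".join(DEVANAGARI_RE.sub(" ", line).split())
        if HAS_WORD_RE.search(line):
            cleaned_lines.append(line)
    return "\n".join(cleaned_lines)


mixed = "पता: राम कुमार, एमजी रोड\nAddress: S/O राम 12 MG Road"
print(strip_devanagari(mixed))  # Address: S/O 12 MG Road
```

The cleaned text can then be passed to whatever label-based regex is used for the address.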
These are some of the regex expressions I used:
(?<=Address:\s).*$
(?<=Address:\s).*(?<=\d{6})
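A quick way to see how these two patterns behave when OCR wraps the address onto a second line, sketched in Python and assuming each pattern was written with `.*` (a lone `*` is easily swallowed by forum formatting). The sample text is made up.

```python
import re

# The two patterns, assuming ".*" in both.
PATTERN_TO_EOL = r"(?<=Address:\s).*$"
PATTERN_TO_PIN = r"(?<=Address:\s).*(?<=\d{6})"

# Made-up OCR output where the address wraps onto a second line.
sample = "Address: 12 MG Road,\nBengaluru 560001"

# By default "." stops at "\n", and "$" only matches at the end of the
# string (or just before a final newline), so neither pattern matches.
print(re.search(PATTERN_TO_EOL, sample))  # None
print(re.search(PATTERN_TO_PIN, sample))  # None

# MULTILINE makes "$" match at each line end: only the first line is captured.
print(repr(re.search(PATTERN_TO_EOL, sample, re.MULTILINE).group()))  # '12 MG Road,'

# DOTALL lets ".*" cross the line break and reach the PIN code.
print(repr(re.search(PATTERN_TO_PIN, sample, re.DOTALL).group()))  # '12 MG Road,\nBengaluru 560001'
```

The inline forms `(?m)` and `(?s)` exist in both Python and .NET regex, but how the regex-based extractor applies options has not been checked here. Note also that the greedy `.*` in the second pattern runs to the last 6-digit number on the page if several follow; `.*?` would stop at the first one.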