I'm looking for any help with my problem. I'm reading a whole PDF with OCR and need to extract a couple of fields. The problem is with cash amounts: when the amount is below 1,000 it's fine, but above that it breaks, because I'm splitting the whole string by spaces and extracting data by index. A bigger amount, for example 1 250,50, gets split into index(0) = 1 and index(1) = 250,50, which ruins the whole idea.
Example of what the extracted OCR string looks like:
installment number 2 for invoice numer AAA/BBB CC DDD (1 050,50) (23) (241,60) (1 292,42)
I added the brackets to show that there are always 4 numeric values to extract, plus the rest of the data such as the invoice number. The order is always the same, but the amounts are not. Splitting by spaces worked fine at first: the last position is non-tax, for example, so I just took the last token and assigned it as non-tax. But when an amount is larger than one thousand, the amount itself gets split and the indexes shift.
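To make the index shift concrete, here is a small sketch in Python (the language is my assumption, the thread does not name one). The `large` line is the OCR string from above without the brackets; the `small` line is a made-up variant with amounts below one thousand.

```python
# Same layout in both lines; only the size of the amounts differs.
# "small" is a hypothetical line, "large" is the example from the post.
small = "installment number 2 for invoice numer AAA/BBB CC DDD 500,00 23 115,00 615,00"
large = "installment number 2 for invoice numer AAA/BBB CC DDD 1 050,50 23 241,60 1 292,42"

small_tokens = small.split(" ")
large_tokens = large.split(" ")

# With small amounts, the last four tokens are exactly the four values.
print(small_tokens[-4:])

# With thousands, each "1 xxx,xx" amount becomes two tokens,
# so the same slice no longer lines up with the four values.
print(large_tokens[-4:])
```

The second slice starts in the middle of the value list, which is why taking fields by fixed index stops working as soon as one amount reaches a thousand.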
Thank you, but I now have the data extracted as below:
So I have one string and I need 4 separate values:
1 021,01
23
234,83
1 255,84
The only fixed number in the invoices is 23; the rest of the values are random and unpredictable. Is there any chance to extract all 4 values into variables like Val1, Val2…?
Yes, we would do it with regex and then, for example, return an array with the found values. As the number of occurrences is also not fixed, we would not recommend an approach like var1, var2…
Getting an array with all regex matches would look like this:
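A minimal sketch of that idea in Python (the language and the names `AMOUNT`, `LINE_TAIL` and `extract_values` are my own choices, not something from the thread). It assumes that the four values sit at the end of the line, that every amount has exactly two decimals after the comma, and that the second value is a plain integer such as 23. A plain "find every number" pattern would not be enough here, because "23 241,60" on its own also looks like a valid amount with a thousands group, so the pattern describes all four values together.

```python
import re

# One amount: 1-3 leading digits, optional space-separated groups of
# three digits, then a comma and exactly two decimals (assumption).
AMOUNT = r"\d{1,3}(?: \d{3})*,\d{2}"

# Four values at the end of the line: amount, plain integer, amount, amount.
LINE_TAIL = re.compile(rf"(?<!\d)({AMOUNT}) (\d+) ({AMOUNT}) ({AMOUNT})\s*$")


def extract_values(line):
    """Return the four values as a list of strings, or [] if the line does not fit."""
    match = LINE_TAIL.search(line)
    return list(match.groups()) if match else []


line = "installment number 2 for invoice numer AAA/BBB CC DDD 1 050,50 23 241,60 1 292,42"
print(extract_values(line))
```

If the invoice number itself can end in digit groups directly before the first amount, this pattern can no longer tell where the amount starts, so treat it as a starting point only.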
I just don't get how to extract the values when there are thousands, because there is a blank space in "1 021,01". I don't think separating by comma would work, because there is also the value "23", which has no comma. I'm totally stuck there.
Then it suddenly gets tricky. Do you expect there to ever be a case where the number consists of more than 4 digits in the first part? Or is 9 999,99 the absolute max?
Indeed it's tricky, and yes, I expect that amounts above 4 digits can occur. The amount changes per invoice: it may be 3 digits (hundreds), but I do not exclude the possibility of 4 or more digits (thousands).
In that case, it all comes down to what the string you will match against looks like.
If it looks like the example below, I don't think you will succeed
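For amounts of any size, one option is to describe a single amount as "1 to 3 digits, then any number of space-separated groups of three digits, then a comma and two decimals". The sketch below is in Python and only illustrates that shape; the name `parse_amount` is hypothetical, and the two-decimal rule is an assumption taken from the examples in this thread.

```python
import re
from decimal import Decimal

# 1-3 digits, any number of " ddd" groups, a comma, exactly two decimals.
AMOUNT = re.compile(r"\d{1,3}(?: \d{3})*,\d{2}")


def parse_amount(text):
    """Convert an amount such as '1 021,01' into a Decimal, or raise ValueError."""
    if not AMOUNT.fullmatch(text):
        raise ValueError(f"not a recognised amount: {text!r}")
    # Drop the thousands spaces and switch the decimal comma to a dot.
    return Decimal(text.replace(" ", "").replace(",", "."))


for sample in ["999,99", "9 999,99", "1 234 567,89"]:
    print(sample, "->", parse_amount(sample))
```

Because the thousands groups are matched with a repeated group rather than a fixed count, the same pattern covers hundreds, thousands and millions without changes.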