SenzoD
(Senzo Dlomo)
February 19, 2020, 11:47am
1
Hi all,
I am using a Read PDF with OCR activity and I get a bunch of text that looks like this:
Grv # Date Returned Paid From Till Tax Amount (ExVat) (incl Vat)
3745 15 Feb 2020 Purchased No 6.43 42.84 49.27 ’
From this text I need the number below Grv. I can find Grv using regex, but I only need the number below that word. Any ideas?
Like 3745?
Is that date format always fixed? If yes, you can try this expression to get the number:
.*(?=\s\d{1,2}.[A-Za-z])
Cheers.
Check this, I've tested it in an online regex tester!
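For anyone who wants to try the lookahead idea outside the workflow, here is a minimal Python sketch. It assumes the OCR output is two lines separated by a newline, as in the sample above; the function name `extract_before_date` is made up for illustration, and the .NET regex engine may differ in small details from Python's `re`.

```python
import re

# Sample OCR text from the question: a header row, then the data row.
SAMPLE = (
    "Grv # Date Returned Paid From Till Tax Amount (ExVat) (incl Vat)\n"
    "3745 15 Feb 2020 Purchased No 6.43 42.84 49.27"
)

# Pattern from the reply: take everything on a line up to the point where
# whitespace, a 1-2 digit day, one more character and a letter follow
# (for example " 15 F" in " 15 Feb").
LOOKAHEAD_PATTERN = re.compile(r".*(?=\s\d{1,2}.[A-Za-z])")


def extract_before_date(text):
    """Return the text that sits right before the date, or None."""
    match = LOOKAHEAD_PATTERN.search(text)
    return match.group(0) if match else None


print(extract_before_date(SAMPLE))  # prints 3745
```

Note that the lookahead itself is not part of the match, so only the value in front of the date comes back.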
SenzoD
(Senzo Dlomo)
February 19, 2020, 12:04pm
3
Pradeep_Shiv:
.*(?=\s\d{1,2}.[A-Za-z])
Thanks a lot @Pradeep_Shiv, this works.
cheers
@SenzoD
happy learning!
SenzoD:
.*(?=\s\d{1,2}.[A-Za-z])
Changing .* to \d+ could make it safer, I think, as it would only return a result if it's a number.
*Just adding a thought that could help out!
gl
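To illustrate the difference, here is a rough Python comparison of the two variants. The second input line, with letters in front of the date, is invented purely for the demonstration, and the helper name `first_match` is hypothetical.

```python
import re

# Original suggestion: anything before the date.
ANY_PREFIX = re.compile(r".*(?=\s\d{1,2}.[A-Za-z])")
# Stricter variant: only digits before the date.
DIGITS_ONLY = re.compile(r"\d+(?=\s\d{1,2}.[A-Za-z])")


def first_match(pattern, text):
    """Return the first match of pattern in text, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


good_row = "3745 15 Feb 2020 Purchased No 6.43 42.84 49.27"
bad_row = "ABC 15 Feb 2020"  # invented row where OCR produced letters

print(first_match(ANY_PREFIX, good_row))   # prints 3745
print(first_match(DIGITS_ONLY, good_row))  # prints 3745
print(first_match(ANY_PREFIX, bad_row))    # prints ABC
print(first_match(DIGITS_ONLY, bad_row))   # prints None
```

With `\d+` a non-numeric value in front of the date gives no match at all, which is usually easier to handle than a wrong value.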
(?<=\n)\d+(?=\s)
This one works too…
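A quick Python sketch of this lookbehind version, assuming the number is the first thing on a line that follows a line break. The extra data row and the function name `numbers_at_line_start` are made up for illustration.

```python
import re

# Digits that come directly after a newline and are followed by whitespace.
LINE_START_NUMBER = re.compile(r"(?<=\n)\d+(?=\s)")


def numbers_at_line_start(text):
    """Return every number that starts a line after the first line."""
    return LINE_START_NUMBER.findall(text)


sample = (
    "Grv # Date Returned Paid From Till Tax Amount (ExVat) (incl Vat)\n"
    "3745 15 Feb 2020 Purchased No 6.43 42.84 49.27\n"
    "3746 16 Feb 2020 Purchased No 1.00 2.00 3.00"  # invented second row
)

print(numbers_at_line_start(sample))  # prints ['3745', '3746']
```

This version does not depend on the date format, but it does depend on the line break being there and on no other line starting with digits.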