RPA3
(RPA3)
January 21, 2021, 10:23am
1
Hello everyone!
I have a string:
1 - vdaefad 28-9686-87-09 efwefefw 212 14 w424
2-3 jkawnnfno 12849182009 jefoe9nfiwnewn
sefcewfw 4-5 efqwffffffffffffffffffffffeqdqwd311235235 c 32535
ekfmqiefmpqiefmpqie 8308wjeifew 93203 iowefjoiqef
6-7 717 emqfemfoefm 7210
30239 wijfipejp-0-9nn
8-9 717 84202kdkdi ieofj9230804s sfmi
fsjienigw09209
10-13 313 jwirgjwrigjjrwpogpog49029 8240284
nlsnfiefnqei84984 2389284 14-15 30139103951 jenneofneofnoue2898r 3o28rujfeif02
2048204ujf
г.ewfiemp8302830 r823jacm 8038rujo
16-17 aijiqwfiqj932932-3 ejpeikqpeir 2930ri ewkflmf
oifoij429028842 284028 0239ielmkf"
I split it and get an array of strings.
But I need to get the rows that start with the digits in bold (those are the page numbers).
Could you help me, please?
Page numbers can also have 3 digits.
J0ska
January 21, 2021, 10:57am
2
1/ In what form do you get this data? From a file? As the result of OCR?
2/ Could you supply the expected result?
Cheers
RPA3
(RPA3)
January 21, 2021, 11:09am
3
Yes, from a file, via OCR.
I need the rows separated like this:
1 - vdaefad 28-9686-87-09 efwefefw 212 14 w424
2-3 jkawnnfno 12849182009 jefoe9nfiwnewn
sefcewfw
4-5 efqwffffffffffffffffffffffeqdqwd311235235 гр 32535
ekfmqiefmpqiefmpqie 8308wjeifew 93203 iowefjoiqef
6-7 717 emqfemfoefm 7210
30239 wijfipejp-0-9nn
8-9 717 84202kdkdi ieofj9230804s SFMI
fsjienigw09209
10-13 313 jwirgjwrigjjrwpogpog49029 8240284
nlsnfiefnqei84984 2389284
14-15 30139103951 jenneofneofnoue2898r 3o28rujfeif02
2048204ujf
г.ewfiemp8302830 r823jacm 8038rujo
16-17 aijiqwfiqj932932-3 ejpeikqpeir 2930ri ewkflmf
oifoij429028842 284028 0239ielmkf "
moenk
(Thomas Meier)
January 21, 2021, 11:44am
6
kantheshm
(Kanthesh Mallesh)
January 21, 2021, 12:42pm
10
Hi @J0ska,
The 2nd solution you showed above…
Like this…
1 - vdaefad 28-9686-87-09 efwefefw 212 14 w424
2-3 jkawnnfno 12849182009 jefoe9nfiwnewn sefcewfw
4-5 efqwffffffffffffffffffffffeqdqwd311235235 гр 32535 ekfmqiefmpqiefmpqie 8308wjeifew 93203 iowefjoiqef
Hi @RPA3,
You can assign this to an array of strings:
Regex.Replace(content.Trim(), @"^((1 - )|\d+-\d+)", "#$1", RegexOptions.Multiline).Split(new[] { '#' }, StringSplitOptions.RemoveEmptyEntries)
content.Trim() is your string.
^((1 - )|\d+-\d+) is the pattern that finds the page numbers at the start of a line.
#$1 is the replacement: it adds a ‘#’ before each page number, so that we can then split on it.
If you don’t want the page numbers, just remove the $1 from the replacement string.
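One thing to watch for: the ^ anchor only finds page numbers at the start of a line, while the expected result earlier in the thread also breaks rows at 4-5 and 14-15, which sit in the middle of a line in the OCR text. A minimal C# sketch that covers both cases might look like the one below. The names PageRows and SplitRows and the marker pattern are assumptions made for illustration, not something from the thread. The pattern treats a token as a page marker only when it is "N - " or "N-M" (1 to 3 digits each) standing alone between whitespace, so values such as 28-9686-87-09 are skipped. The sketch also joins the continuation lines of each row with single spaces and drops any text before the first marker.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PageRows
{
    // Assumed marker shape: "N - " or "N-M", 1 to 3 digits each.
    // (?<!\S) requires start of text or whitespace before the marker;
    // (?!\S) requires whitespace or end of text after "N-M".
    private static readonly Regex Marker =
        new Regex(@"(?<!\S)(?:\d{1,3} - |\d{1,3}-\d{1,3}(?!\S))");

    // Returns one string per page marker found in the OCR text.
    public static List<string> SplitRows(string content)
    {
        var rows = new List<string>();
        MatchCollection hits = Marker.Matches(content);

        for (int i = 0; i < hits.Count; i++)
        {
            // A row runs from this marker up to the next marker (or the end of the text).
            int start = hits[i].Index;
            int end = i + 1 < hits.Count ? hits[i + 1].Index : content.Length;
            string chunk = content.Substring(start, end - start);

            // Collapse line breaks and repeated spaces so each row is a single line.
            rows.Add(Regex.Replace(chunk, @"\s+", " ").Trim());
        }

        return rows;
    }
}
```

Calling PageRows.SplitRows(content) returns a List<string> with one entry per marker. This is a heuristic: if a real data value can look like a standalone page range (for example 12-34 surrounded by spaces), the pattern would need tightening.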