Getting text from a PDF extract

redanime94 · May 16, 2022, 9:52am

Hi,

I need to extract some text from a PDF extract.

Below is the sample text:

The quick brown fox jumped over the lazy dog
ABC 02 1234 1234567 00 Y N N N 38 2468 1234567 00
ABC 02 1234 1234567 01 Y Y N N 38 2468 1234567 02
ABC 02 1234 1234567 03 N N Y N 38 2468 1234567 04
Additional Comments (if any)

In the sample text above, I want to get the 3 lines that contain the characters ‘ABC’.

From those lines, I then want to split each line further into the following groups:

Text ABC will be saved as 1 group
The text beginning with 02 will be saved as 1 group
Y - 1 group
N - 1 group
N - 1 group
Y - 1 group
The text beginning with 38 will be saved as 1 group
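If every ABC line is well formed (single spaces, nothing merged), the groups listed above can be produced without a regex at all: keep the lines that start with "ABC", split each on whitespace, and slice the tokens. A minimal Python sketch, assuming each line has exactly 13 tokens (ABC, four numbers, four Y/N flags, four numbers) as in the sample; the function names are hypothetical.

```python
import re

SAMPLE = """The quick brown fox jumped over the lazy dog
ABC 02 1234 1234567 00 Y N N N 38 2468 1234567 00
ABC 02 1234 1234567 01 Y Y N N 38 2468 1234567 02
ABC 02 1234 1234567 03 N N Y N 38 2468 1234567 04
Additional Comments (if any)"""


def abc_lines(text):
    """Return only the lines that start with the literal prefix 'ABC'."""
    return [line for line in text.splitlines() if re.match(r"ABC\b", line)]


def split_line(line):
    """Split one well-formed line into the requested groups.

    Assumes exactly 13 whitespace-separated tokens: ABC, four numbers,
    four Y/N flags, four numbers.
    """
    tokens = line.split()
    if len(tokens) != 13:
        raise ValueError(f"unexpected token count {len(tokens)}: {line!r}")
    return {
        "prefix": tokens[0],
        "first_run": " ".join(tokens[1:5]),  # the run beginning with 02
        "flags": tokens[5:9],                # the four Y/N groups
        "last_run": " ".join(tokens[9:13]),  # the run beginning with 38
    }


for line in abc_lines(SAMPLE):
    print(split_line(line))
```

The token-count check makes a line with a merged or missing field fail loudly instead of silently shifting every group by one.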

How can I accomplish this, please?

Thanks.

supermanPunch · May 16, 2022, 10:20am

Hi @redanime94,

Could you check with the regex below:

(.*?)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\w)\s+(\w)\s+(\w)\s+(\w)\s+(\d+)
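To see what this expression captures on a sample line, here is a quick check with Python's `re` module. Python is only a convenient stand-in; the thread does not say which regex engine is used, although these basic constructs behave the same way in the common engines.

```python
import re

# The expression from the reply above, unchanged.
PATTERN = re.compile(
    r"(.*?)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\w)\s+(\w)\s+(\w)\s+(\w)\s+(\d+)"
)

line = "ABC 02 1234 1234567 00 Y N N N 38 2468 1234567 00"
m = PATTERN.match(line)

# Ten groups: the prefix, four separate numbers, four flags, then only "38".
print(m.groups())
# The rest of the last number run is not captured by any group.
print(repr(line[m.end():]))
```

Each number lands in its own group, and the match stops after the first number of the last run, which is why the expression only partly covers the requirement.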

redanime94 · May 16, 2022, 8:22pm

Hi,

Thanks for that. It partly works…

How do I bundle the numbers after ABC, e.g. ‘02 1234 1234567 00’, together as one group? Same with the last set of numbers, e.g. ‘38 2468 1234567 00’.

Also, I noticed that in my extract the spacing between the numbers is not always correct. For example, instead of ABC 02 1234 1234567 01 it appears as ABC 021234 1234567 01.
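One way to keep each number run together as a single group is to describe a run as "digits, optionally followed by more space-separated digits". A minimal Python sketch, assuming the four flags are always single Y/N characters as in the sample; the group and function names are hypothetical.

```python
import re

# A number run is one or more digit blocks separated by whitespace, so the
# whole run is captured as ONE group whether or not every space survived.
# The four flags are assumed to be single Y/N characters, as in the sample.
LINE_RE = re.compile(
    r"^(?P<prefix>ABC)\s+"
    r"(?P<first_run>\d+(?:\s+\d+)*)\s+"
    r"(?P<f1>[YN])\s+(?P<f2>[YN])\s+(?P<f3>[YN])\s+(?P<f4>[YN])\s+"
    r"(?P<last_run>\d+(?:\s+\d+)*)\s*$"
)


def parse_line(line):
    """Return the groups as a dict, or None if the line does not fit the pattern."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "prefix": m["prefix"],
        "first_run": m["first_run"],
        "flags": [m["f1"], m["f2"], m["f3"], m["f4"]],
        "last_run": m["last_run"],
    }


print(parse_line("ABC 02 1234 1234567 00 Y N N N 38 2468 1234567 00"))
print(parse_line("ABC 021234 1234567 01 Y Y N N 38 2468 1234567 02"))
```

The merged line still matches, but its first run comes back as "021234 1234567 01": the regex can bundle a run, yet it cannot tell where "02" ends inside "021234" without knowing the field widths.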

Thanks.

supermanPunch · May 17, 2022, 3:41am

@redanime94, do we have a pattern for each of the groups that you want to extract?

For example: after ABC there will only be a 2-digit number, after the 2-digit number there will be a 4-digit number, and so on.

If we know the exact characteristics (a definite pattern) of each group to be extracted, we may be able to separate them from a merged group; otherwise it won't be possible.
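To illustrate the point about fixed widths: if each field always had a known number of digits, a regex could re-split a merged run. The widths below (2, 4, 7 and 2 digits per run) are hypothetical, read off the sample lines only; the thread never confirms them, and the poster later says the input can be irregular.

```python
import re

# Hypothetical field widths taken from the sample: 2, 4, 7 and 2 digits.
# "\s*" (zero or more whitespace) between fields lets the fixed widths
# re-split a run where a space was lost, e.g. "021234" -> "02", "1234".
RUN = r"(\d{2})\s*(\d{4})\s*(\d{7})\s*(\d{2})"
LINE_RE = re.compile(
    rf"^(ABC)\s+{RUN}\s+([YN])\s+([YN])\s+([YN])\s+([YN])\s+{RUN}\s*$"
)

m = LINE_RE.match("ABC 021234 1234567 01 Y Y N N 38 2468 1234567 02")
print(m.groups())
```

If the widths are not actually fixed, this pattern will either reject the line or split it in the wrong place, which is exactly the limitation described above.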

redanime94 · May 17, 2022, 3:45am

Hi,

The pattern can be random, as the input comes from a scanned document converted to a PDF. I was actually able to get it, but it's not a straightforward solution. I used a regex to extract the lines I needed and then used Substring to get the details within those lines of text.

So it's all sorted for me. Thanks for the initial suggestion; I was able to achieve my goal.
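The poster did not share their actual expression or offsets, so the following is only a rough Python analogue of "regex to pick the lines, then Substring", with Python slices `s[start:end]` standing in for .NET's `String.Substring(startIndex, length)`. One assumption is added: all whitespace is removed before slicing, so a lost space (as in "021234") no longer shifts the offsets. The offsets themselves are hypothetical, derived from the sample (15 digits, 4 flags, 15 digits).

```python
import re

TEXT = """The quick brown fox jumped over the lazy dog
ABC 02 1234 1234567 00 Y N N N 38 2468 1234567 00
ABC 021234 1234567 01 Y Y N N 38 2468 1234567 02
Additional Comments (if any)"""

# Step 1: a regex picks out the lines of interest (those starting with "ABC").
lines = re.findall(r"^ABC\b.*$", TEXT, flags=re.MULTILINE)


def slice_line(line):
    # Step 2: drop "ABC" and ALL whitespace, then cut at fixed offsets.
    # Hypothetical offsets from the sample: 2+4+7+2 = 15 digits, 4 flags,
    # then another 15 digits.
    compact = re.sub(r"\s+", "", line[3:])
    return {
        "prefix": line[:3],
        "first_run": compact[0:15],
        "flags": compact[15:19],
        "last_run": compact[19:34],
    }


for line in lines:
    print(slice_line(line))
```

Both sample lines, with and without the missing space, produce groups of the same shape; the trade-off is that the original spacing inside each run is lost.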
