I'm trying to extract the email address info@amazon.com from a PDF file using regex.
I have a few different regex patterns for pulling email addresses that work perfectly on other PDFs, but not on this one.
This pattern returns nothing:
\b[\w.-]+@[\w.-]+\.\w{2,4}\b
This pattern includes the characters from the following page:
[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
As you can see from the screenshot, it's pulling in extra characters from the following page.
I've attached the PDF. The email is at the bottom of a particular page, but the regex is also pulling in the first word on the next page.
I can easily trim it off, but I'm wondering why it's happening and whether it can be solved purely by regex.
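A minimal sketch of what seems to be going on, assuming Python's re module and a made-up sample string in which the email is glued directly to the first word of the next page. With an escaped dot before the TLD, the first pattern fails because the trailing \b needs a non-word character after "com", and it finds "L" instead. The variant below is only one possible regex-only workaround: it assumes the TLD is all lowercase and that the glued word starts with something other than a lowercase letter.

```python
import re

# Hypothetical sample: the email ends one page and "Lorem" starts the next,
# with no separator between them in the extracted text.
text = "versions of Lorem Ipsum.\ninfo@amazon.comLorem Ipsum has been the standard"

# First pattern from the question (dot before the TLD escaped).
# The trailing \b cannot match between "m" and "L", so nothing is found.
strict = re.compile(r"\b[\w.-]+@[\w.-]+\.\w{2,4}\b")

# Assumed workaround: lowercase-only TLD, then a lookahead that only
# requires the next character not to be a lowercase letter.
lenient = re.compile(r"\b[\w.-]+@[\w.-]+\.[a-z]{2,4}(?![a-z])")

print(strict.findall(text))   # []
print(lenient.findall(text))  # ['info@amazon.com']
```

This breaks down if the next page starts with a lowercase word, if the pattern is run case-insensitively, or if the TLD is longer than four letters, so it is a heuristic rather than a general fix.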
Lorem Ipsum passages, and more recently with desktop publishing software
like Aldus PageMaker including versions of Lorem Ipsum. info@amazon.com
Lorem Ipsum has been the industry’s standard dummy text ever since the 1500s,
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
It's definitely the PDF. This is the text as read from the PDF; it's joining the email with the first word on the next page…
Lorem Ipsum passages, and more recently with desktop publishing software
like Aldus PageMaker including versions of Lorem Ipsum.
AMAZON
Lorem Ipsum has been the industry’s standard dummy text ever since the 1500s,
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
when an unknown printer took a galley of type
and scrambled it to make a type specimen book.
It has survived not only five centuries,
but also the leap into electronic typesetting, remaining essentially unchanged.
It was popularised in the 1960s with the release of Letraset sheets containing
Lorem Ipsum passages, and more recently with desktop publishing software
like Aldus PageMaker including versions of Lorem Ipsum.
info@amazon.comLorem Ipsum has been the industry’s standard dummy text ever since the 1500s,
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
when an unknown printer took a galley of type
and scrambled it to make a type specimen book.
It has survived not only five centuries,
but also the leap into electronic typesetting, remaining essentially unchanged.
It was popularised in the 1960s with the release of Letraset sheets containing
Lorem Ipsum passages, and more recently with desktop publishing software
like Aldus PageMaker including versions of Lorem Ipsum.
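Since the dump above shows the email and the next page's first word with nothing between them, another option is to stop the pages from being glued together before the regex runs. The sketch below assumes the PDF reader can hand back text one page at a time, which is not confirmed here. The pages list is a hypothetical stand-in for that output, and emails_per_page is an illustrative helper name.

```python
import re

# Hypothetical per-page strings, standing in for whatever the PDF reader returns.
pages = [
    "like Aldus PageMaker including versions of Lorem Ipsum.\ninfo@amazon.com",
    "Lorem Ipsum has been the standard dummy text ever since the 1500s,",
]

EMAIL = re.compile(r"\b[\w.-]+@[\w.-]+\.\w{2,4}\b")

# Joining with an empty string reproduces the problem: "comLorem" has no boundary.
glued = "".join(pages)

# Joining with a newline restores the word boundary after the TLD.
separated = "\n".join(pages)


def emails_per_page(page_texts):
    """Run the pattern on each page separately so pages can never merge."""
    return [match for page in page_texts for match in EMAIL.findall(page)]


print(EMAIL.findall(glued))      # []
print(EMAIL.findall(separated))  # ['info@amazon.com']
print(emails_per_page(pages))    # ['info@amazon.com']
```

Either approach keeps the original pattern unchanged, at the cost of not being a purely regex solution.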