Regex pattern for Python

Can someone help me with a regex? I have been trying to extract data from a PDF file and I can't seem to get the right regex for it.


I just want to get the names as highlighted and ignore *, A, B, C, ABC and everything to the right after the long run of whitespace. Initially I came up with this regex:

[A-ZÑ ]+,\s\w*[A-ZÑ]\s\w*A-ZÑ.

Hi.
How many spaces or other whitespace characters are there between the names (BERNANDO DABALOS) and the column on the right (BALATGUTI)?
That would be helpful information, since we need it to separate the names from the column on the right.
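
A minimal Python sketch of that idea, assuming the name column and the right-hand column are always separated by at least two whitespace characters. The sample lines and the `left_column` helper are made up for illustration, since the real data was not posted as text:

```python
import re

# Hypothetical lines; the real PDF text was not posted in this thread.
lines = [
    "DABALOS, BERNANDO          BALATGUTI",
    "SANTOS, MARIA CLARA     REYES",
]

def left_column(line: str) -> str:
    """Return the text before the first run of two or more whitespace characters."""
    return re.split(r"\s{2,}", line.strip(), maxsplit=1)[0]

names = [left_column(line) for line in lines]
print(names)
```

If the gap can shrink to a single space in some rows, this split will not work and the exact column widths would be needed instead.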

Hey @Jhun_III_Daganzo ,
Try the regex below; it might help you out:

([A-Za-z]+, [A-Za-z]+\s[A-Za-z]+\.[A-Za-z]+)|([A-Za-z]+, [A-Za-z]+)

Below is the output screenshot

Hope it helps you out
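
For anyone running the pattern above in Python: because it contains two capturing groups, `re.findall` returns one tuple per match rather than the matched text. A small sketch that uses `finditer` and `group(0)` instead; the sample text is hypothetical:

```python
import re

# Pattern quoted from the reply above; each alternative sits in its own group.
PATTERN = re.compile(
    r"([A-Za-z]+, [A-Za-z]+\s[A-Za-z]+\.[A-Za-z]+)|([A-Za-z]+, [A-Za-z]+)"
)

# Hypothetical sample text; the real data was not posted in this thread.
sample = "SANTOS, MARIA\nREYES, JOSE"

# findall yields tuples with one slot per capturing group.
tuples = PATTERN.findall(sample)

# finditer + group(0) yields the full matched text regardless of the groups.
names = [m.group(0) for m in PATTERN.finditer(sample)]
print(names)
```

Note that `\s` in the first alternative can also match a newline, so on multi-line input that branch may try to reach into the next line.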

Hi @Jhun_III_Daganzo

Please provide the input data as text; that will make it easier for us to extract the data.


Thanks for the help, but it still does not capture the names when using multiline mode.

Here-> regex101: build, test, and debug regex

sample data here →

@Jhun_III_Daganzo ,
Can you try the regex below?
\b\w+, \w+(?: \w+)?(?: \w+)?(?: \w+)\b

It's almost capturing the data, but it still captures the data on the right after the long spaces. Is there a way to not capture the data after a long space?

The $ symbol matches at the end of the string (or at the end of each line in multiline mode).

Try the regex below:
\b\w+, \w+(?: \w+)?(?: \w+)?(?: \w+)?(?=\s|$)\b
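
As an alternative sketch in Python (not the pattern above, and not tested against the real file): allow only a single space between words, so a run of two or more spaces ends the name, and use a lookahead for either a long gap or `$`. The sample text and its layout are assumptions. The second pattern shows why `re.MULTILINE` matters for `$`:

```python
import re

# Hypothetical text; the layout (name, long gap, other column) is assumed.
text = (
    "SANTOS, MARIA CLARA        REYES\n"
    "CRUZ, ANA\n"
    "DABALOS, BERNANDO     BALATGUTI"
)

# Words are joined by exactly one space, so two or more spaces stop the match.
# The lookahead requires a long gap or the end of the line right after the name.
NAME = r"[A-ZÑ]+, [A-ZÑ]+(?: [A-ZÑ]+)*(?= {2,}|$)"

# With re.MULTILINE, $ also matches just before each newline.
with_multiline = re.findall(NAME, text, flags=re.MULTILINE)

# Without it, $ only matches at the very end of the text, so "CRUZ, ANA" is missed.
without_multiline = re.findall(NAME, text)

print(with_multiline)
print(without_multiline)
```

This relies on names never containing double spaces and on the right-hand column always being at least two spaces away.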

Hi @Jhun_III_Daganzo

Check the regular expression below:

[A-Z]+\,\s+[A-Z]+\s+[A-Z]+\s{1}[A-Z]+|[A-Z]+\,\s+[A-Z]+\s+[A-Z]+|[A-Z]+\,\s+[A-Z]+.*\s+[A-Z]+\s{1}[A-Z]+

Hope it helps!!

Thank you for your help. I really appreciate it.

Thank you @Jhun_III_Daganzo

Hope you found the solution. Please mark it as the solution to close the loop.

Happy Automation!!

Thank you for your help.

