Regex pattern for Python

Jhun_III_Daganzo · August 4, 2023, 7:44am

Can someone help me with regex? I have been trying to get the data from a PDF file, and I can't seem to get the right regex for it.

I just want to get the names as highlighted and ignore *, A, B, C, ABC and everything to the right after the long whitespace. Initially I came up with this regex:

[A-ZÑ ]+,\s\w*[A-ZÑ]\s\w*A-ZÑ.

Konrad_Mierzwa · August 4, 2023, 8:39am

Hi.
How many spaces or other whitespace characters are there between the names (BERNANDO DABALOS) and the column on the right (BALATGUTI)?
That would be helpful information, since we need it to separate the names from the column on the right.
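Konrad's point can be sketched in Python: if the name column is always separated from the next column by two or more whitespace characters, splitting each line at the first such run keeps only the name. The sample lines below are hypothetical, since the thread's actual PDF text was not preserved, and the two-space threshold is an assumption.

```python
import re

# Hypothetical sample lines: the real PDF text was not preserved in the thread,
# so the layout "LAST, FIRST MIDDLE<2+ spaces>OTHER COLUMNS" is an assumption.
SAMPLE = """\
DABALOS, BERNANDO A.      BALATGUTI
SANTOS, MARIA CLARA       POBLACION   ABC
REYES, JOSE               BALATGUTI   *
"""


def first_column(line: str) -> str:
    """Return the text before the first run of two or more whitespace characters."""
    # maxsplit=1 splits only at the first wide gap; single spaces inside the
    # name (e.g. after the comma) are left alone.
    return re.split(r"\s{2,}", line.strip(), maxsplit=1)[0]


names = [first_column(line) for line in SAMPLE.splitlines() if line.strip()]
print(names)  # ['DABALOS, BERNANDO A.', 'SANTOS, MARIA CLARA', 'REYES, JOSE']
```

This only works if a name never contains two consecutive spaces itself, which is exactly what the answer to the question above would confirm.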

Vikas_M · August 4, 2023, 8:50am

Hey @Jhun_III_Daganzo ,
Try the regex below; it might help you out:

([A-Za-z]+, [A-Za-z]+\s[A-Za-z]+\.[A-Za-z]+)|([A-Za-z]+, [A-Za-z]+)
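Since the title asks about Python: if this pattern is used with Python's re module, note that re.findall returns tuples of groups (one entry per capture group) when the pattern contains capture groups, rather than the whole match. A minimal sketch on a hypothetical input line, with the pattern copied unchanged:

```python
import re

# Vikas_M's first pattern, unchanged: two alternatives, each a capture group.
PATTERN = r"([A-Za-z]+, [A-Za-z]+\s[A-Za-z]+\.[A-Za-z]+)|([A-Za-z]+, [A-Za-z]+)"

# Hypothetical input line (the real PDF text was not preserved).
line = "DABALOS, BERNANDO A.      BALATGUTI"

# With capture groups, findall returns one tuple per match; the alternative
# that did not participate contributes an empty string.
print(re.findall(PATTERN, line))  # [('', 'DABALOS, BERNANDO')]

# finditer + group(0) returns the whole matched text instead.
whole = [m.group(0) for m in re.finditer(PATTERN, line)]
print(whole)  # ['DABALOS, BERNANDO']
```

On this hypothetical line the first alternative does not match (it expects letters right after the period), so the result is the second alternative, without the middle initial.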


Hope it helps you out

mkankatala · August 4, 2023, 8:53am

Hi @Jhun_III_Daganzo

Please provide the input data as text; that will make it easier for us to extract the data.

Jhun_III_Daganzo · August 4, 2023, 9:59am

Jhun_III_Daganzo · August 4, 2023, 10:00am

Thanks for the help, but it still doesn't capture the names when I test it on multiline input.

Here is my test: regex101: build, test, and debug regex
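Assuming the extracted text is processed with Python's re module, one thing to check with multiline input is the anchors: by default ^ and $ match only at the start and end of the whole string, and the re.MULTILINE flag makes them match at every line boundary. A sketch with hypothetical sample text (the pattern here is illustrative, not one posted in the thread):

```python
import re

# Hypothetical multi-line text standing in for the PDF extraction
# (the thread's real sample data was not preserved).
TEXT = (
    "DABALOS, BERNANDO A.      BALATGUTI\n"
    "SANTOS, MARIA CLARA       POBLACION   ABC\n"
    "REYES, JOSE               BALATGUTI   *\n"
)

# Illustrative pattern: ^ anchors the match at the start of a name line.
PATTERN = r"^[A-ZÑ]+, [A-ZÑ]+"

# Without re.MULTILINE, ^ only matches at the very start of the string.
without_flag = re.findall(PATTERN, TEXT)

# With re.MULTILINE, ^ (and $) also match at every line boundary.
with_flag = re.findall(PATTERN, TEXT, flags=re.MULTILINE)

print(without_flag)  # ['DABALOS, BERNANDO']
print(with_flag)     # ['DABALOS, BERNANDO', 'SANTOS, MARIA', 'REYES, JOSE']
```

On regex101 the equivalent is the m flag; in code, the flag has to be passed explicitly.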

Jhun_III_Daganzo · August 4, 2023, 10:06am

Sample data here →

Vikas_M · August 4, 2023, 10:07am

@Jhun_III_Daganzo ,
Can you try the regex below?
\b\w+, \w+(?: \w+)?(?: \w+)?(?: \w+)\b

Jhun_III_Daganzo · August 4, 2023, 10:13am

It's almost capturing the data, but it still captures the text on the right after the long run of spaces. Is there a way to stop capturing at a long space?

Vikas_M · August 4, 2023, 10:17am

The $ symbol matches at the end of the string.

Try the regex below:
\b\w+, \w+(?: \w+)?(?: \w+)?(?: \w+)?(?=\s|$)\b
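To see why this version stops before the column on the right, here is a Python sketch that runs the pattern exactly as posted on hypothetical sample lines (the real sample data was not preserved). Each optional (?: \w+) group consumes exactly one space followed by word characters, so a run of two or more spaces ends the match.

```python
import re

# Vikas_M's last pattern, unchanged. Each optional "(?: \w+)" group
# consumes exactly one space, so a match cannot run across a wide gap.
PATTERN = r"\b\w+, \w+(?: \w+)?(?: \w+)?(?: \w+)?(?=\s|$)\b"

# Hypothetical sample lines (the thread's real data was not preserved).
TEXT = (
    "DABALOS, BERNANDO A      BALATGUTI\n"
    "SANTOS, MARIA CLARA       POBLACION   ABC\n"
    "REYES, JOSE A.      BALATGUTI\n"
)

# The pattern has no capture groups, so findall returns plain strings.
names = re.findall(PATTERN, TEXT)
print(names)  # ['DABALOS, BERNANDO A', 'SANTOS, MARIA CLARA', 'REYES, JOSE']
```

On the third hypothetical line the middle initial is followed by a period, so the lookahead fails right after "A" and the match falls back to "REYES, JOSE". In Python 3, \w is Unicode-aware for str patterns, so a letter such as Ñ counts as a word character.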

mkankatala · August 4, 2023, 10:28am

Hi @Jhun_III_Daganzo

Check the regular expression below:

[A-Z]+\,\s+[A-Z]+\s+[A-Z]+\s{1}[A-Z]+|[A-Z]+\,\s+[A-Z]+\s+[A-Z]+|[A-Z]+\,\s+[A-Z]+.*\s+[A-Z]+\s{1}[A-Z]+

Hope it helps!!

Jhun_III_Daganzo · August 4, 2023, 10:39am

Thank you for your help. I really appreciate it.

mkankatala · August 4, 2023, 10:39am

Thank you @Jhun_III_Daganzo

Hope you found the solution. Please mark it as the solution to close the loop.

Happy Automation!!

Jhun_III_Daganzo · August 4, 2023, 10:39am

Thank you for your help.

system · August 7, 2023, 10:40am

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
