Document Understanding: Data Extract

“I extracted the data below from a certificate and saved it to a text file. How do I extract specific fields such as “University name, qualification name, date and ID number” and save them to Excel?”

UNIVERSITY
OF
JOHANNESBURG
The Council and the Senate of the
UNIVERSITY OF JOHANNESBURG
hereby certify that the degree
Bachelor of Commerce Honours
(with distinction)
with field of study
Information Technology Management
with all its associated rights and privileges
in accordance with the Statute of the
University has been conferred upón
COITSEONE DAPHNEY MOTLHOKI
'et a congregation of the University
Barmale
Purger
Vice-Chancellor and Principal
Registrar
23 March 2021
Johannesburg
or
ID/Passport No: 850425116
N00089027

Hi!

Have you tried ML? If not, try using the ML activities to extract the data.

Regards,
NaNi


Please find the attachment. I only want to read the highlighted data and write it to Excel.

Hi!

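You can use RegEx. Pass your input text as a string variable (InputString below, without quotes) and use Matches rather than Match so that every field is returned, not only the first:

System.Text.RegularExpressions.Regex.Matches(InputString,"(?=UNIVERSITY OF ).+|(?<= degree[\r\n]).+|(?<= study[\r\n]).+|(?<=Registrar[\r\n]).+|(?<=Passport No: )\d+|(?<=upón[\r\n]).+")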
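For anyone who wants to check the pattern outside UiPath, here is a minimal Python sketch of the same idea. It is not the workflow from this thread. It splits the alternation into one pattern per field so each value gets a label, then renders a CSV row that Excel can open. The field names, the `extract_fields` and `to_csv` helpers, and the choice of CSV are assumptions made for illustration.

```python
import csv
import io
import re

# Sample text copied from the question (OCR output, LF line breaks).
TEXT = """UNIVERSITY
OF
JOHANNESBURG
The Council and the Senate of the
UNIVERSITY OF JOHANNESBURG
hereby certify that the degree
Bachelor of Commerce Honours
(with distinction)
with field of study
Information Technology Management
with all its associated rights and privileges
in accordance with the Statute of the
University has been conferred upón
COITSEONE DAPHNEY MOTLHOKI
'et a congregation of the University
Barmale
Purger
Vice-Chancellor and Principal
Registrar
23 March 2021
Johannesburg
or
ID/Passport No: 850425116
N00089027
"""

# One pattern per field, reusing the lookarounds from the reply above.
# The field names are illustrative, not part of any UiPath API.
PATTERNS = {
    "University": r"(?=UNIVERSITY OF ).+",
    "Qualification": r"(?<= degree\n).+",
    "Field of study": r"(?<= study\n).+",
    "Name": r"(?<=upón\n).+",
    "Date": r"(?<=Registrar\n).+",
    "ID": r"(?<=Passport No: )\d+",
}


def extract_fields(text):
    """Return a dict of field name -> first match (empty string if none)."""
    text = text.replace("\r\n", "\n")  # normalise Windows line endings
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        fields[name] = match.group(0).strip() if match else ""
    return fields


def to_csv(fields):
    """Render one header row and one data row as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(fields.keys())
    writer.writerow(fields.values())
    return buffer.getvalue()


fields = extract_fields(TEXT)
print(to_csv(fields))
```

The line-ending normalisation matters: a lookbehind that ends in a single `[\r\n]` will not line up with a `\r\n` pair, so text saved with Windows line endings may return no matches for those fields. In UiPath itself, the extracted values would more likely go into a DataTable and then to a Write Range activity rather than a CSV string.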

Reference:

Regards,
NaNi

Thank you, let me try it.

Hi @Andile_Wayne_Lukhele ,

I have attached a workflow for you, please let me know if this is what you were looking for.

ExtractRequiredData.xaml (7.2 KB)

ExtractedData.txt (548 Bytes)

Kind Regards,
Ashwin A.K

Thank you, it is working fine.

Glad I could help!

I’d appreciate it if you marked my answer as the ‘Solution’ so that others facing similar issues can benefit from it and we can close this thread.

Kind Regards,
Ashwin A.K
