String Manipulation - Grouping

Hi guys. I have a PDF that I read using the Read PDF Text activity, and my output is shown below.

University Name: H.I.T

Address: Harare

Program: Computer Science

Level: 1

Block 4:

Student Name: Albert

Course: Programming in C

Percentage: 70

Student Name: Maxwell

Course: Programming in C

Percentage: 85

University Name: H.I.T

Address: Harare

Program: Electrical Engineering

Level: 2

Block 4:

Student Name: Obert

Course: Analogue Electronics

Percentage: 85

Student Name: Charmaine

Course: Analogue Electronics

Percentage: 78

Student Name: Obey

Course: Analogue Electronics

Percentage: 99

My result should look like the Excel screenshot I have attached.

[Screenshot: Output]


One basic technique is to:

  • split the text on line breaks into an array, for example
strText.Split({Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)

  • then loop over the lines and use each value to populate a previously prepared DataTable
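The split-then-loop idea above can be sketched in Python. This only illustrates the logic, since the thread's own expressions are VB.NET/C# inside UiPath. The attached screenshot is not available here, so the target layout is an assumption: one row per student, with the University Name, Address, Program and Level of the block above repeated on each row. The names `HEADER_KEYS`, `COLUMNS`, `group_records` and `SAMPLE` are hypothetical.

```python
from __future__ import annotations

# Sketch: split the extracted PDF text into lines, then loop over them and
# build one row per student. Assumption: a "Student Name" line starts a new
# row, and the header fields seen above it apply to that row.
HEADER_KEYS = ("University Name", "Address", "Program", "Level")
COLUMNS = [*HEADER_KEYS, "Student Name", "Course", "Percentage"]


def group_records(text: str) -> list[dict[str, str]]:
    rows: list[dict[str, str]] = []
    header: dict[str, str] = {}
    current: dict[str, str] | None = None

    # splitlines() handles \r\n, \n and \r endings; blank lines are skipped
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue
        # split on the first colon only, so a value may contain colons
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()

        if key == "University Name":
            # a new university/program block: reset header and current row
            header, current = {}, None
        if key in HEADER_KEYS:
            header[key] = value
        elif key == "Student Name":
            current = {**header, key: value}
            rows.append(current)
        elif current is not None and key in COLUMNS:
            current[key] = value
        # anything else (e.g. "Block 4:") is ignored

    # give every row the same columns, using "" for anything missing
    return [{c: row.get(c, "") for c in COLUMNS} for row in rows]


SAMPLE = "\n\n".join([
    "University Name: H.I.T", "Address: Harare",
    "Program: Computer Science", "Level: 1", "Block 4:",
    "Student Name: Albert", "Course: Programming in C", "Percentage: 70",
    "Student Name: Maxwell", "Course: Programming in C", "Percentage: 85",
    "University Name: H.I.T", "Address: Harare",
    "Program: Electrical Engineering", "Level: 2", "Block 4:",
    "Student Name: Obert", "Course: Analogue Electronics", "Percentage: 85",
    "Student Name: Charmaine", "Course: Analogue Electronics", "Percentage: 78",
    "Student Name: Obey", "Course: Analogue Electronics", "Percentage: 99",
])

for row in group_records(SAMPLE):
    print(row["Student Name"], row["Program"], row["Level"], row["Percentage"])
```

In a UiPath workflow the equivalent would presumably add one DataRow per student to a DataTable built beforehand with these columns, which can then be written to Excel.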

Hi @Tapiwa

I have implemented the approach suggested by @ppr above. Please go through the attached XAML file.

StringManipulationGrouping.xaml (29.7 KB)
pdf.txt (561 Bytes)

Hey @Tapiwa

Here you go…

JsonConvert.DeserializeObject(Of DataTable)(JsonConvert.SerializeObject(new Dictionary(Of String, Object)(strText.Split({Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries).Select(Function(line) new KeyValuePair(Of String, Object)(line.Split(":"c).First().Trim, line.Split(":"c).Last().Trim)))))

Dependencies: Newtonsoft.Json

The Dictionary conversion used here is optional, but I felt it makes the expression cleaner.
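The one-liner above turns every `Key: Value` line into a single key/value mapping. In this input, `Student Name`, `Course` and `Percentage` repeat once per student, so one mapping can hold only one value per key. The Python sketch below (hypothetical helper `to_single_mapping`) shows the effect: a Python dict silently keeps the last value, whereas a .NET `Dictionary` built from a sequence of pairs throws an `ArgumentException` on a duplicate key. Either way, the lines probably need to be grouped into one record per student before they can become multiple DataTable rows. Here `partition` splits on the first colon only, so a value that itself contains a colon stays intact.

```python
from __future__ import annotations

# Sketch: collapse every "Key: Value" line into ONE mapping, the way a
# single dictionary would. Repeated keys overwrite earlier values.
def to_single_mapping(text: str) -> dict[str, str]:
    mapping: dict[str, str] = {}
    for line in text.splitlines():
        if ":" in line:
            # partition splits on the first colon only
            key, _, value = line.partition(":")
            mapping[key.strip()] = value.strip()
    return mapping


SAMPLE = "\n".join([
    "Student Name: Albert", "Course: Programming in C", "Percentage: 70",
    "Student Name: Maxwell", "Course: Programming in C", "Percentage: 85",
])

# Only Maxwell survives; Albert's values were overwritten
print(to_single_mapping(SAMPLE))
# {'Student Name': 'Maxwell', 'Course': 'Programming in C', 'Percentage': '85'}
```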

Hope this helps.

Thanks
#nK

I'm getting an error on
pdftxt.Split({Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries), where pdftxt is my variable name, of type String.
The errors are:
) expected
; expected
} expected

Hey @Tapiwa

Could you please try this once…

Thanks
#nK

Hi @Tapiwa

Could you share a screenshot of your workflow and the error?

The error comes from pdftxt.Split({Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries) in the For Each activity.

Hi @Tapiwa

Try this expression

pdfTxt.Split(new char[] {'\r', '\n'}, StringSplitOptions.RemoveEmptyEntries)

or

pdfTxt.Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
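On Windows, `Environment.NewLine` is `\r\n`, so `Environment.NewLine.ToCharArray()` yields the two separator characters `'\r'` and `'\n'`, the same set as `new char[] {'\r', '\n'}`. `String.Split` then breaks on either character, and `StringSplitOptions.RemoveEmptyEntries` discards the empty strings that appear between a `\r` and the `\n` after it, as well as blank lines. The Python sketch below (hypothetical helper `split_lines_any`) roughly mirrors that behaviour with `re.split`. It also shows that splitting on the two-character `\r\n` sequence finds nothing in text with bare `\n` line endings, which is one reason the character form tends to be more forgiving. Separately, the `) expected` / `; expected` / `} expected` messages earlier in the thread look like C# compiler errors, and `{Environment.NewLine}` is VB.NET array-literal syntax, which would likely explain why that expression failed while these two were accepted.

```python
from __future__ import annotations

import re


def split_lines_any(text: str) -> list[str]:
    # Split on every '\r' or '\n' character, then drop empty pieces
    # (roughly what Split(chars, RemoveEmptyEntries) does in .NET)
    return [part for part in re.split(r"[\r\n]", text) if part]


windows_text = "Level: 1\r\n\r\nBlock 4:\r\n"
unix_text = "Level: 1\n\nBlock 4:\n"

# Splitting on the two-character "\r\n" sequence finds no break in \n-only text
print(unix_text.split("\r\n"))        # ['Level: 1\n\nBlock 4:\n']
print(split_lines_any(windows_text))  # ['Level: 1', 'Block 4:']
print(split_lines_any(unix_text))     # ['Level: 1', 'Block 4:']
```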

Thanks @kumar.varun2

pdfTxt.Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries) worked
