How to split pdf into paragraphs?

Hi All,

I want to split a PDF into paragraphs. I can't split on newlines, because that returns each line as a separate item. I am using the "Read PDF Text" activity.

For example:

image

In the PDF above, I need to split the text into 2 paragraphs.

pdfsample

Thanks in advance.

@athirapr After reading the PDF, split the text by new line.

Hi @athirapr,

After the Read PDF Text activity, use a Write Line activity and show the output. Based on that, I can tell you how to split your text into paragraphs.

Hello athira,
You can use the following regex to get all the paragraphs:
[^\n]+
The paragraphs will be returned as the matches.

Hi @vishal.kp

I tried your solution. It returns each line in the PDF, not paragraphs.
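To illustrate why this happens, here is a minimal sketch in Python (the sample text is made up, not taken from the PDF in question). The pattern `[^\n]+` matches each run of characters that contains no newline, so every line becomes its own match, whether or not it belongs to the same paragraph.

```python
import re

# Hypothetical sample: two paragraphs separated by an empty line.
text = "First line of para one.\nSecond line of para one.\n\nOnly line of para two."

# [^\n]+ matches each maximal run of non-newline characters,
# which gives one match per line rather than one per paragraph.
lines = re.findall(r"[^\n]+", text)

for line in lines:
    print(line)
```

The first paragraph comes back as two separate matches, which is the behavior described above.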

Can you send me the PDF?

I don't want each line. I need each paragraph, as shown in the image.

@athirapr You can split by empty lines.

samplefile1.pdf (3.0 KB)

See this file.

How can I split by empty line? Would you please share a sample code snippet?

@athirapr Here is a XAML file for splitting by paragraph: NotepadSpliByParagraph.zip (11.9 KB). Check it and let me know if you need further assistance.

@athirapr

Use the code below in an Assign activity:

System.Text.RegularExpressions.Regex.Split(YourInputStringVariable, "\n\s+")

For more details, refer to this post.
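As a rough sketch of what this pattern does, here is the same split in Python, assuming the extracted text separates paragraphs with an empty line. The function name `split_paragraphs` and the sample text are made up for illustration; the trimming step is an addition that is not part of the expression above.

```python
import re

def split_paragraphs(text):
    """Split on a newline that is followed by more whitespace (e.g. an empty line)."""
    parts = re.split(r"\n\s+", text)
    # Trim stray whitespace (such as a leftover "\r") and drop empty pieces.
    return [p.strip() for p in parts if p.strip()]

# Hypothetical sample with Windows-style line endings and one empty line.
sample = (
    "Para one, line 1.\r\nPara one, line 2.\r\n"
    "\r\n"
    "Para two, line 1.\r\nPara two, line 2."
)

paragraphs = split_paragraphs(sample)
for p in paragraphs:
    print(repr(p))
```

One thing to watch: because `\s+` matches any whitespace, the pattern also splits where a line simply begins with spaces or tabs, so indented lines inside a paragraph would be cut off as well.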

Do you want to split one PDF into multiple PDFs, one per paragraph?

Thanks mate. It worked. :+1: :grinning:


@athirapr

You are welcome.

Hi,

Please share the XAML.

Thanks,
ashwin.S
