Can anyone help me get the data out of PDF files that contain several paragraphs, each with its own heading?
I will pass the heading in from an asset, and the bot should scrape the data below that heading.
Can anyone suggest how to do this? file-example_PDF_1MB.pdf (1017.7 KB) is a sample PDF I found online.
Alternatively, can we just split the PDF text into paragraphs, with an empty line after each one?
When I used Read PDF Text, it did not give me any empty lines after each paragraph.
If your headings are predefined/known, you can split the PDF text into an array of lines, search the array for the heading text, and note the heading's index in the array. Then extract all the lines after that heading until you reach the next heading.
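Here is a rough sketch of that approach in Python, just to show the logic; in UiPath you would do the same thing on the Read PDF Text output with a Split on new lines. It assumes each heading sits on its own line, and the function name `extract_section` and the `all_headings` list are made up for the example (in your case the target heading would come from the asset).

```python
def extract_section(pdf_text, heading, all_headings):
    """Return the lines after `heading` up to the next known heading.

    Assumption: every heading appears on a line of its own in the extracted text.
    """
    # Turn the extracted text into an array of trimmed lines.
    lines = [line.strip() for line in pdf_text.splitlines()]
    try:
        # Index of the first line right after the heading we are looking for.
        start = lines.index(heading.strip()) + 1
    except ValueError:
        return []  # heading not found in the text
    stops = {h.strip() for h in all_headings}
    body = []
    for line in lines[start:]:
        if line in stops:
            break  # reached the next heading, stop collecting
        if line:
            body.append(line)  # skip empty lines, keep text lines
    return body


sample = "Introduction\nThis is line one.\nThis is line two.\nDetails\nMore text here.\n"
print(extract_section(sample, "Introduction", ["Introduction", "Details"]))
```

Joining the returned lines with a space gives you the paragraph back as one string, if that is what you need.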
If the headings are not known beforehand, that is a big challenge: once the PDF has been converted to text, there is nothing left to tell a heading apart from body text, unless the PDF follows a standard layout where a line break only occurs when moving on to the next heading.
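For that special layout, a minimal Python sketch could just pair up the lines. It only works under the assumption that each heading is on one line and its whole paragraph is on the next line with no wrapping; the function name `pair_headings_with_bodies` is hypothetical.

```python
def pair_headings_with_bodies(pdf_text):
    """Map each heading line to the single body line that follows it.

    Assumption: lines break only at section boundaries, so the non-empty
    lines alternate heading, body, heading, body, ...
    """
    # Drop empty lines so the heading/body alternation is not disturbed.
    lines = [line.strip() for line in pdf_text.splitlines() if line.strip()]
    # Even positions are headings, odd positions are bodies. A trailing
    # heading without a body is dropped by zip, and a repeated heading
    # keeps only its last body in the dict.
    return dict(zip(lines[0::2], lines[1::2]))


sample = "Intro\nWhole intro paragraph.\nDetails\nWhole details paragraph.\n"
print(pair_headings_with_bodies(sample)["Details"])
```

If any paragraph wraps onto a second line, the alternation breaks and this will pair the wrong lines, so check the extracted text first.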
I'm using the UiPath PDF activity Read PDF Text, and it extracts the text exactly as it appears in the PDF: every line stays on its own line rather than being merged into one continuous block of text.