Digitize Document Putting Multiple Lines in PDF Onto One Line

I am using the Digitize Document activity to digitize the first page of a PDF, which has this format:

I wasn’t sure why the Regex Extractor wasn’t recognizing new lines until I wrote the digitization output to a text file and found the output of the first page text looked like this (ignore mouse cursor before line 17):

I created a taxonomy for the first page with all the fields I need to extract.

Is there a way to fix this? On every other page, newlines and bullet points come through fine. It’s just the first page doing this. I don’t think the taxonomy is the reason.
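To illustrate the symptom outside of UiPath, here is a minimal Python sketch (not the Regex Extractor itself, just the same idea). The field name, the pattern, and both sample strings are made up for illustration. It shows how a line-anchored pattern that works on text with newlines finds nothing once the same text is collapsed onto one line:

```python
import re

# Hypothetical pattern: a "Name:" field that starts on its own line.
PATTERN = re.compile(r"^Name:\s*(.+)$", re.MULTILINE)

# Made-up samples: the same content with and without line breaks.
expected_text = "Title: Report\nName: Jane Doe\nDate: 2020-01-01"
collapsed_text = "Title: Report Name: Jane Doe Date: 2020-01-01"


def extract_name(text):
    """Return the captured name, or None if the pattern does not match."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


print(extract_name(expected_text))
print(extract_name(collapsed_text))
```

With the line breaks in place the pattern captures the name. In the collapsed text there is only one "line", so the `^` anchor can only match at the very start of the string and the field is never found.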

Hi @Alex_Marasco: is this a native PDF? Have you tried using the Force OCR flag on Digitize Document?

Yes, it is. What is the Force OCR flag and how do I use that?

The Digitize Document activity has a flag called Force OCR that you can check. It causes Digitize Document to treat even native PDFs as images, which can improve the digitization results in cases such as this one.