I am using the Digitize Document activity to digitize the first page of a PDF, which has this format:
I wasn’t sure why the Regex Extractor wasn’t recognizing new lines until I wrote the digitization output to a text file and found the output of the first page text looked like this (ignore mouse cursor before line 17):
I created a taxonomy for the first page with all the fields I need to extract.
Is there a way to fix this? On every other page, newlines and bullet points come through fine. It's only the first page that does this. I don't think the taxonomy is the reason.
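
In case it helps with diagnosing this, below is a minimal sketch (plain Python, run outside UiPath) of how the text file could be checked for which whitespace characters actually separate the fields, along with a pattern that does not depend on line breaks. The labels `Name:` and `Date:` and the sample strings are made up for illustration and are not from my document. The pattern only uses `\s`, a lazy quantifier, and a lookahead, which I believe are also valid in the .NET regex syntax the Regex Extractor presumably uses, but I have not confirmed that.

```python
import re
from collections import Counter


def whitespace_profile(text: str) -> Counter:
    """Count each distinct whitespace character (shown via repr) in the text."""
    return Counter(repr(ch) for ch in text if ch.isspace())


# Hypothetical pattern: capture the value after "Name:" up to the next label
# ("Date:") or the end of the text, whether the separator is a space or "\n".
NAME_PATTERN = re.compile(r"Name:\s*(.+?)\s*(?=Date:|$)")


def extract_name(text: str):
    """Return the captured value, or None if the label is not found."""
    match = NAME_PATTERN.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Made-up samples: the same fields with and without a line break.
    flattened = "Name: Jane Doe Date: 2021-01-01"
    with_newline = "Name: Jane Doe\nDate: 2021-01-01"

    print(whitespace_profile(flattened))
    print(whitespace_profile(with_newline))
    print(extract_name(flattened))
    print(extract_name(with_newline))
```

If the profile for the first page shows no `'\n'` entries at all, that would suggest the line breaks are lost during digitization itself rather than in the extractor, and anchoring on the next label instead of a newline might be a workaround.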