Read PDF Text Strange Anomaly

rmorgan · September 24, 2021, 9:39am

All the PDFs are invoices from one company, so the formatting is the same for every file.

Reading PDF to Text
Generates a Datatable
Saves as CSV

The anomaly is that when it gets down to the invoice rows, it concatenates 2 rows together. The same number of rows later it happens again, and so on until the end of the row items.

I’ve tried using OCR, but Read PDF Text gives me the cleanest output for further manipulation.

MateuszSzatkowski · September 24, 2021, 9:50am

Hi Rmorgan

Save the PDF text to a file, copy it into e.g. Notepad++, and check the “hidden” characters. Maybe a line-break character is missing at the end of a row, and that is why you get 2 rows in one line.
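
The same check can be done without an editor. Below is a minimal sketch in Python, not part of the original workflow: `show_line_endings` is a hypothetical helper name and the sample text is made up. It prints each line with its control characters visible, so a row that lacks a line break stands out.

```python
from typing import List


def show_line_endings(text: str) -> List[str]:
    """Return every line as its repr, so line-break characters become visible."""
    # keepends=True keeps the line terminator attached to each line.
    return [repr(line) for line in text.splitlines(keepends=True)]


if __name__ == "__main__":
    # Made-up sample: the break after "item B" is missing.
    sample = "item A  10.00\nitem B  20.00item C  30.00\n"
    for shown in show_line_endings(sample):
        print(shown)
```

A line that shows two invoice rows inside a single pair of quotes is one where the line break was lost.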

rmorgan · September 24, 2021, 11:21am

Yes, they are missing.

uKisuke · September 24, 2021, 4:20pm

I believe this is happening at each PDF page.
Check whether there is any character at the end of the page (when getting the text from the PDF). If there is nothing, the only way I can think of is reading the PDF page by page.

rmorgan · October 1, 2021, 4:06pm

Yeah, there’s no discernible character to use as a marker. I think my only option is to split the PDF into single pages, run Read PDF Text on each one, and stitch the results together.

uKisuke · October 1, 2021, 7:45pm

What I recommend that you do:
Get the PDF page count, loop through the pages (based on the page range), read each page’s content into a temp variable, and then either concatenate it with the previous pages in another variable or use append to Excel to write it directly. And don’t forget to add a new line when merging the pages.

PS: I would prefer to store everything in variables and then export to Excel (or CSV).

Hope this helps you
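
The page-by-page loop described above can be sketched roughly as follows. This is an illustration in Python rather than a UiPath workflow: `stitch_pages` and `read_page` are hypothetical names, and `read_page` stands in for whatever reads a single page (for example Read PDF Text limited to one page). The only point it demonstrates is adding exactly one new line between pages.

```python
from typing import Callable, List


def stitch_pages(page_count: int, read_page: Callable[[int], str]) -> str:
    """Read pages 1..page_count one at a time and join them with a new line."""
    parts: List[str] = []
    for page_number in range(1, page_count + 1):
        page_text = read_page(page_number)  # temp variable for the current page
        # Drop trailing line breaks so each page contributes exactly one separator.
        parts.append(page_text.rstrip("\r\n"))
    return "\n".join(parts)


if __name__ == "__main__":
    # Made-up page contents standing in for real PDF pages.
    fake_pages = {1: "row 1\nrow 2", 2: "row 3\nrow 4\n"}
    print(stitch_pages(2, lambda n: fake_pages[n]))
```

Keeping the stitched text in one variable and exporting it at the end matches the preference stated in the PS.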
