I’m having an issue with PDF text testing.
I have a PDF that contains a table.
I read the table text as one long string and then build a table from it with the Generate Data Table activity.
Then I loop over each row and run my test.

The problem is that in the PDF some text is too long for its cell, so it wraps onto another row.
When I read the text, I get this wrapped text as a new row, and I cannot tell which column of the table it belongs to.

I hope my question is clear. I’d like to know whether anyone has faced this issue too and can help.

In the attached picture, notice the number 98 that wraps onto the next row.


Hi @shirH,
I got it.
You read the PDF file to get a string and then generate a data table from it.
Is that right?
It splits on the space character, so it will fail for values with long text.
Have you tried extracting the data from the PDF file?
Can you share your file?
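
To illustrate the wrapping problem outside UiPath, here is a minimal Python sketch, not a UiPath activity. It assumes the extracted text keeps the horizontal layout of the PDF (columns lined up with spaces), that cells are left-aligned under their headers, and that cells contain no spaces. The sample table and the helper names `column_starts`, `column_index` and `parse_table` are made up for the example. A line with fewer fields than the header is treated as a continuation, and each fragment is appended to the column whose header starts at or before the fragment.

```python
import re

# Hypothetical sample: the value 123456798 was too long, so "98" wrapped.
SAMPLE = "\n".join([
    "Item    Amount    Code",
    "Widget  1234567   A1",
    "        98",
    "Gadget  55        B2",
])

def column_starts(header_line):
    """Return the character offset where each header word begins."""
    return [m.start() for m in re.finditer(r"\S+", header_line)]

def column_index(offset, starts):
    """Pick the last column whose start is at or before the given offset."""
    index = 0
    for i, start in enumerate(starts):
        if start <= offset:
            index = i
    return index

def parse_table(text):
    """Split layout-preserved text into rows, merging wrapped fragments."""
    lines = [line for line in text.splitlines() if line.strip()]
    starts = column_starts(lines[0])
    rows = []
    for line in lines[1:]:
        fragments = [(m.start(), m.group()) for m in re.finditer(r"\S+", line)]
        # A line with a full set of fields starts a new row; a shorter
        # line is assumed to continue the previous row.
        if len(fragments) >= len(starts) or not rows:
            rows.append([""] * len(starts))
        for offset, fragment in fragments:
            col = column_index(offset, starts)
            # Fragments are concatenated directly, which suits a wrapped
            # number; wrapped words would need a space in between.
            rows[-1][col] += fragment
    return rows

for row in parse_table(SAMPLE):
    print(row)
```

If the text extraction does not keep the column positions, this approach does not apply, because the offsets are the only thing that tells the fragment’s column.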


You can extract the entire table as a table structure; there is no need to use the Generate Data Table activity.

Use the Document Understanding activities; they will help extract the table from the PDF.


It’s basically due to the PDF format changing or being unstable.
Can we standardise this step by any chance?
I feel this is crucial: if the format keeps changing, your string manipulation steps won’t work across the different formats.
The format has to be the same and standard every time.
That’s a basic requirement of an automation.
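
One way to notice early that the format has changed is to check that every parsed row has the expected number of columns before running any tests on it. This is only a rough Python sketch; the function name `find_malformed_rows` and the sample rows are made up for the example.

```python
def find_malformed_rows(rows, expected_columns):
    """Return (row_number, row) pairs whose column count is unexpected."""
    return [
        (number, row)
        for number, row in enumerate(rows, start=1)
        if len(row) != expected_columns
    ]

# Hypothetical rows as they might come out of a space-based split.
parsed_rows = [
    ["Widget", "1234567", "A1"],
    ["98"],  # wrapped fragment that ended up as its own row
    ["Gadget", "55", "B2"],
]

for number, row in find_malformed_rows(parsed_rows, expected_columns=3):
    print(f"Row {number} has {len(row)} column(s): {row}")
```

Stopping on such rows, instead of testing them, keeps a layout change from quietly producing wrong results.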

Is this a package I need to install?

Thanks. I can’t share the file =/
What I did was read the text of this PDF and use the Generate Data Table From Text activity.


Yes, you need to install the Document Understanding ML activities.



You can look at that topic.
The file there is similar to yours.
Hope it helps.