Identify strikeout pages in PDF automation

For example, let's assume a PDF document has 100 pages and the last 30 pages are struck out. (The total number of struck-out pages may vary for each document.)

In this case, how can I identify the strikeout pages in the PDF documents and get the remaining pages for PDF automation?

@Iswarya_P1

I don't think there is a direct way to identify the strikeout pages…

We need to either find a keyword that indicates a strikeout, or try to identify some difference between the text or data read from a strikeout page and from a normal page.
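The keyword idea can be sketched in a few lines of Python. This is only a minimal illustration, assuming the text of each page has already been extracted into a list of strings by whatever PDF activity or library you use. The marker words and the function names are hypothetical; replace them with whatever actually shows up on your struck-out pages.

```python
from typing import List, Sequence

# Hypothetical marker words (assumption): adjust to what your struck-out pages contain.
STRIKEOUT_KEYWORDS = ("strikeout", "strike out", "cancelled", "void")


def is_strikeout_text(page_text: str,
                      keywords: Sequence[str] = STRIKEOUT_KEYWORDS) -> bool:
    """Return True if the page text contains any marker keyword (case-insensitive)."""
    lowered = page_text.lower()
    return any(keyword in lowered for keyword in keywords)


def pages_to_keep(page_texts: List[str]) -> List[int]:
    """Return the 1-based page numbers whose text has no marker keyword."""
    return [
        page_number
        for page_number, text in enumerate(page_texts, start=1)
        if not is_strikeout_text(text)
    ]


if __name__ == "__main__":
    sample = ["Invoice 1", "VOID page", "Total"]
    print(pages_to_keep(sample))
```

If the struck-out pages carry no such text at all, this check finds nothing and a different signal is needed.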

Can you provide a few samples that might help with the analysis?

cheers

@Iswarya_P1

I don't think it's possible to identify them the normal way, as the strikeouts are handwritten and no difference is observed in the data when the pages are read.

You can try using one of the image classification models from AI Center to determine whether a page is a strikeout page or not.

https://docs.uipath.com/ai-fabric/v0/docs/image-classification
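Once each page image has been classified, picking the remaining pages is a simple filter. The sketch below does not call AI Center; it assumes the model's output has already been collected as one (label, confidence) pair per page. The label name "strikeout", the 0.8 threshold, and the function name are all assumptions for illustration.

```python
from typing import List, Tuple


def remaining_pages(predictions: List[Tuple[str, float]],
                    strike_label: str = "strikeout",
                    min_confidence: float = 0.8) -> List[int]:
    """Return the 1-based page numbers that are not confidently classified as struck out.

    predictions holds one (label, confidence) pair per page, in page order.
    """
    kept = []
    for page_number, (label, confidence) in enumerate(predictions, start=1):
        struck = label == strike_label and confidence >= min_confidence
        if not struck:
            kept.append(page_number)
    return kept


if __name__ == "__main__":
    # Hypothetical predictions for a three-page document.
    sample = [("normal", 0.99), ("strikeout", 0.95), ("strikeout", 0.50)]
    print(remaining_pages(sample))
```

Pages with a low-confidence strikeout prediction are kept here so that they can be reviewed manually rather than dropped silently; lower the threshold if you prefer the opposite trade-off.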

cheers

Okay, will check. Thank you.
