In my PDF automation, I save the PDF attachments from mail and extract the text from them with the Get OCR Text activity. The PDF names change every day, so how do I set the selector?
If you are downloading each new file to the same folder, use the expression below to get the latest file in that folder.
- Assign -> NewFile = System.IO.Directory.GetFiles("Give the FolderPath here").OrderByDescending(Function(f) System.IO.File.GetLastWriteTime(f)).FirstOrDefault()
Then pass the NewFile variable to the Get OCR Text activity, so that it always picks up the newest file in that folder.
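If it helps to see the same "newest file in the folder" logic outside UiPath, here is a minimal Python sketch. It is only an illustration, not something you can paste into an Assign activity. The function name `latest_file` and the `*.pdf` filter are my own assumptions; the expression above does not filter by extension.

```python
from pathlib import Path
from typing import Optional


def latest_file(folder: str, pattern: str = "*.pdf") -> Optional[Path]:
    """Return the most recently modified file in `folder` matching `pattern`.

    Hypothetical helper: mirrors OrderByDescending(GetLastWriteTime).FirstOrDefault(),
    so it returns None when the folder has no matching file.
    """
    # Keep regular files only, so a sub-folder named like "x.pdf" is ignored.
    files = [p for p in Path(folder).glob(pattern) if p.is_file()]
    # st_mtime is the last-write timestamp; max() picks the newest one.
    return max(files, key=lambda p: p.stat().st_mtime, default=None)
```

Like FirstOrDefault() in the expression, this returns nothing when the folder is empty, so check the result before passing it on.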
Hope it helps!!
Should I place the variable (folder path) there?
You can remove it completely. The selector also works without the title, as long as only one PDF is open at a time.
cheers
The selector is not validating.
Did you keep multiple pages open?
Check whether the cls is changing; if it is, use only app in the selector.
cheers
There are many PDFs, and I get them with Directory.GetFiles.
How do I set the selector so that it accepts any PDF file from the given path?
My process is like this:
I get the mail, save the attachments (PDF invoices), and extract the text directly from each PDF.
That is the process, so I have to open the PDF files every time.
They are scanned PDFs, so I used Get OCR Text.
A better approach is to:
Use the Read PDF with OCR activity.
Change the image DPI to suit your scanned PDF for better output; you can also try a different OCR engine. I am using Tesseract OCR here.
Hope it helps you!
Cheers!!
@anjani_priya
Okay, so in Use Application/Browser, can you please check the window selector? Maybe you missed it.
Please check it in Get OCR Text as well.
cheers

It's not validating.
Can I use the same activities for scanned and non-scanned PDFs?

