Question about selectors in PDF automation

In my PDF automation, I save the attachments (PDFs) from an email and extract their text with the Get OCR Text activity. The PDF file names change every day, so how should I set the selector?

@anjani_priya

Use * in the title attribute of the selector,
or pass the file name as a variable.

cheers
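
To illustrate what the * wildcard does, here is a minimal Python sketch (not UiPath code). It assumes the selector's title attribute is compared against the window title, with * standing for zero or more characters and ? for exactly one. The window titles and the title_matches helper are hypothetical, and fnmatch only approximates selector matching.

```python
from fnmatch import fnmatchcase

def title_matches(pattern: str, window_title: str) -> bool:
    """Return True if window_title fits the wildcard pattern.

    '*' matches zero or more characters and '?' matches exactly one,
    which is how wildcards in a selector attribute behave.
    """
    return fnmatchcase(window_title, pattern)

# Hypothetical window titles: the file name changes daily, the rest is fixed.
titles = [
    "invoice_0105.pdf - Adobe Acrobat Reader DC",
    "invoice_0106.pdf - Adobe Acrobat Reader DC",
]

for title in titles:
    fixed = title_matches("invoice_0105.pdf*", title)  # hard-coded file name
    wild = title_matches("*.pdf*", title)              # wildcard title
    print(f"{title!r}: fixed={fixed}, wildcard={wild}")
```

A hard-coded title only fits the file it was recorded with, while the wildcard title fits any PDF window.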

Hi @anjani_priya

If you are downloading each new file to the same folder, use the expression below to get the latest file in that folder.

- Assign -> NewFile = System.IO.Directory.GetFiles("Give the FolderPath here").OrderByDescending(Function(f) System.IO.File.GetLastWriteTime(f)).FirstOrDefault()

Pass the NewFile variable to the Get OCR Text activity so that only the newest file in that folder is used.

Hope it helps!!
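
For anyone who wants to check the "latest file" logic outside UiPath, here is a rough Python sketch of the same idea: list the files in a folder and keep the one with the most recent last-write time. The newest_file helper and the *.pdf filter are assumptions added for illustration; the expression above has no filter.

```python
from pathlib import Path
from typing import Optional

def newest_file(folder: str, pattern: str = "*.pdf") -> Optional[Path]:
    """Return the most recently modified file in folder, or None if there is none.

    Mirrors OrderByDescending(last write time).FirstOrDefault():
    an empty folder gives None instead of raising an error.
    """
    files = [p for p in Path(folder).glob(pattern) if p.is_file()]
    # max() with default=None covers the empty-folder case.
    return max(files, key=lambda p: p.stat().st_mtime, default=None)
```

Note that an empty folder returns nothing, so it is worth checking for that before using the result.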


How do I change it here?


How do I change it in the Attach Window activity?

@anjani_priya

The title attribute is what you need to remove.

cheers

Should I put the variable (folder path) there?

@anjani_priya

You can remove it completely. Even without the title, it will work as long as only one PDF is open at a time.

cheers

The selector is not validating.

@anjani_priya

Did you keep multiple pages open?

Check whether the cls attribute is changing; if it is, use only the app attribute.

cheers

There are many PDFs, and I get them with Directory.GetFiles.
How do I set the selector so that it accepts any PDF file from the given path?

@anjani_priya

Did you keep all of them open?

Why not open them one after the other?

cheers
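
As a rough illustration of handling the files one after the other, here is a small Python sketch (not UiPath code). The process_each_pdf helper is hypothetical, and extract_text is a placeholder for whatever opens a single PDF, runs OCR on it, and closes it again.

```python
from pathlib import Path
from typing import Callable, Dict

def process_each_pdf(folder: str, extract_text: Callable[[Path], str]) -> Dict[str, str]:
    """Handle the PDFs in folder one at a time and collect the text per file name.

    extract_text is a placeholder for the "open the PDF, run OCR, close it" step.
    Because only one file is handled per iteration, only one PDF window would be
    open at any moment.
    """
    results: Dict[str, str] = {}
    for pdf in sorted(Path(folder).glob("*.pdf")):
        results[pdf.name] = extract_text(pdf)
    return results
```

With only one PDF open per iteration, a selector without the title (or with a wildcard title) has just one window to match.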

My process is like this:
when I get the mail, I save the attachments (PDF invoices) and the text is extracted directly from each PDF.
So every time, I have to open the PDF files.

@anjani_priya

Why are you not using Read PDF Text?

cheers

It's a scanned PDF, so I used Get OCR Text.

Hi @anjani_priya

A better approach is to use the Read PDF With OCR activity.

Adjust the Image DPI according to your scanned PDF for better output. You can also try different OCR engines; here I am using Tesseract OCR.

Hope it helps you :slight_smile:
Cheers!!

@anjani_priya
Okay, so in Use Application/Browser, can you please check the window selector? Maybe you missed it.

Check it in Get OCR Text as well.

cheers

It's not validating.

@anjani_priya

Remove the title attribute,

or use *.pdf in it.

Cheers

Can I use the same activities for scanned and unscanned PDFs?