PDF data extraction. Scrape PDF text - error


Hey guys, this rookie has an enigma that needs your UiPath expertise.
I downloaded this project from the UiPath website (https://www.uipath.com/kb-articles/pdf-data-extraction-scrape-pdf-text).
However, when I try to run the process, it gives me this error:

Does anyone know what this error is and how to fix it?

Many thanks.

@DanielNguyen You're assigning an array element that doesn't exist in the array to a variable. I'm unable to download the file. If you want further help, you can upload the file here and I will look into it.

@Manjuts90, thanks for your reply :slight_smile:
There you go.
FromPdfToTextSample.zip (100.6 KB)

@DanielNguyen Add -1 to the While condition; the rest will work fine.
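The "-1" fix is the usual off-by-one correction for an index-based loop. A minimal Python sketch of the idea, assuming the workflow walks an array with a counter and that the original condition compared the counter against the array length; the names `lines` and `index` are hypothetical and not taken from the sample project:

```python
def read_all(lines):
    """Collect every element using an index-based while loop."""
    collected = []
    index = 0
    # The last valid index is len(lines) - 1. A condition such as
    # `index <= len(lines)` would try to read one element past the end.
    while index <= len(lines) - 1:
        collected.append(lines[index])
        index += 1
    return collected
```

Without the `- 1`, the final iteration would ask for an element that is not in the array, which matches the kind of error described above.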


Thank you so much for your help :slight_smile:
There are 2 more things I need to ask you. I'm trying to create an automation like the one in the screenshot below.

The idea is to scan through the document, copy the content into another file (e.g. a text file), then count the total words in the document. Depending on the total number of words, the bot takes a different action in its next step.
For example, if a document has fewer than 10 words, print out a message saying "there are 10 words in this document". If there are 10-30 words in the document, print out a different message and email the admin (which is me) at the same time. If there are 30-50 words in the document, email the admin with the document attached.
Finally, I need an event log with: Case ID, action taken.

Do you know how I can create such an automation and capture this event log?

@DanielNguyen After reading the document, split the text on spaces and count the number of words. Based on that number, give the appropriate messages, logs and so on. This can be done using a decision activity or an If. For cleaner code, use a decision flow.
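The count-then-decide logic can be sketched outside UiPath as follows. This is only an illustration in Python, not the workflow itself: the function names, the action strings, and the handling of the overlapping boundary at 30 words and of documents above 50 words are all assumptions, since the thread does not specify them.

```python
def count_words(text):
    # Splitting on whitespace and counting the pieces gives the word count.
    return len(text.split())


def choose_action(word_count):
    # Ranges follow the question: <10, 10-30, 30-50.
    # Assumption: exactly 30 words falls in the 10-30 range.
    if word_count < 10:
        return "show message"
    if word_count <= 30:
        return "show message and email admin"
    if word_count <= 50:
        return "email admin with attachment"
    # Assumption: nothing is specified for longer documents.
    return "no action defined"


def log_event(event_log, case_id, action):
    # One row per processed document: Case ID and action taken.
    event_log.append({"case_id": case_id, "action": action})


event_log = []
action = choose_action(count_words("a short document with only seven words"))
log_event(event_log, case_id=1, action=action)
```

In a workflow, each `if` would correspond to one branch of the decision, and `log_event` to whatever activity writes the log row.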

@Manjuts90: Hi, thanks for coming back :slight_smile: My result is below. Did you get a notification when I commented on this thread, even though I forgot to tag your name?
My PDF has more than 10 words, but it triggers the wrong message box. How should I assign my TotalWordCount variable?
I tried adding another condition with PDFText.Split(" "), but it seems no trimming of spaces happened.
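Splitting on a single space is a common source of wrong word counts, because repeated spaces produce empty entries and line breaks are not treated as separators. The Python sketch below shows the effect as an analogy only; it is not UiPath code, and the sample strings are made up.

```python
def words_by_single_space(text):
    # Splits only on the literal " " character; empty strings are kept.
    return text.split(" ")


def words_by_whitespace(text):
    # Splits on any run of whitespace (spaces, tabs, newlines)
    # and never returns empty strings.
    return text.split()


double_spaced = " one  two three"
multi_line = "one\ntwo three"

# Extra spaces inflate the count: 5 pieces instead of 3 words.
inflated = len(words_by_single_space(double_spaced))
# A newline hides a word boundary: 2 pieces instead of 3 words.
merged = len(words_by_single_space(multi_line))
# Whitespace splitting gives 3 in both cases.
correct = (len(words_by_whitespace(double_spaced)),
           len(words_by_whitespace(multi_line)))
```

Text scraped from a PDF typically contains both line breaks and runs of spaces, so a split that ignores empty entries and treats newlines as separators is the safer choice.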

@DanielNguyen What result are you getting?

@Manjuts90 Main.xaml (20.5 KB)

@DanielNguyen Try the workflow below and let me know.

Main.xaml (20.7 KB)

@Manjuts90, it still gives the same message, though the document has more than 10 words. Where did you make changes?

@DanielNguyen I made changes in the If conditions. I checked with a sample PDF from online and it works fine for me. Can you share a sample PDF so that I can check with it?

@Manjuts90 customerInfo.pdf (102.2 KB)

@DanielNguyen In your workflow, you're counting the items in the array after splitting on NewLine. Don't split the string; directly use urString.count to get the number of words in the string (in your case, the PDF text).

For your reference, check the workflow below.

Main.xaml (23.7 KB)

It works perfectly! Thank you so much! :slight_smile:
By the way, why did you have a WriteLine at the end of your process? Does it make any significant difference here?


Hi @Manjuts90, I really need your help to take a further step here.
With my current process, the ‘traces’ I produce are more or less the same, i.e. copy a PDF's content to a text file and display a message. The only ‘different’ thing is the message being displayed.

So, I want to run this process 10 times:
6 runs noise-free (e.g. it takes the path A, B, C),
2 runs with some noise (e.g. it takes the path A, B, X, C, Y, Z),
and another 2 runs with a high level of noise (e.g. A, X, Y, Z, B, Z, Y, C, Y, Z, X).

I want activities ‘X, Y, Z’ to be the ‘noise’ in 40% of the cases. These X, Y, Z could be opening a news website, a Facebook page, the eBay site, Gmail, etc.

So in this case, the trace with noise will be something like:

  • Navigate to a folder (A)
  • Open a browser (X)
  • Copy a file from the folder (B)
  • Type in ‘facebook.com’ on the browser (Y)
  • Navigate facebook (Z)
  • Type in gmail.com on the browser address (Y)
  • Read an email from gmail (Z)
  • Display a message about the length of the copied file (C)

How should I start with this?
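One way to reason about the 6/2/2 plan before building it in a workflow is to generate the traces as plain lists. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, the noise counts (3 and 8) are taken from the example paths above, and noise is inserted at random positions after A so that A, B, C keep their order.

```python
import random

CORE = ["A", "B", "C"]   # the noise-free path
NOISE = ["X", "Y", "Z"]  # distracting activities


def make_trace(noise_events, rng):
    """Return the core path with `noise_events` noise steps mixed in."""
    trace = list(CORE)
    for _ in range(noise_events):
        # Position 1..len(trace) keeps A as the first step and
        # allows noise after C, as in the examples.
        position = rng.randint(1, len(trace))
        trace.insert(position, rng.choice(NOISE))
    return trace


def make_runs(seed=0):
    """Build 10 runs: 6 clean, 2 with some noise, 2 with heavy noise."""
    rng = random.Random(seed)
    noise_plan = [0] * 6 + [3] * 2 + [8] * 2
    return [make_trace(n, rng) for n in noise_plan]


for case_id, trace in enumerate(make_runs(), start=1):
    print(case_id, ", ".join(trace))
```

Each generated list can then serve as the script for one run, with every letter mapped to a concrete activity such as opening a browser or copying a file.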

@DanielNguyen I don't follow your question. I understand you want to use Gmail in your workflow; I haven't used it so far. Maybe you should check with others in the forum.

@Manjuts90, yeah, I agree the scenario I provided was a bit confusing. Let's work on the process that you helped me fix a few days ago.
The current process counts the total words in a PDF and displays a message box.
Now let's improve that process. I'm looking for:

  1. reading multiple PDFs in the same folder
  2. depending on the total word count of the PDF, the bot will do one of:
    2.1) email the admin (me) the document
    2.2) email the admin and open Chrome
    2.3) open Chrome and go to gmail.com

Please let me know if you have any suggestions on how I should do this.
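The folder loop and the three branches can be sketched as below. This is a Python illustration, not the UiPath workflow: `count_words` is a hypothetical callable standing in for whatever reads a PDF and counts its words, and the thresholds of 10 and 30 are borrowed from earlier in the thread because the list above does not say which word counts map to which branch.

```python
from pathlib import Path


def choose_branch(word_count):
    # Assumed thresholds; adjust to the real rules.
    if word_count < 10:
        return "2.1 email admin the document"
    if word_count <= 30:
        return "2.2 email admin and open Chrome"
    return "2.3 open Chrome and go to gmail.com"


def plan_folder(folder, count_words):
    """Return (file name, chosen branch) for every PDF in `folder`."""
    plan = []
    # sorted() gives a stable processing order across runs.
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        word_count = count_words(pdf_path)
        plan.append((pdf_path.name, choose_branch(word_count)))
    return plan
```

The same shape carries over to a workflow: one loop over the files in the folder, with the existing word-count logic and the decision placed inside the loop body.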

@DanielNguyen You can make these changes in the workflow I provided before. What doubt do you have about it?

Check the file below for reference.

Main.xaml (25.0 KB)

Hi @Manjuts90, is the activity I can't load an “Outlook send mail” activity?