Reading and moving files based on matching criteria


#1

Hi!

First of all, I just started my RPA journey and have no previous experience with programming. That said, I find it very interesting and hope to be able to contribute to this community some day. I would be forever grateful if someone could help me out with my problem.

I am trying to create the following process:

I have a folder with a bunch of documents (“Data/Input”). Some are .pdf and others are .doc/.docx. I want to be able to read all of these documents and search for some specified keywords. If there is a match in the document for the specified keyword(s), I want to save the file into another folder.

To make it easy to change the keywords, I would prefer to store them in a Config file.
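For readers following along, here is a minimal sketch of the keyword idea in Python rather than UiPath. It assumes a plain text file with one keyword per line, which is only one possible Config format; `load_keywords` and `contains_any_keyword` are made-up helper names, not UiPath activities.

```python
from pathlib import Path


def load_keywords(config_path):
    """Read one keyword per line, skipping blank lines and '#' comments.

    The one-keyword-per-line format is an assumption made for this sketch.
    """
    keywords = []
    for line in Path(config_path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            keywords.append(line)
    return keywords


def contains_any_keyword(text, keywords):
    """Return True if at least one keyword occurs in the text, ignoring case."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)
```

Keeping the keywords outside the workflow means the list can change without touching the matching logic.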

So far I have been able to get the files, but since there are different activities for reading PDF and .doc/.docx files, I tried to check whether the path string contains “.pdf” and, based on that, use the correct activity. For some reason, it thinks that none of the files contain “.pdf”. Furthermore, I am struggling to work out how to proceed from there: how can I select the correct files to read (PDF/Word)?

Again, if someone could help me out, that would be really great. :slight_smile:

Test.xaml (9.9 KB)


#2

@odajohnsen Check the file below; it identifies PDF and DOC files correctly.

Test.xaml (8.4 KB)


#3

See this workflow -

Main.xaml (11.0 KB)

You can set up the folder structure so that you read all files from one folder and, after reading them, copy them into folders according to their extension.


The workflow shared by @Manjuts90 is right too.


#4

Thank you both (@Manjuts90 and @prankurjoshi)! :smile:

Could you (or anybody else) by any chance show me how to select the different files? Let’s say that an item in the string array “FullPath” contains .pdf. How do I select this specific file in the loop? I would prefer to use the Read PDF Text activity rather than OCR. I have the same question about reading the Word documents.


#5

You can write Directory.GetFiles("DirectoryPath", "*.pdf"); this fetches only the files with that extension. While you are in a loop and have a condition for a specific file type, all operations inside that branch apply to that file only.
You can handle different types of files within the same loop.
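To illustrate the difference between listing with and without a filter, here is a rough sketch in Python (not UiPath code). `list_files` is a hypothetical helper, and "Data/Input" in the usage note is the folder name from the first post.

```python
from pathlib import Path


def list_files(folder, pattern="*"):
    """Return the files in `folder` whose names match `pattern`, sorted.

    With the default pattern every file is returned; with a pattern such
    as "*.pdf" only the matching files are returned.
    """
    return sorted(p for p in Path(folder).glob(pattern) if p.is_file())
```

Calling `list_files("Data/Input", "*.pdf")` would return only the PDF files, while `list_files("Data/Input")` would return everything in the folder, which can then be sorted out inside a single loop.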


#6

So it’s not possible to get all the files first and then read them accordingly? You first have to get only the .pdf files, and so on? I tried changing my workflow based on your first suggestions, but as you can see, I am not able to run it.

Test2.xaml (11.0 KB)

I am sorry for all the “beginner level questions”, and I am very grateful for any help I can get.


#7

No, no, that was just an example for when you want a specific file type. If you don't pass that parameter, you will get all the files from the folder.

You already have the full path of each file when you are inside the For Each loop, so you do not need to call Directory.GetFiles again there, assuming the folder only contains PDF and Word files. You can simply pass the item as the file path.
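The same pattern, sketched in Python for illustration only: loop once over the paths you already have and hand each item straight to the reader that fits its extension. The `readers` mapping and the function names are assumptions for this sketch; in UiPath the readers would be the Read PDF Text and Word activities.

```python
from pathlib import Path


def read_document(path, readers):
    """Pick a reader by file extension and pass it the full path unchanged."""
    extension = Path(path).suffix.lower()  # ".PDF" and ".pdf" are treated alike
    reader = readers.get(extension)
    if reader is None:
        return None  # unsupported type: skip it instead of failing
    return reader(path)


def read_all(paths, readers):
    """Loop once over already-collected paths; no second directory listing."""
    texts = {}
    for item in paths:
        text = read_document(item, readers)
        if text is not None:
            texts[str(item)] = text
    return texts
```

Here `readers` would be a dictionary such as `{".pdf": some_pdf_reader, ".docx": some_word_reader}`, where both functions are placeholders you supply yourself.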


#8

Aha - that makes a lot of sense. Sometimes it’s really easy to make things more complicated than necessary. :stuck_out_tongue: Thank you so much!

I have one more question though: in the Copy File activity I tried to enter the location of the folder I want to move the file to, but for some reason I have not done it correctly. Ideally it would go to the output folder Data\Output.


Looking closer at it, though, it might not actually be the output location that is wrong, but rather the input item? Let me know if you want me to post an updated workflow.
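For comparison, this is roughly how the copy step looks in Python: the destination is built from the output folder plus the source file's own name, so both the input item and the output location are explicit. This is a sketch under assumptions, not a diagnosis of the workflow above; `copy_to_output` is a made-up helper name.

```python
import shutil
from pathlib import Path


def copy_to_output(source, output_dir):
    """Copy `source` into `output_dir`, keeping its file name.

    The output folder is created if it does not exist yet.
    Returns the full path of the copy.
    """
    source = Path(source)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    destination = output_dir / source.name  # folder + original file name
    shutil.copy2(source, destination)
    return destination
```

With the folder names from this thread, a call might look like `copy_to_output(item, "Data/Output")`, where `item` is the full path of the current file in the loop.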