Hello
Using New Sr.StreamReader and Regex.Matches, I am able to pass each of a set of 15 keywords, one by one, so that the bot reads n PDFs and extracts the required information, such as:
Page number,
Start position of the keyword,
Keyword + 10 characters (from the start position where the keyword is found).
My issue is that there are more than 100 PDFs to be read, and each PDF has more than 3000 pages.
My bot's processing time is very bad: for each keyword in the array, it searches through all 3000 pages of each PDF, extracts the information, and only then moves to the next keyword.
The solution I am looking for: while reading a PDF the first time, the search/scan should look for all keywords from the array in one shot, in the first run, for each line.
This will save bot processing time, give output faster, and put less load on the system.
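A minimal sketch of the single-pass idea, written in Python for brevity and not taken from the original bot. It assumes the page text has already been extracted into one string per page; the names `scan_pages`, `pages`, and `context` are hypothetical. All keywords are escaped and joined into one alternation pattern, so each page is scanned once instead of once per keyword. Case-insensitive matching is an assumption here.

```python
import re

def scan_pages(pages, keywords, context=10):
    """Scan each page once and report every keyword hit on it."""
    # Longest keywords first, so a longer keyword wins over a shorter
    # keyword that is a prefix of it at the same position.
    ordered = sorted(keywords, key=len, reverse=True)
    # One pattern for all keywords: kw1|kw2|... with regex metacharacters escaped.
    pattern = re.compile("|".join(re.escape(k) for k in ordered), re.IGNORECASE)

    hits = []
    for page_no, text in enumerate(pages, start=1):
        for m in pattern.finditer(text):
            hits.append({
                "page": page_no,             # 1-based page number
                "start": m.start(),          # start position of the keyword
                "keyword": m.group(0),       # keyword as it appears in the text
                # keyword plus the next `context` characters
                "snippet": text[m.start(): m.end() + context],
            })
    return hits

# Hypothetical sample data standing in for extracted PDF pages.
pages = [
    "Total amount due: 1,250.00 USD",
    "No match here",
    "Invoice number INV-00042 issued",
]
for hit in scan_pages(pages, ["amount due", "invoice number"]):
    print(hit)
```

In .NET, the same pattern should be expressible by building one Regex from the keywords with Regex.Escape and String.Join("|", ...), then calling Regex.Matches once per page or line; Match.Index gives the start position and Match.Value the matched keyword. Note that alternation returns non-overlapping matches, so keywords that overlap each other in the text would need separate handling.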
I am interested in the same solution for a similar process.
In our case, we have a sequence that captures the text (Extract Structured Data ‘TBODY’) into a datatable variable, dtClientNotes, which we then use to parse out keywords.
We then define (via an Assign activity) an array {"keyword1","keyword2","etc"} called Keywords and use a LINQ expression in an Assign to locate one or more matches:
(From notes In dtClientNotes Where notes("Timestamp").ToString >= in_FilterDate And Keywords.Any(Function(key) notes("Note").ToString.Contains(key)) Select notes).Any
The only drawback is that we don't know which keyword(s) matched. If you need this information, you can loop through each keyword and execute the same LINQ expression for it, which lets you log or display the match. We used this For Each during dev/test but removed it for production use.
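If the matched keywords are needed, one alternative to re-running the query per keyword is to collect them while checking each note. The following is a rough Python sketch of that idea, not the workflow described above: the rows are plain dictionaries standing in for datatable rows, the function name `keywords_per_note` is hypothetical, and the timestamp filter is left out. It uses the same case-sensitive substring check as Contains.

```python
def keywords_per_note(rows, keywords, note_field="Note"):
    """Return (row, matched_keywords) for every row with at least one match."""
    result = []
    for row in rows:
        note = str(row[note_field])
        # Same test as Keywords.Any(... Contains(key)), but the hits are kept.
        found = [k for k in keywords if k in note]
        if found:
            result.append((row, found))
    return result

# Hypothetical sample rows standing in for dtClientNotes.
rows = [
    {"Note": "Client asked about refund"},
    {"Note": "General call"},
    {"Note": "refund and escalation requested"},
]
for row, found in keywords_per_note(rows, ["refund", "escalation"]):
    print(row["Note"], "->", found)
```

Each note is visited once, and the list of matched keywords is available for logging without a second pass.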