I need to know if a PDF is an invoice or a reminder. The problem is that the reminders contain the word invoice too

Hello,
I don’t know what I can do to fix this.
I have 2 lists stored as CSV data tables containing the keywords I use to search the PDFs.
One CSV list has the invoice keywords.
The other has the reminder keywords.

The problem is that the reminder PDFs sometimes contain invoice keywords as well.

Is there a way to fix this?

Please help

Hi @E_lanotte,

If one specific word matches both criteria, it is not a useful keyword on its own. I suggest you combine this word with another one so that no keyword appears in both lists.

Regards


Hi @E_lanotte,

I am glad I could help you :slight_smile:

Please mark this question as solved! Thanks!

Hi @E_lanotte

You might want to look at HashSet. You could create a distinct HashSet from each of your two lists, then check the intersection of each one with the words of your PDF: if the intersection has no elements, there is no match. While not required, you should make the PDF’s words a HashSet too for a performance boost.
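To make the idea above concrete, here is a minimal sketch in Python, where the built-in `set` plays the role of .NET's HashSet. The keyword lists, the sample text, and the helper names `words_of` and `matches` are all made up for illustration; in the workflow the keywords would come from the two CSV files and the text from the PDF.

```python
import re

# Hypothetical keyword lists; the real ones would be read from the two CSV files.
invoice_keywords = {"invoice", "amount", "vat"}
reminder_keywords = {"reminder", "overdue", "invoice"}

def words_of(text):
    """Turn a text into a set of lowercase words (punctuation is dropped)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matches(text, keywords):
    """Return the keywords found in the text; an empty set means no match."""
    return words_of(text) & keywords

# Made-up text standing in for the content extracted from a PDF.
sample = "Reminder: invoice 123 is overdue"
invoice_hits = matches(sample, invoice_keywords)
reminder_hits = matches(sample, reminder_keywords)
```

With this sample text both intersections are non-empty, which is exactly the ambiguity described in the question. In .NET the same check would probably be done with `HashSet.Overlaps` or `HashSet.IntersectWith`.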


Thank you for the answer.
I think this is what I need to do.
I just don’t know how to use this correctly.
Is there an example of a HashSet in a workflow?
I’m sorry for the questions.

Hi @E_lanotte

EDIT: sorry, I was a little off topic. I am keeping the example below anyway to help you construct a HashSet. If you want to test for reminders first, without the words that also appear in the invoice list: reminderKeywords.ExceptWith(invoiceKeywords)
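A rough sketch of that ExceptWith idea, written in Python for brevity (set difference with `-` does what `ExceptWith` does in .NET). The keyword lists and the `classify` helper are hypothetical, and testing for reminders before invoices is an assumption about the priority you want.

```python
import re

# Hypothetical keyword lists; the real ones would be read from the two CSV files.
invoice_keywords = {"invoice", "amount", "vat"}
reminder_keywords = {"reminder", "overdue", "invoice"}

# Same effect as reminderKeywords.ExceptWith(invoiceKeywords):
# keep only the keywords that never appear in the invoice list.
reminder_only = reminder_keywords - invoice_keywords

def classify(text):
    """Label a text as 'reminder', 'invoice' or 'unknown' (hypothetical helper)."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    # Assumption: reminders are tested first, using only their exclusive keywords.
    if words & reminder_only:
        return "reminder"
    if words & invoice_keywords:
        return "invoice"
    return "unknown"
```

A reminder that also mentions "invoice" is still labelled a reminder, because the shared word is no longer part of the reminder set and the reminder test runs first.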

Very quick example.


lorem.txt (2.8 KB) HashSet.xaml (6.3 KB)