PDF Redaction Custom Activity

Hi all,
I am planning to redact a complete string like “this is not valid sentence”,
but the PDF redact activity can only redact a single keyword.
Even a regex does not help when specified in the formula.
Could you please advise how I can redact the whole sentence?

Hi Bernard,

Yes, I’m able to extract the phrase from the PDF. I also validated it using the Validation activity, and the phrase is highlighted there. But when I pass the validated result to the DU Redaction Plugin, the phrase is not redacted or highlighted.

Thank you!!

Hi Sridevi,

Redacting whole sentences requires the Redaction Plugin for DU. Document Understanding must find the sentence in order for the activity to redact it.

Best
Bernard

Can you share screenshots or a brief screen recording? Otherwise, can you share the document you’re trying to process and point out the phrase you are targeting?

Best
Bernard

Hi @lawes,

I have successfully completed it. Thanks a lot for your support and guidance.

Thanks,
Ithikash

Has anyone been able to get this working with the Google Cloud Vision API? The API call is successful, but nothing gets redacted.

Hi @lawes
This is quite an interesting feature!!
I am trying to use the DU Plugin by passing the extracted field in the “RedactFields” property. The name of the field is as per the taxonomy and I can see the field getting extracted in the ExtractionResult as well.
But I am getting an error:
[image: error message]

from which I am unable to figure out where I am going wrong.
The field that I am trying to redact is a file number with 14 characters (3 sets of 4 digits each, and each set separated by a space).
Please let me know where I could be going wrong and whether there’s anything I could try.
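For anyone checking a similar value before passing it on for redaction, here is a minimal Python sketch of the format described above (4 + 1 + 4 + 1 + 4 = 14 characters). The pattern and the `is_file_number` helper are my own assumptions for illustration and are not part of the redaction package.

```python
import re

# Assumed format: three groups of four digits, separated by single spaces.
FILE_NUMBER = re.compile(r"\d{4} \d{4} \d{4}")

def is_file_number(value: str) -> bool:
    """Return True if the whole (trimmed) value has the assumed file-number format."""
    return FILE_NUMBER.fullmatch(value.strip()) is not None

sample = "1234 5678 9012"  # made-up value, not taken from a real document
print(len(sample))             # 14
print(is_file_number(sample))  # True
print(sample.split())          # ['1234', '5678', '9012']
```

Note that the value splits into three separate words, which matters for any step that works on one OCR word at a time.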
This is what my property panel looks like:
[image: property panel]

Thanks in advance

Does this activity require Document Understanding and the other activities’ APIs, even if I only want a simpler redaction with a regex or string, without Document Understanding?

I am just using the redaction part and giving it a list of strings to redact. It is only about 70% successful, even though the PDF is very clear and PDF text extraction gets all the text accurately. Is this due to OCR? Why use OCR if the PDF is readable?

Hi @Jay_Chacko - There are two activities in this package. One does not require Document Understanding. The other does.

“PDF Redaction”: Redacts single contiguous words, no spaces, based on regex patterns
“DU Redaction Plugin”: Redacts extracted content from Document Understanding
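
To illustrate the "single contiguous words" limitation: if matching happens one OCR word at a time, a pattern that contains spaces can never match any single word. The Python sketch below is not the package's implementation; `phrase_spans` and the sample word list are hypothetical. It only shows how a phrase could instead be located as a run of consecutive words.

```python
import re

def phrase_spans(ocr_words, phrase):
    """Return (start, end) index pairs where the phrase's words occur consecutively."""
    # One case-insensitive pattern per word, tolerating surrounding punctuation.
    patterns = [re.compile(r"\W*" + re.escape(part) + r"\W*", re.IGNORECASE)
                for part in phrase.split()]
    if not patterns:
        return []
    n = len(patterns)
    spans = []
    for start in range(len(ocr_words) - n + 1):
        window = ocr_words[start:start + n]
        if all(p.fullmatch(w) for p, w in zip(patterns, window)):
            spans.append((start, start + n))
    return spans

# Made-up OCR output: one entry per recognized word.
words = "Note: this is not valid sentence, please ignore".split()
phrase = "this is not valid sentence"

# A pattern with spaces never matches a single word ...
print(any(re.fullmatch(re.escape(phrase), w) for w in words))  # False
# ... but the phrase can be found as a run of consecutive words.
print(phrase_spans(words, phrase))  # [(1, 6)]
```

Each word in a returned span would then be redacted individually.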

I’m disappointed to hear of a success rate of only 70% with “PDF Redaction”. Let me see if I can help you. Can you share a screenshot of the section where redaction failed, and let me know which regex patterns were used?

I’d like to take a look at this. Can you share the document you want to redact with me? Is the Google Cloud Vision API returning the OCR’d text?

I will share a fully working XAML project in this thread shortly. Would you mind sharing your XAML with me so I can take a look?

I am using PDF Redaction and sending an array of keywords. I also placed the array where the default social and other regexes go.
It’s a large list, but the list originally came from the same document using the UiPath PDF reader, so the text is completely readable. Is there any reason why the redaction uses OCR only instead of a regular PDF read? Maybe it should be an option? Unfortunately it’s PHI data and hard to share.

If we could control the dimensions of the redaction, it would also work in my case. Mine is a table inside a PDF, so I could just look for the first item in a row and extend the redaction all the way to the last column, instead of looking for every string.
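
As a rough sketch of what I mean (the activity does not expose this today as far as I know, so `Box`, `extend_to_right_edge` and the coordinates are all hypothetical): take the bounding box of the matched first-column item and widen it to the right edge of the table.

```python
from typing import NamedTuple

class Box(NamedTuple):
    """Hypothetical bounding box in page units (for example, points)."""
    left: float
    top: float
    width: float
    height: float

def extend_to_right_edge(anchor: Box, table_right: float) -> Box:
    """Widen the anchor box so it reaches table_right, keeping its row height."""
    if table_right <= anchor.left + anchor.width:
        return anchor  # never shrink the original match
    return Box(anchor.left, anchor.top, table_right - anchor.left, anchor.height)

first_cell = Box(left=50, top=200, width=80, height=12)  # made-up match
print(extend_to_right_edge(first_cell, table_right=540))
# Box(left=50, top=200, width=490, height=12)
```

One match per row would then be enough to cover the whole row.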

I did use the Google Cloud Vision OCR in a Get OCR Text activity with no issues. However, it doesn’t seem to work in the Redaction activities.

@Jay_Chacko - for your use case, redacting the entire table can certainly be done using the DU Redaction Plugin. That would be far more efficient and reliable than using a bunch of keywords.

The standard PDF Redaction requires OCR. Document Understanding can work without OCR. So once again, the best bet would be to use the DU Redaction Plugin for your use case.

Hi @AI_DEV, thank you for letting me know. Allow me some time to investigate this on my end.

Hi Bernard,

Great tool for redaction. I just learned a lot from your samples and documentation. Great job, and thanks for your contributions. Just asking whether adding text such as “Confidential” over the redacted area is already in the release. Thanks in advance.

Keep it up,
Al

Hi Bernard
Apologies for the delayed response

We had figured out why the DU plugin was causing these issues and had sent a detailed mail to the support team. I am not sure whether this mail reached you. If you could share your email with me, I will forward the same.

A high level overview of the issue -
In the DU Taxonomy, there is a provision to add a Table as the type of a field. This fails in one of the Invoke Code activities of the project (EOM to DT), since that piece of code only handles the primitive data types. (A table traversal method should be enough to solve this issue, because tables contain fields which are of primitive data types.)
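
To show what such a traversal might look like, here is a minimal Python sketch. The dictionary layout (`name`, `type`, `value`, `rows`) and the `flatten_fields` helper are assumptions made for illustration; this is not the UiPath ExtractionResult object model, and the actual Invoke Code is not written in Python.

```python
def flatten_fields(fields):
    """Yield (name, value) pairs, descending into table-type fields."""
    for field in fields:
        if field.get("type") == "Table":
            # A table is a list of rows; each row is a list of primitive fields.
            for row_index, row in enumerate(field.get("rows", [])):
                for name, value in flatten_fields(row):
                    yield f"{field['name']}[{row_index}].{name}", value
        else:
            yield field["name"], field["value"]

# Made-up extraction result with one primitive field and one table field.
extraction = [
    {"name": "FileNumber", "type": "Text", "value": "1234 5678 9012"},
    {"name": "Items", "type": "Table", "rows": [
        [{"name": "Description", "type": "Text", "value": "Widget"},
         {"name": "Amount", "type": "Number", "value": "10.00"}],
        [{"name": "Description", "type": "Text", "value": "Gadget"},
         {"name": "Amount", "type": "Number", "value": "25.50"}],
    ]},
]

for name, value in flatten_fields(extraction):
    print(name, "=", value)
```

After flattening, every entry is a primitive value, so the existing handling for primitive types could be reused.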

The way we worked around this issue was to use the KeyWords property instead of ExtractionResults, because then that particular Invoke Code activity is not used. Let me know if this helps anyone.

Bernard,

Very cool Redaction Activity. I just got it running and need to tweak some things. How can I limit the redaction to just DOB and driver’s license numbers?

Thanks in advance,
Mike

Hi @lawes, you have done a great job creating this tool! But when I upgrade all packages from the old versions to the latest ones, it shows an error. Should I avoid upgrading the packages and use it as is?