Train DU on one type of document and apply it to another

Hello,

I am new to DU and need some guidance to clarify my thinking. How would you approach this case?
I have documents with a notes section, from which the bot has to deduce an “issue code”. After we identify which notes correspond to which code, we need to apply this knowledge to PDF documents that may or may not contain “issue codes”. If a PDF does not specify the actual code, the bot needs to deduce it from the PDF's contents.

What would be the best approach?

Thank you

@michelle.soto

For your case, using the generative extractor might make sense, as the documents are mostly semi-structured or unstructured.

The generative extractor works from a prompt, so you can explain what you need in the prompt.

Cheers

Could you please elaborate on your suggestion?
Can I retrieve these descriptions and let the bot know that a certain description belongs to a certain code if the code is not in the form?
Can I train the model to relate a description to a code that may or may not appear in the document?
Once the model is deployed, can I use it to identify the codes based on a PDF's content?

Thank you

@michelle.soto

You don't need to train a model. Instead, provide prompts that explain what you need, where it would be present, and what to do if it is not found, and keep refining the prompt until you get the proper details.
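A minimal sketch of how such a prompt could be put together, in Python for illustration only. The function name, the field wording, and the example values are assumptions and not part of any UiPath API. The point is only that the prompt states what to extract, where it usually appears, and what to do when it is missing.

```python
def build_extraction_prompt(target: str, location_hint: str, fallback: str) -> str:
    """Assemble a prompt that says what to extract, where to look, and what to do if it is absent."""
    lines = [
        f"Extract: {target}",
        f"Where it usually appears: {location_hint}",
        f"If it is not stated explicitly: {fallback}",
    ]
    return "\n".join(lines)


# Hypothetical wording; refine it after reviewing the extraction results.
prompt = build_extraction_prompt(
    target="the issue code",
    location_hint="the notes section of the form",
    fallback="infer the most likely code from the notes and say that it was inferred",
)
print(prompt)
```

Keeping the three parts separate makes it easier to refine one of them at a time when the results are off.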

Please check IXP.

here is a demo

cheers

My understanding is that I can use Generative Extraction to read the notes from a document and, based on the table (in the other document), deduce which code the notes refer to, correct?

Once the above part is completed, how do we apply this knowledge to find out which code relates to a certain PDF file that only has descriptions in it, not the actual code?

@michelle.soto

The first part is the same, right? Based on the note from the document, you are trying to identify the code. So the basis is the note, and you also know which PDF the note came from.

cheers

I apologize if I am not being clear. I have three different documents:

  1. A form that contains notes,
  2. an Excel file that contains codes and descriptions, and
  3. a PDF.

I want to train the model on the notes and the Excel file and then apply that model to identify the code for the PDF. Is this possible? You already mentioned that we can get a code from the notes by using documents 1 and 2, but can I apply this knowledge to the PDF, meaning getting the code based on the PDF's contents?

@michelle.soto

So, if I understand correctly, the Excel file has both descriptions and codes.

And there are two PDFs that contain descriptions but not codes, and you want the data extracted from them.

For the second part, you can read the full PDF and then send its text to the Generate Text activity, which can help you extract the relevant part from the second PDF too.
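Roughly, that flow looks like the sketch below. It is Python for illustration only: `generate_text` is a hypothetical stand-in for the text-generation step, not the activity's real interface, and the PDF text is assumed to have been read already.

```python
from typing import Callable


def find_relevant_part(pdf_text: str, question: str, generate_text: Callable[[str], str]) -> str:
    """Send the full PDF text plus a question to a text-generation step and return its answer."""
    prompt = (
        "Below is the full text of a document.\n"
        f"{question}\n"
        "Quote only the relevant passage.\n\n"
        f"Document text:\n{pdf_text}"
    )
    return generate_text(prompt).strip()


# Stand-in for the real generation step so the sketch runs on its own:
# it simply returns the last line of the prompt.
def fake_generate_text(prompt: str) -> str:
    return prompt.splitlines()[-1]


sample_text = "Invoice 1001\nCustomer reports the unit arrived with a cracked housing."
relevant = find_relevant_part(sample_text, "Which passage describes the issue?", fake_generate_text)
print(relevant)
```

In a real workflow the fake function would be replaced by the actual generation step; everything else stays the same.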

Cheers

  1. Excel file with codes and descriptions
  2. Form with notes
  3. PDF containing the context of the code but not the actual code

Yes, but the descriptions from the Excel file would not be found word for word in the PDF file. Therefore, the model has to learn to identify a code by its description and context; the notes from the form can provide more details about the codes.

Is this possible?

Thanks for your quick and smart responses.

@michelle.soto

I understand that it does not match exactly. That is why I recommended using the generative capabilities, as they can understand context based on the details provided.
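As a rough illustration, the code table from the Excel file can be placed directly in the prompt as reference context, and the answer can then be checked against the known codes. This is a Python sketch only; the codes, descriptions, and function names are made up for the example.

```python
def build_code_prompt(code_table: dict[str, str], document_text: str) -> str:
    """Put the code/description table into the prompt as reference context."""
    table_lines = [f"- {code}: {description}" for code, description in code_table.items()]
    return (
        "Choose the issue code whose description best matches the document.\n"
        "The wording will not match exactly; judge by meaning.\n"
        "Answer with the code only, or UNKNOWN if none fits.\n\n"
        "Codes:\n" + "\n".join(table_lines) + "\n\n"
        f"Document:\n{document_text}"
    )


def validate_code(answer: str, code_table: dict[str, str]) -> str:
    """Accept the model's answer only if it is one of the known codes."""
    candidate = answer.strip().upper()
    return candidate if candidate in code_table else "UNKNOWN"


# Hypothetical codes and descriptions standing in for the Excel rows.
codes = {"D01": "Item damaged in transit", "M02": "Parts missing from shipment"}
prompt = build_code_prompt(codes, "The housing was cracked when the box was opened.")
print(validate_code(" d01 ", codes))
print(validate_code("X99", codes))
```

The validation step is there because a generative answer can fall outside the table, so anything that is not a known code is treated as unknown rather than trusted. Notes from the form could be added to the prompt as worked examples in the same way.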

cheers
