Identify and Extract Dynamic section from long document (pdf)

We have very challenging task that we are trying to identify a particular section (title, text, footer ID) in a long document. Since the section is very dynamic, I mean its title, content and even the ID could vary, so AI and/ML should be used. AI center model was trained for this purpose but it is not able to identify the section properly from beginning to ending. Also Generative extractor is tested but due to UiPath is using gpt3.5 and doing some filtering before calling the LLM, its performance is poor. We have tested our own azure openAI and it works pretty good. We would like to use only UiPath functionality to mitigate this problem. Any idea is valuable. Thanks.

@barissecen

welcome to the community

section cannot be extracted via regex also?mostly with explaination I might think no…but just to confirm

also did we happen to try with ai center and trainign any existing model

as Chatgpt connector is not working…if you have your own subscription then try to use http request and call those services directly

cheers

1 Like

Hi @barissecen ,

If possible, Could you let us know what was the prompt used with Generative Extractor ?

Also, could you try with Extract Document Data activity and check if it is able to provide a better result ?

Additionally could check on the below docs as well if not already done :

Correct, regex is not suitable. We want to solve only using UiPath functionality, do not want to use our own LLM. Thanks.

@barissecen

Then try to make the prompt better in the openai connector

Try different ways to properly get the result

Cheers