Hi Everyone,
This is a project I am working on, and my current approach is described below.
Please review it and let me know if there is a more efficient way of doing this project using Python + UiPath.
Common to every approach -
1 - Extract all required images.
(
1.1 - Identify and remove all images that are not required.
1.2 - Identify all images that are required.
)
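A minimal sketch of how steps 1.1 and 1.2 could look in Python, assuming the image metadata (size, number of pages the image appears on) has already been read from the PDF by some extraction library. The ImageInfo fields, the is_required function, and the thresholds are all hypothetical and only illustrate the idea of a rule-based filter; the real criteria would come from the project.

```python
from dataclasses import dataclass


@dataclass
class ImageInfo:
    """Metadata for one image found in the PDF (hypothetical fields)."""
    name: str
    width: int        # pixels
    height: int       # pixels
    page_count: int   # number of pages the same image appears on


def is_required(img: ImageInfo, min_side: int = 50, max_repeats: int = 3) -> bool:
    """Heuristic filter (assumed thresholds): drop tiny images such as
    icons and bullets, and images repeated on many pages such as logos."""
    if min(img.width, img.height) < min_side:
        return False
    if img.page_count > max_repeats:
        return False
    return True


def split_images(images: list[ImageInfo]) -> tuple[list[ImageInfo], list[ImageInfo]]:
    """Return (required, discarded) so discarded images can still be reviewed."""
    required = [img for img in images if is_required(img)]
    discarded = [img for img in images if not is_required(img)]
    return required, discarded


# Example data, invented for illustration only.
images = [
    ImageInfo("figure_2_1.png", 800, 600, 1),
    ImageInfo("bullet.png", 12, 12, 40),
    ImageInfo("logo.png", 200, 80, 120),
]
required, discarded = split_images(images)
print([img.name for img in required])
```

Keeping the discarded list, instead of deleting those images outright, would let a reviewer check that the filter is not removing anything needed.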
2 - Generate a summary for the PDF. Fine-tune the AI to generate the context.
( context -
1) Caption + 2) PDF summary + 3) Table of contents + 4) Chapter heading + 5) Chapter brief + 6) Content around the image, selected according to set criteria )
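As a sketch of how the six context parts could be combined into one text input for the model, assuming each part has already been extracted as a string. The ImageContext class and build_context_text function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ImageContext:
    """The six context parts listed above, one string each."""
    caption: str
    pdf_summary: str
    table_of_contents: str
    chapter_heading: str
    chapter_brief: str
    surrounding_text: str


def build_context_text(ctx: ImageContext) -> str:
    """Join the non-empty parts into labelled lines, caption first."""
    parts = [
        ("Caption", ctx.caption),
        ("PDF summary", ctx.pdf_summary),
        ("Table of contents", ctx.table_of_contents),
        ("Chapter heading", ctx.chapter_heading),
        ("Chapter brief", ctx.chapter_brief),
        ("Content around the image", ctx.surrounding_text),
    ]
    lines = [f"{label}: {text.strip()}" for label, text in parts if text.strip()]
    return "\n".join(lines)


# Example values, invented for illustration only.
ctx = ImageContext(
    caption="Figure 2.1 Pump assembly",
    pdf_summary="Maintenance manual for an industrial pump.",
    table_of_contents="",
    chapter_heading="Chapter 2 Assembly",
    chapter_brief="",
    surrounding_text="Attach the housing as shown in Figure 2.1.",
)
context_text = build_context_text(ctx)
print(context_text)
```

Empty parts are skipped so that a missing caption or chapter brief does not add an empty label to the model input.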
3 - Validation process to be deployed + Fine-tuning
3.1 - Approach 1 - Landing page (website / UiPath Action Center)
Options - approve, reject, update, feedback
3.2 - Fine-tune an existing vision model and store the data in a structured database. Input - image plus context. Target - contextual alt text after human validation.
How to fine-tune the vision model - from a dataset, according to the category of the image or the category of the PDF.
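A rough sketch of the structured database from step 3.2, using SQLite from the Python standard library. The table layout, column names, and function names are assumptions for illustration, not a fixed design; the four decisions match the options in 3.1. Only approved or updated rows are returned as training pairs (input: image plus context, target: validated alt text), filtered by category.

```python
import sqlite3

DECISIONS = ("approve", "reject", "update", "feedback")


def create_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) reviews table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS reviews (
               id INTEGER PRIMARY KEY,
               image_path TEXT NOT NULL,
               category TEXT NOT NULL,
               context TEXT NOT NULL,
               generated_alt TEXT NOT NULL,
               decision TEXT NOT NULL,
               final_alt TEXT,
               feedback TEXT)"""
    )
    return conn


def record_review(conn, image_path, category, context, generated_alt,
                  decision, final_alt=None, feedback=None):
    """Store one human validation result."""
    if decision not in DECISIONS:
        raise ValueError(f"unknown decision: {decision}")
    if decision == "approve":
        final_alt = generated_alt  # approved text becomes the target as-is
    conn.execute(
        "INSERT INTO reviews (image_path, category, context, generated_alt,"
        " decision, final_alt, feedback) VALUES (?, ?, ?, ?, ?, ?, ?)",
        (image_path, category, context, generated_alt, decision, final_alt, feedback),
    )
    conn.commit()


def training_examples(conn, category):
    """Return validated (image, context, target) records for one category."""
    rows = conn.execute(
        "SELECT image_path, context, final_alt FROM reviews"
        " WHERE category = ? AND decision IN ('approve', 'update')"
        " AND final_alt IS NOT NULL ORDER BY id",
        (category,),
    ).fetchall()
    return [{"image": r[0], "context": r[1], "target": r[2]} for r in rows]


# Example rows, invented for illustration only.
conn = create_store()
record_review(conn, "fig1.png", "manual", "Caption: Pump", "A pump.", "approve")
record_review(conn, "fig2.png", "manual", "Caption: Valve", "A thing.", "update",
              final_alt="A closed gate valve.")
record_review(conn, "fig3.png", "manual", "Caption: Gauge", "A dial.", "reject")
examples = training_examples(conn, "manual")
print(examples)
```

Rejected rows and feedback-only rows stay in the table, so they can still be used later to analyse where the model goes wrong.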
This will be approach 1 (Python + UiPath + language model). The input is the image plus its context (1) caption + 2) PDF summary + 3) table of contents + 4) chapter heading + 5) chapter brief + 6) content around the image), and Python is used to extract the context.
The goal is to generate contextually aware alternative text (alt text) for the images in a PDF, according to the client's requirements.
Document context -
  Chapter heading
  Section purpose
  Document type (manual, book, technical guide)
Local context -
  Caption (highest priority)
  Nearby explanatory text
  Figure references
Functional intent -
  Why the image exists
  What the reader is supposed to learn
Client constraints -
  Sentence length limits
  Terminology restrictions
  Verb usage rules
  Prohibited assumptions
  Compliance tone (neutral, instructional, etc.)
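Some of the client constraints above could be checked automatically before the alt text goes to human validation. This is only a sketch under assumptions: the ClientRules class, its default values, and the example phrases are hypothetical, and it covers just sentence length, restricted terminology, and words that signal an assumption. Verb usage and tone would need something more than simple string checks.

```python
import re
from dataclasses import dataclass


@dataclass
class ClientRules:
    """Hypothetical rule set; real values would come from the client."""
    max_words_per_sentence: int = 20
    restricted_terms: tuple = ("image of", "picture of")
    assumption_words: tuple = ("probably", "appears to be")


def check_alt_text(text: str, rules: ClientRules) -> list[str]:
    """Return a list of rule violations; an empty list means no issue found."""
    problems = []
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    for sentence in sentences:
        word_count = len(sentence.split())
        if word_count > rules.max_words_per_sentence:
            problems.append(f"sentence too long: {word_count} words")
    lowered = text.lower()
    for term in rules.restricted_terms:
        if term in lowered:
            problems.append(f"restricted term: {term}")
    for word in rules.assumption_words:
        if word in lowered:
            problems.append(f"prohibited assumption: {word}")
    return problems


# Example texts, invented for illustration only.
rules = ClientRules()
bad = check_alt_text("Image of a pump. The valve is probably closed.", rules)
good = check_alt_text("Cross-section of a pump showing the impeller.", rules)
print(bad)
print(good)
```

Alt text that fails these checks could be regenerated or flagged for the reviewer, which would reduce the number of obvious rejections at the validation step.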
Thanks and Regards