Generating contextually aware alternate text for images in a PDF according to the client's requirements

Hi Everyone,

This is a project I am working on, and my current approach is described below.

Kindly review it and let me know if there is a more efficient way to do this project using Python + UiPath.

Common to every approach:
1 - Extract all required images.
(
1.1 - Identify and remove all images that are not required.
1.2 - Identify all images that are required.
)
2 - Generate a summary of the PDF. Fine-tune the AI model to generate the context.
( context -
1) caption + 2) PDF summary + 3) table of contents + 4) chapter heading + 5) chapter brief + 6) content around the image, selected according to set criteria )
3 - Deploy a validation process + fine-tuning.
3.1 - Approach 1 - landing page (website / UiPath Action Center)
Options - approve, reject, update, feedback
3.2 - Fine-tune an existing vision model and store the data in a structured database. Input - image plus context. Target - contextual alt text after human validation.
How to fine-tune the vision model - from a dataset, according to the category of the image or the category of the PDF.
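
To make step 3 more concrete, here is a minimal sketch of the structured store, using only Python's built-in `sqlite3`. The table name, column names, and function names are all hypothetical, and I am assuming that "feedback" is free text attached to any of the three decisions (approve, reject, update). Only rows with a human-validated target are exported for fine-tuning, optionally filtered by category.

```python
import json
import sqlite3

# Reviewer decisions, mirroring the options on the validation page.
DECISIONS = {"approve", "reject", "update"}


def open_store(path=":memory:"):
    """Open (or create) the review database. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS alt_text_reviews (
               id INTEGER PRIMARY KEY,
               image_path TEXT NOT NULL,
               category TEXT NOT NULL,
               context TEXT NOT NULL,
               generated_alt TEXT NOT NULL,
               decision TEXT,
               final_alt TEXT,
               feedback TEXT
           )"""
    )
    return conn


def add_candidate(conn, image_path, category, context, generated_alt):
    """Store a model-generated alt text that is waiting for human review."""
    cur = conn.execute(
        "INSERT INTO alt_text_reviews (image_path, category, context, generated_alt) "
        "VALUES (?, ?, ?, ?)",
        (image_path, category, context, generated_alt),
    )
    conn.commit()
    return cur.lastrowid


def record_review(conn, row_id, decision, updated_alt=None, feedback=None):
    """Save the reviewer's decision; only approve/update produce a target."""
    if decision not in DECISIONS:
        raise ValueError(f"unknown decision: {decision}")
    row = conn.execute(
        "SELECT generated_alt FROM alt_text_reviews WHERE id = ?", (row_id,)
    ).fetchone()
    if row is None:
        raise ValueError(f"no candidate with id {row_id}")
    if decision == "update":
        if not updated_alt:
            raise ValueError("an update needs the corrected alt text")
        final_alt = updated_alt
    elif decision == "approve":
        final_alt = row[0]
    else:  # reject: keep the row and feedback, but no training target
        final_alt = None
    conn.execute(
        "UPDATE alt_text_reviews SET decision = ?, final_alt = ?, feedback = ? "
        "WHERE id = ?",
        (decision, final_alt, feedback, row_id),
    )
    conn.commit()


def export_training_rows(conn, category=None):
    """Return validated rows as JSON lines: input = image + context, target = alt text."""
    query = (
        "SELECT image_path, category, context, final_alt "
        "FROM alt_text_reviews WHERE final_alt IS NOT NULL"
    )
    params = ()
    if category is not None:
        query += " AND category = ?"
        params = (category,)
    query += " ORDER BY id"
    return [
        json.dumps({"image": r[0], "category": r[1], "context": r[2], "target": r[3]})
        for r in conn.execute(query, params)
    ]
```

Whichever front end is used for validation would only need to call `record_review`. Keeping rejected rows together with their feedback means they remain available later for error analysis.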

This will be approach 1 - (Python + UiPath + language model): image + context (context - 1) caption + 2) PDF summary + 3) table of contents + 4) chapter heading + 5) chapter brief + 6) content around the image), where Python is used to extract the context.
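
As a rough sketch of how the six context parts could be combined once Python has extracted them, the snippet below builds a single text prompt to send along with the image. The class name, field names, prompt wording, and the per-field character limit are all assumptions for illustration, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class ImageContext:
    """The six context parts from the approach; empty parts are skipped."""
    caption: str = ""
    pdf_summary: str = ""
    table_of_contents: str = ""
    chapter_heading: str = ""
    chapter_brief: str = ""
    surrounding_text: str = ""


# Order follows the numbering in the approach, so the caption comes first.
FIELDS = [
    ("caption", "Caption"),
    ("pdf_summary", "PDF summary"),
    ("table_of_contents", "Table of contents"),
    ("chapter_heading", "Chapter heading"),
    ("chapter_brief", "Chapter brief"),
    ("surrounding_text", "Content around the image"),
]


def build_prompt(ctx, max_chars_per_field=500):
    """Join the available context parts into one labelled prompt."""
    lines = ["Write alt text for the attached image using this context:"]
    for attr, label in FIELDS:
        # Collapse line breaks and repeated spaces left over from PDF extraction.
        value = " ".join(getattr(ctx, attr).split())
        if not value:
            continue
        if len(value) > max_chars_per_field:
            value = value[:max_chars_per_field].rstrip() + "..."
        lines.append(f"{label}: {value}")
    return "\n".join(lines)
```

Capping each field keeps one long part, such as the table of contents, from crowding out the caption. The right limit would depend on the model being used.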

Inputs for generating contextually aware alternate text for images in a PDF according to the client's requirements:

:one: Document context

Chapter heading

Section purpose

Document type (manual, book, technical guide)

:two: Local context

Caption (highest priority)

Nearby explanatory text

Figure references
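
A possible way to collect the local context, assuming the page text is already split into paragraphs: keep the caption first (highest priority), then add paragraphs that cite the same figure number. The regular expression and function names are hypothetical, and the pattern only covers labels such as "Figure 2" or "Fig. 2.1".

```python
import re

# Matches "Figure 2", "figure 2.1", "Fig. 3", "Fig 4" and captures the number.
FIGURE_LABEL = re.compile(r"\b(?:Figure|Fig\.?)\s*(\d+(?:\.\d+)*)", re.IGNORECASE)


def figure_number(caption):
    """Return the figure number found in the caption, or None."""
    match = FIGURE_LABEL.search(caption)
    return match.group(1) if match else None


def local_context(caption, paragraphs, max_paragraphs=2):
    """Caption first, then up to max_paragraphs paragraphs citing the same figure."""
    caption = caption.strip()
    parts = [caption] if caption else []
    number = figure_number(caption)
    if number is None:
        return parts
    cited = [
        p.strip()
        for p in paragraphs
        if p.strip() != caption and number in FIGURE_LABEL.findall(p)
    ]
    return parts + cited[:max_paragraphs]
```

Comparing whole figure numbers rather than substrings keeps a reference to "Figure 12" from being attached to the image captioned "Figure 2".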

:three: Functional intent

Why the image exists

What the reader is supposed to learn

:four: Client constraints

Sentence length limits

Terminology restrictions

Verb usage rules

Prohibited assumptions

Compliance tone (neutral, instructional, etc.)
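
Some of the client constraints can be checked automatically before a human sees the alt text. The sketch below covers sentence length, terminology restrictions, and prohibited assumptions. The rule names and default values are placeholders I made up and would be replaced by the client's real rules. Verb usage and tone are not covered here.

```python
import re
from dataclasses import dataclass


@dataclass
class ClientRules:
    """Placeholder rules; the real values come from the client's style guide."""
    max_words_per_sentence: int = 25
    banned_terms: tuple = ("image of", "picture of")
    speculative_words: tuple = ("probably", "seems", "appears to")


def _contains(phrase, lowered_text):
    """Whole-word, case-insensitive match on already lowercased text."""
    return re.search(r"\b" + re.escape(phrase.lower()) + r"\b", lowered_text) is not None


def check_alt_text(text, rules):
    """Return a list of rule violations; an empty list means the text passes."""
    problems = []
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    for index, sentence in enumerate(sentences, start=1):
        count = len(sentence.split())
        if count > rules.max_words_per_sentence:
            problems.append(
                f"sentence {index} has {count} words "
                f"(limit {rules.max_words_per_sentence})"
            )
    lowered = text.lower()
    for term in rules.banned_terms:
        if _contains(term, lowered):
            problems.append(f"banned term: {term}")
    for word in rules.speculative_words:
        if _contains(word, lowered):
            problems.append(f"speculative wording: {word}")
    return problems
```

For example, `check_alt_text("Image of a pump. It probably leaks.", ClientRules())` reports the banned term and the speculative word, which could be shown to the reviewer next to the approve / reject / update options.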

Thanks and Regards