AgentHack 2025 - SpeakUP AI

AgentHack submission type

Agentic UI Automation

Name

Asude SENOZLULER

Team name

ABII

Team members

@Berat_Berber, @IremHasretVBM, @ismet.onat

How many agents do you use

Multiple agents

Industry category in which the use case would best fit (select up to 2 industries)

HR
Other sector

Complexity level

Advanced

Summary (abstract)

This agent automatically transforms videos into multilingual, accessible content by adding translated subtitles and voiceovers. It uses UiPath, AI tools, and FFmpeg to handle the entire process — from audio extraction to final video delivery — saving time and improving accessibility across languages.

Detailed problem statement

We aim to solve the accessibility, language barrier, and manual effort challenges that limit the reach and effectiveness of educational or instructional video content.

Lack of accessibility for hearing-impaired users
→ By adding accurate subtitles and translated voiceovers, we ensure all users can access and understand the content, regardless of their hearing ability.

Language limitations for non-native speakers
→ Many training videos are only available in one language. Our automation translates both text and voice, allowing users to learn in their preferred language.

Manual and time-consuming localization processes
→ Subtitling and dubbing videos manually requires multiple roles: transcribers, translators, voice artists, editors. Our agent automates this entire pipeline, saving time and cost.

Inaccessible content in sound-restricted environments
→ With subtitles, users can consume content even when they can’t use audio (e.g., in public or noisy places).

Limited scalability of traditional workflows
→ Human-led video localization isn’t scalable for organizations with large video libraries. Our agent can process multiple videos quickly, making localization truly scalable.

Detailed solution

1-The user uploads a video or places it in a watched folder. This initiates the automation.
2-The agent uses FFmpeg to extract the audio from the video in .wav format, optimized for speech processing.
3-The audio is processed by OpenAI Whisper (or Whisper.cpp for offline use), generating a highly accurate transcript with timestamps.
4-The transcript is automatically translated into the user’s preferred language using Google Translate API or Azure Translator.
5-The translated text is converted into natural-sounding speech using Text-to-Speech (TTS) services like Google TTS or Azure TTS.
6-Using FFmpeg, the agent replaces the original audio with the newly generated voiceover and outputs a final .mp4 file ready for use.
7-The localized video is saved to the desired location (e.g., folder, Teams, SharePoint). Temporary files are deleted to keep the system clean.
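
As a rough illustration of the two FFmpeg stages described above (audio extraction and audio replacement), here is a minimal Python sketch that only builds the command lines. The function names, the 16 kHz mono settings, and the AAC output codec are assumptions made for this example; the post does not state which FFmpeg options the agent actually uses.

```python
from pathlib import Path


def extract_audio_cmd(video: Path, wav_out: Path) -> list[str]:
    """Build an FFmpeg call that turns a video into a speech-friendly .wav."""
    return [
        "ffmpeg", "-y",           # overwrite the output if it already exists
        "-i", str(video),
        "-vn",                    # drop the video stream, keep audio only
        "-acodec", "pcm_s16le",   # uncompressed 16-bit PCM
        "-ar", "16000",           # 16 kHz sample rate (assumed, common for speech models)
        "-ac", "1",               # mono (assumed)
        str(wav_out),
    ]


def replace_audio_cmd(video: Path, voiceover: Path, mp4_out: Path) -> list[str]:
    """Build an FFmpeg call that swaps the original audio for the voiceover."""
    return [
        "ffmpeg", "-y",
        "-i", str(video),         # input 0: original video
        "-i", str(voiceover),     # input 1: generated voiceover
        "-map", "0:v:0",          # take the video stream from input 0
        "-map", "1:a:0",          # take the audio stream from input 1
        "-c:v", "copy",           # do not re-encode the picture
        "-c:a", "aac",            # encode the voiceover as AAC (assumed)
        "-shortest",              # stop at the shorter of the two streams
        str(mp4_out),
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine where FFmpeg is installed. Keeping the commands as plain lists avoids shell quoting problems with file names that contain spaces.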

This solution transforms videos into fully localized content with translated subtitles and voiceover, produced in minutes and without human intervention. It helps organizations and individuals scale learning, reach global audiences, and provide inclusive access efficiently.

Demo Video

Expected impact of this automation

This solution drastically reduces video localization time from hours or days to just minutes per video and supports parallel processing of multiple videos without human delays.

It eliminates the need for external transcription, translation, and voiceover services, cutting costs by an estimated 80–90%. The automation replaces slow manual workflows, providing rapid ROI and scaling without extra staff.

Tasks such as audio extraction, speech-to-text, translation, voiceover, subtitle formatting, and video editing are fully automated, freeing teams to focus on content creation and quality control.

It improves accessibility by supporting hearing-impaired users and multilingual delivery, helping companies meet accessibility and diversity goals.

The process ensures consistent quality across large volumes of videos, reducing reliance on subjective editing.

UiPath products used (select up to 4 items)

UiPath AI Center
UiPath Agent Builder
UiPath Orchestrator
UiPath Studio

Automation Applications

Integration with external technologies

OpenAI, FFmpeg

Agentic solution architecture (file size up to 4 MB)

Sample inputs and outputs for solution execution

Inputs: videoFile, fileName, action*, targetlanguage, voiceselection
Outputs: status, message, nextStep, audiofile, detectedLanguage, plainTextTranscript, srtTranscript, plainTextTranslation, srtTranslation, ttsAudioFile
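
To make the relationship between `plainTextTranscript` and `srtTranscript` more concrete, the sketch below shows one way timestamped segments could be rendered in both forms. The `Segment` class and the helper names are hypothetical and exist only for this example; the post does not describe the agent's internal data model.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timestamped piece of speech (hypothetical structure)."""
    start: float  # seconds from the beginning of the video
    end: float    # seconds from the beginning of the video
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in .srt files."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)


def to_plain_text(segments: list[Segment]) -> str:
    """Join the segment texts into a single transcript without timing."""
    return " ".join(seg.text.strip() for seg in segments)
```

The same two helpers would apply to the translated segments, producing `srtTranslation` and `plainTextTranslation`, provided the translation step keeps the original timestamps.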

Other resources
