INSANE Codex CLI MCP AI Video Automation Workflow

The video demonstrates how to use MCP servers within Codex to automate the creation of dynamic avatar videos by processing a single image and segmented audio, incorporating tools like Nano Banana and the Omni model for varied camera angles and video generation. It also showcases integrating external data via the Reddit MCP server to generate content-rich voiceovers, highlighting the workflow’s flexibility and encouraging viewers to explore MCP automation for streamlined video production.

In this video, the creator explores setting up and using MCP (Model Context Protocol) servers within Codex to automate the creation of avatar videos from a single image and an audio file. This is their first experiment with MCP servers in Codex, having previously tested similar workflows in Claude Code. The video demonstrates how to generate dynamic avatar videos by splitting audio into segments and creating a different camera angle for each segment using various MCP tools, including the Nano Banana server for image edits and ByteDance's Omni model for video generation.
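For readers who want to try something similar, Codex CLI reads MCP server definitions from its `~/.codex/config.toml`. The sketch below shows the general shape; the server names, commands, and package names are illustrative placeholders rather than the creator's actual configuration:

```toml
# ~/.codex/config.toml — one table per MCP server Codex should launch.
# Server names and launch commands below are placeholders; substitute
# whatever command starts the server you are registering.
[mcp_servers.nano-banana]
command = "npx"
args = ["-y", "nano-banana-mcp"]

[mcp_servers.elevenlabs]
command = "uvx"
args = ["elevenlabs-mcp"]
env = { ELEVENLABS_API_KEY = "your-key-here" }
```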

The workflow begins with the user providing an image and an audio file, or alternatively generating the audio with the ElevenLabs MCP server. The audio is split into 5-second chunks using ffmpeg, allowing the system to create multiple video segments with varying camera angles and perspectives. These segments are then merged with background music to produce an immersive, engaging avatar video. All MCP servers used in the workflow are available on the creator's GitHub repository for members, complete with usage instructions.
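A minimal sketch of the two ffmpeg steps is below, assuming ffmpeg is on the PATH; all file names are placeholders, not taken from the video:

```python
import subprocess
from pathlib import Path

def split_audio(src: str, chunk_seconds: int = 5) -> None:
    """Cut the voiceover into fixed-length chunks with ffmpeg's segment muxer."""
    # "-c copy" avoids re-encoding; for speech audio like MP3 the cut points
    # land close enough to the 5-second marks for this use case.
    subprocess.run(
        ["ffmpeg", "-y", "-i", src,
         "-f", "segment", "-segment_time", str(chunk_seconds),
         "-c", "copy", "chunk_%03d.mp3"],
        check=True,
    )

def merge_with_music(clips: list[str], music: str, out: str) -> None:
    """Concatenate per-segment clips, then mix background music under the voice."""
    # The concat demuxer reads a text file listing one clip per line.
    list_file = Path("clips.txt")
    list_file.write_text("".join(f"file '{c}'\n" for c in clips))
    subprocess.run(
        ["ffmpeg", "-y",
         "-f", "concat", "-safe", "0", "-i", str(list_file),
         "-i", music,
         # Duck the music to 20% volume, then mix it with the clips' audio.
         "-filter_complex",
         "[1:a]volume=0.2[bg];[0:a][bg]amix=duration=first[mix]",
         "-map", "0:v", "-map", "[mix]", "-c:v", "copy", out],
        check=True,
    )
```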

After successfully recreating the initial avatar video workflow in Codex, the creator experiments with a new approach by integrating the Reddit MCP server. This server fetches top posts from the Singularity subreddit, which are then turned into a voiceover script and narrated with the ElevenLabs MCP server. The resulting script drives a longer, more content-rich avatar video, demonstrating the flexibility and power of chaining multiple MCP servers to automate content creation from external data sources.
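Conceptually, the Reddit step amounts to something like the sketch below. It calls Reddit's public JSON API directly rather than going through the actual MCP server, and the function name is hypothetical:

```python
import requests

def top_posts_script(subreddit: str = "singularity", limit: int = 5) -> str:
    """Fetch today's top post titles and join them into a draft voiceover script."""
    resp = requests.get(
        f"https://www.reddit.com/r/{subreddit}/top.json",
        params={"limit": limit, "t": "day"},
        headers={"User-Agent": "mcp-demo/0.1"},  # Reddit rejects default UAs
        timeout=10,
    )
    resp.raise_for_status()
    posts = resp.json()["data"]["children"]
    # One sentence per post title; in the workflow, text like this would be
    # handed to the ElevenLabs MCP server for text-to-speech.
    return " ".join(p["data"]["title"].rstrip(".") + "." for p in posts)
```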

The creator walks through generating a new voiceover script, creating the corresponding audio, and pairing it with a new image to produce a talking-head video suitable for social media platforms like TikTok or YouTube Shorts. Despite some minor tool-call errors along the way, the overall workflow performs well, producing a polished video with varied camera angles and synchronized audio. The creator notes that while Codex performs admirably, Claude Code still offers slightly better integration with MCP servers, though Codex's ability to follow complex instructions is impressive.

In conclusion, the video serves as an inspiring demonstration of how MCP servers can be leveraged within Codex to automate the creation of dynamic avatar videos from minimal inputs. The creator encourages viewers to explore MCP workflows themselves and highlights the potential for further enhancements and experimentation. They also invite interested viewers to become members to access the shared MCP servers and resources, promising ongoing development and future content on this topic. Overall, the experiment showcases the exciting possibilities of combining AI tools, modular servers, and automation to streamline video content production.