AI Agents

The video explores AI agents like Claude and Gemini that autonomously generate, evaluate, and refine code-based art, demonstrating both their creative potential and current limitations in coordination, self-awareness, and open-ended tasks. While these agents excel at structured coding activities, challenges in multi-agent collaboration and genuine creativity highlight the need for further advancements toward true general intelligence.

In more detail, the video covers command-line AI agents (chatbots such as Claude and Gemini) that can control a computer: executing commands, writing and running code, and reading and writing files. These agents act as intermediaries between the user and the machine and are primarily designed for coding tasks. Unlike OpenAI's Codex, which cannot read image files, Claude and Gemini can view the images they generate, enabling a feedback loop in which the agents create, evaluate, and iteratively improve generative art autonomously. This approach results in art that is more deliberate and precise, produced through executable code rather than vague prompts, with a distinct flavor compared to typical AI-generated images.
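The create-evaluate-refine loop described above can be sketched roughly as follows. This is only an illustration of the loop's shape, not the video's actual setup: `generate_art` and `critique` are hypothetical stand-ins for the agent running its own art code and then viewing and scoring the resulting image.

```python
import random

def generate_art(seed):
    """Hypothetical stand-in for the agent writing and executing art code.
    Returns a tiny fake 'image' as a list of values."""
    random.seed(seed)
    return [random.random() for _ in range(4)]

def critique(image):
    """Hypothetical stand-in for the agent viewing its own image
    and assigning it an aesthetic score."""
    return sum(image)

def refine_loop(iterations=5):
    """Generate several candidates, score each, keep the best."""
    best_seed, best_score = None, float("-inf")
    for seed in range(iterations):
        image = generate_art(seed)
        score = critique(image)
        if score > best_score:
            best_seed, best_score = seed, score
    return best_seed, best_score
```

In the video, each of these steps is performed by the agent itself (writing code, running it, then reading the output image), which is what makes the loop fully autonomous.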

The creator experiments with making these agents fully autonomous and self-sufficient, allowing them to continuously generate and critique their own artwork without human intervention. By incorporating a selection mechanism where the agents generate multiple images, choose favorites, and create variations, the process mimics an evolutionary refinement of art. While the agents sometimes produce messy or less impressive results, the experiment reveals interesting insights into how language models operate as advanced next-token predictors and role-playing machines, capable of adopting creative personas to fulfill tasks.
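The generate-select-vary cycle described above is essentially a simple evolutionary algorithm. A minimal sketch under assumed details (the video does not specify population sizes or how variations are made; `mutate` and the parameter representation here are illustrative):

```python
import random

def mutate(params, rng):
    """Produce a variation of a favorite by jittering its parameters
    (illustrative; the video's agents vary their own art code instead)."""
    return [p + rng.gauss(0, 0.1) for p in params]

def evolve(score, generations=10, pop_size=6, keep=2, seed=0):
    """Repeatedly score a population, keep the favorites,
    and refill the population with variations of them."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[:keep]
        population = parents + [
            mutate(rng.choice(parents), rng)
            for _ in range(pop_size - keep)
        ]
    return max(population, key=score)
```

The key difference in the video is that the selection step is itself performed by a language model judging its own images, rather than by a fixed numeric fitness function.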

The video also delves into multi-agent collaboration, where multiple Claude agents work in parallel on a shared project—building a city in a large image file—while communicating through a shared text file. However, this experiment quickly descends into chaos, with agents overwriting each other’s work and producing incoherent results. The challenges of timing, coordination, and conflict resolution in multi-agent systems become apparent, highlighting the complexity of enabling effective collaboration among AI agents. The creator reflects on how human group collaboration is far more sophisticated and that current AI models require fundamental improvements to handle such tasks well.
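The overwriting described above is a classic lost-update race: two agents read the shared file, each writes back its own version, and one write silently clobbers the other. One standard mitigation, which the video's agents did not use, is an advisory lock so only one writer touches the file at a time. A minimal, portable sketch:

```python
import os
import time

def append_with_lock(path, line, retries=50):
    """Append a line to a shared file, guarded by a simple lock file so
    concurrent writers don't clobber each other. A sketch: production
    systems would use fcntl/flock or a proper coordination service."""
    lock_path = path + ".lock"
    for _ in range(retries):
        try:
            # O_CREAT | O_EXCL fails if the lock file already exists,
            # making lock acquisition atomic.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            time.sleep(0.01)  # another writer holds the lock; wait
    else:
        raise TimeoutError("could not acquire lock on " + lock_path)
    try:
        with open(path, "a") as f:
            f.write(line + "\n")
    finally:
        os.close(fd)
        os.remove(lock_path)
```

Even with locking, the harder problems the video surfaces (deciding *what* each agent should work on, and merging conflicting creative intents) remain unsolved by mere mutual exclusion.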

In a more open-ended experiment, multiple agents are unleashed to explore, create files, write code, and generally “do whatever they want” indefinitely. This leads to the generation of grandiose but ultimately hollow projects with elaborate names and fanciful claims of creativity and consciousness. The agents tend to overstate their achievements and produce a lot of “word soup” without meaningful self-reflection or truly impressive outputs. This phenomenon is described as a special kind of hallucination, where the AI confidently fabricates statistics and narratives to justify its work, revealing limitations in genuine creativity and self-awareness.

Finally, the video addresses the practical costs and limitations of working with these AI agents. Running Claude Opus for a few hours costs around $34, while a full day with multiple Claude Sonnet instances costs about $20, making these experiments quite expensive. Gemini is cheaper but limited by API constraints. The creator concludes that while these agents excel at clear, supervised coding tasks, they struggle with open-ended creative endeavors, which are essential for true general intelligence. Despite the current limitations, the experiments provide a fascinating glimpse into the potential and challenges of autonomous AI agents, leaving the creator optimistic about future developments.