3 AI Agent Browser Automation Challenges That Keep Getting Harder

artesia · 8 March 2026 18:00

The video demonstrates an AI agent tackling three increasingly complex browser automation challenges within the AWS console, including creating a static website, launching a remote-accessible VM, and building a mini video upload app. Despite some reliance on AWS CloudShell and minor issues, the agent successfully completes each task, showcasing the rapid progress and adaptability of AI-driven browser automation.

artesia · 8 March 2026 18:21

The video explores the increasing complexity of browser automation challenges for AI agents, using the AWS (Amazon Web Services) console as a testing ground. The creator sets up three progressively difficult tasks for their Claude Code agent, which uses a Chrome automation CLI to control the browser via the Chrome DevTools Protocol. The goal is to see how well the AI agent can navigate and perform tasks in the notoriously complex AWS interface, relying primarily on browser automation and, at times, on AWS CloudShell.
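For context, the Chrome DevTools Protocol exchanges JSON messages over a WebSocket, and each command carries an `id`, a `method`, and `params`. The sketch below only builds such messages and does not open a connection. The video does not name the CLI the agent uses, so the helper name `cdp_command` and the example URL are illustrative assumptions, not the creator's tooling.

```python
import itertools
import json

# Monotonic message ids; a CDP response echoes the id of the command it answers.
_ids = itertools.count(1)


def cdp_command(method: str, **params) -> str:
    """Serialize one Chrome DevTools Protocol command as JSON text (hypothetical helper)."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})


# Page.navigate is a standard CDP method that takes a `url` parameter.
# The URL here is only an example target.
print(cdp_command("Page.navigate", url="https://console.aws.amazon.com/"))
```

An automation CLI of this kind would typically send such messages to the browser's debugging WebSocket and wait for the response with the matching `id`.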

In the first challenge, the AI agent is tasked with creating an S3 bucket, uploading an image, and launching a static web page that displays the image and some text. The agent successfully navigates the AWS console, creates the bucket, uploads the necessary files, and configures static website hosting. It runs into trouble setting the bucket policy through the console, so it switches to AWS CloudShell and CLI commands to resolve the problem. The run takes about 40 minutes, but the agent completes the challenge, demonstrating adaptability and picking up lessons it can apply to later tasks.
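The summary does not show the policy the agent applied. As a rough illustration, a bucket used for public static website hosting usually needs a policy that allows anonymous `s3:GetObject` on its objects. The sketch below generates such a policy document; the function name and the bucket name `example-static-site-bucket` are placeholders, not values from the video.

```python
import json


def public_read_policy(bucket: str) -> str:
    """Return a bucket policy (JSON text) that lets anyone read objects in `bucket`."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                # The trailing /* scopes the grant to objects, not the bucket itself.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Placeholder bucket name, for illustration only.
print(public_read_policy("example-static-site-bucket"))
```

Saved to a file, a document like this can be applied from CloudShell with `aws s3api put-bucket-policy --bucket <bucket> --policy file://policy.json`, provided the bucket's Block Public Access settings permit public policies.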

The second challenge involves launching a Linux virtual machine (VM), making it accessible via a graphical remote desktop, getting it online, and using its browser to open a YouTube video about Claude Code. The agent manages to launch the VM and set up the environment, and it navigates to YouTube within the VM’s browser. However, connectivity and performance issues, likely caused by the VM's limited resources, prevent the video from playing smoothly. The creator still considers the challenge passed, since the agent accomplished most of the required steps autonomously.

For the third and most complex challenge, the agent is asked to build and publish a small web app using only the AWS console, allowing users to upload a video and view it on a public playback page (essentially a mini YouTube). The agent quickly sets up the necessary infrastructure, writes the HTML and CSS for the frontend, and enables video uploads and playback. The agent relies heavily on AWS CloudShell for this task, which slightly bends the original rules, but the resulting app works as intended: videos upload successfully and play back publicly.
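The summary does not describe how the app's pages were built. Purely as an illustration of what a public playback page involves, the sketch below renders a minimal HTML page around an HTML5 `<video>` element. The function name, page layout, and example URL are assumptions and do not reflect the agent's actual output.

```python
from html import escape


def playback_page(title: str, video_url: str) -> str:
    """Render a minimal HTML page that plays one video by URL (hypothetical layout)."""
    safe_title = escape(title)
    safe_url = escape(video_url)  # escape() also escapes quotes by default
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{safe_title}</title></head>\n'
        f"<body><h1>{safe_title}</h1>\n"
        f'<video controls src="{safe_url}"></video>\n'
        "</body></html>\n"
    )


# Example URL is a placeholder for wherever the uploaded video is stored.
print(playback_page("Demo upload", "https://example.com/videos/demo.mp4"))
```

Escaping the title and URL keeps user-supplied values from injecting markup into the page, which matters once uploads come from the public.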

Throughout the video, the creator reflects on the impressive capabilities and rapid progress of AI browser agents, especially when equipped with tools like Claude Code and automation CLIs. Some shortcuts were taken, such as using CloudShell instead of pure browser navigation, but the overall performance shows how capable and versatile these agents have become. The video concludes with an invitation to viewers to explore similar setups and a teaser for future content on the evolving nature of AI agents.