I Tested 4 AI Agents That Promise to Do My Work. The Best Scored 1 Out of 3

The video evaluates four AI agents that promise autonomous work completion and finds that, while tools like Cowork show potential with memory and editable outputs, none fully achieves the persistent memory, editable artifacts, and compounding context needed for truly effective AI agents. It lays out a three-layer architecture (knowledge storage, agent workflows, and continuous learning) and encourages a critical approach to AI tools, arguing that these foundational principles will shape future development and adoption.

The video explores the emerging trend of AI agents that promise to autonomously complete outcome-focused work, letting users "sit back and have coffee." Despite the hype around tools like Anthropic's Cowork, Microsoft's Copilot, Lindy, Sauna, and Google Opal, most of these agents struggle with the core challenges: maintaining persistent memory, producing editable artifacts, and compounding context over time. The speaker notes that while Cowork caused significant market disruption by threatening traditional SaaS companies, it still falls short on persistent memory and continuous context improvement, underscoring how difficult it is to build a truly effective AI agent.

The speaker introduces three critical questions for evaluating any AI agent: Does it have persistent memory? Does it produce tangible, editable artifacts? Does its architecture allow context to compound over time? Applying this framework to Cowork yields mixed results: it has some memory capabilities and excels at producing editable artifacts, especially in Excel, but cannot improve its context across sessions. That partial fulfillment explains why Cowork, despite its flaws, has gained significant adoption and excitement in the market.
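The three-question rubric can be sketched as a small data structure. This is an illustrative sketch in Python; the class name, fields, and example values are invented here, since the video only states the questions themselves:

```python
from dataclasses import dataclass

@dataclass
class AgentEvaluation:
    """Hypothetical encoding of the video's three evaluation questions."""
    name: str
    persistent_memory: bool    # does memory survive across sessions?
    editable_artifacts: bool   # does it produce outputs the user can edit?
    compounding_context: bool  # does context improve with continued use?

    def score(self) -> int:
        """One point per criterion met, out of 3."""
        return sum([self.persistent_memory,
                    self.editable_artifacts,
                    self.compounding_context])

# Illustrative values only, not the video's actual verdicts.
example = AgentEvaluation("SomeAgent", True, False, False)
print(f"{example.name}: {example.score()}/3")  # SomeAgent: 1/3
```

Scoring each agent against the same three booleans is what makes the comparison in the review consistent from tool to tool.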

Four other AI agents are reviewed against these criteria. Lindy, aimed at busy executives, offers a simple natural language interface but struggles with transparency and artifact production, leading to mixed user reviews. Sauna, formerly Wordware, is praised for its strong memory architecture and ambition to build a workspace that compounds context, though it remains early-stage and demo-heavy. Google Opal is a free, lightweight workflow builder with some memory features but limited artifact production and durability, raising concerns about Google’s history of abandoning projects. Lastly, Obvious is the most ambitious, offering a full AI workspace with interconnected artifacts, but it is very new and unproven.

The video stresses that a successful AI agent must treat memory as a foundational element, produce editable artifacts, and let context build over time. These principles map onto a three-layer architecture: a knowledge store for memory, agent recipes for pre-wired workflows, and a scheduling loop that enables continuous learning and improvement. The speaker points to the Open Brain project as an example of building such infrastructure affordably, encouraging viewers to consider building their own agents rather than relying solely on expensive commercial options.
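A hedged sketch of how those three layers might fit together, in Python. Every class, function, and interface here is invented for illustration; the video describes the layers only conceptually:

```python
from typing import Callable

class KnowledgeStore:
    """Layer 1: persistent memory that each run reads from and writes to."""
    def __init__(self) -> None:
        self.facts: list[str] = []

    def recall(self) -> list[str]:
        return list(self.facts)

    def remember(self, fact: str) -> None:
        self.facts.append(fact)

# Layer 2: an "agent recipe" is a pre-wired workflow that takes the
# accumulated context and returns (result, new learnings).
Recipe = Callable[[list[str]], tuple[str, list[str]]]

def run_schedule(store: KnowledgeStore, recipe: Recipe, runs: int) -> list[str]:
    """Layer 3: a scheduling loop; each run feeds its learnings back into
    the store, so later runs start with more context than earlier ones."""
    results = []
    for _ in range(runs):
        result, learnings = recipe(store.recall())
        for fact in learnings:
            store.remember(fact)
        results.append(result)
    return results

# Toy recipe: each run's output reflects how much context has accumulated.
def summarize(context: list[str]) -> tuple[str, list[str]]:
    return f"report using {len(context)} facts", [f"fact-{len(context)}"]

store = KnowledgeStore()
print(run_schedule(store, summarize, 3))
# ['report using 0 facts', 'report using 1 facts', 'report using 2 facts']
```

The point of the sketch is the feedback edge: because the loop writes learnings back into the store, context compounds across runs, which is exactly the property the review finds missing in most of the tested agents.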

In conclusion, the video urges viewers to approach AI agents with a critical mindset, focusing on these three foundational questions to avoid being misled by hype and demos. While the current crop of AI agents shows promise, none fully meet the criteria for dependable, outcome-focused work. The future of AI agents lies in persistent memory, editable artifacts, and compounding context, and understanding these principles is crucial whether you choose to build your own or adopt existing solutions. The speaker invites feedback and experimentation with the reviewed tools to better understand their practical value.