Generalizable LLM solution generator with tree search, self-reflection, and two-stage evaluation

artesia · 13 September 2025 05:30

The video introduces a generalizable in-context goal solver that leverages Monte Carlo Tree Search, self-reflection, and a two-stage rubric evaluation to generate and iteratively refine high-quality solutions across diverse tasks. By combining broad initial filtering with detailed scoring, along with strategies to avoid stagnation and promote creativity, the system effectively produces varied and practical outputs such as SaaS ideas and home exercise plans, demonstrating promising results and inviting community collaboration.

artesia · 13 September 2025 05:52

The video presents a generalizable in-context goal solver system designed to generate solutions for a wide range of tasks using a combination of Monte Carlo Tree Search (MCTS), self-reflection, and a two-stage evaluation process. The system takes a simple goal as input and produces high-quality results by exploring multiple solution paths and iteratively refining them. A key innovation is the two-stage rubric evaluation: stage one uses broader, coarser rubrics to filter out weaker solutions, while stage two applies more granular rubrics to fine-tune and score the best candidates. This approach helps overcome the challenge of scoring diverse and creative outputs, such as idea generation, where uniform scoring often limits model performance.
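The two-stage gating described above can be sketched roughly as follows. This is a minimal illustration, not the creator's code: the `Rubric` class, the function names, and the toy keyword scorers are all hypothetical, and in a real system each scorer would presumably be an LLM judge call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, TypedDict


@dataclass
class Rubric:
    """One scoring criterion (hypothetical structure for illustration)."""
    name: str
    max_points: float
    scorer: Callable[[str], float]  # stand-in for an LLM judge call


class Result(TypedDict):
    stage_one: float
    stage_two: Optional[float]
    total: float
    passed: bool


def score_stage(solution: str, rubrics: List[Rubric]) -> float:
    """Sum rubric scores, clamping each to [0, max_points]."""
    return sum(min(max(r.scorer(solution), 0.0), r.max_points) for r in rubrics)


def two_stage_score(solution: str, coarse: List[Rubric], fine: List[Rubric],
                    stage_one_min: float) -> Result:
    """Apply coarse rubrics first; only passing solutions get the fine rubrics."""
    s1 = score_stage(solution, coarse)
    if s1 < stage_one_min:
        # Filtered out: the detailed (more expensive) rubrics are never run.
        return {"stage_one": s1, "stage_two": None, "total": s1, "passed": False}
    s2 = score_stage(solution, fine)
    return {"stage_one": s1, "stage_two": s2, "total": s1 + s2, "passed": True}


# Toy rubrics: few, heavily weighted coarse criteria; many small fine criteria.
coarse = [
    Rubric("on_topic", 35, lambda s: 35 if "saas" in s.lower() else 5),
    Rubric("specific", 35, lambda s: 30 if len(s.split()) >= 5 else 10),
]
fine = [
    Rubric("names_audience", 10, lambda s: 10 if "for" in s.lower().split() else 0),
    Rubric("mentions_pricing", 10, lambda s: 10 if "pricing" in s.lower() else 0),
]

weak = two_stage_score("a vague idea", coarse, fine, stage_one_min=50)
strong = two_stage_score("A SaaS tool for urban gardeners with monthly pricing",
                         coarse, fine, stage_one_min=50)
```

Here `weak` stops at stage one, while `strong` clears the threshold and is scored on the finer rubrics as well. The threshold of 50 is an arbitrary placeholder.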

Stage one rubrics allocate 70 points across a small number of coarse criteria, each worth several points, to quickly assess the general quality of a solution. Only solutions that reach a minimum stage-one score proceed to stage two, where 30 additional points are distributed in smaller increments across many more detailed rubrics, for a maximum of 100 points overall. This two-stage scoring encourages both diversity and depth in solution evaluation. Solutions are generated with Monte Carlo Tree Search, which exposes adjustable parameters such as the number of nodes explored, the branching factor, and the exploration-exploitation balance, allowing the search to navigate the solution space efficiently.
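The summary does not say which selection rule the search uses. A common choice in MCTS is the UCT (UCB1) formula, where a single constant controls the exploration-exploitation balance. The sketch below assumes that rule; the `Node` class and the default constant of 1.4 are illustrative choices, not details from the video.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A candidate solution in the search tree (hypothetical structure)."""
    solution: str
    visits: int = 0
    total_score: float = 0.0  # sum of scores normalized to [0, 1]
    children: List["Node"] = field(default_factory=list)


def uct(child: Node, parent_visits: int, c: float) -> float:
    """UCB1 value: mean score plus an exploration bonus scaled by c."""
    if child.visits == 0:
        return math.inf  # always try unvisited children first
    mean = child.total_score / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_child(parent: Node, c: float = 1.4) -> Node:
    """Pick the child with the highest UCT value."""
    return max(parent.children, key=lambda ch: uct(ch, parent.visits, c))


# A well-explored strong child versus a rarely visited weaker one.
a = Node("idea A", visits=8, total_score=6.4)  # mean 0.8
b = Node("idea B", visits=2, total_score=1.0)  # mean 0.5
root = Node("goal", visits=10, children=[a, b])

greedy_pick = select_child(root, c=0.0)    # pure exploitation
explore_pick = select_child(root, c=1.4)   # exploration bonus favors B
```

With `c=0.0` the search always follows the best-scoring branch; raising `c` shifts visits toward under-explored branches, which is one way to read the "exploration-exploitation balance" parameter.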

The system also incorporates mechanisms to prevent stagnation during the search process. It periodically reflects on its progress, generating plans for future exploration, and uses prompt-based jolts to escape local optima. Hybridization techniques combine the best solutions at intervals to create improved candidates. These strategies collectively enhance the system’s ability to explore novel ideas and improve solution quality over time. The video demonstrates the system’s effectiveness with examples such as generating unique SaaS ideas using large language models and designing effective five-minute home exercise regimens.
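One way to schedule these anti-stagnation steps is sketched below. The intervals, the patience window, and the action names are assumptions chosen for illustration; the summary only says that reflection is periodic, hybridization happens at intervals, and jolts are used to escape local optima.

```python
from typing import List


def next_actions(iteration: int, best_history: List[float],
                 reflect_every: int = 5, hybridize_every: int = 10,
                 patience: int = 3, min_gain: float = 0.5) -> List[str]:
    """Decide which extra steps to trigger at this search iteration.

    best_history holds the best score seen after each iteration so far.
    All defaults are placeholder values, not settings from the video.
    """
    actions: List[str] = []
    if iteration % reflect_every == 0:
        actions.append("reflect")      # review progress, plan next exploration
    if iteration % hybridize_every == 0:
        actions.append("hybridize")    # combine the current best solutions
    if len(best_history) > patience:
        gain = best_history[-1] - best_history[-1 - patience]
        if gain < min_gain:
            actions.append("jolt")     # prompt-based nudge out of a local optimum
    return actions


still_improving = next_actions(5, [60, 70, 75, 75, 75])
stalled = next_actions(6, [60, 70, 75, 75, 75, 75])
milestone = next_actions(10, [60, 61, 62, 63, 64, 65, 66, 67, 68, 69])
```

In this sketch a jolt fires only when the best score has gained less than `min_gain` over the last `patience` iterations, so it stays out of the way while the search is still improving.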

Results from the system show promising diversity and quality, with solutions passing the 90% quality threshold after iterative refinement. For instance, the SaaS idea generator produced varied and creative concepts such as a "ripple choice simulator" for decision-making and a platform for urban gardeners. Similarly, the exercise regimen generator created practical high-intensity interval training plans tailored for home workouts. The video highlights the importance of tuning parameters such as the minimum passing score for stage one and the reflection frequency to optimize performance and avoid premature stagnation.

Finally, the creator invites viewers to explore and improve the system, offering access to the code and additional resources through Patreon. Subscribers gain access to a large collection of language model applications, exclusive content, and consulting opportunities. The video emphasizes that while the system is still a work in progress, it represents a promising approach to automating complex, creative problem-solving tasks using advanced AI techniques like Monte Carlo Tree Search combined with self-reflection and multi-stage evaluation.