Self-improving LLM learns continually (in context) to solve hard math problems

The video showcases a novel approach that lets a large language model self-improve continually within a single session by maintaining message history and using reflective prompting, raising accuracy on a complex math categorization task from around 50% to over 63%. Despite setbacks, including occasional crashes and limited gains from focusing on hard examples, the method marks a breakthrough in in-context continual learning for LLMs, with code and further resources available via the creator’s Patreon.

In this video, the creator demonstrates a breakthrough in enabling a large language model (LLM) to learn continually in context to solve a challenging math problem: categorizing data rows based on complex relationships among five numbers. Initially, the model’s accuracy was around 50%, but through iterative self-improvement over 500+ iterations it climbed by roughly 13–14 percentage points, reaching around 63–64%. This marks a significant advance over previous attempts, where accuracy plateaued at 40–47% despite numerous training runs.

The problem setup involves predicting the correct category for each data row based on mathematical relationships among five variables. The creator uses a Python-based predictor generator that can interface with OpenAI or OpenRouter models. Earlier experiments involved making separate API calls for each iteration, passing distilled knowledge or examples between calls, but these approaches resulted in flat learning curves with little improvement over time. The key innovation was to maintain a continuous message history within a single session, allowing the model to accumulate knowledge and improve progressively.
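The difference between the two setups can be sketched as follows. This is a minimal illustration, not the creator’s actual code: `call_model` and `score` are offline stand-ins for a real chat-completion API (OpenAI or OpenRouter) and for evaluation against the dataset, and the prompts are invented.

```python
# Sketch of in-context continual learning via a single persistent message
# history, versus earlier attempts that reset context on every API call.

def call_model(messages):
    """Placeholder for a chat API call; returns a candidate solution string."""
    n = sum(m["role"] == "assistant" for m in messages) + 1
    return f"candidate solution #{n}"

def score(solution):
    """Placeholder evaluator; a real run would test the candidate
    against the categorization dataset and return its accuracy."""
    return 0.5

messages = [{"role": "system",
             "content": "Improve your classifier for the 5-number categorization task."}]

for iteration in range(3):  # the creator ran 500+ iterations
    messages.append({"role": "user", "content": "Propose an improved solution."})
    solution = call_model(messages)
    # Keep BOTH the model's answer and the scored feedback in the same
    # history, so knowledge accumulates instead of being lost between calls.
    messages.append({"role": "assistant", "content": solution})
    messages.append({"role": "user",
                     "content": f"Accuracy: {score(solution):.2%}. Reflect and improve."})
```

The key point is that `messages` is never reset: every attempt and its feedback remain visible to the model, which is what distinguishes this run from the earlier flat-curve experiments that passed only distilled summaries between independent calls.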

To further enhance learning, the creator introduced a cyclical training approach where the model runs multiple iterations per cycle, and after each cycle, it reflects on its performance and creates a strategic plan for the next cycle. This reflective prompting, combined with passing the best-performing solutions and their errors to subsequent cycles, helped overcome stagnation and led to steady accuracy improvements. The system also tests solutions against a large dataset of 10,000 rows, ensuring robust evaluation of the model’s predictive capabilities.
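The cycle-and-reflect structure described above might look roughly like the following sketch. All names and the iteration counts are illustrative assumptions; in the real system `run_iteration` and `reflect` would be LLM calls, and evaluation would run against the 10,000-row dataset.

```python
# Sketch of cyclical training: several iterations per cycle, then a
# reflection step whose strategic plan (plus the best solution so far and
# its errors) seeds the next cycle.
import random

random.seed(0)  # deterministic stand-in for model behavior

def run_iteration(plan):
    """Placeholder: returns (solution, accuracy) for one attempt under a plan."""
    return f"solution guided by: {plan}", random.random()

def reflect(best_solution, best_accuracy):
    """Placeholder for the reflective prompt: a real run would ask the LLM to
    analyze the best solution's errors and write a plan for the next cycle."""
    return f"refine best solution (acc {best_accuracy:.2%}), target its error cases"

plan = "start from scratch"
best = ("", 0.0)
for cycle in range(3):
    for _ in range(5):  # iterations per cycle
        solution, acc = run_iteration(plan)
        if acc > best[1]:
            best = (solution, acc)
    # Carry the best performer and its analysis forward, not the whole run.
    plan = reflect(*best)
```

Passing forward only the best solution and a reflection on its failures, rather than the full transcript, is what the video credits with breaking through the stagnation seen in earlier runs.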

Additional experiments included identifying unsolved or hard examples within the dataset to guide the model’s learning focus. However, incorporating these hard examples did not significantly affect performance in the short runs conducted. The creator also experimented with a tree search method that pursues multiple solution paths in parallel; it shows promise but needs further development and tuning. Some runs crashed, necessitating reruns and additional refinement.
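Hard-example mining of the kind described above can be sketched in a few lines. The toy data and deliberately weak predictor below are assumptions for illustration only; they stand in for the real 10,000-row dataset and the LLM-generated predictor.

```python
# Sketch of hard-example mining: rows the current best predictor gets wrong
# are collected so later prompts can focus the model on unsolved cases.

# Toy dataset: (five numbers, true category) pairs.
rows = [((i, i + 1, i + 2, i + 3, i + 4), i % 3) for i in range(100)]

def predict(features):
    """Placeholder for the generated predictor; deliberately imperfect."""
    return sum(features) % 2

# The rows the predictor misclassifies are the "hard" examples.
hard_examples = [(x, y) for x, y in rows if predict(x) != y]
# These could be appended to the next cycle's prompt so the model
# concentrates on cases it has not yet cracked.
```

In the video, feeding such examples back did not move the needle much over short runs, but the mechanism itself is this simple: filter the dataset by the current predictor’s failures.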

The video concludes with an invitation to access the code and resources on the creator’s Patreon, where numerous other LLM-related projects are shared. The creator also offers consulting services and weekly meetings for members interested in deepening their understanding or collaborating on similar projects. Overall, this work represents a meaningful step toward enabling LLMs to self-improve continuously in context, particularly for solving complex mathematical problems.