LLM with Tree Search achieves in-context learning and solves hard problems

artesia · 11 September 2025 06:22

The video showcases a novel approach combining a large language model with parallelized Monte Carlo tree search to achieve faster and more accurate in-context learning on complex mathematical categorization tasks, producing interpretable predictor functions. Key innovations include cyclic learning with reflection, exploration-exploitation strategies, and dynamic visualizations, leading to significant performance improvements and plans for broader applications.

artesia · 11 September 2025 06:44

In this video, the creator presents a significant breakthrough in in-context learning using a large language model (LLM) combined with a parallelized tree search algorithm. This approach achieved a 15-fold speedup and a 20% increase in accuracy compared to previous methods. The model reached 72% accuracy on a complex mathematical problem involving categorizing data based on five variables, outperforming earlier results that took much longer to achieve similar accuracy. The system explores multiple solution branches in parallel, generating thousands of predictor functions and refining them iteratively through strategic reflection and cyclic learning.

The dataset used consists of 10,000 rows with five variables (A, B, C, D, E) each ranging from 1 to 100, and an output category from 1 to 4. The challenge lies in discovering the underlying mathematical relationship that maps these variables to their categories. While traditional machine learning models like random forests or XGBoost can achieve over 95% accuracy, they act as black boxes. In contrast, this LLM-based approach produces interpretable predictor functions, offering insights into the possible mathematical relationships, even though some of the generated solutions are complex.
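To make "interpretable predictor function" concrete, here is a minimal sketch of how a candidate could be scored against a dataset of this shape. The video's real data and its hidden rule are not given in this summary, so `hidden_rule`, `candidate_predictor`, and `make_dataset` are all invented for illustration.

```python
import random


def hidden_rule(a, b, c, d, e):
    """Hypothetical labelling rule; the real relationship is unknown."""
    return (a * b + c) % 4 + 1


def make_dataset(n_rows=10_000, seed=0):
    """Build a synthetic stand-in: five integers in 1..100 plus a category in 1..4."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        a, b, c, d, e = (rng.randint(1, 100) for _ in range(5))
        rows.append(((a, b, c, d, e), hidden_rule(a, b, c, d, e)))
    return rows


def candidate_predictor(a, b, c, d, e):
    """Example of a readable guess an LLM might propose (also hypothetical)."""
    return (a + c) % 4 + 1


def accuracy(predictor, rows):
    """Fraction of rows where the predictor returns the correct category."""
    hits = sum(1 for features, label in rows if predictor(*features) == label)
    return hits / len(rows)


if __name__ == "__main__":
    rows = make_dataset()
    print(f"candidate accuracy: {accuracy(candidate_predictor, rows):.3f}")
```

Unlike a random forest, a function like `candidate_predictor` can be read and reasoned about directly, which is the trade-off the post describes.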

A key innovation in this system is the use of parallel Monte Carlo tree search, which explores multiple branches and child nodes simultaneously, guided by an exploration-exploitation strategy based on PUCB (Predictor Upper Confidence Bound). The system also incorporates hybridization, stagnation detection, and periodic reflection: every few iterations the model analyzes its progress and adjusts its strategy accordingly. This cyclic learning approach, which maintains message history and resets it periodically, enables the model to learn from past successes and failures, significantly improving performance over time.
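For readers unfamiliar with PUCB-style selection, the sketch below shows the commonly used form of the score: the child's mean value plus an exploration bonus that shrinks as the child is visited more. This is not taken from the video's codebase. The `Node` class, the constant `c`, and the use of predictor accuracy as the reward are assumptions.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One candidate predictor in the search tree (hypothetical structure)."""
    name: str
    prior: float = 1.0        # prior weight for exploring this node
    visits: int = 0
    value_sum: float = 0.0    # sum of rewards (here: accuracies) seen below this node
    children: list = field(default_factory=list)

    @property
    def mean_value(self):
        return self.value_sum / self.visits if self.visits else 0.0


def pucb_score(parent, child, c=1.0):
    """Exploitation term (mean value) plus a prior-weighted exploration bonus."""
    exploration = c * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.mean_value + exploration


def select_child(parent, c=1.0):
    """Pick the child with the highest PUCB score."""
    return max(parent.children, key=lambda child: pucb_score(parent, child, c))


def record(parent, child, reward):
    """Update visit counts and value after evaluating a child."""
    child.visits += 1
    child.value_sum += reward
    parent.visits += 1
```

A larger `c` favors rarely visited branches, while `c = 0` always picks the branch with the best average so far. A parallel variant could take the top few children by score instead of only the best one.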

The video also highlights the technical details of the implementation, including the use of the Sonoma Sky Alpha model, which is noted for its speed and intelligence. The system dynamically generates real-time HTML visualizations to track the exploration of the solution tree and the accuracy of various predictors. The creator mentions ongoing experiments with alternative methods like GEPA (reflective prompt evolution) and plans to develop a more generalizable system capable of solving a wider range of user queries beyond the current mathematical problem.
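As a rough idea of what an HTML view of the solution tree might involve, the sketch below renders a nested dictionary of predictors as a nested list. The video's actual visualization code is not shown in this summary. The dictionary layout (`name`, `accuracy`, `children`) and both function names are assumptions.

```python
from html import escape


def render_node(node):
    """Render one tree node and its descendants as a nested <li> element."""
    label = f"{escape(node['name'])}: {node['accuracy']:.1%}"
    children = "".join(render_node(child) for child in node.get("children", []))
    nested = f"<ul>{children}</ul>" if children else ""
    return f"<li>{label}{nested}</li>"


def render_page(root):
    """Wrap the rendered tree in a minimal standalone HTML page."""
    return f"<!DOCTYPE html><html><body><ul>{render_node(root)}</ul></body></html>"


if __name__ == "__main__":
    tree = {
        "name": "root",
        "accuracy": 0.50,
        "children": [{"name": "a<b branch", "accuracy": 0.72}],
    }
    with open("tree.html", "w", encoding="utf-8") as fh:
        fh.write(render_page(tree))
```

Rewriting the file after each search iteration and refreshing the browser would give a simple live view. A real-time version would likely need auto-refresh or a small server.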

Finally, the creator invites viewers to access the full codebase and additional resources on Patreon, where they offer hundreds of projects powered by large language models, exclusive videos, weekly meetings, and consulting services. The video concludes with a promise to share further updates on related research and improvements, encouraging interested viewers to join the community for deeper engagement and support.