This experiment could END the AI hype

The video showcases an experiment in which large language models compete by investing real money in NASDAQ stocks. A “mystery model” built on an evolutionary search framework called PROFIT achieved a 12% return and outperformed many competitors. This real-world, high-stakes benchmark highlights the potential of AI-driven trading strategies to surpass traditional methods, though the technology is still at an early stage, carries real risk, and needs transparent, replicable results.

The video explores an experiment in which large language models (LLMs), the technology behind advanced chatbots, are tasked with investing real money ($320,000) in actual stocks traded on the NASDAQ exchange. The models trade shares of companies such as Tesla, Nvidia, Microsoft, Google, Palantir, and Amazon. The experiment is part of a competition called Alpha Arena, which pits LLMs from OpenAI, Google, Anthropic, and other organizations against each other to see which can generate the best returns. The latest season recently concluded, revealing a “mystery model” that achieved a 12% aggregate return, outperforming many other models as well as a simple buy-and-hold Bitcoin strategy.

Alpha Arena’s setup is unique because it uses real capital and real-time trading, making it a highly competitive and transparent benchmark for AI performance in financial markets. The models are given access to news, sentiment data, and market indices, which update every six minutes, and they trade across multiple asset classes. The competition also includes several modes: “monk mode,” which focuses on capital preservation; “situational awareness,” in which models are aware of their competition; and “max leverage,” which tests risk management under maximum capital efficiency. The mystery model excelled particularly in the situational awareness mode, suggesting it can adapt well to dynamic market conditions.

The video delves into the technology and methodology behind the mystery model, which is based on an evolutionary search framework called “Program Search for Financial Trading” (PROFIT). This approach uses LLMs to generate and iteratively improve algorithmic trading strategies by writing and refining Python code. The model evaluates its strategies by backtesting them on historical market data, then improves by selecting the best performers and evolving them further. This recursive self-improvement process is similar to techniques used by Google DeepMind’s AlphaEvolve and Nvidia’s Eureka, which have shown success in other complex domains such as robotics and data center optimization.
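The generate, backtest, select, and evolve loop described above can be illustrated with a minimal sketch. This is not the PROFIT implementation: the price series is synthetic, the strategy is a simple moving-average crossover chosen only for illustration, and the random parameter mutation in the hypothetical `mutate` function stands in for the step where an LLM would rewrite strategy code. All function names and parameters here are assumptions.

```python
import random


def make_prices(n=300, seed=0):
    """Synthetic random-walk prices (an assumption; not real market data)."""
    rng = random.Random(seed)
    prices = [100.0]
    for _ in range(n - 1):
        prices.append(prices[-1] * (1 + rng.gauss(0.0005, 0.01)))
    return prices


def sma(prices, window, t):
    """Simple moving average of the `window` prices ending at index t."""
    return sum(prices[t - window + 1 : t + 1]) / window


def backtest(prices, fast, slow):
    """Total return of a long-only crossover rule: hold the asset for the
    next step whenever the fast average is above the slow average."""
    equity = 1.0
    for t in range(slow - 1, len(prices) - 1):
        if sma(prices, fast, t) > sma(prices, slow, t):
            equity *= prices[t + 1] / prices[t]
    return equity - 1.0


def mutate(params, rng):
    """Stand-in for the LLM rewrite step: randomly nudge the two windows."""
    fast, slow = params
    fast = max(2, fast + rng.randint(-2, 2))
    slow = max(fast + 1, slow + rng.randint(-5, 5))
    return (fast, slow)


def evolve(prices, generations=10, population_size=8, survivors=3, seed=1):
    """Keep the best strategies each generation and mutate them into children."""
    rng = random.Random(seed)
    population = [
        (rng.randint(2, 10), rng.randint(11, 50)) for _ in range(population_size)
    ]
    history = []  # best backtest return seen in each generation
    for _ in range(generations):
        ranked = sorted(population, key=lambda p: backtest(prices, *p), reverse=True)
        elite = ranked[:survivors]
        history.append(backtest(prices, *elite[0]))
        children = [
            mutate(rng.choice(elite), rng)
            for _ in range(population_size - survivors)
        ]
        population = elite + children
    best = max(population, key=lambda p: backtest(prices, *p))
    return best, backtest(prices, *best), history


if __name__ == "__main__":
    best, score, history = evolve(make_prices())
    print("best (fast, slow) windows:", best)
    print("in-sample return:", round(score, 4))
```

Because the top strategies are carried over unchanged, the best score can never fall from one generation to the next. The score is measured on the same data used for selection, so it says nothing about live performance; a realistic setup would hold out later data to check for overfitting.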

The presenter emphasizes the importance of real-world benchmarks such as financial markets for testing AI capabilities: they involve uncertainty and real stakes, and a model cannot cheat, because future prices cannot have appeared in its training data. While many LLMs have struggled to outperform simple strategies or human experts, this new approach shows promise by generating profitable strategies in a majority of experiments. However, the presenter cautions viewers that this is not financial advice and that the technology is still in its early stages, with risks and uncertainties remaining. Transparent, replicable results will be crucial to validate these findings and to guard against potential scams.

Finally, the video discusses the broader implications of this research: as newer and more powerful LLMs become available, integrating them into such evolutionary frameworks could further improve performance. The experiment offers a possible glimpse of a future in which AI-driven trading surpasses human investors and hedge funds. The presenter invites viewers to share whether they believe such AI-driven investment strategies will become reliably profitable within the next five years, underscoring the significance of this ongoing research for the future of AI and finance.