New AI models, token minimization and IBM’s new sub-1nm chip

artesia · 26 June 2026 10:00

The video discusses IBM’s sub-1 nanometer chip, whose vertical “nano stack” architecture improves performance and efficiency for AI computing. It also covers emerging AI models such as Japan’s Sakana Fugu and China’s GLM 5.2, which illustrate a shift toward multi-model orchestration, and the move toward token minimization strategies in enterprise AI. Finally, it covers the Google DeepMind and A24 partnership to develop AI tools that augment rather than replace creative workflows in filmmaking, and touches on evolving industry dynamics around AI governance, accessibility, and sustainable adoption.

artesia · 26 June 2026 10:20

The video begins with a detailed interview featuring Houming Bu, VP of silicon technology research and development at IBM, discussing IBM’s breakthrough in semiconductor technology with their new sub-1 nanometer chip. This chip introduces a novel “nano stack” architecture that stacks transistors vertically, moving beyond the traditional two-dimensional scaling that has dominated the industry for over 60 years. Compared to the current 2-nanometer technology, this offers either 50% better performance or 70% power savings, along with a 40% reduction in SRAM area, which is critical for AI computing. The design packs nearly 100 billion transistors onto a fingernail-sized chip, with challenges such as heat dissipation and manufacturing precision being addressed through innovative engineering.

The discussion then shifts to new AI models making waves in the community, particularly the Japanese Sakana Fugu model and the Chinese GLM 5.2 coding model. The panelists highlight that Sakana’s approach is less about a single novel model and more about orchestrating multiple existing models to achieve superior performance, especially in coding tasks. This multi-model orchestration offers resilience and flexibility but introduces variability in output quality due to the dynamic routing of requests. The GLM 5.2 model, meanwhile, is noted for its large size and strong coding capabilities, reflecting the growing competitiveness of Chinese AI labs in the global landscape, although its size poses challenges for local deployment and accessibility.
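To make the orchestration idea concrete, here is a minimal Python sketch of routing a request across several backends with fallback. It is not Sakana’s actual method, which the video does not detail; the `Backend` and `Orchestrator` names, the skill tags, and the stub model functions are all hypothetical stand-ins for real model endpoints.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Backend:
    """A hypothetical model endpoint: a name, the task types it is good at, and a callable."""
    name: str
    skills: FrozenSet[str]
    call: Callable[[str], str]


class Orchestrator:
    """Routes each request to a preferred backend and falls back to the others on failure."""

    def __init__(self, backends: List[Backend]) -> None:
        self.backends = list(backends)

    def route(self, task: str) -> List[Backend]:
        # Backends that list the task as a skill come first; the rest act as fallbacks.
        preferred = [b for b in self.backends if task in b.skills]
        others = [b for b in self.backends if task not in b.skills]
        return preferred + others

    def run(self, task: str, prompt: str) -> Tuple[str, str]:
        errors = []
        for backend in self.route(task):
            try:
                return backend.name, backend.call(prompt)
            except Exception as exc:  # any backend failure triggers the next fallback
                errors.append(f"{backend.name}: {exc}")
        raise RuntimeError("all backends failed: " + "; ".join(errors))


# Stub models standing in for real APIs (assumptions for illustration only).
def coder(prompt: str) -> str:
    return f"[coder] {prompt}"


def generalist(prompt: str) -> str:
    return f"[generalist] {prompt}"


def flaky(prompt: str) -> str:
    raise TimeoutError("backend unavailable")


orchestrator = Orchestrator([
    Backend("flaky-coder", frozenset({"code"}), flaky),
    Backend("coder", frozenset({"code"}), coder),
    Backend("generalist", frozenset({"chat"}), generalist),
])

print(orchestrator.run("code", "sort a list"))
print(orchestrator.run("chat", "hi"))
```

The sketch shows both properties the panelists mention: the fallback loop gives resilience when one backend fails, while the fact that the answering backend can change from request to request is exactly where the variability in output quality comes from.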

The conversation explores the implications of these developments on AI industry dynamics, with panelists noting that orchestration platforms may become the new product focus rather than individual models. This shift could democratize access to frontier capabilities by combining strengths of multiple models and potentially enable smaller, more efficient models to run locally on consumer hardware. The discussion also touches on the challenges of governance, access, and trust in hosting these large models, especially as new players emerge rapidly in markets like China, contrasting with the more consolidated US AI landscape.

A significant segment covers the recent partnership between Google DeepMind and the film studio A24, aimed at developing AI tools to assist filmmakers rather than replace them. The panelists view this as a strategic olive branch to the entertainment industry, which has had a fraught relationship with AI. The collaboration focuses on augmenting creative workflows, such as storyboarding, to enhance productivity without compromising artistic control. The deal is also seen as a learning opportunity for DeepMind to understand film production processes, with hopes that AI tools will become integrated aids rather than disruptive replacements in creative industries.

Finally, the panel discusses the emerging trend of “token minimization” in enterprise AI usage, where companies are shifting from maximizing token consumption to optimizing token efficiency due to the high costs associated with AI model usage. They highlight the difficulty in measuring the true business value derived per token and caution against simplistic metrics like token counts as proxies for productivity. Instead, they advocate for a nuanced approach that considers the quality and context of token usage, including leveraging local, less expensive models for routine tasks. The conversation concludes with reflections on how AI adoption metrics will evolve as users become more proficient, emphasizing efficiency over volume as the key to sustainable AI integration in business.
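As a rough illustration of the cost argument, the Python sketch below compares sending every request to a frontier model against offloading routine requests to a cheaper local model. The prices, token counts, and the `Request` and `total_cost` names are invented for illustration and do not come from the video; the sketch also deliberately measures only cost, not the business value per token that the panel says is hard to quantify.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Request:
    tokens: int    # total tokens consumed by the request
    routine: bool  # True if a cheaper local model could plausibly handle it


def total_cost(
    requests: Iterable[Request],
    frontier_price: float,  # hypothetical price per million tokens
    local_price: float,     # hypothetical price per million tokens
    offload_routine: bool,
) -> float:
    """Sum the cost of all requests, optionally sending routine ones to the local model."""
    total = 0.0
    for r in requests:
        use_local = offload_routine and r.routine
        price = local_price if use_local else frontier_price
        total += r.tokens / 1_000_000 * price
    return total


# Made-up workload: three routine requests and one complex request.
workload = [
    Request(tokens=2_000_000, routine=True),
    Request(tokens=2_000_000, routine=True),
    Request(tokens=2_000_000, routine=True),
    Request(tokens=4_000_000, routine=False),
]

baseline = total_cost(workload, frontier_price=10.0, local_price=1.0, offload_routine=False)
offloaded = total_cost(workload, frontier_price=10.0, local_price=1.0, offload_routine=True)
print(f"all frontier: {baseline:.2f}, routine offloaded: {offloaded:.2f}")
```

The total token count is identical in both scenarios, which is one way to see why raw token counts are a poor proxy: the spend changes with where the tokens go, not only with how many are used.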