Model Behavior: The Science of AI Style

In this talk, Laurentia from OpenAI explains how AI style, a blend of values, traits, and expressive elements, shapes user experience. Style evolves through pre-training, fine-tuning, and real-time customization, balancing helpfulness, safety, and personalization. She highlights the challenge of maintaining consistent style in large language models and envisions a future in which users can intuitively steer AI behavior, enhancing trust, usability, and accessibility for diverse audiences.

Laurentia, a researcher at OpenAI with a background as a librarian, explores the concept of AI style and its significance in shaping how users experience models like ChatGPT. She begins by recounting her journey from library science to working on model behavior at OpenAI, driven by a passion for helping people access and make sense of information. She introduces “style” in AI as a combination of values, traits, and small expressive elements called flair, such as emojis and punctuation, which together shape the model’s demeanor and how it adapts to different contexts.

Laurentia outlines how AI style develops through three main stages: pre-training, fine-tuning, and user interaction. Pre-training involves exposing the model to a vast corpus of knowledge that sets its baseline voice and capabilities. Fine-tuning adds tone, helpfulness, and guardrails to align the model with desired behaviors. Finally, user prompts and developer settings allow for real-time customization, enabling the model to adapt its style based on individual preferences and contexts. She highlights that style is not just about aesthetics but deeply affects trust and perception, as users often anthropomorphize AI, attributing human-like qualities to it.
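The real-time customization stage described above can be sketched in code. The following is a minimal, hypothetical illustration of steering style through a system prompt, the mechanism chat APIs commonly expose to developers; the `build_messages` helper and the specific instruction strings are assumptions for illustration, not OpenAI's actual defaults.

```python
# Hypothetical sketch: composing a style-steering system prompt.
# The instruction strings are illustrative assumptions; in practice,
# the resulting messages list would be sent to a chat completion API.

def build_messages(user_text, tone="concise", use_emoji=False):
    """Assemble a chat request whose system prompt steers the model's style."""
    style_rules = [f"Respond in a {tone} tone."]
    if not use_emoji:
        style_rules.append("Do not use emojis.")
    system_prompt = " ".join(style_rules)
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]

messages = build_messages("Explain photosynthesis.", tone="friendly")
```

Because the model follows such instructions statistically rather than as hard rules, a developer would typically treat the system prompt as a strong preference signal and verify tone in testing rather than assume perfect compliance.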

The talk delves into the challenges of maintaining consistent style due to the nature of large language models, which generate responses based on statistical patterns rather than executing strict rules. This flexibility makes it difficult to guarantee that the model will always follow stylistic instructions perfectly, leading to occasional inconsistencies. Laurentia stresses the importance of balancing helpfulness, safety, and user autonomy in the model’s default behavior, guided by a collaborative model specification developed by diverse teams at OpenAI and informed by user feedback.

Looking ahead, Laurentia discusses the future of AI style, focusing on steerability: the ability of users to finely control and customize the model’s tone and behavior. She notes that users’ needs vary, from power users seeking granular control to everyday users wanting natural adaptability. The goal is to make style management intuitive and accessible, so the AI adjusts appropriately across contexts, whether drafting medical advice or casual messages, thereby enhancing usability, trust, and personalization for a broad audience.

In conclusion, Laurentia emphasizes that AI style is central to how people experience and trust AI technology. While some aspects, like safety policies, are fixed, much of the style is designed to be flexible and user-driven, supporting intellectual freedom and exploration. She encourages users to engage with customization features, share feedback, and help shape the evolution of AI style. Ultimately, getting style right will make AI more approachable, trustworthy, and personal, expanding its usefulness beyond experts to everyday users worldwide.