What's the Magic Word? A Control Theory of LLM Prompting

The speaker compares the way language models respond to prompts with the social engineering tricks that work on people. They note that language models can perform better when the prompt uses certain techniques, such as promising a reward. They also describe a “perceptual layer,” in which strange, inhuman prompts can influence the model's output. This chaotic regime of prompts is likened to hypnosis or magic: a specific input can make a particular output highly likely. The observation leads to a comparison between studying language models and studying magic and human perceptual systems. By analyzing language models in this way, researchers are gaining insight into how the models interact with the world and the dynamics that govern their behavior.

