OpenAI Believes DeepSeek ‘Distilled’ Its Data For Training—Here's What To Know About The Technique

OpenAI has raised concerns that the Chinese startup DeepSeek may have used a technique called “distillation” to train its new AI model using outputs from OpenAI’s models without authorization, prompting investigations by OpenAI and Microsoft. This situation has significant implications for corporate competition and national security, leading to calls for stronger protections against unauthorized data usage in the AI industry.

OpenAI has raised concerns that its AI model outputs may have been used by the Chinese startup DeepSeek to train its new open-source model, which has garnered significant attention and impacted U.S. financial markets. According to a report by the Financial Times, OpenAI has found evidence suggesting that DeepSeek employed a technique known as “distillation.” In distillation, a smaller model is trained to imitate the outputs of a larger, more capable model, transferring much of the larger model’s behavior at a fraction of the original training cost—potentially allowing DeepSeek to leverage OpenAI’s technology without authorization.
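To illustrate the general idea, here is a minimal sketch of knowledge distillation using numpy. All names and numbers are illustrative assumptions, not anything specific to OpenAI’s or DeepSeek’s systems: a fixed linear “teacher” classifier produces temperature-softened probability outputs, and a “student” model is trained by gradient descent to match those soft targets rather than ground-truth labels. (Real distillation involves large neural networks; both models here are linear purely to keep the sketch short and runnable.)

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T gives softer probabilities."""
    z = z / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)

# Hypothetical "teacher": a fixed linear classifier with random weights.
n, d, k = 200, 5, 3
X = rng.normal(size=(n, d))
W_teacher = rng.normal(size=(d, k))
teacher_logits = X @ W_teacher

# Soft targets: the teacher's temperature-softened output distribution.
T = 2.0
soft_targets = softmax(teacher_logits, T=T)

# "Student": trained only on the teacher's outputs, never on true labels,
# by minimizing cross-entropy between soft targets and student probabilities.
W_student = np.zeros((d, k))
lr = 1.0
for _ in range(1000):
    probs = softmax(X @ W_student, T=T)
    grad = X.T @ (probs - soft_targets) / (n * T)  # exact cross-entropy gradient
    W_student -= lr * grad

# The student now reproduces most of the teacher's decisions.
agree = np.mean((X @ W_student).argmax(axis=1) == teacher_logits.argmax(axis=1))
print(f"teacher-student agreement: {agree:.2f}")
```

The key point the sketch demonstrates is that the student needs only the teacher’s *outputs*, not its weights or training data—which is why API access alone is enough to attempt distillation, and why providers police it through terms of service rather than technical secrecy.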

The situation has prompted OpenAI and its major backer, Microsoft, to investigate whether DeepSeek accessed OpenAI’s application programming interface (API). The API allows other businesses to utilize OpenAI’s AI models, but the companies suspect that DeepSeek may have violated OpenAI’s terms of service by using API outputs for distillation. Last year, OpenAI and Microsoft blocked accounts suspected of using the API for this purpose, indicating ongoing concerns about unauthorized data usage.

David Sacks, President Donald Trump’s appointee overseeing AI policy, highlighted the issue, stating that there is substantial evidence supporting the claim that DeepSeek distilled outputs from OpenAI’s models. Sacks predicted that leading U.S. AI companies would soon implement measures to prevent distillation. Such actions could hinder the development of similar models that attempt to replicate the capabilities of established AI systems.

The implications of DeepSeek’s actions extend beyond corporate competition, raising national security concerns that have caught the attention of the White House. The National Security Council is currently reviewing the potential impacts of this situation on U.S. interests, emphasizing the need for vigilance in the rapidly evolving AI landscape. White House Press Secretary Karoline Leavitt described the situation as a “wake-up call” for the American AI industry, signaling the importance of safeguarding proprietary technology.

In summary, the controversy surrounding DeepSeek’s use of OpenAI’s outputs through distillation has sparked investigations and raised alarms about the integrity of AI development practices. As the industry grapples with these challenges, the focus will likely shift toward establishing stronger protections against unauthorized data usage and ensuring that innovation does not come at the expense of ethical standards and national security.