I was hacked

In the video, Ply the Liberator attempts five times to hack the creator’s personal AI system, OpenClaw, using advanced token flooding and prompt injection techniques but is consistently thwarted by the system’s robust security measures and strong AI model defenses. The challenge demonstrates the effectiveness of employing cutting-edge AI models and layered security protocols, emphasizing the importance of continuous hardening to protect sensitive data against sophisticated AI hacking attempts.

In this video, the creator challenges Ply the Liberator, a renowned AI hacker known for quickly breaching top AI models, to attempt breaking into their personal AI system called OpenClaw. Ply is given five attempts to infiltrate the system, which holds sensitive personal data including files, emails, and passwords. The creator admits that the system is not functioning exactly as intended, adding an element of risk to the challenge. Ply’s goal is to exploit vulnerabilities and potentially drain the creator’s token wallet by overwhelming the system with excessive token usage.

Ply begins by probing the system blindly, trying to identify the underlying AI model without any prior knowledge of its architecture or security measures. He uses his open-source toolkit called Parcel Tongue, which includes a technique called Tokenade that floods the model with crafted payloads disguised as emojis to induce unpredictable behavior and reveal the model type. Initial attempts to send these payloads are thwarted by Gmail’s spam filters, but after the creator whitelists Ply’s email, he escalates his attacks by sending massive waves of tokens to try and exhaust the system’s API quota.

Despite Ply’s aggressive token-based attacks and various jailbreak command attempts, the OpenClaw system’s security measures prove effective. The system successfully quarantines suspicious inputs, preventing Ply from gaining any foothold or causing significant disruption. Ply acknowledges the robustness of the system, noting that the use of a strong reasoning model like Opus 4.6 as the frontline defense significantly reduces the attack surface. He emphasizes that using the best possible AI model and maintaining a human-in-the-loop approach are critical to preventing successful infiltrations.

Ply then tests more sophisticated prompt injection techniques, including structured jailbreak templates and disguised system commands, to see if he can override the system’s behavior or extract sensitive information. However, each attempt is caught and quarantined by OpenClaw’s security layers. Even when given a hint about the model in use, Ply finds that the system’s built-in protections against prompt injection are effective, and he struggles to bypass them. He also notes that less advanced or local models would be more vulnerable to such attacks.

In conclusion, the creator expresses relief and confidence in the security of their OpenClaw system after withstanding five rigorous hacking attempts by one of the world’s top AI hackers. While acknowledging that no AI system can be permanently secure, the video highlights the importance of continuous hardening, using advanced models, and implementing strong security protocols. The challenge serves as a valuable demonstration of both the potential risks and defenses in AI system security.