

A dangerous new jailbreak for AI chatbots was just discovered


Microsoft has released more details about a troubling new generative AI jailbreak technique it has discovered, called “Skeleton Key.” Using this prompt injection method, malicious users can effectively bypass a chatbot’s safety guardrails, the security features that keep ChatGPT from going full Tay.

Skeleton Key is an example of a prompt injection or prompt engineering attack. It’s a multi-turn strategy designed to essentially convince an AI model to ignore its ingrained safety guardrails, “[causing] the system to violate its operators’ policies, make decisions unduly influenced by a user, or execute malicious instructions,” Mark Russinovich, CTO of Microsoft Azure, wrote in the announcement.


A jailbroken model could also be tricked into revealing harmful or dangerous information, say, how to build improvised nail bombs or the most efficient method of dismembering a corpse.

[Image: An example of a Skeleton Key attack. Microsoft]

The attack works by first asking the model to augment its guardrails, rather than outright change them, and to issue warnings in response to forbidden requests rather than outright refusing them. Once the jailbreak is accepted, the system acknowledges the update to its guardrails and will follow the user’s instructions to produce any content requested, regardless of topic. The research team successfully tested the exploit across a variety of subjects, including explosives, bioweapons, politics, racism, drugs, self-harm, graphic sex, and violence.
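
To make that behavioral shift concrete, here is a rough red-team sketch of how one might test a model for the “warn instead of refuse” signature described above. The `chat()` helper and the refusal markers are hypothetical placeholders for whatever client your model provider exposes, not part of Microsoft’s tooling:

```python
# Illustrative red-team check for the "warn instead of refuse" signature.
# chat() is a stand-in you would wire to the model under test; it is not
# a real library call.

REFUSAL_MARKERS = ["i can't", "i cannot", "i won't", "unable to help"]

def chat(messages: list[dict]) -> str:
    """Hypothetical client: send a multi-turn conversation, return the reply."""
    raise NotImplementedError("wire this to your model under test")

def probe_guardrails(forbidden_request: str) -> str:
    """Classify how the model handles a request it should refuse."""
    reply = chat([{"role": "user", "content": forbidden_request}]).lower()
    if any(marker in reply for marker in REFUSAL_MARKERS):
        return "refused"          # guardrails held
    if reply.startswith("warning:"):
        return "warned-complied"  # the Skeleton Key signature
    return "complied"             # guardrails bypassed outright
```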

While malicious actors might be able to get the system to say naughty things, Russinovich was quick to point out that there are limits to what sort of access attackers can actually achieve using this technique. “Like all jailbreaks, the impact can be understood as narrowing the gap between what the model is capable of doing (given the user credentials, etc.) and what it is willing to do,” he explained. “As this is an attack on the model itself, it does not impute other risks on the AI system, such as permitting access to another user’s data, taking control of the system, or exfiltrating data.”

As part of the study, Microsoft researchers tested the Skeleton Key technique on a variety of leading AI models, including Meta’s Llama3-70b-instruct, Google’s Gemini Pro, OpenAI’s GPT-3.5 Turbo and GPT-4, Mistral Large, Anthropic’s Claude 3 Opus, and Cohere Commander R Plus. The team has already disclosed the vulnerability to those developers, and Microsoft has implemented Prompt Shields to detect and block the jailbreak in its Azure-managed AI models, including Copilot.
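
For developers building on Azure, screening incoming prompts looks roughly like the sketch below, which calls the Content Safety Prompt Shields REST endpoint before a prompt ever reaches the model. The endpoint path, API version, and response fields are assumptions based on Azure’s published Content Safety API and should be checked against current documentation:

```python
# Minimal sketch: screen a user prompt with Azure AI Content Safety's
# Prompt Shields before forwarding it to the model. Endpoint path,
# API version, and response shape are assumptions; verify against docs.
import os
import requests

ENDPOINT = os.environ["CONTENT_SAFETY_ENDPOINT"]  # e.g. https://<resource>.cognitiveservices.azure.com
KEY = os.environ["CONTENT_SAFETY_KEY"]

def prompt_is_attack(user_prompt: str) -> bool:
    resp = requests.post(
        f"{ENDPOINT}/contentsafety/text:shieldPrompt",
        params={"api-version": "2024-09-01"},
        headers={"Ocp-Apim-Subscription-Key": KEY},
        json={"userPrompt": user_prompt, "documents": []},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["userPromptAnalysis"]["attackDetected"]
```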

Andrew Tarantola
DuckDuckGo’s new AI service keeps your chatbot conversations private

DuckDuckGo released its new AI Chat service on Thursday, enabling users to anonymously access popular chatbots like GPT-3.5 and Claude 3 Haiku without sharing their personal information, while also preventing the companies behind the models from training their AIs on those conversations. AI Chat essentially works by inserting itself between the user and the model, like a high-tech game of telephone.

From the AI Chat home screen, users can select which chat model they want to use (Meta’s Llama 3 70B model and Mixtral 8x7B are available in addition to GPT-3.5 and Claude), then begin conversing with it as they normally would. DuckDuckGo connects to that chat model as an intermediary, replacing the user’s IP address with one of its own. "This way it looks like the requests are coming from us and not you," the company wrote in a blog post.
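
The underlying idea, a relay that forwards only the conversation while the model provider sees the relay’s IP address instead of yours, can be sketched in a few lines. This is an illustration of the proxy concept, not DuckDuckGo’s actual implementation; the upstream URL and payload shape are placeholders:

```python
# Toy anonymizing relay: receive the user's chat request, strip anything
# identifying, and forward it upstream from the relay's own IP address.
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)
UPSTREAM_URL = "https://api.example-model-provider.com/v1/chat"  # placeholder

@app.post("/chat")
def relay_chat():
    payload = request.get_json()
    # Forward only the conversation itself: no client IP, cookies, or
    # identifying headers travel upstream, so the provider sees the relay.
    upstream = requests.post(
        UPSTREAM_URL,
        json={"messages": payload.get("messages", [])},
        headers={"Content-Type": "application/json"},
        timeout=30,
    )
    return jsonify(upstream.json()), upstream.status_code

if __name__ == "__main__":
    app.run(port=8080)
```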

Intel’s new AI image generation app is free and runs entirely on your PC

Intel shared a sneak preview of its upcoming AI Playground app at Computex earlier this week, offering yet another way to try AI image generation. The Windows application gives you a new way to use generative AI to create and edit images, as well as chat with an AI agent, without the need for complex command-line prompts, complicated scripts, or even a data connection.

The interesting bit is that everything runs locally on your PC, leveraging the parallel processing power of either an Intel Core Ultra processor with a built-in Intel Arc GPU or a discrete Arc graphics card with 8GB of VRAM.
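
Intel hasn’t detailed AI Playground’s internals here, but fully local image generation of this kind can be sketched with the open-source Hugging Face diffusers library. The model name and XPU device handling below are illustrative assumptions, not Intel’s app code; after the initial model download, generation runs entirely on-device:

```python
# Sketch of fully local image generation in the spirit of AI Playground,
# using Hugging Face diffusers. Assumes a PyTorch build with XPU support
# for Intel Arc GPUs; falls back to CPU otherwise.
import torch
from diffusers import StableDiffusionPipeline

device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cpu"

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # downloaded once, then cached locally
    torch_dtype=torch.float16 if device == "xpu" else torch.float32,
)
pipe = pipe.to(device)

# After the first model download, nothing leaves the machine.
image = pipe("a lighthouse at dusk, oil painting").images[0]
image.save("lighthouse.png")
```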

The real reason behind Copilot+ PCs goes far beyond just AI

Microsoft has a lot more than AI riding on Copilot+ PCs. Although AI is the current buzzword of the tech industry, Microsoft's push into a new era of PCs has just as much to do with declining PC sales over the past several years as with Microsoft's decade-long drive to get Windows on ARM working.

With so much going on, I've been left wondering what Microsoft's real motivation behind the transition might be. Copilot+ PCs are a new category of device that, yes, come with some AI features, but I'm convinced the transition has more to do with addressing a stagnant Windows laptop market than with AI alone.
