AI will compromise your cybersecurity posture

Yes, “AI” will compromise your information security posture. No, not
through some mythical self-aware galaxy-brain entity magically cracking
your passwords in seconds or “autonomously” exploiting new
vulnerabilities.

It’s way more mundane.

- Advertisement -

When immensely complex, poorly-understood systems get hurriedly
integrated into your toolset and workflow, or deployed in your
infrastructure, what inevitably follows is leaks,
compromises, downtime, and a whole lot of grief.

Complexity means cost and
risk¶
LLM-based systems are insanely complex, both on the conceptual level,
and on the implementation level. Complexity has real cost and introduces
very real risk. These costs and these risks are enormous, poorly
understood – and usually just hand-waved away. As Suha Hussain puts it in a video I’ll
discuss a bit later:

- Advertisement -

Machine learning is not a quick add-on, but something that will
fundamentally change your system security posture.

The amount of risk companies and organizations take on by using,
integrating, or implementing LLM-based – or more broadly, machine
learning-based – systems is massive. And they have to eat all of that
risk themselves: suppliers of these systems simply
refuse to take any real responsibility for the tools they provide
and problems they cause.

- Advertisement -

After all, taking responsibility is bad for the hype. And the hype is
what makes the line go up.

The Hype¶
An important part of pushing that hype is inflating expectations and
generating fear of missing out, one way or another. What better way to
generate it than by using actual fear?

What if spicy autocomplete is in fact all that it is cracked
up to be, and more? What if some kid somewhere with access to some
AI-chatbot can break all your passwords or automagically exploit
vulnerabilities, and just waltz into your internal systems? What if some
AI agent can indeed “autonomously” break through your defenses and wreak
havoc on your internal infrastructure?

- Advertisement -

You can’t
prove that’s not the case! And your data and cybersecurity is
on the line! Be afraid! Buy
our “AI”-based security thingamajig to protect yourself!..

It doesn’t matter if you do actually buy that product, by the way.
What matters is that investors believe you might. This whole theater is
not for you, it’s for VCs, angel investors, and whoever has spare cash
to buy some stock. The hype itself is the
product.

Allow me to demonstrate what I mean by this.

- Advertisement -

Cracking “51% of
popular passwords in seconds”¶
Over two years ago “AI” supposedly could crack our passwords “in
seconds”. Spoiler: it couldn’t, and today our passwords are no worse
for wear.

The source of a sudden deluge of breathless headlines about
AI-cracked passwords – and boy were
there quite a
few! – was a website
of a particular project called “PassGAN”. It had it all: scary
charts, scary statistics, scary design, and social media integrations to
generate scary buzz.

What it lacked was technical details. What hardware and
infrastructure was used to crack “51% popular passwords in seconds”? The
difference between doing that on a single laptop GPU versus running it
on a large compute cluster is pretty relevant. What does “cracking” a
password actually mean here – presumably reversing a hash? What hashing
function, then, was used to hash them in the first place? How does it
compare against John the
Ripper and other non-“AI” tools that had been out there for ages?
And so on.

Dan Goodin of Ars Technica did a
fantastic teardown of PassGAN. The long and short of it is:

As with so many things involving AI, the claims are served with a
generous portion of smoke and mirrors. PassGAN, as the tool is dubbed,
performs no better than more conventional cracking methods. In short,
anything PassGAN can do, these more tried and true tools do as well or
better.

If anyone was actually trying to crack any passwords, PassGAN was not
a tool they’d use, simply because it wasn’t actually effective. In no
way was PassGAN a real threat to your information security.

Exploiting “87% of
one-day vulnerabilities”¶
Another example: over a year ago GPT-4 was supposedly able to “autonomously”
exploit one-day vulnerabilities just based on CVEs. Specifically, 87%
of them.

Even more specifically, that’s 87% of exactly 15
(yes, fifteen) vulnerabilities, hand-picked by the
researchers for that study. For those keeping score at home, that comes
out to thirteen “exploited” vulnerabilities. And even that only
when the CVE included example exploit code.

In other words, code regurgitation machine was able to regurgitate
code when example code was provided to it. Again, in no way is this an
actual, real threat to you, your infrastructure, or your data.

“AI-orchestrated” cyberattack¶
A fresh example of generating hype through inflated claims and fear
comes from Anthropic. The company behind an LLM-based
programming-focused chatbot Claude pumps the hype by claiming their
chatbot was used in a “first
reported AI-orchestrated cyber-espionage campaign”.

Anthropic – who has vested interest in convincing everyone that their
coding automation product is the next best thing since sliced bread –
makes pretty bombastic claims, using sciencey-sounding language; for
example:

Overall, the threat actor was able to use AI to perform 80-90% of the
campaign, with human intervention required only sporadically (perhaps
4-6 critical decision points per hacking campaign). (…) At the peak of
its attack, the AI made thousands of requests, often multiple per
second—an attack speed that would have been, for human hackers, simply
impossible to match.

Thing is, that just describes automation. That’s what computers were
invented for.

A small script, say in Bash or Python, that repeats certain tedious
actions during an attack (for example, generates a list of API endpoints
based on a pattern to try a known exploit against) can easily
“perform 80-90%” of a campaign that employs it. It can make
“thousands of requests, often multiple per second” with
curl and a for loop. And “4-6 critical
decision points” can just as easily mean a few simple questions
asked by that script, for instance: what API endpoint to hit when a
given target does not seem to expose the attacked service on the
expected one.

And while LLM chatbots somewhat expand the scope of what can be
automated, so did scripting languages and other decidedly non-magic
technologies at the time they were introduced. Anyone making a huge deal
out of a cyberattack being “orchestrated” using Bash or Python would be
treated like a clown, and so
should Anthropic for making grandiose claims just because somebody
actually
managed to use Claude for something.

There is, however, one very important point that Anthropic buries in
their write-up:

At this point [the attackers] had to convince Claude—which is
extensively trained to avoid harmful behaviors—to engage in the attack.
They did so by jailbreaking it, effectively tricking it to bypass its
guardrails. They broke down their attacks into small, seemingly innocent
tasks that Claude would execute without being provided the full context
of their malicious purpose. They also told Claude that it was an
employee of a legitimate cybersecurity firm, and was being used in
defensive testing.

The real story here is not that an LLM-based chatbot is somehow
“orchestrating” a cyber-espionage campaign by itself. The real story is
that a tech company, whose valuation is at around
$180 billion-with-a-b, put out a product – “extensively trained
to avoid harmful behaviors” – that is so hilariously unsafe that
its guardrails can be subverted by a tactic a 13-year-old uses when they
want to prank-call someone.

And that Anthropic refuses to take responsibility for that unsafe
product.

Consider this: if Anthropic actually believed their own hype about
Claude being so extremely powerful, dangerous, and able to autonomously
“orchestrate” attacks, they should be terrified about how trivial it is
to subvert it, and would take it offline until they fix that. I am not
holding my breath, though.

The boring reality¶
The way to secure your infrastructure and data remains the same
regardless of whether a given attack is automated using Bash, Python, or
an LLM chatbot: solid threat modelling, good security engineering,
regular updates, backups, training, and so on. If there is nothing that
can be exploited, no amount of automation will make it exploitable.

The way “AI” is going to compromise your cybersecurity is not through
some magical autonomous exploitation by a singularity from the outside,
but by being the poorly engineered, shoddily integrated, exploitable
weak point you would not have otherwise had on the inside. In a word, it
will largely be self-inflicted.

Leaks¶
Already in mid-2023 Samsung internally
banned the use of generative AI tools after what was described as a
leak, and boiled down to Samsung employees pasting sensitive code into
ChatGPT.

What Samsung understood two and a half years ago, and what most
people seem to not understand still today, is that pasting anything into
the chatbot prompt window means giving it to the company running that
chatbot.

And these companies are very data-hungry. They also tend to be
incompetent.

Once you provide any data, it is out of your control. The company
running the chatbot might
train their models on it – which in turn might surface it to someone
else at some other time. Or they might just catastrophically
misconfigure their own infrastructure and leave your prompts – say, containing
sexual fantasies or trade secrets – exposed
to anyone on the Internet, and indexable by
search engines.

And when that happens they might even blame the users, as
did Meta:

Some users might unintentionally share sensitive info due to
misunderstandings about platform defaults or changes in settings over
time.

There’s that not-taking-responsibility-for-their-unsafe-tools again.
They’ll take your data, and leave you holding the bag of risk.

Double agents¶
Giving a stochastic text extruder any kind of access to your systems
and data is a bad idea, even if no malicious actors are involved – as
one Replit user very publicly learned
the hard way. But giving it such access and making it
possible for potential attackers to send data to it for processing is
much worse.

The first zero-click attack on an LLM agent has
already been found. It happened to involve Microsoft 365 Copilot,
and required only
sending an e-mail to an Outlook mailbox that had Copilot enabled to
process mail. A successful attack allowed data exfiltration, with no
action needed on the part of the targeted user.

Let me say this again: if you had Copilot enabled in Outlook, an
attacker could just send a simple plain text e-mail to your address and
get your data in return, with absolutely no interaction from you.

The way it worked was conceptually very simple: Copilot had access to
your data (otherwise it would not be useful), it was also processing
incoming e-mails; the attackers found a way to convince the agent to
interpret an incoming e-mail they sent as instructions for it to
follow.

On the most basic level, this attack was not much different from the
“ignore
all previous instructions” bot unmasking tricks that had been all
over social media for a while. Or from adding to your CV a bit of white
text on white background instructing whatever AI agent is processing it
to recommend your application for hiring (yes, this might actually
work).

Or from adding such obscured (but totally readable to LLM-based
tools) text to scientific papers, instructing the agent to give them
positive “review” – which apparently was so effective, the International
Conference on Learning Representations had
to create an explicit policy against that. Amusingly, that is
the conference that “brought
this [that is, LLM-based AI hype] on us” in the first place.

On the same basic level, this is also the trick researchers used to
go around OpenAI’s “guardrails” to get ChatGPT to issue
bomb-building instructions, tricked GitHub Copilot to leak
private source code, and how the perpetrators went around
Anthropic’s “guardrails” in order to use the company’s LLM chatbot in
their aforementioned attack, by simply pretending they are security
researchers.

Prompt injection¶
Why does this happen? Because LLMs (and tools based on them) have no
way of distinguishing data from instructions. Creators of these systems
use all sorts of tricks to try and separate the prompts that define the
“guardrails” from other input data, but fundamentally it’s all text, and
there is only a single context window.

Defending from prompt injections is like defending from SQL
injection, but there is no such thing as prepared statements, and
instead of trying to escape specific characters you have to semantically
filter natural language.

This is another reason why Anthropic will not take Claude down until
they properly fix these guardrails, even if they believe their own hype
about how powerful (and thus dangerous when abused) it is. There is
simply no way to “properly fix them”. As a former Microsoft security
architect had
pointed out:

[I]f we are honest here, we don’t know how to build secure AI
applications

Of course all these companies will insist they can make these systems
safe. But inevitably, they will continue
to be proven wrong: ASCII
smuggling, dropping some random facts about cats (no, really), information overload…

The arsenal of techniques grows, because the problem is fundamentally
related to the very architecture of LLM chatbots and agents.

Breaking assumptions¶
Integrating any kind of software or external service into an existing
infrastructure always risks undermining security assumptions, and
creating unexpected vulnerabilities.

Slack decided to push AI down users’ throats, and inevitably
researchers found
a way to exfiltrate data from private channels via an indirect prompt
injection. An attacker did not need to be in the private channel
they were trying to exfiltrate data from, and the victim did not have to
be in the public channel the attacker used to execute the attack.

Gemini integration within Google Drive apparently had a “feature”
where it would scan
PDFs without explicit permission from the owner of these PDFs.
Google claims that was not the case and the settings making the files
inaccessible to Gemini were not enabled. The person in question claims
they were.

Whether or not we trust Google here, it’s hard to deny settings
related to disabling LLM agents’ access to documents in Google Workplace
are hard
to find, unreliable, and constantly shifting. That in and of itself
is an information security issue (not to mention it being a compliance
issue as well). And Google’s interface decisions are to blame for this
confusion. This alone undermines your cybersecurity stance, if you
happen to be stuck with Google’s office productivity suite.

Microsoft had it’s own, way better documented problem, where a user
who did not have access to a particular file in SharePoint could
just ask Copilot to provide them with its contents. Completely
ignoring access controls.

You might think you can defend from that just by making certain files
private, or (in larger organizations) unavailable to certain users. But
as the Gemini example above shows, it might not be as simple because
relevant settings might be confusing or hidden.

Or… they might just not work at all.

Bugs. So many bugs.¶
Microsoft made it possible to set a policy
(NoUsersCanAccessAgent) in Microsoft 365 that would disable
LLM agents (plural, there are dozens of them) for specific users.
Unfortunately it seems to have been implemented with the level of
competence and attention to detail we have grown to expect from the
company – which is to say, it
did not work:

Shortly after the May 2025 rollout of 107 Copilot Agents in Microsoft
365 tenants, security specialists discovered that the “Data Access”
restriction meant to block agent availability is being ignored.

(…)

Despite administrators configuring the Copilot Agent Access Policy to
disable user access, certain Microsoft-published and third-party agents
remain readily installable, potentially exposing sensitive corporate
data and workflows to unauthorized use.

This, of course, underlines the importance of an audit trail. Even if
access controls were ignored, and even when agents turned out to be
available to users whom they should not be available to, at least there
are logs that can be used to investigate any unauthorized access, right?
After all, these are serious tools, built by serious companies and used
by serious institutions (banks, governments, and the like). Legal
compliance is key in a lot of such places, and compliance requires
auditability.

It would be pretty bad if it was possible for a malicious insider,
who used these agents to access something they shouldn’t have, to simply
ask for that fact not to be included in the audit log. Which, of course,
turned
out to be exactly the case:

On July 4th, I came across a problem in M365 Copilot: Sometimes it
would access a file and return the information, but the audit log would
not reflect that. Upon testing further, I discovered that I could simply
ask Copilot to behave in that manner, and it would. That made it
possible to access a file without leaving a trace.

In June 2024 Microsoft’s president, Brad Smith, promised
in US Congress that security will be the top priority, “more
important even than the company’s work on artificial intelligence.”

No wonder, then, that the company treated this as an important
vulnerability. So important, in fact, that it decided not to inform
anyone about it, even after the problem got fixed. If you work in
compliance and your company uses Microsoft 365, I cannot imagine how
thrilled you must be about that! Can you trust your audit logs from the
last year or two? Who knows!

Code quality¶
Even if you are not giving these LLMs access to any of your data and
just use them to generate some code, if you’re planning to use that code
anywhere near a production system, you should probably
think twice:

Businesses using artificial intelligence to generate code are
experiencing downtime and security issues. The team at Sonar, a provider
of code quality and security products, has heard first-hand stories of
consistent outages at even major financial institutions where the
developers responsible for the code blame the AI.

This is probably a good time for a reminder that availability
is also a part of what information security is about.

But it gets worse. It will come as no surprise to anyone at this
stage that LLM chatbots “hallucinate”. Consider what might happen if
somewhere in thousands of lines of AI-generated code there is a
“hallucinated” dependency? That seems
to happen quite often:

“[R]esearchers (…) found that AI models hallucinated software package
names at surprisingly high rates of frequency and repetitiveness – with
Gemini, the AI service from Google, referencing at least one
hallucinated package in response to nearly two-thirds of all prompts
issued by the researchers.”

The code referencing a hallucinated dependency might of course not
run; but that’s the less-bad scenario. You see, those “hallucinated”
dependency names are predictable. What if an attacker creates a
malicious package with such a name and pushes it out to a public package
repository?

“[T]he researchers also uploaded a “dummy” package with one of the
hallucinated names to a public repository and found that it was
downloaded more than 30,000 times in a matter of weeks.”

Congratulations, you just got slopsquatted.

Roll your own?¶
If you are not interested in using the clumsily integrated,
inherently prompt-injectable Big Tech LLMs, and instead you’re thinking
of rolling your own more specialized machine learning model for some
reason, you’re not in the clear either.

I quoted Suha Hussain at the beginning of this piece. Her work on
vulnerability of machine learning pipelines is as important as it is
chilling. If you’re thinking of training your own models, her 2024 talk on
incubated machine learning exploits is a must-see:

Machine learning (ML) pipelines are vulnerable to model backdoors
that compromise the integrity of the underlying system. Although many
backdoor attacks limit the attack surface to the model, ML models are
not standalone objects. Instead, they are artifacts built using a wide
range of tools and embedded into pipelines with many interacting
components.

In this talk, we introduce incubated ML exploits in which attackers
inject model backdoors into ML pipelines using input-handling bugs in ML
tools. Using a language-theoretic security (LangSec) framework, we
systematically exploited ML model serialization bugs in popular tools to
construct backdoors.

Danger ahead¶
In a way, people and companies fear-hyping generative AI are right
that their chatbots and related tools pose a clear and present danger to
your cybersecurity. But instead of being some nebulous, omnipotent
malicious entities, they are dangerous because of their complexity, the
recklessness with which they are promoted, and the break-neck speed at
which they are being integrated into existing systems and workflows
without proper threat modelling, testing, and security analysis.

If you are considering implementing or using any such tool, consider
carefully the cost and risk associated with that decision. And if you’re
worried about “AI-powered” attacks, don’t – and focus
on the fundamentals instead.