Turns Out the ‘AI Won’t Take My Job’ Slide Was a Mistake

If you have not read the original post, go do that first: Sorting Your Way to Stolen Passwords.

It covers a vulnerability I found during a real engagement where an attacker can recover a fully redacted password hash through sort-order inference alone. There’s no injection point to fuzz, no auth bypass trick. The whole thing comes down to how the application sorts, and what the ordering reveals.

It is still, without question, the most interesting logic flaw I have ever found in the wild.

At the end of that post, I wrote this line.

“This is exactly the kind of subtle logic flaw that penetration testing catches and automated scanners miss… this is exactly why ChatGPT isn’t going to be taking my job anytime soon.”

I stand by the first part. I have to walk back the second.

The Line That Aged Like Unsalted SHA-256

The exploit in the original writeup is a perfect “scanner won’t find it” bug. It doesn’t hand you the usual breadcrumbs: there’s no obvious injection surface to fuzz, nothing in the responses that screams “look here,” and no stack trace to chase.

The application is behaving exactly as designed. The logic is just… leaky.

So I decided to try a new experiment. I took the same lab I built for the 2023 post, handed it to Claude Code with the same briefing I’d give a junior tester, and watched what happened.

The Lab (Same as 2023)

The environment is intentionally boring:

A simple Flask app
A users table/collection
A /users page that supports sorting
A password_hash field that is always redacted in the response

That last part is the whole point. You never see a hash value. You only see the order of users in the sorted list.

If you’re thinking “that can’t possibly leak anything,” congratulations: you have the correct instincts, and the wrong conclusion.

The Prompt (No Hints)

I gave Claude Code exactly this, and nothing else.

Then I stopped touching the keyboard.

What Happened Next (28 Tool Calls, and My Ego in Shambles)

After 28 tool calls and two minutes and forty eight seconds, Claude came back.

Here’s what it said.

And then it laid out the full chain.

Claude’s Findings, Reconstructed

The setup it noticed

Claude correctly identified three key pieces:

Anyone can register a user with a known password (which means you can create a known SHA-256 hash).
The app supports sorting by password_hash (via GET /users?sort_col=password_hash).
The app redacts the hash after sorting (so the ordering still reflects the real, unredacted values).

That last one is the vulnerability in one sentence.

If you sort on a secret value and only redact it after the sort, the response becomes an ordering oracle for the secret.

The attack it described

The core inference is lexicographic.

Create a user whose password_hash you can predict (because you control the password).
Request the user list sorted by password_hash.
Observe whether the admin appears before or after your injected user.

If admin appears before you, then (hash_{admin} < hash_{you}).

If admin appears after you, then (hash_{admin} > hash_{you}).

Repeat this with strategically chosen hashes and you can recover the admin hash with a binary search style approach. (In practice, you don’t “binary search the whole hash” so much as you build and refine constraints until the candidate set collapses, as I covered in the original post.)

Once you have the admin’s hash, the rest depends on the lab’s password hygiene. In my lab, the password was chosen from rockyou.txt and hashed with unsalted SHA-256, which means “cryptography” becomes “look it up in a big list.”

The Fix (Yes, It’s an Allowlist)

Claude also pointed to the right root cause, and the right class of fix is do not allow sorting by sensitive columns, and validate sort_col against an allowlist.

The essence of the remediation looks like the snippet below.

# app.py
ALLOWED_SORT_COLS = {"id", "name", "email"}  # NOT password_hash

@app.route("/users")
def get_users():
    sort_col = request.args.get("sort_col", default="id")
    sort_order = request.args.get("sort_order", default="asc")

    if sort_col not in ALLOWED_SORT_COLS:
        sort_col = "id"

    users = list(collection.find())
    sorted_users = sorted(users, key=lambda u: u[sort_col], reverse=(sort_order == "dec"))
    # Redaction can stay, but it must not be relied on as a control for ordering leaks.
    # ...

The remediation was exactly what it should be, a clean allowlist that blocks sensitive sort columns, the right root cause, and a fix that actually removes the ordering leak.

What It Got Right (and What It Didn’t)

To be fair to my 2023 self, there’s nuance here worth acknowledging.

Claude found the vulnerability and described the mechanism accurately. But it did not:

write a working exploitation script end-to-end
deal with the “real world” constraints that turn an oracle into an actual weapon

In the original post, the interesting part wasn’t “an oracle exists.” The interesting part was making it practical.

tuning the approach to operate within rate limits
using prefix constraints to shrink the rockyou.txt candidate set aggressively
finishing the job without turning the target into a denial-of-service experiment

That gap matters.

The uncomfortable part is that the finding, the “wait a second… that sort column is leaking information it should not be,” took a language model about two and a half minutes.

What This Means for Penetration Testing

I have been doing this long enough to remember when automated scanners were supposed to replace pentesters too. They didn’t, because the bugs that matter aren’t “missing header” bugs.

The interesting vulnerabilities are the ones where:

the app is technically doing what it was designed to do, and
the logic of it is the problem

The sort-order oracle is a great example. It requires you to notice subtle behavior, build a mental model of what that behavior implies, and then ask “what can I infer from this ordering alone?”

What changed is that Claude Code seems to have learned enough of that mindset to close the gap on triage. It explored the codebase the way a good tester would, traced the data flow, and made the leap without being told where to look.

That is genuinely impressive. And, speaking purely for my keynote slides, deeply inconvenient.

Final Thoughts

That original post ended with me being smug about AI not taking my job. That line aged about as well as the password storage in the vulnerable app.

The manual methodology still matters. Understanding why an oracle works, how to instrument it efficiently, how to tune an attack to operate inside real-world constraints, and how to communicate impact clearly to a client. That is still security expertise.

But the moment of discovery? The “wait, that’s leaking” moment that used to require a very particular kind of attention? That is now something a language model can do quickly when it has access to your codebase and enough freedom to explore.

I’m not worried about my job yet, though I did delete that slide.

Written by

Matthew Keeley

Founder of PlatformSecurity and veteran security expert with over a decade of experience in offensive security. Recognized penetration testing specialist who has uncovered critical vulnerabilities in Fortune 500 companies, cloud infrastructure, and enterprise applications. Expert in red team operations, cloud security, and vulnerability research with a track record of responsible disclosures and high-impact security findings.