Buying security services is weird: you’re paying someone to try to break your systems, then you’re going to hand that report to the exact engineers who are already overloaded. If the engagement doesn’t result in clear fixes that actually land in production, you didn’t buy “security”—you bought a document.
There are a lot of security companies. Some run a high-volume model that looks like: scan, paste, ship. Others do deep, manual work that mirrors how real attackers move. Both can claim “pen test” on the website. The difference shows up in three places:
- Whether findings are real (validated, reproducible, and prioritized).
- Whether the report is usable (developer-ready, with remediation that fits your stack).
- Whether the vendor acts like a partner (communicates, adapts, and helps you close risk).
This guide is the checklist I wish every buyer used before signing an SOW.
Before You Talk to Vendors: Define the Outcome
Most vendor evaluations start with scope and pricing. That’s backwards. Start with the outcome you need, then work back to the right kind of engagement.
What are you trying to accomplish?
- A real security signal: “What would an attacker actually be able to do?” This is classic penetration testing: manual, exploit-driven, and focused on impact.
- A baseline hygiene sweep: “Are we missing patches/misconfigs across a big footprint?” That’s closer to vulnerability management than a pen test. If you’re mixing the two, reread penetration testing vs vulnerability scanning.
- A launch gate: “We need confidence before go-live.” That usually means tight scope + fast turnaround + engineering availability for fixes.
- A procurement/compliance milestone: “Our customers/assessor require this.” That’s fine—just don’t let “paperwork” drive the entire testing strategy. If PCI is in play, align expectations early with what assessors typically want in a report (see PCI DSS pentesting requirements).
If you’re trying to build this into a broader program (not a once-a-year scramble), you’ll also want to think about how testing fits into your AppSec operating model. Our longer take is in Building an AppSec Program.
Make a short list of what “good” looks like
Write this down and share it with every vendor so you can compare apples-to-apples:
- Top 5 questions you want answered (examples below)
- Success criteria (what would make you renew next year?)
- Who will consume the output (engineering? leadership? customers? assessor?)
- Constraints (time window, environments, no-downtime systems, legal)
Here are example “top questions” that lead to useful testing:
- “Can an external attacker get to our customer data?”
- “Can a low-privileged user become admin?”
- “If one service account is compromised, what’s the blast radius?”
- “Can our auth/session model be bypassed in a way scanners won’t catch?”
- “Can an attacker pivot from the app into cloud/IAM and take over infrastructure?”
1. Technical Depth: Are You Buying Human Testing or a Scanner Report?
High-quality penetration testing services rely on human judgment. The easiest way to spot depth is to ask questions that can’t be answered with marketing fluff.
Ask who actually does the work (and get specific)
- Will I speak to the testers directly? If not, assume details will get lost.
- What will the team composition be? (seniority mix, who leads, who reviews)
- Can you share a sanitized example finding that shows exploit validation + remediation?
- What kinds of issues do you typically find in our stack? (not “OWASP Top 10”—your stack)
Certifications are fine, but don’t over-index on them. What matters is whether the team consistently produces validated findings and understands modern stacks (SSO, JWT, OAuth flows, cloud IAM, CI/CD, Kubernetes, modern web frameworks, etc.).
Evaluate methodology (how they test)
Good vendors can articulate how their testing adapts. Look for:
- Manual verification: they don’t dump unverified scanner output on you.
- Exploitation and chaining: they prove impact (within safe rules of engagement).
- Attack-path thinking: they map how a bug becomes a breach, not just “bug exists.”
- Modern auth expertise: identity and session flaws are where a lot of real risk hides.
If you want a quick gut check, ask:
“How do you decide what to try next when the obvious things don’t work?”
Teams that only have a playbook will struggle here.
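To make “issues scanners won’t catch” concrete, here is a hedged sketch of the kind of manual auth check a tester might try: forging an unsigned JWT with `"alg": "none"` to probe whether a server actually verifies signatures. The `naive_decode` function below is a deliberately flawed verifier written for illustration; real JWT libraries should reject the `none` algorithm unless it is explicitly enabled.

```python
import base64
import json

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

# Forge an unsigned token claiming an admin identity. A vulnerable
# verifier that honors the token's own "alg" header will accept it.
header = b64url(json.dumps({"alg": "none", "typ": "JWT"}).encode())
payload = b64url(json.dumps({"sub": "admin", "role": "admin"}).encode())
forged = f"{header}.{payload}."  # empty signature segment

def naive_decode(token: str) -> dict:
    """Deliberately flawed: trusts the attacker-controlled alg header."""
    head, body, _sig = token.split(".")
    pad = lambda s: s + "=" * (-len(s) % 4)
    if json.loads(base64.urlsafe_b64decode(pad(head)))["alg"] == "none":
        return json.loads(base64.urlsafe_b64decode(pad(body)))
    raise ValueError("signature verification required")

print(naive_decode(forged))  # forged claims come back unverified
```

A scanner sees a well-formed token and moves on; a human asks “what happens if I change the header?” That question, applied to your sessions, roles, and workflows, is what you are paying for.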
Ask what “evidence” looks like
For a finding to be actionable, it needs enough detail to reproduce. Ask whether reports include:
- The exact request/response evidence (or step-by-step reproduction)
- Screenshots only where helpful (not as a substitute for technical proof)
- Clear impact statements (“what can an attacker do?”)
- Severity that matches your risk model, not a generic CVSS-only approach
2. Reporting Quality: Can Your Engineers Fix This Without a 2-Week Back-and-Forth?
An engagement succeeds or fails on the report. The best technical testing in the world still creates zero security value if the output is confusing, overly generic, or impossible to reproduce.
What a strong report looks like
- A short executive summary that prioritizes the few things that truly matter.
- Developer-ready findings with reproduction steps, evidence, and practical fixes.
- Prioritization by exploitability and business impact, not “everything is high.”
- Clear remediation guidance that fits your tech stack, not copy-pasted generic advice.
- A clear timeline and communication artifacts (what was tested, when, what was blocked).
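“Prioritization by exploitability and business impact” can be as simple as ranking on two axes instead of a raw CVSS number. The sketch below is illustrative only: the finding titles, scores, and weights are invented for this example, not a standard model.

```python
# Hypothetical findings: a scary-sounding CVSS score on a low-exposure
# asset vs. a modest CVSS score on an unauthenticated customer-data path.
FINDINGS = [
    {"title": "SQLi in internal admin tool", "cvss": 9.8,
     "exploitability": 2, "business_impact": 1},   # auth'd, low-value data
    {"title": "IDOR exposing customer invoices", "cvss": 6.5,
     "exploitability": 5, "business_impact": 5},   # unauth'd, customer data
]

def priority(finding: dict) -> int:
    # Product of the two axes: a finding only ranks first when it is
    # both realistically exploitable and damaging to the business.
    return finding["exploitability"] * finding["business_impact"]

ranked = sorted(FINDINGS, key=priority, reverse=True)
for f in ranked:
    print(f"{priority(f):>2}  (CVSS {f['cvss']})  {f['title']}")
```

Under this model the IDOR outranks the higher-CVSS SQLi, which is exactly the re-ordering a CVSS-only report would miss. A good vendor should be able to walk you through their equivalent of this reasoning for each finding.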
The fastest way to assess report quality
Ask for a sanitized sample report and review one or two findings with an engineer on your team. You’re looking for:
- Is it immediately reproducible?
- Does it explain “why this matters” in plain language?
- Does it propose a fix you’d actually ship?
- Does it avoid busywork (e.g., “add more logging”) as the primary remediation?
Communication during the test
Agree up front on:
- Real-time notification for critical findings (and what qualifies as “critical”)
- A channel for blockers (access, environment issues, scope questions)
- A readout / walkthrough that includes the actual testers
If you’ve done pen tests before, you know the failure mode: a vendor disappears for two weeks, then drops a PDF on a Friday.
3. Scope & Engagement Design: The Contract Shapes the Outcome
Scope isn’t just a boundary; it determines whether you get depth or a superficial pass.
Avoid “scope by asset list” only
An SOW that is just “these URLs and IPs” can miss the real risk. Encourage scope that reflects how you actually get breached:
- User roles and permission boundaries
- Critical workflows (payments, auth flows, admin actions)
- Sensitive data stores and paths to reach them
- Integration points (SSO, webhooks, third parties)
- Cloud/IAM pivot potential (where applicable)
If you’re unsure how to prepare scope and access, reuse the checklist in How to prepare for a penetration test.
Clarify what type of test you’re buying
Vendors use different words for similar things. Nail down what you want:
- Web app / API testing (manual testing of business logic, auth, data access controls)
- Cloud testing (IAM, permissions, misconfigurations, attack paths) via cloud penetration testing
- Adversary simulation / red team (objective-based, stealthy, broader) via red teaming
If a vendor claims they do all of these “equally well,” ask how they staff each one.
Retesting and “fix validation”
Retesting is where you actually reduce risk. Ask:
- Is retesting included? If so, what’s the time window?
- Do you retest only fixed items, or do you also validate adjacent attack paths?
- What evidence is provided to confirm the fix (not just “closed”)?
4. Trust, Safety, and Data Handling: Don’t Treat This as Boilerplate
You are granting a third party deep access to your systems. Treat vendor operational maturity as part of the evaluation.
Minimum expectations
- Clear rules of engagement (what will/won’t be done, test hours, no-go systems)
- Secure data handling (where evidence is stored, how long it’s retained, who can access it)
- Clear escalation process for critical findings
- Jurisdiction / contract clarity that matches your risk/compliance needs
Ask a blunt question:
“Where will my data live, who can access it, and when is it deleted?”
If the answer is fuzzy, that’s a red flag.
Red Flags: How “Pen Tests” Quietly Become Useless
Not every red flag is malicious—some are just misaligned incentives. But they all lead to the same outcome: low signal.
- They won’t let you talk to the testers.
- They can’t explain their methodology beyond buzzwords.
- They won’t share a sample report (even sanitized).
- They over-promise speed (“full app pen test in 2 days”) without explaining trade-offs.
- They sell you volume (huge scope) without confirming engineering bandwidth to remediate.
- They push scanners as “pen testing.”
- Everything is rated high (or everything is rated medium). Both usually indicate weak prioritization.
- They avoid discussing retesting because “that’s a separate project.”
Questions to Ask on the Sales Call (Copy/Paste)
Technical & staffing
- Who will lead the engagement and how senior are they?
- Will we have direct access to the testers during and after the test?
- What does your testing methodology look like for our stack?
- Do you validate findings manually and prove exploitability where safe?
- How do you handle auth/session/business-logic testing?
Deliverables & communication
- Can you share a sanitized sample report?
- What does “critical finding” escalation look like?
- Do we get a readout with the testers?
- What does remediation guidance include (code-level suggestions, config, compensating controls)?
Scope & logistics
- What do you need from us on day 1 to avoid blockers?
- How do you handle scope changes mid-engagement?
- Is retesting included, and what’s the window?
- How do you handle sensitive data in evidence?
Pricing & fit
- What is explicitly out of scope at this price?
- What would cause a change order?
- What does a “small” engagement look like for a startup vs an enterprise?
Quick Procurement Checklist
Use this to compare vendors consistently.
- People: Named lead tester, senior reviewer, and clear communication path.
- Method: Manual testing + validation + chaining (where safe) documented.
- Report: Sample report reviewed; developer-ready reproduction and remediation.
- Process: Kickoff, in-test comms, critical escalation, and readout defined.
- Retest: Included terms and timing.
- Safety: Rules of engagement + data retention/deletion policies.
- Fit: Scope aligns to your outcomes (not just an asset list).
Conclusion
The best security companies act like an extension of your engineering team. They don’t just find problems; they prove impact, help you prioritize, and give you fixes you can actually ship.
If you’re looking for a boutique partner focused on depth, communication, and actionable outcomes, take a look at our penetration testing services (including red teaming and cloud penetration testing), or reach out to us to discuss scope and timing.