Penetration Testing: What It Is, What to Expect, and How to Get Value From It

The short answer: A penetration test is a structured, authorised attempt to exploit your systems using the same techniques a real attacker would use. Done well, it finds vulnerabilities that automated scanners consistently miss, including business logic flaws, chained attacks, and privilege escalation paths, and produces actionable evidence that drives real security improvements. Done poorly, it produces a list of scanner output with CVSS scores that no one acts on. The difference is mostly in how you scope and manage the engagement.

What Penetration Testing Is (And What It Isn't)

A penetration test is a manual, skilled security assessment conducted by trained engineers. It is not:

A vulnerability scan with a human watching
A checklist exercise that produces a compliance certificate
A guarantee that no vulnerabilities remain after the test

A penetration test has defined scope, defined duration, and produces evidence-based findings, meaning the tester has demonstrated that the vulnerability is exploitable, not just that it theoretically exists. This distinction matters enormously when prioritising remediation.

Types of Penetration Test

By Knowledge Level

Black box: The tester has no prior knowledge of the system, only a target (URL, IP range, or application name). This simulates an external attacker with no insider knowledge. Black box testing is the most realistic simulation of an opportunistic external attack but covers less ground per day of testing than approaches with more context.

Grey box: The tester has partial knowledge, typically user-level credentials and some documentation. This is the most common approach for web application testing because it allows meaningful testing of authenticated functionality without spending test time on reconnaissance that is less relevant to the risk profile.

White box: The tester has full access to source code, architecture documentation, infrastructure diagrams, and admin credentials. White box testing is the most thorough and efficient use of testing time, allowing the tester to find vulnerabilities that would take weeks to discover from the outside. It is particularly valuable for identifying source-level vulnerabilities and complex multi-component attack paths.

By Target

Web application penetration testing: The most common type. Tests web applications for OWASP Top 10 vulnerabilities, authentication flaws, authorisation issues, session management, business logic, and client-side security.

API penetration testing: Tests REST, GraphQL, and SOAP APIs. Focuses on the OWASP API Security Top 10, particularly Broken Object Level Authorization (BOLA), broken authentication, and excessive data exposure. Often conducted as part of web application testing but increasingly scoped separately.

Infrastructure / network penetration testing: Tests network infrastructure, internal systems, and cloud configuration. Identifies misconfigured services, unpatched systems, weak credentials, and lateral movement opportunities.

Mobile application penetration testing: Tests iOS and Android apps for insecure data storage, weak cryptography, insecure communication, and improper authentication. Requires device access and specialist tooling (Frida, objection, apktool).

How to Scope a Penetration Test Properly

Poor scoping is the most common reason penetration tests fail to deliver value. Over-scoped tests spread effort too thin. Under-scoped tests miss the highest-risk components.

Define the target clearly. List specific URLs, IP ranges, API endpoints, or application modules in scope. Define what is explicitly out of scope (production databases, third-party services, infrastructure owned by others).
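A written scope can also be expressed as data, so that tooling refuses any target that is not explicitly listed. The following is a minimal sketch, not a standard tool: the hostnames, the IP range, and the function name is_in_scope are placeholders invented for illustration.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical scope definition: every host and range here is a placeholder.
IN_SCOPE_HOSTS = {"app.example.com", "api.example.com"}
IN_SCOPE_NETWORKS = [ipaddress.ip_network("10.20.0.0/24")]
OUT_OF_SCOPE_HOSTS = {"payments.thirdparty.example"}


def is_in_scope(target: str) -> bool:
    """Return True only if the target is explicitly listed as in scope."""
    # Accept either a full URL or a bare hostname / IP address.
    host = urlparse(target).hostname if "://" in target else target
    if host is None or host in OUT_OF_SCOPE_HOSTS:
        return False
    if host in IN_SCOPE_HOSTS:
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # An unlisted hostname is treated as out of scope by default.
        return False
    return any(address in network for network in IN_SCOPE_NETWORKS)


if __name__ == "__main__":
    for target in ("https://app.example.com/login", "10.20.0.15", "10.21.0.1"):
        print(target, is_in_scope(target))
```

The default-deny behaviour is the design choice that matters here: anything not named in the scope is treated as out of scope, which mirrors how the written agreement should be read.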

Specify the testing approach. Black, grey, or white box, and which credentials or documentation the tester will have.

Define the threat model. Who are you trying to defend against? An external attacker with no prior access? A malicious insider? A compromised third-party vendor? The threat model should drive what the tester focuses on.

Set realistic time allocation. A web application with 50+ endpoints cannot be thoroughly tested in one day. As a rough guide, plan for 3–5 days for a medium-complexity web application, 5–10 days for a large application or API, and 5–10 days for an internal network assessment.

Agree on rules of engagement. Define permitted testing techniques, what constitutes a stop condition (actual data exfiltration, service disruption), out-of-hours testing permissions, and who to contact if a critical vulnerability is found during testing.

Understanding the Findings Report

A professional penetration test report contains:

Executive summary: A non-technical overview of the engagement, key findings, and overall risk rating suitable for senior stakeholders.

Findings list with CVSS scores: Each finding is rated using the Common Vulnerability Scoring System (CVSS), which produces a score from 0–10 indicating severity. Critical (9.0–10.0) and High (7.0–8.9) findings require immediate attention. Medium (4.0–6.9) and Low (0.1–3.9) findings should be addressed in your normal remediation cycle.
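The severity bands above can be sketched as a small lookup. The function name cvss_severity is an assumption made for this example, and the "None" rating for a score of 0.0 comes from the CVSS qualitative rating scale rather than from the bands listed above.

```python
def cvss_severity(score: float) -> str:
    """Map a CVSS score (0.0-10.0) to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"       # 0.1-3.9
    if score < 7.0:
        return "Medium"    # 4.0-6.9
    if score < 9.0:
        return "High"      # 7.0-8.9
    return "Critical"      # 9.0-10.0


if __name__ == "__main__":
    for score in (9.8, 7.0, 6.9, 3.9):
        print(score, cvss_severity(score))
```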

Evidence of exploitation: For each finding, the report should include screenshots, request/response examples, or other evidence demonstrating that the vulnerability is actually exploitable, not just theoretically possible. If a finding lacks exploitation evidence, push back and ask for it.

Remediation guidance: Specific technical guidance on how to fix each finding. Generic recommendations like "implement input validation" are insufficient. You want specific guidance tied to your technology stack.

Getting Value From the Results

The penetration test report is the beginning of the work, not the end. Most of the value comes from what happens after.

Triage by actual risk, not just CVSS score. A CVSS base score rates the vulnerability in isolation. Your actual risk depends on context: how exposed is the endpoint? What data could be accessed? What compensating controls exist? A medium-CVSS finding on a public-facing payment endpoint may be higher priority than a high-CVSS finding on an internal admin panel with restricted access.
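One way to make this kind of triage explicit is to adjust the CVSS score with a few context flags. The sketch below is illustrative only: the Finding fields and the +2.0 / -2.0 adjustments are arbitrary assumptions, not part of CVSS or any standard, and would need tuning to your own environment.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    title: str
    cvss: float
    internet_facing: bool        # how exposed is the endpoint?
    sensitive_data: bool         # what data could be accessed?
    compensating_controls: bool  # e.g. restricted network access


def contextual_priority(finding: Finding) -> float:
    """Adjust the CVSS score using context. The weights are arbitrary."""
    score = finding.cvss
    if finding.internet_facing:
        score += 2.0
    if finding.sensitive_data:
        score += 2.0
    if finding.compensating_controls:
        score -= 2.0
    return score


# Hypothetical findings mirroring the example in the text.
findings = [
    Finding("Auth bypass on internal admin panel", 8.0, False, False, True),
    Finding("IDOR on public payment endpoint", 6.5, True, True, False),
]

ranked = sorted(findings, key=contextual_priority, reverse=True)

if __name__ == "__main__":
    for finding in ranked:
        print(contextual_priority(finding), finding.title)
```

With these assumed weights, the medium-CVSS payment finding ranks above the high-CVSS internal one, which is the reordering the paragraph above describes.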

Fix, retest, verify. For critical and high findings, fix the issue and request a retest within the engagement window. A penetration test without retest verification is a finding list, not a security improvement.

Use findings to drive systemic changes. Penetration test findings are symptoms. The same SQL injection pattern appearing in three endpoints points to a systemic cause, such as a gap in developer training or the absence of a shared, safe data-access pattern, not three isolated bugs. Use findings to identify training needs, process gaps, and architectural issues that need addressing at the root.
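Spotting repeated patterns can be as simple as counting findings per weakness class. This is a rough sketch: the findings list is made-up sample data, and the threshold of three occurrences is an assumption chosen to match the example in the text.

```python
from collections import Counter

# Hypothetical findings as (weakness class, affected endpoint) pairs.
findings = [
    ("SQL injection", "/search"),
    ("SQL injection", "/orders"),
    ("SQL injection", "/reports"),
    ("Missing rate limiting", "/login"),
]


def systemic_patterns(findings, threshold=3):
    """Return weakness classes that occur at least `threshold` times."""
    counts = Counter(category for category, _endpoint in findings)
    return {category: n for category, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    print(systemic_patterns(findings))
```

Any class that clears the threshold is a candidate for a root-cause fix rather than a set of individual patches.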

Track over time. Run penetration tests annually at minimum, comparing findings between engagements. The goal is not zero findings (which is unrealistic for any complex application) but a decreasing trend in critical and high findings over time.
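Comparing engagements can be reduced to tracking the combined count of critical and high findings. The sketch below uses invented counts for three engagements, and the function names are assumptions made for illustration.

```python
# Hypothetical severity counts from three engagements, oldest first.
engagements = [
    ("Year 1", {"Critical": 2, "High": 5, "Medium": 9}),
    ("Year 2", {"Critical": 1, "High": 4, "Medium": 11}),
    ("Year 3", {"Critical": 0, "High": 2, "Medium": 8}),
]


def critical_high_totals(engagements):
    """Sum critical and high findings for each engagement, in order."""
    return [
        counts.get("Critical", 0) + counts.get("High", 0)
        for _label, counts in engagements
    ]


def is_decreasing(totals):
    """True if every engagement has fewer findings than the one before."""
    return all(later < earlier for earlier, later in zip(totals, totals[1:]))


if __name__ == "__main__":
    totals = critical_high_totals(engagements)
    print(totals, is_decreasing(totals))
```

Medium and low findings are deliberately ignored here, since the stated goal is a falling trend in critical and high findings rather than zero findings overall.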

Key Takeaways

Penetration testing is manual, skilled, adversarial assessment, not automated scanning
Grey box testing is the most common approach for web applications; white box is the most thorough and the most efficient use of testing time
Scope precisely: undefined scope produces unfocused testing and weak findings
Demand exploitation evidence for every finding: a vulnerability that has not been shown to be exploitable is hard to prioritise for remediation
The report is the start of the work: triage by real risk, retest fixes, and use findings to drive systemic improvements