What Happens During a Penetration Test?

From the first scoping call through to the final retest, a practical look at what a modern pen test actually involves.

Most organisations know they should be getting a penetration test. Fewer have a clear picture of what actually happens once the engagement starts, and that gap tends to cost them in wasted budget, missed remediation work, and a lingering sense that they paid for something they never quite understood.

The numbers make the case for doing it properly. Vonahi Security’s 2025 Top Pentest Findings Report, based on more than 50,000 automated internal network tests, attributed 50% of critical findings to misconfigurations and 30% to missing patches. Astra’s 2025 State of Continuous Pentesting flagged an 83% jump in critical vulnerabilities year-over-year. And Pentera’s 2025 State of Pentesting Report found that 96% of organisations change their IT environment at least quarterly, which means annual-only testing can leave new exposures unchecked for most of the year.

To get real value from a pen test, though, the engagement has to be set up properly. What follows is a walkthrough of what actually happens during a modern pen test, from the first scoping call through to the final retest. The seven-phase structure draws on the Penetration Testing Execution Standard (PTES), NIST SP 800-115, and the OWASP Testing Guide, each of which divides the work slightly differently.

A pen test vs. a vulnerability scan

A pen test is an authorised, manual attack against specific systems. The goal is to prove which weaknesses can actually be exploited, and show what an attacker could do once they are in. A vulnerability scan does something useful, but it is not the same thing. Three differences matter most:

  • A scan flags issues that might exist. A pen test shows which ones can actually be exploited.
  • A scan gives you a spreadsheet. A pen test gives you a story: how the attacker got in and how far they got.
  • A scan is finished in minutes. A pen test takes days or weeks of hands-on work.

Pen tests are usually described by two things. The first is scope: external network, internal network, web application, API, mobile, cloud, wireless, physical, social engineering, or a full red team. The second is how much the tester knows going in: black box (nothing), grey box (some information, often credentials for a regular user), or white box (full architectural access). Grey box is the most common choice for commercial work because it strikes a sensible balance between realism and getting useful results within a reasonable timeframe.

BEFORE THE FIRST PAYLOAD: PRE-ENGAGEMENT

Every legitimate pen test starts with paperwork, not payloads. This phase locks down the scope (which IPs, domains, applications, and accounts are in play), rules of engagement (what is off-limits, what hours testing can happen, how fragile systems are handled), written authorisation, objectives, a plan for who to call when something breaks, and rules for how findings are stored and transmitted.
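
One way to make an agreed scope enforceable is to encode it, so every target is checked before anything is sent to it. The sketch below is a minimal illustration rather than a standard tool: the ranges are reserved documentation addresses, and `IN_SCOPE`, `OUT_OF_SCOPE`, and `is_in_scope` are hypothetical names chosen for the example.

```python
import ipaddress

# Hypothetical scope agreed during pre-engagement (documentation-only ranges).
IN_SCOPE = ["203.0.113.0/24", "198.51.100.16/28"]
OUT_OF_SCOPE = ["203.0.113.200/32"]  # e.g. a fragile legacy system


def is_in_scope(target: str) -> bool:
    """Return True only if target is inside an agreed range and not excluded."""
    addr = ipaddress.ip_address(target)
    excluded = any(addr in ipaddress.ip_network(net) for net in OUT_OF_SCOPE)
    included = any(addr in ipaddress.ip_network(net) for net in IN_SCOPE)
    return included and not excluded


if __name__ == "__main__":
    for ip in ["203.0.113.10", "203.0.113.200", "192.0.2.5"]:
        print(ip, "in scope" if is_in_scope(ip) else "OUT OF SCOPE")
```

Exclusions win over inclusions here, which matches the usual intent of a rules-of-engagement document: anything marked off-limits stays off-limits even when it sits inside an approved range.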

According to Pentera’s 2025 State of Pentesting Report, only 29% of organisations now commission penetration testing primarily for compliance. Most are testing to validate their security controls, help prioritise where to invest, or work out what a real attack would actually cost them. The scoping conversation should reflect those goals. For many organisations, it is also where they assess whether their provider understands the business context behind the test, not just the technical checklist.

The seven phases: a step-by-step walkthrough

The phases below describe a typical engagement. Individual providers may combine or split steps, but the underlying flow (recon, scan, analyse, exploit, escalate, report, retest) is remarkably consistent across frameworks.

PHASE 1: Reconnaissance (Intelligence Gathering)

Before any packets hit your network, the tester spends time putting together a picture of your organisation. There are two flavours of this work: passive and active.

Passive reconnaissance uses only public sources: WHOIS records, DNS data, Shodan, Censys, social media, leaked credential databases, regulatory filings, GitHub repositories. No traffic touches your systems at this stage. Active reconnaissance then starts lightly probing the infrastructure itself to identify live hosts and services.

What testers care about at this point: employees (names, roles, email formats), technology stack, exposed assets, third-party dependencies, and anything that may have leaked in a prior breach. On external engagements this is where real attackers spend most of their time, and it is often where defenders have the least visibility.

PHASE 2: Scanning and Enumeration

Now the tester maps out the attack surface in more detail. Tools like Nmap, Masscan, Nessus, and Burp Suite are used to identify open ports, running services, software versions, network paths and segmentation, web application endpoints, authentication mechanisms, and session handling.
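
To give a feel for what "identifying open ports" means at its simplest, here is a minimal TCP connect check using only Python's standard library. It is a teaching sketch, not a substitute for the tools above, which do far more (service fingerprinting, timing control, and so on). The function names are illustrative, and it should only ever be pointed at hosts you are authorised in writing to test.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Attempt a full TCP connection; True means something accepted it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def scan(host: str, ports: list[int]) -> list[int]:
    """Return the subset of ports that accepted a connection."""
    return [p for p in ports if is_port_open(host, p)]


if __name__ == "__main__":
    # Localhost only in this example; real targets must be in the agreed scope.
    print(scan("127.0.0.1", [22, 80, 443]))
```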

Enumeration takes scanning a step further. It is where testers coax services into giving up usernames, file shares, directory structures, and configuration details. A lot of the most damaging findings start here, because misconfigurations and outdated services sit behind most critical findings: Vonahi Security’s 2025 analysis of more than 50,000 automated internal network tests put 50% of critical findings down to misconfigurations and 30% to missing patches.

PHASE 3: Vulnerability Analysis and Threat Modelling

Raw scanner output is noisy. A good tester will not just hand over the scanner’s list. They validate each finding, look at how the issues relate to one another, and rank them by real risk. Threat modelling asks a better question: given this target’s business and the way its environment is set up, what attack paths would a real adversary actually go after? In that sense, this phase supports a broader cybersecurity risk assessment by showing which weaknesses matter most in practice, not just on paper.

The output of this phase is usually a working hypothesis. Something along the lines of: we think we can reach the customer database by chaining A, then B, then C. That hypothesis is what the next phase tries to prove.
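
That "A, then B, then C" hypothesis can be pictured as a path through a graph, where each edge is a validated weakness that moves the tester from one position to the next. The sketch below uses a breadth-first search over a made-up graph. The node names and the `find_attack_path` helper are hypothetical and not taken from any particular framework.

```python
from collections import deque

# Hypothetical findings: each edge means "from this position, a validated
# weakness lets the tester reach the next position".
ATTACK_GRAPH = {
    "internet": ["web server"],
    "web server": ["app service account"],
    "app service account": ["file share", "customer database"],
    "file share": [],
    "customer database": [],
}


def find_attack_path(graph, start, goal):
    """Breadth-first search: returns the shortest chain of steps, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    print(find_attack_path(ATTACK_GRAPH, "internet", "customer database"))
```

If the search returns no path, the hypothesis does not hold on the evidence gathered so far, and the tester goes back to enumeration rather than forward to exploitation.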

PHASE 4: Exploitation

This is the phase most people picture when they hear the words pen test, and it is often the shortest. Testers try to exploit vulnerabilities they have already validated, so they can prove those issues are real rather than theoretical. The techniques vary: credential attacks (password spraying, hash cracking, Kerberoasting), web application exploitation (injection, broken access control, server-side request forgery), network-layer attacks (mDNS, NBNS, and LLMNR spoofing, NTLM relay), and social engineering where it is in scope.

Vonahi’s 2025 Top Pentest Findings Report found mDNS spoofing in 60.5% of environments tested, NBNS spoofing in 57.9%, and LLMNR spoofing in 52.0%. Exploitation is the phase where those weaknesses stop being theoretical risks on paper and become something the tester can actually demonstrate in front of your team.

One important point: exploitation is done carefully. Production systems are fragile. A responsible tester will not run denial-of-service attacks, destructive payloads, or anything that could corrupt data, unless that has been specifically agreed in scope.

PHASE 5: Post-Exploitation

Getting in is the start, not the end. Post-exploitation answers a more important business question: what can an attacker actually do with that foothold? The work at this stage covers privilege escalation (moving from a regular user account to admin), lateral movement (pivoting from one compromised system to others), persistence (simulating how an attacker would keep access across reboots), and data access (showing that sensitive data, such as customer records, intellectual property, or credentials, was reachable).

This is where the real risk story usually shows up. Testers will often chain several low-severity issues together into a single high-impact path. Astra’s 2025 State of Continuous Pentesting reported an 83% year-over-year rise in critical vulnerabilities, many of them emerging from low-severity issues that became the launchpad for deeper access once chained together. In practice, the risk rarely comes from one single critical bug. It comes from the combination.

PHASE 6: Reporting

The report is the deliverable. A good one is written for two audiences at the same time.

Executives want an executive summary: business impact in plain language, an overall picture of the risk posture, and a short list of prioritised recommendations. Engineers need the technical detail: each finding with reproduction steps, evidence (screenshots, request and response pairs, hashes), a CVSS or equivalent risk rating, and guidance on how to fix it. That guidance matters most where remediation needs input from a network architect to address underlying design or segmentation weaknesses.
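
As a rough illustration of the engineer-facing half of a report, a finding can be modelled as a small record and ordered by its rating. The severity bands below follow the CVSS v3.x qualitative scale. The `Finding` fields and the `prioritise` helper are assumptions made for this sketch, not a standard report schema, and ordering by raw score is only one of several reasonable approaches.

```python
from dataclasses import dataclass, field


def cvss_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


@dataclass
class Finding:
    # Hypothetical fields mirroring what the report is described as containing.
    title: str
    cvss: float
    reproduction_steps: list[str]
    evidence: list[str] = field(default_factory=list)
    remediation: str = ""

    @property
    def severity(self) -> str:
        return cvss_severity(self.cvss)


def prioritise(findings: list[Finding]) -> list[Finding]:
    """Highest score first, one simple way to order the findings section."""
    return sorted(findings, key=lambda f: f.cvss, reverse=True)
```

A real report would usually adjust this ordering for business context, which is exactly what the threat modelling in Phase 3 is for.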

Pentera’s 2025 data shows 62% of organisations immediately pass findings to their IT security teams for remediation, 47% share the results with executives or senior management, and 21% hand findings to regulators or the board. The report has to work for all of those readers at once.

PHASE 7: Remediation and Retesting

A pen test without a retest is homework without a grade. Once the fixes are in place, the tester runs the original exploits again to check that the issue is actually closed, and that the fix has not just moved the problem somewhere else.

Remediation timelines are longer than most organisations like to admit. Edgescan’s 2025 data puts the mean time to remediate critical and high-severity application vulnerabilities at 74.3 days, and 54.8 days for network and device vulnerabilities. Retest windows range from 30 days (the HackerOne standard) up to 90-day structured remediation programmes used by more mature security teams. Whatever the window, the work is not finished until someone has verified the fixes.
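
If you want to track these numbers for your own findings, the arithmetic is simple. The sketch below computes a mean time to remediate from found and fixed dates and flags open findings that have outlived a retest window. The 90-day default, the dates, and the function names are all illustrative assumptions.

```python
from datetime import date
from statistics import mean

RETEST_WINDOW_DAYS = 90  # hypothetical; use whatever window was agreed


def days_to_remediate(found: date, fixed: date) -> int:
    """Whole days between discovery and the verified fix."""
    return (fixed - found).days


def mean_time_to_remediate(records):
    """records: iterable of (found, fixed) date pairs for closed findings."""
    return mean(days_to_remediate(found, fixed) for found, fixed in records)


def overdue(found: date, today: date, window: int = RETEST_WINDOW_DAYS) -> bool:
    """True if an open finding has outlived the agreed retest window."""
    return (today - found).days > window


if __name__ == "__main__":
    closed = [
        (date(2025, 1, 6), date(2025, 2, 5)),  # fixed after 30 days
        (date(2025, 1, 6), date(2025, 4, 6)),  # fixed after 90 days
    ]
    print(mean_time_to_remediate(closed))  # 60
```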

COMMON MISTAKE: TREATING THE REPORT AS THE FINISH LINE

A frequent cause of failed pen test engagements is treating the delivery of the report as the finish line. In practice, the report is closer to the halfway mark. Verizon’s 2025 Data Breach Investigations Report found only around 54% of edge-device and VPN vulnerabilities were fully remediated, with a median of 32 days to remediate. If you pay for a test but have not planned time and people for the remediation and retest work, most of the value stays on the page.

What Pen Testers Actually Find

Vonahi Security’s 2025 Top Pentest Findings Report, based on more than 50,000 automated internal network tests, shows how consistent the top critical findings have become. The three most common (mDNS, NBNS, and LLMNR spoofing) are all protocol-level name-resolution spoofing attacks. These are not patchable CVEs. They are configuration weaknesses that most scanners flag only as informational, but that attackers use to steal credentials and move laterally across the network.

WHAT THIS PATTERN TELLS YOU

These are not complex, zero-day exploits. They are protocol and configuration weaknesses that can be fixed with registry changes, group policy settings, and disabling legacy services no one uses anymore. Vonahi’s own analysis attributes 50% of critical findings to misconfigurations and 30% to missing patches, which means most organisations are not being compromised in pen tests because the attackers are unusually clever. They are being compromised because the defenders never closed the easy doors. That is actually good news. The work is known, not mysterious.
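
To show how checkable this kind of hardening is, the sketch below compares an exported set of host settings against target values. Treat the specifics as assumptions: the setting names and values follow commonly cited Windows hardening guidance for LLMNR, mDNS, and NetBIOS name resolution, and should be confirmed against current Microsoft documentation before anyone relies on them. The `observed` dictionary stands in for whatever your audit tooling actually exports.

```python
# Target values for name-resolution hardening. These names and values are
# assumptions based on commonly cited guidance; verify before relying on them.
HARDENED = {
    "EnableMulticast": 0,  # LLMNR off (DNS client policy setting)
    "EnableMDNS": 0,       # mDNS off (DNS cache service parameter)
    "NetbiosOptions": 2,   # NetBIOS over TCP/IP off (set per interface)
}


def unhardened(observed: dict) -> list[str]:
    """Return the settings whose observed value differs from the target.

    A setting missing from `observed` counts as not hardened.
    """
    return [name for name, want in HARDENED.items() if observed.get(name) != want]


if __name__ == "__main__":
    # Hypothetical export from one host: LLMNR is off, the rest is not.
    host = {"EnableMulticast": 0, "NetbiosOptions": 0}
    print(unhardened(host))  # ['EnableMDNS', 'NetbiosOptions']
```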

How Often Should You Test?

PCI DSS requires at least one penetration test a year, plus another after any significant change. A proposed 2025 update to HIPAA pushes toward mandatory annual testing as well. That cadence is a floor, though, not a target to aim for.

Pentera’s 2025 research found 96% of organisations change their IT environment at least every quarter. Against that kind of change rate, testing once a year leaves long periods where new issues simply build up unchecked. That is why continuous or quarterly testing, often delivered as Penetration Testing as a Service (PTaaS), is becoming more common. Bright Defense’s 2026 stats compilation found over 70% of firms now use PTaaS, and 85% have increased their pen test spending in the past year.
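
The effect of cadence is easy to put numbers on. The sketch below takes a list of change dates and a list of test dates and works out how long each change sat untested. The dates and the `untested_days` helper are made up for illustration.

```python
from datetime import date


def untested_days(change_dates, test_dates, today):
    """For each change, days until the next test (or until today if none yet)."""
    gaps = {}
    for changed in change_dates:
        later_tests = [t for t in test_dates if t >= changed]
        checked = min(later_tests) if later_tests else today
        gaps[changed] = (checked - changed).days
    return gaps


if __name__ == "__main__":
    # Hypothetical year: one significant change per quarter.
    changes = [date(2025, 1, 15), date(2025, 4, 15),
               date(2025, 7, 15), date(2025, 10, 15)]
    annual = [date(2025, 12, 1)]
    quarterly = [date(2025, 3, 1), date(2025, 6, 1),
                 date(2025, 9, 1), date(2025, 12, 1)]
    today = date(2025, 12, 31)

    print(max(untested_days(changes, annual, today).values()))     # 320
    print(max(untested_days(changes, quarterly, today).values()))  # 48
```

With these example dates, the January change waits 320 days for the annual test, while under quarterly testing no change waits more than 48 days.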

How Armour Approaches Penetration Testing

At Armour Cybersecurity, we scope every engagement around a business question, not a tool list. We work to PTES, NIST SP 800-115, and OWASP, test manually where that matters, and keep retesting until the fix holds. Every report is written for two readers at the same time, the CISO and the engineer, because a risk rarely gets fixed if the people who need to fix it do not really understand it.

If you are about to commission your first pen test, or thinking about changing providers, the most useful place to start is the scoping conversation. Many organisations comparing security-as-a-service providers use that conversation to judge whether the engagement will be shaped around real business risk rather than a generic testing checklist. What are you actually trying to protect? What would a successful attack really cost you? And how will you know the fix has worked?


Frequently Asked Questions

Q.  How long does a penetration test take?

A focused web application or external network test typically takes one to two weeks of active testing, plus another week to write up the report. Larger internal, cloud, or red team engagements normally run between four and eight weeks.

Q.  Will the test break anything?

A responsible tester will not run destructive payloads, and they will coordinate with you in real time on anything that carries meaningful risk. Any systems known to be fragile are flagged during pre-engagement and either taken out of scope or handled with extra care.

Q.  Do I need a pen test if I already run vulnerability scans?

Yes. Scanners produce a list of potential issues. A pen test shows which of those issues can actually be exploited, and how far an attacker could get once in. Manual, human-led testing typically finds issues that automated scans miss entirely, particularly in APIs, cloud configurations, and chained exploit paths.

Q.  Black box, grey box, or white box: which is best?

White box is usually the most thorough. Black box is the most realistic. Grey box sits in the middle (the tester gets partial information, such as a standard user account) and for most commercial engagements it tends to offer the best balance of cost, realism, and coverage.

Q.  Is one test a year enough?

For most environments, no. If your infrastructure, applications, or third-party integrations change faster than your test schedule, you are effectively flying blind between engagements. Continuous or quarterly testing is closing that gap for a growing number of mid-market and enterprise organisations.

Q.  What should a good pen test report contain?

It should have an executive summary written in plain language, a methodology and scope section, detailed findings with reproduction steps, evidence, and risk ratings, prioritised remediation guidance with clear ownership, and a retest schedule. The appendices should include a change log and the artefacts (payloads, screenshots) that back up each finding.

