Introduction
Building a recovery program with a team of five is not a thought experiment. It is the reality for most security leaders outside the Fortune 500. You have a small team, a constrained budget, and a board that wants to know you can survive a ransomware hit without paying the ransom. The pressure is real and the margin for error is thin.
Most recovery frameworks were designed by consultants who have never had to justify a $40,000 tabletop exercise to a CFO who thinks backups are an IT problem. NIST SP 800-61 and the NIST Cybersecurity Framework's Recover function are solid foundations, but they assume you have people to staff the roles they describe. With five people, you are the incident commander, the communications lead, and the post-incident reviewer. Sometimes on the same day.
This article is about building a recovery program that actually works at your scale. Not a program that looks good in a board deck and falls apart the first time a storage admin accidentally deletes a production database. The goal is a program that degrades gracefully under pressure, recovers faster than your business loses money, and gives your board a metric they can actually understand.
Define Recovery Before You Build Anything
Most teams skip this step. They buy a backup tool, write a runbook, and call it a recovery program. Then an incident happens and nobody agrees on what "recovered" means. Is it when the systems are back online? When data integrity is confirmed? When the business declares normal operations?
You need a shared definition before you build anything. Recovery means the business can operate at an acceptable level of function within a defined time window. That definition has to come from the business, not from your team.
Start with two numbers: Recovery Time Objective (RTO) and Recovery Point Objective (RPO). Get them in writing from business unit owners, not from IT. A 4-hour RTO for your ERP system means something very different to the CFO than it does to the sysadmin who has to execute it. Align on those numbers before you spend a dollar on tooling.
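One way to keep those commitments honest is to record them as data and compare every tested restore against them. The sketch below is illustrative only: the class names, the ERP figures other than the 4-hour RTO, and the owner are assumptions, not a prescribed tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryObjective:
    system: str
    owner: str         # business unit owner who signed off (hypothetical)
    rto_hours: float   # maximum tolerable time to restore
    rpo_hours: float   # maximum tolerable data loss window


@dataclass(frozen=True)
class RestoreTest:
    system: str
    restore_hours: float    # measured time to restore in the last test
    data_loss_hours: float  # how much recent data the restore was missing


def gaps(objective: RecoveryObjective, test: RestoreTest) -> list[str]:
    """Return plain-language gaps between the commitment and the last test."""
    found = []
    if test.restore_hours > objective.rto_hours:
        found.append(
            f"{objective.system}: restore took {test.restore_hours}h, "
            f"RTO is {objective.rto_hours}h"
        )
    if test.data_loss_hours > objective.rpo_hours:
        found.append(
            f"{objective.system}: lost {test.data_loss_hours}h of data, "
            f"RPO is {objective.rpo_hours}h"
        )
    return found


# Hypothetical values for illustration.
erp = RecoveryObjective("ERP", owner="CFO", rto_hours=4, rpo_hours=1)
last_test = RestoreTest("ERP", restore_hours=6.5, data_loss_hours=0.5)

for gap in gaps(erp, last_test):
    print(gap)
```

An empty list means the last test met both numbers. Anything else is the gap you take back to the business owner.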
Map Your Five People to Recovery Functions, Not Job Titles
With a team of five, you cannot afford role ambiguity during an incident. Everyone needs a primary function and a backup function. The org chart does not matter. What matters is who does what when the phone rings at 2am.
A practical mapping for a five-person team looks like this:
- Incident Commander (IC): Owns the response timeline, makes escalation calls, communicates with leadership. Usually the CISO or VP.
- Technical Lead: Owns system isolation, forensic preservation, and recovery execution. Your most senior engineer.
- Communications Lead: Owns internal and external messaging, legal coordination, and regulatory notification timelines. Can be a security analyst with strong writing skills.
- Recovery Operator: Executes the actual restore procedures, validates data integrity, and confirms system function. Your second engineer or a senior ops person.
- Documentation Lead: Maintains the incident timeline, captures decisions, and owns the post-incident report. Often underestimated. This role protects you legally and operationally.
Every person should know their backup role. If your Technical Lead is unavailable, who runs recovery execution? Document it. Test it. The answer cannot be "we figure it out."
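The primary and backup mapping is small enough to write down as data and test against "who is unreachable" scenarios. The following is a minimal sketch; the names and the specific backup pairings are hypothetical placeholders, not a recommended assignment.

```python
from collections import defaultdict

# Hypothetical roster: role -> (primary, backup). Names are placeholders.
ROSTER = {
    "incident_commander": ("Avery", "Casey"),
    "technical_lead": ("Blake", "Devon"),
    "communications_lead": ("Casey", "Emery"),
    "recovery_operator": ("Devon", "Blake"),
    "documentation_lead": ("Emery", "Avery"),
}


def assign(roster, unavailable):
    """Pick the primary if reachable, else the backup, else None."""
    result = {}
    for role, (primary, backup) in roster.items():
        if primary not in unavailable:
            result[role] = primary
        elif backup not in unavailable:
            result[role] = backup
        else:
            result[role] = None  # nobody left: this is the gap to fix
    return result


def double_hatted(assignment):
    """People holding more than one role after reassignment."""
    by_person = defaultdict(list)
    for role, person in assignment.items():
        if person is not None:
            by_person[person].append(role)
    return {p: roles for p, roles in by_person.items() if len(roles) > 1}


assignment = assign(ROSTER, unavailable={"Blake"})
print(assignment["technical_lead"])
print(double_hatted(assignment))
```

In this made-up roster, losing the Technical Lead leaves one person running both recovery decisions and restore execution. That is the kind of finding you want from a five-minute check, not from a live incident.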
Your Backup Architecture Is a Recovery Architecture Decision
Most backup strategies are designed for convenience, not recovery speed. A nightly full backup to a NAS that lives on the same network segment as your production systems is not a recovery architecture. It is a false sense of security with a bill attached.
With a small team, you need a backup architecture that reduces the cognitive load during an incident. That means immutable backups, offsite or cloud-isolated copies, and tested restore procedures that any team member can execute without tribal knowledge.
The 3-2-1-1 rule is a practical starting point: three copies of data, on two different media types, with one offsite copy, and one air-gapped or immutable copy. The last "1" is what ransomware cannot touch. If your current architecture does not include an immutable copy, that is your first capital ask.
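The rule is easy to check mechanically against an inventory of your copies. This sketch assumes the production copy counts as one of the three, which is a common reading of the rule but an assumption here. The inventory entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    name: str
    media: str       # e.g. "disk", "tape", "object-storage"
    offsite: bool
    immutable: bool  # True for an immutable or air-gapped copy


def check_3211(copies):
    """Evaluate each clause of the 3-2-1-1 rule separately."""
    return {
        "three_copies": len(copies) >= 3,
        "two_media_types": len({c.media for c in copies}) >= 2,
        "one_offsite": any(c.offsite for c in copies),
        "one_immutable_or_air_gapped": any(c.immutable for c in copies),
    }


# Hypothetical inventory: production data, a local NAS, and a cloud bucket.
inventory = [
    BackupCopy("production", media="disk", offsite=False, immutable=False),
    BackupCopy("local-nas", media="disk", offsite=False, immutable=False),
    BackupCopy("cloud-bucket", media="object-storage", offsite=True, immutable=False),
]

for clause, ok in check_3211(inventory).items():
    print(f"{clause}: {'ok' if ok else 'MISSING'}")
```

This example inventory passes the first three clauses and fails the last one, which is exactly the gap the final "1" exists to expose.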
Tools like Veeam, Cohesity, and Rubrik all offer immutable backup capabilities. The differentiator for a small team is not features. It is how fast your least experienced team member can execute a restore from a clean copy without calling the vendor.
Runbooks Are Not Documentation. They Are Decision Trees.
A runbook that says "restore from backup" is not a runbook. It is a placeholder. Real runbooks answer the questions your team will actually ask during an incident: Which backup? From what point in time? How do you validate integrity before you reconnect to the network? Who approves the reconnection?
Write runbooks as decision trees, not prose. Each step should have a clear action, a success condition, and a branch for what to do if the success condition is not met. If a step requires judgment, name the person who makes that call.
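To make the structure concrete, here is a rough sketch of a runbook stored as a decision tree. The step names, success conditions, and named deciders are hypothetical and deliberately simplified. A real runbook would carry far more detail per step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    action: str
    success_condition: str
    on_success: Optional[str]      # id of the next step; None ends the runbook
    on_failure: Optional[str]
    decider: Optional[str] = None  # named role when the step needs judgment


# Hypothetical ransomware restore runbook.
RUNBOOK = {
    "select_backup": Step(
        "Pick the newest immutable backup taken before first known compromise",
        "Backup timestamp predates earliest indicator", "restore_isolated",
        "escalate_ir_retainer", decider="Technical Lead"),
    "restore_isolated": Step(
        "Restore into an isolated network segment",
        "System boots and services start", "validate_integrity",
        "escalate_ir_retainer"),
    "validate_integrity": Step(
        "Run integrity and malware checks on restored data",
        "Checks pass with no indicators found", "approve_reconnect",
        "escalate_ir_retainer"),
    "approve_reconnect": Step(
        "Reconnect restored system to production network",
        "Business owner confirms function", None,
        "escalate_ir_retainer", decider="Incident Commander"),
    "escalate_ir_retainer": Step(
        "Call the external IR firm", "IR firm engaged", None, None),
}


def dangling_branches(runbook):
    """Branches that point at a step id the runbook never defines."""
    return [(sid, target) for sid, step in runbook.items()
            for target in (step.on_success, step.on_failure)
            if target is not None and target not in runbook]


def walk(runbook, start, outcomes):
    """Follow the tree for observed outcomes (step id -> bool); return the path."""
    path, current = [], start
    while current is not None:
        path.append(current)
        step = runbook[current]
        current = step.on_success if outcomes.get(current, True) else step.on_failure
    return path


print(dangling_branches(RUNBOOK))
print(walk(RUNBOOK, "select_backup", {"validate_integrity": False}))
```

The dangling-branch check is the useful part for maintenance: a step whose failure branch leads nowhere is the runbook equivalent of "we figure it out."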
For a five-person team, you need runbooks for at least these scenarios:
- Ransomware: full environment encryption, unknown patient zero
- Data destruction: accidental or malicious deletion of critical data
- Cloud account compromise: loss of access to cloud-hosted workloads
- Key person unavailability: your Technical Lead is unreachable during an active incident
- Third-party failure: a critical SaaS vendor goes down or is breached
Keep runbooks in a location that is accessible when your primary systems are down. A SharePoint folder that requires SSO to access is not accessible during an identity compromise. Print them. Store them in a cloud tenant that is isolated from your production environment. Both.
Tabletop Exercises With Five People Look Different Than You Think
Most tabletop guidance assumes you have a dedicated incident response team, a legal team in the room, and a communications function that is not also your security analyst. With five people, the tabletop is smaller, faster, and more focused on decision quality than process compliance.
Run tabletops quarterly. Each one should test a different failure mode. Do not run the same ransomware scenario four times a year. Rotate through your runbook scenarios. The goal is not to pass the exercise. The goal is to find the gaps before an attacker does.
Invite one business stakeholder to each tabletop. Rotate through the CFO, General Counsel, and a business unit head over the course of a year. This does two things: it forces your team to communicate in business terms, and it gives your board sponsors firsthand exposure to what recovery actually requires. That exposure is worth more than any board presentation you will ever give.
Track one metric from each tabletop: time to decision on a critical branch point. If your team takes 45 minutes to decide whether to pay a ransom or restore from backup, that is a gap. The goal is to get that decision time under 15 minutes through pre-authorization and documented decision criteria.
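Tracking that metric needs nothing more than two timestamps per branch point. A minimal sketch, with an entirely hypothetical exercise log:

```python
from datetime import datetime, timedelta

TARGET = timedelta(minutes=15)  # the decision-time goal described above

# Hypothetical log: (exercise, branch point, question raised, decision made)
LOG = [
    ("Q1 ransomware", "pay or restore",
     datetime(2024, 1, 18, 10, 5), datetime(2024, 1, 18, 10, 50)),
    ("Q2 cloud account compromise", "rotate all credentials",
     datetime(2024, 4, 11, 14, 0), datetime(2024, 4, 11, 14, 12)),
]


def time_to_decision(raised, decided):
    """Elapsed time between a branch point being raised and decided."""
    return decided - raised


def over_target(log, target=TARGET):
    """Branch points where the decision took longer than the target."""
    return [
        (exercise, branch, time_to_decision(raised, decided))
        for exercise, branch, raised, decided in log
        if time_to_decision(raised, decided) > target
    ]


for exercise, branch, elapsed in over_target(LOG):
    minutes = int(elapsed.total_seconds() // 60)
    print(f"{exercise}: '{branch}' took {minutes} min")
```

Each flagged entry is a candidate for pre-authorization or written decision criteria before the next exercise.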
What to Tell the Board Without Lying to Them
Boards want to know two things about your recovery program: can you survive an attack, and how long will it take. They do not want to hear about your backup architecture or your NIST alignment. They want a number and a confidence level.
Give them both. A board-ready recovery metric looks like this: "Based on our last tested restore, we can recover our five most critical systems within [X] hours of a ransomware event, with a data loss window of no more than [Y] hours. We test this capability quarterly." That is it. That is the whole slide.
If you cannot say that with confidence, do not fake it. Boards have a long memory for security leaders who overpromised and underdelivered during an incident. Tell them where you are, what the gap is, and what you need to close it. That is a more credible conversation than a green dashboard that falls apart under scrutiny.
Budget Reality: What a Five-Person Recovery Program Actually Costs
A functional recovery program for a mid-market organization with a five-person team does not require a seven-figure budget. It requires disciplined prioritization and a willingness to say no to tools that do not directly reduce your recovery time or data loss window.
A realistic annual budget breakdown for a lean recovery program:
- Immutable backup solution (cloud or on-prem): $30,000 to $80,000 depending on data volume
- Incident response retainer (external IR firm): $20,000 to $50,000 for a named-account retainer
- Tabletop facilitation (external, 1-2 per year): $10,000 to $25,000
- Recovery testing and validation (internal labor + tooling): $15,000 to $30,000
- Documentation and runbook tooling: $5,000 to $10,000
Total range: $80,000 to $195,000 annually. That is a defensible number for a board conversation.
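If you adjust any line item for your own environment, the totals are simple to recompute. The figures below are the ranges listed above; the dictionary layout is just one convenient way to hold them.

```python
# Annual line items as (low, high) in US dollars, taken from the list above.
BUDGET = {
    "Immutable backup solution": (30_000, 80_000),
    "Incident response retainer": (20_000, 50_000),
    "Tabletop facilitation": (10_000, 25_000),
    "Recovery testing and validation": (15_000, 30_000),
    "Documentation and runbook tooling": (5_000, 10_000),
}

low_total = sum(low for low, _ in BUDGET.values())
high_total = sum(high for _, high in BUDGET.values())

print(f"Total range: ${low_total:,} to ${high_total:,}")
```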
The IR retainer is the line item most small teams skip. Do not skip it. When you are five people and a ransomware event hits on a Friday night, you need a phone number that connects to a team that has done this before. The retainer pays for itself the first time you use it, and it signals to your board that you have thought through the scenario where your team is overwhelmed.
Entropy Is the Real Threat to Your Recovery Program
Recovery programs do not fail because they were never built. They fail because they were built once and never maintained. Runbooks go stale. Backup jobs fail silently. The engineer who knew the restore procedure left the company. The immutable copy retention policy was quietly shortened to save storage costs.
Build entropy checks into your calendar, not your good intentions. A monthly 30-minute review of backup job success rates, a quarterly runbook review tied to any infrastructure change, and an annual full recovery test are the minimum maintenance schedule for a program that will actually work when you need it.
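A calendar reminder works, but a small script that reports what is overdue is harder to ignore. This sketch assumes approximate cadences in days for the monthly, quarterly, and annual checks; the check names and dates are hypothetical.

```python
from datetime import date, timedelta

# Approximate cadences in days for the minimum schedule described above.
CADENCE_DAYS = {
    "backup_job_review": 31,    # monthly
    "runbook_review": 92,       # quarterly
    "full_recovery_test": 365,  # annual
}


def overdue(last_done, today):
    """Names of checks whose last completion is older than their cadence."""
    return sorted(
        name for name, days in CADENCE_DAYS.items()
        if today - last_done[name] > timedelta(days=days)
    )


def success_rate(job_results):
    """Fraction of backup jobs that succeeded; job_results is a list of bools."""
    return sum(job_results) / len(job_results)


# Hypothetical state for illustration.
last_done = {
    "backup_job_review": date(2024, 5, 1),
    "runbook_review": date(2024, 1, 10),
    "full_recovery_test": date(2023, 9, 1),
}

print(overdue(last_done, today=date(2024, 6, 3)))
print(f"{success_rate([True] * 28 + [False] * 2):.1%}")
```

A job success rate that looks healthy can still hide the one job that matters, so the monthly review should look at which jobs failed, not only at the percentage.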
Assign ownership explicitly. One person on your team owns backup integrity. One person owns runbook currency. If everyone owns it, nobody owns it. With five people, you cannot afford that ambiguity.
Frequently Asked Questions
How do I justify the recovery budget to a CFO?
Translate recovery capability into business downtime cost. If your organization loses $50,000 per hour of ERP downtime, a 12-hour recovery gap represents $600,000 in exposure. Frame the budget ask as the cost of cutting that recovery window to 2 hours, which brings the exposure down to $100,000, not as the cost of a backup tool. CFOs understand expected loss calculations better than they understand security frameworks.
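The arithmetic behind that framing is worth showing explicitly, because it is the part the CFO will check. A minimal sketch using the example figures above; the function name is an illustrative choice.

```python
def downtime_exposure(cost_per_hour, recovery_hours):
    """Business exposure for a single outage, in dollars."""
    return cost_per_hour * recovery_hours


COST_PER_HOUR = 50_000  # ERP downtime cost from the example above

current = downtime_exposure(COST_PER_HOUR, recovery_hours=12)
target = downtime_exposure(COST_PER_HOUR, recovery_hours=2)
reduction = current - target

print(f"Current exposure: ${current:,}")
print(f"Target exposure:  ${target:,}")
print(f"Reduction:        ${reduction:,}")
```

This is per-incident exposure only. Turning it into an expected annual loss would require a likelihood estimate, which is an input you would have to supply and defend yourself.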
Conclusion
A recovery program built by five people can outperform one built by fifty if it is designed for the team that actually has to run it. The difference is not budget or headcount. It is clarity: clear roles, clear runbooks, clear metrics, and a board conversation that does not require a translator. Start with your RTO and RPO commitments, map your team to recovery functions, and build the entropy checks that keep the program from degrading quietly over time. The goal is not a perfect program. The goal is a program that works on the worst day your organization has ever had.
