How to Do a Knowledge Audit in 30 Minutes
You suspect your team has a knowledge problem. Maybe someone left recently and things got rough. Maybe you noticed the same questions getting asked over and over. Maybe you just have a gut feeling that too much lives in too few heads.
You're probably right. But "we have a knowledge problem" is too vague to act on. You need specifics. Which knowledge? Whose heads? How exposed are you?
That's what a knowledge audit gives you. And you don't need a consultant or a six-week project to do one. You need 30 focused minutes and a spreadsheet.
Here's exactly how.
Before You Start (2 Minutes)
Open a spreadsheet. Create five columns:
| Process / System | Owner(s) | Backup? | Documentation? | Risk Level |
|---|---|---|---|---|
That's it. No fancy templates. No project management tool. A spreadsheet.
Now set a timer for 28 minutes.
Step 1: List Your Critical Processes (8 Minutes)
Write down every process, system, or knowledge area that would cause problems if the person responsible disappeared tomorrow. Don't filter. Don't rank. Just list.
Start with these prompts:
Deployments & Infrastructure:
- How does code get to production? Who can deploy?
- Who manages your cloud infrastructure? AWS/GCP accounts, Terraform, Kubernetes?
- Who handles DNS, SSL certificates, domain registrations?
- Who knows how to restore from backups? Has anyone actually tested it?
Critical Systems:
- Which services would cause a P0 incident if they went down?
- Who understands the authentication/authorization system?
- Who built the data pipeline? Who fixes it when it breaks?
- Who manages third-party integrations (payment processor, email provider, analytics)?
Tribal Knowledge:
- Which parts of the codebase are "here be dragons" — areas nobody touches without asking a specific person?
- Are there manual processes that happen on a schedule that only one person knows about?
- What vendor relationships depend on a single person's contacts?
- Which historical decisions would be impossible to understand without asking someone?
Business-Critical Skills:
- Who handles incident response? What if they're on vacation?
- Who manages database migrations? Schema changes?
- Who can debug performance issues in production?
- Who knows how to operate your internal tools (admin panels, data scripts, reporting)?
You should have 15-30 items. If you have fewer than 10, you're not thinking hard enough. If you have more than 40, group related items together.
Step 2: Map the Owners (7 Minutes)
For each item, write down who holds the knowledge. Not who's "responsible" on paper — who actually knows how to do it.
Rules:
- Use real names, not team names. "The platform team" doesn't count. Which specific person on the platform team?
- If multiple people can truly do it (not just theoretically), list them all.
- Be honest. "James could probably figure it out" is not the same as "James has done this before and can do it independently."
Now fill in the Backup column. For each item, is there someone who could step in within 24 hours if the primary owner were unreachable?
- Yes = someone has actually done this task before
- Partial = someone could figure it out with effort, but would be slow
- No = if the owner is gone, this process stops
Step 3: Check Documentation Status (5 Minutes)
For each item, rate the documentation:
- Good = written docs exist, they're current (updated in the last 6 months), and someone other than the author has used them successfully
- Stale = docs exist but haven't been updated in 6+ months — may be misleading
- Minimal = a README or wiki page exists but it's surface-level, missing the "why" and edge cases
- None = no documentation at all
Be ruthless here. A Confluence page from 2023 that describes a system architecture that's been rewritten twice is worse than no documentation — it's actively misleading.
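If you would rather apply the ratings consistently than eyeball them, the rules above can be sketched as a small function. This is a rough illustration, not a standard: the function name, its parameters, and the 183-day cutoff for "6 months" are all assumptions, and it guesses that docs which are current but fail one of the "Good" tests should land in "Minimal".

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: "6 months" is approximated as 183 days.
STALE_AFTER = timedelta(days=183)


def rate_docs(
    last_updated: Optional[date],
    used_by_non_author: bool,
    covers_why_and_edge_cases: bool,
    today: date,
) -> str:
    """Return "Good", "Stale", "Minimal", or "None" for one audit row."""
    if last_updated is None:
        # No documentation exists at all.
        return "None"
    if today - last_updated >= STALE_AFTER:
        # Docs exist but are 6+ months old, so they may be misleading.
        return "Stale"
    if used_by_non_author and covers_why_and_edge_cases:
        # Current, and proven usable by someone other than the author.
        return "Good"
    # Assumption: current docs that fail either test count as "Minimal".
    return "Minimal"
```

For example, `rate_docs(None, False, False, date.today())` gives "None", while a page last touched two years ago rates "Stale" no matter how thorough it once was.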
Step 4: Score the Risk (5 Minutes)
For each row, assign a risk level based on this matrix:
🔴 Critical — Single owner + No backup + No documentation. If this person leaves, the process breaks and nobody knows how to fix it.
🟠 High — Single owner + Partial backup OR single owner + Stale documentation. You could recover, but it would be painful and slow.
🟡 Medium — Multiple owners but poor documentation, OR single owner with good documentation. The knowledge exists in more than one form, but there's still concentration risk.
🟢 Low — Multiple owners + Good documentation. The team can handle this even if someone leaves.
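The matrix can be turned into a small scoring function if your audit lives in a script rather than a spreadsheet. This is one possible reading, not a definitive rule set: the function name and arguments are hypothetical, and the matrix leaves some combinations open (for example, a single owner with no backup and Minimal docs), which this sketch assumes should score High.

```python
def score_risk(owners: list[str], backup: str, docs: str) -> str:
    """Score one audit row.

    owners: names from the Owner column
    backup: "Yes", "Partial", or "No"
    docs:   "Good", "Stale", "Minimal", or "None"
    """
    multiple_owners = len(owners) > 1

    if multiple_owners:
        # Multiple owners + good docs is Low; multiple owners + poor docs is Medium.
        return "Low" if docs == "Good" else "Medium"

    # From here on there is a single owner (assumption: an empty
    # Owner column is treated the same way).
    if docs == "Good":
        # Single owner with good documentation.
        return "Medium"
    if backup == "No" and docs == "None":
        # Single owner + no backup + no documentation.
        return "Critical"
    # Assumption: every other single-owner combination is High.
    return "High"
```

For example, `score_risk(["Ana"], "No", "None")` returns "Critical", and `score_risk(["Ana", "Ben"], "Yes", "Good")` returns "Low".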
Step 5: Read the Results (3 Minutes)
Your timer should be close to done. Look at what you've got.
Count your reds. Every red item is a ticking clock. Not a matter of if it causes problems, but when.
Look for name concentration. If one person's name appears in the Owner column more than 5 times across critical processes, that's your biggest single point of failure. They're not a hero — they're a risk.
Check for false greens. Documentation that says "deploy using the standard process" isn't documentation. Systems where the "backup" person has never actually done the task solo aren't backed up.
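The name-concentration check is easy to automate once the audit has more than a handful of rows. The sketch below assumes each row is a dict with an "owners" list, a hypothetical shape chosen for illustration; the threshold of 5 comes from the rule of thumb above.

```python
from collections import Counter


def concentrated_owners(rows: list[dict], threshold: int = 5) -> dict[str, int]:
    """Return the people who own more than `threshold` audit items."""
    # Count every appearance of a name in the Owner column.
    counts = Counter(name for row in rows for name in row["owners"])
    return {name: n for name, n in counts.items() if n > threshold}
```

Anyone this returns is a candidate for your first knowledge transfer sessions, whatever the risk colors on their individual rows say.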
What to Do With Your Audit
You now have something most engineering leaders don't: a concrete map of where knowledge risk lives.
Immediate actions (this week):
- Pick your top 3 red items. These are urgent.
- Schedule 30-minute knowledge transfer sessions with the owners. Not "write docs" — sit with them while they walk through the process. Record it, transcribe it, turn it into a playbook.
- For any red item where the owner is a flight risk (low engagement, hasn't been promoted recently, been in role 3+ years), escalate to top priority.
Short-term (this month):
- Address all remaining red items.
- Start working through orange items, prioritizing by business impact.
- Establish a cadence: 1-2 knowledge transfer sessions per week. Make it a recurring calendar event.
Ongoing:
- Re-run this audit quarterly. It takes 30 minutes. No excuse not to.
- Track your red count over time. It should be going down.
- Add knowledge coverage to your team health metrics alongside velocity and reliability.
The Spreadsheet Is a Starting Point
A spreadsheet tells you where the gaps are. It doesn't fill them.
The hard part is actually extracting knowledge from people's heads and turning it into something useful. That's where most teams stall — the audit reveals 15 red items, the team does two knowledge transfer sessions, and then it falls off the priority list because feature work is always more urgent.
Understudy automates the extraction part. Instead of scheduling sessions and hoping someone takes good notes, Understudy works with your engineers to capture their knowledge conversationally and structures it into playbooks that are actually usable. The audit tells you what to capture. Understudy does the capturing.
Your Audit Template
Copy this into a spreadsheet and start filling it in. Right now. Before you close this tab and forget.
| Process / System | Owner | Backup | Docs | Risk | Notes |
|---|---|---|---|---|---|
| Production deployments | | | | | |
| Database migrations | | | | | |
| Incident response | | | | | |
| CI/CD pipeline | | | | | |
| Cloud infrastructure | | | | | |
| Auth system | | | | | |
| Data pipeline | | | | | |
| Third-party integrations | | | | | |
| Monitoring & alerting | | | | | |
| Customer data exports | | | | | |
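If copying a table by hand is friction enough to stop you, a few lines of Python can write the same template as a CSV file that any spreadsheet tool will open. This is only a convenience sketch; the function names and the output filename are made up for illustration.

```python
import csv

COLUMNS = ["Process / System", "Owner", "Backup", "Docs", "Risk", "Notes"]
PROCESSES = [
    "Production deployments",
    "Database migrations",
    "Incident response",
    "CI/CD pipeline",
    "Cloud infrastructure",
    "Auth system",
    "Data pipeline",
    "Third-party integrations",
    "Monitoring & alerting",
    "Customer data exports",
]


def template_rows() -> list[list[str]]:
    """Header row plus one row per starter process, other cells blank."""
    blanks = [""] * (len(COLUMNS) - 1)
    return [COLUMNS] + [[process] + blanks for process in PROCESSES]


def write_template(path: str) -> None:
    """Write the audit template to a CSV file at `path`."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(template_rows())
```

Calling `write_template("knowledge_audit.csv")` produces the file; add your own processes to `PROCESSES` before running it.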
Thirty minutes. One spreadsheet. You'll know more about your team's knowledge risk than most CTOs learn in a year.