How I Assess Engineering Health in My First Week at a New Org
The diagnostic framework I run every time I join a new engineering organization, and why the data is only half the picture.
The first week at a new company is a lot. You're meeting strangers, trying to read rooms you've never been in, figuring out who the actual decision-makers are and how decisions get made. For me, as an introvert, there's also the sheer exhaustion of performing extroversion for eight hours a day while trying to simultaneously form a coherent picture of a place I've never seen before.
And everyone is watching. The people above you want to know if they made the right hire. The people who report to you are wondering whether their jobs are safe, whether you're going to blow up the way they've been doing things, whether you're someone who actually gives a damn or just another executive who issues mandates from a conference room. You try not to let that pressure paralyze you. But you feel it.
The thing that helps me most is having a framework. Not because frameworks give you answers, but because a good framework gives you the right questions to ask. The two main questions I'm trying to answer in week one each open into several more questions, and the answers to those won't come from the data. They'll come from the team. The data just tells me where to look and what to ask about.
Here's how I do it.
The Two Questions
Everything I'm trying to learn in the first week boils down to two things: what is the state of quality, and how healthy are the individual Product Engineering teams?
Those sound related, and they are, but they're measuring different things. Quality is about the product: How much defect debt has accumulated? How are fixes being prioritized? Are bugs being worked on in any coherent order? Team health is about execution: Can these teams actually deliver what they say they're going to deliver, and if not, what's getting in their way?
I use both because either one alone tells an incomplete story. A team executing beautifully might be doing so while the product quietly burns down around them. A product that's relatively stable might be maintained by a team that's one bad quarter away from collapse. You need both pictures.
First, the Tooling
Before I run a single calculation, I spend time in whatever ticketing system the org is using and I just look around. How many projects are there relative to the number of teams? How are bugs being categorized? Are there priority labels, and does anyone seem to be using them consistently?
This sounds mundane, but it tells you something before any numbers do. An org that has carefully structured its tooling to reflect how it actually works is a different kind of organization than one that crammed everything into a single project and built seventeen custom dashboards to compensate. You can see the history of decisions people made and abandoned just by looking at the archaeology of the project structure.
One thing I've formed a strong opinion on over the years: severity and priority should not both be in play for bug classification. I've seen orgs use both, and what you end up with is a combinatorics problem. High severity, low priority. High priority, low severity. It becomes harder to reason about, not easier. I collapse those into a single priority axis (urgent/P0, high/P1, medium/P2, low/P3) and work from there.
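To make the collapse concrete, here's a rough sketch of one way to do it. The label sets, the field names, and the rule of keeping the more urgent of the two values are all my own illustration, not a standard; every tracker and every org will want its own mapping.

```python
# One hypothetical way to fold a severity/priority pair onto a single
# priority axis: rank both legacy labels and keep the more urgent one.
# The label sets and the rule itself are illustrative, not a standard.
SEVERITY_RANK = {"critical": 0, "major": 1, "moderate": 2, "minor": 3}
PRIORITY_RANK = {"urgent": 0, "high": 1, "medium": 2, "low": 3}
SINGLE_AXIS = ["P0", "P1", "P2", "P3"]

def collapse(severity: str, priority: str) -> str:
    """Return a single P0-P3 label for a ticket that carries both fields."""
    rank = min(SEVERITY_RANK[severity.lower()], PRIORITY_RANK[priority.lower()])
    return SINGLE_AXIS[rank]

print(collapse("critical", "low"))  # -> P0: high severity, low priority becomes urgent
print(collapse("minor", "high"))    # -> P1: low severity, high priority stays high
```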
Assessing Quality
For quality, I'm looking at week-over-week trends. How many bugs are open total? How many were opened this week? How many were resolved? What's the net delta? Then I break all of that down by priority. How many P0s are open? P1s? What's changing week to week?
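In practice this lives in a spreadsheet, but the arithmetic is simple. Here's a rough Python sketch of the same weekly snapshot, assuming each ticket carries a priority, an opened date, and a resolved date (or none); those field names are placeholders for whatever your tracker's export actually calls them.

```python
# Week-over-week quality snapshot: open totals, opened, resolved, and net
# delta, broken down by priority. Ticket fields are hypothetical.
from collections import Counter
from datetime import date, timedelta

def weekly_snapshot(tickets, week_start: date):
    week_end = week_start + timedelta(days=7)

    def in_week(d):
        return d is not None and week_start <= d < week_end

    opened = Counter(t["priority"] for t in tickets if in_week(t["opened"]))
    resolved = Counter(t["priority"] for t in tickets if in_week(t["resolved"]))
    still_open = Counter(
        t["priority"] for t in tickets
        if t["opened"] < week_end and not (t["resolved"] and t["resolved"] < week_end)
    )
    for p in ("P0", "P1", "P2", "P3"):
        print(f"{p}: open={still_open[p]:3d}  opened={opened[p]:3d}  "
              f"resolved={resolved[p]:3d}  net={opened[p] - resolved[p]:+d}")

tickets = [
    {"priority": "P1", "opened": date(2024, 3, 4), "resolved": None},
    {"priority": "P3", "opened": date(2024, 3, 5), "resolved": date(2024, 3, 6)},
    {"priority": "P0", "opened": date(2024, 2, 20), "resolved": None},
]
weekly_snapshot(tickets, week_start=date(2024, 3, 4))
```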
The parallel question, and this one matters a lot, is which priorities are actually getting worked. It's a remarkably common pattern that organizations fix the wrong things. Lower-priority issues get closed out while urgent and high-priority bugs sit open for weeks or months. Nobody made a decision to do that. It just happens, because individuals are making judgment calls without any system enforcing priority order. Nobody is managing the queue.
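One way to surface that mismatch is to compare the priority mix of recent closures against the priority mix of what's still open. A sketch, reusing the same hypothetical ticket shape; the thresholds are purely illustrative.

```python
# Spot the "fixing the wrong things" pattern: closures skewing low-priority
# while urgent work piles up in the open backlog.
from collections import Counter

def wrong_things_check(tickets):
    resolved = Counter(t["priority"] for t in tickets if t["resolved"])
    open_ = Counter(t["priority"] for t in tickets if not t["resolved"])
    low_fix_share = (resolved["P2"] + resolved["P3"]) / max(sum(resolved.values()), 1)
    urgent_backlog_share = (open_["P0"] + open_["P1"]) / max(sum(open_.values()), 1)
    # Thresholds below are illustrative, not a standard.
    if low_fix_share > 0.7 and urgent_backlog_share > 0.3:
        print("Closures are skewing low-priority while urgent bugs sit open.")
    return low_fix_share, urgent_backlog_share
```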
When I see that pattern, it tells me several things: there's no triage process, individual engineers are deciding on their own what to pick up, and nobody in leadership has visibility into what's actually being worked on. But those are all symptoms of the same root problem: Nobody is accountable for ensuring the most important things get done. That role exists in every healthy Product Engineering org, and when it's vacant, you get a queue that sorts itself by whatever is easiest or most interesting, not by what actually matters. And it doesn't stay contained to Product Engineering. Customer Support is responsible for managing unhappy customers, and when those customers' complaints aren't being addressed, Support can't do their job. Sales runs into the same wall. Eventually it trickles up to executive leadership, and what started as a prioritization gap becomes a trust problem. The Product Engineering org loses credibility with the rest of the company. Everything else you see in the data flows from that gap.
I also pay attention to where bugs are coming from. If almost everything in the ticketing system was filed by engineers, that's strange. You'd expect support teams, QA, or customers to be the primary sources. When engineers are filing most of the bugs, it often means there's a shadow process somewhere. Bugs are being reported through some other channel: a spreadsheet someone built, a shared inbox, the customer support tool that Support and Sales are living in, or an entirely separate ticketing system that another department decided to stand up on their own. Only occasionally does any of it make it into the official system, and only when an engineer happens to create a ticket. That shadow system is worth finding, because it's where the real backlog lives.
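A tally of reporters is usually enough to confirm the suspicion. This sketch assumes each ticket records the reporter's team, which is a hypothetical field, and the 60% threshold is a gut-feel illustration rather than a benchmark.

```python
# Where are bugs coming from? Count tickets by the reporting team.
from collections import Counter

def reporter_breakdown(tickets):
    counts = Counter(t["reporter_team"] for t in tickets)
    total = sum(counts.values()) or 1
    for team, n in counts.most_common():
        print(f"{team:12s} {n:4d}  ({n / total:.0%})")
    # The 60% threshold is a judgment call, not a benchmark.
    if counts["Engineering"] / total > 0.6:
        print("Mostly engineer-filed bugs -- go find the shadow reporting channel.")
```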
Assessing Team Health
For individual Product Engineering teams, I'm looking sprint over sprint rather than week over week. The core metric I care about most is predictability: how much work did the team commit to at the start of the sprint, and how much did they actually complete?
A healthy team delivers somewhere between 85% and 115% of what they committed to. Too far under means they don't have a grip on their velocity, or work is being added mid-sprint and eating their capacity. Too far over usually means they're padding estimates, or the commitment target is so conservative that any reasonable sprint will sail past it. Both directions are worth investigating.
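The number itself is a single ratio per sprint. A sketch with invented figures, flagging anything that lands outside that 85-115% band:

```python
# Sprint predictability: completed points over committed points.
def predictability(committed: float, completed: float) -> float:
    return completed / committed if committed else 0.0

sprints = [
    {"sprint": "24.07", "committed": 40, "completed": 37},
    {"sprint": "24.08", "committed": 42, "completed": 51},
]
for s in sprints:
    p = predictability(s["committed"], s["completed"])
    flag = "" if 0.85 <= p <= 1.15 else "  <- investigate"
    print(f"{s['sprint']}: {p:.0%}{flag}")
```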
Predictability is also genuinely hard to game, which I appreciate. When deployment frequency became a headline DORA metric, some teams just started deploying more often without the underlying reliability improvements the metric was supposed to represent. Predictability is a relationship between two numbers (commitment and completion), and manipulating that relationship requires coordination across the whole team in ways that tend to fall apart. It's not a perfect metric, but it's more honest than most.
From there, I want to know how much of the completed work was planned at the start of the sprint versus added mid-sprint. Unplanned work is not inherently bad. Sometimes a critical bug lands and you deal with it. But the ratio matters. A team where 40% of completed work was unplanned has a disruption problem, and that disruption is probably coming from somewhere specific. Maybe it's customer escalations. Maybe it's an executive who keeps dropping things on the team. Maybe it's a quality problem generating constant reactive work. The metric points you toward the right questions.
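Computing the ratio only requires knowing, for each completed item, whether it was in the sprint at planning time. The items below are invented for illustration:

```python
# Share of completed points that were not planned at sprint start.
def unplanned_share(completed_items):
    total = sum(i["points"] for i in completed_items)
    unplanned = sum(i["points"] for i in completed_items if not i["planned"])
    return unplanned / total if total else 0.0

items = [
    {"points": 5, "planned": True},
    {"points": 3, "planned": False},   # customer escalation pulled in mid-sprint
    {"points": 8, "planned": True},
]
print(f"Unplanned share of completed work: {unplanned_share(items):.0%}")  # ~19%
```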
I also track points added to and removed from sprints after they begin. This one reveals something about organizational culture that tends to be uncomfortable to surface: most places love to add work mid-sprint and hate to remove it. The implicit assumption is that the team just absorbs whatever gets added. But teams have a fixed capacity and prioritization is a hard choice. Work added without anything being removed rolls over to the next sprint, affects how much new work can be taken on, and compounds from there. Watching that pattern over several sprints tells you whether the organization actually respects the team's capacity or just treats sprint planning as a rough suggestion.
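Tracking that is just a running log of scope changes after the sprint starts: additions as positive points, removals as negative. The event log here is hypothetical; what matters is watching the two totals drift apart across several sprints.

```python
# Scope churn after sprint start: how much was added vs. removed.
def scope_churn(events):
    added = sum(e["points"] for e in events if e["points"] > 0)
    removed = -sum(e["points"] for e in events if e["points"] < 0)
    return added, removed

events = [
    {"day": 3, "points": +5},   # escalation added
    {"day": 6, "points": +3},   # "small" exec request
    {"day": 6, "points": -2},   # the only thing anyone agreed to drop
]
added, removed = scope_churn(events)
print(f"Added after start: {added} pts, removed: {removed} pts")
```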
I look at velocity as a rolling three-sprint average rather than a single-sprint number. Any given sprint's velocity is noisy. Someone's sick, there's a holiday, scope changed late. The rolling average smooths that out and gives you a more honest baseline to plan against.
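The rolling average is a few lines of arithmetic. Three sprints is the window I use, but it's a parameter, not a law:

```python
# Rolling velocity over the last `window` sprints, smoothing single-sprint noise.
def rolling_velocity(velocities, window=3):
    out = []
    for i in range(len(velocities)):
        chunk = velocities[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

print(rolling_velocity([34, 41, 28, 45]))  # [34.0, 37.5, 34.3..., 38.0]
```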
Finally, cycle time. I want to know how long work is spending in each phase of the delivery process: in progress, in review, waiting to deploy. The goal isn't to make all the numbers small. It's to find the bottlenecks. If code review time is twice as long as implementation time, something is creating friction in review. Maybe there are too few reviewers. Maybe the PRs are too large. Maybe one person is doing all the reviews and they're buried. The metric doesn't tell you the cause, but it tells you where to look.
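Per-phase cycle time falls out of the timestamps for when work entered each column. This sketch assumes every completed item has those timestamps, which is optimistic for most trackers, and the phase names are mine:

```python
# Hours spent in each delivery phase, per item, then the median across items.
from datetime import datetime
from statistics import median

PHASES = ["in_progress", "in_review", "waiting_to_deploy", "done"]  # "done" is the terminal timestamp

def phase_durations(item):
    """Hours spent in each phase, from the item's phase-entry timestamps."""
    times = [item[p] for p in PHASES]
    return {
        PHASES[i]: (times[i + 1] - times[i]).total_seconds() / 3600
        for i in range(len(PHASES) - 1)
    }

items = [{
    "in_progress":       datetime(2024, 3, 4, 9),
    "in_review":         datetime(2024, 3, 5, 14),
    "waiting_to_deploy": datetime(2024, 3, 8, 10),
    "done":              datetime(2024, 3, 8, 16),
}]
for phase in PHASES[:-1]:
    hours = median(phase_durations(i)[phase] for i in items)
    print(f"{phase:18s} median {hours:5.1f} h")  # review time dwarfing implementation is the smell
```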
The Organizational View
Once I have the individual team picture, I pull it up to the org level. I plot all the teams' metrics on the same charts, which makes outliers obvious in a way they wouldn't be if you were looking at each team in isolation.
At the org level, the two numbers I talk about publicly are organizational predictability (the average across all teams) and organizational percentage of planned work completed. The second one is worth raising openly because it sends a signal to the whole company: if 40% of what we delivered last sprint was unplanned work, that's a disruption problem that affects everyone, not just Product Engineering. Planned quality improvements get deprioritized. Promised bug fixes slip, degrading customer trust. Predicted delivery dates become unreliable. And when in-progress projects fall behind, other departments absorb the cost too. Marketing may have already invested weeks preparing launch content for a feature that's now delayed, time they could have spent on higher-value work had they known sooner. Left unchecked, it derails the roadmap entirely and can put revenue plans at risk. That's a useful conversation to have in the open.
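The roll-up itself is trivial once the per-team numbers exist; it's the conversation around it that takes the work. A sketch with invented teams and figures:

```python
# Org-level roll-up: average predictability across teams, and the org-wide
# share of delivered points that were planned at sprint start.
teams = [
    {"name": "Payments", "committed": 40, "completed": 38, "planned_completed": 30},
    {"name": "Platform", "committed": 35, "completed": 41, "planned_completed": 22},
    {"name": "Mobile",   "committed": 28, "completed": 21, "planned_completed": 17},
]

org_predictability = sum(t["completed"] / t["committed"] for t in teams) / len(teams)
org_planned_share = sum(t["planned_completed"] for t in teams) / sum(t["completed"] for t in teams)

print(f"Org predictability: {org_predictability:.0%}")
print(f"Planned share of delivered work: {org_planned_share:.0%}")
```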
Individual team performance is a different matter. Cross-team metrics are useful for coaching, but not for broadcasting. Putting comparative performance data in front of the whole company without context creates exactly the wrong dynamic: teams start optimizing for how they look rather than how they're actually doing. The right venue for this is the sprint review, and the right approach is to open it by sharing what the metrics are showing and asking the team what they think. What do you see here? Is there something we should be focusing on improving? Start with questions, not conclusions. That framing turns the data into a conversation starter rather than a verdict.
You can't walk into a room with bad graphs and tell a team they're failing. Or, technically, you can, but you won't be there much longer if you do. My preferred approach is Socratic. I share what I'm seeing and ask what the team thinks is happening. "Your cycle time in review is notably higher than in progress. What's your sense of why that is?" They usually know. They've been living with it. What they didn't have was someone asking the question and treating the answer as worth addressing.
That matters for a reason beyond just being diplomatic about it. If I solve every problem myself, I've made myself indispensable, and that's exactly the wrong outcome. The goal is to teach people to read these signals themselves, to build the habit of asking these questions every sprint, to create a team that can self-correct without me in the room. The sprint review, done right, is where that habit forms.
The Data Gets You to the Door
I've refined this diagnostic approach across a lot of organizations over a lot of years, and at this point I can build the spreadsheet in my sleep. But I want to be clear about what it actually is: a tool. The methodology gives me the right questions to ask. It gives me almost no answers. The answers come from experience, from having seen the same patterns play out enough times that when the data points in a certain direction, I have a pretty good idea of what I'm looking at. But recognizing a pattern and knowing exactly how to address it in this organization, with these specific people, inside this particular culture, are two very different things. The first tool I reach for might not be the right one here. That part still requires doing the work.
None of this touches the technical side of the house directly. Deployment pipelines, production infrastructure, development environments, architectural decisions: those are also part of the role, and eventually they become a significant part of it. But trying to assess all of that in week one is the wrong sequence. The framework tends to surface the technical problems anyway. When cycle time in the deployment column is consistently an outlier, that's usually an infrastructure conversation waiting to happen. When unplanned work keeps spiking, it often traces back to architectural debt generating a steady stream of reactive work. When defect volume is high and resolution is slow, sometimes the answer is process, but sometimes the answer is that the system is just hard to work in. The data points you toward those doors. Then you go look.
The methodology is the easy part. The hard part is figuring out which conversations to have, in what order, with people you've known for less than a week. You have a picture of what's broken, but no relationships yet, no trust yet, and no real sense of who in the room is ready to hear what. Coaching someone toward a better way of working looks completely different from lecturing them about what they're doing wrong, and the line between those two things is mostly about whether the person on the other end of the conversation feels supported versus scrutinized.
If the first thing people feel when you walk in is that they're being watched and judged, you've already failed, regardless of how accurate your metrics are. If you walk in with genuine curiosity, about the work and about the people doing it, and you back that curiosity up with data, you have a shot at actually changing something.
That's the whole thing, really. The rest is just spreadsheets.
Raleigh Schickel has spent 15+ years leading engineering organizations. He is building an engineering health diagnostic platform designed as a coaching tool, not a surveillance tool. Find him on LinkedIn and Substack.