Steven's Knowledge

Technical Decisions

How to tell compelling stories about making, defending, and learning from technical decisions in interviews

What interviewers are evaluating

When interviewers ask about technical decisions, they want to see:

  1. Structured thinking — you consider options systematically, not impulsively
  2. Trade-off awareness — you understand that every choice has costs
  3. Context sensitivity — your decisions account for team size, timeline, and constraints
  4. Outcome ownership — you own both successes and failures
  5. Learning orientation — wrong decisions taught you something

The question is never really "did you pick the right technology?" It is "how do you make decisions under uncertainty?"

Decision-making framework

Use this structure when preparing and telling decision stories:

1. Problem definition

What problem were you solving? Why did it matter? What were the constraints?

"Our API response times had degraded from 200ms to 1.2s over six months as data grew. Our largest customer was threatening to leave. We had four weeks before their contract renewal."

2. Options considered

What alternatives did you evaluate? Show that you did not just jump to the first solution.

"We considered three approaches: (A) add a caching layer with Redis, (B) rewrite the slow queries and add indexes, (C) migrate the hot path from PostgreSQL to a read-optimised store like DynamoDB."

3. Decision criteria

What factors did you weigh? Common criteria:

| Criterion | Example |
| --- | --- |
| Time to implement | "We had four weeks, not four months" |
| Team expertise | "Nobody on the team had DynamoDB experience" |
| Reversibility | "Caching is additive; migration is not" |
| Maintenance cost | "A cache layer adds operational complexity" |
| Risk | "Query rewrite might not be enough if data doubles again" |
| Business alignment | "We needed results before the renewal date" |
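
One way to make this weighing explicit is a weighted scoring matrix. The sketch below is illustrative only: the weights and the 1 to 5 scores are hypothetical numbers chosen to mirror the example story, not data from it, and the function names are made up for this example.

```python
# Hypothetical weights for the criteria listed above (they sum to 1.0).
WEIGHTS = {
    "time_to_implement": 0.30,
    "team_expertise": 0.20,
    "reversibility": 0.15,
    "maintenance_cost": 0.10,
    "risk": 0.15,
    "business_alignment": 0.10,
}

# Hypothetical scores from 1 (poor) to 5 (good) for each option.
SCORES = {
    "A: Redis cache": {
        "time_to_implement": 4, "team_expertise": 4, "reversibility": 5,
        "maintenance_cost": 3, "risk": 3, "business_alignment": 4,
    },
    "B: query rewrite": {
        "time_to_implement": 5, "team_expertise": 5, "reversibility": 4,
        "maintenance_cost": 5, "risk": 3, "business_alignment": 5,
    },
    "C: DynamoDB migration": {
        "time_to_implement": 1, "team_expertise": 1, "reversibility": 1,
        "maintenance_cost": 3, "risk": 2, "business_alignment": 2,
    },
}


def weighted_score(scores, weights):
    """Sum of weight * score over every criterion."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


def rank_options(options, weights):
    """Return (option, total) pairs sorted from best to worst."""
    totals = {
        name: round(weighted_score(scores, weights), 2)
        for name, scores in options.items()
    }
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


for name, total in rank_options(SCORES, WEIGHTS):
    print(f"{name}: {total}")
```

The numbers matter less than the conversation they force. In an interview, being able to say which criterion carried the most weight and why is usually more convincing than the final score.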

4. The decision

What did you choose and why? Be specific about the reasoning.

"We chose option B (query rewrite) as the immediate fix with option A (caching) as a follow-up. The reasoning: query analysis showed three specific queries causing 80% of the latency — they were missing composite indexes and doing unnecessary joins. This was fixable in days, not weeks. Caching would mask the problem without solving it, and DynamoDB migration was too risky given the timeline and team experience."

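To make the "missing composite index" diagnosis concrete, here is a minimal sketch. It uses SQLite through Python's standard library as a stand-in for the PostgreSQL database in the story, and the `orders` table, its columns, and the index name are all hypothetical. In PostgreSQL the equivalent check would be done with `EXPLAIN`, and the plan output would look different.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "created_at TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, created_at, total) VALUES (?, ?, ?)",
    [(i % 100, f"2024-01-{(i % 28) + 1:02d}", float(i)) for i in range(5000)],
)

# Hypothetical hot-path query: filters on two columns at once.
SLOW_QUERY = (
    "SELECT id, total FROM orders "
    "WHERE customer_id = ? AND created_at >= ?"
)
params = (42, "2024-01-15")

# Without an index, the planner has to scan the whole table.
plan_before = query_plan(conn, SLOW_QUERY, params)

# A composite index covering both filter columns lets it seek directly.
conn.execute(
    "CREATE INDEX idx_orders_customer_created "
    "ON orders (customer_id, created_at)"
)
plan_after = query_plan(conn, SLOW_QUERY, params)

print("before:", plan_before)
print("after: ", plan_after)
```

Comparing the plan before and after the index is the kind of evidence that turns "we thought the queries were slow" into "query analysis showed" in a decision story.
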
5. Outcome and learning

What happened? What would you do differently?

"Response times dropped to 180ms within a week. We implemented caching for the most-hit endpoints the following month, bringing p99 to 50ms. In hindsight, we should have set up query performance monitoring earlier — the degradation was gradual and we only noticed when it became critical."

Common decision scenarios

Build vs. buy

One of the most common technical decisions. The trap is being dogmatic either way.

Story template:

We needed [capability]. Options: build custom, use open-source [tool], or buy [SaaS product].

  • Build: full control, fits exactly, but 3 months dev + ongoing maintenance.
  • Open-source: free, community-supported, but needs customisation and we own the ops.
  • Buy: fastest to integrate (days), managed service, but $X/month and we are locked to their API.

We chose [X] because [specific reasoning tied to context].

Key insight for interviews: The right answer depends on context. "We built it because we could" is weak. "We built it because our requirements diverged significantly from any available tool and we had a dedicated team to maintain it" is strong.

Technology migration

Moving from one stack to another. Shows long-term thinking and execution planning.

Good story structure:

  1. Why the current system was not working (concrete problems, not just "it was old")
  2. How you evaluated alternatives (not just picking the trendy thing)
  3. The migration strategy (big bang vs. strangler fig vs. parallel run)
  4. How you managed risk (rollback plan, feature flags, incremental rollout)
  5. The outcome (including what went wrong)

Example:

Our monolithic Rails app was hitting scaling limits — deploy times were 45 minutes, test suite took 2 hours, and a bug in one module could take down the entire application. We evaluated three approaches: (1) modular monolith with better boundaries, (2) full microservices, (3) extract the two highest-traffic domains into services, keep the rest.

We chose option 3. Full microservices for a 6-person team would be over-engineering — the operational overhead would consume us. The modular monolith was tempting but would not solve the deploy-time or blast-radius problems. Extracting just the payment and notification domains gave us 80% of the benefit with 20% of the effort.

We used the strangler fig pattern: new features in the extracted services, old code gradually retired. The migration took 8 months. Deploy times for the extracted services dropped to 3 minutes. The main learning: we underestimated the complexity of data consistency between the monolith and the new services. If I did it again, I would invest more upfront in defining the data contract.
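
The core of the strangler fig pattern is a routing layer that sends extracted domains to the new services and everything else to the old system. The sketch below is a simplified illustration under assumed names: the handlers, the path prefixes, and the `route` function are hypothetical, and a real setup would more likely do this in a reverse proxy or API gateway than in application code.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def monolith_handler(path: str) -> str:
    """Stand-in for the legacy application."""
    return f"monolith handled {path}"


def payments_service(path: str) -> str:
    """Stand-in for the extracted payments service."""
    return f"payments service handled {path}"


def notifications_service(path: str) -> str:
    """Stand-in for the extracted notifications service."""
    return f"notifications service handled {path}"


# Path prefixes that have already been migrated. Growing this table over
# time is what gradually "strangles" the monolith.
EXTRACTED_ROUTES: Dict[str, Handler] = {
    "/payments": payments_service,
    "/notifications": notifications_service,
}


def route(path: str) -> str:
    """Send migrated prefixes to the new services, everything else to the monolith."""
    for prefix, handler in EXTRACTED_ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return handler(path)
    return monolith_handler(path)


print(route("/payments/refund"))
print(route("/users/42"))
```

Because the routing table is the only thing that changes per migration step, removing an entry is also the rollback plan for that step.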

Performance vs. maintainability

When clean code is too slow, or fast code is too complex.

Example:

Our recommendation engine processed user history to generate personalised feeds. The clean, readable version using array methods processed one user in 50ms — fine for a single request, but we needed to batch-process 100k users nightly. At 50ms/user, that was 83 minutes.

I rewrote the hot path to use pre-allocated buffers, bitwise operations for set intersections, and streaming I/O. The code was harder to read but processed one user in 0.8ms — the full batch ran in 80 seconds.
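
As an illustration of the "bitwise operations for set intersections" idea, the sketch below encodes each set of item IDs as bits in an integer, so that intersection becomes a single `&`. It assumes small non-negative integer IDs, the function names are hypothetical, and the story does not say which language was used. Python is used here only to show the technique and the readability cost, not to reproduce the speed-up.

```python
from typing import Iterable, Set


def to_bitmask(item_ids: Iterable[int]) -> int:
    """Encode a collection of small non-negative IDs as bits in one integer."""
    mask = 0
    for item_id in item_ids:
        mask |= 1 << item_id
    return mask


def overlap_readable(a: Set[int], b: Set[int]) -> int:
    """Reference implementation: clear, but builds a new set on every call."""
    return len(a & b)


def overlap_bitwise(mask_a: int, mask_b: int) -> int:
    """Optimised variant: intersect with bitwise AND, then count the set bits."""
    return bin(mask_a & mask_b).count("1")


user_a = {1, 4, 7, 9}
user_b = {4, 5, 9, 12}

# The two versions must always agree; the readable one acts as the oracle.
assert overlap_readable(user_a, user_b) == overlap_bitwise(
    to_bitmask(user_a), to_bitmask(user_b)
)
print(overlap_bitwise(to_bitmask(user_a), to_bitmask(user_b)))
```

Keeping the readable version around as an oracle for tests is what makes this kind of optimisation defensible in a decision story.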

To offset the lost readability, I added extensive comments explaining the optimisations, wrote a comprehensive test suite as a safety net, and kept the readable version as a reference implementation in a separate file. I also wrote a benchmark suite so future engineers could verify that changes did not regress performance. The decision was right for this case because the algorithm is stable and the code rarely changes, the performance gain was roughly 60x (50ms down to 0.8ms per user), and the batch job was on a critical path.
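
The figures in this story are easy to check, and interviewers sometimes do the arithmetic on the spot. A quick sketch, assuming the batch processes users one after another on a single worker (the story does not mention parallelism):

```python
def batch_seconds(users: int, ms_per_user: float) -> float:
    """Total sequential batch time in seconds."""
    return users * ms_per_user / 1000.0


USERS = 100_000

readable = batch_seconds(USERS, 50.0)    # readable version, 50ms per user
optimised = batch_seconds(USERS, 0.8)    # optimised version, 0.8ms per user
speedup = 50.0 / 0.8

print(f"readable:  {readable / 60:.1f} min")   # 83.3 min
print(f"optimised: {optimised:.0f} s")         # 80 s
print(f"speed-up:  {speedup:.1f}x")            # 62.5x
```

Having the per-unit cost, the volume, and the resulting total ready in this form makes the story easy to defend under follow-up questions.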

When to take on tech debt

Shows pragmatic engineering judgment.

Framework: Tech debt is acceptable when:

  • The timeline is genuinely fixed and the cost of delay is high
  • The debt is isolated and containable (not systemic)
  • You have a concrete plan to repay it (not "we will fix it later")
  • The team understands and agrees

Example:

We were launching a new feature for a conference demo in two weeks. The ideal implementation required a new permissions model that would take four weeks to build properly. I proposed a shortcut: hard-code the permissions for the demo audience using a feature flag, ship the feature, then build the real permissions model in the following sprint.

I documented the debt explicitly: created a ticket, tagged it as tech-debt, set a calendar reminder for the sprint after launch, and added a code comment linking to the ticket. The demo went well, we signed three new customers, and we repaid the debt in the next sprint as planned.
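
A shortcut like this might look roughly as follows. Everything here is hypothetical: the ticket ID, the environment variable used as the feature flag, and the demo user list are placeholders for illustration, not details from the story.

```python
import os
from typing import Mapping, Optional

# TECH-DEBT (hypothetical ticket TECH-DEBT-123): hard-coded demo permissions.
# Replace with the real permissions model in the sprint after the demo.
DEMO_FLAG_ENV = "DEMO_PERMISSIONS_ENABLED"
DEMO_ALLOWED_USERS = frozenset(
    {"demo-alice@example.com", "demo-bob@example.com"}
)


def demo_flag_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Read the feature flag; it defaults to off so the shortcut is opt-in."""
    env = os.environ if env is None else env
    return env.get(DEMO_FLAG_ENV, "false").lower() == "true"


def can_access_new_feature(
    user_email: str, env: Optional[Mapping[str, str]] = None
) -> bool:
    """Temporary check: only the hard-coded demo audience, and only when flagged on."""
    if not demo_flag_enabled(env):
        return False
    return user_email in DEMO_ALLOWED_USERS


flag_on = {DEMO_FLAG_ENV: "true"}
print(can_access_new_feature("demo-alice@example.com", flag_on))
print(can_access_new_feature("someone-else@example.com", flag_on))
```

Keeping the shortcut behind one flag and one function keeps the debt isolated: repaying it means replacing a single function body rather than hunting for scattered special cases.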

I would NOT take this approach if the debt were in authentication or security code, if it affected data integrity, or if there were no concrete timeline to fix it.

One-way vs. two-way door decisions

A framework popularised by Amazon that resonates well in New Zealand (NZ) interviews.

Two-way doors (reversible): choice of library, API response format, internal tooling, feature flag rollout. Decide quickly, iterate.

One-way doors (irreversible or very expensive to reverse): database schema for high-traffic tables, public API contracts, data deletion, choosing a cloud provider for a multi-year commitment.

Interview application:

I categorise decisions as reversible or irreversible. For our internal dashboard tooling, I chose Retool quickly — if it does not work, we switch in a week. But for our public webhook API format, I spent three weeks gathering feedback from integration partners before finalising, because changing it later would break every customer's integration.

Discussing decisions that were wrong

This is where growth shines. A strong answer covers:

  1. What you decided — clearly, without hedging
  2. Why it seemed right at the time — the reasoning was sound given information available
  3. What actually happened — the outcome, without blame
  4. What you learned — specific and actionable
  5. What you do differently now — proof of growth

Example:

Early in my senior role, I chose to rewrite a working but messy authentication module from scratch rather than refactoring incrementally. My reasoning: the code was tangled, had no tests, and every change broke something. A rewrite seemed faster than untangling it.

It took three months instead of the estimated six weeks. During that time, we could not ship auth-related features, and we introduced two security bugs that the old code did not have. The lesson: working code, however ugly, encodes years of edge-case handling that is invisible until it is gone. I now default to incremental refactoring — boy scout rule, one improvement per PR — unless I can prove the rewrite is genuinely smaller than the accumulated refactoring. I also learned to timebox rewrites with clear abort criteria.

Practice questions

  1. "Tell me about a technical decision you made that you are proud of"
  2. "Describe a time you chose a simpler solution over a more sophisticated one"
  3. "Tell me about a decision that did not work out as planned"
  4. "How do you decide between building something custom vs. using an existing tool?"
  5. "Describe a time you had to make a decision with incomplete information"
  6. "Tell me about a time you changed your mind about a technical approach"
