<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:atom="http://www.w3.org/2005/Atom"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>David Shortland</title>
    <link>https://davidshortland.dev/blog</link>
    <description>Writing on software engineering: philosophy, architecture, and notes from motorsport systems.</description>
    <language>en-GB</language>
    <lastBuildDate>Sun, 19 Apr 2026 10:34:58 GMT</lastBuildDate>
    <atom:link href="https://davidshortland.dev/rss.xml" rel="self" type="application/rss+xml" />
  <item>
    <title>Building a Mathematics Interpreter in F#: From Parser to Symbolic Calculus</title>
    <link>https://davidshortland.dev/blog/building-a-mathematics-interpreter</link>
    <guid isPermaLink="true">https://davidshortland.dev/blog/building-a-mathematics-interpreter</guid>
    <pubDate>Fri, 17 Apr 2026 12:00:00 GMT</pubDate>
    <dc:creator>David Shortland</dc:creator>
    <description>How we built a full interpreter with symbolic differentiation, six number types, and interactive graph plotting, and what the architectural decisions reveal about building extensible systems.</description>
<content:encoded><![CDATA[<p>Most university programming projects are CRUD apps or data pipelines. This one was an interpreter.</p>
<p>For the Advanced Programming module at UEA, we built a mathematics interpreter from scratch in F#: a system that takes a string like <code>d/dx(sin(x^2) + 3/4)</code>, parses it into a tree, differentiates it symbolically, simplifies the result, and plots it on an interactive graph.</p>
<p>This post walks through the decisions that shaped the interpreter, what made them interesting, and what I learned about building systems that are designed to grow.</p>
<h2 id="the-pipeline">The Pipeline</h2>
<p>Every interpreter follows roughly the same shape. Raw text goes in, structured meaning comes out. Ours has three stages:</p>
<figure><img src="https://davidshortland.dev/blog/images/interpreter-pipeline.svg" alt="Interpreter pipeline diagram" loading="lazy" /><figcaption>Interpreter pipeline diagram</figcaption></figure>
<p>The lexer breaks the input into tokens. The parser arranges those tokens into a tree that represents the mathematical structure. The evaluator walks the tree and computes a result. Each stage knows nothing about the others.</p>
<p>The feedback arrows at the bottom are what make this more than a calculator. The AST can be fed back into the pipeline: transformed by the differentiator into a new AST, or re-evaluated hundreds of times by the plotter at different x-values. That reuse is the most important property of the architecture, and it was not the one we started with.</p>
<h2 id="the-decision-that-changed-everything">The Decision That Changed Everything</h2>
<p>The original stub we were given combined parsing and evaluation into a single pass. The parser would see <code>3 + 4</code>, and instead of building a tree node, it would immediately compute <code>7</code>. This works for simple arithmetic, but it creates a ceiling. You cannot differentiate a number. You cannot plot <code>7</code>. You need the <em>structure</em> of the expression, not just its result.</p>
<p>Our first major decision was to separate these concerns completely. The parser returns an abstract syntax tree:</p>
<pre><code class="language-fsharp">type Expr =
    | Number of NumberType
    | Variable of string
    | BinaryOp of BinaryOperator * Expr * Expr
    | UnaryOp of UnaryOperator * Expr
    | FunctionCall of string * Expr
    | VectorLiteral of Expr list
    | MatrixLiteral of Expr list list</code></pre>
<p>This is a discriminated union in F#, essentially a type that says "an expression is one of these seven things." A <code>BinaryOp</code> contains an operator and two sub-expressions, which are themselves <code>Expr</code> values. It is trees all the way down.</p>
<p>Here is what the tree looks like for <code>2 + 3 * x</code>:</p>
<figure><img src="https://davidshortland.dev/blog/images/ast-example.svg" alt="AST example diagram" loading="lazy" /><figcaption>AST example diagram</figcaption></figure>
<p>The parser respects operator precedence. Multiplication binds tighter than addition, so <code>Mul</code> sits lower in the tree. When the evaluator walks this tree depth-first, it naturally evaluates <code>3 * x</code> before adding <code>2</code>. The structure encodes the mathematics.</p>
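<p>In terms of the <code>Expr</code> type, that tree is a plain value:</p>
<pre><code class="language-fsharp">// The AST for 2 + 3 * x, written out with the constructors from this post.
BinaryOp(Addition,
    Number(Int 2),
    BinaryOp(Multiplication, Number(Int 3), Variable "x"))</code></pre>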
<p>This separation unlocked everything that followed. Symbolic differentiation works by transforming one AST into another. Integration evaluates the same AST at hundreds of points. The GUI stores the AST and re-evaluates it whenever the user pans or zooms the graph. None of that is possible if you throw away the structure during parsing.</p>
<p>The principle generalises: separate what something <em>means</em> from what you <em>do</em> with it. Parse once, use many times. It is the same idea behind domain models in backend systems, and it shows up everywhere once you start looking.</p>
<h2 id="two-phase-lexing">Two-Phase Lexing</h2>
<p>The lexer has a subtle problem to solve: is <code>-</code> subtraction or negation?</p>
<p>In <code>3 - 5</code>, it is subtraction. In <code>3 * -5</code>, it is negation. In <code>(-5)</code>, it is negation. The character is identical but the meaning depends on context. Our lexer handles this with a two-phase approach:</p>
<p><strong>Phase 1</strong> tokenises everything naively. Every <code>-</code> becomes a <code>Sub</code> token.</p>
<p><strong>Phase 2</strong> walks the token stream and applies a context rule: if a <code>Sub</code> token appears at the start of the input, after an opening parenthesis, after an operator, or after an assignment, it gets reclassified as <code>UnaryMinus</code>.</p>
<p>This is cleaner than trying to handle it inline during scanning. Phase 1 does not need to track state. Phase 2 does not need to understand character-level parsing. Each phase has a single concern.</p>
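<p>A minimal sketch of the phase-2 pass (the token cases here are illustrative, not the project's actual token type):</p>
<pre><code class="language-fsharp">// Reclassify Sub as UnaryMinus wherever the preceding token makes it unary.
type Token = Sub | UnaryMinus | LParen | RParen | Assign | Op | Atom

let reclassify (tokens: Token list) : Token list =
    let isUnaryContext prev =
        match prev with
        | None -&gt; true                                 // start of input
        | Some LParen | Some Assign -&gt; true            // after '(' or an assignment
        | Some Op | Some Sub | Some UnaryMinus -&gt; true // after another operator
        | _ -&gt; false                                   // after ')' or an atom: binary
    tokens
    |&gt; List.fold
        (fun (acc, prev) tok -&gt;
            let tok' = if tok = Sub &amp;&amp; isUnaryContext prev then UnaryMinus else tok
            (tok' :: acc, Some tok'))
        ([], None)
    |&gt; fst
    |&gt; List.rev</code></pre>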
<p>The same pattern applies to rational numbers. When the lexer sees <code>3/4</code>, it needs to decide: is this the rational number three-quarters, or is it integer division? The rule: only treat <code>/</code> as a rational separator if the accumulator has no decimal point and the denominator is a valid integer. So <code>3/4</code> becomes <code>Rational(3, 4)</code>, but <code>3.0/4</code> becomes <code>Float(3.0) Div Int(4)</code>.</p>
<h2 id="parsing-getting-precedence-right">Parsing: Getting Precedence Right</h2>
<p>The parser uses recursive descent with precedence climbing. Each precedence level gets its own function:</p>
<pre><code class="language-text">parseExpression  -&gt;  handles + and -    (lowest precedence)
parseTerm        -&gt;  handles * / %      (medium)
parseFactor      -&gt;  handles ^          (highest binary)
parsePrimary     -&gt;  handles atoms      (numbers, variables, functions, parens)</code></pre>
<p>Lower-precedence functions call higher-precedence ones. <code>parseExpression</code> calls <code>parseTerm</code> for its operands, which calls <code>parseFactor</code>, which calls <code>parsePrimary</code>.</p>
<p>This naturally produces the correct tree shape. The input <code>2 + 3 * 4</code> parses as <code>Add(2, Mul(3, 4))</code> because <code>parseTerm</code> grabs the multiplication before <code>parseExpression</code> can claim the <code>3</code>.</p>
<p>There is one place where this pattern breaks: exponentiation.</p>
<p>Most operators are left-associative. <code>2 - 3 - 4</code> means <code>(2 - 3) - 4</code>. But exponentiation is right-associative. <code>2^3^2</code> must be <code>2^(3^2) = 512</code>, not <code>(2^3)^2 = 64</code>. The difference between 512 and 64 is the kind of bug that passes casual testing and fails in production.</p>
<p>The fix is a one-line change in how the parser recurses:</p>
<pre><code class="language-fsharp">| Pow :: tail -&gt;
    let tokens', rightExpr = parseFactor tail   // recurse on parseFactor, not parseFactorRest
    (tokens', BinaryOp(Exponentiation, leftExpr, rightExpr))</code></pre>
<p>Left-associative operators recurse on their own "rest" function, building the tree leftward. Right-associative operators recurse on the <em>base</em> function, building rightward. It is a small difference in code and a large difference in correctness.</p>
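<p>For comparison, here is the left-associative case in the same fragment style (<code>Subtraction</code> and <code>parseExpressionRest</code> are assumed names, following the post's conventions):</p>
<pre><code class="language-fsharp">| Sub :: tail -&gt;
    let tokens', rightExpr = parseTerm tail   // parse one operand only
    // recursing on the rest function folds 2 - 3 - 4 into (2 - 3) - 4
    parseExpressionRest (BinaryOp(Subtraction, leftExpr, rightExpr)) tokens'</code></pre>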
<h2 id="evaluation-and-the-symbol-table">Evaluation and the Symbol Table</h2>
<p>Once the parser produces an AST, the evaluator walks it depth-first. The interesting part is how it handles state.</p>
<p>Mathematical expressions live in a context. After <code>x = 5</code>, the expression <code>x + 3</code> should evaluate to <code>8</code>. That context is the symbol table: a map from variable names to values. The question is how to manage it.</p>
<p>A mutable approach would store the symbol table as a shared object that the evaluator reads and writes. That works, but it makes the evaluation order matter in subtle ways and makes testing harder. Instead, we used F#'s immutable maps. The evaluator takes a symbol table in and returns a new one out:</p>
<pre><code class="language-fsharp">let evaluateStatement (statement: Statement) (symbolTable: SymbolTable)
    : NumberType * SymbolTable =
    match statement with
    | ExpressionStmt expr -&gt;
        let value = evaluateExpr expr symbolTable
        (value, symbolTable)                          // table unchanged
    | Assignment(varName, expr) -&gt;
        let value = evaluateExpr expr symbolTable
        let newTable = Map.add varName value symbolTable
        (value, newTable)                             // new table returned</code></pre>
<p>An expression evaluation never changes the table. An assignment returns a new table with the binding added. The old table still exists, unchanged. This means you can evaluate the same AST against different symbol tables without interference, which is exactly what the plotter does when it evaluates <code>y = x^2</code> at hundreds of different x-values.</p>
<p>The functional threading pattern, taking state in and returning new state out, shows up constantly in well-designed systems. It is the same idea behind Redux reducers, event sourcing, and database transactions. The shape is always the same: <code>(input, state) -> (output, newState)</code>.</p>
<h2 id="six-number-types">Six Number Types</h2>
<p>The interpreter supports integers, floats, rationals, complex numbers, vectors, and matrices. Each is a case in a single discriminated union:</p>
<pre><code class="language-fsharp">type NumberType =
    | Int of int
    | Float of float
    | Rational of int * int
    | CustomComplex of float * float
    | Vector of float list
    | Matrix of float list list</code></pre>
<p>The interesting problem is what happens when you add an integer to a rational, or multiply a float by a complex number. We implemented automatic type promotion: when two different types meet in an operation, the less general type promotes to the more general one.</p>
<figure><img src="https://davidshortland.dev/blog/images/type-promotion.svg" alt="Type promotion hierarchy" loading="lazy" /><figcaption>Type promotion hierarchy</figcaption></figure>
<p>The promotion rules preserve precision where possible. <code>Int + Rational</code> stays <code>Rational</code> (exact arithmetic). <code>Float + Rational</code> converts the rational to a float (because the float already lost exactness). <code>Anything + Complex</code> promotes to complex. This means <code>5 + 1/2</code> produces <code>Rational(11, 2)</code>, not <code>Float(5.5)</code>.</p>
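<p>A sketch of what pairwise promotion looks like for addition (the real module expands this across every operator and all six types, and runs rational results through the simplifier below):</p>
<pre><code class="language-fsharp">// Exact types stay exact; floats absorb rationals.
let addNumbers a b =
    match a, b with
    | Int x, Int y -&gt; Int (x + y)
    | Int x, Rational (n, d)
    | Rational (n, d), Int x -&gt; Rational (x * d + n, d)          // promote Int, stay exact
    | Float x, Rational (n, d)
    | Rational (n, d), Float x -&gt; Float (x + float n / float d)  // float already inexact
    | Int x, Float y
    | Float y, Int x -&gt; Float (float x + y)
    | Float x, Float y -&gt; Float (x + y)
    | CustomComplex (r1, i1), CustomComplex (r2, i2) -&gt; CustomComplex (r1 + r2, i1 + i2)
    | _ -&gt; failwith "remaining pairs elided in this sketch"</code></pre>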
<p>Rationals auto-simplify through GCD:</p>
<pre><code class="language-fsharp">let simplifyRational (num: int) (den: int) : int * int =
    let g = gcd num den
    let newNum = num / g
    let newDen = den / g
    if newDen &lt; 0 then (-newNum, -newDen) else (newNum, newDen)</code></pre>
<p>And a rational with denominator 1 collapses back to an integer: <code>6/3</code> evaluates to <code>Int(2)</code>, not <code>Rational(2, 1)</code>. The system always finds the most specific type that can represent the result.</p>
<p>This module is the largest in the project at 908 lines, and most of that is the combinatorial expansion of operations across type pairs. It is unglamorous code. But getting the edge cases right (division by zero in rationals, negative denominators, complex division by conjugate) is what makes the system trustworthy.</p>
<h2 id="symbolic-differentiation">Symbolic Differentiation</h2>
<p>This is the part of the project I am most proud of.</p>
<p>The <code>computeDerivative</code> function takes an AST and a variable name, and returns a new AST representing the derivative. It implements the rules you learn in calculus, but as recursive tree transformations:</p>
<p><strong>Constant rule:</strong> The derivative of a number is zero.</p>
<pre><code class="language-fsharp">| Number _ -&gt; Number(Int 0)</code></pre>
<p><strong>Variable rule:</strong> The derivative of <code>x</code> with respect to <code>x</code> is 1. Any other variable is treated as a constant.</p>
<pre><code class="language-fsharp">| Variable name when name = varName -&gt; Number(Int 1)
| Variable _ -&gt; Number(Int 0)</code></pre>
<p><strong>Product rule:</strong> <code>d/dx[f * g] = f' * g + f * g'</code></p>
<pre><code class="language-fsharp">| BinaryOp(Multiplication, left, right) -&gt;
    BinaryOp(Addition,
        BinaryOp(Multiplication, computeDerivative left varName, right),
        BinaryOp(Multiplication, left, computeDerivative right varName))</code></pre>
<p><strong>Chain rule:</strong> <code>d/dx[f(g(x))] = f'(g(x)) * g'(x)</code></p>
<pre><code class="language-fsharp">| FunctionCall(funcName, argExpr) -&gt;
    let innerDerivative = computeDerivative argExpr varName
    let outerDerivative = match funcName.ToLower() with
        | "sin" -&gt; FunctionCall("cos", argExpr)
        | "cos" -&gt; UnaryOp(Negation, FunctionCall("sin", argExpr))
        | "exp" -&gt; FunctionCall("exp", argExpr)
        | "ln"  -&gt; BinaryOp(Division, Number(Int 1), argExpr)
        ...
    BinaryOp(Multiplication, outerDerivative, innerDerivative)</code></pre>
<p>The function handles 11 mathematical functions, the product rule, quotient rule, power rule (both constant and variable exponents), and composes them through the chain rule. It is entirely symbolic: the output is an AST, not a number.</p>
<p>But raw symbolic derivatives are ugly. The derivative of <code>x^2</code> through the power rule produces <code>2 * (x^(2-1) * 1)</code>. Technically correct, but no human would write that. So there is a simplification pass:</p>
<pre><code class="language-fsharp">| BinaryOp(Multiplication, Number(Int 1), right) -&gt; simplifyExpr right
| BinaryOp(Addition, Number(Int 0), right) -&gt; simplifyExpr right
| BinaryOp(Exponentiation, base_, Number(Int 1)) -&gt; simplifyExpr base_
| BinaryOp(Exponentiation, _, Number(Int 0)) -&gt; Number(Int 1)</code></pre>
<p>These rules apply recursively until the expression stops changing. After simplification, <code>2 * (x^(2-1) * 1)</code> becomes <code>2 * x</code>. The simplifier also folds constants (<code>3 + 4</code> becomes <code>7</code>) and flattens nested multiplications (<code>2 * (3 * x)</code> becomes <code>6 * x</code>).</p>
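<p>The "until the expression stops changing" loop is a fixed-point iteration, a few lines in F# (a sketch):</p>
<pre><code class="language-fsharp">// Apply the rewrite rules repeatedly until the tree reaches a fixed point.
let rec simplifyFully expr =
    let expr' = simplifyExpr expr
    if expr' = expr then expr' else simplifyFully expr'</code></pre>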
<p>What makes this architecturally interesting is that none of it would work without the AST decision from earlier. Differentiation <em>is</em> tree transformation. If the parser had evaluated expressions immediately, there would be no tree to transform.</p>
<h2 id="root-finding-newton-raphson">Root Finding: Newton-Raphson</h2>
<p>The interpreter finds roots of functions using the Newton-Raphson method. The algorithm is elegant: start with a guess, evaluate the function and its derivative at that point, and step in the direction the derivative suggests.</p>
<pre><code class="language-text">x_next = x_current - f(x_current) / f'(x_current)</code></pre>
<p>Each iteration typically doubles the number of correct digits (quadratic convergence). Our implementation runs up to 500 iterations per starting point with a tolerance of <code>1e-10</code>.</p>
<p>The trick is that a single starting point might only find one root, or converge to the wrong one, or diverge entirely. Our solution: generate 1000 evenly-spaced initial guesses across the search interval, run Newton-Raphson from each one, discard failures, and de-duplicate the results. It is brute-force in the search space but precise in the convergence.</p>
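<p>A sketch of that strategy, assuming an <code>evalAt</code> helper that evaluates an AST at a given x (names and the de-duplication rounding are illustrative):</p>
<pre><code class="language-fsharp">// Newton-Raphson from one starting point; f and f' are ASTs.
let newtonRaphson evalAt f f' x0 =
    let rec step x i =
        if i &gt;= 500 then None                       // iteration cap
        else
            let fx = evalAt f x
            if abs fx &lt; 1e-10 then Some x           // converged
            else
                let fx' = evalAt f' x
                if fx' = 0.0 then None              // flat derivative: give up
                else step (x - fx / fx') (i + 1)
    step x0 0

// Brute force in the search space, precise in the convergence.
let findRoots evalAt f f' lo hi =
    [ for i in 0 .. 999 -&gt; lo + (hi - lo) * float i / 999.0 ]   // 1000 starting guesses
    |&gt; List.choose (newtonRaphson evalAt f f')
    |&gt; List.map (fun r -&gt; System.Math.Round(r, 6))              // collapse near-duplicates
    |&gt; List.distinct</code></pre>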
<p>This approach reuses the symbolic differentiation. The derivative needed by Newton-Raphson is computed from the AST, not approximated numerically. That means the convergence is exact to the precision of floating-point arithmetic, not limited by a finite-difference step size.</p>
<h2 id="the-gui-making-mathematics-interactive">The GUI: Making Mathematics Interactive</h2>
<p>The WPF GUI is where the architecture becomes tangible. A user types <code>y = sin(x)</code>, hits Plot, and sees the curve appear. They click Derivative and the orange <code>cos(x)</code> overlay draws on top. They type bounds and see the integral shade blue beneath the curve. They click Find Roots and markers appear at the zeros.</p>
<p>Behind every one of those interactions is the same AST being reused. The plot evaluates it at hundreds of x-values. The derivative button calls <code>computeDerivative</code> on it, producing a new AST that gets plotted the same way. The integral evaluates it at the trapezoidal quadrature points. The root finder passes it (and its symbolic derivative) to Newton-Raphson.</p>
<p>One decision that made this work smoothly is deferred evaluation. When the user types <code>y = x^2 + 3</code>, the interpreter detects that <code>x</code> is a free variable and stores the AST without evaluating it. There is no error, no prompt for a value. The expression waits until the user gives it a context, either by clicking Plot (which supplies hundreds of x-values) or by defining <code>x</code> later.</p>
<p>The plotting itself re-evaluates on interaction. When the user pans or zooms, the graph recalculates across the new viewport bounds. Because the AST is a lightweight data structure (not a closure, not a string to re-parse), this is fast enough to feel instantaneous. The user is directly manipulating the tree without knowing it.</p>
<h2 id="the-takeaway">The Takeaway</h2>
<p>The single decision to separate parsing from evaluation turned a calculator into a computer algebra system. Every feature that followed, symbolic differentiation, integration visualisation, root finding, interactive plotting, was only possible because the parser preserved the structure of the input instead of collapsing it into a value.</p>
<p>That is the lesson I keep coming back to in software engineering: the abstractions you choose early determine what is easy and what is impossible later. Get the data model right and features fall out naturally. Get it wrong and every feature is a fight against your own architecture.</p>
<p>The interpreter is about 4,700 lines of code across F# and C#. The most important line is probably <code>type Expr =</code>. Everything else follows from that.</p>]]></content:encoded>
    <category>fsharp</category>
    <category>interpreters</category>
    <category>architecture</category>
    <category>functional-programming</category>
  </item>
  <item>
    <title>Building an Event-Driven Health Tracker with Three Lambda Functions</title>
    <link>https://davidshortland.dev/blog/building-an-event-driven-health-tracker</link>
    <guid isPermaLink="true">https://davidshortland.dev/blog/building-an-event-driven-health-tracker</guid>
    <pubDate>Mon, 13 Apr 2026 12:00:00 GMT</pubDate>
    <dc:creator>David Shortland</dc:creator>
    <description>How we built a health tracking platform with event-driven notifications, scheduled jobs, and auto-completing goals, and why decoupling what happens from when it happens made everything simpler.</description>
    <content:encoded><![CDATA[<p>The brief was a health and fitness tracker. Log exercises, record meals, track weight, set goals. Standard full-stack coursework.</p>
<p>We could have built it as a monolith: one Express server that handles requests, sends emails, and checks for overdue goals all in the same process. It would have worked. But monoliths that send emails in the request path are fragile. If the email service is slow, the user waits. If it fails, the request fails. The user's goal achievement notification should not be coupled to whether AWS SES responded in time.</p>
<p>So we split the system into three independently deployable Lambda functions, connected by events. The API handles requests. A notification service sends emails asynchronously. A scheduled function checks for overdue goals every morning. Each one does its job without knowing how the others work.</p>
<p>This post is about that architecture: why we split it, how the pieces connect, and what the separation made possible.</p>
<h2 id="the-architecture">The Architecture</h2>
<figure><img src="https://davidshortland.dev/blog/images/health-tracker-architecture.svg" alt="Health Tracker architecture diagram" loading="lazy" /><figcaption>Health Tracker architecture diagram</figcaption></figure>
<p>The system has two paths. The synchronous path handles user requests: the Angular frontend calls the API Gateway, which invokes the API Lambda, which reads and writes to MongoDB. This is straightforward.</p>
<p>The asynchronous path is where it gets interesting. When something notable happens in the API (a user registers, a goal is achieved, a group invitation is sent), the API publishes a message to an SNS topic and moves on. It does not send an email. It does not even know that emails exist.</p>
<p>The SNS topic delivers messages to an SQS queue. The queue triggers a second Lambda function that reads the message, renders an HTML email from a Liquid template, and sends it through SES. If this Lambda fails, the message stays in the queue and gets retried. The API never knows, and the user's request was never blocked.</p>
<p>A third Lambda runs on a CloudWatch Events schedule: once per day at 9 AM. It queries the API for goals past their target date, then publishes overdue notifications to the same SNS topic. The notification Lambda picks them up like any other event.</p>
<p>Three functions. One SNS topic. One queue. The event types are distinguished by their subject line, not by separate infrastructure.</p>
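<p>On the publish side, raising an event is a few lines (a sketch with the AWS SDK v3; the environment variable name is illustrative):</p>
<pre><code class="language-typescript">import { SNSClient, PublishCommand } from "@aws-sdk/client-sns";

const sns = new SNSClient({});

// The subject line carries the event type; the body carries the payload.
export async function publishGoalAchieved(payload: { userId: string; goalId: string }) {
    await sns.send(new PublishCommand({
        TopicArn: process.env.NOTIFICATIONS_TOPIC_ARN,
        Subject: "GoalAchieved",
        Message: JSON.stringify(payload),
    }));
}</code></pre>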
<h2 id="separating-what-happens-from-when-it-happens">Separating What Happens from When It Happens</h2>
<p>The key design decision is that the API's job is to record <em>what happened</em>, not to decide <em>what to do about it</em>.</p>
<p>When a user logs a weight measurement and it happens to hit a goal target, the API does two things: save the measurement and mark the goal complete. Then it publishes a <code>GoalAchieved</code> event and returns the response. The API is done.</p>
<p>The notification Lambda, independently, picks up that event and decides what to do about it. It checks whether the user has verified their email. If they have, it renders a congratulations email with a suggested next goal (target incremented by a sensible amount). If they have not verified, it does nothing.</p>
<pre><code class="language-typescript">if (subj === goalAchievedSnsSbj) {
    const {email, htmlContent, emailVerified} = await render_goal_achieved(message, baseWebsiteUrl, engine);
    if (emailVerified) {
        await sendEmail(sender, email, "Goal achieved! Way to go!", htmlContent);
    }
}</code></pre>
<p>This separation means the API controller code stays clean. The registration endpoint publishes a <code>Registered</code> event; it does not contain email rendering logic. The goal completion code publishes <code>GoalAchieved</code>; it does not know about suggested next goals. Each concern lives in exactly one place.</p>
<p>It also means the notification logic can change without touching the API. We added the "suggest a new goal" feature entirely within the notification Lambda. The API never had to be redeployed.</p>
<h2 id="auto-completing-goals">Auto-Completing Goals</h2>
<p>The most interesting behaviour in the system is reactive: goals that complete themselves when the user logs data.</p>
<p>When a user creates a health metric (a weigh-in), the database service does not just save the record. It also checks every open weight goal for that user:</p>
<pre><code class="language-typescript">async createHealthMetricAsync(healthMetric: HealthMetricDocument) {
    let healthMetricDocument = await HealthMetrics.create(healthMetric);

    const weightGoals = await Goal.find({
        userId: healthMetric.userId,
        type: GoalType.WEIGHT,
        completed: false
    });

    const goals = await Promise.all(weightGoals.map(async goal =&gt; {
        goal.currentValue = healthMetric.weight;       // progress comes from the new weigh-in
        if (goal.currentValue &lt;= goal.targetValue) {   // at or below target counts as achieved
            goal.completed = true;
            goal.completedDate = new Date();
        }
        return await goal.save();
    }));

    return {healthMetric: healthMetricDocument, completedGoals: goals.filter(g =&gt; g.completed)};
}</code></pre>
<p>The same pattern applies to exercise goals. Logging a run updates every open distance goal for that exercise type. Logging a workout updates every open duration goal. The controller then publishes <code>GoalAchieved</code> events for any goals that were completed, which flow through SNS to the notification Lambda.</p>
<p>From the user's perspective, they log a run and a few seconds later get an email saying they hit their 100km goal. From the system's perspective, five things happened in sequence: the exercise was saved, matching goals were queried, progress was updated, the controller published events, and the notification Lambda (asynchronously, separately) rendered and sent an email. Each step knows only about itself and the next.</p>
<p>The calorie calculation is a nice detail too. When an exercise is logged, the system pulls the exercise type's MET value and the user's most recent weight to calculate calories burned automatically:</p>
<pre><code class="language-typescript">const caloriesBurned = exerciseType.mET * 3.5 * lastHealthMetric.weight * exerciseDocument.duration / 200;</code></pre>
<p>The user logs "30 minutes of running." The system returns the exercise record with calories already calculated from the standard MET formula (calories per minute = MET × 3.5 × weight in kg / 200). No manual entry needed.</p>
<h2 id="the-scheduled-lambda">The Scheduled Lambda</h2>
<p>The third Lambda runs on a cron schedule. Every day at 9 AM UTC, CloudWatch Events triggers it:</p>
<pre><code class="language-typescript">const overdueGoalsRule = new cdk.aws_events.Rule(this, `OverdueGoalsCheckRule-${stage}`, {
    schedule: cdk.aws_events.Schedule.expression('cron(0 9 * * ? *)'),
    description: `Trigger overdue goals check daily at 9 AM UTC`
});
overdueGoalsRule.addTarget(new cdk.aws_events_targets.LambdaFunction(checkOverdueGoalsLambda));</code></pre>
<p>The Lambda itself is small. It calls the API's internal endpoint to find overdue goals and publishes notifications for each one. The notification Lambda handles the rest.</p>
<p>What makes this work is the internal API key. The three Lambdas share a secret generated by CDK and stored in Secrets Manager. The API authenticates requests in two ways: JWT tokens for user requests, and the raw API key for Lambda-to-Lambda communication. The overdue goals Lambda uses the API key to call the same API that the frontend calls, but with elevated access.</p>
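<p>The check itself is small (a sketch, assuming Express and <code>jsonwebtoken</code>; header and variable names are illustrative):</p>
<pre><code class="language-typescript">import { Request } from "express";
import jwt from "jsonwebtoken";

type AuthContext = { kind: "internal" } | { kind: "user"; userId: string };

export function authenticate(req: Request): AuthContext {
    // Raw shared secret: trusted Lambda-to-Lambda callers get elevated access.
    const apiKey = req.header("x-api-key");
    if (apiKey &amp;&amp; apiKey === process.env.INTERNAL_API_KEY) {
        return { kind: "internal" };
    }
    // Everyone else needs a valid user JWT.
    const token = req.header("authorization")?.replace("Bearer ", "") ?? "";
    const claims = jwt.verify(token, process.env.JWT_SECRET!) as { sub: string };
    return { kind: "user", userId: claims.sub };
}</code></pre>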
<p>This keeps the overdue-checking logic in the API where it belongs. The scheduled Lambda is just a trigger. If the business rules for "overdue" change, only the API needs updating.</p>
<h2 id="group-goals">Group Goals</h2>
<p>Groups add a social dimension. Users create groups, invite members via join codes, and set shared goals. The interesting part is how group goals work at the data level.</p>
<p>When someone sets a group goal, the system creates a separate goal record for every member:</p>
<pre><code class="language-typescript">async createGroupGoal(creatorId: string, groupId: string, goalData: {...}) {
    const group = await Group.findById(groupId);
    const groupGoalLink = new Types.ObjectId().toHexString();

    return await Promise.all(group.members.map(async memberId =&gt; {
        let currentValue = await this.calculateCurrentProgress(memberId, goalData);

        return await Goal.create({
            ...goalData,
            userId: memberId,
            isGroupGoal: true,
            groupId,
            groupGoalLink,
            currentValue,
            completed: currentValue &gt;= goalData.targetValue
        });
    }));
}</code></pre>
<p>Each member gets their own goal document with the same target but independent progress. A <code>groupGoalLink</code> ties them together so the UI can show group-wide progress. This means group goals work identically to personal goals from the database service's perspective. The auto-completion logic does not need a special case for groups. When a group member logs an exercise that completes their goal, the same <code>GoalAchieved</code> event fires, and a <code>GroupGoalCompleted</code> notification goes out.</p>
<p>The alternative would have been a single shared goal document that tracks multiple users' progress. That design sounds simpler until you need to handle a member leaving the group, or partial completion, or showing individual progress in the UI. Denormalising into one-goal-per-member made every downstream query simpler.</p>
<h2 id="composing-the-infrastructure">Composing the Infrastructure</h2>
<p>The CDK stack defines all three Lambdas, the SNS topic, the SQS queue, and their permissions in a single TypeScript file. The pipeline builds all three services in parallel:</p>
<pre><code class="language-typescript">const buildApi = new pipelines.ShellStep(`BuildApi-${stage}`, {
    commands: ['cd HealthTrackerAPI', 'npm install', 'npm run build', 'npm run zip']
});

const buildNotificationLambda = new pipelines.ShellStep(`BuildNotificationLambda-${stage}`, {
    commands: ['cd HealthTrackerAPI.NotificationsLambda', 'npm install', 'npm run build', 'npm run zip']
});

const buildGoalOverdueLambda = new pipelines.ShellStep(`BuildGoalOverdueLambda-${stage}`, {
    commands: ['cd HealthTrackerAPI.OverdueGoalsLambda', 'npm install', 'npm run build', 'npm run zip']
});</code></pre>
<p>Three independent builds feed into a single CDK synth step that composes them into one CloudFormation stack. The infrastructure references between them (API Gateway URL passed to the overdue Lambda, SNS ARN passed to the API) are wired through CDK constructs, not hardcoded strings.</p>
<p>The system deploys to two stages (dev and prod) with isolated resources. Each stage gets its own SNS topic, its own SQS queue, its own set of secrets, and its own MongoDB database. A bug in the notification template in dev cannot send emails to prod users. The stages share nothing except the pipeline that deploys them.</p>
<h2 id="the-takeaway">The Takeaway</h2>
<p>The three-Lambda split was not about scalability. A monolith would have handled the load of a university project. It was about keeping each concern in its own box.</p>
<p>The API does not know how emails are sent. The notification Lambda does not know how goals are tracked. The scheduled Lambda does not know how overdue goals are identified. Each function has a single reason to change, and when it does change, the blast radius is limited to itself.</p>
<p>The event bus (SNS + SQS) is the contract between them. As long as the message format stays stable, any Lambda can be rewritten, redeployed, or replaced independently. That is the practical benefit of event-driven architecture: not performance, not scale, but the ability to change one part of the system without coordinating with every other part.</p>]]></content:encoded>
    <category>aws</category>
    <category>lambda</category>
    <category>sns</category>
    <category>event-driven</category>
    <category>angular</category>
    <category>architecture</category>
  </item>
  <item>
    <title>Building WeatherWise: A Weather Platform That Tells You What to Do</title>
    <link>https://davidshortland.dev/blog/building-weatherwise</link>
    <guid isPermaLink="true">https://davidshortland.dev/blog/building-weatherwise</guid>
    <pubDate>Sat, 04 Apr 2026 12:00:00 GMT</pubDate>
    <dc:creator>David Shortland</dc:creator>
    <description>Building a weather platform that transforms raw API data into prioritised, actionable recommendations, and the full-stack architecture that supports it.</description>
    <content:encoded><![CDATA[<p>Every weather app shows you the same thing: temperature, humidity, wind speed, a little cloud icon. You look at the number, you decide what it means for your day, you close the app. The interpretation is your problem.</p>
<p>For the Advanced Web Development module at UEA, I built WeatherWise: a weather platform that does the interpretation for you. Instead of showing "UV index: 9" and leaving you to figure out what that means, it tells you to wear SPF 50+, stay out of direct sun between 10 and 4, and bring a hat. Instead of "wind: 45 km/h," it tells you to secure outdoor furniture and drive carefully.</p>
<p>The project scored 95%. I think the reason is that the interesting engineering is not in fetching weather data (that is just an API call) but in what happens between the data arriving and the user seeing it.</p>
<h2 id="the-insight-engine">The Insight Engine</h2>
<p>The core of the application is a rule-based recommendations system that evaluates raw weather data against a set of conditions and produces prioritised, actionable insights.</p>
<figure><img src="https://davidshortland.dev/blog/images/weatherwise-insights.svg" alt="Insights engine diagram" loading="lazy" /><figcaption>Insights engine diagram</figcaption></figure>
<p>Each insight has a category (safety, health, travel, activity, clothing, or business), a priority level (high, medium, low), a description of the condition, and an action: the specific thing the user should do.</p>
<p>The rules are layered by severity:</p>
<p><strong>High priority</strong> covers safety-critical conditions. Visibility below 1 km triggers fog driving advice. Temperature above 35 degrees triggers heat warnings with hydration targets. UV index 8 or above triggers specific sunscreen SPF recommendations and time windows to avoid.</p>
<p><strong>Medium priority</strong> covers preparation. Rain in the forecast triggers umbrella and waterproof advice. But the engine also checks the time of day: if it is raining between 6 and 9 AM, it adds a commute-specific recommendation (leave 20 minutes early, check traffic apps, consider public transport). High humidity combined with high temperature triggers hydration alerts that would not fire for either condition alone.</p>
<p><strong>Low priority</strong> captures opportunities. If the temperature is between 20 and 28 degrees, UV is below 6, and there is no rain, the system suggests outdoor activities. This only fires when none of the higher-priority conditions are active.</p>
<pre><code class="language-typescript">interface Insight {
  category: 'clothing' | 'activity' | 'travel' | 'health' | 'business' | 'safety';
  priority: 'high' | 'medium' | 'low';
  icon: React.ReactNode;
  title: string;
  description: string;
  action: string;
}</code></pre>
<p>The insights are sorted by priority before rendering. High-priority items appear first with red indicators. Medium items are amber. Low items are green. If no conditions trigger, the component shows a positive "all clear" message instead of an empty state.</p>
<p>What makes this more than a series of if-statements is the compositional nature of the rules. The commute recommendation does not just check for rain. It checks for rain <em>and</em> a specific time window. The humidity alert does not just check humidity. It checks humidity <em>and</em> temperature, because 85% humidity at 15 degrees is not a health concern, but 85% humidity at 30 degrees is. The rules encode domain knowledge about when weather conditions actually matter to a person's day.</p>
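<p>A condensed sketch of two such rules, reusing the <code>Insight</code> interface above (thresholds match the prose; the condition shape and helper names are illustrative):</p>
<pre><code class="language-typescript">interface Conditions { uvIndex: number; rainForecast: boolean; }

function buildInsights(w: Conditions, hour: number): Insight[] {
    const insights: Insight[] = [];

    if (w.uvIndex &gt;= 8) {
        insights.push({
            category: "safety", priority: "high", icon: null,
            title: "Extreme UV",
            description: `UV index is ${w.uvIndex}.`,
            action: "Wear SPF 50+ and stay out of direct sun between 10am and 4pm.",
        });
    }

    // Compositional rule: rain alone is preparation; rain at commute time is travel advice.
    if (w.rainForecast &amp;&amp; hour &gt;= 6 &amp;&amp; hour &lt;= 9) {
        insights.push({
            category: "travel", priority: "medium", icon: null,
            title: "Rain on the commute",
            description: "Rain is forecast during the morning commute window.",
            action: "Leave 20 minutes early and check traffic before you go.",
        });
    }

    const order = { high: 0, medium: 1, low: 2 };
    return insights.sort((a, b) =&gt; order[a.priority] - order[b.priority]);
}</code></pre>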
<h2 id="the-stack">The Stack</h2>
<p>This project uses a different stack from everything else in my portfolio, which was part of the point. My work projects and other university coursework use Angular. WeatherWise is Next.js 15 with React 19, PostgreSQL with Drizzle ORM, and Zustand for state management.</p>
<p>Next.js gave me the App Router for file-based routing with server components, API routes co-located with the pages that use them, and middleware for authentication guards. Drizzle gave me type-safe database queries that infer their types from the schema definition, so the TypeScript compiler catches query errors at build time rather than runtime. Zustand gave me a lightweight store without the boilerplate of Redux.</p>
<p>The combination means the type safety runs from the database schema through the API routes to the React components. A change to the schema propagates as compiler errors everywhere that data is used.</p>
<h2 id="authentication-no-passwords">Authentication: No Passwords</h2>
<p>WeatherWise uses Google OAuth exclusively. There is no registration form, no password field, no forgot-password flow. Users click "Continue with Google" and they are in.</p>
<p>This was a deliberate design decision, not a shortcut. Password authentication means storing hashed passwords, building reset flows, handling rate limiting, dealing with weak passwords, and accepting liability for credential storage. OAuth delegates all of that to Google. The database stores a user's name, email, and profile image. No secrets.</p>
<p>The NextAuth callback chain handles user creation automatically:</p>
<pre><code class="language-typescript">async signIn({ user, account, profile }) {
    const existingUser = await db.select().from(users)
        .where(eq(users.email, user.email));

    if (existingUser.length === 0) {
        await db.insert(users).values({
            name: user.name,
            email: user.email,
            image: user.image,
            preferences: { temperatureUnit: 'celsius', windUnit: 'kmh' }
        });
    } else {
        await db.update(users)
            .set({ name: user.name, image: user.image, updatedAt: new Date() })
            .where(eq(users.email, user.email));
    }
    return true;
}</code></pre>
<p>First sign-in creates the user with sensible defaults. Subsequent sign-ins update the profile image and name (in case the user changed them on Google's side). The JWT session lasts 30 days. The schema evolved through three migrations to reach this design: the first version had a password field, the second added OAuth, the third removed passwords entirely.</p>
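<p>The session lifetime is a single line of NextAuth configuration:</p>
<pre><code class="language-typescript">// Sketch of the relevant options object.
export const authOptions = {
    session: {
        strategy: "jwt" as const,
        maxAge: 30 * 24 * 60 * 60,   // 30 days, in seconds
    },
};</code></pre>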
<h2 id="state-management-and-caching">State Management and Caching</h2>
<p>The Zustand store manages the weather data cache, user preferences, saved locations, and loading state. The interesting part is how it handles multiple locations.</p>
<p>The dashboard loads weather for up to four saved locations in parallel:</p>
<pre><code class="language-typescript">const weatherPromises = locations.map(async (location) =&gt; ({
    locationId: location.id,
    weather: await fetch(`/api/weather/current?location=${lat},${lon}`).then(r =&gt; r.json())
}));

const results = await Promise.all(weatherPromises);</code></pre>
<p>Each result is stored in a <code>locationWeatherCache</code> object keyed by location ID. When the user removes a location, its cached weather is pruned:</p>
<pre><code class="language-typescript">removeLocation: (locationId) =&gt; set((state) =&gt; ({
    locations: state.locations.filter(loc =&gt; loc.id !== locationId),
    locationWeatherCache: Object.fromEntries(
        Object.entries(state.locationWeatherCache)
            .filter(([key]) =&gt; key !== locationId)
    )
}));</code></pre>
<p>The store also handles unit conversion. Rather than converting units at the component level (which scatters conversion logic across the codebase), the store provides helper methods:</p>
<pre><code class="language-typescript">getTemperatureInUnit: (tempC, tempF) =&gt; {
    return get().preferences.temperatureUnit === 'celsius' ? tempC : tempF;
}</code></pre>
<p>Components call the helper. The preference propagates from one place. If I added a Kelvin option tomorrow, only the store would need to change.</p>
<h2 id="location-comparison">Location Comparison</h2>
<p>The comparison feature lets users place up to four saved locations side by side. Each location loads its weather in parallel, and the UI highlights the best and worst values for each metric.</p>
<p>The highlighting logic is context-aware. For temperature, higher is not necessarily better or worse, so it is left neutral. For UV and wind, lower is better. For visibility, higher is better. The system determines best and worst per metric and only applies colour highlighting when three or more locations are compared (with two, it is obvious).</p>
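<p>A sketch of the per-metric direction logic (names are illustrative):</p>
<pre><code class="language-typescript">type Direction = "lowerIsBetter" | "higherIsBetter" | "neutral";

const metricDirection: Record&lt;string, Direction&gt; = {
    uv: "lowerIsBetter",
    wind: "lowerIsBetter",
    visibility: "higherIsBetter",
    temperature: "neutral",          // neither end is objectively better
};

function findBestWorst(values: number[], dir: Direction) {
    // With two locations the winner is obvious; neutral metrics have no winner.
    if (values.length &lt; 3 || dir === "neutral") return null;
    const best = dir === "lowerIsBetter" ? Math.min(...values) : Math.max(...values);
    const worst = dir === "lowerIsBetter" ? Math.max(...values) : Math.min(...values);
    return { best, worst };
}</code></pre>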
<p>This feature reuses the same weather API calls and Zustand cache as the dashboard. A location that was already loaded on the dashboard does not need to be fetched again. The comparison just reads from the store.</p>
<h2 id="geolocation-and-fallbacks">Geolocation and Fallbacks</h2>
<p>When the dashboard loads, it requests the browser's geolocation with a 5-second timeout:</p>
<pre><code class="language-typescript">navigator.geolocation.getCurrentPosition(
    (position) =&gt; {
        const { latitude, longitude } = position.coords;
        if (savedLocations.length === 0) {
            loadWeatherForLocation(latitude, longitude);
        }
    },
    (error) =&gt; {
        if (savedLocations.length === 0) {
            loadWeatherForLocation(51.5074, -0.1278); // London fallback
        }
    },
    { enableHighAccuracy: false, timeout: 5000, maximumAge: 0 }
);</code></pre>
<p>The fallback strategy has two layers. If the user has saved locations, those take priority over geolocation entirely, since the user has already told the system what they care about. If they have no saved locations and geolocation fails (permissions denied, timeout, or unavailable), it falls back to London. The user always sees weather data, never an empty screen.</p>
<p>Locations are stored and queried by latitude and longitude rather than city name. This avoids ambiguity ("Portland" could be Oregon or Maine) and gives precise results from the weather API.</p>
<h2 id="the-data-model">The Data Model</h2>
<p>The database has two tables. Users store authentication data and preferences as a JSON column:</p>
<pre><code class="language-typescript">export const users = pgTable('users', {
    id: uuid('id').defaultRandom().primaryKey(),
    name: varchar('name', { length: 255 }),
    email: varchar('email', { length: 255 }).unique().notNull(),
    image: text('image'),
    preferences: json('preferences').$type&lt;{
        temperatureUnit: 'celsius' | 'fahrenheit';
        windUnit: 'mph' | 'kmh';
    }&gt;().default({ temperatureUnit: 'celsius', windUnit: 'kmh' }),
    createdAt: timestamp('created_at').defaultNow(),
    updatedAt: timestamp('updated_at').defaultNow()
});</code></pre>
<p>Locations use decimal precision to seven places (roughly 1 centimetre accuracy) and cascade-delete with their user:</p>
<pre><code class="language-typescript">export const locations = pgTable('locations', {
    id: uuid('id').defaultRandom().primaryKey(),
    userId: uuid('user_id').references(() =&gt; users.id, { onDelete: 'cascade' }).notNull(),
    name: varchar('name', { length: 255 }).notNull(),
    latitude: numeric('latitude', { precision: 10, scale: 7 }).notNull(),
    longitude: numeric('longitude', { precision: 10, scale: 7 }).notNull(),
    isDefault: boolean('is_default').default(false),
    createdAt: timestamp('created_at').defaultNow()
});</code></pre>
<p>Preferences live in a JSON column rather than separate columns because they are always read and written as a unit. Adding a new preference (say, a pressure unit) means updating the TypeScript type and the default value. No migration needed.</p>
<p>API inputs are validated with Zod schemas at the route boundary. The preferences endpoint, for example, rejects anything that is not a valid unit combination:</p>
<pre><code class="language-typescript">const preferencesSchema = z.object({
    temperatureUnit: z.enum(['celsius', 'fahrenheit']),
    windUnit: z.enum(['mph', 'kmh'])
});</code></pre>
<p>Invalid input gets a 400 before it reaches the database. Valid input is type-safe from that point forward.</p>
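<p>At the route boundary, that looks something like this (a sketch using App Router conventions; <code>savePreferences</code> is a hypothetical persistence helper):</p>
<pre><code class="language-typescript">import { NextResponse } from "next/server";

export async function PUT(request: Request) {
    const parsed = preferencesSchema.safeParse(await request.json());
    if (!parsed.success) {
        return NextResponse.json({ error: parsed.error.flatten() }, { status: 400 });
    }
    // parsed.data is now fully typed: { temperatureUnit, windUnit }
    return NextResponse.json(await savePreferences(parsed.data));
}</code></pre>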
<h2 id="the-takeaway">The Takeaway</h2>
<p>The weather data is free. WeatherAPI.com gives you temperature, wind, UV, humidity, visibility, pressure, forecasts, and alerts. Any developer can display that data in a grid.</p>
<p>The value is in the layer between the data and the user. The insight engine is only about 200 lines of code, but it is the reason the app is useful rather than just functional. It encodes the domain knowledge that most weather apps leave to the user: what UV 9 actually means for your skin, what 0.8 km visibility means for your drive, what rain at 7 AM means for your commute.</p>
<p>That pattern applies beyond weather apps. In most systems, the raw data is the easy part. The hard part is deciding what the data means for the person looking at it. The engineering that matters most is often not in the infrastructure or the framework. It is in the thin layer of logic that turns information into something someone can act on.</p>]]></content:encoded>
    <category>nextjs</category>
    <category>react</category>
    <category>postgresql</category>
    <category>drizzle</category>
    <category>zustand</category>
    <category>architecture</category>
  </item>
  <item>
    <title>Alive to Guess Again</title>
    <link>https://davidshortland.dev/blog/alive-to-guess-again</link>
    <guid isPermaLink="true">https://davidshortland.dev/blog/alive-to-guess-again</guid>
    <pubDate>Tue, 24 Mar 2026 12:00:00 GMT</pubDate>
    <dc:creator>David Shortland</dc:creator>
    <description>Karl Popper argued that a theory which can&apos;t be proven wrong isn&apos;t really saying anything. The same is true of engineering practices: if you aren&apos;t actively trying to break them, you don&apos;t know whether they&apos;re working.</description>
    <content:encoded><![CDATA[<p>The previous posts in this series established a few principles. The cargo cult post argued that practices need reasons. The teacher and doer post explored what real understanding looks like. The pragmatist's razor argued that every decision, whether to follow a principle or deviate from it, needs a justification rooted in context.</p>
<p>But there's a problem with justification that I didn't address. You can justify almost anything if you're allowed to be vague enough. "We do standups because they improve communication." "We write tests because they improve quality." "We use microservices because they improve scalability." These sound like reasons. They have the shape of reasons. But they're missing something important.</p>
<p>Nobody is trying to prove them wrong.</p>
<h2 id="popper-s-razor">Popper's Razor</h2>
<p>Karl Popper was a philosopher of science who spent most of his career on a single question: what separates real science from things that merely look like science? His answer was falsifiability, but the idea goes deeper than most people realise when they first encounter it.</p>
<p>Popper wasn't just saying that theories should be testable. He was saying that science progresses by actively trying to destroy its own theories. You accept a theory provisionally, as the best available explanation, and then you do everything you can to break it. You don't test it in the easy cases. You test it at the extremes, in the conditions where it's most likely to fail. If it survives serious attempts at refutation, it earns its place. Not permanently, but for now. The moment it does fail, you discard it and move on.</p>
<p>The distinction matters. Proving gravity by dropping a ball is trivial. Everyone already knows the ball will fall. The real test is at the boundaries: near a black hole, at quantum scales, in the conditions where the theory might actually break down. Easy confirmations tell you nothing. Hard tests are where knowledge lives.</p>
<p>Popper's classic examples were astrology and certain readings of Freudian psychoanalysis. An astrologer can explain any outcome after the fact. If the prediction was wrong, there's always a reason: another planet was in retrograde, the birth time was imprecise, the subject wasn't receptive. The theory never fails because it can absorb any result. Contrast this with Einstein's general relativity, which made a specific, testable prediction about how light bends around massive objects. If the 1919 eclipse observations had shown no bending, the theory would have been wrong. That vulnerability is exactly what made it valuable.</p>
<p>Or as Popper put it: good tests kill flawed theories; we remain alive to guess again.</p>
<p>I encountered Popper through a recommendation from a mentor, and the moment I understood the argument, I started seeing unfalsifiable claims everywhere in software engineering. Worse, I started seeing them in my own work.</p>
<figure><img src="https://davidshortland.dev/blog/images/active-refutation.svg" alt="Falsifiable vs unfalsifiable practices" loading="lazy" /><figcaption>Falsifiable vs unfalsifiable practices</figcaption></figure>
<h2 id="the-unit-test-problem">The Unit Test Problem</h2>
<p>Here's something I did that taught me this lesson concretely.</p>
<p>I was working on a system and decided it needed better test coverage. This felt like an obviously good decision. Tests improve quality. Everyone knows this. So I went through the existing codebase and wrote unit tests for the code that was already there.</p>
<p>The tests passed. Coverage went up. It felt productive. But I was doing the equivalent of dropping a ball and confirming that gravity works. Every test I wrote verified that the code did what the code already did. I was looking at an implementation, understanding its behaviour, and then writing an assertion that confirmed it. These were easy confirmations. They tested the theory ("this code is correct") in the most comfortable conditions possible: the normal inputs, the happy path, the cases I already knew worked.</p>
<p>What I never did was try to break it. I never asked: "what are the boundary conditions where this logic might fall apart? What inputs would expose a flaw in my assumptions? What's the black hole for this function?" I was accumulating confirmations, not attempting refutations.</p>
<p>The coverage number looked good. But the test suite was unfalsifiable in practice. It couldn't fail in a way that told me anything I didn't already know. If a test broke, it was because someone changed the implementation, not because it caught a genuine behavioural problem. The tests were a mirror held up to the code, reflecting it back at itself.</p>
<p>What I should have done is what TDD actually intends: define the expected behaviour first, then write code to satisfy it, and critically, include the edge cases and boundary conditions where the behaviour might break. A test that says "when a driver completes a session, their lap times are ranked and the fastest is marked" is testing a business rule at its core. But the Popperian step is the next one: what happens when two lap times are identical? What happens when the session has zero laps? What about a session with one lap? Those are the hard tests. Those are the ones that kill flawed implementations.</p>
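<p>In code, the difference between easy confirmation and hard tests is visible in the test list itself (a Jest-style sketch; <code>rankLaps</code> and its return shape are hypothetical):</p>
<pre><code class="language-typescript">describe("rankLaps", () =&gt; {
    // The easy confirmation: everyone already knows this passes.
    it("marks the fastest lap in an ordinary session", () =&gt; {
        const ranked = rankLaps([92.4, 91.1, 93.0]);
        expect(ranked[0]).toEqual({ time: 91.1, fastest: true });
    });

    // The hard tests: boundary conditions where the rule might break.
    it("handles a session with zero laps", () =&gt; {
        expect(rankLaps([])).toEqual([]);
    });

    it("decides what fastest means when two laps tie", () =&gt; {
        const ranked = rankLaps([91.1, 91.1]);
        // The assertion forces a decision about ties: here, both are marked.
        expect(ranked.every(lap =&gt; lap.fastest)).toBe(true);
    });
});</code></pre>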
<p>In a small team, you can't afford to write tests for the sake of coverage. Every test should encode a business rule that, if violated, would cause a real problem. And the most valuable tests are the ones that test that rule in the conditions where it's most likely to break, not the ones that confirm it works in the easy case.</p>
<h2 id="the-pattern-is-everywhere">The Pattern Is Everywhere</h2>
<p>Once I started looking for practices that had never survived a serious attempt at refutation, I couldn't stop finding them.</p>
<p><strong>Standups.</strong> Most teams justify standups as "improving communication" or "keeping everyone aligned." These teams have never tried to falsify the claim. It's not enough to define what success looks like and then passively wait to see whether it happens. The Popperian approach is to actively look for failure. Ask the team: "Did anyone have a coordination problem this week that the standup should have caught but didn't? Did anyone sit through the standup already knowing everything that was said? Did anyone withhold a problem because the format didn't make it safe to raise?"</p>
<p>If you go looking for failure and can't find it, the practice has survived a genuine test. If you find failure immediately, you've learned something valuable. Either way, you know more than you did. But most teams never ask. The standup continues, provisionally accepted but never tested at the extremes. It becomes a ritual that cannot fail because nobody is trying to make it fail.</p>
<p><strong>Code reviews.</strong> The justification is usually "catching bugs" or "knowledge sharing." But if you tracked what actually happens in your code reviews, you might find that 90% of comments are about formatting, naming, or style, and almost none catch logic errors. That's the easy test: "do reviews happen?" Yes. The hard test is: "has a code review ever caught a bug that would have reached production? How often? What kind of bugs?" If you go looking for that evidence and can't find it, the practice has been falsified. It's not doing what you claimed it does. Maybe it's doing something else that's valuable, but the original justification is dead and you should update it or drop the practice.</p>
<p><strong>Retrospectives.</strong> Teams run retrospectives to "continuously improve." A serious attempt at refutation would be: pull up the action items from the last three retrospectives. How many were completed? How many led to a measurable change in how the team works? If the answer is "we don't track that," the practice has been insulated from failure. You've never tested it at the extremes. You've been dropping the ball and confirming that it falls.</p>
<h2 id="provisional-acceptance">Provisional Acceptance</h2>
<p>There's a subtlety in Popper's thinking that changes how I approach all of this. He didn't say that unfalsified theories are "true." He said they're provisionally accepted. They've survived testing so far, and they're the best explanation available, but they could be overturned tomorrow by new evidence. This provisionality is the whole point. The moment you treat a practice as permanently justified, you stop testing it.</p>
<p>This is the difference between "we do standups because they work" and "we do standups because they've survived our attempts to find evidence that they don't work, and we'll keep looking." The first is a settled belief. The second is a living hypothesis. The first can't be wrong. The second invites being wrong, because being wrong is how you learn.</p>
<p>The same principle applies to architectural decisions, technology choices, team structures, deployment processes. All of them should be held provisionally. All of them should be subjected to the hardest tests you can find, not the easiest. And all of them should be discardable when the evidence turns against them.</p>
<h2 id="why-this-is-hard">Why This Is Hard</h2>
<p>Unfalsifiable practices survive because actively trying to break your own processes is uncomfortable. If you define what failure looks like and then go looking for it, you might actually find it. That means admitting something isn't working, changing course, possibly having difficult conversations. It's much easier to keep the justification vague and the testing gentle.</p>
<p>Popper noticed the same dynamic in science. Unfalsifiable theories are popular because they're safe. They explain everything, predict nothing, and never require their proponents to change their minds. Falsifiable theories are dangerous. They put themselves on the line. But that danger is exactly what makes them capable of being useful.</p>
<p>The connection to the pragmatist's razor is direct. That post argued that every deviation from a principle needs a specific justification. This post adds: every justification needs to be tested at the extremes, not confirmed in the easy cases. And when a justification fails the test, you have to be willing to let it go. Good tests kill flawed practices. We remain alive to guess again.</p>
<h2 id="what-this-changed-for-me">What This Changed For Me</h2>
<p>I approach testing differently now. Before I write a test, I ask: "what business rule does this encode, and what inputs would break it?" Not the happy path. The edge cases. The boundary conditions. The black holes. Coverage as a metric has become almost irrelevant to me. What matters is whether each test represents a genuine attempt to falsify the assumption that the code is correct.</p>
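<p>To make that concrete, here's a sketch of the difference using a made-up pricing rule and Node's built-in test runner. The first test confirms; the rest are attempts at refutation, aimed at the boundary and the degenerate inputs:</p>
<pre><code class="language-typescript">import { test } from 'node:test';
import assert from 'node:assert/strict';

// A made-up business rule: orders of 100 units or more get a 10% discount.
function discountedTotal(unitPrice: number, quantity: number): number {
  const total = unitPrice * quantity;
  return quantity >= 100 ? total * 0.9 : total;
}

// The confirmatory test: comfortably inside the rule. It passes, and tells you little.
test('large orders are discounted', () => {
  assert.equal(discountedTotal(10, 200), 1800);
});

// The falsification attempts: the exact boundary and the degenerate inputs,
// where the assumption "this code is correct" is most likely to break.
test('exactly 100 units still gets the discount', () => {
  assert.equal(discountedTotal(10, 100), 900);
});

test('99 units does not', () => {
  assert.equal(discountedTotal(10, 99), 990);
});

test('zero quantity yields zero', () => {
  assert.equal(discountedTotal(10, 0), 0);
});</code></pre>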
<p>More broadly, I've started treating every practice as a provisional hypothesis rather than a settled decision. Standups, reviews, architectural patterns: they're all theories about what works, and they all deserve to be tested seriously, not just confirmed gently.</p>
<p>I don't always get this right. The pull toward easy confirmation is strong, and it takes discipline to actively seek evidence that you're wrong. But I think that discipline is what Popper was really arguing for. Not just testability as a logical property, but a habit of mind: the willingness to try to break your own beliefs, and the honesty to update them when they break.</p>]]></content:encoded>
    <category>philosophy</category>
    <category>engineering</category>
    <category>testing</category>
    <category>agile</category>
  </item>
  <item>
    <title>The Pragmatist&apos;s Razor</title>
    <link>https://davidshortland.dev/blog/the-pragmatists-razor</link>
    <guid isPermaLink="true">https://davidshortland.dev/blog/the-pragmatists-razor</guid>
    <pubDate>Tue, 17 Mar 2026 12:00:00 GMT</pubDate>
    <dc:creator>David Shortland</dc:creator>
    <description>Cargo cult engineering is adopting practices without understanding. But there&apos;s an equal and opposite failure: the engineer so principled they forget they&apos;re building software for people, not for architecture diagrams.</description>
<content:encoded><![CDATA[<p>The cargo cult post established a principle: a practice is only justified if you can articulate the specific problem it solves in your context. If you can't, you're performing a ritual, not engineering. The teacher and the doer post went further, into what that kind of understanding actually looks like: the difference between being able to apply a rule and being able to explain why it exists.</p>
<p>That principle is correct. But it's incomplete. It addresses one failure mode (applying practices without understanding) while leaving the opposite failure mode untouched: <strong>understanding a practice perfectly, and applying it in every context regardless of whether the problem it solves exists there.</strong></p>
<p>This post argues that the same standard of justification that applies to adopting a practice also applies to <em>how rigidly you apply it</em>. And that the decision to relax a principle in a given context is not the absence of rigour. It is a higher form of it.</p>
<h2 id="two-failure-modes-not-one">Two Failure Modes, Not One</h2>
<p>The cargo cult post described a single failure: practices without understanding. But there are two distinct ways to misapply knowledge of a practice.</p>
<p><strong>Failure mode one: applying a practice you don't understand.</strong> This is the cargo cult problem. You adopt microservices because successful companies use microservices. You don't know what problem microservices solve, so you can't evaluate whether you have that problem. The practice fails, and you don't know why.</p>
<p><strong>Failure mode two: applying a practice you do understand, in a context where the problem it solves doesn't exist.</strong> This is different. You understand that microservices solve independent deployability at scale. You understand the trade-offs. But your team is five people, your deployment pipeline is simple, and you have no scaling pressure. You apply the practice anyway, because the principle says you should, and principles are principles.</p>
<p>The first failure is ignorance. The second is rigidity. They produce different symptoms but the same outcome: wasted effort and systems that don't serve their users well.</p>
<p>The cargo cult test asks: "can you explain why you're doing this?" The pragmatist's test extends it: "can you explain why you're doing this <em>here</em>, given <em>these</em> constraints?"</p>
<h2 id="defining-pragmatism">Defining Pragmatism</h2>
<p>To reason about pragmatism precisely, it helps to distinguish it from two things it is often confused with.</p>
<p><strong>Pragmatism is not recklessness.</strong> Recklessness means taking shortcuts without awareness of what you're giving up. A reckless decision is one where you can't name the trade-off, because you didn't consider that there was one.</p>
<p><strong>Pragmatism is not the absence of principles.</strong> It is the application of an additional principle: that the value of any practice is conditional on context, and that context must be evaluated, not assumed.</p>
<p>A pragmatic decision has three properties:</p>
<ol>
<li>You can name the principle you're choosing not to follow.</li>
<li>You can explain why the problem that principle addresses is either absent or less important than a competing concern in this specific situation.</li>
<li>You can describe the conditions under which you would revisit that decision.</li>
</ol>
<p>If any of these is missing, the decision isn't pragmatic. It's either reckless (you can't name the trade-off) or arbitrary (you can't explain the reasoning).</p>
<figure><img src="https://davidshortland.dev/blog/images/pragmatism-spectrum.svg" alt="The pragmatism spectrum" loading="lazy" /><figcaption>The pragmatism spectrum</figcaption></figure>
<h2 id="the-spectrum">The Spectrum</h2>
<p>This gives us three positions on a spectrum.</p>
<p><strong>Recklessness</strong> is at one end. Decisions are made without reference to principles at all. Shortcuts are taken because they're faster, not because they've been evaluated. The question "what are we giving up?" is never asked.</p>
<p><strong>Purism</strong> is at the other end. Principles are applied uniformly regardless of context. The question "does this problem exist here?" is never asked, because the principle is treated as unconditional rather than contextual.</p>
<p><strong>Pragmatism</strong> sits between them. It requires more knowledge than either extreme, because you need to understand the principle (which the reckless engineer doesn't), <em>and</em> evaluate whether it applies (which the purist doesn't).</p>
<p>This is an important point: pragmatism is not the easy middle ground. It is the most demanding position. The purist can apply the same rules everywhere without thinking. The reckless engineer can ignore rules everywhere without thinking. The pragmatist has to think every time.</p>
<h2 id="when-to-hold-and-when-to-relax">When to Hold and When to Relax</h2>
<p>If pragmatism means "principles are conditional on context," you need a way to evaluate when the condition is met. Three factors matter.</p>
<p><strong>The cost asymmetry of getting it wrong.</strong> Some principles protect against failures that are cheap to fix. Others protect against failures that are catastrophic. Input validation, authentication, and data integrity fall into the second category. The cost of applying these principles correctly is small. The cost of not applying them can be enormous. When the downside of relaxing a principle is high relative to the cost of following it, follow it. The asymmetry does the reasoning for you.</p>
<p><strong>Whether the shortcut is local or structural.</strong> Some deviations from a principle affect one file, one function, one component. If the decision turns out to be wrong, you fix it in an afternoon. Other deviations create coupling between systems that compounds over time. Changing the database schema now requires changing the API, the frontend, and the deployment pipeline. The first kind of deviation is low-risk and often pragmatic. The second kind is high-risk and rarely pragmatic, because the cost isn't borne at the time of the decision. It's deferred, and deferred costs tend to grow.</p>
<p><strong>Whether you can articulate the trade-off.</strong> This is the test that connects back to the cargo cult principle. In the previous post, I argued that a practice is only justified if you can explain the problem it solves. The same applies to deviations. A pragmatic deviation is one where you can say: "I'm choosing not to do X because the problem X addresses doesn't apply here, and applying it anyway would cost Y." If you can't articulate it that precisely, you're not being pragmatic. You're just skipping something because it's inconvenient, which is recklessness with a better vocabulary.</p>
<h2 id="the-symmetry">The Symmetry</h2>
<p>This reveals a symmetry between the cargo cult problem and the purism problem that I didn't fully see when I wrote the first post.</p>
<p>The cargo cult test: every practice should have a specific, articulable reason for being followed.</p>
<p>The pragmatist's extension: every deviation from a practice should have a specific, articulable reason for being made.</p>
<p>These are the same test applied in opposite directions. Together they form a single standard: <strong>every engineering decision, whether to follow a principle or to deviate from it, requires a justification that references the specific context.</strong></p>
<p>The cargo cult engineer fails the first test. They follow practices without reasons. The purist fails the second test. They refuse to deviate without acknowledging that reasons could exist. The pragmatist passes both.</p>
<h2 id="pragmatism-as-the-harder-skill">Pragmatism as the Harder Skill</h2>
<p>This framing explains why pragmatism is harder to develop than either purism or recklessness.</p>
<p>Recklessness requires no knowledge of principles. You just do what seems easiest.</p>
<p>Purism requires knowledge of principles, but not judgement about their applicability. You learn the rules and apply them. This feels rigorous, and it is, in the same way that applying a formula without checking whether the assumptions hold is rigorous. It is consistent without being correct.</p>
<p>Pragmatism requires knowledge of principles <em>and</em> the ability to evaluate their relevance to a specific context. You need to understand what problem the principle solves well enough to recognise when that problem is absent. This means understanding the principle more deeply than the purist does, not less.</p>
<p>The purist knows <em>that</em> you should separate concerns. The pragmatist knows <em>why</em> you separate concerns (because different rates of change in the same unit create cascading modifications), and can therefore identify situations where the rates of change are actually the same and separation would add complexity without benefit.</p>
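<p>An invented example of what that judgement looks like in code. The names are illustrative; the point is that the two rules change for different reasons:</p>
<pre><code class="language-typescript">// Two rules in one module that change for different reasons,
// so they sit behind separate functions.
type Invoice = { net: number };

// Changes when tax legislation changes.
function vatFor(invoice: Invoice): number {
  return invoice.net * 0.2;
}

// Changes when the house style changes.
function formatCurrency(amount: number): string {
  return new Intl.NumberFormat('en-GB', {
    style: 'currency',
    currency: 'GBP',
  }).format(amount);
}

// If these were inlined into one function, every branding tweak would
// touch tax code. If they only ever changed together, the split would
// be ceremony rather than engineering.
function renderTotal(invoice: Invoice): string {
  return formatCurrency(invoice.net + vatFor(invoice));
}

console.log(renderTotal({ net: 100 })); // "£120.00"</code></pre>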
<p>This is why I think pragmatism is better understood as a deeper engagement with principles rather than a looser one. The pragmatist doesn't care less about good engineering. They care enough to distinguish between the principle and the context that gives it value.</p>
<h2 id="a-decision-framework">A Decision Framework</h2>
<p>When I'm evaluating whether to follow or deviate from a principle, I try to answer three questions:</p>
<p><strong>What specifically am I trading off?</strong> Not vaguely "code quality," but precisely. What principle, what property of the system, what future capability am I choosing to forgo or defer?</p>
<p><strong>What is the cost if I'm wrong?</strong> If this turns out to be a mistake, how expensive is it to reverse? A decision that can be undone in an afternoon carries different weight than one that's baked into the architecture.</p>
<p><strong>Can I explain this decision in six months?</strong> If I can't imagine justifying this to a colleague or to my future self, it probably isn't a reasoned trade-off. It's a shortcut dressed up as pragmatism.</p>
<p>If all three answers are clear, the decision is pragmatic whether it follows the principle or deviates from it. If any answer is vague, that's a signal to think harder before committing.</p>
<h2 id="what-this-means-for-the-series">What This Means for the Series</h2>
<p>The cargo cult post established that understanding is necessary for good engineering. This post adds that understanding is necessary but not sufficient. You also need the judgement to evaluate when and how strongly a principle applies.</p>
<p>The next question follows naturally: if every engineering decision requires a justification that references context, how do you know whether that justification is actually correct? It's not enough to have a reason. The reason has to be testable. That's where the series goes next.</p>]]></content:encoded>
    <category>philosophy</category>
    <category>engineering</category>
    <category>pragmatism</category>
    <category>architecture</category>
  </item>
  <item>
    <title>The Teacher and the Doer</title>
    <link>https://davidshortland.dev/blog/the-teacher-and-the-doer</link>
    <guid isPermaLink="true">https://davidshortland.dev/blog/the-teacher-and-the-doer</guid>
    <pubDate>Sun, 15 Mar 2026 12:00:00 GMT</pubDate>
    <dc:creator>David Shortland</dc:creator>
    <description>I started code reviewing for a junior developer a few months ago. It taught me more about my own understanding than I expected, and revealed a model of learning that changes how I think about software knowledge.</description>
    <content:encoded><![CDATA[<p>A few months ago, I started code reviewing for a junior developer on our team. I expected it to be straightforward: read the code, spot the issues, talk them through on a call. I'd been writing production software for a while. How hard could it be to explain what I already knew?</p>
<p>It turned out to be one of the most revealing experiences of my career so far. Not because the code was difficult, but because the act of teaching exposed gaps in my own understanding that I didn't know existed.</p>
<h2 id="the-gap-between-doing-and-explaining">The Gap Between Doing and Explaining</h2>
<p>There's a specific moment that stays with me. I was reviewing some code where the junior had written a service that mixed data fetching with business logic. I knew instinctively that this was wrong. I could feel it. If I'd been writing the code myself, I would have separated those concerns without thinking. It was automatic.</p>
<p>But when we got on a call to discuss it, I froze. "Because it's better" isn't feedback. "Because separation of concerns" is just naming the principle without explaining it. I needed to articulate the <em>specific reason</em> this separation mattered in this context: what would go wrong if we didn't do it, what it would cost us later, what it would make easier.</p>
<p>That moment taught me something important: <strong>there's a difference between being able to apply a rule and being able to explain why the rule exists</strong>.</p>
<figure><img src="https://davidshortland.dev/blog/images/stages-of-understanding.svg" alt="The stages of understanding" loading="lazy" /><figcaption>The stages of understanding</figcaption></figure>
<h2 id="three-stages-of-understanding">Three Stages of Understanding</h2>
<p>Through the experience of code reviewing, and many conversations with my mentor about this exact problem, I've come to think about software knowledge as moving through three stages.</p>
<p><strong>Stage one: pattern recognition through instances.</strong> You see your mentor or a senior developer do something a certain way. Then you see them do it again in a different context. And again. Over time, your brain starts to recognise the pattern without anyone explicitly stating the rule. This is how my mentor taught me: not by lecturing about SOLID principles, but by showing me specific instances of applying them in real code. The learning was implicit. Here's how I structured this service, here's why I split this module, here's what I changed in this piece of code.</p>
<p><strong>Stage two: unconscious competence.</strong> After enough instances, you can apply the pattern yourself. You write code that separates concerns, that keeps functions small, that names things well... but if someone asks you <em>why</em>, you struggle to articulate it beyond "it feels right" or "that's how it should be done." You've internalised the rule, but you can't externalise it. This is where most competent developers sit for years, and it's a perfectly functional place to be.</p>
<p><strong>Stage three: teachable understanding.</strong> This is where you can not only apply the rule, but explain the principle behind it, describe the contexts where it does and doesn't apply, and generate new instances that illustrate it. You've moved from knowing-how to knowing-that. You can rationalise it, defend it, and critically, know when to break it.</p>
<p>The jump from stage two to stage three is what code reviewing forced on me.</p>
<h2 id="the-feedback-loop">The Feedback Loop</h2>
<p>Here's where I originally had a simpler model in my head: you do first, then you read to understand what you did. Practice, then theory. But I've come to think that's too linear.</p>
<p>What actually happens is more like a feedback loop. You see instances and develop tacit knowledge. Then you read something, an article about dependency injection or a chapter on domain-driven design, and it <em>clicks</em> because you've already felt the problem it solves. That reading reshapes how you see the next instance. You apply the refined understanding, encounter a new edge case, go back to reading with a sharper question, and the cycle continues.</p>
<p><strong>Reading without doing produces cargo cult understanding.</strong> You can recite the principles but you've never felt the pain they address. You know that "you should favour composition over inheritance" but you've never been burned by a deep inheritance hierarchy that made a simple change cascade through twelve files.</p>
<p><strong>Doing without reading produces superstition.</strong> You know <em>that</em> something works, but you might attribute it to the wrong cause. You always write small functions because a senior once told you to, but you think it's about readability when it's actually about testability. The practice is correct but the mental model is wrong; wrong mental models eventually lead you to apply the rule in contexts where it doesn't help, or fail to apply it in contexts where it would.</p>
<p>The strongest developers I've observed alternate between the two rapidly. They try something, read about why it worked, try a variation, read a different perspective, and so on. The theory and the practice aren't sequential; they're interleaved, each one sharpening the other.</p>
<h2 id="what-code-reviewing-taught-the-reviewer">What Code Reviewing Taught the Reviewer</h2>
<p>The irony is that reviewing code for a junior developer pushed <em>me</em> from stage two to stage three on several concepts I thought I already understood.</p>
<p>When I had to explain why we inject dependencies rather than instantiate them directly, I realised my own understanding of dependency injection was more mechanical than principled. I knew the pattern, but articulating the specific benefit (that it makes the dependency relationship explicit and the component testable in isolation) required me to think about it more carefully than I ever had when just writing the code.</p>
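<p>Here's roughly the explanation I wish I'd had ready, as code. The service and API names are invented for illustration:</p>
<pre><code class="language-typescript">// Hidden dependency: the component builds its own client, so testing it
// means hitting the real network.
class ProfileServiceHidden {
  async load(id: string) {
    const response = await fetch(`https://api.example.com/users/${id}`);
    return response.json();
  }
}

// Injected dependency: the relationship is explicit in the constructor,
// and a test can hand in a stub instead of the real fetch.
type Fetcher = typeof fetch;

class ProfileService {
  constructor(private readonly fetchFn: Fetcher) {}

  async load(id: string) {
    const response = await this.fetchFn(`https://api.example.com/users/${id}`);
    return response.json();
  }
}

// In production: new ProfileService(fetch)
// In a test:     new ProfileService(async () => new Response('{"name":"stub"}'))</code></pre>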
<p>When I had to explain why a certain function should be extracted, I couldn't just say "it's too long." I had to identify the specific reason: this function is doing two things with different rates of change, and when one changes, the other shouldn't have to.</p>
<p>Each of these explanations forced a precision of thought that writing code alone never demanded. The junior's questions were the best forcing function I'd encountered. Not because they were sophisticated, but because they were honest. "Why?" is the most powerful question in software development, and it's the one we stop asking once we reach unconscious competence.</p>
<h2 id="the-implication-for-how-we-teach">The Implication for How We Teach</h2>
<p>This model has practical consequences for how we should structure learning in software teams.</p>
<p><strong>Don't start with the textbook.</strong> If someone hasn't felt the pain of tightly coupled code, explaining the dependency inversion principle is just noise. It'll sound theoretically correct and practically meaningless. Instead, let them write the tightly coupled code, let them experience the change that cascades everywhere, and <em>then</em> show them the principle. The learning sticks because it has a hook to attach to.</p>
<p><strong>Don't stop at the doing, either.</strong> A team that only learns through osmosis, watching seniors and picking up habits, will develop capable practitioners who can't explain their decisions. That's fine until they need to make a decision in unfamiliar territory, where there's no pattern to match against. That's when the rationalised understanding matters.</p>
<p><strong>Create opportunities for stage-three learning.</strong> Code reviews are the obvious one, but there are others: pair programming where the more experienced person narrates their thinking, architecture decision records where you have to write down <em>why</em> you chose an approach, and team discussions where practices are questioned rather than assumed.</p>
<h2 id="the-test">The Test</h2>
<p>Here's a test I now apply to myself: for any practice I follow, can I explain not just <em>what</em> I do, but <em>why</em> I do it, and <em>when I would stop doing it</em>?</p>
<p>If I can only say what ("I write unit tests") then I'm at stage two: competent, but unable to externalise the rule. If I can say why ("because they let me refactor with confidence") then I'm getting to stage three. And if I can say when I'd stop ("when the cost of maintaining the tests exceeds the confidence they provide, which happens with highly volatile UI code") then I'm there.</p>
<p>The junior developer doesn't know it, but their code reviews have been a great learning experience for me. Not because they taught me new techniques, but because they forced me to understand the ones I already had.</p>]]></content:encoded>
    <category>philosophy</category>
    <category>mentoring</category>
    <category>learning</category>
    <category>engineering</category>
  </item>
  <item>
    <title>Cargo Cult Software Engineering</title>
    <link>https://davidshortland.dev/blog/cargo-cult-software-engineering</link>
    <guid isPermaLink="true">https://davidshortland.dev/blog/cargo-cult-software-engineering</guid>
    <pubDate>Sat, 14 Mar 2026 12:00:00 GMT</pubDate>
    <dc:creator>David Shortland</dc:creator>
    <description>Richard Feynman warned about scientists who follow the form of science without the substance. The same problem is everywhere in software: teams adopting practices they don&apos;t understand, hoping the results will follow.</description>
    <content:encoded><![CDATA[<p>Last year, a mentor of mine recommended I read Richard Feynman's 1974 commencement address at Caltech. It's about what Feynman called "cargo cult science," and it changed how I think about work.</p>
<p>During the Second World War, Pacific islanders had watched military planes land on improvised airstrips, delivering cargo: food, equipment, supplies. After the war ended and the planes stopped coming, some islanders built replica runways out of bamboo, lit signal fires, and carved wooden headphones to wear while sitting in control towers they'd built from straw. They'd replicated the form perfectly. But no planes came.</p>
<figure><img src="https://davidshortland.dev/blog/images/cargo-cult-runway.svg" alt="The form is perfect. But no planes land." loading="lazy" /><figcaption>The form is perfect. But no planes land.</figcaption></figure>
<p>Feynman's point wasn't about the islanders. It was about scientists who follow the rituals of scientific inquiry (the conferences, the papers, the methodology sections) without the intellectual honesty that makes science actually work. They do everything that looks right. But the planes don't land.</p>
<p>I think about this regularly in software engineering.</p>
<h2 id="the-rituals-we-perform">The Rituals We Perform</h2>
<p>Watch a typical software team and you'll see rituals everywhere. Daily standups where everyone recites what they did yesterday without anyone actually listening. Sprint retrospectives that produce "fugazi" action items nobody follows up on. Code reviews that check formatting but not logic. Architecture decision records that get written after the decision is already made. Story points that get reported up to management as if they were units of measurement.</p>
<p>Each of these practices exists because someone, somewhere, did it for a real reason and it worked. Standups originated in teams that genuinely needed to coordinate across dependencies every morning. Retrospectives were invented by teams that took continuous improvement seriously. Code reviews catch real bugs... when the reviewer actually reads the code.</p>
<p>But when you adopt the practice without understanding the underlying reason, you get cargo cult engineering. You get the bamboo runway. You get the ritual without the result.</p>
<h2 id="why-this-happens">Why This Happens</h2>
<p>The pattern is predictable. A successful company publishes a blog post about how they work. "This is how Spotify organises engineering teams." "This is how Google does code review." "This is how Netflix handles deployments." The industry reads it and copies the form: the squad model, the review checklist, the deployment pipeline.</p>
<p>What they don't copy is the context. Spotify's squad model emerged from specific scaling challenges with specific people and specific technical constraints. Google's code review culture is embedded in decades of institutional knowledge and tooling. Netflix's deployment confidence comes from years of investment in chaos engineering and observability.</p>
<p>Lifting a practice from one context and dropping it into another without understanding why it works is exactly the cargo cult problem. You've built the runway. But the planes aren't coming because the planes were never about the runway; they were about the logistics network, the supply chain, the war effort behind them.</p>
<h2 id="what-it-looks-like-in-practice">What It Looks Like in Practice</h2>
<p>I see this most clearly with Agile. The Agile Manifesto was written by people who valued <em>individuals and interactions over processes and tools</em>. Twenty years later, "being Agile" mostly means buying Jira licenses and having a certified Scrum Master run your ceremonies. The form is immaculate. The substance (the willingness to adapt, to communicate honestly, to deliver working software frequently because you care about the outcome) is often completely absent.</p>
<p>Microservices are another example. Amazon and Netflix decomposed their monoliths into services because they had specific scaling and organisational problems that monoliths couldn't solve. They did it gradually, painfully, over years. But the industry cargo-culted the result: "successful companies use microservices, therefore we should use microservices." Teams of five people split their simple CRUD application into twelve services, added a message broker, a service mesh, and distributed tracing, then spent the next year debugging network issues that didn't exist when it was one application.</p>
<p>The same thing happens with infrastructure as code, with test-driven development, with domain-driven design, with every practice that has a name. The name makes it easy to adopt the form. The understanding is the part that takes effort.</p>
<h2 id="feynman-s-antidote">Feynman's Antidote</h2>
<p>Feynman's prescription was simple: intellectual honesty. He called it "a kind of scientific integrity, a principle of scientific thought that corresponds to a kind of utter honesty." The first principle is that you must not fool yourself, and you are the easiest person to fool.</p>
<p>In software terms, this means asking uncomfortable questions. Not "are we doing standups?" but "are our standups actually helping us coordinate?" Not "do we have microservices?" but "are our service boundaries in the right places, and how would we know if they weren't?" Not "are we Agile?" but "when was the last time we changed our process because it wasn't working?"</p>
<p>It means being honest about what you don't understand. If you're adopting a practice because someone you respect recommended it, that's fine, but you should know that's what you're doing, and you should be watching for signs that it's not working in your context.</p>
<p>It means measuring outcomes, not activities. The team that ships reliable software and responds quickly to user needs is engineering well, regardless of whether they have sprints or standups or story points. The team that performs every ceremony perfectly but ships late and buggy has built a beautiful bamboo runway.</p>
<h2 id="the-practices-i-ve-kept">The Practices I've Kept</h2>
<p>I'm not arguing against practices; I'm arguing against unreflective adoption.</p>
<p>I try to ensure that every practice I follow has a specific, articulable reason. If I can't explain why I do something, that's a signal that I might be cargo-culting it.</p>
<h2 id="the-test">The Test</h2>
<p>Here's a simple test for whether a practice is genuine or cargo cult in your team: <strong>can the people performing it explain why they're doing it, in terms of the specific problem it solves for them?</strong></p>
<p>Not "we do standups because Scrum says so." Not "we use microservices because that's the modern architecture." Not "we write tests because best practices."</p>
<p>But: "We do a quick sync each morning because the data pipeline team and the frontend team keep stepping on each other's database migrations, and this catches it before it becomes a merge conflict." That's a real reason. That's a practice that solves a problem the team actually has.</p>
<p>If you can't articulate the reason, you have two options: figure out the reason, or stop doing the thing. Both are better than continuing to carve wooden headphones and waiting for planes that aren't coming.</p>]]></content:encoded>
    <category>philosophy</category>
    <category>engineering</category>
    <category>agile</category>
    <category>architecture</category>
  </item>
  <item>
    <title>Building a Portfolio That Practices What It Preaches</title>
    <link>https://davidshortland.dev/blog/deploying-angular-ssr-on-lambda</link>
    <guid isPermaLink="true">https://davidshortland.dev/blog/deploying-angular-ssr-on-lambda</guid>
    <pubDate>Fri, 13 Mar 2026 12:00:00 GMT</pubDate>
    <dc:creator>David Shortland</dc:creator>
    <description>How I deployed an Angular 21 SSR app on AWS Lambda, and why every architectural decision was really a statement about how I think software should be built.</description>
    <content:encoded><![CDATA[<p>Most developer portfolios are static sites. There's nothing wrong with that. If all you need is a page that says "here's my work," a static site does the job. But I wanted this site to be the work itself. Every decision in the stack is deliberate, and together they reflect how I think about building software.</p>
<p>This post isn't really a deployment guide. It's about the principles behind the decisions, and a particularly stubborn bug that tested all of them.</p>
<h2 id="start-with-the-constraint-not-the-tool">Start with the Constraint, Not the Tool</h2>
<p>The first question wasn't "what framework should I use?" It was "what are the constraints?"</p>
<p>I wanted server-side rendering for fast first loads and proper SEO. I wanted infrastructure I wouldn't have to babysit. I wanted zero ongoing cost when nobody's visiting. And I wanted the deployment process to be simple: push code, walk away.</p>
<p>Once you define the constraints clearly, the architecture almost designs itself. SSR means a server. Zero cost at idle means serverless. Push-and-forget means a CI/CD pipeline. The tools (Angular, Lambda, CDK, CloudFront) are just the implementations. They could be swapped out and the principles would hold.</p>
<p>This is something I've learned working on production systems in my day job: start with the problem, not the technology. The teams that pick tools first and then try to fit their problem into them always end up fighting the architecture later.</p>
<h2 id="the-architecture">The Architecture</h2>
<figure><img src="https://davidshortland.dev/blog/images/request-flow.svg" alt="Request flow diagram" loading="lazy" /><figcaption>Request flow diagram</figcaption></figure>
<p>The system has two paths for serving content. Static assets, including JavaScript bundles, CSS, images, and fonts, are served directly from S3 via CloudFront. Everything else hits a Lambda function running the Angular SSR server.</p>
<p>This separation matters. Static assets are immutable after deployment: they have content hashes in their filenames and get cached for a year. The SSR responses are dynamic because they render the page on every request, which means the HTML always reflects the latest build. CloudFront sits in front of both, handling HTTPS termination and edge caching.</p>
<p>The philosophy here is <strong>separation of concerns applied to infrastructure</strong>. The same principle that says "don't put business logic in your controller" also says "don't serve static files through your application server." Each component does one thing well.</p>
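<p>In CDK terms, the split looks something like the sketch below. The construct names, the <code>/assets/*</code> path pattern, and the exact origin helpers are illustrative (and assume a recent <code>aws-cdk-lib</code>), not lifted from the real stack:</p>
<pre><code class="language-typescript">import * as cloudfront from 'aws-cdk-lib/aws-cloudfront';
import * as origins from 'aws-cdk-lib/aws-cloudfront-origins';

// Inside the web stack: ssrFnUrl is the Lambda Function URL construct,
// assetsBucket the S3 bucket holding the hashed build output.
new cloudfront.Distribution(this, 'Distribution', {
  // Dynamic path: uncached requests are rendered by the SSR Lambda.
  defaultBehavior: {
    origin: new origins.FunctionUrlOrigin(ssrFnUrl),
    cachePolicy: cloudfront.CachePolicy.CACHING_DISABLED,
    viewerProtocolPolicy: cloudfront.ViewerProtocolPolicy.REDIRECT_TO_HTTPS,
  },
  // Static path: content-hashed assets served from S3 and cached hard.
  additionalBehaviors: {
    '/assets/*': {
      origin: origins.S3BucketOrigin.withOriginAccessControl(assetsBucket),
      cachePolicy: cloudfront.CachePolicy.CACHING_OPTIMIZED,
    },
  },
});</code></pre>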
<p>Lambda Web Adapter is what makes the serverless SSR work. It's an AWS-provided layer that wraps any HTTP server (Express, Fastify, or whatever) and handles the Lambda invocation lifecycle. From the application's perspective, it's simply a normal Express server listening on port 8080. The adapter translates between Lambda's event model and HTTP. This is a good abstraction because the application code doesn't know or care that it's running on Lambda.</p>
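<p>For reference, the server entry point follows Angular's documented Node pattern; this is a trimmed sketch of the wiring rather than the exact file:</p>
<pre><code class="language-typescript">import {
  AngularNodeAppEngine,
  createNodeRequestHandler,
  writeResponseToNodeResponse,
} from '@angular/ssr/node';
import express from 'express';

const app = express();
const angularApp = new AngularNodeAppEngine();

// Every request goes to the SSR engine; anything it declines falls through.
app.use((req, res, next) => {
  angularApp
    .handle(req)
    .then((response) =>
      response ? writeResponseToNodeResponse(response, res) : next(),
    )
    .catch(next);
});

// Lambda Web Adapter talks to this port; locally it's just an Express server.
app.listen(Number(process.env['PORT'] ?? 8080));

// Exported so Angular tooling can drive the app as a request handler.
export const reqHandler = createNodeRequestHandler(app);</code></pre>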
<h2 id="the-pipeline-systems-should-maintain-themselves">The Pipeline: Systems Should Maintain Themselves</h2>
<figure><img src="https://davidshortland.dev/blog/images/deployment-flow.svg" alt="Deployment pipeline diagram" loading="lazy" /><figcaption>Deployment pipeline diagram</figcaption></figure>
<p>The deployment pipeline is self-mutating. If I change the pipeline definition itself, for example adding a build step or modifying the deployment order, it updates itself before deploying the application. The only manual <code>cdk deploy</code> I ever ran was the initial bootstrap.</p>
<p>This is a principle I care about deeply: <strong>a system should be capable of maintaining itself</strong>. If deploying a change to your deployment process itself requires a manual step, you've created a recursive problem. CDK Pipelines solves this elegantly because the pipeline is simply another piece of infrastructure defined in code.</p>
<p>The pipeline watches two repositories: the infrastructure repo (CDK stacks) and the web app repo (Angular). A push to either triggers a full build and deploy. The synth step builds the Angular app, generates the blog content from markdown, synthesises the CloudFormation templates, and the pipeline takes it from there.</p>
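<p>A trimmed-down sketch of that wiring with CDK Pipelines. The repository identifiers, build commands, and connection ARN are placeholders, not the real values:</p>
<pre><code class="language-typescript">import { pipelines } from 'aws-cdk-lib';

// Inside the pipeline stack. Both repositories feed the same pipeline.
const infraSource = pipelines.CodePipelineSource.connection(
  'owner/portfolio-infrastructure', 'main',
  { connectionArn: 'arn:aws:codestar-connections:...' }, // placeholder
);
const webSource = pipelines.CodePipelineSource.connection(
  'owner/portfolio-web', 'main',
  { connectionArn: 'arn:aws:codestar-connections:...' }, // placeholder
);

new pipelines.CodePipeline(this, 'Pipeline', {
  // selfMutation defaults to true: the pipeline updates itself
  // before it deploys the application stacks.
  synth: new pipelines.ShellStep('Synth', {
    input: infraSource,
    additionalInputs: { web: webSource },
    commands: [
      'npm ci',
      'npm run build', // Angular build + blog JSON generation
      'npx cdk synth',
    ],
  }),
});</code></pre>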
<p>There's a broader philosophy here about <strong>infrastructure as code</strong> that goes beyond version control. When your infrastructure is code, it's reviewable, testable, and reproducible. If I deleted every AWS resource tomorrow, a single <code>cdk deploy</code> would recreate the entire stack identically. That's not just convenient. It means the infrastructure is documented by its own existence. There is no wiki page that is three months out of date describing what's deployed where.</p>
<h2 id="domain-driven-thinking-beyond-the-backend">Domain-Driven Thinking Beyond the Backend</h2>
<p>The project structure follows domain-driven design principles, even though it's a frontend application. The codebase is organised around business concepts such as <code>features/hero</code>, <code>features/experience</code>, and <code>features/skills</code>, not technical layers like <code>components/</code>, <code>services/</code>, or <code>pages/</code>.</p>
<p>This matters more than it might seem. When I need to change how the experience section works, I go to <code>features/experience/</code> and everything I need is there. I'm not hunting across five different folders to find the component, its service, its model, and its tests. <strong>The code is organised around what it does, not what it is.</strong></p>
<p>The same principle applies to the infrastructure. Each CDK stack has a single responsibility. <code>DnsStack</code> manages the hosted zone. <code>CertificateStack</code> handles TLS. <code>WebStack</code> composes the application layer. They depend on each other explicitly through typed props rather than through hardcoded ARNs or naming conventions.</p>
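<p>A condensed illustration of that pattern; the real stacks carry more configuration, but the shape is the same:</p>
<pre><code class="language-typescript">import { Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as acm from 'aws-cdk-lib/aws-certificatemanager';
import * as route53 from 'aws-cdk-lib/aws-route53';

// The dependency arrives as a typed prop, not a hardcoded ARN
// or a naming convention.
interface CertificateStackProps extends StackProps {
  hostedZone: route53.IHostedZone;
}

export class CertificateStack extends Stack {
  readonly certificate: acm.ICertificate;

  constructor(scope: Construct, id: string, props: CertificateStackProps) {
    super(scope, id, props);
    this.certificate = new acm.Certificate(this, 'Certificate', {
      domainName: 'davidshortland.dev',
      subjectAlternativeNames: ['www.davidshortland.dev'],
      // DNS validation records land in the zone passed in from DnsStack.
      validation: acm.CertificateValidation.fromDns(props.hostedZone),
    });
  }
}</code></pre>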
<p>This is how I structure all the systems I work on. The domain drives the architecture, which means there are clear boundaries. When the requirements change (and they always do), the boundaries tell you exactly where the change needs to happen.</p>
<h2 id="the-bug-that-tested-everything">The Bug That Tested Everything</h2>
<p>After the first successful deployment, the site returned a <code>400 Bad Request</code>:</p>
<blockquote><p>URL with hostname "xxx.lambda-url.eu-west-2.on.aws" is not allowed.</p></blockquote>
<p>Angular 21.2.2 had introduced SSRF protection as part of a CVE fix. The <code>AngularNodeAppEngine</code> validates the <code>Host</code> header against an allowlist, and the Lambda function URL hostname wasn't on it.</p>
<p>This is where debugging philosophy matters. The temptation with a cryptic error is to start changing things at random, adding an environment variable here or trying a different config format there. I've watched teams burn hours this way. The disciplined approach is to <strong>understand the system before you try to fix it</strong>.</p>
<p>So I traced the request path. CloudFront receives the request with <code>Host: davidshortland.dev</code>. It forwards it to the Lambda function URL but replaces the <code>Host</code> header with the Lambda URL hostname. This is standard CloudFront behaviour for function URL origins. Lambda Web Adapter passes this to Express, which passes it to Angular's SSR engine. Angular checks the <code>Host</code> header against its allowlist. The Lambda hostname is not there. Result: 400.</p>
<p>Once you understand the flow, the fix becomes obvious: tell Angular about the Lambda hostname. However, the <em>implementation</em> of that fix had its own subtlety.</p>
<p>I tried three approaches that didn't work:</p>
<p><strong>Setting <code>NG_ALLOWED_HOSTS</code> as a Lambda environment variable.</strong></p>
<p>This seemed like the right approach because it is documented. However, Angular 21.2.2 reads this at build time and bakes it into the SSR manifest. A runtime environment variable is too late.</p>
<p><strong>Passing <code>allowedHosts</code> in the <code>AngularNodeAppEngine</code> constructor.</strong></p>
<p>The API accepts it, but the build-time manifest takes precedence. The constructor options are additive rather than overriding, and the manifest was empty.</p>
<p><strong>Using dot-prefix patterns in <code>angular.json</code>.</strong></p>
<p>Close, but incorrect syntax. Angular uses <code>*.example.com</code> wildcard notation, not <code>.example.com</code>.</p>
<p>The fix was the <code>allowedHosts</code> array in <code>angular.json</code> under <code>security</code>, using wildcard patterns:</p>
<pre><code class="language-json">{
  "security": {
    "allowedHosts": [
      "localhost",
      "davidshortland.dev",
      "www.davidshortland.dev",
      "*.lambda-url.eu-west-2.on.aws",
      "*.cloudfront.net"
    ]
  }
}</code></pre>
<p>The key insight: this configuration is <strong>baked into the build output</strong>. It is not a runtime setting. Each failed attempt required a full pipeline cycle to test: push, build, deploy, check. This is where the self-mutating pipeline proved useful, since at least I did not have to manually deploy each attempt.</p>
<p>The lesson is not about Angular configuration. It is about the value of tracing a problem through the entire system before reaching for solutions. <strong>Understand first, then fix.</strong></p>
<h2 id="the-cold-start-tradeoff">The Cold Start Tradeoff</h2>
<p>Lambda functions have cold starts. The first request after a period of inactivity takes longer because AWS needs to initialise the runtime. For this site, a cold start adds roughly 2 to 3 seconds to the first request.</p>
<p>I'm comfortable with this tradeoff, and here's why: <strong>optimise for the common case, not the edge case</strong>.</p>
<p>The common case for a portfolio site is that nobody is visiting. I would rather pay zero pounds during those idle hours and accept a slightly slower first load than run a t3.micro 24/7 for instant responses to traffic that does not exist. Once the function is warm, subsequent requests are fast, typically 100 to 200 ms for a full SSR render.</p>
<p>If this were a high-traffic application, the calculus would be different. Provisioned concurrency or a container-based deployment would make more sense. Applying high-traffic patterns to a low-traffic site is a common mistake. It is over-engineering: spending complexity on a problem you do not actually have.</p>
<p>This connects to a broader principle: <strong>every architectural decision has a context</strong>. There is no universally correct answer to "should I use serverless?" The answer is always conditional, depending on what you are building, for whom, and under what constraints. Developers who insist that one approach is always right are usually the ones who have not worked across enough different problems.</p>
<h2 id="iterative-delivery-over-big-bang-releases">Iterative Delivery Over Big Bang Releases</h2>
<p>The site was not built in one go. It was deployed to production within hours of starting, initially just a working SSR page with the basic structure. Features were added incrementally: the telemetry gauge animations, the blog system, security headers, analytics. Each change was a small, deployable unit.</p>
<p>This is agile as a <em>mindset</em> rather than a ceremony. The pipeline enables it because pushing a small change to production takes minutes rather than hours. When the cost of deployment is near zero, you naturally gravitate toward smaller, more frequent changes. When deployment is painful, you batch changes together, which increases risk and makes debugging harder.</p>
<h2 id="the-stack">The Stack</h2>
<p>For anyone building something similar:</p>
<ul>
<li>Angular 21 with <code>@angular/ssr</code> and <code>outputMode: server</code></li>
<li>Express 5 via <code>AngularNodeAppEngine</code></li>
<li>Lambda Web Adapter layer (ARM64), which wraps Express as a Lambda function</li>
<li>CloudFront with dual origins: S3 for static assets, Lambda Function URL for SSR</li>
<li>CDK with a self-mutating CodePipeline watching two repositories</li>
<li>TailwindCSS v4 via <code>@tailwindcss/postcss</code> (Angular's built-in support does not fully handle v4 syntax)</li>
<li>Blog system built from markdown files processed at build time into bundled JSON (sketched after this list)</li>
</ul>
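<p>The markdown-to-JSON step can be surprisingly small. A minimal sketch using only Node built-ins, with the file layout and the naive front-matter handling assumed for illustration:</p>
<pre><code class="language-typescript">// Build-time step: read markdown posts, emit one bundled JSON file.
import { readdirSync, readFileSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';

const POSTS_DIR = 'content/posts'; // assumed layout

const posts = readdirSync(POSTS_DIR)
  .filter((name) => name.endsWith('.md'))
  .map((name) => {
    const raw = readFileSync(join(POSTS_DIR, name), 'utf8');
    // Naive split on the "---" front-matter fences: metadata, then body.
    const [, frontMatter = '', body = raw] = raw.split(/^---$/m);
    return {
      slug: name.replace(/\.md$/, ''),
      frontMatter: frontMatter.trim(),
      markdown: body.trim(),
    };
  });

// The JSON ships inside the build, so the SSR Lambda never reads the filesystem.
writeFileSync('src/assets/blog.json', JSON.stringify(posts, null, 2));</code></pre>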
<p>The total infrastructure cost for a low-traffic site is effectively zero, comfortably within the AWS free tier.</p>
<p>More important than the specific tools, though, is understanding <em>why</em> you are choosing each one. If you cannot articulate the principle behind a decision, you probably have not made the decision yet. You have simply defaulted to something familiar. Familiar is not always the right choice.</p>]]></content:encoded>
    <category>angular</category>
    <category>aws</category>
    <category>lambda</category>
    <category>cdk</category>
    <category>ssr</category>
    <category>architecture</category>
  </item>
  </channel>
</rss>
