Claude 4
Claude 4 holds a multi-hour task and the unit of work jumps from a function to a feature — but lean on it and the real gap shows: it has no memory of what we already trust
I gave it the whole thing on Wednesday. Not a function, not a tidy snippet to paste in — the whole feature. The booking flow, the forms, the awkward edge cases I usually keep for myself because explaining them takes longer than writing them. I described the outcome, went to make coffee, and came back to something that mostly held together.
That is the part that still surprises me. It held together.
Anthropic shipped Claude 4 yesterday — Opus and Sonnet — and the headline is not that it scores higher on some chart. It is that it stays on its feet through a long run. Hours, not minutes. It can carry a task without losing the thread, without forgetting the decision it made forty steps back. The unit of work just grew. Last year I handed these things functions. Now I hand them features and walk away.
So I keep handing it more, the way you find the limit of anything — by pushing past where you expected it to stop.
And the limit it hits is not intelligence. It is memory. I asked it to wire up a date picker, and it wrote one from scratch — a clean one, a fine one — with no idea that three folders over sits the date picker we already use, the one we fixed and re-fixed and trust. It cannot see what we have. So it invents. Confidently, freshly, every time.
I felt that clearly this week. The smarts are no longer the scarce thing. What is scarce is everything around the smarts — what the model can lean on, what it can verify, what it is allowed to reuse instead of conjuring again.
Smarter hands, no shared memory of the work. That gap is the whole game now. I keep circling it.