Jack Puccini
Good at starting things, working on the finishing part.
AI · Product & zero-to-one · Reasoning & decisions
Recent Writing
- Mar 17, 2026 · 4 min read · AI
Managing Agents, Not Programming Them
Reflections on designing agent systems by managing intelligent actors rather than prescribing rigid workflows.
- Mar 4, 2025 · 10 min read · Software
Mutability and Immutability in Python
Understanding mutability and immutability in Python
- Dec 30, 2024 · 3 min read · Other
Jane Street Puzzle - December 2024
My solution to Jane Street's December puzzle, 'Games Night'
- Nov 28, 2024 · 8 min read · Other
Jane Street Puzzle - November 2024
My solution to Jane Street's November puzzle, 'Besides the Point'
- Aug 3, 2024 · 23 min read · Chess
The Axiom System – Part 4: Justification in Chess
Why no set of meta-rules can rank chess principles across every position, and what that impossibility means for how we justify our moves.
About
I get distracted by how things work. I'm wrong a lot, and mostly just try to be wrong faster than last time.
Everything has structure. Finding it lets you see the same thing through different lenses — a system, a product, a market, a decision — and I try not to respect the borders between them.
Recent Thoughts
Notes to myself, made public.
- Jun 21, 2026
If we assume AI will lack 'taste' for some time going forward (think: does this marketing copy read like slop, is this generated image cheesy), then we'll need systems to compensate. You can give skills and optimise prompts to refine what the model does at the generation stage, but my feeling is that investment in the review stage matters more.
A composer doesn't immediately produce their ideal melody. They try many, and it's their expert judgement that selects from the candidates. Correcting taste might work the same way: an adequately proficient reviewer, working in tandem with the generator, yields a better eventual output. The system would involve finetuning the reviewer - not necessarily in the technical weights sense - to align its judgement with a reference standard, presumably an expert human who carries that judgement.
There's much to discuss here:
• How much signal is actually present in the human judgement? If the human judged twice, what would the correlation between trials be? This upper-bounds the potential ability of any reviewer.
• What methodologies best suit this analysis? Pairwise comparisons and Bradley-Terry?
• What if the output requires a translation layer? For example, if the thing to be judged is a video production, AI can't adequately judge it as a whole (for now) - it must first be parsed into some LLM-understandable structure, which itself may be lossy.
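A rough sketch of the first two questions, under assumptions that are not in the note itself: the expert scores the same items twice on a numeric scale, and pairwise preferences are recorded as win counts. The ratings, the win counts, and the function names `pearson` and `fit_bradley_terry` are all made up for illustration. The Bradley-Terry fit uses the standard minorise-maximise update, which is one common way to fit the model and not the only one.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_bradley_terry(wins, n_items, iters=200):
    """Estimate Bradley-Terry strengths from pairwise win counts.

    wins[(i, j)] is the number of times item i was preferred over item j.
    Returns strengths normalised to sum to 1; under the model,
    P(i preferred over j) = p[i] / (p[i] + p[j]).
    """
    p = [1.0 / n_items] * n_items
    for _ in range(iters):
        updated = []
        for i in range(n_items):
            others = [j for j in range(n_items) if j != i]
            total_wins = sum(wins.get((i, j), 0) for j in others)
            denom = sum(
                (wins.get((i, j), 0) + wins.get((j, i), 0)) / (p[i] + p[j])
                for j in others
            )
            # Items that were never compared keep their current strength.
            updated.append(total_wins / denom if denom > 0 else p[i])
        norm = sum(updated)
        p = [value / norm for value in updated]
    return p


# Hypothetical data: one expert rating the same five items on two occasions.
trial_a = [7, 3, 8, 5, 6]
trial_b = [6, 4, 8, 5, 7]
r = pearson(trial_a, trial_b)

# Hypothetical pairwise judgements over three candidate outputs.
wins = {(0, 1): 8, (1, 0): 2, (1, 2): 7, (2, 1): 3, (0, 2): 9, (2, 0): 1}
strengths = fit_bradley_terry(wins, n_items=3)

print(f"test-retest correlation: {r:.3f}")
print("Bradley-Terry strengths:", [round(s, 3) for s in strengths])
```

The test-retest figure is only a rough ceiling: a reviewer cannot be expected to agree with the expert more consistently than the expert agrees with themselves. The fitted strengths give a ranking of candidates that the reviewer's own ranking could then be compared against.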
Anyway - these were supposed to be short. Fuller post later (hopefully).
- Jun 21, 2026
Cool, so I send the message in the thoughts Slack channel and it pops up on the website... now I'd better write some more so that this one gets buried.
- Jun 21, 2026
Hello World!