Jack Puccini
Good at starting things, working on the finishing part.
AI · Product & zero-to-one · Reasoning & decisions
Recent Writing
- Jun 23, 2026 · 6 min read · AI
My Current 'AI Stack' for 'Productivity'
The tools I actually use day to day - dictation, screenshots, coding agents, and a writing pipeline - and an honest take on whether any of it makes me faster.
- Mar 17, 2026 · 4 min read · AI
Managing Agents, Not Programming Them
Reflections on designing agent systems by managing intelligent actors rather than prescribing rigid workflows.
- Mar 4, 2025 · 10 min read · Software
Mutability and Immutability in Python
Understanding mutability and immutability in Python
- Dec 30, 2024 · 3 min read · Other
Jane Street Puzzle - December 2024
My solution to Jane Street's December puzzle, 'Games Night'
- Nov 28, 2024 · 8 min read · Other
Jane Street Puzzle - November 2024
My solution to Jane Street's November puzzle, 'Besides the Point'
About
I get distracted by how things work. I'm wrong a lot, and mostly just try to be wrong faster than last time.
Everything has structure. Finding it lets you see the same thing through different lenses — a system, a product, a market, a decision — and I try not to respect the borders between them.
Recent Thoughts
Notes to myself, made public.
- Jun 23, 2026
Using ‘quietly’ as an adverb = Opus 4.8 was used to produce the writing.
I hesitate to call it slop, since I generally try to care about the ideas rather than the source, but man it's distracting.
- Jun 22, 2026
Loops produce slop when inadequate AI judgement accumulates over time.
That's why it's important to inject a relatively deterministic, well-defined standard (or something similar) into the process. This might work best for data-science-type problems where you optimise against an evaluation metric, or coding tasks with defined pass criteria (passes the test suite, meets latency thresholds, etc.).
- Jun 21, 2026
If we assume AI will lack 'taste' for some time going forward (think: does this marketing copy read like slop, is this generated image cheesy), then we'll need systems to compensate. You can give the model skills and optimise prompts to refine what it does at the generation stage, but my feeling is that investment in the review stage matters more.
A composer doesn't immediately produce their ideal melody. They try many, and it's their expert judgement that selects from the candidates. Correcting taste might work the same way: an adequately proficient reviewer, working in tandem with the generator, yields a better eventual output. The system would involve finetuning the reviewer - not necessarily in the technical weights sense - to align its judgement with a reference standard, presumably an expert human who carries that judgement.
There's much to discuss here:
• How much signal is actually present in the human judgement? If the human judged twice, what would the correlation between trials be? This upper-bounds the potential ability of any reviewer.
• What methodologies best suit this analysis? Pairwise comparisons and Bradley-Terry?
• What if the output requires a translation layer? For example, if the thing to be judged is a video production, AI can't adequately judge it as a whole (for now) - it must first be parsed into some LLM-understandable structure, which itself may be lossy.
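A rough sketch of the first two questions, in Python. Everything here is an assumption for illustration: the function names `retest_correlation` and `fit_bradley_terry` are made up, the scores and verdicts are invented, and the Bradley-Terry fit uses the common iterative update rather than any particular library.

```python
import math
from collections import defaultdict


def retest_correlation(first_pass, second_pass):
    """Pearson correlation between two scoring passes over the same items."""
    n = len(first_pass)
    mean_a = sum(first_pass) / n
    mean_b = sum(second_pass) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first_pass, second_pass))
    var_a = sum((a - mean_a) ** 2 for a in first_pass)
    var_b = sum((b - mean_b) ** 2 for b in second_pass)
    return cov / math.sqrt(var_a * var_b)


def fit_bradley_terry(comparisons, n_iter=200):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Repeats the usual update: strength_i = wins_i / sum_j(n_ij / (s_i + s_j)),
    then rescales so the strengths sum to 1.
    """
    items = sorted({item for pair in comparisons for item in pair})
    wins = defaultdict(int)      # total wins per item
    meetings = defaultdict(int)  # number of comparisons per unordered pair
    for winner, loser in comparisons:
        wins[winner] += 1
        meetings[frozenset((winner, loser))] += 1

    strength = {item: 1.0 / len(items) for item in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            denom = 0.0
            for j in items:
                count = meetings.get(frozenset((i, j)), 0)
                if j != i and count:
                    denom += count / (strength[i] + strength[j])
            updated[i] = wins[i] / denom if denom else strength[i]
        total = sum(updated.values())
        strength = {item: s / total for item, s in updated.items()}
    return strength


# Invented data: one expert scoring the same five drafts on two occasions.
first_pass = [7, 4, 8, 3, 6]
second_pass = [6, 5, 8, 2, 7]
reliability = retest_correlation(first_pass, second_pass)

# Invented pairwise verdicts between three drafts, as (preferred, rejected).
verdicts = [
    ("a", "b"), ("a", "b"), ("b", "a"),
    ("b", "c"), ("b", "c"), ("c", "b"),
    ("a", "c"), ("a", "c"), ("c", "a"),
]
ranking = fit_bradley_terry(verdicts)
```

If `reliability` comes out low, the human standard is noisy and a reviewer's agreement with it is limited by that noise. The Bradley-Terry strengths turn a pile of "this one over that one" verdicts into a single ranking, which is easier to collect than absolute scores.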
Anyway - these were supposed to be short. Fuller post later (hopefully).