"Since there are still people talking about "stochastic parrots" that…

"Since there are still people talking about "stochastic parrots" that merely copy from their training data, let me quickly say that we've long moved beyond LLMs.

Modern systems like Claude Code are closed-loop systems "grounded" in a hard reality: compilers, unit tests, linters, benchmarks, mathematical checkers, and other evaluators. Coding agents don't just "guess" code. They write it, run it, observe failures, and iterate. In other words, they have something older ML systems lacked: tight feedback loops that let them reliably converge on working code rather than merely plausible-sounding code.

And once you wrap an LLM in a search-and-evaluate harness (like evolutionary or population-based optimization), you get something even more important: systematic exploration. At that point, the model isn't "recalling" a solution so much as proposing candidates that are filtered by reality. The LLM supplies strong priors (intuition) that prune the search space, while the harness supplies the reasoning pressure by checking what actually works.
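
To make the propose-and-filter loop concrete, here is a minimal Python sketch. It is not how AlphaEvolve or any real harness is implemented: the `propose` function is a random mutation standing in for the LLM, the `TEST_CASES` and the hidden function `3x + 7` are invented for illustration, and all names are hypothetical.

```python
import random

# Hypothetical "reality": input/output pairs a candidate must reproduce.
# They come from the hidden function f(x) = 3x + 7, which the search
# only ever sees through the evaluator's score.
TEST_CASES = [(x, 3 * x + 7) for x in range(-5, 6)]


def evaluate(candidate):
    """Objective evaluator: negative total absolute error (0 is a perfect score)."""
    a, b = candidate
    return -sum(abs(a * x + b - y) for x, y in TEST_CASES)


def propose(parent, rng):
    """Stand-in for the LLM prior: a small random edit to an existing candidate."""
    a, b = parent
    return (a + rng.randint(-2, 2), b + rng.randint(-2, 2))


def search(population_size=8, generations=300, seed=0):
    """Population-based loop: propose candidates, score them, keep the best."""
    rng = random.Random(seed)
    population = [
        (rng.randint(-20, 20), rng.randint(-20, 20))
        for _ in range(population_size)
    ]
    best_scores = []
    for _ in range(generations):
        children = [
            propose(rng.choice(population), rng) for _ in range(population_size)
        ]
        # Selection: parents and children compete; only the top scorers survive.
        survivors = sorted(set(population + children), key=evaluate, reverse=True)
        population = survivors[:population_size]
        best_scores.append(evaluate(population[0]))
        if best_scores[-1] == 0:  # every test case is reproduced exactly
            break
    return population[0], best_scores
```

Calling `search()` returns the best surviving candidate and the best score per generation. Because the top scorers are always retained, that score can only stay level or improve; swapping the random `propose` for a model with better priors would change how quickly it improves, not how candidates are judged.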

This is why systems in the AlphaEvolve family are a big deal: they can generate candidate programs, measure them, keep improvements, and repeat. Google explicitly stated that AlphaEvolve was used to optimize the code for the very TPUs (AI chips) and kernels used to train the Gemini models themselves. The AI literally optimized its own "brain" and "nervous system." The novelty comes from the interaction between a generative prior and an objective evaluator, not from magical memorization. The intelligence emerges from the system, not just the large language model.

Furthermore, when a system like AlphaEvolve explores a problem and finds a novel solution, the traces are synthetic data that can be fed back into the training loop to make the base model inherently smarter. In agentic synthetic training, the model learns from a curated set of perfect trajectories. The system records every "thought" (Chain of Thought) and every tool call the agent made to get there. Claude Code generates millions of clean, verified coding traces that are almost certainly used to make "Claude 5" better at coding out of the box.

P.S. Also, don't forget that compression is intelligence. If you compare the size of the training data to the size of the weights of the model, you get huge compression ratios (320:1 for Llama 3). To fit all of human knowledge into a 140-gigabyte "container," the model cannot simply copy-paste. Whatever LLMs are doing, it is overwhelmingly compressed abstraction, not simple storage of everything they've seen."

-- Alexander Kruel
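
The compression figures in the P.S. can be roughly reproduced with back-of-the-envelope arithmetic. The post does not state its inputs, so the numbers below are assumptions: about 15 trillion training tokens, about 3 bytes of raw text per token (a guess chosen because it lands near the quoted ratio), and a 70-billion-parameter model stored at 2 bytes per parameter, which gives the 140 GB "container". Under those assumptions the ratio comes out near 321:1.

```python
# Assumed inputs; none of these numbers are stated in the post itself.
TRAINING_TOKENS = 15 * 10**12      # about 15 trillion tokens
BYTES_PER_TOKEN = 3                # rough guess for raw text bytes per token
PARAMETERS = 70 * 10**9            # a 70-billion-parameter model
BYTES_PER_PARAMETER = 2            # 16-bit weights


def compression_ratio(tokens, bytes_per_token, parameters, bytes_per_parameter):
    """Ratio of training-data size to weight size, both measured in bytes."""
    data_bytes = tokens * bytes_per_token
    weight_bytes = parameters * bytes_per_parameter
    return data_bytes / weight_bytes


weights_gb = PARAMETERS * BYTES_PER_PARAMETER / 10**9
ratio = compression_ratio(
    TRAINING_TOKENS, BYTES_PER_TOKEN, PARAMETERS, BYTES_PER_PARAMETER
)

print(f"weights: {weights_gb:.0f} GB")
print(f"compression ratio: about {ratio:.0f}:1")
```

The ratio is sensitive to the bytes-per-token guess: at 4 bytes per token the same calculation gives a noticeably larger figure, so the quoted 320:1 is best read as an order-of-magnitude estimate.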