William Zhang

Vibing with Compilers: Part 1

Earlier this month, Anthropic's Claude C Compiler (CCC) went viral. The choice of Rust surprised me, since I had been slowly writing my own C compiler in Rust at around the same time. But my pace and persistence were far below Anthropic's, since my access to coding agents is much more limited.

What stood out most was how underwhelming Anthropic's final result was relative to its cost: $20,000 in tokens (not to mention the researchers' salaries) and two weeks of continuous agent "time commitment".

Coincidentally, a classmate told me that GitHub Copilot Pro was free for students. So I claimed a free subscription and sharply ramped up work on my own compiler project.

However, because of Copilot's inevitable rate limiting and my busy schedule, my time as the agent's supervisor was limited to occasional late evenings. Sometimes I found reviewing the code and trying out every new feature so addictive that I relied on the rate limit to make me take a break :)

Before this, I had studied compilers only superficially through my own reading, so this project was not only my first large-scale one but also a chance to dive under the hood of a working compiler and "improve" it myself.

The ultimate goal is for it to compile a fully working Linux distribution, along with other large open-source projects, at a personal monetary cost of $0. I also want the generated assembly to be as small and as fast as possible.

Currently, my C compiler compiles the vast majority of C features, including Hello World (which CCC allegedly fails to compile), and targets x86 on both Windows and Linux (tested under WSL). It can handle arbitrary headers, signal handlers, and multithreaded programs. However, to keep the project manageable, I offloaded preprocessing, assembling, and linking entirely to GCC, similar to Anthropic's project.

The interface is actually very user-friendly. Go try it out!

Download the release executable for your platform (Windows or Linux) and point it at any number of C files. You can also set the output file name with the -o flag, as with GCC.

Development Flow: Since my project used a single agent (mostly running Claude Sonnet 4.5, occasionally switching to Gemini) rather than an agent team, I planned upcoming features at a very fine-grained level, guided by several experimental heuristics.

The choice of the next feature was guided by three main objectives: maximizing C language coverage, maximizing the efficiency of the generated assembly, and keeping the code clean and modular. I mostly alternated between the first two and only addressed the third when files got too big. This alternation was meant to avoid two bad outcomes: a spaghetti-structured compiler that covered all of C but could never generate fast code, or a compiler over-optimized for a tiny subset of C and therefore hard to extend later.

Writing Tests: Copilot figured out on its own that every feature needed corresponding tests. After a slight nudge from me, it organized all integration tests into one folder and made them runnable with a single command. This quickly proved invaluable. As the number of features ballooned, so did the integration test suite, and so did the difficulty of extending the compiler without breaking at least one existing feature. So I made it a rule to tell Copilot, in every prompt, to run all integration tests after each small change, which let it recover quickly from regressions.

Moreover, when multiple integration tests failed, Copilot would often get tangled up deciding which failure to address. Hence, I made integration test runs halt on the first failure, so the agent would focus on one problem at a time. This short feedback loop enabled full autonomy: I could open the agent in the morning, assign it a major task, let it figure everything out while I was in class, and come back at lunch to find everything finished and working.
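
The post doesn't show the actual test harness, so the following is only a minimal Rust sketch of the halt-on-first-failure idea. It assumes a hypothetical layout in which each tests/integration/NAME.c sits next to a NAME.expected file holding the program's expected stdout, and a hypothetical compiler binary at target/release/mycc; `run_fail_fast`, `run_c_test`, and `Outcome` are illustrative names, not the project's.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Result of a single integration test.
#[derive(Debug, PartialEq)]
enum Outcome {
    Pass,
    Fail(String),
}

/// Runs `tests` in order and stops at the first failure.
/// Returns Ok(number passed) or Err((index of first failure, reason)).
fn run_fail_fast<T, F>(tests: &[T], mut run: F) -> Result<usize, (usize, String)>
where
    F: FnMut(&T) -> Outcome,
{
    for (i, test) in tests.iter().enumerate() {
        if let Outcome::Fail(reason) = run(test) {
            // Later tests are deliberately never run, so only one problem is reported.
            return Err((i, reason));
        }
    }
    Ok(tests.len())
}

/// Compiles `src` with the compiler under test, runs the result, and compares
/// its stdout with the sibling `.expected` file (a hypothetical convention).
fn run_c_test(compiler: &Path, src: &Path) -> Outcome {
    let exe = src.with_extension("out");
    match Command::new(compiler).arg(src).arg("-o").arg(&exe).status() {
        Ok(status) if status.success() => {}
        Ok(_) => return Outcome::Fail("compilation failed".to_string()),
        Err(e) => return Outcome::Fail(format!("could not launch compiler: {e}")),
    }
    let output = match Command::new(&exe).output() {
        Ok(o) => o,
        Err(e) => return Outcome::Fail(format!("could not run binary: {e}")),
    };
    let expected = match fs::read_to_string(src.with_extension("expected")) {
        Ok(s) => s,
        Err(e) => return Outcome::Fail(format!("missing .expected file: {e}")),
    };
    if String::from_utf8_lossy(&output.stdout) == expected.as_str() {
        Outcome::Pass
    } else {
        Outcome::Fail("stdout differs from .expected".to_string())
    }
}

fn main() {
    // Hypothetical paths; adjust to the real project layout.
    let dir = Path::new("tests/integration");
    let compiler = Path::new("target/release/mycc");
    let mut tests: Vec<PathBuf> = match fs::read_dir(dir) {
        Ok(entries) => entries
            .filter_map(|entry| entry.ok().map(|e| e.path()))
            .filter(|p| p.extension().and_then(|x| x.to_str()) == Some("c"))
            .collect(),
        Err(_) => {
            eprintln!("no test directory at {}", dir.display());
            return;
        }
    };
    tests.sort(); // fixed order, so "the first failure" is reproducible
    match run_fail_fast(&tests, |t| run_c_test(compiler, t)) {
        Ok(n) => println!("all {n} tests passed"),
        Err((i, reason)) => {
            eprintln!("FAIL {}: {reason}", tests[i].display());
            std::process::exit(1);
        }
    }
}
```

Sorting the file list keeps the order stable, so the first failure is the same from run to run, which is what lets an agent work through regressions one at a time.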

Ensuring Assembly Efficiency: This required the most human input, albeit still not much. Optimization was where Copilot/Claude was most "stubborn" about introducing new features, often needing specific manual prompts rather than higher-level directives. For example, graph-coloring-based register allocation and mem2reg IR optimizations had to be requested explicitly. My stopgap was the classic prompting technique of breaking the work down into micro-tasks for the agent.
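
For readers unfamiliar with graph-coloring register allocation, here is a minimal Rust sketch of the Chaitin/Briggs simplify-and-select scheme. It is not the project's allocator: it assumes the interference graph has already been computed by liveness analysis, numbers virtual registers 0..n, treats the k physical registers as interchangeable, and leaves out coalescing, spill costs, and precolored registers. The function names are hypothetical.

```rust
use std::collections::BTreeSet;

/// Builds an undirected interference graph over `n` virtual registers.
/// An edge (a, b) means a and b are live at the same time, so they
/// cannot share a physical register.
fn interference_graph(n: usize, edges: &[(usize, usize)]) -> Vec<BTreeSet<usize>> {
    let mut adj = vec![BTreeSet::new(); n];
    for &(a, b) in edges {
        if a != b {
            adj[a].insert(b);
            adj[b].insert(a);
        }
    }
    adj
}

/// Chaitin/Briggs-style coloring with `k` physical registers.
/// Returns Some(register index) per virtual register, or None if it is spilled.
fn color(adj: &[BTreeSet<usize>], k: usize) -> Vec<Option<usize>> {
    let n = adj.len();
    let mut removed = vec![false; n];
    let mut degree: Vec<usize> = adj.iter().map(|s| s.len()).collect();
    let mut stack = Vec::with_capacity(n);

    // Simplify: repeatedly remove a node with fewer than k remaining neighbors.
    // If none exists, optimistically push the highest-degree node (a spill candidate).
    for _ in 0..n {
        let pick = (0..n)
            .find(|&v| !removed[v] && degree[v] < k)
            .or_else(|| (0..n).filter(|&v| !removed[v]).max_by_key(|&v| degree[v]))
            .expect("at least one node remains");
        removed[pick] = true;
        for &w in &adj[pick] {
            if !removed[w] {
                degree[w] -= 1;
            }
        }
        stack.push(pick);
    }

    // Select: pop nodes and give each the lowest register not used by its
    // already-colored neighbors; a node with no free register is spilled.
    let mut assignment: Vec<Option<usize>> = vec![None; n];
    while let Some(v) = stack.pop() {
        let taken: BTreeSet<usize> = adj[v].iter().filter_map(|&w| assignment[w]).collect();
        assignment[v] = (0..k).find(|r| !taken.contains(r));
    }
    assignment
}

fn main() {
    // Hypothetical example: v0..v3 interfere in a 4-cycle, with k = 2 registers.
    let adj = interference_graph(4, &[(0, 1), (1, 2), (2, 3), (3, 0)]);
    for (v, reg) in color(&adj, 2).iter().enumerate() {
        match reg {
            Some(r) => println!("v{v} -> r{r}"),
            None => println!("v{v} -> spilled"),
        }
    }
}
```

On the 4-cycle in `main`, every node has exactly k = 2 neighbors, so no node is trivially colorable; Chaitin's original pessimistic rule would spill one, but the optimistic push still finds a valid 2-coloring with no spills. On a triangle with k = 2, one node genuinely ends up spilled.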

I also used GCC as an efficiency oracle, albeit in a rather crude way: I pasted various C snippets into Compiler Explorer (Godbolt), copied the GCC-generated assembly into the Copilot prompt, and instructed the agent to "figure out" how to match it. This worked well, since seeing GCC's output reminded Copilot of additional optimizations to add, and it has been very effective at improving runtime performance. However, Copilot doesn't fully follow through when asked to reduce assembly size to that of GCC.
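
Comparing assembly size against GCC by eye gets tedious. One rough, hypothetical way to turn it into a number is to count instruction lines in each compiler's `.s` output. The Rust sketch below assumes GCC-style GAS syntax, where `#` starts a comment, directives begin with `.`, and labels end with `:`; it ignores instruction encoding lengths, so it is only a proxy for real code size, and `count_instructions` is an illustrative name.

```rust
use std::env;
use std::fs;

/// Counts instruction lines in a GAS-syntax x86 listing (as GCC emits),
/// skipping blank lines, `#` comments, labels, and `.directives`.
fn count_instructions(asm: &str) -> usize {
    asm.lines()
        .map(|line| match line.find('#') {
            Some(i) => line[..i].trim(), // drop the trailing comment
            None => line.trim(),
        })
        .filter(|l| !l.is_empty() && !l.starts_with('.') && !l.ends_with(':'))
        .count()
}

fn main() {
    // Hypothetical usage: asmcount ours.s gcc.s
    let args: Vec<String> = env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: asmcount <ours.s> <gcc.s>");
        return;
    }
    let read = |path: &str| fs::read_to_string(path).map(|s| count_instructions(&s));
    match (read(args[1].as_str()), read(args[2].as_str())) {
        (Ok(ours), Ok(gcc)) => {
            let diff = ours as i64 - gcc as i64;
            println!("ours: {ours}, gcc: {gcc}, difference: {diff:+} instructions");
        }
        _ => eprintln!("could not read one of the input files"),
    }
}
```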

To measure the runtime speed of the generated code, I created a benchmark folder that compares my compiler's output against unoptimized GCC (i.e. gcc -O0), a baseline that Anthropic says CCC failed to match. Thankfully, with only a few lightweight prompts and my dual-objective development flow, matching or slightly exceeding that baseline was quite easy. There is still plenty of room to close the gap with GCC -O3, which I will explore in subsequent blog posts.
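
The benchmark folder itself isn't shown in the post; a minimal Rust sketch of such a comparison might time two executables built from the same C source, one by the compiler under test and one by gcc -O0, and report the median of several runs. The paths `bench/fib_ours` and `bench/fib_gcc_O0` and the helper names are hypothetical, and the timings include process start-up overhead, so very short benchmarks will look closer than they really are.

```rust
use std::process::Command;
use std::time::{Duration, Instant};

/// Median of the samples; less sensitive to one noisy run than the mean.
/// Panics if `samples` is empty.
fn median(mut samples: Vec<Duration>) -> Duration {
    samples.sort();
    let mid = samples.len() / 2;
    if samples.len() % 2 == 1 {
        samples[mid]
    } else {
        (samples[mid - 1] + samples[mid]) / 2
    }
}

/// Runs `exe` `runs` times, capturing its output, and returns the median
/// wall-clock time. Returns None if it cannot start or exits with an error.
fn time_binary(exe: &str, runs: usize) -> Option<Duration> {
    let mut samples = Vec::with_capacity(runs);
    for _ in 0..runs {
        let start = Instant::now();
        let output = Command::new(exe).output().ok()?;
        let elapsed = start.elapsed();
        if !output.status.success() {
            return None;
        }
        samples.push(elapsed);
    }
    if samples.is_empty() {
        None
    } else {
        Some(median(samples))
    }
}

fn main() {
    // Hypothetical binaries built from the same benchmark source:
    // one by the compiler under test, one by `gcc -O0`.
    let ours = "bench/fib_ours";
    let gcc_o0 = "bench/fib_gcc_O0";
    match (time_binary(ours, 5), time_binary(gcc_o0, 5)) {
        (Some(a), Some(b)) => {
            let ratio = a.as_secs_f64() / b.as_secs_f64();
            println!("ours: {a:?}, gcc -O0: {b:?}, ratio: {ratio:.2} (below 1.0 means faster)");
        }
        _ => eprintln!("a benchmark binary is missing or failed"),
    }
}
```

Using the median rather than the mean keeps one unusually slow run (for example, one disturbed by a background process) from skewing the ratio.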

In the end, this project has really put a single human + single agent team to the test. So far, the results have been surprisingly good: most of the heavy lifting is already done, at essentially zero personal cost. The next steps will raise the bar even higher and push the limits of what can realistically be achieved.