BarraCUDA Open-source CUDA compiler targeting AMD GPUs

An open-source CUDA compiler that targets AMD GPUs, with more architectures planned. Written in 15,000 lines of C99. Zero LLVM dependency. Compiles .cu files straight to GFX11 machine code and spits out ELF .hsaco binaries that AMD GPUs can actually run.

This is what happens when you look at NVIDIA's walled garden and think "how hard can it be?" The answer is: quite hard, actually, but I did it anyway.

note: if youre here to test out my current tenstorrent implementation youll have to clone that respective branch :-)

Takes CUDA C source code, the same .cu files you'd feed to nvcc, and compiles them to AMD RDNA 3 (gfx1100) binaries. No LLVM. No HIP translation layer. No "convert your CUDA to something else first." Just a lexer, a parser, an IR, and roughly 1,700 lines of hand-written instruction selection that would make a compiler textbook weep.

Every single encoding has been validated against llvm-objdump with zero decode failures. I didn't use LLVM to compile, but I did use it to check my homework.

The following CUDA features compile to working GFX11 machine code:

No LLVM required :-)

All data structures use pre-allocated fixed-size arrays. No malloc in hot paths. No recursion. Bounded loops everywhere. The kind of code that would make JPL's coding standards committee nod approvingly before going back to landing things on Mars.

Being honest about limitations is important. Here's what's missing:

None of these are architectural blockers. They're all "haven't got round to it yet" items.

14 test files, 35+ kernels, ~1,700 BIR instructions, ~27,000 bytes of machine code:

Fix the known gaps: compound assignment operators, bare unsigned, integer literal suffixes, const, parameter reassignment. These are all small parser/lowerer changes. The goal is to compile real-world .cu files without modifications.

The generated code works but isn't winning any benchmarks. Priorities:

The IR (BIR) is target-independent. The backend is cleanly separated. Adding a new target means writing a new isel + emit pair. Candidates:

If you're considering writing your own AMDGPU backend, here are the things that will ruin your afternoon:

All 1,735 lines of amdgpu_emit.c are a testament to reading those pages so you don't have to.

Found a bug? Want to discuss the finer points of AMDGPU instruction encoding? Need someone to commiserate with about the state of GPU computing?

zanehambly@gmail.com

Open an issue if theres anything you want to discuss. Or don't. I'm not your mum.

Based in New Zealand, where it's already tomorrow and the GPUs are just as confused as everywhere else.

Apache 2.0. Do whatever you want. If this compiler somehow ends up in production, I'd love to hear about it, mostly so I can update my LinkedIn with something more interesting than wrote a CUDA compiler for fun.

Alex Chen