Things I’m building
A short list. Mostly hardware/software boundary work.
- pccx · active
A parallel compute core executor for edge FPGAs: custom ISA, INT8 systolic array, runtime queues, and a Python-facing driver stack.
why it matters · It lets me study what actually drives edge LLM inference performance: memory movement, kernel shape, and driver overhead, rather than MAC count alone.
- pccx-lab · active
Visual performance profiler and pre-RTL simulator for the pccx NPU.
why it matters · Hardware needs good software tooling to be debuggable. This bridges the gap between Verilog waveforms and high-level execution graphs.
- llm-bottleneck-lab · active
A compact LLM serving/reference stack with Python runtime pieces, C++ kernels, and KV-cache experiments.
why it matters · It gives me a software baseline before moving an optimization down into FPGA kernels.
- NPU-FPGA-Transformer-Accelerator-KV260 · wip
Transformer inference IP on AMD Kria KV260: systolic GEMM plus small special-function units for operations around attention and normalization.
why it matters · It pushed me from "model acceleration" into memory hierarchy, scheduling, and runtime design.
- driver-drowsiness-detection · archived
A latency-focused undergraduate computer vision project using facial landmarks and a small model.
why it matters · It was the first project that made me care more about end-to-end latency than benchmark accuracy.