Projects | hkimw

01 work

Things I’m building

A short list. Mostly hardware/software boundary work.

pccx · active · SystemVerilog / C++ / Python
A parallel compute core executor for edge FPGAs: custom ISA, INT8 systolic array, runtime queues, and a Python-facing driver stack.
why it matters · It lets me study edge LLM inference behavior: memory movement, kernel shape, and driver overhead rather than MAC count alone.
source · releases
pccx-lab · active · Rust / TypeScript
Visual performance profiler and pre-RTL simulator for the pccx NPU.
why it matters · Hardware is only debuggable with good software tooling. pccx-lab bridges the gap between Verilog waveforms and high-level execution graphs.
source · releases
llm-bottleneck-lab · active · Python / C++
A compact LLM serving/reference stack with Python runtime pieces, C++ kernels, and KV-cache experiments.
why it matters · It gives me a software baseline before moving an optimization down into FPGA kernels.
source · releases
NPU-FPGA-Transformer-Accelerator-KV260 · wip · SystemVerilog
Transformer inference IP on AMD Kria KV260: systolic GEMM plus small special-function units for operations around attention and normalization.
why it matters · It pushed me from "model acceleration" into memory hierarchy, scheduling, and runtime design.
source · releases
driver-drowsiness-detection · archived · Python
A latency-focused undergraduate computer vision project using facial landmarks and a small model.
why it matters · It was the first project that made me care more about end-to-end latency than benchmark accuracy.
source · releases