[Paper] Attention Is All You Need
This text contains the core concepts and mathematical principles of the Transformer model architecture.
This text contains the core concepts and mathematical principles of the Transformer model architecture.
This document is a note organizing the architecture and training process of the GPT-1 paper by combining mathematical definitions with intuitive interpretations.
Cherry blossoms are starting to bloom — and so is this new site.
I finally set up a proper personal homepage. What used to be just a GitHub Profile README is now a full static site powered by Docusaurus.
A quick summary of notable updates in deep learning inference, GPU architecture, and HPC this week.
The book that helped me most when I first started learning CUDA programming.
While studying deep learning inference optimization, I explored how memory access patterns in CUDA kernels dramatically affect performance.