ParaLLM: 1600+ tok/s on a MacBook
June 23, 2024
Batched KV caching for fast parallel LLM inference in MLX.
Thoughts, musings, works-in-progress, info-dumps.
June 05, 2024
I organized all my favorite LLM explainers into a 'book'.
April 29, 2024
A summary of my PhD + thoughts on what comes next.