Ke Wan

Hi! I’m a Software Engineer II on the HPC/AI team at Microsoft (Azure Core), where I work on large-scale LLM systems in production cloud environments. My research interests lie in efficient transformer inference and system-aware LLM serving. I focus on memory-efficient inference techniques, particularly KV-cache optimization, that enable scalable, high-throughput deployment without model retraining. My work addresses fundamental bottlenecks in large-scale inference and has been evaluated and adopted in both academic and industrial research. Related open-source implementations have received more than 1,100 GitHub stars. I apply these research insights to designing reliable, scalable inference platforms that serve high-volume workloads.

Publications