Posts by Collection

portfolio

publications

FloE: On-the-Fly MoE Inference on Memory-constrained GPU

Published in ICML, 2025

FloE is an on-the-fly MoE inference system for memory-constrained GPUs, built on the insight that sparsely activated experts contain substantial untapped redundancy.

Recommended citation: Zhou, Y., Li, Z., Zhang, J., Wang, J., Wang, Y., Xie, Z., Chen, K., & Shou, L. (2025). FloE: On-the-Fly MoE Inference on Memory-constrained GPU. arXiv. https://arxiv.org/abs/2505.05950 (PDF: https://arxiv.org/pdf/2505.05950v2)

talks

teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014


Teaching experience 2

Workshop, University 1, Department, 2015
