Publications

Publications by category in reverse chronological order. Generated by jekyll-scholar.
2025
- NeurIPS
- arXiv: Beyond Next-Token Prediction: A Performance Characterization of Diffusion versus Autoregressive Language Models. arXiv preprint, 2025.
- arXiv: FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference. arXiv preprint, 2025.
- arXiv: XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization. arXiv preprint, 2025.
- arXiv
- ICML
- Springer: SPEED: Speculative Pipelined Execution for Efficient Decoding. In Enhancing LLM Performance: Efficacy, Fine-Tuning, and Inference Techniques, 2025.
2024
- EMNLP Demo
- IEEE Micro
- ICML Workshop
- MLSys
2023
- CHiME-7 Workshop: Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation. In CHiME-7 Workshop, 2023.
- ISSCC: A 12nm 18.1 TFLOPs/W Sparse Transformer Processor with Entropy-Based Early Exit, Mixed-Precision Predication and Fine-Grained Power Management. In ISSCC, 2023.
- ISCA Workshop: Full Stack Optimization of Transformer Inference: A Survey. In ISCA Workshop on Architecture and System Support for Transformer Models (ASSYST), 2023.
2022
- JSSC: A 16-nm SoC for Noise-Robust Speech and NLP Edge AI Inference with Bayesian Sound Source Separation and Attention-Based DNNs. In JSSC, 2022.
2021
- MICRO: EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference. In MICRO, 2021.
- ISSCC: A 25mm² SoC for IoT Devices with 18ms Noise-Robust Speech-to-Text Latency via Bayesian Speech Denoising and Attention-Based Sequence-to-Sequence DNN Speech Recognition in 16nm FinFET. In ISSCC, 2021.
- Hot Chips: SM6: A 16nm System-on-Chip for Accurate and Noise-Robust Attention-Based NLP Applications. In Hot Chips Symposium, 2021.