Publications

Publications by category in reverse chronological order. Generated by jekyll-scholar.

2025

  1. NeurIPS
    Multipole Attention for Efficient Long Context Reasoning
    Coleman Hooper*, Sebastian Zhao*, Luca Manolache, and 5 more authors
    NeurIPS, 2025
  2. arXiv
    Beyond Next-Token Prediction: A Performance Characterization of Diffusion versus Autoregressive Language Models
    Minseo Kim, Coleman Hooper, Aditya Tomar, and 5 more authors
    arXiv preprint, 2025
  3. arXiv
    FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference
    Coleman Hooper, Charbel Sakr, Ben Keller, and 4 more authors
    arXiv preprint, 2025
  4. arXiv
    XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization
    Aditya Tomar*, Coleman Hooper*, Minjae Lee, and 7 more authors
    arXiv preprint, 2025
  5. arXiv
    ETS: Efficient Tree Search for Inference-Time Scaling
    Coleman Hooper, Sehoon Kim, Suhong Moon, and 7 more authors
    arXiv preprint, 2025
  6. ICML
    QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache
    Haocheng Xi, Aditya Tomar, Coleman Hooper, and 6 more authors
    ICML, 2025
  7. ACL
    Squeezed Attention: Accelerating Long Context Length LLM Inference
    Coleman Hooper*, Sehoon Kim*, Hiva Mohammadzadeh, and 6 more authors
    ACL, 2025
  8. Springer
    SPEED: Speculative Pipelined Execution for Efficient Decoding
    Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, and 4 more authors
    In Enhancing LLM Performance: Efficacy, Fine-Tuning, and Inference Techniques, 2025

2024

  1. EMNLP Demo
    TinyAgent: Function Calling at the Edge
    Lutfi Eren Erdogan, Nicholas Lee, Siddharth Jha, and 7 more authors
    EMNLP Demo, 2024
  2. IEEE Micro
    AI and Memory Wall
    Amir Gholami, Zhewei Yao, Sehoon Kim, and 3 more authors
    IEEE Micro, 2024
  3. NeurIPS
    KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
    Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, and 4 more authors
    NeurIPS, 2024
  4. ICML Workshop
    Learned Best-Effort LLM Serving
    Siddharth Jha, Coleman Hooper, Xiaoxuan Liu, and 2 more authors
    ICML Workshop on Efficient Systems for Foundation Models, 2024
  5. MLSys
    S-LoRA: Serving Thousands of Concurrent LoRA Adapters
    Ying Sheng, Shiyi Cao, Dacheng Li, and 8 more authors
    MLSys, 2024
  6. ICML
    SqueezeLLM: Dense-and-Sparse Quantization
    Sehoon Kim*, Coleman Hooper*, Amir Gholami*, and 5 more authors
    ICML, 2024

2023

  1. CHiME-7 Workshop
    Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation
    Tae Jin Park, He Huang, Coleman Hooper, and 5 more authors
    CHiME-7 Workshop, 2023
  2. ISSCC
    A 12nm 18.1 TFLOPs/W Sparse Transformer Processor with Entropy-Based Early Exit, Mixed-Precision Predication and Fine-Grained Power Management
    Thierry Tambe, Jeff Zhang, Coleman Hooper, and 8 more authors
    ISSCC, 2023
  3. ISCA Workshop
    Full Stack Optimization of Transformer Inference: A Survey
    Sehoon Kim*, Coleman Hooper*, Thanakul Wattanawong, and 8 more authors
    ISCA Workshop on Architecture and System Support for Transformer Models (ASSYST), 2023

2022

  1. JSSC
    A 16-nm SoC for Noise-Robust Speech and NLP Edge AI Inference with Bayesian Sound Source Separation and Attention-Based DNNs
    Thierry Tambe, En-Yu Yang, Glenn G Ko, and 7 more authors
    JSSC, 2022

2021

  1. MICRO
    EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference
    Thierry Tambe, Coleman Hooper, Lillian Pentecost, and 8 more authors
    MICRO, 2021
  2. ISSCC
    A 25mm² SoC for IoT Devices with 18ms Noise-Robust Speech-to-Text Latency via Bayesian Speech Denoising and Attention-Based Sequence-to-Sequence DNN Speech Recognition in 16nm FinFET
    Thierry Tambe, En-Yu Yang, Glenn G Ko, and 7 more authors
    ISSCC, 2021
  3. Hot Chips
    SM6: A 16nm System-on-Chip for Accurate and Noise-Robust Attention-Based NLP Applications
    Thierry Tambe, En-Yu Yang, Glenn G Ko, and 7 more authors
    Hot Chips Symposium, 2021