Coleman Hooper

PhD Student (UC Berkeley)

Contact: chooper@berkeley.edu

Research Interests: Efficient LLM Inference, AI Systems, Model Compression

I'm a 4th-year PhD student at UC Berkeley, advised by Professors Kurt Keutzer in Berkeley AI Research (BAIR) and Sophia Shao in the SLICE Lab (Computer Architecture group). I mainly work on efficient algorithms for LLM inference and AI systems.

Selected Publications

  1. Multipole Attention for Efficient Long Context Reasoning
    Coleman Hooper*, Sebastian Zhao*, Luca Manolache, and 5 more authors
    NeurIPS, 2025
  2. Squeezed Attention: Accelerating Long Context Length LLM Inference
    Coleman Hooper*, Sehoon Kim*, Hiva Mohammadzadeh, and 6 more authors
    ACL, 2025
  3. KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
    Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, and 4 more authors
    NeurIPS, 2024
  4. SqueezeLLM: Dense-and-Sparse Quantization
    Sehoon Kim*, Coleman Hooper*, Amir Gholami*, and 5 more authors
    ICML, 2024
  5. Full Stack Optimization of Transformer Inference: A Survey
    Sehoon Kim*, Coleman Hooper*, Thanakul Wattanawong, and 8 more authors
    ISCA Workshop on Architecture and System Support for Transformer Models (ASSYST), 2023