Portfolio item number 1
Short description of portfolio item number 1
Short description of portfolio item number 1
Short description of portfolio item number 2
Published in The 2020 IEEE 39th International Conference on Computer Design (ICCD 2020), 2020
Published in The 2021 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED 2021), 2021
Published in The 2021 IEEE 39th International Conference on Computer Design (ICCD 2021), 2021
Published in 2022 Winter Simulation Conference (WSC 2022), 2022
Published in Journal of Computer Science and Technology (JCST), 2023
With the surge of big data applications and the worsening of the memory-wall problem, the memory system, instead of the computing unit, becomes the commonly recognized major concern of computing. However, this “memory-centric” common understanding has a humble beginning. More than three decades ago, the memory-bounded speedup model is the first model recognizing memory as the bound of computing and provided a general bound of speedup and a computing-memory trade-off formulation. The memory-bounded model was well received even by then. It was immediately introduced in several advanced computer architecture and parallel computing textbooks in the 1990’s as a must-know for scalable computing. These include Prof. Kai Hwang’s book “Scalable Parallel Computing” in which he introduced the memory-bounded speedup model as the Sun-Ni’s Law, parallel with the Amdahl’s Law and the Gustafson’s Law. Through the years, the impacts of this model have grown far beyond parallel processing and into the fundamental of computing. In this article, we revisit the memory-bounded speedup model and discuss its progress and impacts in depth to make a unique contribution to this special issue, to stimulate new solutions for big data applications, and to promote data-centric thinking and rethinking.
Published in The 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA 2023), 2024
Published in The 30th IEEE International Symposium on High-Performance Computer Architecture (HPCA 2024), 2024
Published in The 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2024), 2024
Published in The 2024 IEEE 42nd International Conference on Computer Design (ICCD 2024), 2024
Published in IEEE Computer Architecture Letters (CAL), 2025
Integrating processing-in-memory (PIM) with GPUs accelerates large language model (LLM) inference, but existing GPU-PIM systems encounter several challenges. While GPUs excel in large general matrix-matrix multiplications (GEMM), they struggle with small-scale operations better suited for PIM, which currently cannot handle them independently. Additionally, the computational demands of activation operations exceed the capabilities of current PIM technologies, leading to excessive data movement between the GPU and memory. PIM’s potential for general matrix-vector multiplications (GEMV) is also limited by insufficient support for fine-grained parallelism. To address these issues, we propose Pyramid, a novel GPU-PIM system that optimizes PIM for LLM inference by strategically allocating cross-level computational resources within PIM to meet diverse needs and leveraging the strengths of both technologies. Evaluation results demonstrate that Pyramid outperforms existing systems like NeuPIM, AiM, and AttAcc by factors of 2.31×, 1.91×, and 1.72×, respectively.
Published:
This is a description of your talk, which is a markdown files that can be all markdown-ified like any other post. Yay markdown!
Published:
This is a description of your conference proceedings talk, note the different field in type. You can put anything in this field.
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.