CoPIM: A Concurrency-aware PIM Workload Offloading Architecture for Graph Applications

Published in The 2021 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED 2021), 2021

Background

PIM architecture enables computation near memory, reducing the memory wall bottleneck prevalent in data-heavy applications. However, determining the optimal workload to offload to PIM, especially under high concurrency, is essential for maximizing efficiency.

Design

CoPIM introduces a Pure Miss Cycle Rate metric to guide workload partitioning based on concurrency and cache miss patterns. This metric helps CoPIM identify critical portions of graph workloads that can be offloaded to PIM units without compromising CPU performance. CoPIM’s architecture also includes an Instruction Management Unit (IMU) to manage loop-based offloading dynamically, improving the responsiveness of offloading decisions.

CoPIM workflow for selecting offloading targets

Key Features

Pure Miss Cycle Rate (θ): Measures concurrency and cache utilization to determine suitable workloads for PIM execution.
Dynamic Workload Partitioning: Adapts offloading decisions in real-time, focusing on high-impact code segments for enhanced system efficiency.

Results

CoPIM outperforms PEI and GraphPIM with a 19.5% and 11.4% speedup, respectively, demonstrating significant energy efficiency improvements due to reduced offloading overhead.

Performance comparison of state-of-the-art frameworks

Conclusion

CoPIM showcases that concurrency-aware workload partitioning in PIM can optimize graph application performance, especially in data-intensive environments, establishing a new standard for workload offloading strategies in PIM architectures.

paper

Share on

Twitter Facebook LinkedIn

Xiaoyang Lu