ProMiner: Enhancing Locality, Parallelism, and Offloading for Graph Mining on Processing-in-Memory Systems

Published in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2025

Abstract

Graph mining, critical for discovering specific patterns within complex structures, is becoming increasingly important in our data-driven world. Due to their memory-bound nature, graph mining applications encounter significant limitations with conventional processor-centric systems, like central processing units (CPUs) and graphics processing units (GPUs), stemming from the costly data movement between memory and processing units. Memory-centric computing systems, such as processing-in-memory (PIM) where computation occurs directly within or near memory modules, have the potential to accelerate graph mining. However, accelerating graph mining applications with PIM presents three primary challenges: 1) the difficulty in utilizing locality; 2) the challenge of exploring parallelism; and 3) the complexity of workload offloading between PIM and CPU. Addressing these intricate challenges, we introduce ProMiner, a novel framework that integrates three key techniques through cohesive software and hardware co-design. First, we propose a partitioning method tailored for graph mining to enhance data locality. Second, we design a coarse-fine parallelism optimization scheme to explore parallelism across different levels of memory. Third, we introduce a concurrency-aware mechanism for performance estimation, aimed at identifying the optimal computing engine for workload offloading to maximize performance. Our experimental results demonstrate that ProMiner significantly advances the state-of-the-art in graph mining, achieving 48.8% and 29.9% execution time reduction over NDMiner and DIMMining, respectively.
