
Command-line options for mpirun

PPR: Processes Per Resource

The --map-by ppr:N:<resource> syntax places N MPI ranks on each instance of the given hardware resource (node, socket, NUMA node, or core).

 

Examples

  • --map-by ppr:1:socket → 1 MPI rank per CPU socket
  • --map-by ppr:2:numa → 2 MPI ranks per NUMA node
  • --map-by ppr:1:node → 1 MPI rank per compute node
  • --map-by ppr:4:core → 4 MPI ranks per core (⚠️ rarely used; oversubscribes CPUs)
  • --map-by ppr:1:numa:PE=16 → 1 MPI rank per NUMA node, with 16 cores (OpenMP threads) each
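
To decide which resource to map by, it helps to check how many sockets, NUMA nodes, and cores the machine actually exposes. A minimal check with lscpu (the field names may differ slightly between distributions):

# Summarize the CPU topology relevant to --map-by choices
lscpu | grep -E 'Socket\(s\)|NUMA node\(s\)|Core\(s\) per socket|Thread\(s\) per core'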

 

Bind to

The --bind-to option tells Open MPI to pin each MPI process (and, indirectly, its threads) to specific CPU hardware:

 

  • Prevents CPU migration
  • Improves cache locality
  • Reduces NUMA latency
  • Prevents thread/process bouncing across cores
  • --bind-to none → no binding; processes can float across all CPUs (not NUMA friendly)
  • --bind-to core → bind each process to a set of cores (sized by PE or the thread count)
  • --bind-to socket → bind each process to a CPU socket (a group of cores)
  • --bind-to numa → bind each process to a NUMA node
  • --bind-to hwthread → bind to logical CPUs (hyperthreads); rarely used for performance
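
Before launching a real job, the effect of a map/bind combination can be previewed by running a trivial program under the same options; hostname here is just a stand-in for the application:

# Print the computed bindings without starting the real workload
mpirun -np 4 --map-by ppr:1:numa:PE=16 --bind-to core --report-bindings hostname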

 

Example
mpirun -np 4 --map-by ppr:1:numa:PE=16 --bind-to core ...

 

  • 4 MPI ranks
  • Each rank gets 16 cores
  • Each rank is pinned to its 16 cores, so the OS can't migrate it elsewhere
  • OpenMP threads stay within those cores → great NUMA locality
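
A quick sanity check on the numbers: ranks × cores per rank should not exceed what the node offers. A sketch (nproc counts logical CPUs, so it includes SMT threads when they are enabled):

echo $(( 4 * 16 ))   # cores requested: 4 ranks × 16 cores each = 64
nproc                # logical CPUs available on this node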

 

  • --bind-to none → process and its threads can move across all 64 cores; may cause NUMA traffic
  • --bind-to core → process and its threads pinned to specific physical cores; best for cache/NUMA locality
  • --bind-to numa → process bound to a full NUMA node; useful when OpenMP threads < cores per NUMA node
  • --bind-to socket → similar to numa, but not always accurate on AMD CPUs with NPS settings
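
As an illustration of the numa case: if each rank runs only 8 OpenMP threads on a 16-core NUMA node, binding to the whole NUMA node lets those threads use any core within it. A sketch assuming the 64-core, 4-NUMA-node machine used here:

export OMP_NUM_THREADS=8
mpirun -np 4 --map-by ppr:1:numa --bind-to numa --report-bindings \
  ./lmp -sf omp -pk omp 8 -in in.benchmark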

 

Thread binding in OpenMP
export OMP_PLACES=cores
export OMP_PROC_BIND=close

 

  • OpenMP threads stay within the process’s assigned cores
  • Threads are packed onto adjacent cores (maximizing shared L3 cache and keeping them on one NUMA node)
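
Whether the threads really land as intended can be checked with an OpenMP 5.0 runtime, which can print one affinity line per thread at startup. A sketch, where ./a.out stands in for any OpenMP binary (older runtimes may silently ignore the variable):

export OMP_DISPLAY_AFFINITY=TRUE
export OMP_PLACES=cores
export OMP_PROC_BIND=close
export OMP_NUM_THREADS=4
./a.out   # each thread reports its id and the CPUs it is bound to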

 

Example
export OMP_NUM_THREADS=16
export OMP_PLACES=cores
export OMP_PROC_BIND=close

mpirun -np 4 \
  --map-by ppr:1:numa:PE=16 \
  --bind-to core \
  --report-bindings \
  ./lmp -sf omp -pk omp 16 -in in.benchmark
[dell7875-Precision-7875-Tower:21904] MCW rank 0 bound to NUMA node 0[core 0-15]: [B/B/B/B/./././././././././././.]
[dell7875-Precision-7875-Tower:21905] MCW rank 1 bound to NUMA node 1[core 16-31]: [././././B/B/B/B/B/B/B/B/B/B/B/B]
[dell7875-Precision-7875-Tower:21906] MCW rank 2 bound to NUMA node 2[core 32-47]: [././././././././B/B/B/B/B/B/B/B]
[dell7875-Precision-7875-Tower:21907] MCW rank 3 bound to NUMA node 3[core 48-63]: [././././././././././././B/B/B/B]

 

  • B = core bound to this MPI rank
  • . = not bound
  • [core x-y] = physical cores used
  • NUMA node n = NUMA affinity (inferred from core set)
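
The inferred NUMA affinity can be cross-checked against the actual core-to-node layout, for example with numactl (assuming it is installed; hwloc's lstopo shows the same information):

# List which CPU ids belong to each NUMA node
numactl --hardware | grep cpus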

Example

export OMP_NUM_THREADS=32
mpirun -np 2 \
  --map-by ppr:1:node:PE=32:overload-allowed \
  --bind-to core \
  --report-bindings \
  ./lmp -sf omp -pk omp 32 -in in.benchmark
[dell7875:00000] MCW rank 0 bound to [core 0-31]
[dell7875:00001] MCW rank 1 bound to [core 32-63]
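
The binding of a running job can also be inspected from outside, e.g. with taskset. A sketch; the pgrep pattern is only a guess at matching the LAMMPS processes and may need adjusting:

# Show the CPU affinity list of every running lmp process
for pid in $(pgrep -f 'lmp -sf omp'); do taskset -cp "$pid"; done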
