

NUMA Pinning with Intel oneAPI

LAMMPS Script

System used: NPS4 (4 NUMA nodes per socket)

processors       * * 2 numa_nodes 4
package          omp 5
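With package omp 5, each MPI rank runs 5 OpenMP threads. It helps to keep the shell environment consistent with that thread count; a minimal sketch (the OMP_PROC_BIND and OMP_PLACES values are my own choice, not part of the original run):

export OMP_NUM_THREADS=5        # matches "package omp 5" (5 threads per rank)
export OMP_PROC_BIND=spread     # assumption: spread threads within each rank's pinned domain
export OMP_PLACES=cores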


numactl

$ numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 7 8 12 13 14 18 19 20
node 0 size: 31850 MB
node 0 free: 28285 MB
node 1 cpus: 4 5 6 9 10 11 15 16 17 21 22 23
node 1 size: 32251 MB
node 1 free: 30036 MB
node 2 cpus: 24 25 26 27 31 32 33 37 38 39 43 44
node 2 size: 32251 MB
node 2 free: 30743 MB
node 3 cpus: 28 29 30 34 35 36 40 41 42 45 46 47
node 3 size: 32248 MB
node 3 free: 30612 MB
node distances:
node   0   1   2   3 
  0:  10  11  21  21 
  1:  11  10  21  21 
  2:  21  21  10  11 
  3:  21  21  11  10
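Once the job is running, the pinning reported by Intel MPI can be cross-checked against the actual affinity of each rank; a rough sketch (the pgrep pattern is just an assumption about the process name):

$ # prints the allowed-CPU list of every running lmp process, one line per rank
$ for pid in $(pgrep -f "lmp -in"); do taskset -cp "$pid"; done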


Run command

mpirun -np 8 \
-genv I_MPI_PIN 1 \
-genv I_MPI_PIN_DOMAIN numa \
-genv I_MPI_PIN_ORDER spread \
-genv I_MPI_PERHOST 2 \
-genv I_MPI_DEBUG 5 \
./lmp -in MWE_1.in
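The same launch can also be written as a small run script with the variables exported up front instead of passed via -genv; a sketch assuming the same binary and input file:

#!/bin/bash
# Equivalent to the -genv form above: exported I_MPI_* variables apply to all ranks.
export I_MPI_PIN=1
export I_MPI_PIN_DOMAIN=numa
export I_MPI_PIN_ORDER=spread
export I_MPI_PERHOST=2
export I_MPI_DEBUG=5

mpirun -np 8 ./lmp -in MWE_1.in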


Option descriptions

Using Intel MPI Pinning Variables

To accomplish this pinning strategy in Intel MPI (one that places exactly 2 ranks per NUMA node, with 6 cores per rank), you generally combine these environment variables:

  1. I_MPI_PIN=1
    • Enables process pinning.
  2. I_MPI_PIN_DOMAIN
    • Defines the “scope” of pinning (e.g., numa, socket, core). If you want each rank pinned to an entire NUMA node, use numa; with 4 NUMA nodes and 8 ranks, Intel MPI will (ideally) place 2 ranks per NUMA domain in a “spread” pattern. An explicit-size alternative is sketched after this list.
  3. I_MPI_PIN_ORDER
    • Controls how ranks are spread or compacted (e.g., spread, bunch, compact).
  4. I_MPI_PERHOST=<N>
    • Places exactly <N> ranks per host (node) in a multi-node cluster; setting it to 2 might help ensure you only get 2 ranks per host.
  5. I_MPI_DEBUG=5 (or 4, 6, etc.)
    • Prints diagnostic info so you can see how Intel MPI actually pinned processes.
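For comparison, the pin domain can also be given an explicit size instead of the numa keyword. This is a sketch only: on this box the logical CPU numbers within each NUMA node are not contiguous (see the numactl output above), so 6:compact will not necessarily line up with the NUMA boundaries the way the numa keyword does; check the I_MPI_DEBUG output before trusting it.

mpirun -np 8 \
-genv I_MPI_PIN 1 \
-genv I_MPI_PIN_DOMAIN 6:compact \
-genv I_MPI_PIN_ORDER spread \
-genv I_MPI_DEBUG 5 \
./lmp -in MWE_1.in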
