https://www.intel.com/content/www/us/en/developer/articles/guide/lammps-tuning-guide.html
LAMMPS Tuning Guide on 3rd Generation Intel® Xeon® Scalable...
The LAMMPS tuning guide includes optimizations for Intel® AVX-512 on Intel® Xeon® Scalable Processors that can significantly speed up simulations.
1. lstopo
$ lstopo numa_simple.svg

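The machine has 48 physical cores; with Hyper-Threading enabled it exposes 96 logical cores.
In the topology above, physical core L#0 holds logical cores P#0 and P#48.
So if LAMMPS runs 48 ranks, one per physical core, and each rank runs two OpenMP threads (the rank's main thread plus one more),
the process and its extra OpenMP thread should land on P#0 and P#48 of physical core L#0, and likewise for the other cores.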
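The sibling relationship read from the lstopo picture can also be cross-checked from Linux sysfs. The sketch below is only an illustration: the helper names `parse_cpu_list` and `thread_siblings` are made up for this post, and it assumes a Linux system that exposes `thread_siblings_list` under `/sys/devices/system/cpu`.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0,48' or '0-3,7' into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def thread_siblings(cpu, sysfs_root="/sys/devices/system/cpu"):
    """Return the logical CPUs that share a physical core with `cpu`.

    Assumption: the kernel exposes topology/thread_siblings_list for this CPU.
    """
    path = Path(sysfs_root) / f"cpu{cpu}" / "topology" / "thread_siblings_list"
    return parse_cpu_list(path.read_text())
```

If sysfs agrees with the lstopo output, `thread_siblings(0)` on this machine should return the pair P#0 and P#48.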
The LAMMPS command line for this is:
export I_MPI_DEBUG=5 # Optional: shows detailed binding/debug info
mpirun -np 48 lmp -sf intel -in in.ST1.MSCDSS -pk intel 0 omp 2
The pinning reported in the output looks correct:
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 75332 hpz8 {0,48}
[0] MPI startup(): 1 75333 hpz8 {1,49}
[0] MPI startup(): 2 75334 hpz8 {2,50}
[0] MPI startup(): 3 75335 hpz8 {3,51}
[0] MPI startup(): 4 75336 hpz8 {7,55}
[0] MPI startup(): 5 75337 hpz8 {8,56}
[0] MPI startup(): 6 75338 hpz8 {12,60}
[0] MPI startup(): 7 75339 hpz8 {13,61}
[0] MPI startup(): 8 75340 hpz8 {14,62}
[0] MPI startup(): 9 75341 hpz8 {18,66}
[0] MPI startup(): 10 75342 hpz8 {19,67}
[0] MPI startup(): 11 75343 hpz8 {20,68}
[0] MPI startup(): 12 75344 hpz8 {4,52}
[0] MPI startup(): 13 75345 hpz8 {5,53}
[0] MPI startup(): 14 75346 hpz8 {6,54}
[0] MPI startup(): 15 75347 hpz8 {9,57}
[0] MPI startup(): 16 75348 hpz8 {10,58}
[0] MPI startup(): 17 75349 hpz8 {11,59}
[0] MPI startup(): 18 75350 hpz8 {15,63}
[0] MPI startup(): 19 75351 hpz8 {16,64}
[0] MPI startup(): 20 75352 hpz8 {17,65}
[0] MPI startup(): 21 75353 hpz8 {21,69}
[0] MPI startup(): 22 75354 hpz8 {22,70}
[0] MPI startup(): 23 75355 hpz8 {23,71}
[0] MPI startup(): 24 75356 hpz8 {24,72}
[0] MPI startup(): 25 75357 hpz8 {25,73}
[0] MPI startup(): 26 75358 hpz8 {26,74}
[0] MPI startup(): 27 75359 hpz8 {27,75}
[0] MPI startup(): 28 75360 hpz8 {31,79}
[0] MPI startup(): 29 75361 hpz8 {32,80}
[0] MPI startup(): 30 75362 hpz8 {33,81}
[0] MPI startup(): 31 75363 hpz8 {37,85}
[0] MPI startup(): 32 75364 hpz8 {38,86}
[0] MPI startup(): 33 75365 hpz8 {39,87}
[0] MPI startup(): 34 75366 hpz8 {43,91}
[0] MPI startup(): 35 75367 hpz8 {44,92}
[0] MPI startup(): 36 75368 hpz8 {28,76}
[0] MPI startup(): 37 75369 hpz8 {29,77}
[0] MPI startup(): 38 75370 hpz8 {30,78}
[0] MPI startup(): 39 75371 hpz8 {34,82}
[0] MPI startup(): 40 75372 hpz8 {35,83}
[0] MPI startup(): 41 75373 hpz8 {36,84}
[0] MPI startup(): 42 75374 hpz8 {40,88}
[0] MPI startup(): 43 75375 hpz8 {41,89}
[0] MPI startup(): 44 75376 hpz8 {42,90}
[0] MPI startup(): 45 75377 hpz8 {45,93}
[0] MPI startup(): 46 75378 hpz8 {46,94}
[0] MPI startup(): 47 75379 hpz8 {47,95}
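Checking 48 lines by eye is error-prone, so the same check can be scripted. The following is a minimal sketch, not an Intel MPI tool: it assumes the log lines have exactly the `Rank Pid Node name {a,b}` layout shown above, and that Hyper-Threading siblings differ by 48 as on this machine. The function names are hypothetical.

```python
import re
from collections import Counter

# Matches lines such as "[0] MPI startup(): 4 75336 hpz8 {7,55}".
# Assumption: each rank is pinned to exactly two comma-separated CPU ids.
PIN_LINE = re.compile(r"MPI startup\(\):\s+(\d+)\s+\d+\s+\S+\s+\{(\d+),(\d+)\}")


def parse_pairs(log_text):
    """Map each rank to the (cpu, cpu) pair printed in its 'Pin cpu' column."""
    pairs = {}
    for line in log_text.splitlines():
        m = PIN_LINE.search(line)
        if m:
            pairs[int(m.group(1))] = (int(m.group(2)), int(m.group(3)))
    return pairs


def misplaced_ranks(pairs, n_physical=48):
    """Return ranks whose two CPUs are not n_physical apart (not HT siblings)."""
    return [rank for rank, (a, b) in sorted(pairs.items())
            if abs(b - a) != n_physical]


def shared_cores(pairs):
    """Return physical cores (identified by their lower CPU id) used by >1 rank."""
    counts = Counter(min(a, b) for a, b in pairs.values())
    return sorted(core for core, n in counts.items() if n > 1)


# Three lines copied from the output above.
sample = """\
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 75332 hpz8 {0,48}
[0] MPI startup(): 4 75336 hpz8 {7,55}
[0] MPI startup(): 12 75344 hpz8 {4,52}
"""

pairs = parse_pairs(sample)
print("misplaced ranks:", misplaced_ranks(pairs))
print("cores shared by several ranks:", shared_cores(pairs))
```

Both lists being empty means every rank sits on its own physical core with its two threads on that core's two hardware threads.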
Check
export I_MPI_PIN=1
export I_MPI_PIN_DOMAIN=numa:3
export I_MPI_DEBUG=5
mpirun -np 12 lmp -in in.ST1.MSCDSS
[0] MPI startup(): ===== CPU pinning =====
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 15665 hpz8 {0-3}
[0] MPI startup(): 1 15666 hpz8 {7-8,12-13}
[0] MPI startup(): 2 15667 hpz8 {14,18-20}
[0] MPI startup(): 3 15668 hpz8 {4-6,9}
[0] MPI startup(): 4 15669 hpz8 {10-11,15-16}
[0] MPI startup(): 5 15670 hpz8 {17,21-23}
[0] MPI startup(): 6 15671 hpz8 {24-27}
[0] MPI startup(): 7 15672 hpz8 {31-33,37}
[0] MPI startup(): 8 15673 hpz8 {38-39,43-44}
[0] MPI startup(): 9 15674 hpz8 {28-30,34}
[0] MPI startup(): 10 15675 hpz8 {35-36,40-41}
[0] MPI startup(): 11 15676 hpz8 {42,45-47}
[0] MPI startup(): I_MPI_ROOT=/opt/intel/oneapi/mpi/2021.15
[0] MPI startup(): ONEAPI_ROOT=/opt/intel/oneapi
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_BIND_WIN_ALLOCATE=localalloc
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=ipl2
[0] MPI startup(): I_MPI_PIN_DOMAIN_SIZE=numa
[0] MPI startup(): I_MPI_PIN_DOMAIN_SIZE=numa
[0] MPI startup(): I_MPI_RETURN_WIN_MEM_NUMA=0
[0] MPI startup(): I_MPI_PIN=1
[0] MPI startup(): I_MPI_PIN_DOMAIN=numa:3
[0] MPI startup(): I_MPI_PIN_ORDER=3
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=5
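In this second run each rank gets a domain written with ranges, such as `{7-8,12-13}`. A small sketch for inspecting it is given here, under the assumption that the log keeps the layout shown above; `expand`, `parse_domains` and `overlapping_ranks` are names invented for this example. It expands each domain, counts its CPUs, and looks for CPUs claimed by two ranks.

```python
import re
from itertools import combinations

# Matches lines such as "[0] MPI startup(): 1 15666 hpz8 {7-8,12-13}".
PIN_LINE = re.compile(r"MPI startup\(\):\s+(\d+)\s+\d+\s+\S+\s+\{([^}]*)\}")


def expand(cpu_set):
    """Expand a pin set such as '7-8,12-13' into [7, 8, 12, 13]."""
    cpus = []
    for part in cpu_set.split(","):
        lo, _, hi = part.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1))
    return cpus


def parse_domains(log_text):
    """Map each rank to the expanded list of CPUs in its pinning domain."""
    domains = {}
    for line in log_text.splitlines():
        m = PIN_LINE.search(line)
        if m:
            domains[int(m.group(1))] = expand(m.group(2))
    return domains


def overlapping_ranks(domains):
    """Return pairs of ranks whose pinning domains share at least one CPU."""
    return [(a, b) for a, b in combinations(sorted(domains), 2)
            if set(domains[a]) & set(domains[b])]


# Three lines copied from the output above.
sample = """\
[0] MPI startup(): 0 15665 hpz8 {0-3}
[0] MPI startup(): 1 15666 hpz8 {7-8,12-13}
[0] MPI startup(): 2 15667 hpz8 {14,18-20}
"""

domains = parse_domains(sample)
for rank, cpus in sorted(domains.items()):
    print(rank, len(cpus), cpus)
print("overlaps:", overlapping_ranks(domains))
```

In the full output above, all 12 domains contain four CPU ids below 48, so each rank is given four distinct physical cores and none of the P#48 to P#95 siblings.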