https://www.intel.com/content/www/us/en/developer/articles/guide/lammps-tuning-guide.html
LAMMPS Tuning Guide on 3rd Generation Intel® Xeon® Scalable...
The LAMMPS tuning guide includes optimizations for Intel® AVX-512 on Intel® Xeon® Scalable Processors that can significantly speed up simulations.
1. lstopo
$ lstopo numa_simple.svg
The machine has 48 physical cores, but because Hyper-Threading is enabled it exposes 96 logical cores.
In the topology drawn by lstopo, physical core L#0 holds the two logical cores P#0 and P#48.
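The same sibling relationship can be cross-checked without the picture. The following is a minimal sketch, assuming a Linux system that exposes CPU topology under sysfs; the function name `thread_siblings` and its optional second argument (an alternative sysfs root, handy for trying it against a copied directory tree) are illustrative and not part of any tool.

```shell
# Print the logical CPUs that share a physical core with logical CPU $1.
# $2 is an optional sysfs root (illustrative parameter); it defaults to the
# standard Linux location.
thread_siblings() {
    cpu=$1
    root=${2:-/sys/devices/system/cpu}
    cat "$root/cpu$cpu/topology/thread_siblings_list"
}

# Example call for logical CPU 0:
# thread_siblings 0
```

If the kernel numbering matches the lstopo drawing, the list for logical CPU 0 should contain both 0 and 48.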
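So if LAMMPS runs 48 ranks, one per physical core, and each rank gets one additional OpenMP thread (two threads per rank, hence omp 2 in the command),
the rank's process and its OpenMP thread should land on the two logical cores of one physical core, for example P#0 and P#48 on L#0.
The LAMMPS command-line options to achieve this placement are: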
export I_MPI_DEBUG=5 # Optional: shows detailed binding/debug info
mpirun -np 48 lmp -sf intel -in in.ST1.MSCDSS -pk intel 0 omp 2
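Looking at the output, the pinning seems to have come out right: every rank is pinned to a pair {n, n+48}, that is, to both logical cores of a single physical core.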
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 75332 hpz8 {0,48}
[0] MPI startup(): 1 75333 hpz8 {1,49}
[0] MPI startup(): 2 75334 hpz8 {2,50}
[0] MPI startup(): 3 75335 hpz8 {3,51}
[0] MPI startup(): 4 75336 hpz8 {7,55}
[0] MPI startup(): 5 75337 hpz8 {8,56}
[0] MPI startup(): 6 75338 hpz8 {12,60}
[0] MPI startup(): 7 75339 hpz8 {13,61}
[0] MPI startup(): 8 75340 hpz8 {14,62}
[0] MPI startup(): 9 75341 hpz8 {18,66}
[0] MPI startup(): 10 75342 hpz8 {19,67}
[0] MPI startup(): 11 75343 hpz8 {20,68}
[0] MPI startup(): 12 75344 hpz8 {4,52}
[0] MPI startup(): 13 75345 hpz8 {5,53}
[0] MPI startup(): 14 75346 hpz8 {6,54}
[0] MPI startup(): 15 75347 hpz8 {9,57}
[0] MPI startup(): 16 75348 hpz8 {10,58}
[0] MPI startup(): 17 75349 hpz8 {11,59}
[0] MPI startup(): 18 75350 hpz8 {15,63}
[0] MPI startup(): 19 75351 hpz8 {16,64}
[0] MPI startup(): 20 75352 hpz8 {17,65}
[0] MPI startup(): 21 75353 hpz8 {21,69}
[0] MPI startup(): 22 75354 hpz8 {22,70}
[0] MPI startup(): 23 75355 hpz8 {23,71}
[0] MPI startup(): 24 75356 hpz8 {24,72}
[0] MPI startup(): 25 75357 hpz8 {25,73}
[0] MPI startup(): 26 75358 hpz8 {26,74}
[0] MPI startup(): 27 75359 hpz8 {27,75}
[0] MPI startup(): 28 75360 hpz8 {31,79}
[0] MPI startup(): 29 75361 hpz8 {32,80}
[0] MPI startup(): 30 75362 hpz8 {33,81}
[0] MPI startup(): 31 75363 hpz8 {37,85}
[0] MPI startup(): 32 75364 hpz8 {38,86}
[0] MPI startup(): 33 75365 hpz8 {39,87}
[0] MPI startup(): 34 75366 hpz8 {43,91}
[0] MPI startup(): 35 75367 hpz8 {44,92}
[0] MPI startup(): 36 75368 hpz8 {28,76}
[0] MPI startup(): 37 75369 hpz8 {29,77}
[0] MPI startup(): 38 75370 hpz8 {30,78}
[0] MPI startup(): 39 75371 hpz8 {34,82}
[0] MPI startup(): 40 75372 hpz8 {35,83}
[0] MPI startup(): 41 75373 hpz8 {36,84}
[0] MPI startup(): 42 75374 hpz8 {40,88}
[0] MPI startup(): 43 75375 hpz8 {41,89}
[0] MPI startup(): 44 75376 hpz8 {42,90}
[0] MPI startup(): 45 75377 hpz8 {45,93}
[0] MPI startup(): 46 75378 hpz8 {46,94}
[0] MPI startup(): 47 75379 hpz8 {47,95}
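Checking 48 rows by eye is error-prone, so the check can be scripted. This is a rough sketch, assuming the pin table has the same shape as the output above (each rank's row contains a `{a,b}` CPU set) and that sibling logical cores differ by a fixed offset (48 on this machine). The function name `check_pins` is made up for illustration.

```shell
# Read I_MPI_DEBUG output on stdin and verify that every rank is pinned to a
# pair {n, n+offset}, and that no physical core is used by two ranks.
# $1 is the offset between sibling logical cores (assumed 48 here).
check_pins() {
    offset=${1:-48}
    awk -v off="$offset" '
        # Only pin-table rows contain a {a,b} CPU set; the header and other
        # lines are skipped.
        match($0, /[{][0-9]+,[0-9]+[}]/) {
            split(substr($0, RSTART + 1, RLENGTH - 2), cpu, ",")
            rows++
            if (cpu[2] - cpu[1] != off) { print "not a sibling pair: " $0; bad++ }
            if (seen[cpu[1]]++)         { print "core used twice: " $0; bad++ }
        }
        END {
            printf "%d ranks checked, %d problems\n", rows, bad
            # Non-zero exit status if anything is wrong or nothing was parsed.
            exit (bad > 0 || rows == 0)
        }
    '
}

# Usage, with the command from this post:
# mpirun -np 48 lmp -sf intel -in in.ST1.MSCDSS -pk intel 0 omp 2 2>&1 | check_pins 48
```

The offset is passed as an argument because the sibling numbering depends on the machine; on a system where siblings are numbered differently, this simple difference test would not apply.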