

LAMMPS on Intel Xeon

https://www.intel.com/content/www/us/en/developer/articles/guide/lammps-tuning-guide.html

 

LAMMPS Tuning Guide on 3rd Generation Intel® Xeon® Scalable...

The LAMMPS tuning guide includes optimizations for Intel® AVX-512 on Intel® Xeon® Scalable Processors that can significantly speed up simulations.


1. lstopo

 

$ lstopo numa_simple.svg

 

This machine has 48 physical cores, but with Hyper-Threading enabled it exposes 96 logical cores.

In the topology above, physical core L#0 hosts the logical cores P#0 and P#48.
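The lstopo picture is not the only way to find these sibling pairs. On Linux the kernel also exposes them in sysfs as `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`. The function below is a minimal sketch that prints one line per physical core; the name `list_sibling_pairs` is hypothetical. On this machine one would expect lines such as `0,48`, assuming the kernel lists siblings in comma form. Adjacent siblings may instead appear as a range such as `0-1`.

```sh
# Hypothetical helper: print the hyperthread sibling set of every physical core,
# one line per core, from the Linux sysfs topology files (no hwloc required).
# $1: optional sysfs CPU directory (defaults to the real one).
list_sibling_pairs() {
  local sysfs_dir=${1:-/sys/devices/system/cpu}
  # Every logical CPU reports the same sibling line as its partner, so
  # sort -u keeps one copy per core, ordered by the first CPU number.
  # CPUs without a topology directory (e.g. offline) are skipped silently.
  cat "$sysfs_dir"/cpu[0-9]*/topology/thread_siblings_list 2>/dev/null \
    | sort -u -t, -k1,1n
}
```

Running `list_sibling_pairs | head -n 3` on the machine should show the first three pairs. This lets you cross-check the P# numbers in the lstopo figure without rendering an SVG.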

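Given this layout, suppose LAMMPS runs 48 MPI ranks, one per physical core, and each rank gets one extra OpenMP thread. Then on physical core L#0 the rank's process and its OpenMP thread should sit on P#0 and P#48 respectively, and the same applies to every other core.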

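To get this layout, use the LAMMPS command line below. `-sf intel` selects the INTEL package styles. `-pk intel 0 omp 2` requests zero coprocessors and two OpenMP threads per rank, matching the two hyperthreads of each core: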

 

export I_MPI_DEBUG=5  # Optional: shows detailed binding/debug info
mpirun -np 48 lmp -sf intel -in in.ST1.MSCDSS -pk intel 0 omp 2
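The numbers in this command follow directly from the topology: 96 logical CPUs / 48 ranks = 2 OpenMP threads per rank. The hypothetical helper `lammps_cmd` below is a minimal sketch of that arithmetic. It only prints the command line and does not launch anything.

```sh
# Hypothetical helper: derive the OpenMP thread count per rank from the core
# counts and print (not run) the matching LAMMPS command line.
# $1 = physical cores (= MPI ranks), $2 = logical CPUs, $3 = LAMMPS input file
lammps_cmd() {
  local physical=$1 logical=$2 input=$3
  # One rank per physical core: each rank gets all hyperthreads of its core,
  # so the logical CPU count must divide evenly by the core count.
  if [ $((logical % physical)) -ne 0 ]; then
    echo "logical CPUs ($logical) are not a multiple of cores ($physical)" >&2
    return 1
  fi
  local threads=$((logical / physical))
  echo "mpirun -np $physical lmp -sf intel -in $input -pk intel 0 omp $threads"
}
```

For example, `lammps_cmd 48 96 in.ST1.MSCDSS` prints exactly the command used above. On a live system the two counts could plausibly come from `lscpu -p=CORE | grep -v '^#' | sort -u | wc -l` and `nproc --all`. Treat that as an assumption to check against lstopo.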

 

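The `I_MPI_DEBUG` output shows that the binding looks correct. Every rank is pinned to a pair {n, n+48}, i.e. both hyperthreads of one physical core, and each of the 48 physical cores gets exactly one rank. Ranks are not assigned to cores in numerical order (rank 4 is on {7,55}, rank 12 on {4,52}), but every core is still covered once.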

[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       75332    hpz8       {0,48}
[0] MPI startup(): 1       75333    hpz8       {1,49}
[0] MPI startup(): 2       75334    hpz8       {2,50}
[0] MPI startup(): 3       75335    hpz8       {3,51}
[0] MPI startup(): 4       75336    hpz8       {7,55}
[0] MPI startup(): 5       75337    hpz8       {8,56}
[0] MPI startup(): 6       75338    hpz8       {12,60}
[0] MPI startup(): 7       75339    hpz8       {13,61}
[0] MPI startup(): 8       75340    hpz8       {14,62}
[0] MPI startup(): 9       75341    hpz8       {18,66}
[0] MPI startup(): 10      75342    hpz8       {19,67}
[0] MPI startup(): 11      75343    hpz8       {20,68}
[0] MPI startup(): 12      75344    hpz8       {4,52}
[0] MPI startup(): 13      75345    hpz8       {5,53}
[0] MPI startup(): 14      75346    hpz8       {6,54}
[0] MPI startup(): 15      75347    hpz8       {9,57}
[0] MPI startup(): 16      75348    hpz8       {10,58}
[0] MPI startup(): 17      75349    hpz8       {11,59}
[0] MPI startup(): 18      75350    hpz8       {15,63}
[0] MPI startup(): 19      75351    hpz8       {16,64}
[0] MPI startup(): 20      75352    hpz8       {17,65}
[0] MPI startup(): 21      75353    hpz8       {21,69}
[0] MPI startup(): 22      75354    hpz8       {22,70}
[0] MPI startup(): 23      75355    hpz8       {23,71}
[0] MPI startup(): 24      75356    hpz8       {24,72}
[0] MPI startup(): 25      75357    hpz8       {25,73}
[0] MPI startup(): 26      75358    hpz8       {26,74}
[0] MPI startup(): 27      75359    hpz8       {27,75}
[0] MPI startup(): 28      75360    hpz8       {31,79}
[0] MPI startup(): 29      75361    hpz8       {32,80}
[0] MPI startup(): 30      75362    hpz8       {33,81}
[0] MPI startup(): 31      75363    hpz8       {37,85}
[0] MPI startup(): 32      75364    hpz8       {38,86}
[0] MPI startup(): 33      75365    hpz8       {39,87}
[0] MPI startup(): 34      75366    hpz8       {43,91}
[0] MPI startup(): 35      75367    hpz8       {44,92}
[0] MPI startup(): 36      75368    hpz8       {28,76}
[0] MPI startup(): 37      75369    hpz8       {29,77}
[0] MPI startup(): 38      75370    hpz8       {30,78}
[0] MPI startup(): 39      75371    hpz8       {34,82}
[0] MPI startup(): 40      75372    hpz8       {35,83}
[0] MPI startup(): 41      75373    hpz8       {36,84}
[0] MPI startup(): 42      75374    hpz8       {40,88}
[0] MPI startup(): 43      75375    hpz8       {41,89}
[0] MPI startup(): 44      75376    hpz8       {42,90}
[0] MPI startup(): 45      75377    hpz8       {45,93}
[0] MPI startup(): 46      75378    hpz8       {46,94}
[0] MPI startup(): 47      75379    hpz8       {47,95}
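With 48 ranks this log is tedious to check by eye. Below is a sketch of a checker with a hypothetical function name, `check_pinning`. It assumes the `I_MPI_DEBUG=5` line format shown above, with the rank in the fourth field and the pin set last. It also assumes sibling pairs of the form {n, n+48}, as on this machine.

```sh
# Hypothetical checker for Intel MPI pinning lines read from stdin.
# $1 = number of physical cores; the HT sibling of core n is assumed to be n + $1.
check_pinning() {
  awk -v n="$1" '
    # Data lines look like: [0] MPI startup(): <rank> <pid> <node> {a,b}
    /startup[(][)]: [0-9]+[ \t]/ {
      ranks++
      pin = $NF
      gsub(/[{}]/, "", pin)
      k = split(pin, c, ",")
      if (k != 2 || c[2] + 0 != c[1] + n) {
        bad++; print "rank " $4 ": unexpected pin set {" pin "}"
      }
      if (c[1] in seen) { bad++; print "core " c[1] " has more than one rank" }
      seen[c[1]] = 1
    }
    END {
      if (ranks == 0) { print "no pinning lines found"; exit 1 }
      if (bad == 0) printf "OK: %d ranks, each on its own physical core\n", ranks
      exit (bad > 0)
    }'
}
```

A possible usage is to save the run's output with `mpirun ... > run.log 2>&1` and then call `check_pinning 48 < run.log`. The function's exit status is non-zero if any rank lands on an unexpected CPU pair or shares a core.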