Here’s an analysis of your hybrid MPI+OpenMP LAMMPS run using:
export I_MPI_PIN=1
export OMP_NUM_THREADS=4
mpirun -np 12 lmp -in in.ST1.MSCDSS -sf intel -pk intel 0 omp 4
1. What the Configuration Does
- -np 12: Launches 12 MPI processes
- OMP_NUM_THREADS=4: Each MPI process spawns 4 OpenMP threads
- Total logical concurrency demand: 12 MPI ranks × 4 OpenMP threads = 48 threads
- I_MPI_PIN=1: Tells Intel MPI to pin processes (and, by inheritance, their threads) to specific CPU cores via CPU affinity
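If you want Intel MPI itself to confirm how each rank was pinned, a minimal variant of the launch is sketched below (assuming a reasonably recent Intel MPI; I_MPI_DEBUG and I_MPI_PIN_DOMAIN are standard Intel MPI variables, but check your version's documentation):
```bash
# Ask Intel MPI to print a rank-to-core pinning table at startup.
# I_MPI_DEBUG>=4 prints process pinning info; I_MPI_PIN_DOMAIN=omp sizes
# each pinning domain to OMP_NUM_THREADS (both per Intel MPI docs).
export I_MPI_PIN=1
export I_MPI_PIN_DOMAIN=omp
export I_MPI_DEBUG=4
export OMP_NUM_THREADS=4
mpirun -np 12 lmp -in in.ST1.MSCDSS -sf intel -pk intel 0 omp 4
```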
2. Affinity Check: taskset -cp
Each PID (MPI rank) is bound to 4 logical cores, and the core sets do not overlap, which is the behavior you want:
- No thread contention between ranks
- Cache locality (and with it vectorization efficiency) is preserved
Here’s the summary mapping:
| MPI Rank | CPU Affinity (4 logical cores) |
|---|---|
| 0 | 0-3 |
| 1 | 7,8,12,13 |
| 2 | 14,18-20 |
| 3 | 4-6,9 |
| 4 | 10,11,15,16 |
| 5 | 17,21-23 |
| 6 | 24-27 |
| 7 | 31-33,37 |
| 8 | 38,39,43,44 |
| 9 | 28-30,34 |
| 10 | 35,36,40,41 |
| 11 | 42,45-47 |
🔍 Good signs:
- All MPI ranks are affinitized to different cores, matching OMP_NUM_THREADS=4
- There is no core sharing, avoiding resource contention
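A quick way to collect these mappings in one pass (a sketch, assuming the ranks run locally and their command lines match the pattern below):
```bash
# Print the affinity mask of every running lmp rank.
# The pattern 'lmp -in' is an assumption based on the launch line above.
for pid in $(pgrep -f 'lmp -in'); do
    taskset -cp "$pid"
done
```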
3. Thread Verification: ps -L -p
Each PID has 4 LWPs (lightweight processes = threads):
PID 211438 → 4 threads (main + 3 OpenMP)
PID 211439 → 4 threads
...
✅ Confirms that OMP_NUM_THREADS=4 is properly respected
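To also see which CPU each thread last ran on, ps can print the psr column per LWP (a sketch, reusing the PID from the output above):
```bash
# List rank 0's threads with the processor each one last ran on (psr)
# and its CPU usage; PID 211438 is taken from the earlier output.
ps -L -p 211438 -o pid,lwp,psr,pcpu,comm
```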
4. System Optimization Insight
You are using:
- 12 MPI processes
- 4 OpenMP threads per process
- Pinning is working well
- No oversubscription
- NUMA locality and core grouping are probably honored (would need lstopo or numactl -H to confirm precisely)
This is a well-formed hybrid MPI+OMP execution setup.
✅ Summary
- ✅ OpenMP threads per MPI rank = 4: confirmed
- ✅ Core binding per rank = 4 non-overlapping logical cores: confirmed
- ✅ 48 threads mapped onto 48 distinct logical cores (0-47): efficient and clean
- ✅ I_MPI_PIN=1 is effective
- 🔄 Further optimization (if needed) could involve:
- Ensuring NUMA locality with numactl or hwloc
- Using KMP_AFFINITY=granularity=fine,compact if Intel OpenMP is used (to help fine-tune thread affinity)
- Checking lscpu or hwloc-ls to verify CCD/domain boundaries
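As a sketch of the first two bullets (assuming the Intel OpenMP runtime for KMP_AFFINITY; the OMP_* variables are the portable OpenMP 4.0+ equivalents):
```bash
# Keep each rank's 4 threads on distinct cores inside its pinning domain
export KMP_AFFINITY=granularity=fine,compact,1,0
# Portable alternative to KMP_AFFINITY:
#   export OMP_PLACES=cores
#   export OMP_PROC_BIND=close

# Inspect NUMA layout before deciding how many ranks to place per socket
numactl -H
lscpu | grep -i numa
```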
Let me know if you’d like an analysis with lstopo or NUMA distance awareness added.
What Is a TID in This Context?
In Linux, TID refers to the Thread ID, and in most tools (like ps -L) it’s shown as the Lightweight Process (LWP) ID. In your output:
ps -L -p 211438
PID LWP TTY TIME CMD
211438 211438 pts/1 00:02:57 lmp
211438 211486 pts/1 00:00:00 lmp
211438 211488 pts/1 00:00:00 lmp
211438 211491 pts/1 00:00:00 lmp
- PID is the process ID (here, the process of one MPI rank)
- LWP (or TID) is the thread ID
- So MPI rank 0 (PID 211438) is running 4 threads (TIDs 211438, 211486, 211488, 211491)
This shows that each MPI process is spawning OpenMP threads as expected.
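If you want to see where ps -L gets these numbers, the same TIDs appear as directory names under /proc/<PID>/task (using the PID from the output above):
```bash
# Each subdirectory of /proc/<PID>/task is one thread (TID) of that process
ls /proc/211438/task
```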
🧠 What taskset -cp <PID> Does
taskset -cp <PID> reports (or sets) the CPU affinity of the process's main thread. In practice this usually reflects the whole process, because threads inherit the affinity of the thread that created them; to operate explicitly on every thread, taskset offers the -a/--all-tasks option.
So, when you run:
taskset -cp 211438
You get:
pid 211438's current affinity list: 0-3
This tells you that PID 211438, and by inheritance its threads (TIDs 211438, 211486, 211488, 211491), is restricted to cores 0–3.
However, note:
- Plain taskset -cp <PID> does not show per-thread affinity (the -a/--all-tasks option covers every thread of the process)
- Threads inherit the CPU affinity of the thread that created them, unless the OpenMP runtime or a manual override changes it afterwards
To check per-thread affinity, you’d need something like:
# loop over every thread (LWP) of PID 211438 and print its CPU affinity
for tid in $(ps -L -p 211438 | awk 'NR>1 {print $2}'); do
    taskset -cp "$tid"
done
💡 Is It Worth Running taskset -cp <PID>?
Yes — it is absolutely worth running if:
- You’re verifying CPU affinity of your MPI ranks
- You want to ensure OpenMP threads are not competing for the same CPU cores
- You’re checking that Intel MPI pinning via I_MPI_PIN=1 is effective
But it’s not sufficient for per-thread analysis (use hwloc or thread-level taskset for that).
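For a per-thread view without scripting, hwloc's hwloc-ps can list threads together with their bindings; a sketch, assuming hwloc is installed (the -t/--threads flag is documented for hwloc-ps, but verify on your version):
```bash
# List bound processes with their threads and bindings; keep only LAMMPS
hwloc-ps -t | grep lmp
```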
✅ Summary
| Term | Meaning |
|---|---|
| PID | MPI rank process ID |
| TID | Thread ID (LWP), shown under ps -L |
| taskset -cp <PID> | Shows/sets CPU affinity for the whole process (and by default, its threads) |
| Worth running? | ✅ Yes, for checking MPI rank affinity setup |
For deeper optimization (e.g., NUMA-aware mapping or thread-core pinning), tools like hwloc, numactl, or likwid-pin may offer better granularity.
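For illustration only (the NUMA node ID and core list below are hypothetical and not taken from the run above), such launches could look like:
```bash
# Bind a single test process and its memory to NUMA node 0 with numactl
numactl --cpunodebind=0 --membind=0 lmp -in in.ST1.MSCDSS

# Or pin one process and its 4 OpenMP threads to cores 0-3 with likwid-pin
likwid-pin -c 0-3 lmp -in in.ST1.MSCDSS
```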
Let me know if you want to script full rank/thread affinity checks.