Checking the InfiniBand connection
Checking device information
z641@z641:~$ ibv_devinfo
hca_id: mlx5_0
transport: InfiniBand (0)
fw_ver: 16.35.4030
node_guid: 0c42:a103:0017:3b2e
sys_image_guid: 0c42:a103:0017:3b28
vendor_id: 0x02c9
vendor_part_id: 4119
hw_ver: 0x0
board_id: MT_0000000023
phys_port_cnt: 1
port: 1
state: PORT_DOWN (1)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 0
port_lid: 65535
port_lmc: 0x00
link_layer: InfiniBand
hca_id: mlx5_1
transport: InfiniBand (0)
fw_ver: 16.35.4030
node_guid: 0c42:a103:0017:3b2f
sys_image_guid: 0c42:a103:0017:3b28
vendor_id: 0x02c9
vendor_part_id: 4119
hw_ver: 0x0
board_id: MT_0000000023
phys_port_cnt: 1
port: 1
state: PORT_DOWN (1)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 0
port_lid: 65535
port_lmc: 0x00
link_layer: InfiniBand
hca_id: mlx5_2
transport: InfiniBand (0)
fw_ver: 16.35.4030
node_guid: 0c42:a103:0017:3b28
sys_image_guid: 0c42:a103:0017:3b28
vendor_id: 0x02c9
vendor_part_id: 4119
hw_ver: 0x0
board_id: MT_0000000023
phys_port_cnt: 1
port: 1
state: PORT_INIT (2)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 0
port_lid: 65535
port_lmc: 0x00
link_layer: InfiniBand
hca_id: mlx5_3
transport: InfiniBand (0)
fw_ver: 16.35.4030
node_guid: 0c42:a103:0017:3b29
sys_image_guid: 0c42:a103:0017:3b28
vendor_id: 0x02c9
vendor_part_id: 4119
hw_ver: 0x0
board_id: MT_0000000023
phys_port_cnt: 1
port: 1
state: PORT_INIT (2)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 0
port_lid: 65535
port_lmc: 0x00
link_layer: InfiniBand
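With four HCAs the listing above is long, so a short script can reduce it to one state per device. The following is a minimal Python sketch, not part of the original procedure: the function name `port_states` and the abbreviated `sample` text are illustrative assumptions, and the parser relies only on the `hca_id:` and `state:` lines that appear in the output above.

```python
import re

def port_states(devinfo_text):
    """Map each hca_id to the list of port states found in ibv_devinfo output."""
    states = {}
    current = None
    for raw in devinfo_text.splitlines():
        line = raw.strip()  # real output is indented; strip so both forms parse
        m = re.match(r"hca_id:\s*(\S+)", line)
        if m:
            current = m.group(1)
            states[current] = []
            continue
        m = re.match(r"state:\s*(\S+)", line)
        if m and current is not None:
            states[current].append(m.group(1))
    return states

# Abbreviated sample taken from the listing above (assumption: trimmed for brevity).
sample = """\
hca_id: mlx5_0
port: 1
state: PORT_DOWN (1)
hca_id: mlx5_2
port: 1
state: PORT_INIT (2)
"""

for hca, found in port_states(sample).items():
    print(hca, found)
```

On a live host the same function could be fed the captured output of `ibv_devinfo`, for example via `subprocess.run(["ibv_devinfo"], capture_output=True, text=True).stdout`.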
InfiniBand status
z640@z640:~$ ibstat
CA 'mlx5_0'
CA type: MT4119
Number of ports: 1
Firmware version: 16.35.4030
Hardware version: 0
Node GUID: 0x0c42a10300173ae6
System image GUID: 0x0c42a10300173ae0
Port 1:
State: Down
Physical state: LinkUp
Rate: 100
Base lid: 65535
LMC: 0
SM lid: 0
Capability mask: 0xa641e848
Port GUID: 0x0c42a10300173ae6
Link layer: InfiniBand
CA 'mlx5_1'
CA type: MT4119
Number of ports: 1
Firmware version: 16.35.4030
Hardware version: 0
Node GUID: 0x0c42a10300173ae7
System image GUID: 0x0c42a10300173ae0
Port 1:
State: Down
Physical state: LinkUp
Rate: 100
Base lid: 65535
LMC: 0
SM lid: 0
Capability mask: 0xa641e848
Port GUID: 0x0c42a10300173ae7
Link layer: InfiniBand
CA 'mlx5_2'
CA type: MT4119
Number of ports: 1
Firmware version: 16.35.4030
Hardware version: 0
Node GUID: 0x0c42a10300173ae0
System image GUID: 0x0c42a10300173ae0
Port 1:
State: Initializing
Physical state: LinkUp
Rate: 100
Base lid: 65535
LMC: 0
SM lid: 0
Capability mask: 0xa651e848
Port GUID: 0x0c42a10300173ae0
Link layer: InfiniBand
CA 'mlx5_3'
CA type: MT4119
Number of ports: 1
Firmware version: 16.35.4030
Hardware version: 0
Node GUID: 0x0c42a10300173ae1
System image GUID: 0x0c42a10300173ae0
Port 1:
State: Initializing
Physical state: LinkUp
Rate: 100
Base lid: 65535
LMC: 0
SM lid: 0
Capability mask: 0xa651e848
Port GUID: 0x0c42a10300173ae1
Link layer: InfiniBand
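SM lid stands for Subnet Manager Local IDentifier.
A Base lid of 65535 is the default value shown when no LID has been assigned to the port.
Base lid is the LID of the port itself, and SM lid is the LID of the port on which the Subnet Manager is running.
In an environment where an SM is running normally, the output should look like this: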
Port 1:
State: Active
Base lid: 3
SM lid: 1
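The LID rule described above (Base lid 65535 or SM lid 0 means no Subnet Manager has configured the port) can be checked automatically. The following is a minimal Python sketch under stated assumptions: `parse_ibstat` and `needs_subnet_manager` are hypothetical helper names, the `sample` text is abbreviated, and the CA name `mlx5_ok` is made up to stand in for the healthy case shown above.

```python
import re

UNASSIGNED_LID = 65535  # default Base lid when no LID has been assigned

def parse_ibstat(text):
    """Collect State, Base lid and SM lid for every port in ibstat output."""
    ports = []
    ca = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"CA '([^']+)'", line)  # "CA type:" lines do not match
        if m:
            ca = m.group(1)
            continue
        m = re.match(r"Port (\d+):", line)   # "Port GUID:" lines do not match
        if m:
            ports.append({"ca": ca, "port": int(m.group(1))})
            continue
        m = re.match(r"(State|Base lid|SM lid):\s*(\S+)", line)
        if m and ports:
            ports[-1][m.group(1)] = m.group(2)
    return ports

def needs_subnet_manager(port):
    """True if the port has no LID of its own or knows of no SM."""
    base = int(port.get("Base lid", UNASSIGNED_LID))
    sm = int(port.get("SM lid", 0))
    return base == UNASSIGNED_LID or sm == 0

# Abbreviated sample; 'mlx5_ok' is a hypothetical CA name for the healthy case.
sample = """\
CA 'mlx5_2'
CA type: MT4119
Port 1:
State: Initializing
Physical state: LinkUp
Base lid: 65535
SM lid: 0
Port GUID: 0x0c42a10300173ae0
CA 'mlx5_ok'
Port 1:
State: Active
Base lid: 3
SM lid: 1
"""

for p in parse_ibstat(sample):
    if needs_subnet_manager(p):
        print(p["ca"], "port", p["port"], "is waiting for a Subnet Manager")
```

Matching from the start of each stripped line keeps `Physical state:` from being read as `State:`.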
Is OpenSM installed?
z640@z640:~$ which opensm
/usr/sbin/opensm
z640@z640:~$ systemctl status opensm
○ opensm.service - Starts the OpenSM InfiniBand fabric Subnet Managers
Loaded: loaded (/usr/lib/systemd/system/opensm.service; enabled; preset: enabled)
Active: inactive (dead)
Condition: start condition unmet at Sat 2025-07-05 16:59:32 UTC; 21h ago
└─ ConditionPathExists=/sys/class/infiniband_mad/abi_version was not met
Docs: man:opensm(8)
Jul 05 16:59:32 z640 systemd[1]: opensm.service - Starts the OpenSM InfiniBand fabric Subnet Managers was skipped because of an unmet condition check (ConditionPathExists=/sys/class/infiniband_mad/abi_version).
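Reportedly this happens because the Mellanox OFED kernel modules do not support the Linux kernel version currently in use. When installing Mellanox OFED, an installer option reportedly has to be set so that it supports the current Linux kernel.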
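The unmet condition in the systemd output can be verified directly before reinstalling anything. The following is a minimal Python sketch, assuming (to my knowledge, not stated in the output above) that the `infiniband_mad` class is provided by the `ib_umad` kernel module. The helper names `opensm_condition_met` and `module_loaded` are hypothetical.

```python
from pathlib import Path

# Path taken from the ConditionPathExists line in the systemctl output above.
UMAD_ABI_PATH = "/sys/class/infiniband_mad/abi_version"

def opensm_condition_met(path=UMAD_ABI_PATH):
    """Return True if the file that opensm.service requires exists."""
    return Path(path).exists()

def module_loaded(name, proc_modules_text):
    """Return True if `name` is the first field of any /proc/modules line."""
    return any(
        line.split()[0] == name
        for line in proc_modules_text.splitlines()
        if line.strip()
    )

if __name__ == "__main__":
    print("start condition met:", opensm_condition_met())
    proc_modules = Path("/proc/modules")
    if proc_modules.exists():
        # Assumption: ib_umad is the module that creates the infiniband_mad class.
        print("ib_umad loaded:", module_loaded("ib_umad", proc_modules.read_text()))
```

If the condition is not met and `ib_umad` is not loaded, that would be consistent with the explanation above that the installed modules do not match the running kernel.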