- Verifying the cards on Linux after installation
$ lspci |grep -i Mellanox
3e:00.0 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]
3e:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
ab:00.0 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]
ab:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
This shows one ConnectX-5 card on bus 3e and another ConnectX-5 card on bus ab.
The .0 and .1 suffixes are the two PCI functions of a single card, here one InfiniBand and one Ethernet.
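To make the bus/function reading above less error-prone on a host with many devices, here is a small sketch that groups the Mellanox entries by bus. The function name mlx_by_bus is made up for this post, and it assumes the default lspci address format without a domain prefix (that is, "3e:00.0" rather than "0000:3e:00.0").

```shell
# Summarise Mellanox PCI functions per bus from lspci output on stdin.
# Each output line has the form: <bus> <function-count> <comma-separated classes>
# Assumption: addresses look like "3e:00.0" (no PCI domain prefix).
mlx_by_bus() {
    grep -i 'Mellanox' | awk '
        {
            split($1, addr, ":")            # $1 looks like "3e:00.0"
            bus = addr[1]
            cls = $0
            sub(/^[^ ]+ /, "", cls)         # drop the PCI address
            sub(/ controller:.*/, "", cls)  # keep "Infiniband" or "Ethernet"
            if (!(bus in count)) order[++n] = bus
            count[bus]++
            classes[bus] = (classes[bus] == "" ? cls : classes[bus] "," cls)
        }
        END {
            for (i = 1; i <= n; i++)
                print order[i], count[order[i]], classes[order[i]]
        }
    '
}
```

Usage: lspci | mlx_by_bus. With the listing above this should print one line for bus 3e and one for bus ab, each with two functions.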
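- Driver installation
What needs to be installed on Linux: the NVIDIA OFED (OpenFabrics Enterprise Distribution) for Linux package, distributed as
MLNX_OFED_LINUX-<ver>-<OS label><CPU arch>.iso.
However, installation through apt is also supported, so let's use that instead.
- GPG key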
wget -qO - https://www.mellanox.com/downloads/ofed/RPM-GPG-KEY-Mellanox \
| sudo gpg --dearmor -o /usr/share/keyrings/mellanox-ofed.gpg
- Register the repository
echo "deb [signed-by=/usr/share/keyrings/mellanox-ofed.gpg] https://linux.mellanox.com/public/repo/mlnx_ofed/24.10-3.2.5.0/ubuntu24.04/x86_64/ ./" \
| sudo tee /etc/apt/sources.list.d/mellanox_mlnx_ofed.list
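The repository URL above hard-codes three things: the OFED version, the distro label, and the CPU architecture. A sketch of a helper that assembles the same line from those parts is below. The name mlnx_repo_line is hypothetical, and the assumption that other version/distro combinations follow the same directory layout should be checked against the repository before use.

```shell
# Print the apt source line for a given MLNX_OFED version, distro label and arch.
# The URL layout is copied from the line used in this post; other combinations
# are only assumed to follow the same pattern.
mlnx_repo_line() {
    ver=$1
    distro=$2
    arch=${3:-x86_64}   # default to the architecture used in this post
    printf 'deb [signed-by=/usr/share/keyrings/mellanox-ofed.gpg] https://linux.mellanox.com/public/repo/mlnx_ofed/%s/%s/%s/ ./\n' \
        "$ver" "$distro" "$arch"
}
```

Usage: mlnx_repo_line 24.10-3.2.5.0 ubuntu24.04 | sudo tee /etc/apt/sources.list.d/mellanox_mlnx_ofed.list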
- Install
sudo apt update
sudo apt install mlnx-ofed-all
- The driver only loads once Secure Boot is disabled in the UEFI settings.
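A quick way to confirm both halves of that statement is to look at the Secure Boot state and at whether the mlx5_core module (the kernel driver for ConnectX-5) is loaded. The two helpers below are a sketch: the names sb_state and mlx5_loaded are made up here, and they only interpret text piped into them, so mokutil (which may need to be installed separately) and lsmod are run by the caller.

```shell
# Interpret the text printed by "mokutil --sb-state" (read from stdin).
# Prints "enabled", "disabled" or "unknown".
sb_state() {
    awk '
        /SecureBoot enabled/  { s = "enabled" }
        /SecureBoot disabled/ { s = "disabled" }
        END { print (s == "" ? "unknown" : s) }
    '
}

# Exit status 0 when mlx5_core appears as a module name in lsmod output on stdin.
mlx5_loaded() {
    awk '$1 == "mlx5_core" { found = 1 } END { exit !found }'
}
```

Usage: mokutil --sb-state | sb_state, then lsmod | mlx5_loaded && echo "mlx5_core is loaded".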
- Checking the firmware version
$ sudo mst start
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI module - Success
Loading MST PCI configuration module - Success
Create devices
Unloading MST PCI module (unused) - Success
$ sudo mst status
MST modules:
------------
MST PCI module is not loaded
MST PCI configuration module loaded
MST devices:
------------
/dev/mst/mt4119_pciconf0 - PCI configuration cycles access.
domain:bus:dev.fn=0000:3e:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
Chip revision is: 00
/dev/mst/mt4119_pciconf1 - PCI configuration cycles access.
domain:bus:dev.fn=0000:ab:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
Chip revision is: 00
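Since the firmware tools refer to cards by their /dev/mst/... names while lspci uses PCI addresses, it can help to print the two side by side. This is a rough sketch that relies on the mst status layout shown above (a device line followed by a domain:bus:dev.fn= line); mst_pci_map is a hypothetical name.

```shell
# Pair each MST device with its PCI address, reading "mst status" output on stdin.
# Each output line has the form: <mst device> <domain:bus:dev.fn>
mst_pci_map() {
    awk '
        /^[ \t]*\/dev\/mst\// { dev = $1; next }   # remember the device path
        dev != "" && /domain:bus:dev\.fn=/ {
            split($1, kv, "=")                      # kv[2] is the PCI address
            print dev, kv[2]
            dev = ""
        }
    '
}
```

Usage: sudo mst status | mst_pci_map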
$ sudo mlxfwmanager --query
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: ConnectX5
Part Number: MCX556M-ECA_Ax_Bx
Description: ConnectX-5 VPI adapter card with Socket Direct supporting dual-socket server; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; 2x PCIe3.0 x8; ROHS R6
PSID: MT_0000000023
PCI Device Name: /dev/mst/mt4119_pciconf1
Base GUID: 0c42a10300173b28
Versions: Current Available
FW 16.35.4030 N/A
PXE 3.6.0902 N/A
UEFI 14.29.0015 N/A
Status: No matching image found
The other card has older firmware, and its link is not coming up.
$ sudo mlxfwmanager --query
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: ConnectX5
Part Number: MCX556M-ECA_Ax
Description: ConnectX-5 VPI adapter card with Socket Direct supporting dual-socket server; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; 2x PCIe3.0 x8; ROHS R6
PSID: MT_0000000023
PCI Device Name: /dev/mst/mt4119_pciconf1
Base GUID: ec0d9a0300cda646
Base MAC: ec0d9acda646
Versions: Current Available
FW 16.25.1020 N/A
PXE 3.5.0701 N/A
UEFI 14.18.0019 N/A
Status: No matching image found
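Comparing the two FW rows by eye works here (16.25.1020 versus 16.35.4030), but dotted versions are easy to misjudge. A minimal sketch of a version comparison follows; it assumes GNU sort with the -V (version sort) option, and fw_older_than is a name invented for this post.

```shell
# Exit status 0 when version $1 is strictly older than version $2.
# Assumes GNU coreutils sort, which supports -V (version sort).
fw_older_than() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}
```

Usage: fw_older_than 16.25.1020 16.35.4030 && echo "second card is behind"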
Firmware upgrades are done with mlxup, which can be downloaded from the address below.
https://network.nvidia.com/support/firmware/mlxup-mft/
$ wget https://www.mellanox.com/downloads/firmware/mlxup/4.30.0/SFX/linux_x64/mlxup
--2025-12-31 16:24:08-- https://www.mellanox.com/downloads/firmware/mlxup/4.30.0/SFX/linux_x64/mlxup
Resolving www.mellanox.com (www.mellanox.com)... 23.201.35.50, 23.201.35.48, 23.201.35.80
Connecting to www.mellanox.com (www.mellanox.com)|23.201.35.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 89302864 (85M)
Saving to: ‘mlxup’
mlxup 100%[=============================================================================>] 85.17M 10.7MB/s in 9.1s
2025-12-31 16:24:19 (9.38 MB/s) - ‘mlxup’ saved [89302864/89302864]
First, check whether a firmware update is available.
$ chmod +x mlxup
$ sudo ./mlxup --query
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: ConnectX5
Part Number: MCX556M-ECA_Ax
Description: ConnectX-5 VPI adapter card with Socket Direct supporting dual-socket server; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; 2x PCIe3.0 x8; ROHS R6
PSID: MT_0000000023
PCI Device Name: /dev/mst/mt4119_pciconf1
Base GUID: ec0d9a0300cda646
Base MAC: ec0d9acda646
Versions: Current Available
FW 16.25.1020 16.35.4030
PXE 3.5.0701 3.6.0902
UEFI 14.18.0019 14.29.0015
Status: Update required
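When several nodes need checking, the query output can be boiled down to one line per device. The following is a sketch that assumes the field layout shown above (PSID:, a FW row with Current and Available columns, and a Status: line); fw_summary is a hypothetical helper name.

```shell
# Condense "mlxup --query" or "mlxfwmanager --query" output read from stdin.
# Each output line has the form: <PSID> <current FW> <available FW> <status text>
fw_summary() {
    awk '
        $1 == "PSID:"   { psid = $2 }
        $1 == "FW"      { cur = $2; avail = $3 }
        $1 == "Status:" {
            status = $0
            sub(/^[ \t]*Status:[ \t]*/, "", status)  # keep only the status text
            print psid, cur, avail, status
        }
    '
}
```

Usage: sudo ./mlxup --query | fw_summary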
The status says an update is required, so let's apply it.
$ sudo ./mlxup -online
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: ConnectX5
Part Number: MCX556M-ECA_Ax
Description: ConnectX-5 VPI adapter card with Socket Direct supporting dual-socket server; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; 2x PCIe3.0 x8; ROHS R6
PSID: MT_0000000023
PCI Device Name: /dev/mst/mt4119_pciconf1
Base GUID: ec0d9a0300cda646
Base MAC: ec0d9acda646
Versions: Current Available
FW 16.25.1020 16.35.4030
PXE 3.5.0701 3.6.0902
UEFI 14.18.0019 14.29.0015
Status: Update required
Release notes for the available Firmware:
-----------------------------------------
For more details, please refer to the following FW release notes:
1- ConnectX3 (2.42.5000): http://www.mellanox.com/pdf/firmware/ConnectX3-FW-2_42_5000-release_notes.pdf
2- ConnectX3Pro (2.42.5000): http://www.mellanox.com/pdf/firmware/ConnectX3Pro-FW-2_42_5000-release_notes.pdf
3- Connect-IB (10.16.1200): http://www.mellanox.com/pdf/firmware/ConnectIB-FW-10_16_1200-release_notes.pdf
4- ConnectX4 (12.28.2006): http://docs.mellanox.com/display/ConnectX4Firmwarev12282006
5- ConnectX4Lx (14.32.1010): http://docs.mellanox.com/display/ConnectX4LxFirmwarev14321010
6- ConnectX5 (16.35.4030): http://docs.mellanox.com/display/ConnectX5Firmwarev16354030
7- ConnectX6 (20.43.1014): http://docs.mellanox.com/display/ConnectX6Firmwarev20431014
8- ConnectX6Dx (22.43.1014): http://docs.mellanox.com/display/ConnectX6DxFirmwarev22431014
9- ConnectX6Lx (26.43.1014): http://docs.mellanox.com/display/ConnectX6LxFirmwarev26431014
10- BlueField2 (24.43.1014): http://docs.mellanox.com/display/BlueField2Firmwarev24431014
11- ConnectX7 (28.43.1014): http://docs.mellanox.com/display/ConnectX7Firmwarev28431014
12- BlueField3 (32.43.1014): http://docs.mellanox.com/display/BlueField3Firmwarev32431014
---------
Found 1 device(s) requiring firmware update...
Perform FW update? [y/N]: y
Please wait while downloading MFA(s) 100%
Device #1: Updating FW ...
FSMST_INITIALIZE - OK
Writing Boot image component - OK
Done
Restart needed for updates to take effect.
Log File: /tmp/mlxup_workdir/mlxup-20251231_163505_238652.log
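Because the new firmware only takes effect after a restart, it is worth re-running the query after rebooting and confirming that the Current column now shows the expected version. A sketch of such a check is below; fw_all_at is a made-up name, and it simply inspects every FW row in the query output.

```shell
# Exit status 0 when every "FW" row on stdin reports the expected current version.
# Fails when any device differs or when no FW row is present at all.
fw_all_at() {
    awk -v want="$1" '
        $1 == "FW" { seen = 1; if ($2 != want) bad = 1 }
        END { exit !(seen && !bad) }
    '
}
```

Usage after the reboot: sudo mst start, then sudo mlxfwmanager --query | fw_all_at 16.35.4030 && echo "firmware is up to date".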