HPC

Installing ConnectX-5

  • Checking the card in Linux after installation
$ lspci |grep -i Mellanox
3e:00.0 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]
3e:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
ab:00.0 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]
ab:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]


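This output lists ConnectX-5 devices on two PCI buses, 3e and ab. I read this as one card on each bus, although a Socket Direct adapter (2x PCIe3.0 x8, like the part shown in the firmware query further down) can also occupy two buses on its own.

The .0 and .1 suffixes are the two PCI functions of a single card; here function .0 is running as InfiniBand and function .1 as Ethernet.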
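To make the bus/function reading above less error-prone, here is a minimal sketch that groups the Mellanox entries of `lspci` by bus. The function name `summarize_mlx_functions` is my own, and it assumes the default `lspci` address format (`bus:dev.fn` without a PCI domain prefix), as in the output above.

```shell
# Summarize Mellanox PCI functions per bus from `lspci` text on stdin.
# Assumes addresses look like "3e:00.0" (no "0000:" domain prefix).
summarize_mlx_functions() {
    grep -i 'mellanox' | awk '
        {
            split($1, addr, ":")                      # addr[1] = bus, addr[2] = dev.fn
            kind = ($2 == "Infiniband") ? "IB" : "Eth"
            modes[addr[1]] = modes[addr[1]] " " kind  # keep functions in input order
        }
        END { for (bus in modes) print "bus " bus ":" modes[bus] }
    ' | sort                                          # awk iteration order is unspecified
}

# Usage on a real machine:
#   lspci | summarize_mlx_functions
```

Each output line names one bus followed by the link type of each function on it, which makes it easy to see at a glance whether a port is in InfiniBand or Ethernet mode.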

 

  • Installing the driver

What needs to be installed on Linux: the NVIDIA OFED (OpenFabrics Enterprise Distribution) for Linux package, distributed as an ISO image named

MLNX_OFED_LINUX-<ver>-<OS label><CPU arch>.iso.

 

 

However, installation through apt is also supported, so let's go with that.

 

- GPG key

wget -qO - https://www.mellanox.com/downloads/ofed/RPM-GPG-KEY-Mellanox \
 | sudo gpg --dearmor -o /usr/share/keyrings/mellanox-ofed.gpg

 

- Register the repository

echo "deb [signed-by=/usr/share/keyrings/mellanox-ofed.gpg] https://linux.mellanox.com/public/repo/mlnx_ofed/24.10-3.2.5.0/ubuntu24.04/x86_64/ ./" \
 | sudo tee /etc/apt/sources.list.d/mellanox_mlnx_ofed.list
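The repository line above hard-codes the OFED version and the Ubuntu release. As a rough sketch, the same line can be built from parameters; `mlnx_ofed_repo_line` is a hypothetical helper of mine, and it only formats the string. It does not check that the given version/distro directory actually exists on the server.

```shell
# Print an apt source line for a given MLNX_OFED version and distro label.
# $1 = OFED version, $2 = distro directory name, $3 = CPU arch (default x86_64).
# The URL layout and keyring path are copied from the command above.
mlnx_ofed_repo_line() {
    printf 'deb [signed-by=%s] https://linux.mellanox.com/public/repo/mlnx_ofed/%s/%s/%s/ ./\n' \
        /usr/share/keyrings/mellanox-ofed.gpg "$1" "$2" "${3:-x86_64}"
}

# Usage:
#   mlnx_ofed_repo_line 24.10-3.2.5.0 ubuntu24.04 \
#     | sudo tee /etc/apt/sources.list.d/mellanox_mlnx_ofed.list
```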

 

- Install

 

sudo apt update
sudo apt install mlnx-ofed-all

 

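- The driver loads only when Secure Boot is disabled in the UEFI settings. With Secure Boot enabled, the kernel typically rejects modules that are not signed with an enrolled key.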
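If it is unclear whether Secure Boot is what keeps the driver from loading, the state can be checked from the running system. This is a small sketch that assumes the `mokutil` package is installed and that `mokutil --sb-state` prints either "SecureBoot enabled" or "SecureBoot disabled"; the function name `secure_boot_enabled` is my own.

```shell
# Succeeds (exit 0) if the `mokutil --sb-state` text on stdin reports
# Secure Boot as enabled, fails otherwise.
secure_boot_enabled() {
    grep -qi 'SecureBoot enabled'
}

# Usage on a real machine:
#   if mokutil --sb-state | secure_boot_enabled; then
#       echo "Secure Boot is on: the mlx5 modules may be refused"
#   fi
#   lsmod | grep mlx5_core    # shows whether the driver module is loaded
```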

 

  • Checking the firmware version
$ sudo mst start
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI module - Success
Loading MST PCI configuration module - Success
Create devices
Unloading MST PCI module (unused) - Success

$ sudo mst status
MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module loaded

MST devices:
------------
/dev/mst/mt4119_pciconf0         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:3e:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 00
/dev/mst/mt4119_pciconf1         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:ab:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 00

$ sudo mlxfwmanager --query
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX5
  Part Number:      MCX556M-ECA_Ax_Bx
  Description:      ConnectX-5 VPI adapter card with Socket Direct supporting dual-socket server; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; 2x PCIe3.0 x8; ROHS R6
  PSID:             MT_0000000023
  PCI Device Name:  /dev/mst/mt4119_pciconf1
  Base GUID:        0c42a10300173b28
  Versions:         Current        Available
     FW             16.35.4030     N/A
     PXE            3.6.0902       N/A
     UEFI           14.29.0015     N/A

  Status:           No matching image found

 

The other card runs an older firmware version, and its link is not coming up.

$ sudo mlxfwmanager --query
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX5
  Part Number:      MCX556M-ECA_Ax
  Description:      ConnectX-5 VPI adapter card with Socket Direct supporting dual-socket server; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; 2x PCIe3.0 x8; ROHS R6
  PSID:             MT_0000000023
  PCI Device Name:  /dev/mst/mt4119_pciconf1
  Base GUID:        ec0d9a0300cda646
  Base MAC:         ec0d9acda646
  Versions:         Current        Available
     FW             16.25.1020     N/A
     PXE            3.5.0701       N/A
     UEFI           14.18.0019     N/A

  Status:           No matching image found
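The two query outputs differ in the FW row (16.35.4030 versus 16.25.1020). A minimal sketch for pulling out that value and comparing two versions follows. Both function names are mine, and the comparison relies on `sort -V` (version sort), which I am assuming is available as in GNU coreutils.

```shell
# Print the "Current" FW version from mlxfwmanager/mlxup --query text on stdin.
fw_current_version() {
    awk '$1 == "FW" { print $2 }'
}

# Succeeds if version $1 is strictly older than version $2.
fw_is_older() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage on a real machine:
#   cur=$(sudo mlxfwmanager --query | fw_current_version)
#   fw_is_older "$cur" 16.35.4030 && echo "firmware $cur is older"
```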

 

Firmware is upgraded with mlxup, which can be downloaded from the address below.

https://network.nvidia.com/support/firmware/mlxup-mft/

 


$ wget https://www.mellanox.com/downloads/firmware/mlxup/4.30.0/SFX/linux_x64/mlxup
--2025-12-31 16:24:08--  https://www.mellanox.com/downloads/firmware/mlxup/4.30.0/SFX/linux_x64/mlxup
Resolving www.mellanox.com (www.mellanox.com)... 23.201.35.50, 23.201.35.48, 23.201.35.80
Connecting to www.mellanox.com (www.mellanox.com)|23.201.35.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 89302864 (85M)
Saving to: ‘mlxup’

mlxup                                   100%[=============================================================================>]  85.17M  10.7MB/s    in 9.1s

2025-12-31 16:24:19 (9.38 MB/s) - ‘mlxup’ saved [89302864/89302864]

 

First, check whether a firmware update is available.

$ chmod +x mlxup

$ sudo ./mlxup --query
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX5
  Part Number:      MCX556M-ECA_Ax
  Description:      ConnectX-5 VPI adapter card with Socket Direct supporting dual-socket server; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; 2x PCIe3.0 x8; ROHS R6
  PSID:             MT_0000000023
  PCI Device Name:  /dev/mst/mt4119_pciconf1
  Base GUID:        ec0d9a0300cda646
  Base MAC:         ec0d9acda646
  Versions:         Current        Available
     FW             16.25.1020     16.35.4030
     PXE            3.5.0701       3.6.0902
     UEFI           14.18.0019     14.29.0015

  Status:           Update required
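When several nodes need the same check, reading the Status line by eye gets tedious. Here is a small sketch that extracts it from the query text; `fw_update_status` and `needs_update` are names I made up, and the only status strings assumed are the two that appear in the outputs above.

```shell
# Print the value of each "Status:" line from mlxup --query text on stdin.
fw_update_status() {
    awk -F': *' '/^[[:space:]]*Status:/ {
        sub(/[[:space:]]+$/, "", $2)   # drop trailing whitespace, if any
        print $2
    }'
}

# Succeeds if at least one device reports "Update required".
needs_update() {
    fw_update_status | grep -qx 'Update required'
}

# Usage on a real machine:
#   sudo ./mlxup --query | needs_update && echo "firmware update available"
```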

 

The status says an update is required, so let's run it.

$ sudo ./mlxup -online
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX5
  Part Number:      MCX556M-ECA_Ax
  Description:      ConnectX-5 VPI adapter card with Socket Direct supporting dual-socket server; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; 2x PCIe3.0 x8; ROHS R6
  PSID:             MT_0000000023
  PCI Device Name:  /dev/mst/mt4119_pciconf1
  Base GUID:        ec0d9a0300cda646
  Base MAC:         ec0d9acda646
  Versions:         Current        Available
     FW             16.25.1020     16.35.4030
     PXE            3.5.0701       3.6.0902
     UEFI           14.18.0019     14.29.0015

  Status:           Update required

Release notes for the available Firmware:
-----------------------------------------

  For more details, please refer to the following FW release notes:
    1- ConnectX3 (2.42.5000):    http://www.mellanox.com/pdf/firmware/ConnectX3-FW-2_42_5000-release_notes.pdf
    2- ConnectX3Pro (2.42.5000): http://www.mellanox.com/pdf/firmware/ConnectX3Pro-FW-2_42_5000-release_notes.pdf
    3- Connect-IB (10.16.1200):  http://www.mellanox.com/pdf/firmware/ConnectIB-FW-10_16_1200-release_notes.pdf
    4- ConnectX4 (12.28.2006):   http://docs.mellanox.com/display/ConnectX4Firmwarev12282006
    5- ConnectX4Lx (14.32.1010): http://docs.mellanox.com/display/ConnectX4LxFirmwarev14321010
    6- ConnectX5 (16.35.4030):   http://docs.mellanox.com/display/ConnectX5Firmwarev16354030
    7- ConnectX6 (20.43.1014):   http://docs.mellanox.com/display/ConnectX6Firmwarev20431014
    8- ConnectX6Dx (22.43.1014):   http://docs.mellanox.com/display/ConnectX6DxFirmwarev22431014
    9- ConnectX6Lx (26.43.1014):   http://docs.mellanox.com/display/ConnectX6LxFirmwarev26431014
    10- BlueField2 (24.43.1014):   http://docs.mellanox.com/display/BlueField2Firmwarev24431014
    11- ConnectX7 (28.43.1014):   http://docs.mellanox.com/display/ConnectX7Firmwarev28431014
    12- BlueField3 (32.43.1014):   http://docs.mellanox.com/display/BlueField3Firmwarev32431014

---------
Found 1 device(s) requiring firmware update...

Perform FW update? [y/N]: y

Please wait while downloading MFA(s) 100%
Device #1: Updating FW ...
FSMST_INITIALIZE -   OK
Writing Boot image component -   OK
Done

Restart needed for updates to take effect.
Log File: /tmp/mlxup_workdir/mlxup-20251231_163505_238652.log
