
Installing OpenMPI with CUDA Support

Downloading OpenMPI

https://www.open-mpi.org/software/ompi/v5.0/

 


wget https://download.open-mpi.org/release/open-mpi/v5.0/openmpi-5.0.6.tar.gz
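After the download, the tarball has to be unpacked and a build directory created, since the later steps run from openmpi-5.0.6/build. A minimal sketch of that step, assuming the tarball unpacks into a top-level directory named after the file; prepare_build_dir is a hypothetical helper name, not part of OpenMPI:

```shell
# Hypothetical helper: unpack an Open MPI .tar.gz and create <source>/build.
# Assumes the tarball's top-level directory matches its file name.
prepare_build_dir() {
    tarball="$1"
    dest="${2:-.}"
    tar -xzf "$tarball" -C "$dest" || return 1
    src="$dest/$(basename "$tarball" .tar.gz)"
    mkdir -p "$src/build" || return 1
    # Print the build directory so the caller can cd into it.
    printf '%s\n' "$src/build"
}

# Usage (after the wget above):
#   cd "$(prepare_build_dir openmpi-5.0.6.tar.gz)"
```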

 

For CUDA support, see section 11.2.6.1 of the Open MPI documentation:

https://docs.open-mpi.org/en/v5.0.x/tuning-apps/networking/cuda.html

 

Since this is not a cluster setup, follow option 2, "Via internal Open MPI CUDA support".

If you run make from a separate build directory, call the configure script in the parent directory:

../configure --with-cuda=<path-to-cuda> --with-cuda-libdir=<path-to-cuda library>

 

<path-to-cuda> is the CUDA installation location.

/usr/local$ ls -l
total 36
drwxr-xr-x  2 root root 4096 12월 30 23:37 bin
lrwxrwxrwx  1 root root   21 12월 31 02:04 cuda -> /usr/local/cuda-12.6/
drwxr-xr-x 17 root root 4096 12월 31 02:05 cuda-12.6
drwxr-xr-x  2 root root 4096 12월 30 22:51 etc
drwxr-xr-x  2 root root 4096  8월  8  2023 games
drwxr-xr-x  7 root root 4096 12월 30 22:51 include
drwxr-xr-x 10 root root 4096 12월 30 22:51 lib
lrwxrwxrwx  1 root root    9  8월 22  2023 man -> share/man
drwxr-xr-x  2 root root 4096 12월 30 22:51 sbin
drwxr-xr-x 14 root root 4096 12월 26 23:51 share
drwxr-xr-x  2 root root 4096  8월  8  2023 src

 

CUDA is usually located at /usr/local/cuda, the default location on Linux. Here, cuda is a symbolic link that points to /usr/local/cuda-12.6.
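To check where the cuda symbolic link actually points without reading the ls output by eye, readlink can resolve it. A small sketch; resolve_cuda_home is a hypothetical helper name, and the default path is the one from the listing above:

```shell
# Hypothetical helper: print the real directory behind the cuda symlink.
resolve_cuda_home() {
    link="${1:-/usr/local/cuda}"
    [ -e "$link" ] || { echo "not found: $link" >&2; return 1; }
    # -f follows every symlink in the path and prints the canonical location.
    readlink -f "$link"
}

# Usage:
#   resolve_cuda_home            # resolves /usr/local/cuda
#   resolve_cuda_home /opt/cuda  # any other link or directory
```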

 

<path-to-cuda library> is the path of the directory that contains libcuda.so. To find libcuda.so, search the subdirectories of the current directory:

/usr/local$ find . -name "libcuda.so"
./cuda-12.6/targets/x86_64-linux/lib/stubs/libcuda.so
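The value needed for --with-cuda-libdir is the directory, not the file itself, so the find result still has to be trimmed. A sketch that does both in one step; find_libcuda_dir is a hypothetical helper name, and it simply takes the first match if several exist:

```shell
# Hypothetical helper: print the directory of the first libcuda.so under a root.
find_libcuda_dir() {
    root="${1:-/usr/local}"
    # -H follows the root itself if it is a symlink (e.g. /usr/local/cuda).
    lib="$(find -H "$root" -name 'libcuda.so' 2>/dev/null | head -n 1)"
    [ -n "$lib" ] || return 1
    dirname "$lib"
}

# Usage:
#   find_libcuda_dir /usr/local
```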

 

With both paths confirmed, the configure command becomes:

../configure --with-cuda=/usr/local/cuda --with-cuda-libdir=/usr/local/cuda-12.6/targets/x86_64-linux/lib/stubs/
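A typo in either path is easy to miss until configure has been running for a while, so it can help to check the paths before building the command. A sketch under that assumption; print_configure_cmd is a hypothetical helper name, and it only prints the command rather than running it:

```shell
# Hypothetical helper: validate both paths, then print the configure command.
print_configure_cmd() {
    cuda_home="$1"
    cuda_libdir="$2"
    [ -d "$cuda_home" ] || { echo "CUDA directory not found: $cuda_home" >&2; return 1; }
    [ -e "$cuda_libdir/libcuda.so" ] || { echo "libcuda.so not found in: $cuda_libdir" >&2; return 1; }
    printf '../configure --with-cuda=%s --with-cuda-libdir=%s\n' "$cuda_home" "$cuda_libdir"
}

# Usage:
#   print_configure_cmd /usr/local/cuda /usr/local/cuda-12.6/targets/x86_64-linux/lib/stubs
```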

 

Before running configure, go to the build directory, clean up any previous build and installation, and then empty the build directory:

/openmpi-5.0.6/build$ sudo make clean
/openmpi-5.0.6/build$ sudo make uninstall
/openmpi-5.0.6/build$ rm -rf *
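Running rm -rf * in the wrong directory is hard to undo, so a guarded version may be worth having. A sketch that only empties a directory named build sitting next to a configure script; clean_build_dir is a hypothetical helper name, and find with -mindepth and -delete assumes GNU or BSD find:

```shell
# Hypothetical helper: empty <source>/build, refusing anything that looks different.
clean_build_dir() {
    dir="$1"
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    [ "$(basename "$dir")" = "build" ] || { echo "refusing: $dir is not named build" >&2; return 1; }
    [ -f "$dir/../configure" ] || { echo "refusing: no configure script next to $dir" >&2; return 1; }
    # Removes everything inside dir (hidden files too) but keeps dir itself.
    # ${dir:?} aborts if dir is somehow empty.
    find "${dir:?}" -mindepth 1 -delete
}

# Usage:
#   clean_build_dir ~/openmpi-5.0.6/build
```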

 

Run configure:

/openmpi-5.0.6/build$ ../configure --with-cuda=/usr/local/cuda --with-cuda-libdir=/usr/local/cuda-12.6/targets/x86_64-linux/lib/stubs/

 

OpenMPI builds with make, not cmake, so run make:

make -j 4
sudo make install
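The job count of 4 is a fixed choice; on a machine with more cores the build can usually go faster. A sketch that picks the count from the machine, assuming nproc (GNU coreutils) is available and falling back to 4 otherwise; parallel_jobs is a hypothetical helper name:

```shell
# Hypothetical helper: number of parallel make jobs for this machine.
parallel_jobs() {
    # nproc prints the number of available CPUs; fall back to 4 if it is missing.
    nproc 2>/dev/null || echo 4
}

# Usage:
#   make -j "$(parallel_jobs)"
#   sudo make install
```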

 

Additional note: use tee to log the configure and make output

shell$ tar xf openmpi-<version>.tar.bz2
shell$ cd openmpi-<version>
shell$ ./configure --prefix=<path> [...options...] 2>&1 | tee config.out
<... lots of output ...>

# Use an integer value of N for parallel builds
shell$ make [-j N] all 2>&1 | tee make.out

# ...lots of output...

# Depending on the <prefix> chosen above, you may need root access
# for the following:
shell$ make install 2>&1 | tee install.out
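One thing to watch with the tee pattern: the shell reports the exit status of the last command in a pipeline, which is tee, so a failed configure or make can look like a success in a script. A sketch of a wrapper that keeps the log and still returns the real status; run_logged is a hypothetical helper name, written to work in plain POSIX sh:

```shell
# Hypothetical helper: run a command, tee its output to a log, return its status.
run_logged() {
    log="$1"
    shift
    status_file="$(mktemp)" || return 1
    # Record the command's own exit status before tee can mask it.
    { rc=0; "$@" 2>&1 || rc=$?; echo "$rc" > "$status_file"; } | tee "$log"
    status="$(cat "$status_file")"
    rm -f "$status_file"
    return "$status"
}

# Usage:
#   run_logged config.out ../configure --with-cuda=/usr/local/cuda && \
#   run_logged make.out make -j 4 all
```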

 

How to verify that the installation succeeded

# Use ompi_info to verify cuda support in Open MPI
shell$ ompi_info | grep "MPI extensions"
       MPI extensions: affinity, cuda, pcollreq
shell$ ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
       mca:mpi:base:param:mpi_built_with_cuda_support:value:true
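For use in a script, the same check can be turned into a yes/no test on the parsable output. A sketch that reads ompi_info output from standard input; has_cuda_support is a hypothetical helper name, and it only reflects how the library was built, not whether a GPU is usable at run time:

```shell
# Hypothetical helper: succeed if the ompi_info output on stdin reports CUDA support.
has_cuda_support() {
    grep -q 'mpi_built_with_cuda_support:value:true'
}

# Usage:
#   if ompi_info --parsable --all | has_cuda_support; then
#       echo "Open MPI was built with CUDA support"
#   fi
```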

 

Fixing a few errors

:~$ ompi_info | grep "MPI extensions"
ompi_info: error while loading shared libraries: libmpi.so.40: cannot open shared object file: No such file or directory
:~$ ompi_info
ompi_info: error while loading shared libraries: libmpi.so.40: cannot open shared object file: No such file or directory
:~$ mc

:/usr/local$ find . -name "libmpi.so.40" 2>/dev/null
./lib/libmpi.so.40
:/usr/local$ ls
bin  cuda  cuda-12.6  etc  games  include  lib  man  sbin  share  src
:/usr/local$ echo 'export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH' >> ~/.bashrc
:/usr/local$ source ~/.bashrc
:/usr/local$ ompi_info | grep "MPI extensions"
          MPI extensions: affinity, cuda, ftmpi, rocm, shortfloat
:/usr/local$ ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
mca:mpi:base:param:mpi_built_with_cuda_support:value:true
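Appending to ~/.bashrc with >> adds a duplicate line every time the command is repeated. A sketch that adds the export only if it is not already there; append_once is a hypothetical helper name:

```shell
# Hypothetical helper: append a line to a file unless that exact line already exists.
append_once() {
    line="$1"
    file="$2"
    # -x matches whole lines only, -F treats the line as a fixed string, not a regex.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage:
#   append_once 'export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH' ~/.bashrc
#   source ~/.bashrc
```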
