Clustering

Below is an outline of how to set up a “mini cluster” of two AMD Threadripper PRO workstations (a 5995WX and a 7995WX) to run LAMMPS under MPI across both machines. Even though you only have two nodes, it’s valuable to follow the same best practices used for larger HPC clusters to ensure stable performance.


1. Network Connectivity

  1. High-bandwidth, low-latency interconnect
    • For CPU-bound parallel applications (like LAMMPS), interconnect speed and latency matter.
    • Ideally, use InfiniBand (e.g., 100 Gb/s or more) for best performance. If InfiniBand is not feasible, use 10 GbE (or faster) Ethernet.
    • Ensure both machines are on the same fast switch or connected directly (for a two-node cluster, a direct link can work).
  2. Network configuration
    • Assign static IPs or ensure reliable DHCP so that hostnames/addresses are consistent.
    • Enable jumbo frames if supported (and beneficial for your specific interconnect).
    • Test with ping, ibping (if using InfiniBand), or iperf3 to confirm throughput and latency.
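
For example, a quick throughput and latency check between the two nodes could look like this (the hostnames are placeholders for your actual nodes):

  # On node5995, start an iperf3 server:
  iperf3 -s
  # On node7995, measure bandwidth toward node5995 over the fast interface:
  iperf3 -c node5995 -t 10
  # Round-trip latency:
  ping -c 10 node5995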

2. Software Environment

  1. Matching MPI stacks
    • Install the same version of MPI (OpenMPI, MPICH, or MVAPICH2) on both machines.
    • Consistency is key: same MPI version, same build flags, etc. (a quick cross-node check is sketched after this list).
  2. Matching LAMMPS build
    • LAMMPS must be built with MPI support on both workstations.
    • Use the same LAMMPS version (Spack can make this easier):
      # Example with Spack + AMD AOCC toolchain
      spack install lammps %aocc +mpi ...
      
    • Alternatively, build LAMMPS from source in exactly the same manner on each node to avoid version mismatches.
  3. Shared libraries / environment modules
    • If using environment modules (e.g., lmod), ensure the same modules are loaded.
    • If you use Spack, ensure each node has the same Spack environment or that you have a shared Spack installation on a network filesystem.
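
Once passwordless SSH is set up (next section), one way to confirm that both nodes really expose the same stack is to compare version strings over SSH. This is only a sketch; the hostnames and the lmp executable name are placeholders:

  # Print the MPI and LAMMPS version strings reported by each node
  for h in node5995 node7995; do
      echo "== $h =="
      ssh "$h" 'mpirun --version | head -n 1; lmp -h | head -n 3'
  done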

3. Passwordless SSH and Host Configuration

  1. Passwordless SSH
    • To allow MPI to spawn processes on both nodes seamlessly:
      ssh-keygen -t ed25519
      ssh-copy-id -i ~/.ssh/id_ed25519.pub your_user@other_node
      
    • Repeat in both directions if needed.
  2. Host file (if you do not use a scheduler)
    • Create a simple hosts.txt file listing hostnames (or IPs) and slots (CPU counts). For example:
      # hosts.txt
      node5995 slots=64
      node7995 slots=96
      
    • Adjust slots= to however many MPI processes you intend to run per node (often set to the number of CPU cores for CPU-only runs).
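
With the keys and hosts.txt in place, a one-line test confirms that MPI can actually launch processes on both machines (assuming OpenMPI; --map-by node spreads the ranks across the two nodes):

  # Expect one hostname per node in the output
  mpirun -np 2 -hostfile hosts.txt --map-by node hostname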

4. Optional: Using a Job Scheduler

For just two nodes, you can simply rely on mpirun/mpiexec with a host file. However, if you’d like job-management features:

  1. Slurm
    • Slurm is common in HPC clusters. It can manage resources, monitor usage, and schedule jobs.
    • Installing Slurm on two nodes is not overly difficult, but might be more overhead than you need.
    • If you do, set one node as the controller (running slurmctld) and both as compute nodes (running slurmd); a minimal slurm.conf sketch follows this list.
  2. OpenPBS / Torque / other
    • Similar HPC schedulers exist, but Slurm tends to be the most widely used.
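
For reference, the node and partition portion of a two-node slurm.conf might look roughly like the sketch below. The hostnames and core counts are illustrative; take the exact hardware description from the output of slurmd -C on each node.

  # Excerpt from /etc/slurm/slurm.conf (illustrative, not a complete config)
  SlurmctldHost=node5995
  NodeName=node5995 CPUs=64 State=UNKNOWN
  NodeName=node7995 CPUs=96 State=UNKNOWN
  PartitionName=lammps Nodes=node5995,node7995 Default=YES MaxTime=INFINITE State=UP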

5. Running LAMMPS

  1. Simple MPI run
    • If you go the “no scheduler” route, you can directly invoke something like:
      mpirun -np 160 -hostfile hosts.txt lmp_executable -in in.lammps
      
    • Here -np 160 represents the total number of MPI ranks (64 + 96) across both nodes.
    • If you’re using OpenMPI and have it installed at the same path on both nodes, this should work as-is.
    • Ensure the in.lammps input file is accessible on both nodes (e.g., via a shared filesystem like NFS, or simply copied over).
  2. Performance considerations
    • Pin MPI ranks to cores: many MPI versions offer binding options:
      mpirun --bind-to core --map-by socket ...
      
    • For large systems, you may also consider multi-threading (OpenMP) combined with MPI ranks (hybrid MPI+OpenMP builds in LAMMPS); a hybrid launch sketch follows this list.
  3. Check environment
    • If using environment modules or Spack, load the same environment on both nodes before running.
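
The hybrid launch mentioned above might look roughly like this with OpenMPI, assuming LAMMPS was built with the OPENMP package; the rank and thread counts are illustrative (4 cores per rank: 16 ranks on the 64-core node, 24 on the 96-core node):

  # 40 MPI ranks x 4 OpenMP threads = 160 cores total
  mpirun -np 40 -host node5995:16,node7995:24 \
         --map-by slot:PE=4 --bind-to core -x OMP_NUM_THREADS=4 \
         lmp_executable -sf omp -pk omp 4 -in in.lammps

Benchmark this against the pure-MPI run; for many CPU-only LAMMPS problems, one MPI rank per core is still the faster choice.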

6. Storage / File Sharing

  1. Shared filesystem (optional)
    • You can set up NFS or another network file system if you want to keep input and output files in sync.
    • For two-node setups, an NFS server on one node and an NFS client on the other is usually enough (see the sketch after this list).
  2. Local SSD
    • If you do not need a shared file system, you can copy your input files to both nodes and run from each local drive to avoid network overhead. Then gather result files afterward.
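
If you go the NFS route, a minimal two-node setup could be sketched as follows (paths and hostnames are placeholders, and the NFS server/client packages must already be installed):

  # On node5995 (server): export a shared work directory to node7995
  echo '/home/your_user/lammps_runs node7995(rw,sync,no_subtree_check)' | sudo tee -a /etc/exports
  sudo exportfs -ra
  # On node7995 (client): mount it at the same path so run scripts work unchanged on both nodes
  sudo mkdir -p /home/your_user/lammps_runs
  sudo mount -t nfs node5995:/home/your_user/lammps_runs /home/your_user/lammps_runs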

7. Performance Tips for AMD Threadripper PRO

  1. BIOS settings
    • For HPC workloads, ensure NUMA is configured appropriately. On many Threadripper PRO boards this is exposed as a “NUMA nodes per socket” (NPS) option; keeping the system in NUMA mode rather than a uniform (“UMA”) memory mode is usually best for LAMMPS, but test which memory mode performs better on your system.
    • Make sure you have the latest BIOS/firmware for each board.
  2. Compiler optimizations
    • If you’re using the AMD AOCC compiler, leverage appropriate flags (e.g., -Ofast, vectorization flags, etc.).
    • If you built via Spack, check that relevant CPU optimization flags are enabled (e.g., -march=znver3 for the Zen 3-based 5995WX, -march=znver4 for the Zen 4-based 7995WX). If you share one binary across both nodes, target the older architecture (znver3) so it runs on both.
  3. Affinity
    • Manage CPU affinity carefully for HPC. Tools like numactl can help (see the sketch after this list).
    • LAMMPS has built-in ways to manage thread affinity if you’re using hybrid MPI+OpenMP.
  4. Monitor usage
    • Tools like amd-uprof or Linux perf can help you identify bottlenecks or suboptimal CPU usage.
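
Before tuning affinity, it helps to confirm the NUMA layout and to see exactly where MPI placed each rank. A quick check, assuming OpenMPI:

  # Show NUMA nodes, their cores, and memory sizes
  numactl --hardware
  lscpu | grep -i numa
  # Have OpenMPI print the rank-to-core binding map at launch
  mpirun --report-bindings -np 8 -hostfile hosts.txt lmp_executable -in in.lammps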

Summary

  • Network: Use the fastest, lowest-latency interconnect you can: InfiniBand if possible, otherwise 10/25/100 GbE.
  • Identical Software Stacks: Same versions of MPI and LAMMPS, built in the same way on both nodes.
  • Passwordless SSH: So mpirun/mpiexec can launch processes on both machines.
  • Host File or Scheduler: Decide whether to use a simple host file or a resource manager like Slurm.
  • Run: Use mpirun -np <cores> -hostfile ... lmp_executable -in in.lammps.
  • Optimization: Tweak BIOS (NUMA, memory modes) and compiler flags, and pin processes for best performance.

With only two nodes, you do not need a complex cluster management setup—just a good MPI installation, passwordless SSH, a host file, and a high-speed interconnect. This setup can be surprisingly effective for medium-scale HPC workloads like LAMMPS.
