Build a Slurm Desktop Cluster

Desktop Abaqus Slurm Cluster

This is a guide to create a Slurm Cluster using NUC computers to allow for development and testing of Abaqus simulations with a desktop form factor.

Hardware Overview

The hardware we are using is:

  • Head Node - GMKtec Mini PC, Intel N100 (3.4GHz), 16GB RAM, 1TB M.2 SSD. View on Amazon
  • Compute Nodes - 3x Geekom XT12 Pro Mini PCs, 12th Gen Intel i9-12900H (14 cores), 32GB RAM. View on Amazon
  • D-Link 5-Port 2.5G Unmanaged Gaming Switch. View on Amazon
  • GeeekPi 8U Server Rack. View on Amazon
Here's the build:

(photo of the assembled cluster)

Software Overview

The configuration is:


  • Operating system - Ubuntu 24.04
  • Slurm - Version
  • Abaqus - Version 2023HF3
  • OpenMPI -
  • Fortran -

Install Steps

Install Ubuntu

The link below has the steps to create a bootable USB drive (from Windows, Linux, or macOS) to install Ubuntu 24.04. We are using the desktop version.

Create a Ubuntu bootable USB stick

Configure Networking

On the head node and all compute nodes, execute the following commands:

Install openssh:

sudo apt install openssh-server

Verify it's running:

sudo systemctl status ssh

If it's not running, start it:

sudo systemctl start ssh

Set it to autostart:

sudo systemctl enable ssh

This is optional, but I prefer to install xrdp on the nodes for a graphical GNOME desktop.
The steps are the same as for ssh:

sudo apt install xrdp
sudo systemctl status xrdp
sudo systemctl start xrdp
sudo systemctl enable xrdp

Configure the ethernet interface for a private network:

I chose 192.168.110.0/24 as my network. These are the details to plug into Ubuntu's manual IPv4 settings for the interface:
Address: 192.168.110.1 through 192.168.110.4 (one per node)
Netmask: 255.255.255.0
Gateway: leave blank (this is an isolated network with no router; 192.168.110.255 is the broadcast address, not a gateway)
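
If you prefer the command line to the GNOME network settings dialog, a netplan file does the same thing. This is a minimal sketch: the file name and the interface name enp1s0 are assumptions (check yours with ip link), and the address changes per node.

# /etc/netplan/99-cluster.yaml (hypothetical file name)
network:
  version: 2
  renderer: NetworkManager   # desktop installs use NetworkManager
  ethernets:
    enp1s0:                  # replace with your interface name from `ip link`
      dhcp4: no
      addresses:
        - 192.168.110.2/24   # .1 on the head node, .2 to .4 on the compute nodes

Apply it with sudo netplan apply.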

On each node, add entries like these to the /etc/hosts file:

192.168.110.1 headnode
192.168.110.2 node1
192.168.110.3 node2
192.168.110.4 node3

At this point you should be able to ssh to and from each node.
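
Slurm itself only needs Munge for authentication, but passwordless ssh between nodes makes administration and copying files around much easier. A minimal sketch, assuming the same user account exists on every node:

ssh-keygen -t ed25519     # accept the defaults
ssh-copy-id node1         # repeat for node2, node3, and headnode as needed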

Configure Munge

Steps:

Install Munge packages:

sudo apt install munge libmunge2 libmunge-dev

Verify the Munge installation; this should return "Success":

munge -n | unmunge | grep STATUS

If verification wasn't successful, check that the munge key was created:

ls -la /etc/munge/munge.key

If the key doesn't exist create it:

sudo /usr/sbin/mungekey

Set permissions for the munge user:

sudo chown -R munge: /etc/munge/ /var/log/munge/ /var/lib/munge/ /run/munge/
sudo chmod 0700 /etc/munge/ /var/log/munge/ /var/lib/munge/
sudo chmod 0755 /run/munge/
sudo chmod 0700 /etc/munge/munge.key
sudo chown -R munge: /etc/munge/munge.key

Restart the munge service:

sudo systemctl enable munge
sudo systemctl restart munge

Check the munge service status:

systemctl status munge

Once Munge is installed on all nodes, copy the munge key from the head node to each compute node so that every node shares the same key:

sudo scp /etc/munge/munge.key <user>@node1:/tmp/munge.key
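
scp cannot write directly into /etc/munge on the remote node as a regular user, so the command above stages the key in /tmp (the staging path and <user>@node1 are placeholders of mine). On each compute node, move the key into place and restore its ownership and permissions:

sudo mv /tmp/munge.key /etc/munge/munge.key
sudo chown munge: /etc/munge/munge.key
sudo chmod 0600 /etc/munge/munge.key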

Restart the munge service once the key is copied:

sudo systemctl enable munge
sudo systemctl restart munge

Create NFS Shares

The head node will share the file systems. I created two directories to export:

  • /apps - shared software
  • /jobs - job output

sudo mkdir /apps
sudo mkdir /jobs
sudo chmod a+rwx /apps
sudo chmod a+rwx /jobs

Install the NFS server packages:

sudo apt install nfs-kernel-server
sudo systemctl start nfs-kernel-server.service
sudo systemctl enable nfs-kernel-server.service

Update the /etc/exports file with the shares:

/jobs  192.168.110.0/24(rw,sync,no_root_squash) 
/apps  192.168.110.0/24(rw,sync,no_root_squash)

Export the file systems:

sudo exportfs -a
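
To confirm the shares are actually exported, list them on the head node or query them from a compute node (the hostname headnode follows the /etc/hosts entries above):

sudo exportfs -v          # on the head node
showmount -e headnode     # from a compute node (needs nfs-common)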

On the compute nodes, add and mount the NFS shares from the head node.

Install the NFS client software:

sudo apt install nfs-common

Create the directories:

  • /apps - shared software
  • /jobs - job output

sudo mkdir /apps
sudo mkdir /jobs
sudo chmod a+rwx /apps
sudo chmod a+rwx /jobs

Update the /etc/fstab file for the new nfs mounts:

sudo cp /etc/fstab /etc/fstab.bkp
sudo vi /etc/fstab

Add the lines (mhpcheadnode is my head node's hostname; use the name from your /etc/hosts file):

mhpcheadnode:/jobs /jobs nfs defaults 0 0
mhpcheadnode:/apps /apps nfs defaults 0 0

Reload the daemon with the new fstab:

sudo systemctl daemon-reload

Mount the file systems:

sudo mount /apps
sudo mount /jobs
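
A quick check that both NFS mounts are live:

df -h /apps /jobs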

Install Slurm

On the head node and all compute nodes install slurm:

sudo apt install slurm-wlm

Configure Slurm:

The slurm configuration tool is located at:

/usr/share/doc/slurmctld/slurm-wlm-configurator.html

Open the file with a web browser and fill out the following:

  • ClusterName: <CLUSTER-NAME>
  • SlurmctldHost: <HEAD-NODE-NAME>
  • NodeName: <COMPUTE-NODE-NAME>[1-N]
  • Enter values for CPUs, Sockets, CoresPerSocket, and ThreadsPerCore according to the output of lscpu on the compute nodes.
  • ProctrackType: LinuxProc

Once completed, copy the generated configuration from the browser into the file below on all nodes:

/etc/slurm/slurm.conf
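
For reference, here is a sketch of the node and partition lines the configurator produces. The host names match the /etc/hosts entries above, but the topology numbers are placeholders; use the values reported by lscpu on your compute nodes.

# hypothetical values - replace with your own lscpu output
NodeName=node[1-3] Sockets=1 CoresPerSocket=14 ThreadsPerCore=1 State=UNKNOWN
PartitionName=debug Nodes=node[1-3] Default=YES MaxTime=INFINITE State=UP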

Restart Slurm on all nodes. On the head node, enable and restart slurmctld:

sudo systemctl enable slurmctld
sudo systemctl restart slurmctld

On the compute nodes, enable and restart slurmd:

sudo systemctl enable slurmd
sudo systemctl restart slurmd
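
Once slurmctld and slurmd are running, a quick way to confirm the nodes registered and can run work:

sinfo                # all three compute nodes should show as idle
srun -N3 hostname    # runs hostname on all three compute nodes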

Install the Slurm REST API:

The Slurm REST API provides an endpoint for submitting jobs, listing running jobs, and running other Slurm commands.

Installing the API server involves the following components:

  • Database - I installed mysql.
  • slurmdbd - Slurm Database Daemon.
  • slurmrestd - Interface to Slurm via REST API.

Configure Slurm JWT support:

Slurm JWT Documentation

dd if=/dev/random of=/var/spool/slurm/statesave/jwt_hs256.key bs=32 count=1
chown slurm:slurm /var/spool/slurm/statesave/jwt_hs256.key
chmod 0600 /var/spool/slurm/statesave/jwt_hs256.key
chown slurm:slurm /var/spool/slurm/statesave
chmod 0755 /var/spool/slurm/statesave

Update both the slurm.conf and slurmdbd.conf and add:

AuthAltTypes=auth/jwt
AuthAltParameters=jwt_key=/var/spool/slurm/statesave/jwt_hs256.key

To create a user token:

scontrol token username=$USER lifespan=$LIFESPAN

Install and configure mysql:

sudo apt install mysql-server

As root (MySQL 8 no longer accepts IDENTIFIED BY inside GRANT, so create the user first):

mysql
create user 'slurm'@'localhost' identified by 'some_pass';
grant all on slurm_acct_db.* to 'slurm'@'localhost' with grant option;

Install and configure slurmdbd:

sudo apt install slurmdbd

Create the /etc/slurm/slurmdbd.conf file; this is a copy of the one I used:

cat /etc/slurm/slurmdbd.conf 
AuthType=auth/munge
AuthInfo=/var/run/munge/munge.socket.2
AuthAltTypes=auth/jwt
AuthAltParameters=jwt_key=/var/spool/slurm/statesave/jwt_hs256.key
DbdHost=localhost
DebugLevel=debug5
StorageHost=localhost
StorageLoc=slurm_acct_db
StoragePass=SLURM-USER-MYSQL-PASSWORD
StorageType=accounting_storage/mysql
StoragePort=3306
StorageUser=slurm
LogFile=/opt/slurm/log/slurmdbd.log
PidFile=/tmp/slurmdbd.pid
SlurmUser=slurm
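
slurmdbd refuses to start if slurmdbd.conf is readable by other users, so tighten its ownership and permissions:

sudo chown slurm:slurm /etc/slurm/slurmdbd.conf
sudo chmod 600 /etc/slurm/slurmdbd.conf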

Update the /etc/slurm/slurm.conf file and add the following. Note: do not enter the Slurm MySQL password here; slurmdbd reads it from slurmdbd.conf.

AccountingStorageHost=localhost
#AccountingStoragePass=
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageUser=slurm

Start the Slurm database daemon:

sudo systemctl enable slurmdbd
sudo systemctl start slurmdbd
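
Depending on your setup, the cluster may also need to be registered in the accounting database before job records appear; the name must match ClusterName in slurm.conf:

sudo sacctmgr add cluster <CLUSTER-NAME>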

Install the slurmrestd package:

sudo apt install slurmrestd

The REST daemon has to run as a non-privileged user, so one has to be created:

sudo adduser slurmrest

Edit the slurmrestd service unit to run as the slurmrest user:

sudo vi /usr/lib/systemd/system/slurmrestd.service

Update the following:

# an unprivileged user to run as.
User=slurmrest
Group=slurmrest
Environment="SLURM_JWT=daemon"

Start the REST service:

sudo systemctl start slurmrestd

Restart the Slurm controller service:

sudo systemctl daemon-reload
sudo systemctl restart slurmctld
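
Once the controller is back up with JWT enabled, you can sanity-check the REST endpoint by requesting a token and pinging the API. This is a sketch: the port (6820) and API version (v0.0.39) depend on how your slurmrestd unit is configured and on the Slurm release Ubuntu 24.04 ships, so adjust both to match your install.

export $(scontrol token username=$USER)    # sets SLURM_JWT in the current shell
curl -s -H "X-SLURM-USER-NAME: $USER" -H "X-SLURM-USER-TOKEN: $SLURM_JWT" \
     http://localhost:6820/slurm/v0.0.39/ping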

Copy the /etc/slurm/slurm.conf (with the accounting and JWT additions) to all of the compute nodes and restart slurmd.
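
One way to do that from the head node, assuming passwordless ssh, sudo rights on each node, and the hostnames from /etc/hosts (the loop is my own sketch, not a required step):

for n in node1 node2 node3; do
    scp /etc/slurm/slurm.conf $n:/tmp/slurm.conf
    ssh -t $n 'sudo mv /tmp/slurm.conf /etc/slurm/slurm.conf && sudo systemctl restart slurmd'
done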

