
New Wooki Cluster

About: New Wooki is the computer cluster built in January 2021 to replace the old Wooki cluster. The head node is completely new, and most of old Wooki's compute nodes have been migrated to the new cluster.

Location: Currently the cluster is available only within the uOttawa network at: wooki.chem.uolocal. You will need to use an SSH program such as MobaXterm or PuTTY. For off-campus access you need to use the University's VPN.
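For example, from a terminal-based SSH client the connection looks something like the following (MobaXterm and PuTTY expose the same settings through their GUIs; 'your_username' is a placeholder for your Wooki account name):

# connect to the head node from the uOttawa network or over the VPN
ssh your_username@wooki.chem.uolocal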

Obtaining Access to the Cluster

If you would like an account on the new cluster, please email twoo@uottawa.ca with the following information:

1) your name
2) the group you are part of (e.g., the Chica lab)
3) if you had an account on old Wooki, your old Wooki username
4) the software packages you wish to use
5) the credit card number you want us to charge

Interacting with the Head Node:

The head node is intended for managing jobs and files, not for heavy computing, since it runs the queue and many other services, including the interactive sessions of all users. It is, however, a very powerful machine, and tasks that need only one or two CPUs and less than 15 minutes to complete can be run interactively. Anything longer than that is subject to being killed without notice.

Any longer or “heavier” job must be submitted to the compute hosts via the queuing system (see below), which manages the available resources and assigns them to waiting jobs.

DATA STORAGE

New Wooki has two distinct data storage areas, home and share_scratch:

    • /share_scratch/username is the user's working space. This is fast network storage with no limits on use. You are expected to run the majority of your work from here; it is the most efficient place to work and keeps the cluster and frontend responsive. This space is not backed up.

    • /home/username should be used for permanent storage, lighter processing of files, and transfers to and from the cluster. This space is regularly backed up. Once your calculations are complete, move your results here, taking care to exclude the large intermediate and scratch files (see the copy example at the end of this section). Use this space for data that you will process interactively on the frontend, as this reduces network traffic.

If you are looking for your files from Old Wooki, they are available on:

/Backup/Old_Wooki

If you want your files from old Wooki’s home on new Wooki’s home, you need to copy them manually.  Please note that many scripts (and your .bashrc) from old Wooki may not work properly on new Wooki.

    • Each compute node also has a local HDD or SSD mounted at /local_scratch.
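As an example, once a run is finished you could move its results from share_scratch to home while skipping the large intermediate files, along these lines (the job directory name and the excluded patterns below are placeholders; adjust them to your own outputs):

# copy the finished job to backed-up storage, excluding large scratch files (patterns are examples only)
rsync -av --exclude='*.tmp' --exclude='WAVECAR' \
    /share_scratch/$USER/my_job/ /home/$USER/my_job/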

 

Software

The CentOS 8.2 Linux distribution is installed on all nodes, using the OpenHPC clustering software. The compute nodes run CentOS in diskless mode, such that Linux is loaded from the head node and stored in memory at boot.

We are using the Linux environment modules system to manage software. Use the following command to see what is available:

module avail
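Other useful module commands (standard environment-modules commands; the package name 'vasp' below is just an example):

module load vasp      # add a package to your environment
module list           # show the modules you currently have loaded
module unload vasp    # remove a package from your environment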

For some of the packages, like VASP, scripts for submitting to the queueing system have been written that work very similarly to those on old Wooki, e.g. 'vasp-submit'.

Python

Python 3.6 is the default version that CentOS provides; it is invoked using 'python3', not 'python'. The Anaconda Python environments have been installed with Python 2.7 and 3.8. These can be loaded with 'module load python/{version}', e.g. 'module load python/2.7'.
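For example, to switch to one of the Anaconda environments:

module load python/3.8    # load the Anaconda-provided Python 3.8
python --version          # 'python' should now point to the loaded interpreter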

The Queuing System

We are using the ‘Slurm’ queueing system. This is the same queueing system that is used on the Compute Canada systems, but different from what was used on the old Wooki cluster. Most of the Slurm commands start with ‘s’, like ‘squeue’ instead of ‘qstat’, but some aliases have been created for old Wooki commands, like ‘qstat’. One important difference is that Slurm uses the term ‘partitions’ instead of ‘queues’.

NOTE: It is important to realize that run times on jobs are enforced: the queuing system will kill a job once it goes past its requested run time. Times are enforced to allow more efficient scheduling, so please specify realistic job times when submitting; the default for a given submission script may be short. The queue currently has a 14-day maximum run time, and if you do not specify a run time the default is about 8 hours. If a running job desperately needs to have its time extended, please contact Woo.
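For example, to request a 2-day limit when submitting a script of your own (the script name is a placeholder; the prewritten submit scripts take the same information through their '-t' option, as shown below):

sbatch --time=2-00:00 my_job.sh    # request 2 days, in D-HH:MM format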

Queues/Partitions available:

    1. General – this is a general CPU partition. Most jobs should be submitted to this queue.
    2. gpu – currently one node (gpu_1) has a GPU for machine learning. Use this queue to access it (see below for instructions).

Useful commands (see individual help or man pages for more details):

    • squeue – prints jobs in the queue.
    • scancel – kills jobs.
    • qstat – alias for squeue that prints more useful information than the default squeue command. Use the '-a' option to see all jobs.
    • sinfo – prints info on the queues/partitions.
    • sbatch – submits a script to the queue, similar to qsub.
    • nodeinfo – prints detailed information about the nodes.
    • wookistat – gives an overview of current cluster usage.
    • cluster_stat – gives a different overview of the cluster usage.
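A few typical invocations (the job ID below is a placeholder):

squeue -u $USER    # show only your own jobs
qstat -a           # show all jobs, via the squeue alias described above
scancel 12345      # cancel job number 12345
sinfo              # summary of the partitions and node states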

Currently, the ‘General’ queue/partition has a maximum 14 day time limit for jobs.
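You can check the configured limits yourself with sinfo's formatting options, for example:

sinfo -o "%P %l"    # print each partition and its maximum time limit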

Prewritten Submission Scripts

Submission scripts for some packages have been written, similar to those on old Wooki. When you module load a package, like vasp, you can use the corresponding submit script, like 'vasp-submit' or 'gaussian-submit'. They generally work such that the first argument is the job name or input name, and the second, optional argument is the number of CPUs. For example:

vasp-submit  myjob 4 -t 1-5:30 --memory 6G

The above will submit a VASP job called 'myjob' on 4 CPUs, request a minimum of 6G of memory, and set a maximum run time of 1 day, 5 hours, and 30 minutes. Many of these scripts will copy your job to the node's local disk in the directory /local_scratch/{your username}/{slurm job#}/. When the job is complete, the contents of that run directory are copied back into the directory from which you submitted the job. Note: if your job is cancelled or terminated by the queue because it ran out of time, the contents of the /local_scratch directory will not be copied back. However, that directory will still be available on the node the job ran on (one can ssh to the node). The location and node of the job are written at the beginning of the slurm-{job #}.out file.
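If you do need to recover files left in /local_scratch, something along these lines should work (the node name and job number below are placeholders; the real values are printed at the top of the slurm-{job #}.out file):

ssh node_name                                            # log in to the node the job ran on
cp -r /local_scratch/$USER/12345 /share_scratch/$USER/   # 12345 = the Slurm job number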

Use the '-h' or '--help' option to see all the options available for a particular submit script.

IMPORTANT NOTE: Default time limits for the submit scripts are typically short, so you may have to change them. Please give realistic job times to make the queueing more efficient.

Custom Submission Scripts

You can also write your own submission scripts. Below is a sample shell script that can be submitted to the queue with the command 'sbatch script_name'. If your job does a lot of I/O, consider using the '/local_scratch' space on the compute nodes. The queue keywords at the beginning that start with '#SBATCH' can also be given as arguments in the sbatch call, e.g. 'sbatch --time=500 script_name'. See the Slurm manual for all options.

#!/bin/bash
#SBATCH --partition=General
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=3
#SBATCH --job-name=REPEAT
#SBATCH --time=120

# The above are keywords that the queueing system reads.

# This script will look in ALL directories in the current folder and
# run REPEAT in it.

# the locpot to cube script runs in python 2
module load python/2.7

REPEAT_EXE=/opt/ohpc/pub/repeat/bin/repeat.x
VASP2CUBE=/opt/ohpc/pub/repeat/bin/vasp_to_cube.py
REPEAT_INPUT=/opt/ohpc/pub/repeat/bin/REPEAT_param.inp
export OMP_NUM_THREADS=3

for file in $(ls | shuf)
do
    # check if item is a directory
    if [ -d "$file" ] ; then

        # if the 'queued' file exists then skip
        if [ -f "$file/queued" ] ; then
            echo "$file: skipping because queued file found"
        else
            cd "$file"

            touch queued
            cp $REPEAT_INPUT REPEAT_param.inp
            $VASP2CUBE   #this converts the LOCPOT to cube file

            # Run the repeat code
            # important use the srun command when executing parallel tasks.
            srun $REPEAT_EXE > repeat.output
            echo "REPEAT completed"

            #clean up
            rm queued *.dat mof.cube REPEAT_param.inp

            cd ..
        fi
    fi
done

echo 'done '
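Assuming the script above is saved as repeat_all.sh (the name is arbitrary), it would be submitted and monitored like this:

sbatch repeat_all.sh    # prints the assigned job number
squeue -u $USER         # check on its progress in the queue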

Example for GPU jobs:

#!/bin/bash
#SBATCH --gres=gpu:1          # Number of GPUs (per node)
#SBATCH --partition=gpu
#SBATCH --time=0-03:00        # time (DD-HH:MM)
#SBATCH --nodes=1             # Number of nodes to run on
#SBATCH --cpus-per-task=1     # Number of cpus
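The header above only requests the resources; the body of the script then loads whatever software you need and launches the work on the GPU. A minimal sketch (the module and script names here are placeholders, not specific Wooki package names):

module load python/3.8            # load your machine learning environment

nvidia-smi                        # optional: confirm the allocated GPU is visible
srun python train_model.py        # launch the job; train_model.py is a placeholder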
