University of Calgary

Helix QuickStart Guide

About this QuickStart Guide

This QuickStart guide gives an overview of the Helix cluster at the University of Calgary.

It is intended to be read by new account holders getting started on Helix, covering such topics as the Helix hardware and performance characteristics, available software, usage policies and how to log in and run jobs. 

For Helix-related questions not answered here, please write to support@hpc.ucalgary.ca.

Introduction

Helix is a computing cluster installed in March 2016 as a test environment. It is Phase 1 of an infrastructure project in support of advanced research computing for the Cumming School of Medicine.

Intended use

Helix is a test environment to explore how a mix of large-memory, general-purpose and GPU compute nodes can serve the needs of researchers in the Cumming School of Medicine. It does not provide the extra security measures required for storing and processing restricted (such as patient-identifiable) data. Although it may be used for "real" research projects, keep in mind that it is a small-scale test environment that will not be able to accommodate the needs of all researchers.

Note that Helix runs the Linux operating system and most calculations on Helix will be run through non-interactive batch-mode job scripts. If you are not familiar with this kind of environment or with these terms, we would be happy to help you get started.

Accounts

If you have a project associated with research in the Cumming School of Medicine that you think would benefit from running computations on Helix, please write to support@hpc.ucalgary.ca.

Accounts on the Helix cluster use the same credentials (user name and password) as for University of Calgary computing accounts offered by Information Technologies for email and other services.

Hardware

Processors

Besides login and administrative servers, the Helix hardware consists of:

  • 8 general-purpose compute servers, each with 24 CPU cores at 2.5 GHz and 256 GB of RAM.
  • 2 large-memory servers, each with 64 CPU cores at 2.2 GHz, 2 TB of RAM and 6 TB of local disk.
  • 1 GPU-accelerated compute server with 16 CPU cores at 2.4 GHz, 2 K40m GPUs and 256 GB of RAM.
  • 1 higher-clock-rate server for serial applications or applications requiring very fast disk access, with 16 CPU cores at 3.2 GHz, 256 GB of RAM and 5.8 TB of fast SSD-based local disk.

Interconnect

The compute nodes communicate via 10-gigabit/s Ethernet connections.

Storage

Initially there will be about 50 TB of usable disk space allocated for home directories. The default per-user quota (disk usage limit) is 500 GB, with an option to extend the storage space on an individual basis.

Update 2017-05-19: On each of the two large-memory nodes and the GPU-enabled node there is 21.8 TB of special high-speed local disk, accessible as /local_scratch, that can be used for temporary storage.

Software

Look for installed software under /global/software and through the module avail command.  GNU and Intel compilers are available. The setup of the environment for using some of the installed software is through the module command. An overview of modules on WestGrid is largely applicable to Helix.

To list available modules, type:

module avail

For example, to load a module for Java, use:

module load java/1.8.0

and to remove it use:

module remove java

To see currently loaded modules, type:

module list

By default, modules are installed on Helix to set up Intel compilers and to support parallel programming with MPI (including the determination of which compilers are used by the wrapper scripts mpicc, mpif90, etc.).

Write to support@hpc.ucalgary.ca if you need additional software installed.

Using Helix

To log in to Helix, connect to helix.hpc.ucalgary.ca using an ssh (secure shell) client. For more information about connecting and setting up your environment, the WestGrid QuickStart Guide for New Users may be helpful.
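As a minimal sketch of the login step, the following wraps the ssh command in a small shell function. The function name helix_login is a hypothetical convenience, not something provided on Helix; the host name is the one given above, and the user name is your University of Calgary user name.

```shell
# Hypothetical helper: open an ssh session on the Helix login node.
# Usage: helix_login your_username
helix_login() {
    # Require a user name; print a usage message if it is missing.
    local user="${1:?usage: helix_login your_username}"
    ssh "${user}@helix.hpc.ucalgary.ca"
}
```

With this function defined in your local shell, helix_login your_username is equivalent to typing ssh your_username@helix.hpc.ucalgary.ca directly.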

The Helix login node may be used for short interactive runs (under 15 minutes) during development. Production runs and longer test runs should be submitted as batch jobs. Batch jobs are submitted through TORQUE and scheduled using Maui (similar to Moab used on WestGrid). Processors may also be reserved for interactive sessions, in a similar manner to batch jobs.

Most of the information on the Running Jobs page on the WestGrid web site is also relevant for submitting and managing batch jobs and reserving processors for interactive work on Helix.
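To make the batch workflow more concrete, here is a minimal sketch of a TORQUE job script for the default queue. The job name, the resource request (one node, 4 cores, 1 hour) and the final echo standing in for a real program are assumptions chosen for illustration; adjust them to your own work.

```shell
#!/bin/bash
#PBS -N example_job
#PBS -l nodes=1:ppn=4
#PBS -l walltime=01:00:00
#PBS -j oe

# TORQUE sets PBS_O_WORKDIR to the directory from which qsub was run;
# fall back to the current directory when the script is run by hand.
WORKDIR="${PBS_O_WORKDIR:-$PWD}"
cd "$WORKDIR" || exit 1

# Load software only if the module command is available in this shell.
if command -v module >/dev/null 2>&1; then
    module load java/1.8.0
fi

# Placeholder for the real computation (hypothetical); replace the
# echo with the command that runs your program.
JOB_MESSAGE="Job running on $(uname -n) in $WORKDIR"
echo "$JOB_MESSAGE"
```

Saved under a name of your choice, for example example_job.pbs (a hypothetical file name), the script would be submitted with qsub example_job.pbs and its status checked with qstat.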

The maximum walltime for non-interactive jobs on Helix is 7 days on the default queue; the bigmem, fast and gpu queues are limited to 24 hours (see the queue descriptions below). Interactive jobs are limited to 3 hours.
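Under TORQUE, an interactive session is typically requested with the -I option of qsub. The sketch below wraps that in a hypothetical function, helix_interactive, which asks for a number of cores on one node for the 3-hour interactive maximum; the function name and the default of one core are assumptions.

```shell
# Hypothetical helper: request an interactive session on one node.
# Usage: helix_interactive [cores]
helix_interactive() {
    # Number of cores to reserve; defaults to 1 (an assumption).
    local cores="${1:-1}"
    # -I asks TORQUE for an interactive job; the walltime matches the
    # 3-hour limit for interactive jobs.
    qsub -I -l nodes=1:ppn="${cores}" -l walltime=03:00:00
}
```

Run on the Helix login node, helix_interactive 4 would request 4 cores; the shell prompt on the compute node appears once the scheduler starts the job.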

Access to the different types of compute servers (nodes) on Helix is controlled through queues.

There are four queues on Helix: mainq (the default queue), bigmem, fast, and gpu.

  • The mainq queue provides access to the 8 general compute nodes with 24 CPU cores and 256 GB of RAM each.
    The time limit is 7 days (168 hours).
    There is no need to specify the queue in the job script, as it is the default queue.
  • The bigmem queue provides access to the 2 big memory compute nodes with 64 cores and 2 TB of RAM each.
    The time limit is 24 hours.
    Each of the nodes also has 6 TB of fast local storage mounted on /tmp.
    The queue is targeted at memory-intensive jobs, but can be used for shorter general computations as well.
    Request with a "#PBS -q bigmem" line in your job script.
  • The fast queue provides access to a single "fast" node with 16 higher-clocked cores and fast local disk.
    The time limit is 24 hours.
    The node has 5.8 TB of very fast local storage mounted on /tmp.
    The queue is suitable for serial (single CPU) jobs and jobs requiring very fast disk access.
    Request with a "#PBS -q fast" line in your job script.
  • The gpu queue provides access to a single node with 2 NVIDIA K40 GPUs and 16 CPU cores.
    The time limit is 24 hours.
    The node can only run one job at a time, allocating both GPUs to the same job.
    The queue is suitable for GPU-enabled jobs.
    Request with a "#PBS -q gpu" line in your job script.
    Do not request any GPUs on the "-l" resource line.
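The time limits listed above can be summarized in a small lookup. The following sketch defines a hypothetical helper, helix_queue_limit_hours, that returns the walltime limit in hours for a given queue name, using only the values stated in the list. It is an illustration, not a tool provided on Helix, and the example values for queue and requested_hours are assumptions.

```shell
# Hypothetical helper: print the walltime limit (in hours) of a queue.
helix_queue_limit_hours() {
    case "$1" in
        mainq)           echo 168 ;;   # 7 days
        bigmem|fast|gpu) echo 24 ;;
        *)               echo "unknown queue: $1" >&2; return 1 ;;
    esac
}

# Example: check a requested walltime against the limit of a queue.
queue="bigmem"
requested_hours=12
limit=$(helix_queue_limit_hours "$queue")
if [ "$requested_hours" -le "$limit" ]; then
    echo "OK: $requested_hours h fits within the $limit h limit of $queue"
else
    echo "Too long: $queue allows at most $limit h"
fi
```

A check like this can be useful before submitting, since a job that needs more than 24 hours has to go to the default queue rather than to bigmem, fast or gpu.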

Support

Send Helix-related questions to support@hpc.ucalgary.ca.


Updates:

2016-04-01 - Page created.
2017-05-19 - Added note regarding fast local disk on the GPU and high-memory nodes.

 

 

