Cluster Specifications #

Overview #

Compute nodes: 500
Physical cores: 18372
GPUs: 235 GPUs on 61 GPU nodes (88 GPUs on 22 nodes are communal; 147 GPUs on 39 nodes are prioritized for GPU contributors)
RAM: 48-1512 GiB/node
Local scratch: 0.1-1.8 TiB/node
Global scratch: 703 TiB
User home storage: 500 GiB/user (770 TiB in total)
Group storage: 6.5 PB
Number of accounts: 1394, of which 325 are approved for PHI
Number of projects: 686

Summary of Compute Environment #

| Feature | Login Nodes | Transfer Nodes | Development Nodes | Compute Nodes |
|---------|-------------|----------------|-------------------|---------------|
| Hostname | log[1-2].wynton.ucsf.edu, plog1.wynton.ucsf.edu | dt[1-2].wynton.ucsf.edu, pdt[1-2].wynton.ucsf.edu | dev[1-3], gpudev1, pdev1, pgpudev1 | |
| Accessible via SSH from outside of cluster | ✓ (2FA if outside of UCSF) | ✓ (2FA if outside of UCSF) | no | no |
| Accessible via SSH from within cluster | ✓ | ✓ | ✓ | no |
| Outbound access | Within UCSF only: SSH and SFTP | HTTP/HTTPS, FTP/FTPS, SSH, SFTP, Globus | Via proxy: HTTP/HTTPS, GIT+SSH(*) | no |
| Network speed | 10 Gbps | 10 Gbps | 10 Gbps | 1, 10, or 40 Gbps |
| Core software | Minimal | Minimal | Same as compute nodes + compilers and source-code packages | Rocky 8 packages |
| modules (software stacks) | no | no | ✓ | ✓ |
| Global file system | ✓ | ✓ | ✓ | ✓ |
| Job submission | ✓ | no | ✓ | ✓ |
| CPU quota per user(**) | 1 core | 2 cores | 4 cores | per job request |
| Memory limit per user(**) | 32 GiB | 96 GiB | 96 GiB | per job request |
| Purpose | Submit and query jobs. SSH to development nodes. File management. | Fast in- & outbound file transfers. File management. | Compile and install software. Prototype and test job scripts. Submit and query jobs. Version control (clone, pull, push). File management. | Running short and long-running job scripts. |

(*) GIT+SSH access on development nodes is restricted to git.bioconductor.org, bitbucket.org, gitea.com, github.com / gist.github.com, gitlab.com, cci.lbl.gov, and git.ucsf.edu.

(**) CPU is throttled and memory is limited by Linux Control Groups (CGroups). If a process overuses the memory, it will be killed by the operating system.

All nodes on the cluster run Rocky 8 which is updated on a regular basis. The job scheduler is SGE 8.1.9 (Son of Grid Engine) which provides queues for both communal and lab-priority tasks.
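
For example, a summary of the queues and their current load can be obtained from a login or development node with standard SGE commands:

```sh
qstat -g c    # cluster-wide summary of queue usage (slots used, available, etc.)
qconf -sql    # list the names of all configured queues
```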

Details #

Login Nodes #

The cluster can be accessed via SSH to one of the login nodes:

  1. log1.wynton.ucsf.edu
  2. log2.wynton.ucsf.edu
  3. plog1.wynton.ucsf.edu (for PHI users)
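
For example, to log in from a terminal (replace alice with your Wynton username):

```sh
ssh alice@log1.wynton.ucsf.edu
```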

Data Transfer Nodes #

For transferring large data files, it is recommended to use one of the dedicated data transfer nodes:

  1. dt1.wynton.ucsf.edu
  2. dt2.wynton.ucsf.edu
  3. pdt1.wynton.ucsf.edu (for PHI users)
  4. pdt2.wynton.ucsf.edu (for PHI users)

which have a 10 Gbps connection - providing a theoretical file transfer speed of up to 1.25 GB/s = 4.5 TB/h. As with the login nodes, the transfer nodes can be accessed via SSH.
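
For example, a local directory can be copied to the cluster with rsync over SSH via a transfer node (the username alice and the directory name are placeholders):

```sh
rsync -avP ./projectdata/ alice@dt1.wynton.ucsf.edu:projectdata/
```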

Comment: You can also transfer data via the login nodes, but expect lower transfer rates there than on the dedicated data transfer nodes.

Development Nodes #

The cluster has development nodes for the purpose of validating scripts, prototyping pipelines, compiling software, and more. Development nodes can be accessed from the login nodes.
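
For example, from one of the login nodes (development nodes are not reachable from outside the cluster):

```sh
ssh dev1.wynton.ucsf.edu
```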

| Node | Physical Cores | RAM | Local /scratch | CPU x86-64 level | CPU | GPU |
|------|----------------|-----|----------------|------------------|-----|-----|
| dev1.wynton.ucsf.edu | 72 | 384 GiB | 0.93 TiB | x86-64-v4 | Intel Gold 6240 2.60GHz | |
| dev2.wynton.ucsf.edu | 48 | 512 GiB | 0.73 TiB | x86-64-v3 | Intel Xeon E5-2680 v3 2.50GHz | |
| dev3.wynton.ucsf.edu | 48 | 256 GiB | 0.73 TiB | x86-64-v3 | Intel Xeon E5-2680 v3 2.50GHz | |
| gpudev1.wynton.ucsf.edu | 56 | 256 GiB | 0.82 TiB | x86-64-v3 | Intel Xeon E5-2660 v4 2.00GHz | NVIDIA GeForce GTX 1080 |
| pdev1.wynton.ucsf.edu (for PHI users) | 32 | 256 GiB | 1.1 TiB | x86-64-v3 | Intel E5-2640 v3 | |
| pgpudev1.wynton.ucsf.edu (for PHI users) | 56 | 256 GiB | 0.82 TiB | x86-64-v3 | Intel Xeon E5-2660 v4 2.00GHz | NVIDIA GeForce GTX 1080 |

Comment: Please use the GPU development nodes only if you need to build or prototype GPU software. The "CPU x86-64 level" column gives the x86-64 microarchitecture level supported by the node's CPU.
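
As a rough illustration of what these levels imply, x86-64-v3 requires AVX2 and x86-64-v4 additionally requires AVX-512F (among other instruction-set extensions); the flags a node's CPU reports can be inspected directly:

```sh
# Show which of the level-defining vector extensions this CPU reports
grep -o -w -e avx2 -e avx512f /proc/cpuinfo | sort -u
```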

Compute Nodes #

The majority of the compute nodes have Intel processors, while a few have AMD processors. Each compute node has a local /scratch drive (see above for size), which is either a hard disk drive (HDD), a solid state drive (SSD), or even a Non-Volatile Memory Express (NVMe) drive. Each node has a tiny /tmp drive (4-8 GiB).

The compute nodes can only be utilized by submitting jobs via the scheduler - it is not possible to explicitly log in to compute nodes.
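
As a minimal sketch of what job submission looks like, assuming commonly used SGE resource names such as h_rt and mem_free (the exact resources, defaults, and queues on this cluster may differ - consult the scheduler documentation):

```sh
#!/bin/bash
#$ -S /bin/bash        # interpret the job script with bash
#$ -cwd                # run the job from the submission directory
#$ -l h_rt=00:10:00    # illustrative runtime request (10 minutes)
#$ -l mem_free=1G      # illustrative memory request (1 GiB)

hostname               # replace with the actual work
```

Saved as, say, hello.sge (a placeholder name), the script is submitted and monitored with:

```sh
qsub hello.sge
qstat -u "$USER"
```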

File System #

Scratch Storage #

The Wynton HPC cluster provides two types of scratch storage:

  1. Local scratch - a /scratch drive that is unique to each node (0.1-1.8 TiB/node; see above)
  2. Global scratch - a shared scratch space accessible from all nodes (703 TiB in total)

There are no per-user quotas in these scratch spaces. Files that have not been added or modified during the last two weeks are deleted automatically on a nightly basis. Note that files with old timestamps that were added to the scratch space within this period will not be deleted; this covers the use case where files with old timestamps are extracted from a tar.gz archive. (Details: tmpwatch --ctime --dirmtime --all --force is used for the cleanup.)
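
The tar.gz case works because extraction typically preserves a file's archived modification time (mtime) while the inode change time (ctime) is set at extraction, and the tmpwatch invocation above keys on ctime. A quick way to see this for yourself (the archive and file names are placeholders):

```sh
tar -xzf old-results.tar.gz                           # files keep their archived (old) mtime
stat -c 'mtime: %y | ctime: %z' old-results/file.txt  # ctime reflects the time of extraction
```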

User and Lab Storage #

Each user may use up to 500 GiB of disk space in their home directory. It is not possible to expand a user's home directory. Research groups can add storage space under /wynton/group/, /wynton/protected/group/, and /wynton/protected/projects/ by purchasing additional storage.
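
To check how much of this quota is in use, a plain disk-usage scan of the home directory works from any node (cluster-specific quota tools, where available, will be faster):

```sh
du -sh "$HOME"    # total size of your home directory; may take a while
```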

Network #

The majority of the compute nodes are connected to the local network with 1 Gbps or 10 Gbps network cards, while a few have 40 Gbps cards.

The cluster itself connects to NSF's Pacific Research Platform at a speed of 100 Gbps - providing a theoretical file transfer speed of up to 12.5 GB/s = 45 TB/h.
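
The arithmetic behind these figures: divide the link speed by 8 to convert bits to bytes, then multiply by 3600 seconds per hour. For example:

```sh
# 100 Gbps -> theoretical GB/s and TB/h (decimal units)
awk 'BEGIN { gbps = 100; gbs = gbps / 8; printf "%.2f GB/s = %.1f TB/h\n", gbs, gbs * 3600 / 1000 }'
```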