Cluster Specifications #

Overview #

Compute nodes: 506
Physical cores: 18604
GPUs: 235 GPUs on 61 GPU nodes (88/22 GPUs/nodes are communal and 147/39 GPUs/nodes are prioritized for GPU contributors)
RAM: 48-1512 GiB/node
Local scratch: 0.1-1.8 TiB/node
Global scratch: 703 TiB
User home storage: 500 GiB/user (770 TiB in total)
Group storage: 6.5 PB
Number of accounts: 1264 of which 348 are Wynton Protected accounts
Number of projects: 746

Summary of Compute Environment #

Feature	Login Nodes	Transfer Nodes	Development Nodes	Compute Nodes
Hostname	`log[1-2].wynton.ucsf.edu`, `plog1.wynton.ucsf.edu`	`dt[1-2].wynton.ucsf.edu`, `pdt[1-2].wynton.ucsf.edu`	`dev[1-3]`, `gpudev1`, `pdev1`, `pgpudev1`	…
Accessible via SSH from outside of cluster	✓ (2FA if outside of UCSF)	✓ (2FA if outside of UCSF)	no	no
Accessible via SSH from within cluster	✓	✓	✓	no
Outbound access	Within UCSF only: SSH and SFTP	HTTP/HTTPS, FTP/FTPS, SSH, SFTP, Globus	Via proxy: HTTP/HTTPS, GIT+SSH(*)	no
Network speed	10 Gbps	10 Gbps	10 Gbps	1,10,40 Gbps
Core software	Minimal	Minimal	Same as compute nodes + compilers and source-code packages	Rocky 8 packages
modules (software stacks)	no	no	✓	✓
Global file system	✓	✓	✓	✓
Job submission	✓	no	✓	✓
CPU quota per user(**)	100% (“1 core”)	200% (“2 cores”)	not limited	not limited
Memory limit per user(**)	32 GiB	96 GiB	48 GiB	per job request
Purpose	Submit and query jobs. SSH to development nodes. File management.	Fast in- & outbound file transfers. File management.	Compile and install software. Prototype and test job scripts. Submit and query jobs. Version control (clone, pull, push). File management.	Running short and long-running job scripts.

(*) GIT+SSH access on development nodes is restricted to git.bioconductor.org, bitbucket.org, gitea.com, github.com / gist.github.com, gitlab.com, cci.lbl.gov, and git.ucsf.edu.

(**) CPU is throttled and memory is limited by Linux Control Groups (CGroups). If a process overuses the memory, it will be killed by the operating system.

All nodes on the cluster run Rocky Linux 8.10 which is updated on a regular basis. The job scheduler is SGE 8.1.9 (Son of Grid Engine) which provides queues for both communal and lab-priority tasks.

Details #

The cluster can be accessed via SSH to one of the login nodes:

log1.wynton.ucsf.edu
log2.wynton.ucsf.edu
plog1.wynton.ucsf.edu (for Wynton Protected users)

Data Transfer Nodes #

For transferring large data files, it is recommended to use one of the dedicate data transfer nodes:

dt1.wynton.ucsf.edu
dt2.wynton.ucsf.edu
pdt1.wynton.ucsf.edu (for Wynton Protected users)
pdt2.wynton.ucsf.edu (for Wynton Protected users)

which have a 10 Gbps connection - providing a file transfer speed of up to (theoretical) 1.25 GB/s = 4.5 TB/h. As for the login nodes, the transfer nodes can be accessed via SSH.

Comment: You can also transfer data via the login nodes, but since those only have 1 Gbps connections, you will see much lower transfer rates.

Development Nodes #

The cluster has development nodes for the purpose of validating scripts, prototyping pipelines, compiling software, and more. Development nodes can be accessed from the login nodes.

Node	Physical Cores	RAM	Local `/scratch`	CPU x86-64 level	CPU	GPU
`dev1.wynton.ucsf.edu`	72	384 GiB	0.93 TiB	x86-64-v4	Intel Gold 6240 2.60GHz
`dev2.wynton.ucsf.edu`	48	512 GiB	0.73 TiB	x86-64-v3	Intel Xeon E5-2680 v3 2.50GHz
`dev3.wynton.ucsf.edu`	48	256 GiB	0.73 TiB	x86-64-v3	Intel Xeon E5-2680 v3 2.50GHz
`gpudev1.wynton.ucsf.edu`	56	256 GiB	0.82 TiB	x86-64-v3	Intel Xeon E5-2660 v4 2.00GHz	NVIDIA GeForce GTX 1080
`pdev1.wynton.ucsf.edu` (for Wynton Protected users)	32	256 GiB	1.1 TiB	x86-64-v3	Intel E5-2640 v3
`pgpudev1.wynton.ucsf.edu` (for Wynton Protected users)	56	256 GiB	0.82 TiB	x86-64-v3	Intel Xeon E5-2660 v4 2.00GHz	NVIDIA GeForce GTX 1080

Comment: Please use the GPU development node only if you need to build or prototype GPU software. The CPU x86-64 level is the x86-64 microarchitecture levels supported by the nodes CPU.

Compute Nodes #

The majority of the compute nodes have Intel processors, while a few have AMD processors. Each compute node has a local /scratch drive (see above for size), which is either a hard disk drive (HDD), a solid state drive (SSD), or even a Non-Volatile Memory Express (NVMe) drive. Each node has a tiny /tmp drive (4-8 GiB).

The compute nodes can only be utilized by submitting jobs via the scheduler - it is not possible to explicitly log in to compute nodes.

File System #

Scratch Storage #

The Wynton HPC cluster provides two types of scratch storage:

Local /scratch/ - 0.1-1.8 TiB/node storage unique to each compute node (can only be accessed from the specific compute node).
Global /wynton/scratch/ and /wynton/protected/scratch/ (for Wynton Protected users) - 703 TiB storage (BeeGFS) accessible from everywhere.

There are no per-user quotas in these scratch spaces. Files not added or modified during the last two weeks will be automatically deleted on a nightly basis. Note, files with old timestamps that were “added” to the scratch place during this period will not be deleted, which covers the use case where files with old timestamps are extracted from a tar.gz file. (Details: tmpwatch --ctime --dirmtime --all --force is used for the cleanup.)

User and Lab Storage #

/wynton/home/ and /wynton/protected/home/ (for Wynton Protected users): 770 TiB storage space
/wynton/group/, /wynton/protected/group/ (for Wynton Protected users), and /wynton/protected/projects/ (for Wynton Protected users): 6500 TB (= 6.5 PB) storage space

Each user may use up to 500 GiB disk space in the home directory. It is not possible to expand user’s home directory. Research groups can add additional storage space under /wynton/group/, /wynton/protected/group/, and /wynton/protected/projects/ by purchasing additional storage.

While waiting to receive purchased storage, users may use the global scratch space, which is “unlimited” in size with the important limitation that files older than two weeks will be deleted automatically.

Importantly, note that the Wynton HPC storage is not backed up. Users and labs are responsible for backing up their own data outside of Wynton HPC.

Network #

The majority of the compute nodes are connected to the local network with 1 Gbps and 10 Gbps network cards while a few got 40 Gbps cards.

The cluster itself connects to NSF’s Pacific Research Platform at a speed of 100 Gbps - providing a file transfer speed of up to (theoretical) 12.5 GB/s = 45 TB/h.