2025-01-15: Temporary Hold on Storage and Compute Requests

As Wynton continues to expand, we are approaching the limits of space, cooling, and power capacity in the Byers Hall server room. In addition, as we work to move the administration of the cluster to the Academic Research Systems (ARS) Team, we are currently reprioritizing our workload with the support of Wynton faculty leadership to ensure that we continue to meet our most critical objectives. We understand that this may cause some delays and appreciate your patience and understanding during this period.

See also the Wynton-announcement email titled ‘Important Updates on Wynton Storage and Compute Requests’ sent to all users on 2024-11-25.

Contributions to the Wynton HPC environment are non-expiring, i.e. contribute once and keep it for life!

Contributing Member Shares

Compute Shares

Currently, the Wynton HPC cluster has a total of member.q_total = 7023 slots available on the member.q queue. Jobs on the member.q queue will launch and finish sooner than jobs on the communal, lower-priority long.q queue, and if a member.q job and a long.q job run on the same compute node, the member.q job gets higher priority on the CPU. Only contributing members have access to the member.q queue; non-contributing members only have access to queues such as long.q.

Contributors get non-expiring, lifetime access to a number of these member.q slots in proportion to their hardware contribution to the cluster. A hardware contribution can be monetary(*) or physical(*), and the number of member.q slots it adds is based on how much compute power it adds to the cluster. That compute power is determined by benchmarking(*), which results in a processing-unit (PU) score for the contribution. Wynton HPC currently has a total of PU_total = 20186 contributed processing units.
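Written as formulas, the relationship between PUs and member.q slots looks as follows. Here share_lab is shorthand for PU_lab / PU_total, and rounding to the nearest integer mirrors the Charlie Lab example. With the current totals, each contributed PU is worth roughly 7023 / 20186 ≈ 0.348 member.q slots.

```latex
\text{share}_{\text{lab}} = \frac{PU_{\text{lab}}}{PU_{\text{total}}},
\qquad
\text{member.q}_{\text{lab}}
  = \operatorname{round}\!\left( \frac{PU_{\text{lab}}}{PU_{\text{total}}} \cdot \text{member.q}_{\text{total}} \right)
  \approx \operatorname{round}\!\left( 0.348 \cdot PU_{\text{lab}} \right)
```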

A lab’s contributed processing units (PU_lab) never expire; PU_lab stays the same until the lab makes additional contributions to the cluster.

As other labs contribute to the cluster, the total compute power (PU_total) and the total number of member.q slots (member.q_total) increase over time. As a result, a lab’s relative compute share (PU_lab / PU_total) decreases over time, while its number of member.q slots (member.q_lab) stays approximately(**) the same.
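A minimal Python sketch of this effect follows. Only the starting totals come from this page; the future contributions from other labs are hypothetical, and `member_q_slots` and `grow` are illustrative names. A lab holding a fixed 22.7 PUs sees its share fall with every contribution, while its rounded slot count stays at 8.

```python
from collections.abc import Iterable, Iterator


def member_q_slots(pu_lab: float, pu_total: float, member_q_total: int) -> int:
    """Slots for a lab: its PU share of all member.q slots, rounded."""
    return round(pu_lab / pu_total * member_q_total)


def grow(
    pu_lab: float,
    pu_total: float,
    member_q_total: int,
    contributions: Iterable[tuple[float, int]],
) -> Iterator[tuple[float, int]]:
    """Yield (share, slots) for a lab whose PU_lab stays fixed while other
    labs add (added_pu, added_slots) contributions one at a time."""
    for added_pu, added_slots in contributions:
        pu_total += added_pu
        member_q_total += added_slots
        yield pu_lab / pu_total, member_q_slots(pu_lab, pu_total, member_q_total)


# Starting totals are from this page; the future contributions are hypothetical.
future = [(500.0, 170), (1000.0, 340), (2000.0, 680)]
for share, slots in grow(22.7, 20186, 7023, future):
    print(f"share = {share:.4%}, member.q slots = {slots}")
```

The hypothetical contributions add 0.34 slots per PU, slightly fewer than today's average of about 0.348. This is why the unrounded slot count drifts down a little even though the rounded value does not change.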

Example: Additional contribution from the Charlie Lab

Assume that the most recent addition came from the Charlie Lab, which contributed 4 compute nodes. Each of these machines has a 12-core 2.2 GHz Opteron 6174 CPU and scores 1.6 PUs in the benchmarking, so the contribution adds 4 * 1.6 PUs = +6.4 PUs, both to the lab's own total and to the cluster as a whole. In addition to increasing the total number of contributed PUs, the lab’s contribution also increases the total number of member.q slots on the cluster by 4 * 12 = +48 slots.
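The arithmetic for a contribution of identical nodes can be sketched in Python as below. `NodeSpec` and `contribution_delta` are hypothetical names, and the one-slot-per-core assumption is taken from this example rather than from a documented rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """One contributed compute node: its core count and benchmarked PU score."""
    cores: int
    pu: float


def contribution_delta(nodes: int, spec: NodeSpec) -> tuple[float, int]:
    """Return (PUs added, member.q slots added) for `nodes` identical nodes.

    Assumes one member.q slot per core, as in the Charlie Lab example.
    """
    return nodes * spec.pu, nodes * spec.cores


# Charlie Lab: 4 nodes, each a 12-core Opteron 6174 benchmarked at 1.6 PU.
opteron_6174 = NodeSpec(cores=12, pu=1.6)
added_pu, added_slots = contribution_delta(4, opteron_6174)
print(f"+{added_pu:.1f} PUs, +{added_slots} member.q slots")  # +6.4 PUs, +48 member.q slots
```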

If this was Charlie Lab’s first contribution to Wynton HPC, their share on the member.q queue is PU_lab / PU_total = 6.4 / 20186 = 0.032%. This PU share translates to member.q_lab = (PU_lab / PU_total) * member.q_total = 2 member.q slots (2.23 rounded to the nearest integer). If they had instead already contributed a total of, say, 16.3 PUs in the past, their computational share would have become PU_lab / PU_total = (16.3 + 6.4) / 20186 = 0.112%, which corresponds to 8 member.q slots (7.90 rounded).
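Both scenarios can be reproduced with a short Python sketch. `share_and_slots` is a hypothetical helper, and the two totals are the values quoted on this page, which change as the cluster grows.

```python
PU_TOTAL = 20186       # contributed PUs on Wynton HPC (value quoted on this page)
MEMBER_Q_TOTAL = 7023  # member.q slots on Wynton HPC (value quoted on this page)


def share_and_slots(pu_lab: float) -> tuple[float, float, int]:
    """Return (PU share, unrounded member.q slots, rounded member.q slots)."""
    share = pu_lab / PU_TOTAL
    exact = share * MEMBER_Q_TOTAL
    return share, exact, round(exact)


scenarios = [("first contribution", 6.4), ("with 16.3 earlier PUs", 16.3 + 6.4)]
for label, pu_lab in scenarios:
    share, exact, slots = share_and_slots(pu_lab)
    print(f"{label}: {share:.3%} of PUs -> {exact:.2f} -> {slots} member.q slots")
```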

Current Compute Shares

The table below shows the current contributions in terms of processing units (PU) and the corresponding number of member.q slots per contributing lab.

Source: compute_shares.tsv produced on . These data were compiled from the current SGE configuration (qconf -srqs member_queue_limits and qconf -sprj <project>). In SGE terms, a processing unit (PU) corresponds to a functional share (“fshare”).

(*) To be documented.
(**) member.q_lab does not remain exactly the same even when PU_lab does not change, because newer hardware delivers more compute power per core than older hardware. As a result, a lab’s number of member.q slots is likely to decrease ever so slightly in the long run as the cluster keeps growing. But don’t worry: because the average compute power per member.q slot increases over time, your lab’s total compute power on the member.q queue remains constant by definition (unless your lab adds further contributions).
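The footnote's reasoning can be checked with a small Python simulation. It assumes, hypothetically, that other labs add 500 cores per year of newer hardware benchmarked at 3.0 PU per core, somewhat above today's average of about 2.87 PU per member.q slot (20186 / 7023). It also interprets "compute power per member.q slot" as PU_total / member.q_total. Under those assumptions, the lab's unrounded slot count drifts from about 7.90 down to about 7.84 over three years, while slots times PU per slot stays equal to the lab's 22.7 PUs (up to floating-point rounding).

```python
PU_LAB = 22.7  # hypothetical lab; its PUs never change because it stops contributing


def lab_slots(pu_total: float, member_q_total: int) -> float:
    """Unrounded member.q slots: the lab's PU share of all member.q slots."""
    return PU_LAB / pu_total * member_q_total


def simulate(
    years: int, cores_per_year: int = 500, pu_per_core: float = 3.0
) -> list[tuple[int, float, float, float]]:
    """Return (year, slots, PU per slot, lab compute power) for each year.

    Assumption: other labs add `cores_per_year` cores of newer hardware each
    year, benchmarked at `pu_per_core`, above today's ~2.87 PU per slot.
    """
    pu_total, member_q_total = 20186.0, 7023  # current totals from this page
    rows = []
    for year in range(years + 1):
        slots = lab_slots(pu_total, member_q_total)
        pu_per_slot = pu_total / member_q_total
        # Slots times average PU per slot recovers PU_LAB, the lab's compute power.
        rows.append((year, slots, pu_per_slot, slots * pu_per_slot))
        pu_total += cores_per_year * pu_per_core
        member_q_total += cores_per_year
    return rows


for year, slots, pu_per_slot, power in simulate(3):
    print(f"year {year}: {slots:.2f} slots x {pu_per_slot:.3f} PU/slot = {power:.1f} PU")
```

If `pu_per_core` is set below the current average PU per slot instead, the same code shows the slot count creeping up rather than down; the direction of the drift depends only on how new hardware compares with the cluster average.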