Monash Campus Cluster 2013 Report
The Monash Campus Cluster (MCC) is a high-performance and high-throughput computational resource for Monash University researchers. It commenced operation in 2005, initially with 32 dual-core AMD Opteron compute servers; over the years, the MCC has grown to over 250 nodes of diverse processor and memory configurations.
The MCC nodes are interconnected via the Monash data centre’s gigabit Ethernet and use HSM-based NFS from LaRDS for storage of user and computational data files. The MCC has proven to be a stable and resilient platform for computing, with only rare service outages throughout its many years of service.
A distinguishing characteristic of the MCC is its ability to accommodate researcher-defined hardware procurements via a co-funding partnership arrangement. More than half of the MCC’s current CPU capacity comes from partner contributions to the facility; this indicates our partners’ confidence in our ability to deliver a Monash HPC/HTC capability effectively.
Key Accomplishments and Successes
- The MCC implemented a user-support ticket system through Jira in October 2013. This much-needed service has enabled user requests and issues to be tracked across staff members and resolved more efficiently. As of May 2014, the MCC team has processed over 350 user help requests.
- A new online MCC user guide was launched in November 2013. The website is structured so that users can find information on how to prepare and launch jobs on the MCC, along with helpful information on getting started. It supersedes the previous out-of-date documentation on the cluster. The website is hosted on the MeRC Confluence wiki at:
- In March 2013, the MCC assisted in the transfer of MASSIVE operations to Monash University. As a result of this successful transfer, the MCC and MASSIVE HPC support teams were brought together to form a single group, with both teams co-located at Building 75. This has presented various opportunities for coordination, knowledge transfer and collaboration. Further team integration and coordination are planned for 2014.
- The MCC Team has been prototyping an extension of the cluster to use NeCTAR nodes deployed within the university’s R@CMon implementation of NeCTAR. To date, 15 virtual machines have been built and configured on a separate MCC queue, and key users from Chemistry, Business and Economics, Science and Engineering have been invited to make use of the facility. As of January 2014, Dr Ekaterina Pas has received a merit allocation of 120 cores, which is currently instantiated on the eResearch SA cell.
- Monash researchers from Engineering and Science received just under five million service units (SUs) through the NCI NCMAS at the ANU National Facility. During the 2013 NCI merit allocation scheme, two Engineering research groups received a total of 2.7 million SUs, with Geoscience and Chemistry receiving over one million SUs each. In addition, Monash researchers received a top-up of 8.5 million SUs through the Monash LIEF grant.
- We have deployed Gaia, a sub-cluster for Quantum Chemistry. Gaia consists of a single Dell M1000e chassis with 16 Dell M620 blade servers and a Mellanox InfiniBand QDR switch. This is a joint project with Dr Katya Pas of Chemistry, who secured a Junior Infrastructure Grant of $100K from the Faculty of Science in the last quarter of 2012. The servers were purchased and delivered before the closedown of 2012 and became operational in August 2013. Utilisation of the facility is near 95%.
- In February 2014, the team commissioned six servers as part of the co-funded partnership engagement with Prof Hugh Blackburn of Engineering. This has resulted in a co-investment of approximately $40K from eSolutions to provision researcher-defined hardware in the form of six Dell R815 AMD Opteron servers, each equipped with 400 GB of SSDs and about 4 TB of SAS storage.
- Throughout 2013, the MCC team supported the computational and storage workflow requirements of the Next Generation Sequencing facility at MIMR. This includes the development and support of a semi-automated system for converting raw sequencing data from the Illumina HiSeq instrument to data in an open format, e.g., FASTQ. Some of these data are eventually analysed on the Monash Campus Cluster’s high-RAM compute servers.
MCC Key Facts 2013
The table below summarises the growth and CPU usage during the past six years.
- Throughout 2013, 337 active users submitted over 3 million jobs to the MCC: the Faculty of Engineering utilised over 5 million core hours among its 71 users, followed by the Faculty of Science with 3.5 million core hours, then the Faculty of Business and Economics at 2.9 million core hours across 41 users.
- On NCI’s Raijin, the highest utilisation came from the Monash Computational Chemistry group of Dr Ekaterina Pas and the Engineering groups of:
  - Prof Julio Soria;
  - Prof Kerry Hourigan; and
  - Prof Hugh Blackburn.
The graph below shows the growth in core count of the MCC from 2008 to 2013:
The graph below shows the monthly count of active users, i.e., users who have logged in and submitted jobs to the cluster during that month. There are about 50 users running jobs on the MCC at any given time throughout the year.
Finally, this graph shows the monthly core hours throughout the six-year period.
- The diagram above shows the distribution of CPU cores, age of the MCC hardware, and partnerships. Less than 50% of the MCC capacity is for general use, as recently provisioned capacity (in italics) consists of partnership procurements, where the partner maintains prioritised access to these systems.
- The 22 compute nodes gn35 to gn56, which had been in operation since 2006/7, were powered down and decommissioned in Q1 2014. This reduces the power footprint of the MCC in the B28 data centre while making space for new hardware.
- At the start of 2013, 51 Dell blades formed part of the MCC extension using pre-provisioned servers from the eSolutions Servers-and-Storage team. Throughout the year, 32 of these were reclaimed by Servers-and-Storage for provisioning hosts, leaving only 19 nodes in operation (as of April 2014) and reducing the core count by 768.
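The blade figures in the last item above can be cross-checked with simple arithmetic. The short sketch below uses only the numbers quoted in the report; the per-blade core count it derives is an inference that assumes every reclaimed blade had the same number of cores, which the report does not state.

```python
# Figures quoted in the report text.
blades_at_start = 51      # Dell blades in the MCC extension at the start of 2013
blades_reclaimed = 32     # blades reclaimed during the year
cores_removed = 768       # reported reduction in core count

# Blades left in operation after the reclaim.
blades_remaining = blades_at_start - blades_reclaimed

# Assumption (not stated in the report): all reclaimed blades had
# identical core counts, so the reduction divides evenly across them.
cores_per_blade, remainder = divmod(cores_removed, blades_reclaimed)

print(f"Blades remaining: {blades_remaining}")
print(f"Implied cores per reclaimed blade: {cores_per_blade} (remainder {remainder})")
```

Under that assumption the figures are mutually consistent: 19 blades remain, and the 768-core reduction divides evenly across the 32 reclaimed blades.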
MCC Team Composition
In 2013, the combined MCC Team consisted of the core group:
- Dr Shahaan Ayyub
- Mr Philip Chan
- Mr Simon Michnowicz
The MCC’s underlying infrastructure has been supported by eSolutions, mainly:
- eSolutions Servers-and-Storage Team
- eSolutions Networks Team
- eSolutions Production Facilities
The successes of the MCC to date can be attributed to the continued cooperation between the eResearch and eSolutions teams.