SCD Achievements
The production supercomputer environment
managed by SCD for NCAR has evolved over the years. During the last 20
years, SCD has brought NCAR's science into the
multiprocessing supercomputer world. Prior to the introduction of the
4-CPU Cray X-MP in October 1986, all modeling was performed with serial
codes. Since then, the focus has been on redeveloping codes to harness
the power of multiple CPUs in a single system and, most recently, in multiple
systems.
Each server included 64 GB of memory.
The system expansion also included 10.5 TB of formatted disk storage,
which was added to the existing disk subsystem, thereby increasing bluesky's
total disk capacity to 31 TB. Of the 14 servers, only 12 were added to
bluesky; the remaining two servers are temporarily
being used for a special SCD testbed project. At end-FY2004, bluesky
is comprised of 50 POWER4 38 Regatta-H Turbo frames, making it the single
largest system of this type in the world. The 12 additional 32-way P690 SMP servers
were used to support CCSM for contributions to the IPCC process, as reported
in SCD's Annual Budget Review. The installation of the bluesky system and its subsequent augmentation have doubled
the capacities of both the Climate Simulation Laboratory and Community
computing. Further, there were several major system software upgrades performed on all supercomputers.
Supercomputer systems maintained during FY2004
Distributed Shared Memory (DSM) systems:
As
an element of its five-year strategic plan to aggressively evaluate and
deploy potentially more cost-effective new computing technologies, SCD
acquired a large-scale Linux-based supercomputer cluster.
Following a competitive procurement process, IBM was selected to
deliver a 256-processor e1350 Linux cluster.
The system, called lightning, was delivered
in July 2004. It uses 2.2-GHz AMD Opteron processors and
has a peak computational capacity of 1.14 teraflops,
0.5 terabytes of memory, and 7 terabytes
of disk.
Production system performance and utilization statistics
At the end of FY2004, the production
supercomputer environment managed by SCD for NCAR included five IBM supercomputers and
four SGI supercomputers. The following tables provide
average utilization and performance statistics for the production supercomputer
systems SCD operated in FY2004. In addition, SCD publishes monthly usage
reports at http://www.scd.ucar.edu/dbsg/dbs/.
These reports provide summary information on system usage, project allocations,
and General Accounting Unit (GAU) use. The SCD supercomputer resources comprise
two separate computational facilities: the Climate Simulation Laboratory
(CSL) and Community Computing facilities. Some systems, such as the IBM
SP systems, the dave system, and the dataproc system, are shared between these two facilities.
The following sections describe the supercomputing systems available in
these two facilities. The Climate Simulation Laboratory facility
provided the following supercomputing resources at the end of FY2004:
Community Computing facility: The Community Computing facility provided the following supercomputing resources at the end of FY2004:
Key maintenance activities
During FY2004, SCD provided ongoing maintenance
activities to ensure the integrity and reliability of existing computational
systems and improved the quality of service to the NCAR user community.
Some of the key areas were:
Maintain supercomputer operating systems
Maintain stability and reliability of systems
System monitoring
Computer Security and Divisional Threat Response
SCD manages a diverse computational and data storage environment containing high-end computers, mass storage subsystems, data archives, visualization, e-mail, DNS, authentication and web servers, and networks (including IP telephony). Not only are these systems valuable monetarily, they comprise vital scientific research tools and business continuation systems used by the UCAR/NCAR organization and university communities.
In response to a major cybersecurity incident that involved multiple high-performance computing sites in March 2004, SCD rapidly developed and deployed a long-term solution for protecting the supercomputing and mass storage systems at NCAR. SCD now requires one-time password tokens, arbitrated via encryption devices issued to all users, to access these systems. Security procedures were updated and published to provide all users with guidelines and instructions for working within the secure supercomputing environment.
One of the problems encountered during the March 2004 incidents was a lack of effective communication among the affected institutions. SCD proposed a conference to bring together stakeholders from the nation's research and high-performance computing centers to prepare a coordinated response for future incidents. With funding from the National Science Foundation (NSF), SCD planned, organized, and hosted a two-day Cybersecurity Summit near Washington D.C. Attended by over 120 cybersecurity experts from some of the nation's leading research institutions, the summit explored the competing needs of having an open, collaborative research environment while protecting the security and integrity of its computing and data assets.
Cybersecurity Summit 2004 was the first step in laying the foundation for responding to future large-scale security breaches and reducing the disruptive impact of such incidents on the nation's research agenda. These research institutions are increasing their cooperation on security policies, procedures, and incident response to better protect the nation's scientific computing and data resources.
Data Archiving and Management: The Mass Storage System (MSS)
The NCAR Mass Storage System (MSS) is a large-scale data archive that stores data used and generated by climate models and other programs executed on NCAR's supercomputers and compute servers. At the end of FY2004, the NCAR MSS held more than 25 million files containing over 1,247 unique terabytes (TB) of data, and total holdings exceeded 2,149 TB (2.1 petabytes) when duplicate copies are included. The net growth rate of unique data in the MSS was approximately 30 TB per month. On average, 160,000 cartridges are being mounted each month, approximately 1% (1,000) of these by operators and 99% in the StorageTek Powderhorn Automated Cartridge Subsystems (ACS). The StorageTek Powderhorn ACS systems (also called "silos") use robotics to mount and dismount cartridges. On a daily basis, the MSS handles approximately 41,000 requests resulting in the movement of over 3,900 GB of data. During FY2004, data transfers servicing user requests to and from the MSS exceeded 1,400 TB.
While some of the data stored on the NCAR MSS originate from field experiments and observations, the bulk of the data is generated by global climate-simulation models and other earth-science models that run on supercomputers. SCD therefore faces an increasing demand to archive the data generated by increasingly powerful supercomputers. As supercomputers become larger and faster, they generate more data to be archived. Ever-greater demands for archiving data will result from the growing use of coupled atmospheric/oceanic simulation models.
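The daily and annual transfer figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, decimal units (1 TB = 1000 GB) are assumed, and the daily figure is treated as constant over a 365-day year.

```python
# Figures quoted in the text for FY2004.
REQUESTS_PER_DAY = 41_000
GB_MOVED_PER_DAY = 3_900


def mean_request_size_mb(gb_per_day, requests_per_day):
    """Average data moved per request in MB, assuming 1 GB = 1000 MB."""
    return gb_per_day * 1000 / requests_per_day


def annual_transfer_tb(gb_per_day, days=365):
    """Yearly data movement in TB, assuming a constant daily rate and 1 TB = 1000 GB."""
    return gb_per_day * days / 1000


if __name__ == "__main__":
    size = mean_request_size_mb(GB_MOVED_PER_DAY, REQUESTS_PER_DAY)
    yearly = annual_transfer_tb(GB_MOVED_PER_DAY)
    print(f"mean data moved per request: {size:.0f} MB")
    print(f"data moved per year:         {yearly:.0f} TB")
```

Under these assumptions the yearly total comes out slightly above 1,400 TB, which is consistent with the annual transfer volume reported in the text.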
During FY2004, the NCAR Mass Storage System grew from 20,340,049 files with a total of 880 unique TB to 25,121,621 files with a total of 1,247 unique TB. Total holdings grew from 1,500 TB (1.5 PB) to over 2,149 TB (2.149 PB). This was an average net growth rate of 30 unique TB (60 total TB) per month during FY2004.
The MSS Today
MSS Access Methods
During FY2004, the technology used to
access MSS data continued to undergo substantial change. A migration is
underway from the use of the older, non-commodity, High Performance Parallel
Interface (HiPPI) technology to the use of Gigabit
Ethernet (GigE) and Fibre
Channel (FC) technologies. The HiPPI technology
provides direct storage-device access via the High-Performance Data Fabric
(HPDF). The data fabric consists of HiPPI channel
interfaces to host computers, non-blocking HiPPI
switches capable of supporting multiple bi-directional 100 MB/sec data
transfers, and protocol converters that connect the HiPPI
data fabric to the IBM-style device control units. To utilize the HPDF,
SCD staff wrote a file transport type of interface to enable users to
copy files between their host systems and the MSS. At the end of FY2004,
the HPDF data fabric supports 12 independent file transfer operations
between the tape devices and the compute servers sustaining 10 MB/sec
each, for an aggregate total of 120 MB/sec. HiPPI technology continues to be deployed only in a niche
market. It has not shown signs of spreading into the commodity marketplace,
and as a result the cost of HiPPI technology
has remained high and the number of HiPPI vendors
is small. The lack of availability of and support for HiPPI technology is becoming a critical issue to the continued
operation of the MSS. To alleviate these issues, SCD staff
wrote the UNIX-based Storage Manager (STMGR), which replaces the HPDF
as the method used to access data by host systems. STMGR isolates the
client host systems from directly accessing the storage devices, simplifying
the code SCD has to write and maintain for each type of host operating
system. It also eliminates the need for HiPPI
channel interfaces and device drivers on the client hosts. In place of
HiPPI, commodity TCP/IP networking is used to
access STMGR from the client host systems. Client host systems can use
any available network interface at any speed to access files on the MSS.
Currently, when using GigE, data rates in the range
of 30-60 MB/sec are easily achievable with recent computer hardware. Using
high-speed Ethernet as the client system interconnect means that future
deployment of higher-speed GigE will automatically
raise the capacity of the client system interconnect. The use of UNIX systems for STMGR allows
SCD to deploy the latest storage hardware and software technologies to
manage MSS data. STMGR server systems initially use a FC Storage Area
Network (SAN) to access RAID and tape drives via a high-reliability switch.
Fibre Channel is currently available in versions that support
either approximately 100 MB/sec or 200 MB/sec bidirectionally.
Multiple FC connections may be made between STMGR servers and storage
devices, and aggregate I/O rates approaching 1 GB/sec are possible with
commodity components on a single STMGR server. The use of FC RAID plus
journalling file systems allows STMGR to improve the robustness
and flexibility of the disk cache. Also, MSS administrators can have STMGR
reallocate resources between disk cache partitions or add space to disk
cache partitions on the fly without interruption to MSS clients. Near the end of FY2003, STMGR was placed
into production as a replacement for the old IBM 3390 disk farm.
The old disk farm could store approximately 180 GB, was used to buffer
files that were smaller than 15,000,000 bytes long, and supported an aggregate
transfer rate of 12 MB/sec. During the initial deployment, the STMGR disk
cache stored approximately 500 GB and supported an aggregate transfer rate
approaching 120 MB/sec. During FY2004, the STMGR disk cache was increased
to approximately 8 TB to buffer files up to 50 MB in size. In FY2005,
the STMGR disk cache will grow to approximately 60 TB, will buffer
files of all sizes, and will support an aggregate transfer rate approaching
400 MB/sec. A disk cache of this size will permit newly written files
to reside in the cache longer, which will reduce the number of tape mounts
and tape I/O. STMGR will also, with further improvements in MSS software,
allow better tape utilization by allowing files with differing storage
requirements to be segregated on separate tape media. Both of these improvements
will reduce the total number of tape drives that will be required to support
the aggregate data rates between the MSS and the client host systems.
Also during FY2004, the use of HiPPI was reduced for newly written tape files when STMGR assumed the role of providing tape access. HiPPI can then be decommissioned in FY2005 once all data has oozed off the StorageTek 9840A media. New tape devices, such as the StorageTek T9940B Fibre-Channel-attached drive, store up to 200 GB and support I/O rates in the neighborhood of 30 MB/sec. This will be an improvement of 3 times in both storage density and transfer rate over the current tape devices. These improvements will allow the MSS to expand into the multi-petabyte range while reducing the latency to access MSS files.
MSS Storage Hierarchy
The NCAR MSS currently uses two levels
of storage: online and offline. The most frequently accessed data are
kept on the fastest storage media, the online storage devices:
8 TB of Fibre Channel RAID storage
and five StorageTek Powderhorn
ACSes. The Powderhorn
ACSes use StorageTek 9840A and
9940A technology.
Currently, the NCAR MSS has five ACSes providing
a total online capacity of approximately 2 petabytes.
The total capacity of the online archive will exceed 6 petabytes
utilizing 9940B 200 GB cartridges. Expansion of the MSS storage hierarchy
is planned over the next five years with the introduction of new tape
technologies, new ACSes, and with the integration
of a multi-terabyte disk farm cache. Simulations
of the MSS workload indicated that a 60-TB disk farm
cache can reduce the amount of tape readback
activity by as much as 60%. The disk farm cache
would not only reduce the number of tape drives required in the system
but also provide a much-improved response time to read and write requests.
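A workload simulation of the kind mentioned above can be sketched in a few lines. The example below is not the MSS simulator: it assumes a least-recently-used (LRU) replacement policy, a synthetic request stream with skewed file popularity, and hypothetical function names. It replays read requests through a disk cache of a given size and reports the fraction of requested bytes that would still have to be read back from tape.

```python
import random
from collections import OrderedDict


def simulate_lru_cache(requests, capacity_bytes):
    """Replay (file_id, size_bytes) read requests through an LRU disk cache.

    Returns the fraction of requested bytes that had to come from tape.
    """
    cache = OrderedDict()  # file_id -> size, least recently used first
    used = 0
    tape_bytes = 0
    total_bytes = 0
    for file_id, size in requests:
        total_bytes += size
        if file_id in cache:
            cache.move_to_end(file_id)  # hit: mark as most recently used
            continue
        tape_bytes += size  # miss: the file is read back from tape
        if size > capacity_bytes:
            continue  # file is too large to be cached at all
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)  # evict the LRU file
            used -= evicted_size
        cache[file_id] = size
        used += size
    return tape_bytes / total_bytes if total_bytes else 0.0


def synthetic_workload(n_requests, n_files, seed=0):
    """Hypothetical request stream: a few files are requested far more often."""
    rng = random.Random(seed)
    sizes = [rng.randint(1, 50) * 1_000_000 for _ in range(n_files)]
    weights = [1.0 / (rank + 1) for rank in range(n_files)]
    file_ids = rng.choices(range(n_files), weights=weights, k=n_requests)
    return [(file_id, sizes[file_id]) for file_id in file_ids]


if __name__ == "__main__":
    workload = synthetic_workload(n_requests=20_000, n_files=2_000)
    for capacity_gb in (1, 5, 20):
        fraction = simulate_lru_cache(workload, capacity_gb * 1_000_000_000)
        print(f"{capacity_gb:>3} GB cache: {fraction:.1%} of bytes read from tape")
```

Running the sketch with increasing cache sizes shows the general trend the text describes, a larger cache leaving fewer bytes to be read from tape; the actual percentages depend entirely on the assumed workload.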
In addition, the MSS Group will continue to evaluate hardware and software
solutions being developed by vendors throughout FY2005 and how they might
be integrated into the NCAR MSS.
MSS Import/Export Capability
Another important capability of the NCAR
MSS is the ability to import and export data to and from external portable
media. Importing data involves copying data from portable media to the
MSS data archive, while exporting data involves copying data from the
MSS data archive to portable media. Import/export allows
users to bring data to NCAR with them, as well as take data away. Import
also allows data from field experiments to be copied to the NCAR MSS archive.
Options to exchange data with smaller
satellite storage systems are being investigated. Using this technique,
data generated at NCAR could be transferred to remote sites for further
analysis. The NCAR SCD storage model would thus be geographically distributed,
rather than centrally located and administered.
In addition to 3480 and 3490E cartridge tapes, the NCAR MSS also offers import/export to single and double-density 8mm Exabyte cartridge tapes. The deployment of an MSS-IV Import/Export server in FY2000 provided the ability to support many more device types, such as CD-ROM, DAT, and newer Exabyte media, to name a few.
MSS Accomplishments for 2004
Disk farm cache simulator
To aid capacity planning and performance
tuning of the MSS, a simulator that includes all the major hardware and
software components of the MSS was developed in 2003. The simulator enables
the MSS group to consider different design alternatives for new software
and hardware components and estimate how the different designs will perform
before the components are added to the actual system. Simulation studies
were conducted in 2004 using an earlier version of this simulator (that
only simulated the disk cache component of the MSS) to aid in configuring
and sizing the STMGR disk cache system. In
addition, simulator output was combined with MSS warehouse information
to help measure the effectiveness of external data caches which were deployed
to avoid rereading data from the MSS, thus avoiding the abuse of a data
archive as a file server. The external cache deployment resulted in as
much as a 60% drop in such re-reads.
StorageTek 9940B Technology
Initial deployment of 20 StorageTek 9940B tape drives was completed in FY2004. Managed by the STMGR, these drives are servicing the files offloaded from the disk cache and local system backup files. An additional 20 9940B tape drives will be installed in early FY2005, and with the expansion of the disk cache, a data ooze will be started in FY2005 to replace the 9940A technology.
As a result of the SCD-held user forum
on computing issues, MSSG compiled for SCD's
Consulting Office a short list of do's and don'ts regarding the NCAR Mass
Store to help guide users toward efficient and proper use of the MSS.
New MSS Hosts
The IBM eSeries
Linux Cluster, named lightning, was provided with Mass Store connectivity
in 2004.
NCAR Mass Storage System growth during
FY2004 increased over FY2003. Average net growth rate during FY2003 was
27 TB per month, whereas the average net growth rate during FY2004 was
30 TB per month. This increase in the growth rate can be attributed to
several factors, such as new MSS hosts coming online, increased amounts
of local disk storage on several machines (which increases
the size and number of MSS backup files), the IPCC initiative, and 14
computing nodes added to the IBM POWER4 cluster (bluesky).
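The monthly growth rate follows directly from the year-end holdings of unique data reported earlier in this section (880 TB and 1,247 TB). The short sketch below reproduces that figure and adds a simple straight-line projection; the function names are hypothetical, and the constant-rate projection is an assumption made purely for illustration, not a forecast from the report.

```python
def monthly_growth_tb(start_tb, end_tb, months=12):
    """Average net growth per month between two year-end holdings."""
    return (end_tb - start_tb) / months


def project_holdings_tb(current_tb, rate_tb_per_month, months):
    """Straight-line projection, assuming the monthly rate stays constant."""
    return current_tb + rate_tb_per_month * months


if __name__ == "__main__":
    rate = monthly_growth_tb(880, 1_247)
    print(f"average growth in unique data: {rate:.1f} TB per month")
    for years in (1, 3, 5):
        projected = project_holdings_tb(1_247, rate, 12 * years)
        print(f"after {years} more year(s) at that rate: {projected:,.0f} unique TB")
```

Because the text expects the growth rate itself to keep rising, a constant-rate projection of this kind should be read as a lower bound.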
Further increases in the net growth rate are expected in FY2005 with the
addition of two Linux clusters in early FY2005. Projecting this growth into
the future, it is not difficult to realize that new storage paradigms
and user education will be required, since without this the growth in
just three to five years will be untenable. The following table compares year-end
statistics for FY2000 - FY2004.
Future Plans for the MSS
Key issues to be addressed over the next
four years include:
Computational Science Research
CSS's mission is to help realize the end-to-end scientific simulation environment envisioned by the NCAR Strategic Plan. To this end CSS's role is to benchmark and evaluate computer technology, learn to extract performance from it, pioneer new and efficient numerical methods, create software frameworks to facilitate scientific advancement, particularly through interdisciplinary collaborations, and share the resultant software and findings with the community through open source software, publications, talks, and websites.
Applied Computer Science Research Activities
In 2004, CSS applied computer science efforts centered on three activities: studying experimental, massively parallel architectures such as Blue Gene/L; benchmarking and evaluating Linux clusters as part of an SCD procurement; and porting applications to Linux-Itanium systems as part of the Gelato Federation.
Blue Gene/L Application Research
IBM has developed a novel, low-power, densely packaged, massively parallel computer system called Blue Gene/L. Each node of Blue Gene/L consists of dual PowerPC 440 cores running at 700 MHz. Each core is capable of two floating-point multiply-adds per clock cycle, and 1,024 nodes can be packed into a single 19-inch rack. Thus a single rack of Blue Gene/L processors has a peak speed of 5.6 teraflops. This is achieved while consuming about 15 kW of electrical power, a tiny fraction of that consumed by conventional massively parallel systems. Apart from its low power and dense packaging, Blue Gene/L has several interesting architectural characteristics, for example a dedicated tree reduction and synchronization network, as well as a toroidal interconnect.
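The peak-speed figure for a Blue Gene/L rack can be reproduced from the node parameters given above. The sketch below is a back-of-the-envelope check with hypothetical function names; it assumes that one multiply-add counts as two floating-point operations and that one teraflop is 10^12 operations per second.

```python
def peak_teraflops(nodes, cores_per_node, clock_hz, multiply_adds_per_cycle):
    """Theoretical peak speed, counting each multiply-add as two operations."""
    flops_per_cycle = 2 * multiply_adds_per_cycle
    return nodes * cores_per_node * clock_hz * flops_per_cycle / 1e12


def gigaflops_per_watt(teraflops, watts):
    """Peak operations per second delivered per watt of electrical power."""
    return teraflops * 1000 / watts


if __name__ == "__main__":
    rack = peak_teraflops(nodes=1024, cores_per_node=2,
                          clock_hz=700e6, multiply_adds_per_cycle=2)
    print(f"peak per rack:   {rack:.2f} teraflops")
    print(f"peak efficiency: {gigaflops_per_watt(rack, 15_000):.2f} gigaflops per watt")
```

With exactly 1,024 nodes the product is a little over 5.7 teraflops, close to the rounded 5.6 teraflops quoted in the text.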
Figure 1: 512 nodes (1,024 processors) of an IBM Blue Gene/L system (photograph courtesy of IBM Research).
In 2004, scientists in CSS, in collaboration with researchers from CU-Denver and CU-Boulder, submitted a proposal to the NSF's Major Research Instrumentation program. The objective was to acquire a 1,024-node Blue Gene/L system to study the performance of scalable applications on it, and to evaluate its production capabilities. This proposal was recently funded by the NSF, and SCD is currently in negotiations with IBM to obtain a Blue Gene/L supercomputer for evaluation. The system will be used for high-resolution studies of moist physical processes, employing the cloud-resolving convection parameterization (CRCP). Because of its scalability, the primitive equations dynamical core for these studies of CRCP physics will be the section's prototype spectral element model, HOMME.
Throughout the past year, members of CSS, working closely with IBM computer scientists, have been benchmarking a 512-node Blue Gene/L prototype located at IBM's T.J. Watson Research Center. The benchmarks have been chosen to measure the system's performance on key algorithms drawn from our proposed atmospheric science projects. All-to-all, point-to-point, and global reduction communication benchmarks have been used to measure the capabilities of the Blue Gene/L's networks, and prototype CRCP physics packages have been ported and optimized.
Benchmarking, Porting and Performance Modeling Activities
CSS has also been extensively involved in evaluating and benchmarking clusters for the recent procurement by SCD of a 256-processor Linux-based system. This procurement resulted in the acquisition of an Opteron/Myrinet Linux cluster, which achieved performance levels 1.3-1.4 times higher than an equivalent number of IBM 1.3-GHz POWER4 processors. In 2004, CSS performed extensive testing and evaluation of the IBM "Federation" interconnect.
CSS has continued to expand its engagement with computer science students. Dr. Henry Tufo in CSS has played a key role in exploiting this opportunity by leveraging his joint appointment as a Computer Science professor at the University of Colorado to involve four graduate students in NCAR research problems. Students of Dr. Tufo are working in the areas of application porting and tuning, Linux cluster system administration, and Grid computing applications. CSS staff also provided technical support to computer science students in a course taught by John Halley at the University of San Diego, in which NCAR applications were ported to a variety of platforms.
Gelato Membership
The Itanium very long instruction word (VLIW) architecture represents an important departure from the traditional superscalar RISC microprocessor and CISC-like Pentium architectures used in the geosciences departments at most universities today. Since the VLIW relies on the compiler rather than on-chip circuitry to extract parallelism from the instruction stream, developing robust optimizing compilers for Itanium is critical. As Itanium microprocessors become plentiful in the geosciences community, access to reliable compilers, ported modeling applications, and open-source high-performance mathematical libraries optimized for this architecture will enable scientific progress on Itanium Linux systems.
In the past year, CSS's role as a member of the Gelato Federation, an organization devoted to the advancement of the Linux-Itanium technical solution, has been to "beta test" the Intel Fortran and C++ compilers on the Intel Itanium and Itanium-2 processors by porting and tuning a variety of applications, such as CAM2 and MM5, to this platform. In this capacity, CSS has closely collaborated with SGI to port CCSM to the Altix (Itanium-based) shared-memory architecture. As a result, CCSM has recently been validated and has successfully demonstrated exact restart capability on this platform.
Applied Mathematics Research Activities
The research activities of the Computational Science Section (CSS) at NCAR are focused on three broad goals. First, work sponsored by the Department of Energy's Climate Change Prediction Program (CCPP) is developing a new generation of accurate, efficient, and scalable general circulation models, based on high-order methods and suitable for use by the atmospheric research community. To this end, CSS has conducted applied mathematical research, tested novel numerical algorithms using the standard test cases of the atmospheric science research community, and created efficient software implementations of these algorithms. CSS has also been working to integrate two physics packages into these models: the physics in the Community Atmospheric Model Version 2 (CAM2), recently used for IPCC simulations as a component of the Community Climate System Model (CCSM) (Blackmon, et al. 2001), and a Cloud Resolving Convective Parameterization (CRCP) sub-grid scale physics scheme acquired through a collaboration with the Cloud Dynamics Group in the MMM division at NCAR.
New Semi-Implicit Implementation
In 2004, CSS completed re-implementing a semi-implicit time step for the spectral element primitive equations. As before, the 3D governing primitive equations were specified in curvilinear coordinates on the cubed sphere combined with a hybrid pressure vertical coordinate. The new non-staggered formulation eliminates the interpolation for nonlinear terms that caused problems for the staggered semi-implicit formulation during year seven of our research. The new dry dynamical core, based on a non-staggered weak formulation, has been validated using the standard 1,200-day Held-Suarez test problem. The semi-implicit solver of this model is based on vertical eigenmode decomposition and an iterative conjugate-gradient elliptic solver.
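To make the role of the iterative elliptic solver more concrete, the sketch below applies preconditioned conjugate gradients to a small one-dimensional Helmholtz-type problem. It is an illustration under stated assumptions rather than the HOMME solver: the operator is a simple finite-difference stand-in, the diagonal (Jacobi) preconditioner only mimics the idea of scaling by a locally varying weight, and all names are hypothetical.

```python
def make_helmholtz(weights, alpha):
    """Build A = diag(weights) + alpha * K for a 1D grid with zero boundary values.

    K is the standard second-difference matrix (2 on the diagonal, -1 beside it),
    so A is symmetric positive definite for positive weights and alpha.
    Returns the matrix-vector product and the inverse of the diagonal of A.
    """
    n = len(weights)

    def matvec(x):
        y = []
        for i in range(n):
            left = x[i - 1] if i > 0 else 0.0
            right = x[i + 1] if i < n - 1 else 0.0
            y.append(weights[i] * x[i] + alpha * (2.0 * x[i] - left - right))
        return y

    inv_diag = [1.0 / (w + 2.0 * alpha) for w in weights]
    return matvec, inv_diag


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(matvec, b, inv_diag, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradients with a diagonal preconditioner.

    Returns the approximate solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the zero initial guess
    z = [d * ri for d, ri in zip(inv_diag, r)]
    rz = dot(r, z)
    if rz == 0.0:
        return x, 0  # right-hand side is zero, so x = 0 is the solution
    p = list(z)
    b_norm = dot(b, b) ** 0.5
    for iteration in range(1, max_iter + 1):
        ap = matvec(p)
        step = rz / dot(p, ap)
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, iteration
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    weights = [1.0 + 0.5 * (i % 5) for i in range(50)]  # varying local weights
    matvec, inv_diag = make_helmholtz(weights, alpha=4.0)
    rhs = [1.0] * 50
    solution, iterations = pcg(matvec, rhs, inv_diag)
    residual = max(abs(a - b) for a, b in zip(matvec(solution), rhs))
    print(f"converged in {iterations} iterations, max residual {residual:.2e}")
```

The time-step trade-off noted in the text shows up here as the iteration count: each semi-implicit step pays for a solve of this kind, so a cheaper or more effective preconditioner directly reduces the cost per step.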
In tests, the performance of the solver has been greatly improved using a simple preconditioner proportional to the determinant of the metric tensor. The vertical eigenmode with the largest velocity is the last to converge and effectively controls the rate of integration. To be useful, the longer time-step allowed by the semi-implicit method must overcome the additional cost of the Helmholtz solver. Preliminary tests indicate that the semi-implicit integration rate is at least three times faster than the explicit spectral element dynamical core on a single processor. Scalability tests of the new formulation are planned for later in 2004.
Adaptive Mesh Refinement of Non-conforming Spectral Elements
Year 2's success rests on the outstanding work of Amik St-Cyr and John Dennis, two very promising young scientists at NCAR. In the past year they successfully developed and implemented a multi-level AMR version of the section's spectral element dynamical core. This is a fully parallel code based on the geometrically non-conforming SEM of Kruse and Fischer combined with a novel tree management strategy for AMR on the cubed-sphere called HAMR (HOMME AMR). Time-stepping restrictions caused by refinement are partially alleviated by employing the novel nonlinear operator integration factor splitting (OIFS) scheme of Thomas and St-Cyr. (As an added benefit, the resulting 3D equations are well-posed under AMR as OIFS does not require local time-stepping.) Our refinement/de-refinement technology is based on the error estimator work for spectral element methods of C. Mavriplis. Though validation testing is not complete, the current release of the code has been validated on several of the shallow water test cases of Williamson, in particular test case 5. Other highlights of year 2 include numerous journal publications, conference presentations, and the involvement of several CU graduate students in the project.
Figure 2: Adaptively refined non-conformal spectral elements tracking a cosine bell test shape in shallow water equations.
After investigating the currently available packages to support AMR, the decision was made in September 2003 to build our own package to support AMR on the cubed-sphere. Using the static non-conforming code developed in year 1 as a guide, an entirely new AMR implementation was developed for HOMME. The HOMME AMR implementation (HAMR) is based on the TFS communication library of Tufo. TFS is a scalable direct stiffness summation package with low setup cost. It uses unique global IDs to pair shared degrees of freedom in a distributed environment. HAMR is designed around the concept of a distributed graph, while a lightweight bit-shifting tree algorithm is used to maintain inheritance properties among the spectral elements. The topology of the cubed-sphere necessitated that a minimum of six separate trees, one tree for each face of the cube, be maintained along with the connectivity information between each tree. Because of the unavoidable need for graph management, it was decided that all spectral element connectivity information be maintained in graph form (versus tree form). This decision allows for an arbitrary selection of the underlying base grid. The distributed graph is updated each time a spectral element is refined or coarsened. Local graph query functions are used to set the proper global degrees of freedom for the TFS library. HAMR has been demonstrated in parallel to support both refinement and coarsening for multiple levels of refinement and achieves load-balancing via element migration.
In 2003, St-Cyr and Thomas developed a novel time-stepping scheme to ameliorate the time-step restrictions encountered under AMR and to maintain well-posedness of the 3D equation set. Merging the OIFS time-stepping required major revisions to the Krylov solvers, as generation of the preconditioning matrices on the fly is nontrivial.
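The bit-shifting tree idea mentioned above can be illustrated with a small quadtree keyed by integers. This is a sketch under assumptions, not HAMR's actual encoding: the key layout (a sentinel bit for the root, two bits per refinement level) and all function names are hypothetical. Each cube face would hold one such tree, giving the six trees described in the text.

```python
ROOT = 1  # sentinel bit so that leading zero child indices are not lost


def child_key(parent, k):
    """Key of child k (0 to 3) of the element with key `parent`."""
    if not 0 <= k < 4:
        raise ValueError("a quadtree element has exactly four children")
    return (parent << 2) | k


def parent_key(key):
    """Key of the element that was refined to create `key`."""
    if key == ROOT:
        raise ValueError("the root element has no parent")
    return key >> 2


def level(key):
    """Refinement level: 0 for the root, plus one per two bits of key."""
    return (key.bit_length() - 1) // 2


def refine(leaves, key):
    """Replace leaf `key` by its four children."""
    leaves.remove(key)
    leaves.update(child_key(key, k) for k in range(4))


def coarsen(leaves, key):
    """Replace the four children of `key` by `key`, if all four are leaves."""
    children = {child_key(key, k) for k in range(4)}
    if not children <= leaves:
        raise ValueError("all four children must be leaves to coarsen")
    leaves.difference_update(children)
    leaves.add(key)


if __name__ == "__main__":
    # One tree per cube face; each starts as a single unrefined element.
    forest = {face: {ROOT} for face in range(6)}
    refine(forest[0], ROOT)
    refine(forest[0], child_key(ROOT, 2))
    print(sorted((key, level(key)) for key in forest[0]))
    coarsen(forest[0], child_key(ROOT, 2))
    print(sorted(forest[0]))
```

In this encoding, parent and child lookups are single shift operations and need no stored pointers. Neighbor connectivity across elements and faces is not captured by the keys, which is consistent with the decision described above to keep connectivity in a separate graph.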
In addition, the HOMME implementation was generalized to remove unnecessary edge rotations. In the earlier version, a special treatment of vector quantities was necessary when on an edge of the cubed sphere. This change was necessary to use the TFS library for the direct stiffness summations. The inter-element trace matching is generalized, and the masks necessary to eliminate doubled corner contributions are generated automatically.
As stated earlier, the OIFS time-stepping approach needs more aggressive preconditioning techniques. Martin J. Gander is collaborating with the team to determine whether an optimized Schwarz preconditioner can be used in the P_N - P_N (non-staggered) version of HOMME. Recent results obtained by Gander and St-Cyr include a proof that changing the preconditioning matrices in the Dryja-Widlund form of the additive Schwarz procedure leads to the optimized iterates. This result will help the community in accepting these novel preconditioning techniques.
Integration of Spectral Element Dynamics with CAM Physics
In 2004, CSS began integrating HOMME explicit dynamics with CAM physics from version cam_2_0_2_dev69. The API between the dynamics and the CAM physics and the necessary CAM program management units was identified. Inconsistencies and incompatibilities with respect to the grid structures were identified and resolved. Most issues related to the initialization of an "Aqua Planet" [Hyashi86] experiment have also been resolved.
Integration of Cloud Resolving Convective Parameterization (CRCP) with CAM Physics
In FY2004, work began on interfacing a Cloud-Resolving Convection Parameterization (CRCP; a.k.a. super parameterization; Grabowski and Smolarkiewicz 1999; Grabowski 2001, 2003) with the HOMME dynamics. CRCP is a novel technique for representing clouds in atmospheric models. The idea is to embed a 2D cloud-resolving model in each column of a large-scale model to represent small-scale and mesoscale processes.
Khairoutdinov and Randall (2001) have tested this approach in the Community Climate System Model (CCSM). A stretched vertical coordinate has recently been implemented in the CRCP code, facilitating direct coupling to a pressure vertical coordinate.
Conservative Advection using Discontinuous Galerkin Method
The Discontinuous Galerkin (DG) Method is a hybrid of finite-element and finite-volume methods, and it provides a class of high-order accurate conservative algorithms for solving nonlinear hyperbolic systems. This method is known for being highly parallelizable and for being able to capture discontinuities of the exact solution without producing spurious oscillations. In FY2004, a DG conservative transport scheme was developed on the cubed-sphere (Nair 2004). This scheme has been further extended to a nonlinear flux-form shallow water model (SW) in curvilinear coordinates on the cubed-sphere. The spatial discretization employs a modal basis set consisting of Legendre polynomials. Fluxes along the element boundaries (internal interfaces) are approximated by a Lax-Friedrichs scheme. A third-order total variation diminishing Runge-Kutta scheme is applied for time integration, without any filter or limiter.
The model has been evaluated using the standard SW test suite proposed by Williamson et al. (1992). The DG scheme shows exponential convergence for shallow water test case 2 (steady-state geostrophic flow). The DG solutions to the shallow water test cases are comparable to those of a standard spectral-element model. Even with high-order spatial discretization, the solutions do not exhibit spurious oscillations for the flow over a mountain test case. However, a spectral-element model or a global spectral model produces spurious oscillations for this particular test. The model conserves mass to the machine precision.
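The ingredients named above (a Lax-Friedrichs interface flux and a third-order TVD Runge-Kutta time integrator) can be seen working together in a much simpler setting. The sketch below is not the cubed-sphere shallow water model: it assumes one-dimensional periodic linear advection and uses the lowest-order DG discretization (piecewise-constant elements, which coincides with a finite-volume scheme) in place of the high-order Legendre basis. All names are hypothetical. It demonstrates why a flux-form scheme conserves mass to rounding error.

```python
import math


def lax_friedrichs_flux(u_left, u_right, speed):
    """Lax-Friedrichs numerical flux for the linear flux f(u) = speed * u."""
    central = 0.5 * speed * (u_left + u_right)
    dissipation = 0.5 * abs(speed) * (u_right - u_left)
    return central - dissipation


def rhs(u, speed, dx):
    """Flux-form tendency for piecewise-constant elements on a periodic grid."""
    n = len(u)
    # flux[i] is the numerical flux through the left face of element i.
    flux = [lax_friedrichs_flux(u[i - 1], u[i], speed) for i in range(n)]
    return [-(flux[(i + 1) % n] - flux[i]) / dx for i in range(n)]


def ssp_rk3_step(u, dt, speed, dx):
    """One step of the third-order TVD (Shu-Osher) Runge-Kutta scheme."""
    stage0 = rhs(u, speed, dx)
    u1 = [a + dt * b for a, b in zip(u, stage0)]
    stage1 = rhs(u1, speed, dx)
    u2 = [0.75 * a + 0.25 * (b + dt * c) for a, b, c in zip(u, u1, stage1)]
    stage2 = rhs(u2, speed, dx)
    return [a / 3.0 + (2.0 / 3.0) * (b + dt * c) for a, b, c in zip(u, u2, stage2)]


def total_mass(u, dx):
    return math.fsum(u) * dx


if __name__ == "__main__":
    n = 100
    dx = 1.0 / n
    speed = 1.0
    dt = 0.5 * dx / abs(speed)  # CFL number 0.5
    # Cosine-bell initial condition on the periodic unit interval.
    u = [0.5 * (1.0 - math.cos(2.0 * math.pi * (i + 0.5) * dx)) for i in range(n)]
    mass_before = total_mass(u, dx)
    for _ in range(200):
        u = ssp_rk3_step(u, dt, speed, dx)
    print(f"mass change after 200 steps: {total_mass(u, dx) - mass_before:.2e}")
```

Every interface flux is added to one element and subtracted from its neighbor, so the sum over all elements can change only through rounding; the same cancellation underlies mass conservation in the high-order scheme.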
Although the scheme does not formally conserve global invariants such as total energy and potential enstrophy, these quantities are better preserved than in existing finite-volume models. Currently, the DG transport scheme is being implemented in the NCAR/SCD High-Order Multiscale Modeling Environment (HOMME).

Radial Basis Functions (RBFs)

CSS has been focusing on two areas of research within RBFs. The first is examining the interpolation properties of oscillatory Bessel RBFs, an entirely new group of RBFs with interesting properties. For example, it has been shown very recently that pseudospectral (PS) approximations are just a subclass of RBF approximations in the flat basis function limit, i.e., as the parameter that controls the shape of the RBF goes to zero. Not only do oscillatory Bessel RBFs possess unconditional nonsingularity of the interpolation matrix for any scattered node distribution, but they are the only class of RBFs immune to divergence of the interpolant in the limit that the shape parameter goes to zero. To further explore the relationship between PS and RBF approximations, it is important to understand the accuracy of oscillatory Bessel RBF interpolation in multiple dimensions. Dr. Natasha Flyer in CSS has proven in one-dimensional space that an oscillatory Bessel RBF expansion on an infinite lattice will exactly reproduce an n-dimensional polynomial of any order. She has gone on to provide an extension of the proof to arbitrary n-dimensional space. This is a great leap forward in RBF theory, as this is the only class of RBFs known to possess this property. Dr. Flyer is working with Dr. Elisabeth Larsson of Uppsala University to extend this result to scattered node locations rather than a lattice.

The second RBF research area is to develop a theory applicable to spherical geometries. The goal of this research is to develop a new grid-free approach using RBFs to solve time-dependent PDEs in spherical domains.
Such an approach is singularity free (due to its independence of any surface-based coordinate system), spectrally accurate for arbitrary node locations on the sphere, and naturally permits local mesh refinement. No other discretization method currently in use in spherical geometries can claim all of these properties.

Modeling Solar Coronal Mass Ejections

The CSS collaboration with HAO studying coronal mass ejections involves three related efforts. The first project is to extend recent results related to magnetic-field confinement in the solar corona. The second project, with Mei Zhang of the Chinese Astronomical Observatory, is to show that there is an upper bound on the amount by which the total magnetic energy in the force-free field for a dipole field configuration can exceed the Aly limit, which is defined by the amount of energy needed to completely open the solar magnetic field (i.e., to have one end of each line of force anchored to the sun and the other running out to infinity). Dr. Flyer in CSS has been able to show numerically that not only does such a bound exist, but that it is 8.33%. This number has been guessed by some physicists in the field, but never before verified either numerically or analytically. The last project is to solve the hydromagnetic equations describing magnetic fields in realistic three-dimensional geometry, both in the force-free state and in force balance with plasma pressure and gravity. The general 3-D case is far more demanding computationally, featuring four coupled PDEs in three space dimensions, and is the subject of a recent CSS proposal submitted to NASA. This will be a collaborative effort with HAO, CU-Boulder, and the University of Wisconsin-Madison.

Shallow Water Flows Develop Singularities on the Sphere

In 2004, research demonstrating that certain shallow water test cases on a non-rotating sphere develop singularities was completed, and a paper on this topic has been accepted for publication [Swarztrauber 2004].
Development Activities

CSS development activities are aimed at providing modeling frameworks and mathematical libraries that support the research community's efforts to create portable and efficient models and scalable and efficient post-processing tools.

Earth System Modeling Framework

The Earth System Modeling Framework (ESMF) is building software infrastructure for climate, weather, and data assimilation applications. Collaborators include NCAR SCD, CGD, and MMM, NOAA GFDL, NOAA NCEP, MIT, the University of Michigan, DOE ANL, DOE LANL, and NASA/GSFC GMAO. The project is organized around a series of 11 milestones, the first five of which were submitted during FY2002 and FY2003. The sixth and seventh ESMF milestones, submitted during FY2004, marked the public release of Version 2.0 of the ESMF software, the demonstration of three interoperability experiments using the framework, and the Third ESMF Community Meeting held at NCAR in Boulder, Colorado on July 15, 2004. The day-long Community meeting included a discussion of features in the release, a brief tutorial on adopting ESMF, and presentations describing how ESMF has been used to create applications from existing components developed at GFDL, MIT, NCEP, NASA and NCAR. ESMF Version 2.0 code and documentation can be downloaded from the ESMF website, http://www.esmf.ucar.edu/.

The ESMF Partners and active collaborators list expanded to include groups at the DOD Naval Research Laboratory and the DOD Air Force Weather Agency, as well as existing partners at the Goddard Institute for Space Studies, UCLA, the Center for Ocean-Land-Atmosphere Studies, and the NASA GSFC Land Information Systems project. ESMF continues to coordinate with the European Programme for Integrated Earth System Modeling (PRISM) and the DOE Common Component Architecture (CCA) projects.
Spectral Toolkit Development

Work on developing portable, highly efficient, open source numerical libraries for use by the mathematical and geosciences communities made steady progress in the first part of FY2004, but this effort slowed later in the year because staff hours in this area were cut back to support CRCP physics integration activities. In particular, development of the Spectral Toolkit library continued with the completion of the multithreaded spherical harmonic transform. Also developed were new distributed memory (MPI-based) 2D and 3D FFTs, using a generic pairwise transpose algorithm developed for the library. To date, completed components of the Spectral Toolkit include:
Work continued on the collaborative Earth System Modeling Framework (ESMF). A much anticipated release of ESMF software Version 2.0 occurred in July 2004. The ESMF Version 2.0 release includes software for representing and manipulating components, states, fields, grids, and arrays, as well as a number of utilities such as time management, configuration, and logging. It runs on a wide variety of computing platforms, including SGI, IBM, Compaq, and Linux variants. The Grid-BGC project completed a top-level user interface design, selected a GIS technology for handling maps and geographical information, began implementing Globus protocols, made implementation decisions regarding the software framework, completed "look-and-feel" designs for static and dynamic visualization tools, and performed data transfer and computational capacity testing on existing parallel hardware. The Earth System Grid (ESG) moved into production mode for climate model research data with dedicated service for IPCC services in the area of coupled climate model data. SCD completed work on the Web100 and NET100 projects.

Computing Center Operations and Infrastructure

Applications

SCD released new versions of the MySCD portal, which provides, for the first time, customizable GAU charging information directly to users. SKIL was upgraded to include modify/delete functionality. Two collaborative projects are ongoing with the University of Colorado: the METIS event-based workflow system evaluation is nearly complete, and a group of students is modernizing the room reservation system.

MySCD Portal version 2.0 Release

The new version of the MySCD portal was released. This release was delayed by security changes, which required that the Portal system be retooled to support one-time passwords.
The September release adds significant functionality that for the first time allows users of SCD's computational resources to get a summary view of their allocation usage including total percentage charges for computational and mass storage usage.
Additional accomplishments for applications:
Computer Room Infrastructure

SCD is developing both short- and long-term plans to meet the demands of future computing systems. Multiple options are being developed, including building a second data center to allow for expansion; this work will continue into FY2005. The Mesa Lab standby generators were commissioned and put into service, and some field modifications will be made to simplify their operation. The computer room has reached its maximum cooling capacity; FY2004 focused on a design and procurement process to upgrade the chilled water systems. The standby power generation system was installed and commissioned in March 2004. The shakedown and familiarization with the systems continued well into the summer, with some modifications to the control sequence as a result of lessons learned.
Additional accomplishments for infrastructure:
Computer Room Operations

Operations supported, set up, and distributed CryptoCards as part of a new responsibility associated with strengthened security requirements. Media conversions continue with the move to 200-GB media. Rotating schedules have been a success, giving all operators much more exposure to the rest of SCD staff. Operations stepped into a new support role that came about as part of new security requirements. In March, the implementation of one-time passwords resulted in a new need to distribute and support CryptoCards.

Enterprise Services

Network and system security dominated the year. Several significant changes, including the introduction of one-time passwords, were completed very quickly to secure the supercomputing assets. Storage area networks were investigated along with several significant upgrades to production systems for data provisioning and web access. During March a new security perimeter was quickly established, and the Distributed Systems Group (DSG) designed, implemented, and rolled out a one-time password solution to protect supercomputing assets from intrusion attempts. These efforts were instrumental in returning the supercomputer systems to the network for community use in a three-week period. Since then, a number of intrusion attempts have been successfully turned away. In addition to the security work, a Storage Area Network (SAN) testbed was put together. A cost-effective solution is under investigation that will utilize Serial ATA technologies and a SAN solution that works in a heterogeneous environment. At this early point, Linux and Sun systems have been successfully integrated into the SAN.

Additional accomplishments for enterprise services:
Network Engineering and Telecommunications The Network Engineering and Telecommunications Section (NETS) is responsible for all engineering, installation, operation, maintenance, strategy, planning, and research regarding the state-of-the-art data networking and telecommunications facilities for NCAR/UCAR. Support of these facilities requires NETS staff to:
In summary, NETS provides a vital service to the atmospheric research communities by linking scientists and supercomputing resources (including mass storage systems and other data processing resources) at NCAR to other resources and scientists throughout the university research community. Such high-performance networking activities are essential to the effective use of NCAR/UCAR's scientific resources, and they foster the overall advancement of scientific inquiry. The primary NETS accomplishments in 2004 include:
These projects, along with the rest of the NETS 2004 accomplishments, are described in this Annual Scientific Report.

Networking research projects and technology tracking

Networking research projects

Steering Committee for Cyberinfrastructure Research and Development in the Atmospheric Sciences (CyRDAS)

The Hybrid Optical and Packet Infrastructure Project (HOPI)

Web100 project

The Web100 project has achieved significant progress on several of its key project milestones during the past year. Since the Web100 software was released to the general user community, numerous individuals and groups have incorporated it into a wide and diverse set of useful applications. In addition, the Web100 TCP Extensions MIB is on the Internet Engineering Task Force (IETF) standards track and is expected to be submitted for last call by the end of this calendar year. Most importantly, the Web100 software is currently being officially incorporated into the Linux 2.6 kernel development release for use as library functions in all Linux distributions. Microsoft has reported incorporating much of the Web100 software functionality into the next release of their .NET server software, and a BSD port of the Web100 software is also underway. This project was completed in August 2004.

Net100 project

Network Path and Applications Diagnosis (NPAD)

NOAA High Performance Computing and Communications (HPCC) Program

Access Grid

Earth System Grid project

Network technology tracking and transfer

Local Area Network (LAN) projects

NETS supports both NCAR/UCAR network needs and the special networking needs of SCD itself. Therefore, all LAN projects are further subdivided as being either NCAR/UCAR LAN projects or SCD LAN projects.

NCAR/UCAR LAN projects

UCAR network infrastructure recabling projects

Concurrent with recabling, each network device is delivered 100 Mbps of dedicated bandwidth via a dedicated Ethernet packet-switch port. Such dedicated-port access offers substantial networking performance improvement over shared-media Ethernet access. NETS designed permanent network infrastructure for the CG1 and FL0 buildings. NETS assisted in the extensive relocation and related cabling for FL4 staff. NETS recabled the FL2-FL3 interbuilding cabling due to FL0 construction. NETS also participated in the following projects: CG bike path design, Jeffco hangar networking design and implementation, UNAVCO move and lease completion, and the Nextel cellular repeater design and implementation.

Network monitoring project

NETS continues to use HP Openview, flowscan, Prognosis and Cricket as its principal tools for network monitoring and statistics gathering. Additionally, NETS has installed certain specialized network monitors at the request of two national network-measuring organizations, namely MOAT and Internet2. NLANR's MOAT organization has placed an OC3MON monitor at NCAR's Mesa Lab and installed an OC12MON in the Front Range GigaPOP equipment racks located in CU Denver's computer room. MOAT has also placed an AMP monitor at both the Mesa Lab and the FRGP. On behalf of UCAID's Abilene network, Internet2 (ANS) has placed a Surveyor network monitor at both the Mesa Lab and the FRGP.
Local serial-access project

NETS CSAC support project

NETS is involved with CSAC because nearly all security policies involve various types of network-connected devices located between the networks belonging to the external world and the UCAR networks that are being protected from the external world. These network-attached devices can operate as filters and/or authentication devices operating at one or more OSI (Open Systems Interconnection) layers, usually at the Network/Router Layer (Layer 3) and higher. Based on CSAC recommendations, NETS continues to implement significant new gateway router filters to improve network security for UCAR. Extensive testing and coordination throughout UCAR are required to implement the recommended security filters. NETS also cooperates on wireless, RAS, and VPN security measures.

VLAN Splitting Project/Layer 2/3 Design

NETS was driven to reevaluate this design by the addition of the third major campus, Center Green, and by a desire for increased reliability for VoIP and business continuity. The old design did not include any redundant links, because such links would have created network loops that had to be handled by the spanning tree protocol. However, with the addition of the CG campus, NETS has built redundant links so the ML, FL and CG campuses are all connected in a triangle. While the spanning tree protocol works adequately in a simple, loop-free network, it is quite poor at handling redundant links and the loops they form. To take full advantage of the entire mesh of links between the campuses, it is necessary to use a more intelligent protocol at a higher layer. At the IP layer, NETS has been using OSPF for a number of years. It is fully capable of handling the current topology, not only detecting and routing around link failures in a matter of a few seconds, but properly routing traffic over the shortest path between campuses. Presently, NETS has the new backbone fully deployed and is in the process of restricting subnets to a single campus. Completion of the project's final stage is
expected before the end of the year. UPS project In addition, SCD installed a generator at the Mesa Lab and NETS has tied their equipment into this in the computer room and in areas where safety and security support is critical. NETS is also in the process of tying their UPSs at FL into the facilities generator to provide additional business continuity. Grounding Wireless | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||