Inputs to Decadal Survey, Panel on Theory and Computation

In addition to the issues already identified by the Panel, which focus primarily on the theory part of the Panel's scope, it is important that this Panel's report fully address issues related to data archives, data mining, data access and interoperability, and the steps required to develop an astrophysics-wide information system that encompasses everything from raw data to the peer-reviewed journals. Many elements of this information system are now in place or in development, but serious effort will be required in the next decade to knit these pieces together into an easy-to-use, discipline-wide service that can handle the size, complexity, and diversity of modern multi-wavelength data sets. Astrophysics information services must be interlinked with services in related disciplines such as planetary science, space physics, and, as the field emerges, astrobiology.

Astronomy is now at a critical juncture: an avalanche of extraordinarily large, rich, diverse, multi-spectral astronomical databases is imminent. The concept of a "virtual sky" is now more real than ever. The time to prepare for this data flood, and to empower the research community to take advantage of it, is now.

Electronic Publications
-----------------------

The American Astronomical Society has been a leader among scholarly publishers in migrating to, and taking full advantage of, the electronic publication medium.
This includes developing e-journals with full internal linking and links to key external data and information providers, including the NASA-supported Astrophysics Data System (which provides abstracts of papers in all major astronomical publications and scanned images of the historical literature), the NASA/IPAC Extragalactic Database (NED), NASA's Astronomical Data Center, the Centre de Données astronomiques de Strasbourg (CDS), and many other facilities. This effort must continue; as networking and information systems technologies advance, the astrophysics community must adopt and adapt these technologies for its own purposes.

Interoperable Data Archives and Data Mining
-------------------------------------------

Space- and ground-based observatories are producing data at tremendous rates. For example, the HST archive now comprises over 6 TB of data, and other NASA missions operating in bandpasses from radio wavelengths to high-energy X-rays will produce comparable or larger volumes. Ground-based telescopes are now being outfitted with 8k x 8k or 16k x 16k detectors, yielding multi-gigabyte data volumes for each night of observing.

In the 1990s we suffered greatly from neglect of archiving for ground-based observations, with the result that a decade or more of digital data may well be lost entirely. Key research problems simply cannot be addressed without access to multi-spectral, multi-epoch data sources. Digital data archives must be established for all major facilities, and some level of standard data products, with associated metadata describing them, must be available via a distributed data system if we are to fully exploit the information obtained from the large, expensive telescopes we build. The archives themselves become a research resource, as data taken in support of specific projects can be used for broader-scale surveys or reanalyzed for other purposes.
And, of course, the archives establish a history of the sky that is invaluable for studies of time-dependent phenomena. Implicit in this discussion is the requirement that data archives be maintained, with regular transfer to next-generation storage media, and that operational support for these services be ongoing.

The NASA astrophysics community is successfully organizing its data management activities according to wavelength (with a focus on radio, IR, and interferometric data at IPAC/IRSA; near-IR, optical, and UV data at STScI/MAST; and high-energy data at GSFC/HEASARC), with complementary and collaborating mission-specific data centers (e.g., the AXAF Science Center).

Several large-scale surveys now in progress will produce multi-terabyte archives with terabyte-scale catalogs (e.g., 2MASS, GSC-II, SDSS, NVSS, DENIS). These resources will be fully exploited only if users can easily search through their contents and discover new or unexpected correlations. Data mining technologies need to be developed to this end, ideally so that not only derived catalogs but also calibrated imaging and spectral data can be analyzed on demand, enabling the discovery and investigation of new classes of objects. Such technologies will require advanced computational resources, with processing power and I/O capabilities exceeding those of current systems by perhaps two orders of magnitude.

In recent years, baseline funding for missions and projects has been insufficient to provide researchers with efficient, seamless data access and with innovative ways of querying, combining, and analyzing very large and diverse multi-wavelength data sets to enable new methods of discovery from the emerging "virtual sky". Funding in this area must be increased substantially to keep up with the astronomical archives of the next decade and beyond.

Astronomers will rely more and more on the national network infrastructure for data and information exchange.
Major space missions already provide observers with their data primarily via the Internet. As data volumes increase, however, astronomers will require increasing network bandwidth (as well as efficient data compression methods). We estimate that on the order of 5 petabytes of data from ground- and space-based astronomical missions will need to be effectively archived and analyzed with new software and hardware technologies over the next decade, and that a significant fraction of these data will likely need to traverse the Internet.

Data Analysis Software and Systems
----------------------------------

The efficiency and reliability of data reduction and analysis in astronomy have been greatly increased through the use of several widely distributed, institutionally supported data analysis systems: IRAF (NOAO), AIPS (NRAO), MIDAS (ESO), and Starlink (RAL/UK). In the 1990s only one new data reduction and analysis system development effort was initiated, AIPS++, and it will not become a mature system for a few more years. The other systems were designed nearly twenty years ago, and while some are undergoing substantial incremental changes (e.g., the "OpenIRAF" initiative), others have seen little evolution or are now frozen. One alternative approach that has recently been successful is to provide more general toolkits, such as the FTOOLS package used extensively in high-energy astrophysics, which allow users to plug applications into a variety of environments and use those applications on data in a variety of formats.

User expectations are now based on the paradigm of shrink-wrapped, mass-produced PC software backed by mega-corporations. We cannot hope to compete with such software given our small, specialized community, but even allowing for that, our community-developed data analysis systems have in general not kept pace with the computer industry.
Major development efforts will be required in the next decade to provide software systems capable of handling the larger and more complex data sets that next-generation observatories will produce.

The astronomy community has benefited greatly from the use of a common data format -- FITS. No other scientific discipline has reached so broad a consensus on a data standard. Virtually all astronomical data from ground- and space-based instruments are produced and distributed in FITS format, a situation that greatly facilitates multi-spectral data analysis. The FITS standard has been evolving since its introduction in 1981, but it retains features characteristic of a two-decade-old design that are now serious limitations for supporting complex data structures. The community will need to develop a successor to FITS that takes into account current data management technologies yet retains backward compatibility with our legacy data holdings.

The community must also make greater efforts toward software portability and re-use. We have invested at least 1000 staff-years of effort in the development of astronomical software in the past decade, and software development is too expensive for us to continuously reinvent the wheel. Object-oriented technologies promise greater re-use, and astronomy-specific directories and indexes of software libraries should be maintained in order to foster efficient development efforts. Common tools (e.g., for submission of observing proposals and schedules) should be shared within the community, and common protocols for exchange of information should be developed and widely adopted.

The NASA-sponsored Astrophysics Data Program (ADP) and Applied Information Systems Research Program (AISRP) have led to highly successful, widely used tools and services such as NED, SkyView, and the IDL Astronomy Library. Agency support for such developments should be continued and strengthened.
Young astronomers must be encouraged to become fully computer literate and should become involved in building the software tools required in the next decade.

Summary
-------

Our general concern is that the importance of support for data access, data management, and data reduction and analysis software not be overlooked by the Theory and Computation Panel or in the overall report of the Decadal Survey Committee. Without adequate support for new developments in these areas, our community will not see anywhere near the full benefits of its investment in new telescopes and instrumentation. In the 1990s we have already seen an increasing need for correlative science, with observations from one facility (obtained by a different observer for a different purpose) complementing those from other facilities. As our ability to observe the universe continues to expand in sensitivity and wavelength coverage, our need to relate data from diverse sources and from theoretical models will increase further.

Robert J. Hanisch
Space Telescope Science Institute
hanisch@stsci.edu

Kirk D. Borne
Raytheon Information Technology and Scientific Services
NASA/Goddard Space Flight Center

Cynthia Cheung
NASA/Goddard Space Flight Center

Giuseppina Fabbiano
Smithsonian Astrophysical Observatory

George Jacoby
National Optical Astronomy Observatories

Barry F. Madore
California Institute of Technology and Carnegie Institution of Washington

Joseph M. Mazzarella
California Institute of Technology

Thomas A. McGlynn
Universities Space Research Association
NASA/Goddard Space Flight Center

Steven S. Murray
Smithsonian Astrophysical Observatory

Paolo Padovani
Space Telescope Science Institute

Edward J. Shaya
NASA/Goddard Space Flight Center

Nicholas E. White
NASA/Goddard Space Flight Center

Richard A. White
NASA/Goddard Space Flight Center