Inputs to Decadal Survey, Panel on Theory and Computation


In addition to the issues identified already by the Panel, which focus
primarily on the theory part of the Panel's scope, it is important that this
Panel's report fully address issues related to data archives, data mining, data
access and interoperability, and the steps required to develop an astrophysics-
wide information system which encompasses everything from raw data to the
peer-reviewed journals.  Many elements of this information system are now in
place or are in development, but serious effort will be required in the next
decade to knit these pieces together into an easy-to-use, discipline-wide
service that can handle the size, complexity, and diversity of modern
multi-wavelength data sets.  Astrophysics information services must be
interlinked with services in related disciplines such as planetary science,
space physics, and, as the field emerges, astrobiology.  Astronomy is now at a
critical juncture: an avalanche of extraordinarily large, rich, diverse, and
multi-spectral astronomical databases is imminent.  The concept of a "virtual
sky" is now more real than ever.  Now is the time to prepare for this data
flood and to empower the research community to take advantage of it.

Electronic Publications
-----------------------
The American Astronomical Society has led scholarly publishing in migrating to
and taking full advantage of the electronic medium.  This includes developing
e-journals with full internal linking and links to key external data and
information providers, including the NASA-supported Astrophysics Data System
(which provides abstracts of papers in all major astronomical publications and
scanned images of the historical literature), the NASA/IPAC Extragalactic
Database (NED), NASA's Astronomical Data Center, the Centre de Donnees
astronomiques de Strasbourg (CDS), and many other facilities.  This effort
must continue; as networking and information systems
technologies advance, the astrophysics community must move forward to adopt and
adapt these technologies for our purposes.

Interoperable Data Archives and Data Mining
-------------------------------------------
Space- and ground-based observatories are producing data at tremendous
rates.  For example, the HST archive now comprises over 6 TB of data, and 
other NASA missions operating in bandpasses from radio wavelengths to high
energy X-rays will produce data of comparable or larger volumes.  Ground-based
telescopes are now being outfitted with 8k x 8k or 16k x 16k detectors,
yielding multi-gigabyte data volumes for each night of observing.  In the
1990s the community badly neglected the archiving of ground-based
observations, with the result that a decade or more of digital data may
well be lost entirely.  Key research problems simply cannot be solved
without access to multi-spectral, multi-epoch data sources.  Digital data
archives must be established for all major facilities, and some level of
standard data products and associated metadata describing these data products
must be available via a distributed data system if we are to fully exploit the
information obtained from the large, expensive telescopes we build.  The
archives themselves become a research resource, as data taken in support of
specific projects can be utilized for broader scale surveys or reanalyzed for
other purposes.  And, of course, the archives establish a history of the sky
invaluable for studies of time-dependent phenomena.  Implicit in this
discussion is the requirement that data archives be maintained, with regular
transfer to next-generation storage media, and that ongoing operational
support for these services be provided.
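
As a rough check on the detector data volumes quoted above, the following
sketch estimates the uncompressed output of a single night.  The 2 bytes per
pixel and 30 exposures per night are illustrative assumptions, not figures
from any particular instrument, and the function names are hypothetical.

```python
# Back-of-the-envelope nightly data volume for a large-format detector.
# Assumed (not from the text): 2 bytes per pixel, 30 exposures per night.
BYTES_PER_PIXEL = 2
FRAMES_PER_NIGHT = 30


def frame_bytes(side_pixels, bytes_per_pixel=BYTES_PER_PIXEL):
    """Raw size in bytes of one square frame, ignoring headers and overscan."""
    return side_pixels * side_pixels * bytes_per_pixel


def nightly_gib(side_pixels, frames=FRAMES_PER_NIGHT):
    """Uncompressed volume per night in GiB (2**30 bytes)."""
    return frame_bytes(side_pixels) * frames / 2**30


for label, side in (("8k x 8k", 8192), ("16k x 16k", 16384)):
    print(f"{label}: {frame_bytes(side) / 2**20:.0f} MiB/frame, "
          f"{nightly_gib(side):.2f} GiB/night")
```

Under these assumptions an 8k x 8k detector yields 128 MiB per frame and
3.75 GiB per night, and a 16k x 16k detector yields 512 MiB per frame and
15 GiB per night, consistent with the multi-gigabyte figure.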

The NASA astrophysics community is successfully organizing its data management
activities according to wavelength (with a focus on radio, IR, and
interferometric data at IPAC/IRSA; near-IR, optical, and UV data at STScI/MAST;
and high energy data at GSFC/HEASARC), with complementary and collaborating
mission-specific data centers (e.g., the AXAF Science Center).  Several large
scale surveys now in progress will produce multi-terabyte archives with
terabyte-scale catalogs (e.g., 2MASS, GSC-II, SDSS, NVSS, DENIS).  These
resources will only be fully exploited if users can easily search through their
contents and discover new or unexpected correlations.  Data mining technologies
need to be developed to this end, ideally so that not only derived catalogs but
also calibrated imaging and spectral data can be analyzed on demand, enabling
discovery and investigations of new classes of objects.  Such technologies will
require advanced computational resources, with processing power and I/O
capabilities exceeding current systems by perhaps two orders of magnitude.
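
One of the simplest operations behind such correlations is a positional
cross-match between two catalogs.  The sketch below is a minimal brute-force
version, offered only to illustrate the idea; the catalog layout, the function
names, the 2 arcsecond match radius, and the coordinates are all assumptions
made for the example.  Terabyte-scale catalogs would need spatial indexing
rather than the pairwise loop used here.

```python
import math


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(cat_a, cat_b, radius_arcsec=2.0):
    """For each source in cat_a, find the nearest cat_b source within the radius.

    Catalogs are lists of (name, ra_deg, dec_deg) tuples.  Returns a list of
    (name_a, name_b, separation_arcsec).  Brute force, O(N*M).
    """
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for name_a, ra_a, dec_a in cat_a:
        best = None
        for name_b, ra_b, dec_b in cat_b:
            sep = angular_sep_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_deg and (best is None or sep < best[1]):
                best = (name_b, sep)
        if best is not None:
            matches.append((name_a, best[0], best[1] * 3600.0))
    return matches


# Made-up coordinates for illustration only.
infrared = [("IR-1", 150.00000, 2.20000), ("IR-2", 150.10000, 2.30000)]
radio = [("R-1", 150.00020, 2.20010), ("R-2", 151.00000, 3.00000)]
for a, b, sep in cross_match(infrared, radio):
    print(f"{a} <-> {b}: {sep:.2f} arcsec")
```

Here only IR-1 finds a counterpart; IR-2 has no radio source within the match
radius and is left unmatched.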

In recent years, baseline funding for missions and projects has been
insufficient to provide researchers with efficient,
seamless data access and innovative ways of querying, combining, and analyzing
very large and diverse multiwavelength data sets to enable new methods for
discovery from the emerging "virtual sky".  Funding in this area must be
substantially boosted to keep up with the astronomical archives of the next
decade and beyond.

Astronomers will rely more and more on the national network infrastructure for
data and information exchange.  Major space missions already provide observers
with their data primarily via the Internet.  As data volumes increase, however,
astronomers will require increasing network bandwidths (as well as efficient
data compression methods).  We estimate that of order 5 petabytes of data 
will need to be effectively archived and analyzed with new software and hardware
technologies as a result of ground- and space-based astronomical missions over
the next decade, and that a significant fraction of this data will likely need
to traverse the Internet.
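
To give a feel for what this estimate implies for networks, the sketch below
converts a total data volume into an average sustained transfer rate.  The
5 petabyte and ten-year figures come from the estimate above; the fractions
of the data assumed to cross the network are illustrative only, and the
function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 86400


def sustained_mbit_per_s(total_bytes, years, network_fraction=1.0):
    """Average rate in Mbit/s needed to move a fraction of the data in `years`."""
    bits = total_bytes * 8 * network_fraction
    return bits / (years * SECONDS_PER_YEAR) / 1e6


FIVE_PB = 5e15  # decimal petabytes, from the estimate in the text
for fraction in (0.1, 0.5, 1.0):  # assumed fractions, for illustration
    rate = sustained_mbit_per_s(FIVE_PB, years=10, network_fraction=fraction)
    print(f"fraction {fraction:.0%}: {rate:.1f} Mbit/s sustained")
```

Moving all 5 petabytes once over ten years corresponds to roughly 127 Mbit/s
sustained around the clock.  This is an average only; peak demand, repeated
retrievals of popular data sets, and protocol overhead would all raise the
requirement.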

Data Analysis Software and Systems
----------------------------------
The efficiency and reliability of data reduction and analysis in astronomy
has been greatly increased through the use of several widely distributed, 
institutionally supported data analysis systems:  IRAF (NOAO), AIPS (NRAO),
MIDAS (ESO), and Starlink (RAL/UK).  Only one new data reduction and analysis
system, AIPS++, was initiated in the 1990s, and it is still a few years
from maturity.  The other systems were
designed nearly twenty years ago, and while some are undergoing substantial
incremental changes (e.g., the "OpenIRAF" initiative) others have seen little
evolution or are now frozen.  One alternative approach which has recently been
successful is providing more general toolkits, such as the FTOOLS package used
extensively in high energy astrophysics, which allow users to plug applications
into a variety of environments and use those applications on data in a variety
of formats.  User expectations are now based on the paradigm of shrink-wrapped,
mass-produced PC software backed by mega-corporations.  We cannot hope to
match such software given our small, specialized community, but even allowing
for that, our community-developed data analysis systems have generally not
kept pace with the computer industry.  Major development efforts will be
required in the next
decade in order to provide software systems capable of handling the larger and
more complex data sets that next generation observatories will produce.

The astronomy community has benefited greatly from the use of a common
data format -- FITS.  No other scientific discipline has reached so broad
a consensus on a data standard.  Virtually all astronomical data from ground-
and space-based instruments is produced and distributed in FITS format, a
situation that greatly facilitates multi-spectral data analysis.  The FITS
standard has evolved since its introduction in 1981, but it retains features
characteristic of a two-decade-old design that now seriously limit its
support for complex data structures.  The community will
need to develop a successor to FITS which takes into account current
data management technologies yet retains backward compatibility with our 
legacy data holdings.
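
The flavor of the FITS design can be seen in its header layout: fixed
80-character ASCII records ("cards") with keywords of at most eight
characters, padded into 2880-byte blocks.  The sketch below builds and reads
back a minimal primary header to illustrate that layout.  It is a simplified
illustration, not a FITS implementation: it handles only integer and logical
values, and the helper names are hypothetical.

```python
CARD_LEN = 80      # each header record ("card") is 80 ASCII characters
BLOCK_LEN = 2880   # headers and data are padded to multiples of 2880 bytes


def make_card(keyword, value=None):
    """Build one fixed-format card: keyword in columns 1-8, '= ' in 9-10,
    value right-justified to column 30 (integers and logicals only here)."""
    if value is None:
        return keyword.ljust(CARD_LEN)
    if isinstance(value, bool):
        text = "T" if value else "F"
    else:
        text = str(int(value))
    return (keyword.ljust(8) + "= " + text.rjust(20)).ljust(CARD_LEN)


def make_header(cards):
    """Concatenate cards, append END, and pad with blanks to a full block."""
    raw = "".join(cards) + make_card("END")
    padding = -len(raw) % BLOCK_LEN
    return (raw + " " * padding).encode("ascii")


def parse_header(block):
    """Read keyword/value pairs back out of a header, stopping at END.

    Naive: treats '/' as the start of a comment, so it would mishandle
    string values containing '/'.  Values are returned as stripped text.
    """
    text = block.decode("ascii")
    values = {}
    for start in range(0, len(text), CARD_LEN):
        card = text[start:start + CARD_LEN]
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] == "= ":
            values[keyword] = card[10:].split("/")[0].strip()
    return values


header = make_header([
    make_card("SIMPLE", True),
    make_card("BITPIX", 16),
    make_card("NAXIS", 2),
    make_card("NAXIS1", 8192),
    make_card("NAXIS2", 8192),
])
print(len(header), parse_header(header))
```

The fixed card width and short keywords make headers easy to read and write
with very simple code, but they also show why hierarchical or richly
structured metadata is awkward to express in this form.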

The community must also make greater efforts toward software portability and
re-use.  We have invested at least 1000 staff-years of effort in the
development of astronomical software in the past decade, and software
development is too expensive for us to continuously reinvent the wheel.
Object-oriented technologies promise greater re-use, and astronomy-specific
directories and indexes of software libraries should be maintained in order to
foster efficient development efforts.  Common tools for, e.g., submission of
observing proposals and schedules, should be shared within the community, and
common protocols for exchange of information should be developed and widely
adopted.  The NASA-sponsored Astrophysics Data Program (ADP) and Applied
Information Systems Research Program (AISRP) have led to highly successful,
widely used tools and services such as NED, SkyView, and the IDL Astronomy
Library.  Agency support for such developments should be continued and
strengthened.  Young astronomers must be encouraged to become fully computer
literate and should become involved in building the software tools required in
the next decade.

Summary
-------
Our general concern is that the importance of support for data access, data
management, and data reduction and analysis software not be overlooked by the
Theory and Computation Panel or in the overall report of the Decadal Survey
Committee.  Without adequate support for new developments in these areas our
community will not see anywhere near the full benefits of its investment in new
telescopes and instrumentation.  In the 1990s we have already seen an
increasing need for correlative science, with observations from one facility
(obtained by a different observer for a different purpose) complementing those
from other facilities.  As our ability to observe the universe continues to
expand in sensitivity and wavelength coverage, our need to relate data from
diverse sources and from theoretical models will increase further.


Robert J. Hanisch
Space Telescope Science Institute
hanisch@stsci.edu

Kirk D. Borne
Raytheon Information Technology and Scientific Services
NASA/Goddard Space Flight Center

Cynthia Cheung
NASA/Goddard Space Flight Center

Giuseppina Fabbiano
Smithsonian Astrophysical Observatory

George Jacoby
National Optical Astronomy Observatories

Barry F. Madore
California Institute of Technology and Carnegie Institution of Washington

Joseph M. Mazzarella
California Institute of Technology

Thomas A. McGlynn
Universities Space Research Association
NASA/Goddard Space Flight Center

Steven S. Murray
Smithsonian Astrophysical Observatory

Paolo Padovani
Space Telescope Science Institute

Edward J. Shaya
NASA/Goddard Space Flight Center

Nicholas E. White
NASA/Goddard Space Flight Center

Richard A. White
NASA/Goddard Space Flight Center