When analyzing massive data sets, there are two major concerns. The first is the amount of time needed to sift through the data. The second is correctly identifying the relationships among the data, which requires that we understand what scientists look for and what they consider interesting phenomena. The algorithms developed will concentrate on efficiently manipulating large data sets and locating trends using parameter sensitivities and dependencies. Identified trends will be classified using neural networks so that the user may save the grouping characteristics for later use in knowledge-base systems or to tailor the analysis. For example, depending on whether a scientist finds the classified features interesting, he or she may choose to eliminate them from the analysis and search, or to decompose the class for more information. When features are deemed uninteresting, the analysis tool can be sped up by ignoring or discarding those cases. A final goal is to interpolate between data points to establish a trend. In EOS (Earth Observing System) data, for example, a quantity may be measured only weekly; to obtain a daily estimate, some interpolation scheme must be employed.
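As an illustration, here is a minimal sketch of that interpolation step, assuming simple piecewise-linear interpolation (the description above does not fix a particular scheme) and placeholder weekly values:

```python
import numpy as np

# Hypothetical weekly measurements; values are placeholders, not real EOS data.
weekly_days = np.array([0, 7, 14, 21])
weekly_values = np.array([3.2, 4.1, 3.8, 5.0])

# Daily time grid spanning the same period.
daily_days = np.arange(0, 22)

# Piecewise-linear interpolation from weekly samples to daily estimates.
daily_estimates = np.interp(daily_days, weekly_days, weekly_values)
print(daily_estimates.round(2))
```

A more sophisticated scheme (e.g. spline fitting) could be substituted without changing the surrounding workflow.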
This project also uses modeling and simulation to validate the observed phenomena. The user may supply the software with either models or equations for the quantities that interest them. Specifying valid or expected ranges for particular values can also help the software refine its decomposition of the data. Multiple Domain and Concurrent Simulation (MDCS) algorithms are used to help compress, observe, and understand the phenomena. Specifying an area within a grid allows the program to zoom in on physical regions of interest; grid zones such as the bomb bay of an aircraft are important areas to analyze (see the region-selection sketch after the component list below). The computational fluid dynamics (CFD) solution of the system is the input to the analysis tool, along with the quantities of most interest to the scientist. The major components of this project include:
User Interface, which accepts grids, solutions, and equations for items of interest. The interface should allow users to specify unique aspects of the domain they are working in.
Data Decomposition Algorithms for efficiently isolating interesting anomalies. These algorithms will use behavioral modeling to simulate and observe anomalies within the data (see the anomaly-isolation sketch below).
Classification of detected features using neural networks, and creation of output scripts for use in visualization programs such as FAST or Data Explorer (a minimal classification sketch follows below).
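As referenced above, a minimal sketch of grid-region selection, assuming solutions are stored as arrays and the user supplies index bounds for the zone of interest (all names and values here are illustrative):

```python
import numpy as np

# Hypothetical solution field on a structured grid; the real tool would
# read grids and CFD solutions supplied by the user.
pressure = np.random.rand(200, 300)  # placeholder 2-D solution array

# User-specified index ranges bounding a physical region of interest,
# e.g. the zone around a bomb bay.
i_min, i_max = 40, 90
j_min, j_max = 120, 180

# "Zoom in": restrict all further analysis to the selected sub-grid.
region = pressure[i_min:i_max, j_min:j_max]
print(region.shape)  # (50, 60)
```

For the data decomposition component, one way such isolation could work is sketched below, assuming the user-supplied expected ranges from the modeling step are used to flag out-of-range grid points as candidate anomalies. This is a stand-in for the behavioral-modeling approach, which is not detailed here:

```python
import numpy as np

def isolate_anomalies(field, expected_min, expected_max):
    """Flag grid points whose values fall outside the user-supplied
    expected range; these are candidate anomalies for closer study."""
    mask = (field < expected_min) | (field > expected_max)
    return np.argwhere(mask)  # indices of anomalous points

# Placeholder field: mostly near 1.0, so the tails are the "anomalies".
field = np.random.normal(loc=1.0, scale=0.1, size=(100, 100))
anomalies = isolate_anomalies(field, expected_min=0.7, expected_max=1.3)
print(f"{len(anomalies)} anomalous grid points flagged")
```

For the classification component, a minimal sketch follows, using scikit-learn's MLPClassifier as a stand-in; the project's actual network architecture and feature descriptors are not specified, and all data below are placeholders:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical training data: each row is a feature vector describing a
# detected feature (e.g. strength, extent, location); labels are
# illustrative class ids assigned by the scientist (0 = shock, 1 = vortex).
X = np.random.rand(200, 4)             # placeholder descriptors
y = np.random.randint(0, 2, size=200)  # placeholder labels

# Train a small feed-forward network to reproduce the scientist's groupings.
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000)
clf.fit(X, y)

# Classify newly detected features; classes the scientist marks as
# uninteresting can then be dropped from the search to speed up analysis.
new_features = np.random.rand(5, 4)
print(clf.predict(new_features))
```

Once features are classified, the saved grouping characteristics can drive the output scripts for FAST or Data Explorer, or be reused in a knowledge-base system to tailor later analyses.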