When analyzing massive data sets, there are two major concerns. The first is the amount of time needed to sift through the data. The second is correctly identifying the relationships among the data, which requires that we understand what scientists look for and what they consider interesting phenomena. The algorithms developed will concentrate on efficiently manipulating large data sets and locating trends using parameter sensitivities and dependencies. Identified trends will be classified using neural networks so that the user may save the grouping characteristics for later use in knowledge-base systems or to tailor the analysis. For example, depending on whether a scientist finds the classified features interesting, he or she may choose to eliminate them from the analysis and search, or to decompose the class for more information. When features are deemed uninteresting, the analysis tool can be sped up by ignoring or discarding those cases. A final goal is to interpolate between data points to establish a trend. In EOS (Earth Observing System) data, for example, a quantity may be measured only weekly; to obtain a daily estimate, some interpolation scheme must be employed.
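As an illustration, here is a minimal sketch of that interpolation step, assuming simple piecewise-linear interpolation (the description above does not fix a particular scheme) and placeholder weekly values:

```python
import numpy as np

# Hypothetical weekly measurements; values are placeholders, not real EOS data.
weekly_days = np.array([0, 7, 14, 21])
weekly_values = np.array([3.2, 4.1, 3.8, 5.0])

# Daily time grid spanning the same period.
daily_days = np.arange(0, 22)

# Piecewise-linear interpolation from weekly samples to daily estimates.
daily_estimates = np.interp(daily_days, weekly_days, weekly_values)
print(daily_estimates.round(2))
```

A more sophisticated scheme (e.g. spline fitting) could be substituted without changing the surrounding workflow.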
This project also uses modeling and simulation to validate the observed phenomena. The user may supply the software with either models or equations for the quantities that interest them. Specifying valid or expected ranges for particular values can also help the software refine its decomposition of the data. Multiple Domain and Concurrent Simulation (MDCS) algorithms are used to help compress, observe, and understand the phenomena. Specifying an area within a grid allows the program to zoom in on physical regions of interest; grid zones such as the bomb bay of an aircraft are important areas to analyze (see the region-selection sketch after the component list below). The computational fluid dynamics (CFD) solution of the system is the input to the analysis tool, along with the quantities of most interest to the scientist. The major components of this project include:
User Interface, which accepts grids, solutions, and equations for items of interest. The interface should allow users to specify unique aspects of the domain they are working in.
Data Decomposition Algorithms for efficiently isolating interesting anomalies. These algorithms will use behavioral modeling to simulate and observe anomalies within the data (see the anomaly-isolation sketch below).
Classification of detected features using neural networks, and creation of output scripts for use in visualization programs such as FAST or Data Explorer (a minimal classification sketch follows below).
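As referenced above, a minimal sketch of grid-region selection, assuming solutions are stored as arrays and the user supplies index bounds for the zone of interest (all names and values here are illustrative):

```python
import numpy as np

# Hypothetical solution field on a structured grid; the real tool would
# read grids and CFD solutions supplied by the user.
pressure = np.random.rand(200, 300)  # placeholder 2-D solution array

# User-specified index ranges bounding a physical region of interest,
# e.g. the zone around a bomb bay.
i_min, i_max = 40, 90
j_min, j_max = 120, 180

# "Zoom in": restrict all further analysis to the selected sub-grid.
region = pressure[i_min:i_max, j_min:j_max]
print(region.shape)  # (50, 60)
```

For the data decomposition component, one way such isolation could work is sketched below, assuming the user-supplied expected ranges from the modeling step are used to flag out-of-range grid points as candidate anomalies. This is a stand-in for the behavioral-modeling approach, which is not detailed here:

```python
import numpy as np

def isolate_anomalies(field, expected_min, expected_max):
    """Flag grid points whose values fall outside the user-supplied
    expected range; these are candidate anomalies for closer study."""
    mask = (field < expected_min) | (field > expected_max)
    return np.argwhere(mask)  # indices of anomalous points

# Placeholder field: mostly near 1.0, so the tails are the "anomalies".
field = np.random.normal(loc=1.0, scale=0.1, size=(100, 100))
anomalies = isolate_anomalies(field, expected_min=0.7, expected_max=1.3)
print(f"{len(anomalies)} anomalous grid points flagged")
```

For the classification component, a minimal sketch follows, using scikit-learn's MLPClassifier as a stand-in; the project's actual network architecture and feature descriptors are not specified, and all data below are placeholders:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical training data: each row is a feature vector describing a
# detected feature (e.g. strength, extent, location); labels are
# illustrative class ids assigned by the scientist (0 = shock, 1 = vortex).
X = np.random.rand(200, 4)             # placeholder descriptors
y = np.random.randint(0, 2, size=200)  # placeholder labels

# Train a small feed-forward network to reproduce the scientist's groupings.
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000)
clf.fit(X, y)

# Classify newly detected features; classes the scientist marks as
# uninteresting can then be dropped from the search to speed up analysis.
new_features = np.random.rand(5, 4)
print(clf.predict(new_features))
```

Once features are classified, the saved grouping characteristics can drive the output scripts for FAST or Data Explorer, or be reused in a knowledge-base system to tailor later analyses.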