Integrated Data Analysis - A Powerful Program for Taming Data Whiteout

Introduction
What's the difference between data and information? Data is a collection of facts. Information is data that has been interpreted for use. Unfortunately, we are all drowning in data, without the benefit of information we can actually use. That's because data comes at us constantly from a myriad of sources like water out of a fire hose - a rapid, overpowering stream of numbers, facts, and figures.

Over the past decade, Teledyne Scientific Company has worked on numerous challenging real-world data analysis problems and has learned the following important lessons:

  1. Preprocessing is the most important step in data analysis. Unfortunately, this step is ignored in today's leading data mining software products. Why? Because preprocessing involves multiple algorithms in multiple disciplines.
  2. In most applications, over 90% of stored data is either irrelevant or redundant. The value of data must be quantified instead of blindly storing data in multiple data warehouses. Without thorough understanding of the value of information, the storage requirement can be an order of magnitude too great.
  3. Useful data are stored in multiple sources in various formats. Without integrating information from these disparate sources, it is difficult to have an integrated view of the entire solution space. Therefore, it is of paramount importance that multiple metadata views be created for multiple departments so that they can work with one voice.
  4. There is too much emphasis on sophisticated learning. In general, system errors can be divided into two types - model mismatch and data mismatch. Model mismatch occurs when the learning algorithm does not capture all the nuances present in the training data, while data mismatch is the direct result of the actual real world data being different from the training data with which the algorithm parameters are tuned. In most real world situations, the magnitude of data mismatch is much greater than that of model mismatch. On numerous occasions, simpler learning algorithms outperform their more sophisticated counterparts when it matters.
  5. Human operators appreciate insights. They do not like unnecessarily complex scientific charts. They would like the system to answer the question "where is the beef?" and then provide drill down capabilities.
  6. Most performance gains are attributable to an integrated and mutually complementary processing approach. That is, there is no silver bullet in data mining. If there is, then the pattern is too trivial to be exploited. Therefore, the core data mining module must be surrounded by a complementary set of processing algorithms, where the classification algorithm is specifically designed for the underlying good-feature distribution and good features are carefully selected from an available set of preprocessing and transformation algorithms.

Teledyne Scientific Company has developed advanced data mining technology to tame the data glut or "whiteout" and turn it into usable information. This technology is an extremely powerful, user-friendly computer program that uses sophisticated search and optimization algorithms to extract useful information from confusing, often seemingly unrelated data.

Originally designed for use by the U.S. Navy, this technology was the core software engine created to facilitate algorithm development for real time navigation through minefields. Now it's available for development by business to help companies find their way through the minefield of stored data and gain competitive advantage in the marketplace.

What sets this technology apart from other data mining applications is its simplicity, ease-of-use, and flexibility. Most data mining tools require IT staff assistance, expensive data warehouse tools to handle legacy databases, and analysis that can often delay results. Teledyne Scientific Company's technology combines several methodologies, including state-of-the-art digital signal processing, pattern classification, knowledge extraction, reasoning under uncertainty, optimization, decision analysis, and data visualization. Its in-depth functionality enables a wide variety of data resources to be mined and exploited, even images such as photographs and x-rays. This comprehensive scrutiny enables previously unseen trends, relationships, and patterns to emerge from the data. The results are available in real time and presented in clear, simple English for immediate use by decision makers without exhaustive briefings from trained statisticians.

Teledyne Scientific Company has assembled an impressive array of preprocessing, feature extraction, data mining, and information visualization algorithms from decades of basic and applied R&D contracts for various U.S. government laboratories and industry consulting. Our data analysis experience includes sonar, radar, relational customer and patient databases, sleep disorder data, microarray image data, human tissues for cancer cell detection, time series trend analysis, accounting, diagnostics, reconnaissance imagery, surface anomaly data, and weather imagery. Teledyne Scientific Company has filed for seven patents in the above six areas.

Typical Problems
Below are typical data management and analysis problems that can be successfully addressed by Teledyne Scientific Company's data mining technology.

Too much data with no analysis on value of information: Sarah Brighten is a corporate IT analyst. Over the past two years, she has learned to do more with less, as economic conditions show no sign of improving. Not only is she responsible for the safe storage of mountains of data in several corporate databases and data warehouses, she also has to respond to urgent daily requests from multiple departments, each asking for the latest batch of data. She wonders why she has to babysit terabytes of data, much of which seems irrelevant to increasing the bottom line of the corporation. Teledyne Scientific Company's technology contains easy-to-use tools for compressing data and assessing the value of information.

Software tool is too complex and difficult to use: Kim Hughes is a marketing director. She installed a multi-million dollar software system for customer relationship management (CRM) over a year ago. It took her team almost a year to get the kinks out of the system and make it work reliably over gobs of customer data. Unfortunately, she feels that they bought a Rolls Royce when a Honda Civic with customized options would've been sufficient. She's fairly certain that they use less than 10% of the system features. Furthermore, because the CRM software is so complex, she had to hire a software specialist who doesn't know marketing. As a result, Kim finds herself spending more and more time explaining to the specialist what she wants. In addition, she finds the mountains of charts and scientific graphs generated by the software system intimidating and confusing. She's frustrated. All she wanted was some actionable insights in plain English from her customer data. Teledyne Scientific Company's software tool can be used by domain experts with little expertise in data mining.

No provision for preprocessing: Samantha Wong is a data mining specialist. She is annoyed with the data mining software she's been using. With the recent emphasis on trend analysis, she finds herself spending most of her time writing custom software that will transform her unwieldy data into the flat table that her data mining software requires. Since there are so many ways to transform data, she's not sure if her method is appropriate. Her boss doesn't seem to understand that she spends all her time massaging data. He wonders why it is taking her so long to produce results with expensive data mining software in her arsenal. Samantha is flabbergasted when she hears that the IT department has decided to collect more time series data, change the data format, and shorten the sampling interval from one hour to ten minutes. All her work during the past three months just turned into a pile of dust and she needs to write more custom software. Teledyne Scientific Company's technology includes application specific preprocessing engines that are tightly integrated with the back-end easy-to-use data mining engine.

Key Technical Concepts
Teledyne Scientific Company developed its integrated approach to data analysis through several years of government R&D work in military signal processing.

In this field, the main challenge is to extract and characterize extremely low signal-to-noise ratio (SNR) events from multiple sensors with high probability of detection and low false alarm rate. Since most intercept systems must be capable of handling various types of signals, the general approach is to run multiple transformation and detection algorithms in parallel with the data fusion engine at the back end. The second challenge is to filter and present derived information so that critical knowledge can be absorbed and disseminated as quickly and completely as possible.

The key technology discriminator is that we bring sophisticated signal/image processing, optimization, and data mining algorithms to domain experts without the usual technical jargon, thus demystifying data mining using simple language of intuition and information visualization. The fundamental design principle is judicious dimension reduction through data adaptive, sequential processing, which is conceptually similar to finding sufficient statistics in data analysis. That is, what is the minimum problem dimension that characterizes the entire data? Instead of arguing over which data mining algorithm is superior, we focus on finding the right set of algorithms given data and feature characteristics while paying close attention to the point of diminishing returns. This agnostic approach to data analysis is not only intuitively appealing, but also yields far superior performance to a dogmatic method that relies on a rigid set of rules or algorithm preferences. Figure 1 shows our integrated approach to data analysis.

Figure 1: The integrated approach to knowledge discovery that combines all the salient concepts in signal processing, optimization, and learning to deal effectively with time series, image, and multidimensional data with hierarchical relationships.

For example, contrary to popular belief in the image analysis community, we discovered that image compression based on wavelet set partitioning in hierarchical trees actually improves automatic target recognition (ATR) performance up to a certain point, mainly because the benefits of noise suppression outweigh degradation in the fidelity of desired targets with high eccentricity and rough texture. This finding has significant implications for data storage requirements and cell recognition performance. On the other hand, the one-dimensional discrete cosine transform (DCT) provides the best performance in image compression of sonar grams because the predominant attributes of narrow band grams are line-like. The key step here is finding the required minimum problem dimension.

Dimension reduction implies that a small number of transform coefficients in a different domain captures most of the energy spread in the original raw data space. This concept has found its niche in signal and array processing: maximize the probability that multiple signals can be sorted in space, time, and frequency through dimension reduction or subspace filtering. This simple, yet powerful concept is exploited in a systematic and integrated manner to tackle any challenging data analysis problem.
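The energy compaction idea can be illustrated with a minimal sketch. This is not the company's implementation: the signal is a made-up, idealized narrowband example (two tones aligned with DCT basis vectors), and the helper names dct2 and coefficients_for_energy are hypothetical. The sketch counts how many values are needed to hold 99% of the signal energy in the raw sample domain versus the DCT domain.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a real sequence (direct O(N^2) form)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def coefficients_for_energy(values, fraction=0.99):
    """Smallest number of values whose squared sum reaches `fraction` of the total."""
    energies = sorted((v * v for v in values), reverse=True)
    total = sum(energies)
    running = 0.0
    for count, e in enumerate(energies, start=1):
        running += e
        if running >= fraction * total:
            return count
    return len(energies)

# Hypothetical narrowband signal: two tones, 64 samples (illustrative only).
n = 64
signal = [
    math.cos(math.pi * (i + 0.5) * 5 / n) + 0.5 * math.cos(math.pi * (i + 0.5) * 12 / n)
    for i in range(n)
]
coeffs = dct2(signal)

raw_count = coefficients_for_energy(signal)   # samples needed in the raw domain
dct_count = coefficients_for_energy(coeffs)   # coefficients needed after the DCT
print(f"raw domain: {raw_count} values, DCT domain: {dct_count} values")
```

Because the transform is orthonormal, total energy is the same in both domains; only its distribution changes. Real data will not align this neatly with the basis, so the reduction would be less dramatic in practice.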

In gene chip and tissue image analysis, the same logic applies. The key step is judicious dimension reduction as the processing stage transitions from raw data to information to knowledge. For example, a two-stage classify-before-detect (CBD) algorithm is capable of detecting and characterizing low-expression spots by virtue of energy compaction and dimension reduction. Similarly, a three-stage image processing algorithm can handle various image-analysis problems (rare cell detection, tissue recognition, and spot quality assessment) using appropriate levels of abstraction and dimensionality reduction.

Let's extend the same concept to classification. If the underlying class-conditional good-feature distribution is unimodal Gaussian, there is very little reason to resort to complex classification algorithms, such as support vector machines or radial basis functions. A simple multivariate Gaussian classifier will work equally well with substantially lower computational requirements. The algorithm recommendation engine is part of the Intelligent Data Mining Wizard that guides novice users through sometimes tedious and confusing data mining steps. Furthermore, hierarchical sequential pruning classification can provide excellent performance even for situations with complex distributions thanks to sequential dimension reduction.
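As a rough illustration of how little machinery a Gaussian classifier needs, the sketch below fits one two-dimensional Gaussian per class and picks the class with the highest log posterior. It is an assumption-laden toy, not the product's classifier: the training points are invented, the restriction to two features is for brevity, and the function names are hypothetical.

```python
import math

def fit_gaussian(points):
    """Estimate the mean and 2x2 covariance (sxx, sxy, syy) of (x, y) samples."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def log_density(point, mean, cov):
    """Log of the bivariate Gaussian density at `point`."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Mahalanobis distance using the closed-form inverse of a 2x2 matrix.
    maha = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * maha - 0.5 * math.log(det) - math.log(2.0 * math.pi)

def train(samples_by_class):
    """Fit one Gaussian per class and store the log prior from class frequency."""
    total = sum(len(pts) for pts in samples_by_class.values())
    model = {}
    for label, pts in samples_by_class.items():
        mean, cov = fit_gaussian(pts)
        model[label] = (mean, cov, math.log(len(pts) / total))
    return model

def classify(model, point):
    """Return the label with the highest log posterior (up to a constant)."""
    return max(
        model,
        key=lambda label: log_density(point, model[label][0], model[label][1])
        + model[label][2],
    )

# Invented training data for two well-separated, unimodal classes.
training = {
    "A": [(1.0, 1.2), (1.5, 0.8), (0.8, 1.9), (1.2, 1.5), (2.0, 1.1)],
    "B": [(4.0, 4.2), (4.5, 3.6), (3.8, 4.9), (5.1, 4.4), (4.4, 4.0)],
}
model = train(training)
print(classify(model, (1.1, 1.3)), classify(model, (4.3, 4.1)))
```

Training amounts to computing a mean and a covariance per class, which is why the computational cost stays low when the unimodal Gaussian assumption holds.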

In summary, the most important ingredient in successful data analysis is the seamless integration of various dimension reduction methodologies, all optimized to the underlying data characteristics. Probabilistic modeling of relationships between data and algorithms is currently in progress so that we can gain better insight into data analysis methodologies. This insight, coupled with more rigorous analysis of the impacts of various transformation and compression algorithms on the accentuation of desirable signal attributes and attenuation of undesirable components, will be invaluable in demystifying data mining and turning it into an essential and appreciated partner in conquering the problem of data "whiteout."

Application Examples
The following three short examples illustrate the major functionalities of our approach to integrated data analysis:

  1. Leukemia diagnosis
  2. Magazine subscriber analysis
  3. Thrombosis diagnosis

Leukemia Diagnosis
Recent advances in cDNA and oligonucleotide gene chip technology have been instrumental in allowing us to take multiple snapshots of gene level activities at an arbitrary level of abstraction in diagnostic and prognostic applications. In order to demonstrate the utility of gene chips in diagnostic applications, the Whitehead/MIT Center for Genome Research prepared a data set covering 7,129 gene expression levels collected from 72 patients suffering from two types of cancer - acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). Figure 2 shows the three easy data analysis steps.

Figure 2: After loading the metadata, the user can select outputs and inputs using the I/O Help wizard (a). After I/O specification, the Intelligent DM wizard takes over and recommends an appropriate set of algorithms with their parameters filled in (b). The user has an option of proceeding with the recommended algorithms or selecting his/her own algorithms. After the user clicks on the Run pushbutton, the results are summarized in a combination of text and intuitive rank order curve, which shows that three out of over seven thousand genes are required for virtually 100% accuracy in leukemia diagnosis.
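A rank order curve of this kind can be sketched as follows. The source does not say how features are ranked or which classifier is used, so everything here is an assumption: features are ranked by a simple two-class separation score (mean gap divided by summed standard deviations), accuracy is measured with a nearest-centroid rule on the training samples themselves (an optimistic estimate), and the tiny data table is invented rather than taken from the leukemia data.

```python
from statistics import mean, stdev

def separation_score(a, b):
    """Two-class separation: absolute mean gap over the summed spreads."""
    return abs(mean(a) - mean(b)) / (stdev(a) + stdev(b) + 1e-12)

def rank_features(rows, labels):
    """Return feature indices ordered from most to least discriminating."""
    classes = sorted(set(labels))  # assumes exactly two classes
    scores = []
    for j in range(len(rows[0])):
        a = [r[j] for r, y in zip(rows, labels) if y == classes[0]]
        b = [r[j] for r, y in zip(rows, labels) if y == classes[1]]
        scores.append(separation_score(a, b))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)

def nearest_centroid_accuracy(rows, labels, features):
    """Resubstitution accuracy of a nearest-centroid rule on chosen features."""
    classes = sorted(set(labels))
    centroids = {
        c: [mean(r[j] for r, y in zip(rows, labels) if y == c) for j in features]
        for c in classes
    }
    correct = 0
    for r, y in zip(rows, labels):
        guess = min(
            classes,
            key=lambda c: sum((r[j] - m) ** 2 for j, m in zip(features, centroids[c])),
        )
        correct += guess == y
    return correct / len(rows)

def rank_order_curve(rows, labels):
    """Accuracy using the top 1, top 2, ... ranked features."""
    ranking = rank_features(rows, labels)
    curve = [
        nearest_centroid_accuracy(rows, labels, ranking[:k])
        for k in range(1, len(ranking) + 1)
    ]
    return ranking, curve

# Invented toy table: 6 samples x 4 features; only feature 2 separates the classes.
rows = [
    [5.0, 1.0, 9.1, 3.0], [5.2, 2.0, 8.8, 2.0], [4.9, 1.5, 9.3, 3.8],
    [5.1, 1.2, 2.1, 2.9], [5.0, 1.9, 1.8, 2.2], [5.3, 1.4, 2.4, 3.7],
]
labels = ["AML", "AML", "AML", "ALL", "ALL", "ALL"]
ranking, curve = rank_order_curve(rows, labels)
print("best feature:", ranking[0], "accuracy by feature count:", curve)
```

A curve that saturates after a handful of features is what signals that the remaining features add little; a held-out test set would be needed for an honest accuracy figure.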

Magazine Subscriber Prediction
In this case, the goal is to identify the type of consumers likely to subscribe to a magazine based on socioeconomic, demographic, and other personal data. The user can type a question (How to predict a magazine subscriber?) and the search engine can sift through the existing database to find and recommend the most relevant metadata sets. Again the I/O Help wizard can assist the user in specifying inputs and outputs. Once the user confirms the I/O specification, the DM engine performs all the necessary calculations and presents the results in a succinct and understandable format.

In this case, likely magazine subscribers hold bankcards, have pets, and contribute to various organizations (i.e., socially active).

Thrombosis Diagnosis
This data set contains three relational database tables - general patient information, thrombosis test results, and medical history data. Normally, the end user would be required to transform this data into a flat table using customized software before commencing data mining using commercial data mining tools. However, Teledyne Scientific Company's preprocessing engine turns a patient's medical history data (irregularly sampled time series) into a set of compressed features that can be appended to higher level relational database tables, thus creating a unified metadata view.
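The flattening step can be sketched in a few lines. This is only an illustration under stated assumptions: the summary features (count, mean, min, max, last value, least-squares slope against the actual time stamps) are chosen for the example and are not the features the preprocessing engine is documented to use, and the patient records and function names are invented.

```python
from collections import defaultdict
from statistics import mean

def summarize_series(series):
    """Compress one irregularly sampled series [(day, value), ...] into features."""
    series = sorted(series)
    days = [d for d, _ in series]
    values = [v for _, v in series]
    d_bar, v_bar = mean(days), mean(values)
    spread = sum((d - d_bar) ** 2 for d in days)
    # Slope is fitted against real time stamps, so uneven spacing is respected.
    slope = (
        sum((d - d_bar) * (v - v_bar) for d, v in series) / spread
        if spread > 0
        else 0.0
    )
    return {
        "n_obs": len(series),
        "mean": v_bar,
        "min": min(values),
        "max": max(values),
        "last": values[-1],
        "slope_per_day": slope,
    }

def build_flat_view(patients, history):
    """Append per-patient history features to the one-row-per-patient table."""
    grouped = defaultdict(list)
    for patient_id, day, value in history:
        grouped[patient_id].append((day, value))
    view = {}
    for patient_id, info in patients.items():
        row = dict(info)
        if patient_id in grouped:
            row.update(summarize_series(grouped[patient_id]))
        else:
            row["n_obs"] = 0  # no history recorded for this patient
        view[patient_id] = row
    return view

# Invented example records (one-to-many: patient -> lab measurements).
patients = {
    1: {"age": 54, "sex": "F"},
    2: {"age": 61, "sex": "M"},
    3: {"age": 47, "sex": "F"},
}
history = [(1, 0, 10.0), (1, 3, 13.0), (1, 10, 20.0), (2, 5, 7.5)]
view = build_flat_view(patients, history)
print(view[1])
```

Each patient ends up as a single row regardless of how many measurements exist, which is the shape most data mining tools expect.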

The results are summarized in terms of the best classification algorithm, the actual performance in terms of Type I/II errors, and the best feature subset to be used in thrombosis diagnosis.
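For readers unfamiliar with the terminology, the sketch below computes the two error types from predicted and actual labels, using the usual convention that a Type I error is a false positive and a Type II error is a false negative. The labels are invented for illustration and the function name error_rates is hypothetical.

```python
def error_rates(actual, predicted, positive=1):
    """Return (Type I rate, Type II rate) for a two-class problem.

    Type I  = false positives / all actual negatives
    Type II = false negatives / all actual positives
    """
    pairs = list(zip(actual, predicted))
    false_pos = sum(1 for a, p in pairs if a != positive and p == positive)
    false_neg = sum(1 for a, p in pairs if a == positive and p != positive)
    negatives = sum(1 for a in actual if a != positive)
    positives = sum(1 for a in actual if a == positive)
    type_i = false_pos / negatives if negatives else 0.0
    type_ii = false_neg / positives if positives else 0.0
    return type_i, type_ii

# Invented labels: 1 = thrombosis present, 0 = absent.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
type_i, type_ii = error_rates(actual, predicted)
print(f"Type I: {type_i:.2f}, Type II: {type_ii:.2f}")
```

In a diagnostic setting the two rates usually carry different costs, which is why they are reported separately rather than as a single accuracy figure.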

Licensing/Services
Teledyne Scientific Company offers licenses as well as software development and consulting services to address each client's unique data analysis needs.

Licensing: Teledyne Scientific Company desires to license this technology to a company that would like to productize this software and take it to market. It is ideal for data mining across a broad range of disciplines, including pattern recognition and signal processing in scientific research; competition and productivity analysis, risk management, and market studies for business; risk analysis, actuarial research, and market trends for the demand forecasting and insurance industries, to name just a few.

Software development: Teledyne Scientific Company has developed several unique toolboxes for integrated data analysis and high level system design for various U.S. government laboratories and commercial research establishments. These are highly interactive and versatile tools that can be customized to adequately address each customer's specific needs. We will leverage the existing tools in preprocessing, data mining, optimization, and visualization to deliver a cost effective and highly optimized solution to each client.

Consulting: Teledyne Scientific Company will provide consulting services to work on any challenging data analysis problem on an outsourced basis.

Patents Filed

Title: Automatic mapping of data characteristics to image and signal processing algorithms for feature extraction
Description: As bandwidth becomes more plentiful, data mining must be able to handle spatially and temporally sampled data, such as image and time series data, respectively. This invention describes a method to find appropriate digital signal processing (DSP) and image processing (IP) algorithms based on data characteristics. DSP and IP algorithms transform raw time series and image data into projection spaces, where good features can be extracted for data mining.

Title: Automatic data exploration to seek meaningful relationships among original and derived fields by stealing CPU cycles
Description: This invention presents a method or apparatus for automatic data exploration with no human intervention when computer resources are underutilized so that actual data mining tasks can be performed with ease and speed.

Title: Estimation of the point of diminishing returns in data mining
Description: This invention presents a method to quantify the extent to which a data mining algorithm captures useful information embedded in input data. The key concept is forward-reverse mapping between feature space and classification space, where we perform confusion analysis. That is, we quantify the consistency in the levels of confusion in the two spaces.

Title: Hierarchical characterization of fields from multiple tables with one-to-many relations for comprehensive data mining
Description: This invention presents a method to summarize or characterize information scattered over multiple tables that are related through one-to-many relationships. The end result is a metadata table, which is a collection of multiple relational tables.

Title: Intelligent performance optimizer that recommends a set of classifiers and parameters based on good-feature distribution and user preferences
Description: This invention describes a knowledge-based, automated performance optimizer that characterizes good-feature probability distribution with a vector of features and assigns appropriate decision algorithms by mapping the feature vector and user's preferences onto a decision-algorithm surface.

Title: Text display of key data mining performance results
Description: The invention conveys key performance results of a data mining operation in plain English so that a novice user can understand them without having to consult an expert for interpretation.

Title: One-step data mining with a "where am I" interrupt button
Description: This invention describes a method or apparatus that permits one-step data mining for novice users, thereby avoiding all the headaches and confusion associated with the interactive nature of data mining, namely specification of numerous parameters associated with various steps in data mining.

Teledyne Scientific & Imaging, LLC
1049 Camino Dos Rios, Thousand Oaks, CA 91360
Phone: (805) 373-4545   Fax: (805) 373-4775
Copyright © 2008 Teledyne Technologies Incorporated. All rights reserved.