Redundancy-based feature selection for high-dimensional data and application in bioinformatics

by Lei Yu

Publisher: ProQuest / UMI
Publication Date: Sunday, March 19, 2006
Number of Pages: 120
ISBN: 0542175991


Book Summary:
This dissertation studies feature selection: the problem of selecting a subset of the original features in a data set. In many applications, such as genomic microarray analysis and text document categorization, data often contain thousands of features, many of which are irrelevant or redundant for the classification task. For learning algorithms to perform efficiently and effectively on high-dimensional data, irrelevant and redundant features must be removed. The characteristics of high-dimensional data hinder many applications and pose severe challenges for existing feature selection methods, which focus on feature relevance analysis. This dissertation describes solutions for handling redundant features efficiently. First, it shows that feature relevance alone is insufficient for efficient and effective feature selection on high-dimensional data. Next, it proposes a systematic framework for explicit redundancy analysis in feature selection. Under this framework, it introduces three feature selection algorithms that handle various types of high-dimensional data: Fast Correlation Based Filter (FCBF), Redundancy Based Filter (RBF), and Reporter Surrogate Variable Program (RSVP). FCBF efficiently selects features that are relevant but not redundant. Applied to gene expression microarray data, RBF efficiently identifies a small set of discriminative genes for accurate classification of biological samples. Finally, using a real-world problem of glioma migration, the dissertation discusses results from RSVP, which bridges the gap between statistically significant findings and biologically significant insights.
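
The summary does not spell out how relevance and redundancy are measured, so the following is only an illustrative sketch of a relevance-then-redundancy filter in the spirit of FCBF, not the dissertation's own code. It assumes discrete features and uses symmetrical uncertainty (a normalized form of mutual information) as the correlation measure; the function names, the `threshold` parameter, and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """2 * I(X; Y) / (H(X) + H(Y)); 0 means independent, 1 means fully dependent."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def select_features(features, labels, threshold=0.0):
    """Keep relevant features, then drop those redundant to a stronger kept one.

    features: dict mapping a feature name to a list of discrete values.
    labels:   list of class labels, same length as each feature column.
    """
    # Step 1 (relevance): score each feature against the class labels.
    relevance = {
        name: symmetrical_uncertainty(column, labels)
        for name, column in features.items()
    }
    candidates = sorted(
        (name for name in relevance if relevance[name] > threshold),
        key=lambda name: relevance[name],
        reverse=True,
    )

    # Step 2 (redundancy): walk from most to least relevant; a feature is
    # dropped if an already kept feature predicts it at least as well as
    # it predicts the class.
    selected = []
    for name in candidates:
        redundant = any(
            symmetrical_uncertainty(features[kept], features[name]) >= relevance[name]
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


labels = [0, 0, 1, 1, 0, 0, 1, 1]
features = {
    "a": [0, 0, 1, 1, 0, 0, 1, 1],                   # relevant
    "b": ["x", "x", "y", "y", "x", "x", "y", "y"],   # relevant, duplicates "a"
    "c": [0, 1, 0, 1, 0, 1, 0, 1],                   # irrelevant to the labels
}
print(select_features(features, labels))
```

In this toy data, "c" fails the relevance step and "b" is dropped as redundant to "a", which shows why relevance scoring alone would have kept two copies of the same signal.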


Copyright © 2007 Biohealthmatics.com. All Rights Reserved.

Last Updated: 24 November 2007.