AIME 07
home | welcome | latest news | programme | venue | accommodation | photos | organisation
speakers | panel | CFP | tutorials | doc consortium | programme committee | workshops | previous conferences

Tutorials

  1. Introduction to Applied Clinical Data Mining
  2. Advanced Applied Clinical Data Mining

Introduction to Applied Clinical Data Mining

by John H. Holmes, PhD

This tutorial is intended to bring together AIME attendees with an interest or experience in mining medical databases of all kinds, and to introduce them to the practical application of data mining. Using a well-known data mining life cycle as a conceptual framework, attendees will experience the techniques of mining clinical data first-hand, through demonstration and direct participation.

The tutorial will illustrate, via demonstration and hands-on experience, the application of data mining methodologies to a clinical database. A knowledge discovery life cycle model [1] will serve as the conceptual framework. The goal is to give attendees practical experience in mining a database for use in clinical research, and ultimately for assisting with statistical analysis.

The database selected for this tutorial is the Pima Indians Diabetes Database [2]. It was chosen because it is well known in the machine learning community, so there is a rich literature applying various data mining paradigms to it. In addition, it offers a variety of attribute types and substantial complexity, even though it contains only nine variables and 768 records. Finally, it is freely available and in the public domain, which makes it an excellent choice for demonstration and laboratory purposes.

The Weka data mining software package [3] will be used for demonstration in the tutorial. Weka is freely available as open-source software and runs on even modestly equipped computers within a Java runtime environment (JRE). Weka and the JRE will be distributed to attendees on CD-ROM free of charge. Attendees are encouraged to bring laptops to the tutorial, where they will be given the opportunity to install and use Weka. Those who do not bring laptops will still benefit from the detailed demonstrations.

The tutorial will cover: Introduction to Weka and the demonstration database; Data preparation and reduction; Data description and visualization; Association rule mining; Clustering; Classification and prediction rule mining; Interpreting and applying the results to analysis; and Summary and conclusion.

Intended audience: Clinical and basic science researchers will benefit most from this tutorial. The content level is 75% beginner, 25% intermediate.
Prerequisites: None, although some prior exposure to the basic methodologies of data mining may be helpful.

Advanced Applied Clinical Data Mining

by John H. Holmes, PhD

This tutorial is intended to provide attendees with an in-depth look at the application and evaluation of data mining methods in several current problem areas in biomedical informatics, specifically population-based epidemiologic surveillance, gene-environment interaction studies, and proteomics.

This tutorial will focus on several families of data mining methodologies, including trees, clustering, Bayesian classification, evolutionary computation, visualization, and statistical classifiers. After a discussion of the general characteristics of biomedical data, such as missing values and feature selection problems, and methods for preparing biomedical data for mining, this tutorial will introduce examples of the selected families of tools for mining biomedical data, including thorough algorithmic descriptions, functional examples, and live demonstrations of each on several real-world biomedical datasets. The applications will focus specifically on rule discovery, emergence of clinical prediction rules, classification, and clustering, as appropriate to each method. The advantages and disadvantages of each method will be discussed in detail.
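As a small illustration of the kind of data preparation mentioned above, the following sketch fills missing values with column means. Mean imputation is only one simple option and is not necessarily the approach taken in the tutorial. The function name `impute_mean`, the use of `None` as the missing-value marker, and the toy data are assumptions made for this example; the input is assumed to be a rectangular table of numbers.

```python
from statistics import mean


def impute_mean(rows, missing=None):
    """Return a copy of rows with each missing entry replaced by the
    mean of the observed values in the same column.

    Assumes rows is a rectangular list of lists of numbers, with the
    `missing` marker (None by default) standing in for absent values.
    A column with no observed values is left unchanged.
    """
    columns = list(zip(*rows))
    column_means = []
    for column in columns:
        observed = [value for value in column if value is not missing]
        column_means.append(mean(observed) if observed else missing)
    return [
        [column_means[j] if value is missing else value
         for j, value in enumerate(row)]
        for row in rows
    ]


# Hypothetical toy table: two attributes, three records.
records = [[1.0, None], [3.0, 4.0], [None, 8.0]]
completed = impute_mean(records)
```

Imputing with the mean keeps every record but reduces the variance of the affected attribute, which is one reason the choice of method deserves discussion.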

The tutorial will also include a rigorous discussion of methods for evaluating the results obtained from mining biomedical data, including classification and prediction accuracy and test characteristics such as sensitivity, specificity, area under the receiver operating characteristic curve, and predictive values. Finally, methods for validating classification and prediction models discovered by the various tools will be presented and discussed. These include the choice and use of suitable validation datasets, methods for comparing models and the use of human expert panels in providing content for qualitative model validation.
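The test characteristics listed above follow directly from the four cells of a two-by-two confusion matrix. The sketch below computes them under the standard definitions, together with a rank-based estimate of the area under the ROC curve. The function names `diagnostic_metrics` and `roc_auc` and the example counts are hypothetical and chosen only for illustration; the code assumes binary labels coded as 1 (positive) and 0 (negative) and that no denominator is zero.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard test characteristics from confusion-matrix counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives, and false negatives. Assumes every denominator is
    nonzero.
    """
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def roc_auc(scores, labels):
    """Estimate the area under the ROC curve as the fraction of
    positive/negative pairs in which the positive case receives the
    higher score, counting ties as one half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical counts and scores, for illustration only.
metrics = diagnostic_metrics(tp=40, fp=10, tn=45, fn=5)
area = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

Sensitivity and specificity describe the classifier itself, whereas the predictive values also depend on how common the condition is in the data being evaluated.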

Attendees will have the opportunity to work on real-life data drawn from the three problem areas (population-based epidemiologic surveillance, gene-environment interaction studies, and proteomics) using a well-known open-source data mining suite, Weka. In addition, there will be ample time to allow for discussion of attendees’ problem sets, should they wish to bring them to the tutorial.

Intended audience: Clinical and basic science researchers will benefit most from this tutorial. The content level is 50% intermediate, 50% advanced.
Prerequisites: Prior exposure to the basic methodologies of data mining. Attendance at the AIME 05 tutorial Applied Clinical Data Mining or the AIME 07 tutorial Introduction to Applied Clinical Data Mining would also suffice as a prerequisite.