
SEMINARS - PhD PROGRAMME



We are pleased to announce that the Department of Mathematics and Statistics 
of Naples is hosting Professors J.H. Friedman (Stanford University, USA), 
J.J. Meulman (Leiden University, The Netherlands), D. Jaruskova (Czech 
Technical University, Praha) and J. Antoch (Charles University, Praha).

As part of the activities of the PhD Programme in Computational Statistics, 
these lecturers will give the following seminars on 17 and 18 September, 
according to the schedule below.

TUESDAY 17 SEPTEMBER, 3pm to 5pm
================================

CLUSTERING OBJECTS on SUBSETS of ATTRIBUTES - COSA, PART I
Jerome H. Friedman
Stanford University, Dept. of Statistics, E-mail: jhf@stanford.edu

The goal of cluster analysis is to partition a data set with N objects into 
groups (clusters) such that objects within a particular group are more 
similar to each other than to those objects belonging to other clusters. 
The basis of such a cluster analysis is a set of (pseudo)distances (also 
called dissimilarities) between the objects that are often derived from 
measured attributes (variables). This paper proposes a new approach to 
derive such distances. One of its main features is that the method assigns 
a small distance to a pair of objects that have close values on a (any) 
subset of the attribute variables, regardless of their similarity on the 
complement set of variables. Small distances are emphasized by the use of 
the inverse exponential distance, which has a close relation to the 
so-called harmonic distance. Differential weights are optimally assigned to 
the separate variables to be used in combination with the inverse 
exponential/harmonic distance for each pair of objects. Each weight is 
derived using the median distance of an object to its K nearest neighbors. 
The distances between all pairs of objects are obtained by the use of an 
iterative multistep procedure. Using the resulting distance measure in 
conjunction with standard distance based clustering algorithms encourages 
the detection of subgroups of objects that preferentially cluster on 
subsets of the variables, without having to explicitly search for the 
relevant subsets. The relevant variable subsets for each individual cluster 
can be different and may partially (or completely) overlap with those for 
other clusters. This property is especially important when the number of 
variables in the data set is very large.
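The key idea above — that a pair of objects close on any subset of attributes should receive a small overall distance — can be illustrated with a toy sketch in Python. This is not the authors' implementation; the scaling by per-attribute dispersion and the value of the scale parameter `lam` are assumptions made here purely for illustration:

```python
import numpy as np

def inverse_exponential_distance(X, lam=0.2):
    """Toy sketch of an inverse-exponential pairwise distance.

    Per-attribute distances are combined through a soft-min
    (-lam * log of the mean of exp(-d/lam)), so two objects that
    agree on ANY subset of attributes receive a small overall
    distance, regardless of how far apart they are on the rest.
    """
    n, p = X.shape
    # scale each attribute by its dispersion (an assumption, for comparability)
    scale = X.std(axis=0) + 1e-12
    D = np.zeros((n, n))
    for i in range(n):
        d_k = np.abs(X - X[i]) / scale              # (n, p) per-attribute distances
        # inverse exponential combination: small d_k on a few attributes dominates
        D[i] = -lam * np.log(np.mean(np.exp(-d_k / lam), axis=1))
    return D
```

For small `lam` the combination approaches the minimum over attributes, emphasizing small distances; as `lam` grows it approaches the plain average, recovering an ordinary weighted distance.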

CLUSTERING OBJECTS on SUBSETS of ATTRIBUTES - COSA, PART II
Jacqueline J. Meulman
Leiden University, Data Theory Group, E-mail: meulman@fsw.leidenuniv.nl

Most clustering algorithms attempt to find clusters without special 
attention to particularly desired properties. A perfect clustering 
algorithm would succeed in finding all existing clusters including those 
with the desired properties. In practice, however, all commonly used 
clustering algorithms employ necessarily highly restrictive search 
strategies that might reveal strong, but uninteresting clusters, while more 
subtle clusters of higher interest remain undetected. Therefore the COSA 
method has been extended to force algorithms to focus on clusters with 
predefined "interesting" properties. In so-called targeted clustering we 
seek clusters that simultaneously group on (different) unspecified variable 
subsets near preferred values of the variables, for instance, being 
unusually high or low (or both), while strong clusters near moderate values 
are ignored. For categorical variables, one may be interested in clusters 
arising from unusually low marginal frequencies, instead of from high 
marginal frequencies that represent perhaps uninteresting commonly 
occurring categories.
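One way targeted clustering could work is to measure, per attribute, how far a pair of objects jointly lies from a preferred target value, rather than how far the objects lie from each other. The sketch below is purely illustrative and is not the authors' method: the `target` argument and the max-over-the-pair combination rule are assumptions made here to show the idea:

```python
import numpy as np

def targeted_distance(X, target, lam=0.2):
    """Hypothetical sketch of a targeted pairwise distance.

    Each per-attribute distance is the LARGER of the two objects'
    distances to a preferred target value, so a pair receives a small
    distance only when both objects sit near the target on some subset
    of attributes; strong clusters near moderate values are ignored.
    """
    n, p = X.shape
    scale = X.std(axis=0) + 1e-12
    to_target = np.abs(X - target) / scale          # (n, p) distance to target
    D = np.zeros((n, n))
    for i in range(n):
        # pair distance on attribute k: the worse of the two target distances
        d_k = np.maximum(to_target, to_target[i])
        # same soft-min combination as in the untargeted case
        D[i] = -lam * np.log(np.mean(np.exp(-d_k / lam), axis=1))
    return D
```

Setting the target to an attribute's maximum (or minimum) would direct the search toward clusters of unusually high (or low) values, as described above.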
Applications to several different domains will be presented. One of these 
will be the analysis of gene expression micro arrays, where the ratio of 
number of variables (genes) to number of objects (samples) is typically 
extremely large. Computational details in the multistep algorithm will be 
discussed.


WEDNESDAY 18 SEPTEMBER, 10am to 12pm
====================================

CHANGE POINT ESTIMATORS IN CONTINUOUS QUADRATIC REGRESSION
Daniela Jaruskova
Czech Technical University, Fac. of Engineering, Praha


WEDNESDAY 18 SEPTEMBER, 3pm to 5pm
==================================

OPTIMAL CLASSIFICATION TREES
Jaromir Antoch
Charles University, Department of Probability and Statistics, Praha