SEMINARS - PhD PROGRAMME
We are pleased to announce that the Department of Mathematics and Statistics
of Naples is hosting Professors J.H. Friedman (Stanford University, USA),
J.J. Meulman (Leiden University, The Netherlands), D. Jaruskova (Czech
Technical University, Praha) and J. Antoch (Charles University, Praha).
As part of the activities of the PhD Programme in Computational Statistics,
these lecturers will give the following seminars on 17 and 18 September,
according to the schedule below.
TUESDAY, 17 SEPTEMBER, from 3 to 5 PM
=====================================
CLUSTERING OBJECTS on SUBSETS of ATTRIBUTES - COSA, PART I
Jerome H. Friedman
Stanford University, Dept. of Statistics, E-mail: jhf@stanford.edu
The goal of cluster analysis is to partition a data set with N objects into
groups (clusters) such that objects within a particular group are more
similar to each other than to those objects belonging to other clusters.
The basis of such a cluster analysis is a set of (pseudo)distances (also
called dissimilarities) between the objects, which are often derived from
measured attributes (variables). This paper proposes a new approach to
derive such distances. One of its main features is that the method assigns
a small distance to a pair of objects that have close values on a (any)
subset of the attribute variables, regardless of their similarity on the
complement set of variables. Small distances are emphasized by the use of
the inverse exponential distance, which has a close relation to the
so-called harmonic distance. Differential weights are optimally assigned to
the separate variables to be used in combination with the inverse
exponential/harmonic distance for each pair of objects. Each weight is
derived using the median distance of an object to its K nearest neighbors.
The distances between all pairs of objects are obtained by the use of an
iterative multistep procedure. Using the resulting distance measure in
conjunction with standard distance based clustering algorithms encourages
the detection of subgroups of objects that preferentially cluster on
subsets of the variables, without having to explicitly search for the
relevant subsets. The relevant variable subsets for each individual cluster
can be different and may partially (or completely) overlap with those for
other clusters. This property is especially important when the number of
variables in the data set is very large.
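The key property described above - a pair receives a small distance whenever
it is close on any subset of attributes - can be illustrated with a minimal
sketch of a weighted inverse-exponential dissimilarity. The function name,
the use of absolute differences per attribute, the equal weights, and the
scale values are illustrative assumptions; COSA's actual KNN-based weight
optimization and iterative multistep procedure are not reproduced here.

```python
import numpy as np

def inv_exp_distance(xi, xj, w, lam):
    """Weighted inverse-exponential dissimilarity between two objects.

    Per-attribute distances d_k are combined as a soft minimum,
        D = -lam * log( sum_k w_k * exp(-d_k / lam) ),
    which emphasizes the attributes on which the pair is close: two
    objects that agree on *any* subset of attributes get a small
    distance regardless of the remaining attributes.
    """
    d = np.abs(xi - xj)                      # per-attribute distances
    return -lam * np.log(np.sum(w * np.exp(-d / lam)))

# Two objects that agree on the first two of four attributes.
xi = np.array([1.0, 2.0, 10.0, -5.0])
xj = np.array([1.0, 2.0,  0.0,  5.0])
w = np.full(4, 0.25)                         # equal weights (assumption)

# Small lam: dominated by the closest attributes, so the pair is "close".
d_soft = inv_exp_distance(xi, xj, w, lam=0.1)

# Large lam: approaches the ordinary weighted mean of the d_k (here 5.0).
d_mean = inv_exp_distance(xi, xj, w, lam=1e6)
```

The scale parameter `lam` interpolates between a hard minimum over
attributes (small `lam`) and a plain weighted average (large `lam`), which
is why small distances on attribute subsets are emphasized.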
CLUSTERING OBJECTS on SUBSETS of ATTRIBUTES - COSA, PART II
Jacqueline J. Meulman
Leiden University, Data Theory Group, E-mail: meulman@fsw.leidenuniv.nl
Most clustering algorithms attempt to find clusters without special
attention to particularly desired properties. A perfect clustering
algorithm would succeed in finding all existing clusters including those
with the desired properties. In contrast, all commonly used clustering
algorithms necessarily employ highly restrictive search
strategies that might reveal strong, but uninteresting clusters, while more
subtle clusters of higher interest remain undetected. Therefore the COSA
method has been extended to force algorithms to focus on clusters with
predefined "interesting" properties. In so-called targeted clustering we
seek clusters that simultaneously group on (different) unspecified variable
subsets near preferred values of the variables, for instance, being
unusually high or low (or both), while strong clusters near moderate values
are ignored. For categorical variables, one may be interested in clusters
arising from unusually low marginal frequencies, instead of from high
marginal frequencies that represent perhaps uninteresting commonly
occurring categories.
Applications to several different domains will be presented. One of these
will be the analysis of gene expression microarrays, where the ratio of the
number of variables (genes) to the number of objects (samples) is typically
extremely large. Computational details of the multistep algorithm will be
discussed.
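The targeted-clustering idea above - rewarding pairs only when both values
lie near a preferred (e.g. unusually high) value - can be sketched for a
single attribute as follows. The function name, the max-of-distances form,
and the example target value are assumptions for illustration, not the
abstract's exact formulation.

```python
def targeted_attr_distance(xik, xjk, target):
    # The pair distance on this attribute is small only when BOTH values
    # are near the preferred target, so tight clusters sitting at
    # uninteresting moderate values remain far apart and are ignored.
    return max(abs(xik - target), abs(xjk - target))

high = 10.0                                            # hypothetical preferred (high) value
d_extreme  = targeted_attr_distance(9.8, 10.1, high)   # both unusually high -> small
d_moderate = targeted_attr_distance(5.0, 5.0, high)    # identical but moderate -> large
```

With a dual target (unusually high or low values), one would take the
minimum of this quantity over the two targets, so that either extreme
qualifies.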
WEDNESDAY, 18 SEPTEMBER, from 10 AM to 12 PM
============================================
CHANGE POINT ESTIMATORS IN CONTINUOUS QUADRATIC REGRESSION
Daniela Jaruskova
Czech Technical University, Faculty of Engineering, Praha
WEDNESDAY, 18 SEPTEMBER, from 3 to 5 PM
=======================================
OPTIMAL CLASSIFICATION TREES
Jaromir Antoch
Charles University, Department of Probability and Statistics, Praha