Meulman Seminar - Clustering objects on subsets of variables - Cassino, 16 February
UNIVERSITY OF CASSINO
DEPARTMENT OF ECONOMIC SCIENCES
*************************************************************************
SEMINAR
"New developments in COSA: Clustering objects on subsets of variables"
*************************************************************************
Jacqueline J. Meulman
Data Theory Group, Faculty of Social and Behavioral Sciences,
Leiden University, The Netherlands
Thursday 16 February 2006, 14:00-15:00, Room 9.01
Faculty of Economics
Via S. Angelo - Località Folcara, 03043 Cassino
Abstract:
The motivation for clustering objects on subsets of attributes (COSA; Friedman
& Meulman, 2004a) came from data in which the number of attributes is much
larger than the number of objects. An obvious application is in systems biology
(genomics, proteomics, and metabolomics). When there is a large number of
attributes, objects may cluster on some attributes and be far apart on all the
others. A common data analysis approach in systems biology is to cluster the
attributes first and, only after the original many-attribute data set has been
reduced to a much smaller one, to cluster the objects. The problem, of course,
is that we would like to select the attributes that discriminate most among the
objects, so the selection has to consider all attributes multivariately; it is
usually not good enough to inspect each attribute univariately.
Therefore, two tasks have to be carried out simultaneously: clustering the
objects into homogeneous groups, and selecting a different subset of variables
for each group of objects. The attribute subset for any discovered group may
overlap completely, partially, or not at all with those of other groups.
Friedman and Meulman (2004a) show that, to avoid local optima, one needs to
start with the inverse exponential mean (rather than the arithmetic mean) of
the separate attribute distances. Using a homotopy strategy, the algorithm then
makes a smooth transition from the inverse exponential distance to the mean of
the ordinary Euclidean distances over attributes.
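To make the contrast between the two means concrete, here is a minimal Python/NumPy sketch. It assumes the inverse exponential mean of the n per-attribute distances d_k takes the form D(eta) = -eta * log((1/n) * sum_k exp(-d_k / eta)), with eta a positive scale parameter; the function name, the toy distances and the eta schedule are hypothetical illustrations, not taken from the COSA program itself.

```python
import numpy as np


def inverse_exponential_mean(d, eta):
    """Return -eta * log(mean(exp(-d / eta))) for per-attribute distances d.

    Assumed form of the inverse exponential mean (illustrative only).
    Small eta pulls the result toward min(d); large eta toward mean(d).
    """
    d = np.asarray(d, dtype=float)
    if eta <= 0:
        raise ValueError("eta must be positive")
    m = d.min()
    # Factor out exp(-m / eta) so the largest exponential term is exactly 1,
    # which avoids underflow when eta is small.
    return m - eta * np.log(np.mean(np.exp(-(d - m) / eta)))


# Hypothetical pair of objects: close on 3 attributes, far apart on 997 others.
rng = np.random.default_rng(0)
d = np.concatenate([np.full(3, 0.01), rng.uniform(1.0, 2.0, size=997)])

# Hypothetical increasing schedule for eta (not COSA's actual schedule).
etas = np.geomspace(0.05, 1000.0, num=8)
values = [inverse_exponential_mean(d, eta) for eta in etas]

print(f"arithmetic mean of attribute distances: {d.mean():.3f}")
for eta, v in zip(etas, values):
    print(f"eta = {eta:9.3f}   inverse exponential mean = {v:.3f}")
```

For small eta the value is pulled toward the few attributes on which the two objects are close; as eta grows it approaches the arithmetic mean, which here is dominated by the 997 uninformative attributes. Sweeping eta upward mimics, in a much simplified way, the smooth transition that the homotopy strategy provides.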
New insights will be presented into the weights, which are crucial to the COSA
procedure but received rather little attention as diagnostics in the original
paper.
Keywords: Clustering on variable subsets, distance-based clustering, inverse
exponential distance, targeted clustering, homotopy parameter, systems biology,
genomics, proteomics, metabolomics, Leiden ApoE3 data.
References
[1] Friedman, J.H., & Meulman, J.J. (2004a). Clustering objects on subsets
of variables (with discussion). Journal of the Royal Statistical Society,
Series B, 66, 815-849.
[2] Friedman, J.H., & Meulman, J.J. (2004b). The COSA program in an R
environment. Available at http://www-stat.stanford.edu/~jhf/COSA.html
[3] Van der Greef, J., Davidov, E., Verheij, E., Vogels, J., Van der
Heijden, R., Adourian, A.S., Oresic, M., Marple, E.W., & Naylor, S. (2003). The
role of metabolomics in drug discovery: A new vision for drug discovery and
development. In Harrigan, G.G., & Goodacre, R. (Eds.), Metabolic profiling: Its
role in biomarker discovery and gene function analysis (pp. 170-198). Boston:
Kluwer.
***********************************************************************
http://stat.unicas.it/
--
Giovanni C. Porzio
Department of Economic Sciences
University of Cassino
Via S. Angelo - Località Folcara
03043 Cassino (FR) - Italy
tel: +39.0776.2993448
www.eco.unicas.it/docente/porzio