
Meulman seminar - Clustering objects on subsets of variables - Cassino, 16 February



UNIVERSITA' DEGLI STUDI DI CASSINO
DIPARTIMENTO DI SCIENZE ECONOMICHE
*************************************************************************

SEMINAR

"New developments in COSA: Clustering objects on subsets of variables"
*************************************************************************

Jacqueline J. Meulman
Data Theory Group, Faculty of Social and Behavioral Sciences,
Leiden University, The Netherlands

Thursday, 16 February 2006, 14:00-15:00, Room 9.01
Facolta' di Economia
Via S.Angelo - Località Folcara, 03043 Cassino

Abstract:
Clustering objects on subsets of attributes (COSA; Friedman & Meulman, 2004a) 
was motivated by data in which the number of attributes is much larger than 
the number of objects. An obvious application is in systems biology (genomics, 
proteomics, and metabolomics). With a large number of attributes, objects 
might cluster on some attributes and be far apart on all others. A common 
data analysis approach in systems biology is to cluster the attributes first 
and, only after reducing the original many-attribute data set to a much 
smaller one, to cluster the objects. The problem, of course, is that we would 
like to select those attributes that discriminate most among the objects, 
which requires regarding all attributes multivariately; inspecting each 
attribute univariately is usually not good enough. 
Therefore, two tasks have to be carried out simultaneously: clustering the 
objects into homogeneous groups, and selecting a different subset of variables 
for each group of objects. The attribute subset for any discovered group may 
overlap completely, partially, or not at all with those for other groups. 
Friedman and Meulman (2004a) show that, to avoid local optima, we need to 
start with the inverse exponential mean (rather than the arithmetic mean) of 
the separate attribute distances. Using a homotopy strategy, the algorithm 
creates a smooth transition from the inverse exponential distance to the mean 
of the ordinary Euclidean distances over attributes. 
New insights will be presented on the weights, which are crucial in the COSA 
procedure but received little attention as diagnostics in the original 
paper.
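
The transition described in the abstract, from the inverse exponential mean 
to the arithmetic mean of the attribute distances, can be illustrated with a 
small sketch. The soft-minimum form used here, 
-eta * log(sum_k w_k * exp(-d_k / eta)), is assumed to correspond to the 
inverse exponential mean; the function names, the toy distances, and the 
schedule of eta values are hypothetical and are not taken from the COSA 
program.

```python
import math


def inverse_exponential_mean(distances, weights, eta):
    """Soft-minimum average: -eta * log(sum_k w_k * exp(-d_k / eta)).

    Assumes the weights are non-negative and sum to 1.
    """
    # Shift by the smallest distance so the exponentials cannot underflow
    # to zero all at once when eta is small.
    d_min = min(distances)
    total = sum(w * math.exp(-(d - d_min) / eta)
                for d, w in zip(distances, weights))
    return d_min - eta * math.log(total)


def arithmetic_mean(distances, weights):
    """Ordinary weighted mean of the attribute distances."""
    return sum(w * d for d, w in zip(distances, weights))


# Toy example: two objects that are close on one attribute and far apart
# on the three others, with equal attribute weights.
d = [0.1, 5.0, 6.0, 7.0]
w = [0.25, 0.25, 0.25, 0.25]

# Hypothetical homotopy schedule: eta grows from small to large.
schedule = [0.01, 0.1, 1.0, 10.0, 1e6]
path = [inverse_exponential_mean(d, w, eta) for eta in schedule]

for eta, value in zip(schedule, path):
    print(f"eta = {eta:g}: distance = {value:.4f}")
print(f"arithmetic mean: {arithmetic_mean(d, w):.4f}")
```

For small eta the distance is dominated by the attribute on which the two 
objects are closest, so agreement on a small subset of attributes is enough 
to make them near neighbours. As eta grows, the value rises smoothly towards 
the ordinary weighted mean over all attributes.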

Keywords: Clustering on variable subsets, distance-based clustering, inverse 
exponential distance, targeted clustering, homotopy parameter, systems biology, 
genomics, proteomics, metabolomics, Leiden ApoE3 data.

References
[1] Friedman, J.H., & Meulman, J.J. (2004a). Clustering objects on subsets 
of variables (with discussion). Journal of the Royal Statistical Society, 
Series B, 66, 815-849.
[2] Friedman, J.H., & Meulman, J.J. (2004b). The COSA program in an 
R-environment, available at 
http://www-stat.stanford.edu/~jhf/COSA.html
[3] Van der Greef, J., Davidov, E., Verheij, E., Vogels, J., Van der 
Heijden, R., Adourian, A.S., Oresic, M., Marple, E.W., & Naylor, S. (2003). The 
role of metabolomics in drug discovery: A new vision for drug discovery and 
development. In Harrigan, G.G., & Goodacre, R. (Eds.), Metabolic profiling: Its 
role in biomarker discovery and gene function analysis (pp. 170-198). Boston: 
Kluwer.


***********************************************************************
http://stat.unicas.it/

-- 
Giovanni C. Porzio
Dipartimento di Scienze Economiche
Università degli Studi di Cassino
Via S. Angelo - Localita' Folcara  
03043 Cassino (FR) - Italia
tel: +39.0776.2993448

www.eco.unicas.it/docente/porzio


