[Forum SIS] IFCS - Tutorial "Distributional Data Analysis" 5 July 2015

Thu 4 Jun 2015 13:16:22 CEST

Dear colleagues,

I am pleased to inform you that, on the occasion of the IFCS 2015 Conference, to be held in Bologna from 6 to 8 July 2015,
a TUTORIAL on Distributional Data Analysis will take place on the afternoon of 5 July, from 14:00 to 18:00.
The programme and the information needed for registration follow below and in the attachment.
----------------------------------------------------------------------------------------------------------------------------------------------------------------

Tutorials

The IFCS has organized two tutorials that will be held in parallel on the afternoon of Sunday, 5 July, namely:

·   Distributional Data Analysis, by Prof. Rosanna Verde
·   The New Science of Big Data Analytics, Based on the Geometry and the Topology of Complex Systems, by Prof. Fionn Murtagh

The cost of each tutorial is 40 €. The tutorials are addressed to doctoral students, postdoctoral researchers and researchers. The maximum number of participants for each tutorial is 25.

The description of the first tutorial follows.

Distributional Data Analysis
Name of the organizer: Prof. Rosanna Verde

rosanna.verde at unina2.it

Abstract

In statistics and data mining, the unit under analysis is typically an individual described by numerical and/or categorical variables, with each individual taking a single value for each variable. However, in some cases a set of observations is recorded for each individual, or the units under analysis are not individuals but groups of individuals. In these cases, for each unit and for each variable we have a distribution of values. To analyze such data we can summarize the distribution of values by a descriptive statistic (typically, the mean). Nevertheless, the mean discards much of the information contained in the distribution of values.
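As a small illustration of this last point, the sketch below builds two hypothetical units whose observations share the same mean but are spread very differently. The values, the bin edges and the helper name `histogram` are assumptions made for the example; they do not come from the tutorial material.

```python
from statistics import mean

# Hypothetical observations recorded for two units (illustrative values only).
unit_a = [9, 10, 10, 10, 11]
unit_b = [0, 5, 10, 15, 20]

def histogram(values, edges):
    """Count values per bin [edges[i], edges[i+1]); the last bin also includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts

edges = [0, 5, 10, 15, 20]  # assumed common bin edges
mean_a, mean_b = mean(unit_a), mean(unit_b)
hist_a, hist_b = histogram(unit_a, edges), histogram(unit_b, edges)

print("means:", mean_a, mean_b)        # the two means coincide
print("histograms:", hist_a, hist_b)   # the bin counts do not
```

Summarizing each unit by its mean would make the two units indistinguishable, while their histograms keep the difference in spread.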

Distributional data analysis provides tools to explore and extract information from data where the units under analysis are described by quantitative distributions. For operational purposes, distributions will be represented in histogram form. This topic belongs to the more general framework of Symbolic Data Analysis, where the units can be described by sets of values, intervals, histograms, etc. As a particular case of symbolic data, histogram data are a valid tool for analyzing summaries of numerical data while retaining most of the information about their distribution.

In the last few years, many statistical methods (e.g. regression, forecasting models, clustering, principal component analysis) have been proposed to deal with this kind of data. This body of methods is mostly based on suitable dissimilarity measures between distributions.
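One such dissimilarity, listed in the tentative program, is the Wasserstein distance. The following is a minimal sketch and not the tutorial's or the HistDAWass package's implementation: it assumes two empirical samples of equal size, for which the L2 Wasserstein distance reduces to the root mean squared difference between sorted values. The function name `wasserstein2` and the data are hypothetical.

```python
from math import sqrt

def wasserstein2(x, y):
    """L2 Wasserstein distance between two equal-size empirical samples.

    With n equally weighted points on each side, the optimal coupling
    matches the i-th smallest value of x with the i-th smallest value of y.
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes samples of equal size")
    xs, ys = sorted(x), sorted(y)
    return sqrt(sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs))

# Hypothetical units with the same mean (10) but different spread.
unit_a = [9, 10, 10, 10, 11]
unit_b = [0, 5, 10, 15, 20]

d_ab = wasserstein2(unit_a, unit_b)
d_shift = wasserstein2(unit_a, [v + 3 for v in unit_a])  # pure location shift

print("distance a-b:", d_ab)
print("distance under a shift of 3:", d_shift)
```

The distance is positive even though the two means coincide, and a pure shift of a sample by a constant gives a distance equal to that constant. Histograms with unequal bins would need the quantile functions rather than sorted raw values.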

The tutorial aims to introduce these new techniques and software tools (in R), considering that in the Big Data era methods able to analyze aggregated data are particularly useful and promising.

Some applications to real data sets will be shown and discussed, including comparisons with classical techniques, in order to validate the performance of the new methods and the strategies of analysis.

Tentative Program

Rosanna Verde (Second University of Naples, Italy) (40 minutes)

- Introduction to distributional data analysis
- Main basic statistics
- The Wasserstein distance
- New techniques: Regression and Clustering Analysis

Sonia Manuela Mendes Dias (Universidade do Porto, Portugal) (20-30 minutes)
- The distribution and symmetric distribution regression for histogram data

Antonio Irpino (Second University of Naples, Italy) (40-50 minutes)
- HistDAWass: an R package for histogram-valued data analysis
- Basic statistics, clustering and regression of distributional data

Javier Arroyo (Universidad Complutense de Madrid, Spain) (45-60 minutes)
- Predictive techniques for time series of distributional data
- A real-life case in R

Antonio Balzanella (Second University of Naples, Italy) (30 minutes)
- Data stream analysis based on histogram dimensional reduction: some examples of analysis

Participants

The tutorial is addressed to doctoral students, postdoctoral researchers, researchers and practitioners working in the Environmental Sciences, Social Sciences or Official Statistics domains.

It is open to anyone interested in the dimension reduction of Big Data, especially massive sets of numerical data that can be summarized and represented by distributions, such as histograms.

Main References

Arroyo J., Maté C. (2009) Forecasting histogram time series with k-nearest neighbors methods. International Journal of Forecasting 25, pp.  192–207, Elsevier.

Balzanella A, Verde R (2013). Clustering and change detection of multiple streaming time series. In: Kolodziej, J.; Di Martino, B.; Talia, D.; Xiong, K. (eds.), Algorithms and Architectures for Parallel Processing. LECTURE NOTES IN COMPUTER SCIENCE, vol. 8285, p. 1-14, BERLI... N:Springer, ISBN: 978-3-319-03858-2, ISSN: 0302-9743, doi: 10.1007/978-3-319-03859-9_1

Bock H.H., Diday E., Analysis of Symbolic Data. Exploratory Methods for Extracting Statistical Information from Complex Data, Springer, Berlin, 2000.

Brito, P. and Chavent, M. (2012). Divisive Monothetic Clustering for Interval and Histogram-valued Data. Proceedings of the ICPRAM’2012, Vol. 1, pp. 229-234, SciTePress.

Dias, S. and Brito, P. “New Developments in Linear Regression Models with Histogram-Valued Variables”. Third Workshop on Symbolic Data Analysis, Madrid, Spain, November 7th to 9th, 2012.

Dias, S., Brito, P. (2013). “Distribution and Symmetric Distribution Regression Model for Histogram-Valued Variables”. arXiv:1303.6199v1 [stat.ME]. Web address: http://arxiv.org/abs/1303.6199.

Irpino A, Verde R (2015). Basic statistics for distributional symbolic variables: a new metric-based approach. ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, ISSN: 1862-5347, doi: 10.1007/s11634-014-0176-4

Irpino A, Verde R (2015). Linear regression for numeric symbolic variables: a least squares approach based on Wasserstein Distance. ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, ISSN: 1862-5347

Irpino A, Verde R, Balzanella A (in press 2015). Dimension reduction techniques for distributional symbolic data. IEEE TRANSACTIONS ON CYBERNETICS, ISSN: 2168-2267, doi: 10.1109/TCYB.2015.2389653

Irpino A., Romano E., Optimal histogram representation of large data sets: Fisher vs piecewise linear approximation, in: M. Noirhomme-Fraiture, G. Venturini (Eds.), EGC, volume RNTI-E-9 of Revue des Nouvelles Technologies de l’Information, Cépaduès-Éditions, 2007, pp. 99–110.

Irpino A., Verde R., A new Wasserstein based distance for the hierarchical clustering of histogram symbolic data, in: V. Batagelj, H. Bock, A. Ferligoj, A. Ziberna (Eds.), Data Science and Classification, Springer, Berlin, 2006, pp. 185–192.

Dr. Antonio Irpino, PhD
Researcher in Statistics
Second University of Naples
Department of Political Science J. Monnet
email antonio.irpino at unina2.it
