FIRB Research project

“Mixture and latent variable model for causal inference and analysis of socio-economic data”

Research Unit of Perugia

Research topics:

The methodological developments will concern the following themes:

1. Inferential developments on latent Markov (LM) models within the Maximum Likelihood approach. As is typical of latent variable models, the likelihood function of the LM model may be multimodal. Moreover, maximizing this likelihood by the Expectation Maximization (EM) algorithm can be computationally intensive. Finally, the literature has not yet provided a commonly accepted criterion for formally assessing the number of states of the latent chain. These issues may limit the wider adoption of LM models. The objectives of the Research Unit are therefore: (i) to study the multimodality of the likelihood function, trying to establish a rule that shows how the number of local maxima varies with model complexity and sample size; (ii) to propose strategies for initializing the iterative algorithms used for Maximum Likelihood estimation, so that they converge to the global maximum; (iii) to develop methods for verifying whether the estimation algorithm has converged to the global maximum of the likelihood function; (iv) to propose and test accelerated versions of the EM algorithm and combinations of the EM and Newton-Raphson algorithms; (v) to compare different criteria for choosing the number of states and to propose criteria that take into account both the sample size and the number of time occasions.

2. Development of LM models for multilevel data. Multilevel versions of the LM model have recently been proposed, in which latent variables represent the effect of each cluster on the distribution of the response variables of every subject belonging to that cluster. However, this effect is assumed to be time-constant. Within the present project, a multilevel version of the LM model will be developed in which the cluster effect is modelled dynamically through a hierarchical structure of latent Markov chains. From the point of view of Maximum Likelihood estimation, a fundamental problem is that no tractable form is available for the joint distribution of the response variables associated with all subjects within the same cluster. To address this problem, we plan to exploit the composite likelihood method, which considers a likelihood function based on all possible pairs of subjects in a cluster. This method is much less demanding computationally and, at the same time, guarantees consistent estimates.

3. Use of the LM model for evaluation and causal inference. In many contexts, it is important to evaluate the causal effect of policies or treatments on the evolution of a latent characteristic. Here the LM model finds a natural application, since it includes a measurement model and allows us to take into account that the characteristic of interest is only indirectly observable. Moreover, the model explicitly accounts for how the latent characteristic changes over time, possibly depending on observable covariates, and it can accommodate multilevel data. The aims in this area are: (i) to formulate an LM model in terms of potential outcomes, so that the model can be used for causal inference; (ii) to develop methods that, on the basis of the parameter estimates, provide a measure of efficacy that can be used for performance evaluation.

4. Mixtures of discrete variables for analysing ordinal data. Special attention will be given to a specific class of mixture models, obtained by combining a discrete Uniform and a shifted Binomial (CUB), and to their extensions. This class of models allows us to analyse the psychological process of decision when the choice or score is expressed by means of ordinal data, a situation that arises in many real settings. As CUB modelling is based on mixtures of distributions for discrete variables, it should be considered complementary to the other approaches. Therefore, one of the objectives of the project is to examine, on the same datasets, the analogies and differences among the models from both the statistical point of view (goodness of fit, significance of covariates) and the interpretative point of view. More specifically, extensions of CUB models will be proposed for (i) longitudinal data, in combination with an LM approach for categorical data, (ii) multilevel data, and (iii) the “shelter effect” with the inclusion of covariates.
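Theme 1 depends on evaluating and comparing LM likelihoods. As a minimal sketch, assuming a basic LM model with a single categorical response, no covariates and time-homogeneous transitions (simplifications that the project text does not state), the log-likelihood can be computed with a scaled forward recursion and then plugged into BIC, one of the criteria that could enter the comparison in (v). The function and argument names below are hypothetical.

```python
import numpy as np

def lm_loglik(Y, init, trans, emis):
    """Log-likelihood of a basic latent Markov model (illustrative sketch).

    Y     : (n, T) integer array, response category of subject i at time t
    init  : (k,)   initial probabilities of the latent states
    trans : (k, k) transition matrix, rows sum to 1
    emis  : (k, c) conditional response probabilities, rows sum to 1
    """
    Y = np.asarray(Y)
    init, trans, emis = map(np.asarray, (init, trans, emis))
    n, T = Y.shape
    # Forward variables at t = 0, one row per subject.
    alpha = init * emis[:, Y[:, 0]].T
    scale = alpha.sum(axis=1)
    loglik = np.log(scale).sum()
    alpha = alpha / scale[:, None]
    for t in range(1, T):
        # Propagate through the latent chain, then weight by the response.
        alpha = (alpha @ trans) * emis[:, Y[:, t]].T
        scale = alpha.sum(axis=1)          # rescale to avoid underflow
        loglik += np.log(scale).sum()
        alpha = alpha / scale[:, None]
    return float(loglik)

def lm_n_params(k, c):
    """Free parameters: initial probs + transition rows + emission rows."""
    return (k - 1) + k * (k - 1) + k * (c - 1)

def lm_bic(loglik, k, c, n):
    """BIC with the sample size n as the penalty argument (an assumption)."""
    return -2.0 * loglik + lm_n_params(k, c) * np.log(n)
```

Evaluating `lm_loglik` at the end points of estimation runs started from different values is one simple way to see whether they reached distinct local maxima.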
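The pairwise composite likelihood mentioned in theme 2 can be written schematically as follows. The notation is an assumption introduced here for illustration: H clusters, n_h subjects in cluster h, and y_{hi} the vector of responses of subject i in cluster h over all time occasions.

```latex
p\ell(\theta) \;=\; \sum_{h=1}^{H} \; \sum_{i=1}^{n_h - 1} \; \sum_{j=i+1}^{n_h}
\log p\!\left(\mathbf{y}_{hi}, \mathbf{y}_{hj}; \theta\right)
```

A cluster with n_h subjects contributes n_h(n_h - 1)/2 bivariate terms, each involving only two subjects rather than the joint distribution of the whole cluster.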
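The CUB distribution described in theme 4 can be written down directly. The following is a minimal sketch, assuming the usual parameterisation with m ordered categories, a mixing weight `pi` on the shifted Binomial component and a Binomial parameter `xi`; covariates and the shelter effect are omitted, and the function name is hypothetical.

```python
from math import comb

def cub_pmf(m, pi, xi):
    """Probabilities of ratings r = 1, ..., m under a CUB mixture (sketch).

    pi : weight of the shifted Binomial component (1 - pi goes to the Uniform)
    xi : parameter of the shifted Binomial component
    """
    probs = []
    for r in range(1, m + 1):
        shifted_binomial = comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
        uniform = 1.0 / m
        probs.append(pi * shifted_binomial + (1 - pi) * uniform)
    return probs
```

With `pi = 0` the ratings are purely Uniform, while values of `pi` close to 1 leave almost all the mass on the shifted Binomial component.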

As regards the empirical developments, we expect to apply the aforementioned models to the analysis of the following datasets:

1. Youth employment conditions. To this end, the Research Unit will deal with: (i) the transition from university to the labour market and (ii) job satisfaction. As concerns topic (i), three main longitudinal datasets will be used. The first combines the longitudinal data produced by the Job Centres of the Region of Umbria with the administrative data of the University of Perugia and the AlmaLaurea data. The second is the Eurostat panel dataset EU_SILC (European Union Statistics on Income and Living Conditions), whereas the third concerns the transition from university to the labour market in the Lombardy region and has been compiled by the Milano-Bicocca Research Unit. As concerns topic (ii), job satisfaction will be studied through the analysis of a longitudinal dataset compiled by the Bank of Italy, which collects information about job satisfaction, individual well-being, and happiness. For both topics (i) and (ii), LM and CUB models and their extensions for multilevel data and causal inference are especially useful for treating these types of data.

2. Analysis of the dynamics of juvenile criminal behaviour. To this end, the Research Unit will use a longitudinal dataset on the criminal histories of a juvenile cohort. The dataset is obtained from a national sample survey and includes the criminal history of a cohort of individuals born in 1987. The offenders are monitored over the period in which they may be present in the juvenile criminal circuit (14-18 years) and then until early adulthood (24 years). The models treated in this project allow us to address the policy need to estimate the probability of being a persistent offender for a specific crime, as well as the probability of transition towards different types of crime.