Colloquium Series

Fall 2017

Fall 2017 colloquia will take place Mondays, 3pm-4pm, in ENR2 S395.

Statistics GIDP Colloquium: Monday, September 11, 2017

Speaker: Yue (Selena) Niu, University of Arizona 

Title: Reduced Ranked Linear Discriminant Analysis

Abstract: Many high dimensional classification techniques have been
developed recently. However, most works focus on only the binary
classification problem. Available classification tools for the multi-class
cases are either based on over-simplified covariance structure or
computationally complicated. In this talk, following the idea of reduced
ranked linear discriminant analysis, we introduce a new dimension
reduction tool with the flavor of supervised principal component analysis.
The proposed method is computationally efficient and can incorporate the
correlation structure among the features. We illustrate our methods by
simulated and real data examples.

Statistics GIDP Colloquium: Monday, October 2, 2017

Speaker: Yehua Li, Iowa State University

Title: Nested Hierarchical Functional Data Modeling and Inference for the Analysis of Functional Plant Phenotypes

Abstract:  In a plant science Root Image Study, the process of seedling roots bending in response to gravity is recorded using digital cameras, and the bending rates are modeled as functional plant phenotype data. The functional phenotypes are collected from seeds representing a large variety of genotypes and have a three-level nested hierarchical structure, with seeds nested in groups nested in genotypes. The seeds are imaged on different days of the lunar cycle, and an important scientific question is whether there are lunar effects on root bending. We allow the mean function of the bending rate to depend on the lunar day and model the phenotypic variation between genotypes, groups of seeds imaged together, and individual seeds by hierarchical functional random effects. We estimate the covariance functions of the functional random effects by a fast penalized tensor product spline approach, perform multi-level functional principal component analysis (FPCA) using the best linear unbiased predictor of the principal component scores, and improve the efficiency of mean estimation by iterative decorrelation. We choose the number of principal components using a conditional Akaike Information Criterion and test the lunar day effect using generalized likelihood ratio test statistics based on the marginal and conditional likelihoods. We also propose a permutation procedure to evaluate the null distribution of the test statistics. Our simulation studies show that our model selection criterion selects the correct number of principal components with remarkably high frequency, and the likelihood-based tests based on FPCA have higher power than a test based on working independence.

Statistics GIDP Colloquium: Monday, November 6, 2017 CANCELLED

Statistics GIDP Colloquium: Monday, November 20, 2017

Speaker: Xiaotong Shen, University of Minnesota

Title: Personalized Prediction and Recommender Systems

Abstract: Personalized prediction predicts a user's preference for a large number of items through user-specific as well as content-specific information, based on a very small amount of observed preference scores. In a sense, predictive accuracy depends on how to pool the information from similar users and items. Two major approaches are collaborative filtering and content-based filtering. Whereas the former utilizes the information on users that think alike for a specific item, the latter acts on characteristics of the items that a user prefers. These two approaches underlie two kinds of recommender systems, Grooveshark and Pandora. In this talk, I will review some recent advances in latent factor modeling and discuss various issues as well as scalable strategies based on a "divide-and-conquer" algorithm.


Statistics GIDP Colloquium: Monday, December 4, 2017

Speaker: Edward Bedrick, University of Arizona

Spring 2017


Statistics GIDP Colloquium: Monday, May 1, 2017.

Speaker: Ming Hu, Lerner Research Institute, Cleveland Clinic

Title: Statistical Methods, Computational Tools and Visualization of Hi-C Data

Abstract: Harnessing the power of high-throughput chromatin conformation capture (3C) based technologies, we have recently generated a compendium of datasets to characterize chromatin organization across human cell lines and primary tissues. Knowledge revealed from these data facilitates deeper understanding of long range chromatin interactions (i.e., peaks) and their functional implications on transcription regulation and genetic mechanisms underlying complex human diseases and traits. However, various layers of uncertainties and complex dependency structure complicate the analysis and interpretation of these data. We have proposed hidden Markov random field (HMRF) based statistical methods, which properly address the complicated dependency issue in Hi-C data, and further leverage such dependency by borrowing information from neighboring pairs of loci, for more powerful and more reproducible peak detection. Through extensive simulations and real data analysis, we demonstrate the power of our methods over existing peak callers. We have applied our methods to the compendium of Hi-C data from 21 human cell lines and tissues, and further developed an online visualization tool to facilitate identification of potential target gene(s) for the vast majority of non-coding variants identified from the recent waves of genome-wide association studies.

3:00 pm - 4:00 pm, Mathematics Building, room 501.


Statistics GIDP Colloquium: Friday, April 7, 2017.

Speaker: Sunder Sethuraman, Department of Mathematics, University of Arizona

Title: Consistency of modularity clustering and Kelvin's tiling problem

Abstract: Given a graph, the popular 'modularity' clustering method specifies a partition of the vertex set as the solution of a certain optimization problem. In this talk, we will discuss consistency properties, or scaling limits, of this method with respect to random geometric graphs constructed from n i.i.d. points, V_n = {X_1, X_2, ..., X_n}, distributed according to a probability measure supported on a bounded domain in R^d. A main result is the following: if the number of clusters, or partitioning sets of V_n, is bounded above, then the discrete optimal modularity clusterings converge in a specific sense to a continuum partition of the underlying domain, characterized as the solution of a 'soap bubble', or 'Kelvin'-type, shape optimization problem.

3:00 pm - 4:00 pm, Mathematics Building, room 501.


Statistics GIDP Colloquium: Friday, March 3, 2017.

Speaker: Ming-Hung (Jason) Kao, Associate Professor, School of Mathematical and Statistical Sciences, Arizona State University

Title: Experimental Designs for Functional Brain Imaging with fMRI

Abstract: Functional magnetic resonance imaging (fMRI) experiments are widely conducted in many fields for studying functions of the brain. One of the important first steps of such experiments is to select a good experimental design that allows for valid and precise statistical inference. However, the identification and construction of high-quality fMRI designs can be quite challenging. In this talk, we introduce some methods for constructing good fMRI designs, and discuss the statistical optimality of these designs.

3:00 pm - 4:00 pm, Mathematics Building, room 501.


Statistics GIDP Colloquium: Friday, February 3, 2017.

Speaker: Ge Yong, Management Information Systems, University of Arizona

Title: Point-of-Interest Recommendations in Location-based Social Networks

Abstract: With the rapid development of Location-based Social Network (LBSN) services, a large number of Points-of-Interest (POIs) have become available, which consequently raises a great demand for building personalized POI recommender systems. A personalized POI recommender system can significantly assist users in finding their preferred POIs and help POI owners attract more customers. However, it is very challenging to develop a personalized POI recommender system because a user's check-in decision-making process is very complex and can be influenced by many factors such as social network, geographical position, and the dynamics of user preferences. In the first part of this talk, we propose to divide the whole recommendation space into two parts, social friend space and user interest space, and we develop models for each space for generating recommendations. In the second part of this talk, we introduce a new rank-based method for implicit feedback-based recommendation. To evaluate the proposed methods, we conduct extensive experiments with many state-of-the-art baseline methods and evaluation metrics on real-world data sets.

Bio. Dr. Yong Ge is an assistant professor at MIS Dept. of UoA. He received his Ph.D. in Information Technology from Rutgers, The State University of New Jersey in 2013, the M.S. degree in Signal and Information Processing from the University of Science and Technology of China (USTC) in 2008, and the B.E. degree in Information Engineering from Xi'an Jiao Tong University in 2005. He received the ICDM-2011 Best Research Paper Award, Excellence in Academic Research (one per school) at Rutgers Business School in 2013, and the Dissertation Fellowship at Rutgers University in 2012. He has published prolifically in refereed journals and conference proceedings, such as IEEE TKDE, ACM TOIS, ACM TKDD, ACM TIST, ACM SIGKDD, and IEEE ICDM. His work has been supported by UoA, NSF and NIH.

3:00 pm - 4:00 pm, Mathematics Building, room 501.

Fall 2016

Statistics GIDP Colloquium: Wednesday, September 7, 2016.

  • Speaker: Matti Morzfeld, Department of Mathematics, University of Arizona
  • Title: U2 can UQ -- Projects and Life in Uncertainty
  • Abstract: 
    I will give an overview of mathematical and computational problems I face when I combine numerical models and data. I will first review basic tools such as Bayes' rule and importance sampling, then explain what difficulties arise when using these tools, and then present two specific applications.
    The first application uses low-dimensional models to describe and predict reversals of the geomagnetic dipole; the second uses adaptive importance sampling to solve a parameter estimation problem in combustion modeling, leveraging the parallelism of DOE's supercomputers.
  • 3:00 pm - 4:00 pm, Mathematics Building, room 402.


  • Statistics GIDP Colloquium: Wednesday, October 5, 2016.
  • Speaker: Han Xiao, Dept of Statistics and Biostatistics, Rutgers University
  • Title: On the maximum cross correlations under high dimension
  • Abstract: A multiple time series often exhibits cross lead-lag relationships among its component series. It is very challenging to identify this type of relationship when the number of series is large. We study the lead-lag relationship in the high dimensional context, using the maximum cross correlations and some other variants. Asymptotic distributions are obtained. We also use the moving blocks bootstrap to improve the finite sample performance.
  • 3:00 pm - 4:00 pm, Mathematics Building, room 501.
  • This talk will be preceded by a graduate student lunch - contact Kristina Souders for information.


  • Statistics GIDP Colloquium: Wednesday, November 2, 2016.
  • Speaker: Haiquan Li, Assistant Professor, Director for Translational Bioinformatics, Department of Medicine, University of Arizona
  • Title: Scattered disease-linked variants and convergent functions: discovery from big data integration
  • Abstract: Genome-wide association studies (GWAS) have identified thousands of disease-linked single nucleotide polymorphisms (SNPs) in the human genome. Most of them have a small effect size (OR<1.4) and are located independently across multiple chromosomes. It remains unclear how they collectively cause the diseases due to the issue of missing heritability. Classic tests of genetic interactions suffer from insufficient power. Here, we will present an integrative approach that leverages several omics datasets to obtain additional information beyond genotypes and thus reduce the number of hypotheses. We combine traditional semantic similarity for genes’ functions and very deep network permutations (100K times) to quantify the empirical significance of downstream function similarity of any pair of SNPs. This approach enabled us to discover a fundamental biological mechanism for complex diseases: SNPs associated with the same disease are more likely to associate with the same downstream genes or functionally similar genes than SNPs associated with unrelated diseases (OR>12). We also found that 40-50% of prioritized SNP-pairs have significant genetic interactions in three independent GWAS datasets. These results provide a new biological interpretation of genetic interactions and a “roadmap” of disease mechanisms emerging from GWAS SNPs, especially those outside coding regions.
  • 3:00 pm - 4:00 pm, Mathematics Building, room 501.


  • Statistics GIDP Colloquium: Wednesday, December 7, 2016.
  • Speaker: Timothy Hanson, Professor, Department of Statistics, University of South Carolina
  • Title: A unified framework for fitting Bayesian semiparametric models to arbitrarily censored spatial survival data
  • Abstract: A comprehensive, unified approach to modeling arbitrarily censored spatial survival data is presented for the three most commonly-used semiparametric models: proportional hazards, proportional odds, and accelerated failure time. Unlike many other approaches, all manner of censored survival times are simultaneously accommodated including uncensored, interval censored, current-status, left and right censored, and mixtures of these. Left truncated data are also accommodated leading to models for time-dependent covariates.  Both georeferenced (location observed exactly) and areally observed (location known up to a geographic unit such as a county) spatial locations are handled. Variable selection is also incorporated.  Model fit is assessed with conditional Cox-Snell residuals, and model choice carried out via LPML and DIC.  Baseline survival is modeled with a novel transformed Bernstein polynomial prior. All models are fit via new functions which call efficient compiled C++ in the R package spBayesSurv. The methodology is broadly illustrated with simulations and real data applications.  An important finding is that proportional odds and accelerated failure time models often fit significantly better than the commonly-used proportional hazards model.
  • 3:00 pm - 4:00 pm, Mathematics Building, room 501.


Spring 2016

  • Statistics GIDP Colloquium: Wednesday, February 3, 2016.
  • Speaker: Walt Piegorsch, PhD, University of Arizona, GIDP
  • Title: Model uncertainty in environmental risk assessment
  • Abstract: Estimation of low-dose ‘benchmark’ points in environmental risk analysis is discussed. Focus is on the increasing recognition that model uncertainty and misspecification can drastically affect point estimators and confidence limits built from limited dose-response data, which in turn can lead to imprecise risk assessments with uncertain, even dangerous, policy implications. Some possible remedies are mentioned, including use of parametric (frequentist) model averaging over a suite of potential dose-response models, and nonparametric dose-response analysis via isotonic regression.  An example on formaldehyde toxicity illustrates the calculations.
  • 12:00 pm - 1:00 pm, Physics and Atmospheric Sciences Building, room 314.


  • Statistics GIDP Colloquium: Wednesday, March 2, 2016. 
  • Speaker: Zhaoxia Yu, PhD, University of California Irvine, Dept of Statistics
  • Title: Strategies on Identifying Gene-Gene Interactions
  • Abstract: Characterizing gene-gene interactions is of fundamental importance in unraveling the etiology of complex human diseases. However, due to the ultra high-dimensional nature of the problem, the degree to which genes jointly affect disease risk is largely unknown. Two major obstacles toward this goal are the enormous computational effort and the heavy burden of multiple testing in testing gene-gene interactions. In this talk I will discuss several strategies using three examples. In the first example, we derived closed-form and consistent estimates of interaction parameters for case-control data. The derived Wald tests gave results very similar to those of the gold standard but were ten times faster. In a study of multiple sclerosis, we identified interactions within the major histocompatibility complex region. In the second example, we used information that is independent of interaction testing to prioritize gene-gene pairs for the case-parents design. The application of this strategy provided suggestive evidence for interactions between two genomic regions: the major histocompatibility complex region on chr 6 and the killer-cell immunoglobulin-like receptor region on chr 19. In the last example, we borrowed information across distinct but similar diseases. We found that genes interacting in multiple sclerosis also interacted with each other in type 1 diabetes.
  • 12:00 pm - 1:00 pm, Physics and Atmospheric Sciences Building, room 314.


  • Statistics GIDP Colloquium: Wednesday, April 6, 2016. 
  • Speaker: Jie Chen, PhD, Georgia Regents University, Dept of Biostatistics & Epidemiology
  • Title: Change point models in the Bayesian Perspective and their applications in CNV study
  • Abstract: Biomedical researchers now use advanced technologies, such as the comparative genomic hybridization (CGH), the array-based comparative genomic hybridization (aCGH), and the high throughput next generation sequencing (NGS), to conduct DNA copy number experiments for detecting DNA copy number variations (CNVs) as cancer development, genetic disorders, and many other diseases are usually relevant to CNVs on the genome.   Identifying boundaries of CNV regions on a chromosome or a genome can be viewed as a change point problem of detecting signal/intensity changes presented in the genomic data.  The analysis of high throughput genomic data for possible changes has become one of the most recent viable applications of statistical change point analysis. In this talk, I present several change point models suitable to formulate different data types resulting from the aCGH and the NGS technologies and provide Bayesian solutions to these models.  Applications of these methods to tumor cell line data will also be given.
  • 12:00 pm - 1:00 pm, Physics and Atmospheric Sciences Building, room 314.


  • Statistics GIDP Colloquium: Monday, May 2, 2016. 
  • Speaker: Bikas Sinha, PhD, Retired Faculty, Indian Statistical Institute, Kolkata, India
  • Title: Mixture Experiments: Theory and Applications
  • Abstract: This is a review talk dealing briefly with mixture models, standard mixture designs and optimal mixture experiments. Some application areas will be highlighted. 
  • 12:00 pm - 1:00 pm, Physics and Atmospheric Sciences Building, room 314.



Fall 2015

  • Statistics GIDP Colloquium: Wednesday, September 2, 2015. Speaker: Neng Fan, Assistant Professor, Systems and Industrial Engineering Department, University of Arizona. Title: Learning from Data with Uncertainties via Data-Driven Optimization
  • 12:00 pm - 1:00 pm, Saguaro Hall 114.
  • Abstract: In the last several decades, many advanced technologies have been developed to collect and store data continuously, and data and decisions are more strongly linked together than ever before. In most cases, the data include many uncertainties, such as missing or incomplete information, measurement errors, noise, etc. Traditional machine learning methods for decisions deal with the exact information of data. Data uncertainty, modeled by some support sets, mean or moment values, has been considered for robust decisions only to some extent. In this talk, we discuss statistical models for data uncertainties and data-driven optimization approaches for decision-making under uncertainty, especially in the case of big data. Some robust and chance-constrained optimization models and algorithms for support vector machines will be introduced, and numerical experiments will be performed to validate the proposed approaches.


  • Statistics GIDP Colloquium: Wednesday, September 30, 2015. Speaker: Clayton Morrison, Associate Professor, School of Information, University of Arizona.
  • Title: Finding Structure in Time: Inferring Structured Latent Sequences and Activity Descriptions
  • 12:00 pm - 1:00 pm, Modern Languages Building 410.
  • Abstract: Humans excel at understanding complex dynamic histories, recognizing relevant context and using that context to interpret events that are sometimes hierarchically and recursively structured.  Our research group has found the tools of Bayesian nonparametric modeling and inference well suited for approaching several aspects of these problems.  In this talk I present ongoing work on two applications that require methods for inferring structurally rich representations of time series: identifying context relevant to interpreting biochemical reactions described in cancer biology research papers, and constructing descriptions of coordinated activities from observations in video.


  • Statistics GIDP Colloquium: Wednesday, November 4, 2015. Speaker: Professor Avelino Arellano, Jr., Dept of Atmospheric Science, University of Arizona. 
  • Title: Towards Seamless Prediction of Chemical Weather
  • 12:00 pm - 1:00 pm, Modern Languages Building 410.
  • Abstract


  • Statistics GIDP Colloquium: Wednesday, December 2, 2015. Speaker: Professor Gen Li, Department of Biostatistics, Mailman School of Public Health, Columbia University.  
  • Title: Supervised Principal Component Analysis and Extensions

  • Abstract: It is increasingly common to have heterogeneous data sets measured on the same set of samples. Integrative analysis of multi-source data promises to reveal a more comprehensive picture of the underlying truth than individual analysis. In this talk, I will introduce a novel integrative dimension reduction framework called the Supervised Principal Component Analysis (SupPCA). The research is motivated by applications where people are interested in the low rank structure of some primary data while auxiliary variables are also available on the same set of samples. The proposed method can make use of the extra information in the auxiliary data to accurately extract underlying structures that are more interpretable. The model is formulated in a hierarchical fashion using latent variables, and subsumes many existing models as special cases. The asymptotic properties of parameter estimation are derived. We also extend the framework to accommodate special features, such as high-dimensional data, functional data, and multi-modal data. Applications to bioinformatics and business analytics problems demonstrate the advantage of the proposed methodology. 

  • 12:00 pm - 1:00 pm, Modern Languages Building 410.

Last updated 6 Nov 2017