S.No	Title	Abstract	Download
1	Efficient Row Column Designs for Microarray Experiments Author: Ananta Sarkar, Rajender Prasad, V.K. Gupta, Kashinath Chatterjee and Abhishek Rathore Pages: 89-117	This article deals with the problem of obtaining efficient designs for 2-colour microarray experiments where the same set of genes is spotted on each array. In the literature, optimality aspects of designs for microarray experiments have been investigated under a restricted model involving array and variety effects; the dye effects have been ignored in the model. If dye effects are also included in the model, then the structure of the design becomes that of a row-column design where arrays represent columns, dyes represent rows and varieties represent treatments. Further, the array effects in microarray experiments may be taken as random {see e.g. Kerr and Churchill (2001a), Lee (2004)}. For obtaining efficient row-column designs under fixed/mixed effects models, the exchange and interchange algorithms of Eccleston and Jones (1980) and Rathore et al. (2006) have been modified. The algorithm has been translated into a computer program using Microsoft Visual C++. The algorithm is general in nature and can be used for generating efficient row-column designs for any 2 ≤ k < v, where v is the number of treatments (varieties) and k is the number of rows (dyes). Here, the algorithm has been exploited for computer-aided search of efficient row-column designs for making all possible pairwise treatment comparisons for k = 2 (2-colour microarray experiments) in the parametric range 3 ≤ v ≤ 10, v ≤ b ≤ v(v − 1)/2; 11 ≤ v ≤ 25, b = v and (v, b) = (11, 13), (12, 14), (13, 14) and (13, 15), where b is the number of arrays (columns). Efficient row-column designs obtained under the fixed effects model have been compared with the best available designs and best even designs. 45 designs have been obtained with higher efficiencies than the best available designs and even designs. The robustness of efficient row-column designs obtained under a fixed effects model and of the best available designs was investigated under a mixed effects model. The strength of the algorithm for obtaining row-column designs for 3-colour microarray experiments has been demonstrated with the help of examples. Keywords: Microarray experiments, Fixed/Mixed effects model, Row-column designs, A-efficiency, D-efficiency.	
2	Statistical Genomics for Crop Improvement: Opportunities and Challenges Author: B.M. Prasanna Pages: 77-87	Effective analysis of molecular data in combination with rigorous phenotypic data using appropriate statistical methods can provide enhanced understanding of the genetic and molecular bases of complex phenotypic traits. Coupled with the rapid developments related to genome sequencing of crop plants, advances in statistical methods have aided in detecting Quantitative Trait Loci (QTL) influencing an array of traits, including epistatic QTLs, besides analysis of genotype × environment interactions, discovery of 'consensus QTL' through meta-analysis of data, expression-QTL (eQTL) through genetical genomics, and even epigenomic QTL. The profusion of powerful DNA-based markers, particularly single nucleotide polymorphisms (SNPs), and the evolution of statistical algorithms and experimental strategies, including the extension of the concept of linkage disequilibrium (LD)-based association mapping in crop plants, further promise to revolutionize the discovery of marker-trait associations for several important traits. While these exciting advances have brought statisticians, bioinformatics experts, geneticists and molecular biologists closer together, the new focus on genomics has also highlighted a significant challenge: how to integrate the different views of the genome given by various types of experimental data and provide a proper biological perspective that can lead to crop improvement. In this article, from the user's perspective, I shall review some of the ongoing work on the above-mentioned areas in crop plants, especially using maize as a model system, and the opportunities and challenges for application of statistical genomics in molecular plant breeding. Keywords: Molecular markers, Haplotypes, QTL, Association mapping, Statistical genomics, Crop plants.	
3	Cover Author: ISAS Pages: 1	N.A	
4	Information Based Agglomerative Segmentation in Metric Spaces Author: Francesca Chiaromonte and James Taylor Pages: 33-44	In this article, we introduce an approach to agglomerate points in a metric space into spatially contiguous groups which preserve both the distance and frequency structure of the data. This is achieved by using a traditional distance criterion to define candidate mergers, and then selecting among these candidates so as to maximize the mutual information between pre- and post-merger partitions. Our information-based agglomerative segmentation is particularly effective when grouping data that does not present spatially separated clusters, and can therefore be employed for reducing data complexity in a number of scientific applications. We illustrate the procedure using a simulated data structure and an application to the analysis of multi-species genomic alignment data. Keywords: Agglomerative clustering, Mutual information and entropy, Data complexity reduction, Genomics.	
5	Page 1 Author: ISAS Pages: 1	N.A	
6	Preface Author: ISAS Pages: 1	N.A	
7	Announcement Author: ISAS Pages: 1	N.A	
8	Analysis of Correlated Gene Expression Data on Ordered Categories Author: Shyamal D. Peddada, Shawn F. Harris and Ori Davidov Pages: 45-60	A bootstrap-based methodology is introduced for analyzing repeated measures/longitudinal microarray gene expression data over ordered categories. The proposed non-parametric procedure uses order-restricted inference to compare gene expressions among ordered experimental conditions. The null distribution for determining significance is derived by suitably bootstrapping the residuals. The procedure addresses two potential sources of correlation in the data, namely, (a) correlations among genes within a chip ('intra-chip' correlation), and (b) correlation within subject due to repeated/longitudinal measurements ('temporal' correlation). To make the procedure computationally efficient, the adaptive bootstrap methodology of Guo and Peddada (2008) is implemented such that the resulting procedure controls the false discovery rate (FDR) at the desired nominal level.	
9	Bayesian Hierarchical Models to Identify Quantitative Trait Loci using Replicated Lines Author: Susan J. Simmons, Ann E. Stapleton, Fang Fang, Qijun Fang and Karl Ricanek Pages: 11-18	The identification of locations on a genetic map that associate with quantitative traits is an important issue in plant breeding and gene identification in crops. Many of the available algorithms for quantitative trait loci (QTL) allow only one observation per genotype distribution. Information within plant lines is summarized into a single observation in order to utilize available programs. However, important variation information within lines is lost. We propose using a Bayesian hierarchical model that incorporates the multiple observations within plant lines. A Markov chain Monte Carlo model composition strategy is used to search and identify genetic markers associated with a quantitative trait. An extensive simulation study illustrates the effectiveness of this method. Results from applying this algorithm to a Bay-0 × Shahdara Arabidopsis thaliana recombinant inbred line QTL experiment are discussed. Keywords: Bayesian hierarchical model, Markov chain Monte Carlo model composition, Activation probability.	
10	Cover-2 Author: ISAS Pages: 1	N.A	
11	Cover-3 Author: ISAS Pages: 1	N.A	
12	Lassoing Mixtures with Applications to Proteomic Mass Spectroscopy Analysis Author: Guan Xing and J. Sunil Rao Pages: 61-76	We propose a new estimation method for finite mixture models. Important in this estimation process is the determination of the number of mixture components. Traditional methods either perform sequential hypothesis testing, or perform model selection based on some criteria such as AIC, BIC, and Kullback-Leibler (KL) distance. We treat the component densities as predictors and generate a pseudo-response based on the CDF/PDF of a saturated mixture model. To get a sparse component representation, we use a variation of the LASSO, an L1-constrained optimization that produces many zero component weights. We then iterate between LASSO and EM steps to update the estimates of the component density parameters and component weights. Our approach is very general and can be extended naturally to handle finite multivariate mixtures and mixtures with non-normal components. A series of simulations illustrates the competitiveness of our approach. We then apply the methodology to a problem of classifying ovarian cancer patients based on protein mass spectroscopy data profiles. Keywords: Mixture models, LASSO, Proteomic mass spectroscopy, Selection, Shrinkage.	
13	Hindi Supplement Author: Suresh Chandra Rai Pages: 119-123	N.A	
14	Last Page Author: ISAS Pages: 1	N.A	
15	Resolving Isoform Expression using Digital Gene Expression Data Author: Naomi S. Altman, Qingyu Wang, Vishesh Karwa and Aleksandra Slavkovic Pages: 19-31	In many organisms, alternative splicing increases proteomic complexity by generating multiple mRNA (and protein) isoforms from a single gene. The ability to quantify specific mRNA isoform expression levels is therefore more important to the understanding of biological function than quantifying overall gene expression. Next generation ultra-high throughput sequencing technologies make it possible to measure overall gene expression directly by identifying mRNAs in a sample (RNA-seq and digital gene expression). However, because the technologies typically sequence only short fragments of mRNA, and because mRNA isoforms encoded by the same gene often share substantial sequence regions, quantifying isoform expression from sequencing data requires resolving counts of mRNA fragments into mRNA isoform counts. In this paper, we discuss statistical methods to resolve isoform expression from digital gene expression data using restriction enzyme fragmentation. Methodology for determining the margins of contingency tables is used to deconvolve the fragment counts and infer isoform counts. Keywords: RNA-seq, DGE, Splice variant, Count data, Marginal distribution, Contingency table, Alternative splicing	
16	Differential Meta-Analysis for Testing the Relative Importance of Two Competing Null Hypotheses over Multiple Experiments Author: Mehmet Kocak, Gaalin Zheng, Giri Narsimhan, E. Olusegun George and Saumyadipta Pyne Pages: 1-10	Gene expression experiments conducted under a variety of conditions can allow us to concurrently test more than one hypothesis for the same gene. For instance, if a particular gene has alternative modes of regulation, then it might be interesting to test the relative significance of each of those alternatives based on the gene's expression under different conditions. In particular, if the significance values for two hypotheses about a gene appear to differ consistently in favor of a particular hypothesis in multiple independent experiments, then our new differential meta-analysis method can summarize the differences to test whether that hypothesis is overall more significant than the other hypothesis for the gene. Alternatively, one could first obtain differentially expressed gene sets with traditional meta-analysis of individual hypotheses, and then compare the sets, but such an approach addresses the problem post hoc and lacks sensitivity and statistical power. Our method, in contrast, addresses the problem directly and rigorously based on a novel statistic designed for testing the relative importance of two competing null hypotheses over multiple experiments. We also specify an analytical distribution for this combined statistic. We applied our method to genome-wide fission yeast cell cycle expression data and discovered interesting gene sets based on two interacting hypotheses. Keywords: Meta-analysis, Logit, Gene Expression, Cell Cycle, Fission Yeast.	