This is the **twenty-third** article from my column **Mathematical Statistics and Machine Learning for Life Sciences**, where I discuss in plain language some mysterious analytical techniques that are common in Computational Biology. DNA sequencing technologies applied to archeological material have tremendously enriched our knowledge about the **human past**. For example, analyzing DNA from human remains in historical burials can provide valuable information about ancient pandemics such as **plague**, which is caused by the *Yersinia pestis* bacterium. However, common methods of detecting **ancient pathogens** often suffer from a lack of specificity and may result in false discoveries. In the present article, I…

This is the **twenty-second** article of my column **Mathematical Statistics and Machine Learning for Life Sciences**, where I discuss in plain language some mysterious analytical techniques that are common in Computational Biology. Genome-wide **genotyping** and **whole-genome sequencing (WGS)** brought unprecedented resolution to genetic studies in Life Sciences, but also resulted in rapid growth of high-dimensional genetic variation data suffering from the **Curse of Dimensionality**. In the present article, I will give some theoretical background on the problem and discuss why it is extremely challenging to do any meaningful and robust analysis in modern Genetics and Genomics.

Everyone who works…

This is the **twenty-first** article from my column **Mathematical Statistics and Machine Learning for Life Sciences**, where I try to explain some mysterious analytical techniques used in Bioinformatics and Computational Biology in a simple way. In my previous post, Select Features for OMICs Integration, I gave examples of multivariate feature selection and mentioned its advantages over univariate feature selection without actually demonstrating them. In this post, we will compare the predictive capacities of multivariate models such as **LASSO**, **PLS** and **Random Forest** with univariate models, e.g. the famous differential gene expression tool **DESeq2** as well as traditional **Mann-Whitney U…**

This is the **twentieth** article from the column **Mathematical Statistics and Machine Learning for Life Sciences**, where I try to explain some mysterious analytical techniques used in Bioinformatics and Computational Biology in a simple way. Data integration is an important next step for improving analysis accuracy by exploiting the synergistic effects of combining multiple sources of information. In Computational Biology and Biomedicine, data integration is making particular advances in the Single Cell research area. Last year, **Nature** recognized **Single Cell Multimodal Omics Integration** as a **method of the year 2019**. …

This is the nineteenth article from the column **Mathematical Statistics and Machine Learning for Life Sciences**, where I try to explain some mysterious analytical techniques used in Bioinformatics and Computational Biology in a simple way. This is the final article in the series dedicated to the Linear Mixed Model (LMM). Previously we talked about **How Linear Mixed Model Works**, how to derive and program **Linear Mixed Model from Scratch** in R from the **Maximum Likelihood (ML)** principle. …

This is the eighteenth article from the column **Mathematical Statistics and Machine Learning for Life Sciences**, where I try to explain some mysterious analytical techniques used in Bioinformatics and Computational Biology in a simple way. The **Linear Mixed Model** (also called the Linear Mixed Effects Model) is widely used in Life Sciences, and there are many tutorials showing how to run the model in R. However, it is sometimes unclear how exactly the Random Effects parameters are optimized in the **likelihood maximization** procedure. In my previous post **How Linear Mixed Model Works** I gave an introduction to the concepts of the model, and…

This is the seventeenth article from my column **Mathematical Statistics and Machine Learning for Life Sciences**, where I try to explain some mysterious analytical techniques used in Bioinformatics and Computational Biology in a simple way. The **Linear Mixed Model (LMM)**, also known as the Linear Mixed Effects Model, is one of the key techniques in traditional Frequentist statistics. Here I will attempt to **derive the LMM** solution **from scratch** from the Maximum Likelihood principle by optimizing the mean and variance parameters of Fixed and Random Effects. However, before diving into derivations, I will start slowly in this post with an introduction of **when and how…**

This is the sixteenth article from the column **Mathematical Statistics and Machine Learning for Life Sciences**, where I try to explain some mysterious analytical techniques used in Bioinformatics and Computational Biology in a simple way. In my previous post, tSNE vs. UMAP: Global Structure, I touched on the limit of **large perplexity** as a potential way for tSNE to preserve more of the **global data structure**, which becomes important when attempting to use tSNE beyond visualization, for addressing hierarchical relations between clusters of data points (clustering). …

This is the fifteenth article from the column **Mathematical Statistics and Machine Learning for Life Sciences**, where I try to explain some mysterious analytical techniques used in Bioinformatics and Computational Biology in a simple way. Dimension reduction techniques such as **tSNE and UMAP are absolutely central** to many types of data analysis, yet there is surprisingly little understanding of how exactly they work. Previously I started comparing tSNE vs. UMAP in my articles How Exactly UMAP Works, How to Program UMAP from Scratch, and Why UMAP is Superior over tSNE. Today I will share my views on **to what extent…**

This is the fourteenth post from the **Mathematical Statistics and Machine Learning for Life Sciences** column, where I try to explain **in a simple way** some mysterious analytical techniques used in Bioinformatics, Biomedicine, Genetics, etc. In my previous posts **How Exactly UMAP Works** and **How to Program UMAP from Scratch** I explained the **limitations of tSNE** and the way **UMAP overcomes them**. From the feedback I received, it seemed to me that the main message of the posts was not emphasized enough. …

Bioinformatician, SciLifeLab, Sweden