Machine learning-based tools to model and to remove the off-target effect
RNA interference, also called gene knockdown, is a biological technique that inhibits a targeted gene in a cell. It allows one to identify statistical dependencies between a gene and a cell phenotype. However, during such a gene inhibition process, additional genes may also be modified. This is called the “off-target effect”, and it produces additional phenotype perturbations that are unrelated to the targeted gene. In this paper, we study new machine learning tools that both model the cell phenotypes and remove the off-target effect. We propose two new automatic methods to remove the off-target components from a data sample. The first method is based on vector quantization (VQ). The second relies on a classification forest. Both methods analyze the homogeneity of several repetitions of a gene knockdown. The baseline we consider is a Gaussian mixture model whose parameters are learned under constraints with a standard Expectation–Maximization algorithm. We evaluate these methods on a real data set, a semi-synthetic data set, and a synthetic toy data set. The real and semi-synthetic data sets are composed of quantities describing cell growth dynamics, measured in time-lapse movies. The main result is that the probabilistic version of the VQ-based method achieves the best recognition performance.
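To make the idea of analyzing the homogeneity of several repetitions of a gene knockdown concrete, the following Python sketch quantizes toy phenotype vectors with a small codebook and flags the repetitions whose codeword disagrees with the majority codeword of their gene. It is only an illustration under stated assumptions, not the method evaluated in this paper: the toy data, the function names (`farthest_point_init`, `vector_quantize`, `flag_off_target`), the k-means codebook, and the hard majority-vote rule are all hypothetical choices, and the probabilistic variant is not covered.

```python
import numpy as np


def farthest_point_init(X, k):
    """Deterministic codebook initialization: start from X[0], then
    repeatedly add the sample farthest from the codewords chosen so far."""
    idx = [0]
    for _ in range(k - 1):
        d = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    return X[idx].copy()


def vector_quantize(X, k, n_iter=50):
    """Plain Lloyd (k-means) iterations; returns (codebook, codeword labels)."""
    codebook = farthest_point_init(X, k)
    for _ in range(n_iter):
        d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                codebook[j] = X[labels == j].mean(axis=0)
    d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return codebook, d.argmin(axis=1)


def flag_off_target(X, gene_ids, k):
    """Flag repetitions whose codeword differs from the majority codeword
    of their gene (assumption: most repetitions of a gene are on-target)."""
    _, labels = vector_quantize(X, k)
    flags = np.zeros(len(X), dtype=bool)
    for g in np.unique(gene_ids):
        idx = np.where(gene_ids == g)[0]
        modal = np.bincount(labels[idx], minlength=k).argmax()
        flags[idx] = labels[idx] != modal
    return flags


# Hypothetical toy data: 3 genes, 4 repetitions each, 2-D phenotype vectors.
# The last repetition of each gene is "off-target" and lands near the
# phenotype of another gene.
rng = np.random.default_rng(0)
centers = {0: (0.0, 0.0), 1: (5.0, 0.0), 2: (0.0, 5.0)}
samples, gene_ids, truth = [], [], []
for g, center in centers.items():
    for r in range(4):
        off = r == 3
        mean = centers[(g + 1) % 3] if off else center
        samples.append(rng.normal(mean, 0.1))
        gene_ids.append(g)
        truth.append(off)
X = np.array(samples)
gene_ids = np.array(gene_ids)
truth = np.array(truth)

flags = flag_off_target(X, gene_ids, k=3)
print("flagged repetitions:", np.where(flags)[0].tolist())
```

The majority vote is only reliable when most repetitions of a gene share the on-target phenotype; a gene whose repetitions are dominated by off-target perturbations would be mislabeled by this simple rule.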