Benchmarking Time-Series Data Discretization on Inference Methods


The rapid development of methods for quantitatively measuring DNA, RNA, and protein has stimulated great interest in reverse-engineering methods, that is, data-driven approaches that infer the network structure or a dynamical model of a system. Many reverse-engineering methods require discrete quantitative data as input, whereas many experimental data are continuous. Although data discretization has been treated as a pre-processing step, some studies on the reverse-engineering of intracellular networks have begun to reveal the impact that the choice of discretization has on the performance of reverse-engineering methods. However, more comprehensive studies are still greatly needed to understand this impact systematically and quantitatively. Furthermore, there is an urgent need for systematic comparative methods that can help researchers choose among discretization methods. In this work, we consider four published intracellular networks together with their respective time-series datasets. For each, we keep the publication's original reverse-engineering method and original datasets, and discretize the data with different discretization methods. Across the four cases, switching to a more appropriate discretization improved the performance of the reverse-engineering method, either by increasing the number of inferred true positives or by decreasing the number of inferred false positives. We observed no universally best discretization method across the time-series datasets. We therefore propose TEDIE, a two-step evaluation metric for ranking discretization methods for time-series data. The underlying assumption of TEDIE is that an optimal discretization method should preserve the dynamic patterns observed in the original data across all variables. We validated TEDIE using the four published datasets and networks.
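The abstract does not spell out TEDIE's two steps, so the following is only a toy illustration of its stated assumption, not TEDIE itself. It discretizes one time series with two common methods, equal-width binning and quantile (equal-frequency) binning. It then scores each result by how often the direction of change between consecutive time points (up, down, or steady) matches that of the continuous data. The function names, the `tol` threshold for treating small continuous changes as "steady", and the agreement score are all hypothetical choices made for this sketch.

```python
def equal_width(series, k=3):
    """Equal-width binning: split [min, max] into k intervals of equal width."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0] * len(series)
    width = (hi - lo) / k
    # Clamp the maximum value into the last bin (index k - 1).
    return [min(int((x - lo) / width), k - 1) for x in series]


def quantile_bins(series, k=3):
    """Quantile (equal-frequency) binning: thresholds at empirical k-quantiles."""
    ranked = sorted(series)
    n = len(series)
    thresholds = [ranked[(i * n) // k] for i in range(1, k)]
    # Bin index = number of thresholds the value reaches or exceeds.
    return [sum(x >= t for t in thresholds) for x in series]


def trend(series, tol=0.0):
    """Direction of change between consecutive points: +1 up, -1 down, 0 steady.
    Changes whose magnitude is <= tol are treated as steady (hypothetical rule)."""
    out = []
    for a, b in zip(series, series[1:]):
        d = b - a
        out.append(0 if abs(d) <= tol else (1 if d > 0 else -1))
    return out


def pattern_agreement(continuous, discrete, tol=0.5):
    """Hypothetical score: fraction of time steps whose direction of change
    in the discretized series matches the continuous series."""
    t_c = trend(continuous, tol)
    t_d = trend(discrete)  # integer bins: any nonzero change counts
    return sum(a == b for a, b in zip(t_c, t_d)) / len(t_c)


if __name__ == "__main__":
    s = [1.0, 1.1, 1.2, 5.0, 5.1, 9.0, 9.2, 1.0]
    for name, method in [("equal-width", equal_width), ("quantile", quantile_bins)]:
        d = method(s, 3)
        print(name, d, round(pattern_agreement(s, d), 3))
```

On this made-up series, equal-width binning gives `[0, 0, 0, 1, 1, 2, 2, 0]` and keeps every rise, plateau, and drop (score 1.0). Quantile binning gives `[0, 1, 1, 1, 2, 2, 2, 0]`, which shifts the jumps to the wrong time points (score 3/7). A differently shaped series could reverse this ranking, which is consistent in spirit with the observation that no single discretization method is best for every dataset.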

Bioinformatics, Jan 18 2019. doi: 10.1093/bioinformatics/btz036. PMID: 30657860