The Minimal Hitting Set Generation Problem: Algorithms and Computation


Finding inclusion-minimal hitting sets (MHSs) for a given family of sets is a fundamental combinatorial problem with applications in domains as diverse as Boolean algebra, computational biology, and data mining. Although many algorithms are available in the literature to generate these MHSs, application papers typically consider only a few before selecting one (or introducing a novel algorithm), suggesting the need for a comprehensive survey and performance comparison. We introduce several of these applications, discussing how MHS generation is applied in each domain and which algorithms have been used, providing a unified view of these applications for researchers from diverse areas. We survey twenty-one algorithms for MHS generation from across a variety of domains, considering their history, classification, and useful features. We provide the results of a comprehensive suite of benchmarks of public software implementations of seventeen of these algorithms, including six we implemented ourselves in C++, emphasizing problem instances taken from real applications in the literature. We find that the fastest algorithms in practice are not those with the tightest complexity bounds or those most commonly used in applications, suggesting that, for a given application, benchmarking from across the broad span of available algorithms will enable a better choice. Finally, we provide a public repository of these software implementations as ready-to-use, platform-agnostic Docker containers based on the AlgoRun framework, so interested computational scientists can easily perform similar tests with inputs from their own research areas, either on their own computers or through a convenient Web interface or deploy the algorithms in their own analysis pipelines.

SIAM Journal on Discrete Mathematics. 2017; 31(1):63–100.