1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027; 2. School of Information Science and Technology, Hangzhou Normal University, Hangzhou 311121; 3. School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798 Singapore; 4. School of Information Science and Technology, University of Science and Technology of China, Hefei 230026
Abstract:Causal feature selection methods,also known as Markov boundary discovery methods, select features by learning the Markov boundary(MB) of the target variable. Hence, causal feature selection methods possess better interpretability and robustness than the traditional methods. In this paper, the existing causal feature selection methods are reviewed comprehensively. The methods are divided into two types, single MB discovery algorithms and multiple MB discovery algorithms. Based on the development history of each type, the typical algorithms as well as the recent advances are introduced in detail, and the accuracy, efficiency and data dependency of the algorithms are compared. Moreover, the extended MB discovery algorithms for special applications, including semi supervised learning, multi-label learning, multi-source learning and streaming data learning, are summarized. Finally, the current hotspots and the research directions in the future of causal feature selection are analyzed. Additionally, a toolbox for causal feature selection is developed(http://home.ustc.edu.cn/~xingyuwu/MB.html), where the commonly used packages and datasets are provided.
[1] LIU H, MOTODA H. Feature Selection for Knowledge Discovery and Data Mining. Berlin, Germany: Springer, 1998. [2] ZEBARI R R, ABDULAZEEZ M A, ZEEBAREE D Q, et al. A Comprehensive Review of Dimensionality Reduction Techniques for Feature Selection and Feature Extraction. Journal of Applied Science and Technology Trends, 2020, 1(2): 56-70. [3] BOLÓN-CANEDO V, ALONSO-BETANZOS A. Ensembles for Feature Selection: A Review and Future Trends. Information Fusion, 2019, 52: 1-12. [4] SOLORIO-FERNÁNDEZ S, CARRASCO-OCHOA J A, MARTÍN-EZ-TRINIDAD J F. A Review of Unsupervised Feature Selection Methods. Artificial Intelligence Review, 2020, 53(2): 907-948. [5] LI X P, WANG Y D, RUIZ R. A Survey on Sparse Learning Models for Feature Selection. IEEE Transactions on Cybernetics, 2022, 52(3): 1642-1660. [6] SOLORIO-FERNÁNDEZ S, CARRASCO-OCHOA J A, MARTÍN-EZ-TRINIDAD J F. A Survey on Feature Selection Methods for Mixed Data. Artificial Intelligence Review, 2021. DOI: 10.1007/s10462-021-10072-6. [7] GUYON I, GUNN S, NIKRAVESH M, et al. Feature Extraction:Foundations and Applications. Berlin, Germany: Springer, 2006. [8] LIU W, WANG J Y. Recursive Elimination Current Algorithms and a Distributed Computing Scheme to Accelerate Wrapper Feature Selection. Information Sciences, 2022, 589: 636-654. [9] BOMMERT A, SUN X D, BISCHL B, et al. Benchmark for Filter Methods for Feature Selection in High-Dimensional Classification Data. Computational Statistics & Data Analysis, 2020, 143. DOI: 10.1016/j.csda.2019.106839. [10] JIANG B B, WU X Y, YU K, et al. Joint Semi-Supervised Feature Selection and Classification through Bayesian Approach // Proc of the 33rd AAAI Conference on Artificial Intelligence and 31st Innovative Applications of Artificial Intelligence Conference and 9th AAAI Symposium on Educational Advances in Artificial Intelligence. Palo Alto, USA: AAAI, 2019: 3983-3990. [11] GUYON I, ELISSEEFF A. An Introduction to Variable and Feature Selection. The Journal of Machine Learning Research, 2003, 3: 1157-1182. [12] ALIFERIS C F, STATNIKOV A, TSAMARDINOS I, et al. Local Causal and Markov Blanket Induction for Causal Discovery and Feature Selection for Classification Part I: Algorithms and Empirical Evaluation. The Journal of Machine Learning Research, 2010, 11: 171-234. [13] ALIFERIS C F, STATNIKOV A, TSAMARDINOS I, et al. Local Causal and Markov Blanket Induction for Causal Discovery and Feature Selection for Classification Part II: Analysis and Exten-sions. The Journal of Machine Learning Research, 2010, 11: 235-284. [14] PEARL J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Francisco, USA: Morgan Kaufmann Publishers, 2014. [15] TSAMARDINOS I, ALIFERIS C F. Towards Principled Feature Selection: Relevancy, Filters and Wrappers[C/OL]. [2021-12-22]. http://proceedings.mlr.press/r4/tsamardinos03a/tsamardinos03a.pdf. [16] PELLET J P, ELISSEEFF A. Using Markov Blankets for Causal Structure Learning. The Journal of Machine Learning Research, 2008, 9: 1295-1342. [17] SPIRTES P, GLYMOUR C, SCHEINES R. Causation, Prediction,Search. Berlin, Germany: Springer, 1993. [18] MCDONALD J H. Handbook of Biological Statistics. 2nd Edition. Baltimore, USA: Sparky House Publishing, 2014. [19] PEÑA J M. Learning Gaussian Graphical Models of Gene Networks with False Discovery Rate Control // Proc of the European Confe-rence on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. Berlin, Germany: Springer, 2008: 165-176. [20] ZHANG K, SCHÖLKOPF B, SPIRTES P, et al. Learning Causality and Causality-Related Learning: Some Recent Progress. National Science Review, 2018, 5(1): 26-29. [21] STATNIKOV A, LEMEIR J, ALIFERIS C F, et al. Algorithms for Discovery of Multiple Markov Boundaries. The Journal of Machine Learning Research, 2013, 14(1): 499-566. [22] KOLLER D, SAHAMI M. Toward Optimal Feature Selection // Proc of the 13th International Conference on Machine Learning. San Francisco, USA: Morgan Kaufmann Publishers, 1996: 284-292. [23] MARGARITIS D. Toward Provably Correct Feature Selection in Arbitrary Domains // Proc of the 22nd International Conference on Neural Information Processing Systems. Cambridge, USA: The MIT Press, 2009: 1240-1248. [24] TSAMARDINOS I, ALIFERIS C F, STATNIKOV A, et al. Algorithms for Large Scale Markov Blanket Discovery // Proc of the 16th International Florida Artificial Intelligence Research Society Conference. Palo Alto, USA: AAAI, 2003: 376-380. [25] YARAMAKALA S. Fast Markov Blanket Discovery. Master's Dissertation. Ames, USA: Iowa State University, 2004. [26] YARAMAKALA S, MARGARITIS D. Speculative Markov Blanket Discovery for Optimal Feature Selection // Proc of the 5th IEEE International Conference on Data Mining. Washington, USA: IEEE, 2005. DOI: 10.1109JZCOM.2005.134. [27] BORBOUDAKIS G, TSAMARDINOS I. Forward-Backward Selection with Early Dropping. The Journal of Machine Learning Research, 2019, 20(1): 276-314. [28] LIU X Q, LIU X S. Swamping and Masking in Markov Boundary Discovery. Machine Learning, 2016, 104(1): 25-54. [29] YANG X L, WANG Y J, OU Y, et al. Three-Fast-Inter Incremental Association Markov Blanket Learning Algorithm. Pattern Recognition Letters, 2019, 122: 73-78. [30] TSAMARDINOS I, BORBOUDAKIS G, KATSOGRIDAKIS P, et al. A Greedy Feature Selection Algorithm for Big Data of High Dimensionality. Machine Learning, 2019, 108(2): 149-202. [31] ACID S, DE CAMPOS L M, FERNÁNDEZ M. Score-Based Me-thods for Learning Markov Boundaries by Searching in Constrained Spaces. Data Mining and Knowledge Discovery, 2013, 26(1): 174-212. [32] LI Y, KORB K B, ALLISON L.Markov Blanket Discovery Using Minimum Message Length[C/OL]. [2021-12-22].https://arxiv.org/pdf/2107.08140v1.pdf. [33] WU X Y, JIANG B B, ZHONG Y, et al. Tolerant Markov Boun-dary Discovery for Feature Selection // Proc of the 29th ACM International Conference on Information and Knowledge Management. New York, USA: ACM, 2020: 2261-2264. [34] BAKER C R. Joint Measures and Cross-Covariance Operators. Tran-sactions of the American Mathematical Society, 1973, 186: 273-289. [35] FUKUMIZU K, BACH F R, JORDAN M I. Kernel Dimension Reduction in Regression. The Annals of Statistics, 2009, 37(4): 1871-1905. [36] TSAMARDINOS I, ALIFERIS C F, STATNIKOV A. Time and Sample Efficient Discovery of Markov Blankets and Direct Causal Relations // Proc of the 9th ACM SIGKDD International Confe-rence on Knowledge Discovery and Data Mining. New York, USA: ACM, 2003: 673-678. [37] ALIFERIS C F, TSAMARDINOS I, STATNIKOV A. HITON: A Novel Markov Blanket Algorithm for Optimal Variable Selection[C/OL]. [2021-12-22]. http://www.ccdlab.org/paper-pdfs/AMIA_2003.pdf. [38] PEÑA J M, BJÖRKEGREN J, TEGNÉR J. Scalable, Efficient and Correct Learning of Markov Boundaries under the Faithfulness Assumption // Proc of the European Conference on Symbolic and Quantitative Approaches to Reasoning and Uncertainty. Berlin, Germany: Springer, 2005: 136-147. [39] PEÑA J M, NILSSON R, BJÖRKEGREN J, et al. Towards Sca-lable and Data Efficient Learning of Markov Boundaries. International Journal of Approximate Reasoning, 2007, 45(2): 211-232. [40] FU S K, DESMARAIS M C. Fast Markov Blanket Discovery Algorithm via Local Learning within Single Pass // Proc of the Confe-rence of the Canadian Society for Computational Studies of Intelligence. Berlin, Germany: Springer, 2008: 96-107. [41] ZENG Y F, HE X, XIANG Y P, ,et al. Dynamic Ordering-Based Search Algorithm for Markov Blanket Discovery // Proc of the 15th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining. Berlin. Germany: Springer, 2011, II: 420-431. [42] GAO T, JI Q. Efficient Markov Blanket Discovery and Its Application. IEEE Transactions on Cybernetics, 2017, 47(5): 1169-1179. [43] LING Z L, YU K, WANG H, et al. BAMB: A Balanced Markov Blanket Discovery Approach to Feature Selection. ACM Transactions on Intelligent Systems and Technology, 2019, 10(5): 1-25. [44] WANG H, LING Z L, YU K, et al. Towards Efficient and Effective Discovery of Markov Blankets for Feature Selection. Information Sciences, 2020, 509: 227-242. [45] DE MORAIS S R, AUSSEM A. A Novel Scalable and Data Efficient Feature Subset Selection Algorithm // Proc of the European Conference on Machine Learning and Knowledge Discovery in Databases. Berlin, Germany: Springer, 2008, II: 298-312. [46] DE MORAIS S R, AUSSEM A, CORBEX M. Handling Almost-Deterministic Relationships in Constraint-Based Bayesian Network Discovery: Application to Cancer Risk Factor Identification // Proc of the European Symposium on Artificial Neural Networks-Advances in Computational Intelligence and Learning Bruges. Berlin, Germany: Springer, 2008: 101-106. [47] WU X Y, JIANG B B, YU K, et al. Accurate Markov Boundary Discovery for Causal Feature Selection. IEEE Transactions on Cybernetics, 2020, 50(12): 4983-4996. [48] WU X Y, JIANG B B, YU K, et al. Separation and Recovery Markov Boundary Discovery and Its Application in EEG-Based Emotion Recognition. Information Sciences, 2021, 571: 262-278. [49] GUO X J, YU K, CAO F Y, et al. Error-Aware Markov Blanket Learning for Causal Feature Selection. Information Sciences, 2022, 589: 849-877. [50] NIINIMÄKI T, PARVIAINEN P. Local Structure Discovery in Ba-yesian Networks // Proc of the 28th Conference on Uncertainty in Artificial Intelligence. Arlington, USA: AUAI, 2012: 634-643. [51] GAO T, JI Q. Efficient Score-Based Markov Blanket Discovery. International Journal of Approximate Reasoning, 2017, 80: 277-293. [52] NATSOULIS G, EL GHAOUI L, LANCKRIET G R G, et al. Classification of a Large Microarray Data Set: Algorithm Comparison and Analysis of Drug Signatures. Genome Research, 2005, 15(5): 724-736. [53] LIU H W, LIU L, ZHANG H J. Ensemble Gene Selection for Cancer Classification. Pattern Recognition, 2010, 43(8): 2763-2772. [54] FLEURET F. Fast Binary Feature Selection with Conditional Mu-tual Information. The Journal of Machine Learning Research, 2004, 5: 1531-1555. [55] LIU H W, LIU L, ZHANG H J. Ensemble Gene Selection by Grouping for Microarray Data Classification. Journal of Biomedical Informatics, 2010, 43(1): 81-87. [56] YU K, WANG D W, DING W, et al. Tornado Forecasting with Multiple Markov Boundaries // Proc of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, USA: ACM, 2015: 2237-2246. [57] CAI R C, ZHANG Z J, HAO Z F. BASSUM: A Bayesian Semi-Supervised Method for Classification Feature Selection. Pattern Recognition, 2011, 44(4): 811-820. [58] SECHIDIS K, BROWN G. Markov Blanket Discovery in Positive-Unlabelled and Semi-Supervised Data // Proc of the European Conference on Machine Learning and Knowledge Discovery in Databases. Berlin, Germany: Springer, 2015, I: 351-366. [59] SECHIDIS K, BROWN G. Simple Strategies for Semi-Supervised Feature Selection. Machine Learning, 2018, 107(2): 357-395. [60] WANG J L, ZHAO P L, HOI S C H, et al. Online Feature Selection and Its Applications. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(3): 698-710. [61] WU X D, YU K, DING W, et al. Online Feature Selection with Streaming Features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(5): 1178-1192. [62] YOU D L, LI R Q, SUN M M, et al. Online Markov Blanket Discovery with Streaming Features // Proc of the IEEE International Conference on Knowledge Graph. Washington, USA: IEEE, 2020: 92-99. [63] WU X D, YU K, WANG H, et al. Online Streaming Feature Selection // Proc of the 27th International Conference on Machine Learning. New York, USA: ACM, 2010: 1159-1166. [64] YU K, DING W, SIMOVICI D A, et al. Mining Emerging Patterns by Streaming Feature Selection // Proc of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, USA: ACM, 2012, 60-68. [65] YU K, WU X D, DING W, et al. Exploring Causal Relationships with Streaming Features. The Computer Journal, 2012, 55(9): 1103-1117. [66] YU K, DING W, SIMOVICI D A, et al. Classification with Strea-ming Features: An Emerging-Pattern Mining Approach. ACM Transactions on Knowledge Discovery from Data, 2015, 9(4): 1-31. [67] KHAN W, KONG L F, BREKHNA B, et al. Online Streaming Features Selection via Markov Blanket. Symmetry, 2022, 14(1). DOI: 10.3390/sym14010149. [68] LIU C F, YANG S, YU K. Markov Boundary Learning with Strea-ming Data for Supervised Classification. IEEE Access, 2020, 8: 102222-102234. [69] ROURE J, MOORE A W. Sequential Update of ADtrees // Proc of the 23rd International Conference on Machine Learning. New York, USA: ACM, 2006: 769-776. [70] YU K, LIU L, LI J Y. Learning Markov Blankets from Multiple Interventional Data Sets. IEEE Transactions on Neural Networks and Learning Systems, 2020, 31(6): 2005-2019. [71] YU K, LIU L, LI J Y, et al. Multi-source Causal Feature Selection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(9): 2240-2256. [72] PETERS J, BÜHLMANN P, MEINSHAUSEN N. Causal Inference by Using Invariant Prediction: Identification and Confidence Intervals. Journal of the Royal Statistical Society Series B(Statistical Methodology), 2016, 78(5): 947-1012. [73] LIU X Q, LIU X S. Markov Blanket and Markov Boundary of Multiple Variables. The Journal of Machine Learning Research, 2018, 19(43): 1658-1707. [74] WU X Y, JIANG B B, YU K, et al. Multi-label Causal Feature Selection. Proceeding of the AAAI Conference on Artificial Intelligence, 2020, 34(4): 6430-6437. [75] WU X Y, JIANG B B, ZHONG Y, et al. Multi-label Causal Variable Discovery: Learning Common Causal Variables and Label-Specific Causal Variables[C/OL].[2021-12-22]. https://arxiv.org/pdf/2011.04176.pdf. [76] YU K, LIU L, LI J Y. A Unified View of Causal and Non-causal Feature Selection. ACM Transactions on Knowledge Discovery from Data, 2021, 15(4): 1-46. [77] LEWIS D D. Feature Selection and Feature Extraction for Text Ca-tegorization // Proc of the Workshop on Speech and Natural Language, Stroudsburg, USA: ACL, 1992: 212-217. [78] PENG H C, LONG F H, DING C. Feature Selection Based on Mutual Information Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(8): 1226-1238. [79] YANG H H, MOODY J. Data Visualization and Feature Selection: New Algorithms for Nongaussian Data // Proc of the 12th International Conference on Advances in Neural Information Processing Systems. San Diego, USA: NeurIPS, 1999: 687-702. [80] LAURITZEN S L, SPIEGELHALTER D J. Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems. Journal of The Royal Statistical Society: Series B(Methodology), 1988, 50(2): 157-224. [81] KORB K B, NICHOLSON A E. Bayesian Artificial Intelligence.2nd Edition. Boca Raton, USA: CRC Press, 2011. [82] SCUTARI M, DENIS J B. Bayesian Networks: With Examples in R. Boca Raton, USA: CRC Press, 2014. [83] BEINLICH I A, SUERMONDT H J, CHAVEZ R M, et al. The ALARM Monitoring System: A Case Study with Two Probabilistic Inference Techniques for Belief Networks // Proc of the 2nd European Conference on Artificial Intelligence in Medicine. Berlin, Germany: Springer, 1989: 247-256. [84] BINDER J, KOLLER D, RUSSELL S, et al. Adaptive Probabilis-tic Networks with Hidden Variables. Machine Learning, 1997, 29(2/3): 213-244. [85] SPIEGELHALTER D J, COWELL R G. Learning in Probabilistic Expert Systems. Bayesian Statistics, 1992, 4: 447-465. [86] CONATI C, GERTNER A S, VANLEHN K, et al. On-Line Stu-dent Modeling for Coached Problem Solving Using Bayesian Networks // Proc of the 6th International Conference on User Mode-ling. Berlin, Germany: Springer, 1997: 231-242. [87] JENSEN C S, KONG A. Blocking Gibbs Sampling for Linkage Analysis in Large Pedigrees with Many Loops. The American Journal of Human Genetics, 1999, 65(3): 885-901. [88] ONISKO A, DRUZDZEL M J, WASYLUK H.A Probabilistic Causal Model for Diagnosis of Liver Disorders[C/OL]. [2021-12-22].https://sites.pitt.edu/~druzdzel/psfiles/malbork.pdf. [89] TSAMARDINOS I, BROWN L E, ALIFERIS C F. The Max-Min Hill-Climbing Bayesian Network Structure Learning Algorithm. Machine Learning, 2006, 65(1): 31-78. [90] SCUTARI M. Learning Bayesian Networks with the Bnlearn R Package[C/OL]. [2021-12-22]. https://arxiv.org/pdf/0908.3817.pdf. [91] STATNIKOV A, TSAMARDINOS I, BROWN L E, et al. Causal Explorer: A Matlab Library of Algorithms for Causal Discovery and Variable Selection for Classification[C/OL]. [2021-12-22]. http://proceedings.mlr.press/v3/supplemental/Software_WCCI08_challenge.pdf. [92] YU K, GUO X J, LIU L, et al. Causality-Based Feature Selection: Methods and Evaluations. ACM Computing Surveys, 2021, 53(5): 1-36. [93] Mens X Machina. Probabilistic Graphical Model Toolbox[D/OL]. [2021-12-22]. http://mensxmachina.org/en/software/probabilistic-graphical-model-toolbox. [94] KALISCH M, MÄCHLER M, COLOMBO D, et al. Causal Infe-rence Using Graphical Models with the R Package pcalg. Journal of Statistical Software, 2012, 47(11): 1-26. [95] MURPHY K P. The Bayes Net Toolbox for Matlab. Computing Science and Statistics, 2001, 33: 1024-1034. [96] SCHEINES R, SPIRTES P, GLYMOUR C, et al. The TETRAD Project: Constraint Based Aids to Causal Model Specification. Multivariate Behavioral Research, 1998, 33(1): 65-117. [97] YU Y, CHEN J, GAO T, et al. DAG-GNN: DAG Structure Lear-ning with Graph Neural Networks[C/OL]. [2021-12-22]. http://proceedings.mlr.press/v97/yu19a/yu19a.pdf.