How to Add Transparency to Artificial Neural Networks
HU BaoGang, WANG Yong, YANG ShuangHong, QU HanBing
Institute of Automation, Chinese Academy of Sciences, Beijing 100080; Graduate School, Chinese Academy of Sciences, Beijing 100080
Abstract  The central issue of the “black box” inherent in artificial neural networks (ANNs) is discussed. Adding transparency is widely recognized as an effective way to deal with this problem, and it brings significant benefits such as a certain degree of comprehensibility, a smaller model size, a faster learning process, and improved generalization capability. A hierarchical classification is applied to the existing approaches for a better understanding of their intrinsic features and limitations. The first level of the classification distinguishes two strategies: building prior knowledge into neural networks, and extracting rules embedded within trained networks. Most of the important approaches are introduced and compared in detail, with further classifications within each strategy. Finally, personal perspectives on the study of machine learning are presented. Additional objective functions, such as the performance-to-cost ratio and transparency, are suggested as extensions of current studies. Adding transparency to ANNs is regarded as the most fundamental and direct solution to the other open issues, and a new machine learning approach, called Knowledge Increasing via Feedback, is proposed.
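As an illustration of the first strategy (building prior knowledge into a network), the following minimal Python/NumPy sketch encodes an assumed piece of domain knowledge, the constraint f(0) = 0, as a heavily weighted "virtual example" inside the training objective of a small one-hidden-layer regression network. The toy data, network size, weighting factor lam, and helper forward are illustrative assumptions only, not the method of this paper.

import numpy as np

rng = np.random.default_rng(0)

# Toy data: noisy samples of y = sin(x); assumed prior knowledge: f(0) = 0.
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X) + 0.1 * rng.normal(size=X.shape)

# A small one-hidden-layer network (illustrative sizes).
H = 16
W1 = rng.normal(0.0, 0.5, size=(1, H)); b1 = np.zeros(H)
W2 = rng.normal(0.0, 0.5, size=(H, 1)); b2 = np.zeros(1)

def forward(x):
    # Return hidden activations and network output.
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

lam, lr = 5.0, 0.05                 # prior weight and learning rate (assumed values)
x_prior = np.zeros((1, 1))          # constraint point x = 0
y_prior = np.zeros((1, 1))          # known value f(0) = 0

for step in range(3000):
    # The prior enters the loss as a weighted virtual example appended to the data.
    xb = np.vstack([X, x_prior])
    yb = np.vstack([y, y_prior])
    w = np.vstack([np.ones_like(y), lam * np.ones_like(y_prior)])

    h, out = forward(xb)
    err = w * (out - yb) / len(xb)  # gradient of the weighted squared error
    # Manual backpropagation through the two layers.
    gW2 = h.T @ err;  gb2 = err.sum(axis=0)
    dh = (err @ W2.T) * (1.0 - h ** 2)
    gW1 = xb.T @ dh;  gb1 = dh.sum(axis=0)
    W2 -= lr * gW2;  b2 -= lr * gb2
    W1 -= lr * gW1;  b1 -= lr * gb1

print("f(0) after training:", forward(np.zeros((1, 1)))[1].item())

The same pattern extends to other soft constraints (monotonicity, symmetry, boundary behavior): the prior knowledge is expressed as an extra penalty or virtual-data term in the objective rather than being left for the network to learn from data alone.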
Received: 27 July 2006