JavaScript Malicious Script Detection Algorithm Based on Multi-class Features
FU Lei-Peng, ZHANG Han, HUO Lu-Yang
College of Computer and Control Engineering, Nankai University, Tianjin 300071; Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin 300071
Abstract: Script samples exhibit features at several levels, including obfuscation, statistics and semantics. To exploit them, a malicious JavaScript detection algorithm based on multi-class features is proposed, and a JavaScript analysis system, JavaScript Codes Analysis and Detection, is implemented. Obfuscation features are extracted from each script, and obfuscated scripts are analyzed with the C4.5 algorithm and deobfuscated. Static statistical features are then extracted, and the script is serialized according to its semantics. The proposed algorithm generates a dangerous sequence tree, from which the dangerous sequence features of malicious JavaScript are extracted. These three types of features serve as the input to a classifier built on a probabilistic neural network, which is chosen because it adapts well to non-uniform input samples and to a growing number of them. The experimental results show that the proposed algorithm has better detection accuracy and stability.
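The classification step named in the abstract can be illustrated with a minimal sketch of a probabilistic neural network, written here as a plain Parzen-window classifier with Gaussian kernels. This is not the authors' implementation: the function name `pnn_classify`, the smoothing width `sigma`, and the three-component toy feature vectors are all assumptions made for illustration, and the real system would feed in the obfuscation, statistical and dangerous-sequence features described above.

```python
import math
from collections import defaultdict


def pnn_classify(train, x, sigma=0.5):
    """Classify x with a minimal probabilistic neural network sketch.

    train: list of (feature_vector, label) pairs (hypothetical data layout).
    sigma: Gaussian smoothing width (assumed value, normally tuned).
    Returns the winning label and the per-class scores.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    # Pattern layer: one Gaussian kernel per training sample.
    for features, label in train:
        sq_dist = sum((a - b) ** 2 for a, b in zip(features, x))
        sums[label] += math.exp(-sq_dist / (2.0 * sigma ** 2))
        counts[label] += 1
    # Summation layer: average kernel response per class.
    scores = {label: sums[label] / counts[label] for label in sums}
    # Output layer: pick the class with the largest score.
    return max(scores, key=scores.get), scores


# Hypothetical toy feature vectors, one component per feature type.
train = [
    ([0.9, 0.8, 0.7], "malicious"),
    ([0.8, 0.9, 0.6], "malicious"),
    ([0.1, 0.2, 0.0], "benign"),
    ([0.2, 0.1, 0.1], "benign"),
]
label, scores = pnn_classify(train, [0.85, 0.75, 0.65])
print(label, scores)
```

Because every training sample becomes its own pattern unit, adding new samples requires no retraining, which is one plausible reading of the abstract's remark about an increasing quantity of input samples.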