Using the LISp-Miner System for Credit Risk Assessment

Petr Berka

Abstract


Credit risk assessment, credit scoring, and loan application approval are typical tasks that can be solved using machine learning or data mining techniques. From this perspective, loan application evaluation is a classification task in which the final decision can be either a crisp yes/no decision about the loan or a numeric score expressing the financial standing of the applicant. The knowledge to be used is inferred from data about past decisions. These data usually consist of socio-demographic characteristics, economic characteristics (e.g., income, deposit), the characteristics of the loan, and the loan approval decision. A number of machine learning algorithms can be used for this purpose.
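A rule-based classifier can produce both output forms mentioned above: a numeric score and a crisp yes/no decision. The minimal sketch below shows one way to relate them. It assumes a hypothetical set of weighted rules and combines the weights of all rules that fire using the pseudo-Bayesian combining function known from PROSPECTOR (Duda et al.). The score is then thresholded to obtain the decision. The rule conditions, their weights, and the 0.5 threshold are illustrative assumptions. They are not values from the paper, and this is not the KEX implementation.

```python
from functools import reduce
from typing import Any, Callable, Dict, List, Tuple

Applicant = Dict[str, Any]
# A rule is (condition, weight). Weights are assumed to lie strictly in (0, 1):
# 0.5 is neutral, > 0.5 supports approval, < 0.5 speaks against it.
# Keeping weights away from 0 and 1 also avoids a zero denominator in combine().
Rule = Tuple[Callable[[Applicant], bool], float]


def combine(w1: float, w2: float) -> float:
    """PROSPECTOR-style pseudo-Bayesian combination of two weights."""
    num = w1 * w2
    return num / (num + (1.0 - w1) * (1.0 - w2))


def score(applicant: Applicant, rules: List[Rule], prior: float = 0.5) -> float:
    """Numeric score: fold the weights of all rules whose condition holds."""
    weights = [w for condition, w in rules if condition(applicant)]
    return reduce(combine, weights, prior)


def decide(applicant: Applicant, rules: List[Rule], threshold: float = 0.5) -> bool:
    """Crisp yes/no decision obtained by thresholding the numeric score."""
    return score(applicant, rules) >= threshold


# Hypothetical rules, for illustration only.
RULES: List[Rule] = [
    (lambda a: a["income"] >= 20000, 0.8),
    (lambda a: a["unemployed"], 0.2),
]

if __name__ == "__main__":
    applicant = {"income": 25000, "unemployed": False}
    print(round(score(applicant, RULES), 3), decide(applicant, RULES))
```

Because 0.5 is the neutral element of this combining function, an applicant matched by no rule keeps the prior score.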

In this paper we show how this task can be solved using the LISp-Miner system, a tool under development at the University of Economics, Prague. LISp-Miner is primarily oriented towards mining various types of association rules. Unlike the "classical" association rules proposed by Agrawal, however, LISp-Miner offers a greater variety of relations between the left-hand and right-hand sides of a rule. Besides this, LISp-Miner also implements two other procedures that can be used for the classification task. We describe the 4FT-Miner and KEX procedures and show how they can be used to analyze data about loan applications. We also compare the results obtained using the presented algorithms with results from standard rule learning methods.
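The sketch below illustrates the richer relations between the two sides of a rule. It first computes the four-fold contingency table for an antecedent and a succedent, using the a, b, c, d frequencies of the GUHA literature (Hajek and Havranek; Rauch). It then evaluates two quantifiers of the kind offered by 4ft-Miner: founded implication and above-average dependence. The attribute names, the toy rows, and the parameter values p and Base are hypothetical. This is an illustration of the definitions, not the LISp-Miner implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable

Row = Dict[str, Any]
Cedent = Callable[[Row], bool]  # a Boolean condition on one row (antecedent or succedent)


@dataclass
class FourFoldTable:
    a: int  # antecedent and succedent both hold
    b: int  # antecedent holds, succedent does not
    c: int  # succedent holds, antecedent does not
    d: int  # neither holds

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d


def four_fold(rows: Iterable[Row], antecedent: Cedent, succedent: Cedent) -> FourFoldTable:
    a = b = c = d = 0
    for row in rows:
        ant, suc = antecedent(row), succedent(row)
        if ant and suc:
            a += 1
        elif ant:
            b += 1
        elif suc:
            c += 1
        else:
            d += 1
    return FourFoldTable(a, b, c, d)


def founded_implication(t: FourFoldTable, p: float, base: int) -> bool:
    """a/(a+b) >= p and a >= Base (rewritten without division)."""
    return t.a >= base and t.a >= p * (t.a + t.b)


def above_average(t: FourFoldTable, p: float, base: int) -> bool:
    """a/(a+b) >= (1+p) * (a+c)/n and a >= Base (rewritten without division)."""
    return t.a >= base and t.a * t.n >= (1 + p) * (t.a + t.b) * (t.a + t.c)


# Hypothetical loan data: does a high income go with a good loan status?
ROWS = (
    [{"income": "high", "status": "good"}] * 4
    + [{"income": "high", "status": "bad"}] * 1
    + [{"income": "low", "status": "good"}] * 2
    + [{"income": "low", "status": "bad"}] * 3
)

if __name__ == "__main__":
    table = four_fold(ROWS, lambda r: r["income"] == "high", lambda r: r["status"] == "good")
    print(table)  # FourFoldTable(a=4, b=1, c=2, d=3)
    print(founded_implication(table, p=0.75, base=3))  # True: 4/5 = 0.8 >= 0.75
    print(above_average(table, p=0.2, base=3))  # True: 0.8 >= 1.2 * 6/10 = 0.72
```

Founded implication only checks the confidence of the rule. Above-average dependence instead compares that confidence with the overall frequency of the succedent. This comparison is what distinguishes it from a plain confidence threshold.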


Keywords


Data mining, Decision rules, Association rules, Credit scoring

References


AGRAWAL R., IMIELINSKI T., SWAMI A. Mining associations between sets of items in massive databases. In: Proc. of the ACM-SIGMOD Int. Conference on Management of Data, Washington D.C., 1993, pp. 207--216, doi: 10.1145/170036.170072.

BERKA P. Workshop Notes on Discovery Challenge, Prague, Univ. of Economics, 1999.

BERKA P. ETree Miner - A New GUHA Procedure for Building Exploration Trees. In: KRYSZKIEWICZ, RYBINSKI, SKOWRON, RAS, eds., Foundations of Intelligent Systems. 19th Int. Symposium on Methodologies for Intelligent Systems (ISMIS 2011), Springer LNCS 6804, 2011, pp. 96--101, doi: 10.1007/978-3-642-21916-0_11.

BERKA P. Learning compositional decision rules using the KEX algorithm. Intelligent Data Analysis, Vol. 16, No. 4, 2012, pp. 665--681, doi: 10.3233/IDA-2012-0543.

BRUHA I., BERKA P. Empirical Comparison of Various Discretization Procedures. Int. J. of Pattern Recognition and Artificial Intelligence, Vol. 12, No. 7, 1998, pp. 1017--1032, doi: 10.1142/S0218001498000567.

CENDROWSKA J. PRISM: An algorithm for inducing modular rules. Int. J. of Man-Machine Studies, 27(4), 1987, pp. 349--370, doi: 10.1016/S0020-7373(87)80003-2.

COHEN W. Fast Effective Rule Induction. In: 12th International Conference on Machine Learning, 1995, pp. 115--123, doi: 10.1016/b978-1-55860-377-6.50023-2.

DUDA R.O., GASCHNIG J.E., HART P. Model Design in the Prospector Consultant System for Mineral Exploration. In: WEBBER, NILSSON, eds., Readings in Artificial Intelligence, Elsevier, 1981, doi: 10.1016/B978-0-934613-03-3.50028-3.

FAYYAD U., IRANI K. Multi-interval discretization of continuous-valued attributes for classification learning. In: Proc. 13th Int. Joint Conf. on Artificial Intelligence (IJCAI'93), 1993, pp. 1022--1027.

FRANK E., WITTEN I.H. Generating Accurate Rule Sets Without Global Optimization, In: Fifteenth International Conference on Machine Learning, 1998, pp. 144--151.

GAINES B.R., COMPTON P. Induction of Ripple-Down Rules Applied to Modeling Large Databases. J. Intell. Inf. Syst., 5(3), 1995, pp. 211--228, doi: 10.1007/BF00962234.

GALINDO J., TAMAYO P. Credit Risk Assessment using Statistical and Machine Learning: Basic Methodology and Risk Modeling Applications. Computational Economics, Vol. 15, No. 1-2, 2000, pp. 107--143.

HAJEK P. Int. J. Man-Machine Studies 22, 1985, pp. 59--76, doi: 10.1016/S0020-7373(85)80077-8.

HAJEK P., HAVRANEK T. Mechanising Hypothesis Formation - Mathematical Foundations for a General Theory. Springer, 1978, doi: 10.1007/978-3-642-66943-9.

HALL M., et al. The WEKA Data Mining Software: An Update. SIGKDD Explorations, Vol. 11, No. 1, 2009, doi: 10.1145/1656274.1656278.

HAND D., MANNILA H., SMYTH P. Principles of Data Mining. MIT Press, 2002, ISBN 0-262-08290-X.

CHAPMAN P., et al. CRISP-DM 1.0 Step-by-step data mining guide. SPSS Inc., 2000.

CHEN W., et al. Credit risk Evaluation by hybrid data mining technique. Systems Engineering Procedia 3, 2012, pp. 194--200, doi: 10.1016/j.sepro.2011.10.029.

KERBER R. ChiMerge: Discretization of Numeric Attributes. In: Proc. AAAI-92 conference, AAAI Press, 1992, pp. 123--128.

KIM K.S., HWANG H.J. An Integrated Data Mining Model for Customer Credit Evaluation. In: Proc. Int. Conf. Computational Science and Its Applications ICCSA, Springer LNCS 3482, 2005, pp. 798--805, doi: 10.1007/11424857_87.

KOTSIANTIS S. Credit risk analysis using a hybrid data mining model. Int. J. Intelligent Systems Technologies and Applications, Vol. 2, No. 4, 2007, pp. 345--356, doi: 10.1504/ijista.2007.014030.

LEE T.S., et al. Mining the customer credit using classification and regression tree and multivariate adaptive regression splines. Computational Statistics & Data Analysis, Volume 50, Issue 4, 2006, pp. 1113--1130, doi: 10.1016/j.csda.2004.11.006.

LEE C., SHIN D. A context-sensitive discretization of numeric attributes for classification learning, In: COHN ed., 11th European Conf. on Artificial Intelligence (ECAI'94), John Wiley, 1994, pp. 428--432.

QUINLAN J.R. Simplifying decision trees. Int. J. Man-Machine Studies 27, 1987, pp. 221--234, doi: 10.1016/S0020-7373(87)80053-6.

RAS Z., WIECZORKOWSKA A. Action-Rules: How to Increase Profit of a Company. In: ZIGHED, KOMOROWSKI, ZYTKOW, eds., Principles of Data Mining and Knowledge Discovery, Springer, 2000, pp. 587--592, doi: 10.1007/3-540-45372-5_70.

RAUCH J. Observational Calculi and Association Rules. Springer, 2013, doi: 10.1007/978-3-642-11737-4.

SIMUNEK M. Academic KDD Project LISp-Miner. In: ABRAHAM, FRANKE, KOPPEN, eds., Advances in Soft Computing: Intelligent Systems Design and Applications, Springer-Verlag, 2003, pp. 263--272, doi: 10.1007/978-3-540-44999-7_25.

VINCIOTTI V., HAND D.J. Scorecard construction with unbalanced class sizes. Journal of the Iranian Statistical Society, Vol. 2, No. 2, 2003, pp. 189--205.

ZHOU L., WANG W. Loan Default Prediction on Large Imbalanced Data Using Random Forests. TELKOMNIKA Indonesian Journal of Electrical Engineering. Vol.10, No.6, 2012, pp. 1519--1525, doi: 10.11591/telkomnika.v10i6.1323.




DOI: http://dx.doi.org/10.14311/NNW.1901.%25x
