Is it justified to use entropy measures in machine learning applications?
(French title: L’usage des entropies est-il justifié en apprentissage à partir des données ?)

Djamel Abdelkader Zighed

Summary


Many machine learning algorithms use entropy measures as a construction criterion, which they then seek to optimize. Among the most widely used measures, Shannon's entropy is certainly the most popular. However, in real-world applications, the use of entropy measures proves entirely inappropriate, on both practical and theoretical grounds. Many hypotheses are in fact adopted implicitly even though they are unfounded. In this presentation, we try to identify these underlying hypotheses and show that they are ill-suited to learning from data. We then state, first intuitively, new properties that should be required of measures intended to lead to more efficient machine learning algorithms.
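The construction criterion described above can be made concrete with a short sketch. Assuming a decision-tree learner with categorical features (as in ID3-style induction), the code computes the Shannon entropy of a node and the information gain of a split, that is, the entropy reduction it achieves, and then picks the feature that maximizes that gain. The function names (`shannon_entropy`, `information_gain`, `best_split`) and the toy data are hypothetical illustrations, not the author's method.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy, in bits, of the empirical class distribution of `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    # H = sum_k p_k * log2(1 / p_k), summed over the classes actually present.
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of `parent` minus the size-weighted entropy of its `children` partitions."""
    n = len(parent)
    weighted = sum(len(child) / n * shannon_entropy(child) for child in children)
    return shannon_entropy(parent) - weighted


def best_split(rows, labels, feature_names):
    """Return the categorical feature whose split maximizes information gain, and that gain."""
    best_feature, best_gain = None, -1.0
    for j, name in enumerate(feature_names):
        # Partition the class labels by the value of feature j.
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[j], []).append(y)
        gain = information_gain(labels, list(groups.values()))
        if gain > best_gain:
            best_feature, best_gain = name, gain
    return best_feature, best_gain


# Hypothetical toy data: "humidity" separates the classes perfectly, "outlook" does not.
rows = [("sunny", "high"), ("sunny", "normal"), ("rain", "high"), ("rain", "normal")]
labels = ["no", "yes", "no", "yes"]
print(best_split(rows, labels, ["outlook", "humidity"]))  # ('humidity', 1.0)
```

The tree grows by repeatedly applying `best_split` to each node, so the optimized quantity is exactly the Shannon-entropy criterion whose use the paper questions.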

 

 

Abstract

Many machine learning algorithms use entropy measures as a construction criterion that they seek to optimize. Among the most widely used measures, Shannon's entropy is certainly the best known. However, in real-world applications, the use of entropy measures turns out to be totally inadequate, both in theory and in practice. Indeed, many hypotheses are implicitly assumed even though they are unfounded and therefore unjustified. In this paper, we try to identify those hypotheses and demonstrate that they are unsuitable for machine learning on real data. We then introduce, intuitively, a set of new properties that measures should satisfy if they are to lead to efficient learning algorithms.
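The abstract does not spell out which implicit hypotheses it has in mind. As a hedged illustration only, two well-known properties of Shannon entropy are easy to check numerically for two classes: it is symmetric in the class probabilities, and it reaches its maximum at the uniform distribution p = 0.5, independently of the class balance in the training data. The sketch below verifies both; `binary_entropy` is a hypothetical helper, and this is not the measure proposed in the paper.

```python
import math


def binary_entropy(p):
    """Shannon entropy, in bits, of the two-class distribution (p, 1 - p)."""
    if p in (0.0, 1.0):
        return 0.0  # a pure node carries no uncertainty
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# Evaluate on a grid of class probabilities p = 0.00, 0.01, ..., 1.00.
grid = [i / 100 for i in range(101)]
values = [binary_entropy(p) for p in grid]

# Symmetry: swapping the two class labels leaves the value unchanged.
symmetric = all(math.isclose(binary_entropy(p), binary_entropy(1 - p)) for p in grid)

# Location of the maximum: the uniform distribution.
argmax = grid[values.index(max(values))]

print(symmetric, argmax)  # True 0.5
```

Because the maximum sits at p = 0.5 by construction, a node is scored as "most uncertain" when its classes are balanced, whatever the class priors of the data set.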


Keywords


Entropy measures, Machine learning

