
PapersDB

Combining Naive Bayes and n-Gram Language Models for Text Classification

Fuchun Peng, Department of Computer Science, University of Massachusetts at Amherst

Dale Schuurmans, AICML

Full Text: ecir03.ps

We augment the naive Bayes model with an n-gram language model to address two shortcomings of naive Bayes text classifiers. The chain augmented naive Bayes classifiers we propose have two advantages over standard naive Bayes classifiers. First, a chain augmented naive Bayes model relaxes some of the independence assumptions of naive Bayes---allowing a local Markov chain dependence in the observed variables---while still permitting efficient inference and learning. Second, smoothing techniques from statistical language modeling can be used to recover better estimates than the Laplace smoothing techniques usually used in naive Bayes classification. Our experimental results on three real world data sets show that we achieve substantial improvements over standard naive Bayes classification, while also achieving state of the art performance that competes with the best known methods in these cases.
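A minimal sketch of the idea the abstract describes: each class is modeled by a bigram (Markov chain) language model instead of a bag of independent words, and the bigram estimate is smoothed by interpolating with a Laplace-smoothed unigram fallback (a simple stand-in for the language-modeling smoothing techniques the paper evaluates). The class name `ChainAugmentedNB`, the interpolation weight `lam`, and the toy training data are illustrative assumptions, not the paper's implementation.

```python
import math
from collections import defaultdict

class ChainAugmentedNB:
    """Naive Bayes with a bigram class-conditional language model (sketch).

    Each class y gets P(w_i | w_{i-1}, y), interpolated with a
    Laplace-smoothed unigram P(w_i | y) for smoothing. `lam` is the
    interpolation weight on the bigram estimate (an assumed default).
    """

    def __init__(self, lam=0.7):
        self.lam = lam
        self.bigram = {}   # class -> {(prev, w): count}
        self.unigram = {}  # class -> {w: count}
        self.ctx = {}      # class -> {prev: count as bigram context}
        self.total = {}    # class -> total token count
        self.prior = {}    # class -> document count

    def fit(self, docs, labels):
        for tokens, y in zip(docs, labels):
            self.prior[y] = self.prior.get(y, 0) + 1
            bg = self.bigram.setdefault(y, defaultdict(int))
            ug = self.unigram.setdefault(y, defaultdict(int))
            cx = self.ctx.setdefault(y, defaultdict(int))
            prev = "<s>"  # sentence-start marker
            for w in tokens:
                bg[(prev, w)] += 1
                cx[prev] += 1
                ug[w] += 1
                self.total[y] = self.total.get(y, 0) + 1
                prev = w

    def _log_likelihood(self, tokens, y):
        bg, ug, cx = self.bigram[y], self.unigram[y], self.ctx[y]
        n, vocab = self.total[y], len(ug) + 1  # +1 for unseen words
        lp = math.log(self.prior[y] / sum(self.prior.values()))
        prev = "<s>"
        for w in tokens:
            # Laplace-smoothed unigram fallback (always > 0)
            p_uni = (ug.get(w, 0) + 1) / (n + vocab)
            c_prev = cx.get(prev, 0)
            p_bi = bg.get((prev, w), 0) / c_prev if c_prev else 0.0
            lp += math.log(self.lam * p_bi + (1 - self.lam) * p_uni)
            prev = w
        return lp

    def predict(self, tokens):
        return max(self.prior, key=lambda y: self._log_likelihood(tokens, y))
```

Usage: train on pre-tokenized documents and classify by maximum smoothed log-likelihood plus log prior, e.g. `clf.fit([["good","good","movie"], ["bad","bad","movie"]], ["pos","neg"])` then `clf.predict(["good","movie"])` returns `"pos"`.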

Citation

"Combining Naive Bayes and n-Gram Language Models for Text Classification"

Fuchun Peng and Dale Schuurmans. ECIR, January 2003.

Keywords: naive Bayes, machine learning
Category: In Conference

BibTeX

@incollection{Peng+Schuurmans:ECIR03,
  author    = {Fuchun Peng and Dale Schuurmans},
  title     = {Combining Naive Bayes and n-Gram Language Models for Text Classification},
  booktitle = {ECIR},
  year      = 2003,
}

Last Updated: June 01, 2007

Submitted by Stuart H. Johnson