Abstract

A generalized method is presented for unsupervised text clustering. The relevance of each feature to the mixture components is introduced into the mixture model as a set of latent variables. Model selection, feature selection and parameter estimation for the mixture model are then integrated into a single general framework. Experimental results on four large-scale document datasets show that the proposed method performs well in model selection, feature selection and clustering.
