Friday, April 11, 2014

Week 13 Reading Notes: Text classification and clustering

1.       Text classification and Naïve Bayes:
Many users have ongoing information needs. One way to serve such a need is to issue a query such as multicore against an index of recent newswire articles each morning. A standing query is like any other query, except that it is periodically executed on a collection to which new documents are incrementally added over time.
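As a small sketch of the chapter's multinomial Naïve Bayes classifier: train with maximum-likelihood estimates plus add-one (Laplace) smoothing, then pick the class maximizing the log prior plus the sum of per-token log likelihoods. The function names and the toy China/Japan training data are just illustrative, modeled on the book's worked example.

```python
from collections import Counter
import math

def train_nb(docs):
    """docs: list of (tokens, label) pairs. Returns log priors and
    add-one-smoothed conditional log-probabilities (multinomial NB)."""
    vocab = {t for tokens, _ in docs for t in tokens}
    labels = {label for _, label in docs}
    prior, condprob = {}, {}
    for c in labels:
        class_docs = [tokens for tokens, label in docs if label == c]
        prior[c] = math.log(len(class_docs) / len(docs))
        tf = Counter(t for tokens in class_docs for t in tokens)
        total = sum(tf.values())
        # P(t|c) = (tf + 1) / (total tokens in c + |vocab|)
        condprob[c] = {t: math.log((tf[t] + 1) / (total + len(vocab)))
                       for t in vocab}
    return prior, condprob

def classify(prior, condprob, tokens):
    """Return the class maximizing log P(c) + sum of log P(t|c);
    tokens outside the training vocabulary are ignored."""
    def score(c):
        return prior[c] + sum(condprob[c][t] for t in tokens if t in condprob[c])
    return max(prior, key=score)

# Toy training set (three "china" docs, one "japan" doc):
docs = [(["chinese", "beijing", "chinese"], "china"),
        (["chinese", "chinese", "shanghai"], "china"),
        (["chinese", "macao"], "china"),
        (["tokyo", "japan", "chinese"], "japan")]
prior, condprob = train_nb(docs)
classify(prior, condprob, ["chinese", "chinese", "chinese", "tokyo", "japan"])
# → "china": the three occurrences of "chinese" outweigh "tokyo" and "japan"
```

Note that working in log space avoids floating-point underflow from multiplying many small probabilities.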
2.       Vector space classification
The vector space model is a different representation for text classification: it represents each document as a vector with one real-valued component, usually a tf-idf weight, for each term. Classification then amounts to geometric operations on these vectors, such as measuring distances or similarities between documents.
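A minimal sketch of building such vectors, assuming one common weighting variant, (1 + log10 tf) × log10(N / df), with cosine similarity for comparing documents; the function names are my own:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns one sparse vector (dict) per
    document mapping term -> (1 + log10 tf) * log10(N / df)."""
    N = len(docs)
    df = Counter()                      # document frequency of each term
    for tokens in docs:
        df.update(set(tokens))
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vectors.append({t: (1 + math.log10(tf[t])) * math.log10(N / df[t])
                        for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

A term that occurs in every document gets idf = log10(N/N) = 0 and so contributes nothing, which is exactly the point of the idf factor.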
3.       Flat clustering
(1)     Clustering algorithms group a set of documents into subsets, or clusters. The algorithms’ goal is to create clusters that are coherent internally but clearly different from each other. In other words, documents within a cluster should be as similar as possible, and documents in one cluster should be as dissimilar as possible from documents in other clusters.
(2)     Cluster hypothesis. Documents in the same cluster behave similarly with respect to relevance to information needs.
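The standard flat clustering algorithm is K-means. A bare-bones sketch of Lloyd's iteration (assumed toy data; random initialization by sampling k points):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm: repeatedly assign each point to the nearest
    centroid, then recompute each centroid as the mean of its cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # initialize with k random points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assignment step: nearest centroid by squared distance
            i = min(range(k),
                    key=lambda i: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[i])))
            clusters[i].append(p)
        # update step: centroid = coordinate-wise mean (keep old centroid
        # if a cluster happens to be empty)
        centroids = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl
                     else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids, clusters
```

Each iteration can only decrease the total within-cluster squared distance, so the procedure converges, though possibly to a local optimum that depends on the initialization.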

4.       Hierarchical clustering: single-link, complete-link, group-average, and centroid similarity
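These four variants differ only in how they measure the similarity between two clusters when deciding which pair to merge next. A naive agglomerative sketch using the single-link criterion (distance between the closest pair of points), on assumed toy numeric data:

```python
def single_link(points, k):
    """Agglomerative clustering, single-link: start with singleton
    clusters and repeatedly merge the two clusters whose closest pair of
    points is nearest, until k clusters remain. O(n^3) naive version."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single-link: minimum pairwise distance between clusters;
                # complete-link would use max, group-average the mean
                d = min(dist(p, q)
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]      # merge the closest pair
        del clusters[j]
    return clusters
```

Swapping the inner `min` for `max` gives complete-link, and for a mean gives group-average; single-link tends to produce long "chained" clusters, while complete-link favors compact ones.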
