1.
Text classification and Naïve Bayes:
Many users have ongoing information
needs. One way of serving such a need is to issue a query such as multicore against an index
of recent newswire articles each morning. A standing query is like any other query
except that it is periodically executed on a collection to which new documents
are incrementally added over time.
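A minimal sketch of a standing query may help here. The class name `StandingQuery`, its `run` method, and the whitespace tokenization are all assumptions made for illustration. Returning only the documents added since the last run is one possible design choice, not something the text prescribes.

```python
class StandingQuery:
    """Re-runs a fixed term query against a collection that grows over time."""

    def __init__(self, terms):
        # Assumption: a document matches only if it contains every query term.
        self.terms = {t.lower() for t in terms}
        self.seen = 0  # number of documents already checked on earlier runs

    def run(self, collection):
        """Return the matching documents added since the previous run."""
        new_docs = collection[self.seen:]
        self.seen = len(collection)
        return [d for d in new_docs if self.terms <= set(d.lower().split())]


collection = ["multicore chips ship this year", "stock markets fall"]
query = StandingQuery(["multicore"])
first_run = query.run(collection)

# A new article arrives; the same query is executed again the next morning.
collection.append("new multicore processor announced")
second_run = query.run(collection)
```

Each call to `run` plays the role of one periodic execution. The query itself never changes, only the collection does.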
2.
Vector space classification
This approach adopts a different representation for
text classification: the vector space model. It represents each document as a
vector with one real-valued component, usually a tf-idf weight, for each term.
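The representation can be sketched as follows. The function name `tfidf_vectors` is hypothetical, and the weighting used (raw term frequency times log10 of N over document frequency) is only one common tf-idf variant, assumed here because the notes do not fix a formula.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return the sorted vocabulary and one tf-idf vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # One real-valued component per vocabulary term (0.0 if term is absent).
        vectors.append([tf[t] * math.log10(n_docs / df[t]) for t in vocab])
    return vocab, vectors


vocab, vectors = tfidf_vectors(["cheap chips", "fast chips", "fast fast cars"])
```

Every document ends up as a vector of the same length, so documents can be compared with ordinary vector operations such as the dot product or cosine similarity.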
3.
Flat clustering
(1)
Clustering algorithms group a
set of documents into subsets, or clusters. The algorithms' goal is to
create clusters that are coherent internally but clearly different from each
other. In other words, documents within a cluster should be as similar as
possible, and documents in one cluster should be as dissimilar as possible from
documents in other clusters.
(2)
Cluster hypothesis. Documents
in the same cluster behave similarly with respect to relevance to information
needs.
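The clustering goal in (1) can be checked numerically for a given assignment of documents to clusters. The sketch below is only an illustration: the function names `cosine` and `cluster_quality`, the use of cosine similarity, and the toy vectors are assumptions, not part of the original notes.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_quality(vectors, labels):
    """Average similarity of same-cluster pairs and of different-cluster pairs."""
    within, between = [], []
    for i, j in combinations(range(len(vectors)), 2):
        sim = cosine(vectors[i], vectors[j])
        (within if labels[i] == labels[j] else between).append(sim)

    def average(xs):
        return sum(xs) / len(xs) if xs else 0.0

    return average(within), average(between)


docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
within, between = cluster_quality(docs, [0, 0, 1, 1])
```

A good clustering in the sense of (1) gives a high `within` value and a low `between` value.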
4.
Hierarchical clustering: single-link,
complete-link, group-average, and centroid similarity
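The four similarity measures differ only in how they score a pair of clusters. The sketch below follows the usual textbook definitions and assumes length-normalized document vectors compared by dot product. All function names are hypothetical.

```python
from itertools import combinations


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def single_link(a, b):
    # Similarity of the two most similar members, one from each cluster.
    return max(dot(u, v) for u in a for v in b)


def complete_link(a, b):
    # Similarity of the two least similar members, one from each cluster.
    return min(dot(u, v) for u in a for v in b)


def group_average(a, b):
    # Average over all document pairs in the merged cluster,
    # including pairs that come from the same original cluster.
    pairs = list(combinations(a + b, 2))
    return sum(dot(u, v) for u, v in pairs) / len(pairs)


def centroid(cluster):
    return [sum(column) / len(cluster) for column in zip(*cluster)]


def centroid_similarity(a, b):
    # Similarity of the two cluster centroids.
    return dot(centroid(a), centroid(b))


a = [[1.0, 0.0], [0.8, 0.6]]
b = [[0.0, 1.0], [0.6, 0.8]]
scores = {
    "single-link": single_link(a, b),
    "complete-link": complete_link(a, b),
    "group-average": group_average(a, b),
    "centroid": centroid_similarity(a, b),
}
```

An agglomerative algorithm would repeatedly merge the pair of clusters with the highest score under whichever measure is chosen.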