Quote from Wikipedia, the free encyclopedia
Text mining, sometimes called text data mining and roughly equivalent to text analytics, is the process of deriving high-quality information from text. High-quality information is typically derived by devising patterns and trends through means such as statistical pattern learning. Text mining usually involves structuring the input text (usually by parsing, adding some derived linguistic features, removing others, and inserting the result into a database), deriving patterns within the structured data, and finally evaluating and interpreting the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interestingness. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities).
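The three stages described above (structuring the input, deriving patterns, and interpreting the output) can be illustrated with a minimal sketch. It uses only the Python standard library; the function names, the tiny stop-word list, the sample documents, and the "appears in at least two documents" threshold are all assumptions made for illustration, not part of any particular text mining system.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; real systems use larger lists.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "to", "in", "from"}

def structure(text):
    """Structure raw text: lowercase it, tokenize it, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def derive_patterns(documents):
    """Count in how many documents each term appears (document frequency)."""
    df = Counter()
    for doc in documents:
        df.update(set(structure(doc)))
    return df

def interpret(df, min_docs=2):
    """Keep only terms that recur in at least min_docs documents."""
    return sorted(t for t, n in df.items() if n >= min_docs)

# Made-up sample documents, used only to exercise the sketch.
docs = [
    "Text mining derives information from text.",
    "Data mining finds patterns in structured data.",
    "Sentiment analysis is a text mining task.",
]
recurring = interpret(derive_patterns(docs))
print(recurring)  # ['mining', 'text']
```

Document frequency is only one very simple kind of pattern; the statistical pattern learning mentioned above would normally replace the `derive_patterns` step with a trained model.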
History
Labour-intensive manual text-mining approaches first surfaced in the mid-1980s, but technological advances have enabled the field to progress swiftly during the past decade. Text mining is an
interdisciplinary field which draws on information retrieval, data mining, machine learning, statistics, and computational linguistics. As most information (over 80% [citation needed]) is currently stored as text, text mining is believed to have high potential commercial value. Interest is growing in multilingual data mining: the ability to gain information across languages and to cluster similar items from different linguistic sources according to their meaning.