Communications of the ACM
Vol. 37, No. 10 (Oct. 1994), Pages 56-73
General Terms
DESIGN, EXPERIMENTATION, MANAGEMENT, PERFORMANCE
Categories and Subject Descriptors
| H.3.1 | Information Systems, INFORMATION STORAGE AND RETRIEVAL, Content Analysis and Indexing, Abstracting methods. |
| H.4.2 | Information Systems, INFORMATION SYSTEMS APPLICATIONS, Types of Systems, Decision support. |
| H.4.3 | Information Systems, INFORMATION SYSTEMS APPLICATIONS, Communications Applications, Computer conferencing, teleconferencing, and videoconferencing. |
| H.3.1 | Information Systems, INFORMATION STORAGE AND RETRIEVAL, Content Analysis and Indexing, Linguistic processing. |
| I.2.7 | Computing Methodologies, ARTIFICIAL INTELLIGENCE, Natural Language Processing, Text analysis. |

The authors attempt to automatically summarize an electronic brainstorming
session (EBS) through the selection of term clusters that are representative of
the important topics raised in the EBS. An EBS is structured as a collection of
textual comments made by the human participants. The procedures used are
essentially statistical, apart from two preliminary steps. The first is word
exclusion through a stop list of 1000 common function words and "pure verbs"
such as "calculate" (although verb forms are not typically used in topic terms,
eliminating them is not otherwise justified). The second is normalization
through word stemming (neither the rules for dropping 22 suffixes and their
utility, nor a comparison with other stemming algorithms, is explained in detail).
Topic terms are selected from the remaining words and strings of up to three
words. Only "common" terms are selected, by eliminating from
consideration any term with fewer than four occurrences in the EBS corpus. The
"combining weight" of a term ...

These results were compared against topic lists created by members of the EBS
itself as well as by independent human reviewers of the EBS corpus. Despite
inspired efforts to use ranking and core term recall and precision results, the
considerable variation among the topic lists and judgments made valid,
quantified comparisons difficult. To the extent that, in several cases, the
authors' system list rated better than the worst of the other five lists
compared, one may say that the system shows some promise. Nevertheless, despite
an interesting and valiant attempt to solve a difficult problem, the authors'
efforts may be best remembered as a useful benchmark indicating the difficulty
of the problem and suggesting that more sophisticated and interactive techniques
than simple statistical ones will be needed to do an effective job in handling
such complex linguistic communication tasks.

From Computing Reviews
R. S. Marcus
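
The term-selection steps described in the review (stop-word removal, suffix stripping, terms of up to three words, and a minimum of four occurrences) can be illustrated with a minimal sketch. It is not the authors' implementation: the stop list, the suffix rules, and the function names below are hypothetical stand-ins, since the review does not reproduce the article's 1000-word list or its 22 suffix rules. Forming multi-word terms after stop words have been removed is also an assumption. The small `precision_recall` helper shows one plain way to score a system list against a human topic list. The article's ranking and core-term variants are not modeled.

```python
import re
from collections import Counter

# Hypothetical stand-ins: the article's 1000-word stop list and its 22
# suffix rules are not given in the review.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "we", "for",
              "calculate"}
SUFFIXES = ("ing", "es", "s")  # tried in this order

MIN_OCCURRENCES = 4  # terms with fewer occurrences are dropped
MAX_TERM_WORDS = 3   # terms are single words or strings of up to three words


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def candidate_terms(comments):
    """Count stemmed one- to three-word terms and keep the common ones."""
    counts = Counter()
    for comment in comments:
        # Assumption: stop words are removed before multi-word terms are formed.
        words = [stem(w) for w in re.findall(r"[a-z]+", comment.lower())
                 if w not in STOP_WORDS]
        for n in range(1, MAX_TERM_WORDS + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return {term: c for term, c in counts.items() if c >= MIN_OCCURRENCES}


def precision_recall(system_terms, reference_terms):
    """Set-based precision and recall of a system list against a reference list."""
    system, reference = set(system_terms), set(reference_terms)
    hits = len(system & reference)
    precision = hits / len(system) if system else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    comments = [
        "Reducing parking costs",
        "parking cost is high",
        "We need parking cost data",
        "Parking costs again",
        "the budget",
    ]
    print(candidate_terms(comments))
    print(precision_recall(["park cost", "budget"],
                           ["park cost", "cost", "staff", "time"]))
```

With only five toy comments, the four-occurrence cutoff leaves just the terms shared by the first four comments. This hints at why such a threshold favors heavily repeated topics over ones raised only once or twice.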