SYNTAX-BASED CONCEPT EXTRACTION FOR QUESTION ANSWERING
Title: SYNTAX-BASED CONCEPT EXTRACTION FOR QUESTION ANSWERING
Author: Glinos, Demetrios
Keywords: natural language processing; concept extraction; question answering; knowledge acquisition; knowledge representation; concept network
Abstract: Question answering (QA) stands squarely along the path from document retrieval to text understanding. As an area of research interest, it serves as a proving ground where strategies for document processing, knowledge representation, question analysis, and answer extraction may be evaluated in real world information extraction contexts. The task is to go beyond the representation of text documents as "bags of words" or data blobs that can be scanned for keyword combinations and word collocations in the manner of internet search engines. Instead, the goal is to recognize and extract the semantic content of the text, and to organize it in a manner that supports reasoning about the concepts represented. The issue presented is how to obtain and query such a structure without either a predefined set of concepts or a predefined set of relationships among concepts. This research investigates a means for acquiring from text documents both the underlying concepts and their interrelationships. Specifically, a syntax-based formalism for representing atomic propositions that are extracted from text documents is presented, together with a method for constructing a network of concept nodes for indexing such logical forms based on the discourse entities they contain. It is shown that meaningful questions can be decomposed into Boolean combinations of question patterns using the same formalism, with free variables representing the desired answers. It is further shown that this formalism can be used for robust question answering using the concept network and WordNet synonym, hypernym, hyponym, and antonym relationships. This formalism was implemented in the Semantic Extractor (SEMEX) research tool and was tested against the factoid questions from the 2005 Text Retrieval Conference (TREC), which operated upon the AQUAINT corpus of newswire documents.
After adjusting for the limitations of the tool and the document set, correct answers were found for approximately fifty percent of the questions analyzed, which compares favorably with other question answering systems.
Adviser: Gomez, Fernando
Publisher: University of Central Florida
Degree: Ph.D.
Degree Discipline: School of Computer Science
Degree Grantor: Engineering and Computer Science
Degree Program: Computer Science
Graduation Date: 2006-05-01
Type: Doctoral dissertation
Access Level: Public - Allow Worldwide Access
Release Date: 2007-01-31
Repository: University Archives
Repository Collection: Electronic Theses and Dissertations
Identifier: CFE0000985
Access Link: http://purl.fcla.edu/fcla/etd/CFE0000985
