A methodology for using natural language processing to search for information in digital documents

Publisher

Universidade Federal do Espírito Santo

Abstract

This dissertation proposes a methodology for searching digital texts based on the Discourse Nominal Structure that Freitas [Freitas 2005] proposed for anaphora resolution. The anaphora resolution process makes it possible to identify the structure of the text as intended by its author. Information Retrieval (IR) offers several models for building a computational representation of texts. Although they differ in aspects such as text representation or search methodology, they all share the goal of meeting the user's information need. Classical IR models, such as the Vector Space Model [Salton, Wong e Yang 1975] or Latent Semantic Indexing [Deerwester et al. 1990], take the words in a text as the basic element of its computational representation. In these models, a query consisting of a set of terms T is compared with the indexed documents to find those that contain these words. The set of documents predicted to be relevant is then returned as the query's result. However, natural language texts do not always refer explicitly to their main entity. Anaphora is a common linguistic device in such texts, and its use can weaken the representational power of classical IR models, since an entity introduced by one word can later be referred to by other terms or even omitted. An alternative is the structural model [Baeza-Yates e Ribeiro-Neto 1998] presented by Seibel Júnior [Seibel Júnior e Freitas 2007], which takes anaphora use into account when building its computational representation of texts. In [Seibel Júnior 2007], documents are represented by the Discourse Nominal Structure for Queries (ENDB), or Query Structure, which was derived from Freitas's Discourse Nominal Structure (END) [Freitas 2005, Freitas e Lopes 1995, Freitas e Lopes 1994, Freitas e Lopes 1993, Freitas 1992], whose purpose is anaphora resolution. Once a document has its END representation, Seibel Júnior's methodology adapts the END into a structure designed for IR and defines the method for searching that structure.
Seibel Júnior's methodology does not take into account any information beyond the focus of each sentence, that is, the main entity in a sentence of the text. However, the END can provide more information than the sentence focus alone. Pereira et al. presented in [Pereira, Seibel Júnior e Freitas 2009] a new IR methodology based on anaphora resolution. In their work, the construction of the Query Structure takes into account all entities present in a sentence of the text. As a result, it achieves better qualitative performance during searches. This work details the method of Pereira et al., presenting the algorithms that define it and experiments with the new search methodology.
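
The limitation of word-based matching described in the abstract can be illustrated with a minimal Python sketch. This is not the END or ENDB structure from the dissertation: the example sentences, the function names, and the hand-written `resolved_entities` mapping are all assumptions made for illustration, and the pronoun is resolved by hand rather than by an anaphora resolution algorithm.

```python
import re


def tokenize(text):
    """Lowercase a text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def term_match(query, sentences):
    """Return indices of sentences containing every query term."""
    q = tokenize(query)
    return [i for i, s in enumerate(sentences) if q <= tokenize(s)]


def term_match_with_entities(query, sentences, resolved):
    """Same matching, but each sentence is first augmented with the
    entity its anaphoric expression refers to (if one is given)."""
    q = tokenize(query)
    hits = []
    for i, s in enumerate(sentences):
        terms = tokenize(s)
        if i in resolved:
            terms |= tokenize(resolved[i])
        if q <= terms:
            hits.append(i)
    return hits


# Hypothetical two-sentence text; "It" refers back to "dissertation".
sentences = [
    "The dissertation proposes a search methodology.",
    "It relies on anaphora resolution.",
]

# Hypothetical, hand-annotated resolution: sentence index -> referent.
resolved_entities = {1: "dissertation"}

plain = term_match("dissertation", sentences)
augmented = term_match_with_entities("dissertation", sentences, resolved_entities)
print(plain, augmented)
```

With plain word matching, only the first sentence is returned for the query "dissertation"; once the pronoun's referent is attached to the second sentence, both are returned.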

Citation

PEREIRA, Francisco Santiago do Carmo. Uma metodologia para a utilização do processamento de linguagem natural na busca de informações em documentos digitais. 2009. 109 f. Dissertação (Mestrado em Informática) - Universidade Federal do Espírito Santo, Centro Tecnológico, Vitória, 2009.
