Enhancing deep text understanding using graph models, named entity recognition and word embeddings
Named entities are specific language elements that belong to predefined categories such as names, locations, organizations, chemical elements, or names of space missions. They are not easy to find and classify. However, named entities are of significant help for various tasks such as improving search capabilities, relating documents to internal or external information, and connecting causes with effects. Hence, named entity recognition is one of the key components of information extraction (IE) and knowledge discovery (KD). In this blog post, Dr. Alessandro Negro, Chief Scientist, and Dr. Vlasta Kůs, Data Scientist at GraphAware, highlight how combining graph models with named entity recognition (NER) can provide higher accuracy than Stanford's natural language processing (NLP) NER alone.
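To make the idea concrete, here is a toy, dictionary-based sketch of what NER produces: spans of text mapped to predefined categories. This is purely illustrative; the post itself relies on Stanford's statistical NER, and the entity names and categories below are invented examples.

```python
# Toy gazetteer-based NER sketch (illustrative only; real NER systems
# such as Stanford NLP's use statistical models, not lookup tables).
# All entries below are made-up examples.
GAZETTEER = {
    "Apollo 11": "SPACE_MISSION",
    "Helium": "CHEMICAL_ELEMENT",
    "London": "LOCATION",
    "GraphAware": "ORGANIZATION",
}

def recognize_entities(text):
    """Return (entity, category) pairs found in the text."""
    found = []
    for entity, category in GAZETTEER.items():
        if entity in text:
            found.append((entity, category))
    return found

print(recognize_entities("Helium was first detected before Apollo 11 flew."))
```

A lookup table like this already hints at why NER is hard: it cannot handle unseen entities or ambiguous words, which is exactly where trained models and the graph-based refinements discussed in the post come in.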
In the post, the authors demonstrate the utility of NER with a specific use case, creating their own training and testing datasets by scraping Wikipedia. They show how they improved the baseline classification quality provided by Stanford using tools available out of the box in Hume, a GraphAware product that transforms information into searchable, understandable, and actionable knowledge. They also describe the various obstacles they faced while creating a proper training dataset, and demonstrate how they refined the named entities using word embeddings.
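Building a training dataset for a statistical NER model typically means converting labeled sentences into the token-per-line, tab-separated format that Stanford NER accepts for training, with `O` marking non-entity tokens. The sketch below shows that conversion step under assumed inputs; the sentence, labels, and helper function are invented for illustration, not taken from the post.

```python
# Sketch: turning sentences with known entity labels (e.g. derived from
# Wikipedia scraping) into token-per-line TSV rows for NER training.
# The tokens and labels here are invented examples.
def to_training_rows(tokens, entity_labels):
    """Pair each token with its label, defaulting to 'O' (outside any entity)."""
    return [f"{tok}\t{entity_labels.get(tok, 'O')}" for tok in tokens]

tokens = ["Hume", "is", "built", "by", "GraphAware", "."]
labels = {"Hume": "PRODUCT", "GraphAware": "ORGANIZATION"}
print("\n".join(to_training_rows(tokens, labels)))
```

The hard part, as the authors note, is not the file format but obtaining correct labels at scale, which is where scraping strategies and their pitfalls matter.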
According to the authors, assigning a proper class to a portion of the text (a single word or a small set of words) improves tasks related to information retrieval. It also helps uncover additional information in the text, such as relationships between elements.
Click here to read the informative blog post.