Semantic Enrichment

Semantic metadata enables content intelligence by extracting domain-specific entities and concepts from content, and relating them in a meaningful way to identify related content and facilitate intelligent answers to user queries.

Semantic enrichment services from Scope enhance content and data by adding contextual information: tagging, categorizing and classifying items in relation to one another. These services help users find more relevant information, gain deeper insight and make better-supported decisions.

With nearly 15 years of association with several leading publishers, online digital libraries and information providers, Scope has recognized the need for semantic enrichment of content. Scope uses a combination of ML- and NLP-based algorithms, ontologies and subject matter experts (SMEs) to deliver semantic enrichment services. We extract concepts from content using AI-based algorithms, then use ontologies to identify relationships across those concepts and express them as triples (Subject-Predicate-Object). Further curation by SMEs helps improve the contextual accuracy of these relationships. This iterative process also feeds back into the machine learning models and improves the accuracy of our automatic text mining solutions. Scope’s proprietary semantic workflow platform employs sentence extraction, text classification, part-of-speech (POS) tagging and controlled vocabulary (CV) based semantic tagging for entity and concept extraction, and generates relationships among concepts using ontologies.

Scope’s Semantic Enrichment architecture consists of the following components:

Domain Knowledge:

Scope focuses on using domain-specific controlled vocabularies (CVs) such as taxonomies, thesauri and ontologies for semantic enrichment. This is complemented by the domain knowledge of subject matter experts from industry and academia. Based on the client’s requirements, Scope can either process existing CVs or build new CVs by extracting keywords from source documents and classifying them into hierarchical structures. CVs can also be built by adopting an open source CV and updating it with keywords extracted from the source documents.
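One way to picture a CV built from extracted keywords is as a mapping from each term to its broader term, plus a table of synonyms. The sketch below is illustrative only: the sample terms and the names BROADER, SYNONYMS, preferred and path_to_root are assumptions made for this example, not Scope's actual data model.

```python
# Hypothetical mini-taxonomy: each term points to its broader term (None = top level).
BROADER = {
    "materials": None,
    "polymers": "materials",
    "polyethylene": "polymers",
    "metals": "materials",
}

# Hypothetical synonym table mapping variant keywords to the preferred term.
SYNONYMS = {"polythene": "polyethylene", "pe": "polyethylene"}


def preferred(term):
    """Normalize a keyword to its preferred term, or None if it is not in the CV."""
    key = term.strip().lower()
    key = SYNONYMS.get(key, key)
    return key if key in BROADER else None


def path_to_root(term):
    """Return the hierarchy path from the top-level term down to the given term."""
    node = preferred(term)
    path = []
    while node is not None:
        path.append(node)
        node = BROADER[node]
    return list(reversed(path))
```

Calling path_to_root on an extracted keyword shows where it sits in the hierarchy; keywords that return an empty path are candidates for SME review before being added to the CV.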

Semantic Annotation:

Semantic Annotation involves extracting and tagging named entities and concepts from source documents using ML, NLP and statistical algorithms together with the CVs. The relationships between concepts are built using ontologies and NLP techniques. Triples (Subject-Predicate-Object) extracted from each document are stored in an RDF store, which is consulted during the annotation process to extract similar terms and relationships when further documents pass through the Semantic Annotation platform.
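The CV-based part of this step can be sketched in a few lines. The example below is a simplified illustration, not Scope's platform: the vocabulary entries, the tag_concepts function and the in-memory TripleStore class (a stand-in for a real RDF store) are all hypothetical.

```python
import re

# Hypothetical controlled vocabulary: surface form -> concept label.
CV = {
    "aspirin": "Aspirin",
    "acetylsalicylic acid": "Aspirin",
    "headache": "Headache",
}


def tag_concepts(text, cv=CV):
    """Return (start, end, concept) for every CV term found in the text."""
    # Longest terms first so multi-word terms win over their sub-terms.
    terms = sorted(cv, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return [(m.start(), m.end(), cv[m.group(1).lower()]) for m in pattern.finditer(text)]


class TripleStore:
    """A tiny in-memory stand-in for an RDF store."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
store.add("Aspirin", "treats", "Headache")

tags = tag_concepts("Acetylsalicylic acid is often taken for a headache.")
concepts = [concept for _, _, concept in tags]
```

When a later document mentions a known concept, store.match(subject=...) retrieves the relationships already recorded for it, which mirrors how the RDF store is consulted as further documents are annotated.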

Integration:

The Scope team can deliver the output in industry-standard formats such as RDF/XML, RDF store (N3 format), SKOS and OWL. This facilitates seamless integration with clients' content management systems.
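To give a sense of what a line-based RDF delivery looks like, the sketch below writes triples as N-Triples, a simple RDF syntax closely related to N3. It is an assumption-laden illustration: the http://example.org/ namespace and the to_iri and to_ntriples helpers are hypothetical, and every term is treated as an IRI rather than a literal.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace for this example


def to_iri(label, base=BASE):
    """Turn a plain label into an IRI reference in angle brackets."""
    return "<" + base + quote(label.replace(" ", "_")) + ">"


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    lines = [f"{to_iri(s)} {to_iri(p)} {to_iri(o)} ." for s, p, o in sorted(triples)]
    return "\n".join(lines)
```

A production pipeline would more likely rely on an RDF library to handle literals, prefixes and the other output formats; this sketch only shows the shape of the data.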

Key Features

Entity Extraction: Named entities as well as domain-specific concepts are automatically extracted using Scope’s proprietary AI-driven indexing algorithm and controlled vocabularies. SMEs develop the knowledge framework, which is integrated into our semantic enrichment workflow platform for entity and concept extraction.

Semantic Tagging: Tagging XML documents with conceptual tags using domain-specific markup languages, such as CML (Chemical Markup Language), MatML (Materials Markup Language), GML (Geography Markup Language), etc.

Semantic Annotation: Annotating the text in documents based on concepts in the ontologies, allowing relevant content to be linked.

Semantic Indexing: Indexing documents with domain-specific concepts using controlled vocabularies such as taxonomies and thesauri to give the terms their domain context.

Semantic Linking: Identifying the semantic relationships across concepts and using them to link content within publishers' legacy content and to external open-access databases. The relationships among concepts are extracted using NLP techniques or ontologies, and delivered as triple stores.

Semantic Authoring: Authoring abstracts that extract and explain the concepts described in a document, and delivering such abstracts in a structured format with predefined concept labels.
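As a rough illustration of how extracted concepts can drive linking, the sketch below connects documents that share tagged concepts. It is a deliberately simple stand-in for ontology-based linking; the link_documents function, the min_shared parameter and the sample document identifiers are hypothetical.

```python
from itertools import combinations


def link_documents(doc_concepts, min_shared=1):
    """Return (doc_a, doc_b, shared_concepts) for document pairs that overlap."""
    links = []
    # Sort by document id so the output order is deterministic.
    for (a, ca), (b, cb) in combinations(sorted(doc_concepts.items()), 2):
        shared = sorted(set(ca) & set(cb))
        if len(shared) >= min_shared:
            links.append((a, b, shared))
    return links


# Hypothetical concept tags produced by an earlier extraction step.
docs = {
    "doc1": {"Aspirin", "Headache"},
    "doc2": {"Aspirin"},
    "doc3": {"Polymers"},
}
links = link_documents(docs)
```

Raising min_shared tightens the links to documents with more concepts in common; a fuller approach would also follow relationships between different concepts, not just exact overlap.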

Scope’s Semantic Processing – Value Chain

[Figure: process taxonomy]