English Abstraction and Indexing Solution for Non-English Literature

About the Solution

Scope offers an array of services for non-English language documents, such as abstracting, indexing, and semantic tagging; such documents accounted for 30% of the total documents processed in 2010. Scope has developed a translation-enabled content enhancement service that provides English language abstracting and indexing for non-English research literature, making that literature easier to discover among the mass of Web content.

Scope has created English abstracts and indices for about 2.6 million documents, including patents, book chapters, clinical summaries, standards and other technical literature in European (including German and French) and Asian (including Chinese, Japanese and Korean) languages. Drawing on this experience, Scope has developed a hybrid abstraction and indexing solution for a wide spectrum of non-English documents. The solution combines technology-enabled automation, including statistical methods and Natural Language Processing (NLP) rules, with subject matter expert (SME) curation to produce high-quality abstracts that represent the core themes of the non-English document.

Scope first translates the non-English document to English using a robust translation architecture equipped with language and domain dictionaries, glossaries and ontologies. The automatically translated output and the subsequent abstract are then curated by SMEs who possess native-speaker-level skills in the respective languages of the documents processed, in addition to domain knowledge in a range of subjects such as applied and physical sciences (Nuclear Physics, Material Science, Waste Water Treatment, etc.), chemical engineering, biochemistry, and engineering.

Features and Benefits

Scope has designed its English Abstraction and Indexing Solution for Non-English Literature with performance-oriented features that benefit clients in several ways, a few of which are listed below.

Elimination of time-consuming pre-processing tasks by supporting multi-format inputs

Creation of keyword-rich abstracts using a semi-automatic abstraction module, which also facilitates SME validation for content accuracy

On-demand access to resources and experts to handle STM documents of varied types, including structured documents (patents, standards, and clinical reviews) and unstructured documents (book chapters, journal articles and conference proceedings)

The integration of Scope's assisted automation platform for abstraction with its translation architecture significantly reduces the time spent on translation, abstract creation, and indexing, which lowers costs for clients. To ensure high quality of the abstracts and indexed keywords, Scope employs time-tested statistical and linguistic rules-based algorithms in its modules. Further, Scope engages domain experts to frame rules for the selection of keywords that represent the key concepts of the document processed, and has integrated dictionaries and contextual lookup features for multilingual documents in its solution to support high-quality deliverables.
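
As a rough illustration of how a statistical score and a dictionary-based rule can be combined for keyword selection, the following minimal Python sketch ranks terms in translated text by frequency and boosts those found in a domain dictionary. It is not Scope's implementation: the function name `select_keywords`, the `STOPWORDS` list, the `boost` factor, and the sample dictionary are all hypothetical and chosen only for this example.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller, domain-tuned list.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is",
             "for", "by", "with", "on", "are", "was"}


def select_keywords(text, domain_terms, top_n=5, boost=2.0):
    """Rank candidate keywords by term frequency, boosting dictionary terms.

    text         -- English (translated) document text
    domain_terms -- set of lowercase terms from a domain dictionary (assumed input)
    top_n        -- number of keywords to return
    boost        -- multiplier applied to terms found in the dictionary (assumed rule)
    """
    # Statistical step: tokenize into lowercase words and count occurrences.
    tokens = re.findall(r"[a-z][a-z\-]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)

    # Rule step: terms present in the domain dictionary get a higher score.
    scores = {t: c * (boost if t in domain_terms else 1.0)
              for t, c in counts.items()}

    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


sample = ("The reactor coolant flows through the reactor core. "
          "Coolant temperature is monitored in the core.")
print(select_keywords(sample, {"reactor", "coolant"}, top_n=3))
```

In a hybrid workflow of the kind described above, a ranked list like this would be a starting point for SME review rather than a final index.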