The Need for Indexing
The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query. Without an index, a search engine would scan every document in the corpus, which would require considerable time and computing power.
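The difference between scanning every document and consulting an index can be illustrated with a minimal inverted index. This is only a sketch of the general idea, not a description of any particular search engine: the function names (`tokenize`, `build_index`, `search`) and the sample documents are assumptions made for illustration.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?()\"'"

def tokenize(text):
    """Lowercase the text, split on whitespace and strip simple punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical sample corpus.
docs = {
    1: "Patent on turbine blade cooling.",
    2: "Journal article about turbine noise.",
    3: "Book chapter on blade materials.",
}
index = build_index(docs)
print(search(index, "turbine blade"))
```

The index is built once, after which each query only touches the entries for its own terms instead of rereading the whole corpus.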
However, meeting today’s user needs requires indexing more than basic text. Users want precise search results not only for textual content but also for images, graphs, data tables, maps and other supplemental data, which calls for a more robust, deep indexing solution.
Poor indexing leads to irrelevant search results and poses a real challenge to researchers, while incomplete indexing causes many relevant documents to be missed in a user query. Relying on authors and editorial staff for indexing is generally not ideal, because their styles differ and the time taken can delay publication. All of this points to the need for an assisted automation solution.
With over a decade of experience in offering outsourced indexing services for technical documents, Scope has created the InDEXr platform to increase retrieval precision and provide comprehensive recall of content, thus minimizing irrelevant documents in search results.
Scope’s InDEXr offers publishers and online information providers a unique opportunity to enhance the discoverability of their content through accurate and precise indexing. This is achieved by combining technology and human judgment on the same platform. InDEXr uses a combination of proprietary software, in the form of statistical, linguistic and Natural Language Processing (NLP) rules, and subject matter experts (SMEs). The platform automatically extracts key concepts and entities from the documents based on frequency, co-occurrence and location heuristics. SMEs then validate the auto-generated index terms to ensure relevancy and accuracy.
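As a rough illustration of how frequency, co-occurrence and location heuristics might be combined to rank candidate terms, consider the sketch below. It is not InDEXr's proprietary method: the scoring formula, the `title_boost` weight, the stopword list and all function names are assumptions chosen to keep the example small.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list (assumption, not a real linguistic resource).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "is", "are", "on", "with", "by"}

def split_sentences(text):
    """Split on sentence-ending punctuation and drop empty fragments."""
    return [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]

def extract_terms(sentence):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

def rank_candidates(title, body, title_boost=2.0):
    """Score each term by frequency, co-occurrence and location (title)."""
    sentences = [extract_terms(s) for s in split_sentences(body)]
    # Frequency: how often the term appears in the body.
    freq = Counter(t for s in sentences for t in s)
    # Co-occurrence: how many distinct terms share a sentence with the term.
    cooc = Counter()
    for s in sentences:
        for a, b in combinations(sorted(set(s)), 2):
            cooc[a] += 1
            cooc[b] += 1
    # Location: terms that also appear in the title get a boost.
    title_terms = set(extract_terms(title))
    scores = {}
    for term, count in freq.items():
        score = count + 0.5 * cooc[term]
        if term in title_terms:
            score *= title_boost
        scores[term] = score
    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

ranked = rank_candidates(
    "Turbine blade cooling",
    "Turbine blade cooling reduces thermal stress. "
    "The turbine blade is coated. Coating improves blade life.",
)
print(ranked[:3])
```

In a workflow like the one described above, a ranked list of this kind would be the input that SMEs review, accepting or rejecting each suggested term.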
InDEXr can handle a wide range of document types: textual technical literature (such as patents, journals, articles, books and book chapters, news articles, e-learning content and product data) and non-textual content (such as images and multimedia, including audio and video).
InDEXr is capable of deep indexing: it can extract relevant keywords from tables, figures, images, maps and other media objects in full-text documents for more relevant results.
Types of Indexing:
With InDEXr, Scope offers various types of indexing specific to content type:
- Keyword Indexing: The proprietary platform automatically extracts keywords from input documents and ranks them. Scope’s SMEs validate the auto-extracted keywords and, based on their relevancy, create the final keyword list.
- Controlled Vocabulary (CV) Indexing: InDEXr can integrate a client’s CV (such as a thesaurus), extract the existing terms, and rank them for SME validation and selection of the preferred terms. In addition, InDEXr can display broader and narrower terms for a CV term. Furthermore, the SMEs can suggest new keywords, which can then be added to the CV.
- Indexing of Named Entities: InDEXr can extract named entities such as people, places and organizations for indexing.
- Geospatial Tagging: InDEXr is designed to tag places with their respective geographic coordinates (latitude/longitude).
- Image Indexing (Graphs, Tables & Photographs): InDEXr can extract keywords from the captions of images; it can also index images using controlled vocabulary. Data from tables and graphs can be extracted to enable technical content search.
- Indexing Multimedia Content: Scope can transcribe multimedia content, such as audio and video, into searchable text, then extract metadata and index the content with the aid of InDEXr.
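The Controlled Vocabulary item above mentions preferred terms and broader and narrower terms. A minimal sketch of how such a thesaurus could be represented and matched against text follows. The `Concept` and `Thesaurus` classes, their methods and the sample vocabulary are all hypothetical and are not taken from InDEXr or from any client's CV.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Concept:
    preferred: str                                # preferred term
    synonyms: set = field(default_factory=set)    # non-preferred labels
    broader: set = field(default_factory=set)     # preferred terms of parent concepts

class Thesaurus:
    def __init__(self, concepts):
        self.concepts = {c.preferred: c for c in concepts}
        # Map every label (preferred or synonym), lowercased, to its preferred term.
        self.lookup = {}
        for c in concepts:
            for label in {c.preferred, *c.synonyms}:
                self.lookup[label.lower()] = c.preferred

    def preferred_term(self, label):
        """Return the preferred term for a label, or None if it is not in the CV."""
        return self.lookup.get(label.lower())

    def broader(self, preferred):
        return sorted(self.concepts[preferred].broader)

    def narrower(self, preferred):
        """Narrower terms are the concepts that list this term as broader."""
        return sorted(c.preferred for c in self.concepts.values() if preferred in c.broader)

    def match(self, text):
        """Find CV labels that occur as whole words or phrases in the text."""
        lowered = text.lower()
        found = set()
        for label, preferred in self.lookup.items():
            if re.search(r"\b" + re.escape(label) + r"\b", lowered):
                found.add(preferred)
        return sorted(found)

# Hypothetical three-concept vocabulary.
cv = Thesaurus([
    Concept("Thermal equipment"),
    Concept("Heat exchangers", broader={"Thermal equipment"}),
    Concept("Plate heat exchangers", synonyms={"PHE"}, broader={"Heat exchangers"}),
])

candidates = cv.match("A PHE was installed in the plant.")
print(candidates)
print(cv.broader("Plate heat exchangers"), cv.narrower("Heat exchangers"))
```

Matching on synonyms but returning the preferred term keeps the resulting index consistent even when authors use different labels for the same concept.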
Features and Benefits
- Multi-format input support: InDEXr can handle multiple input formats, such as TEXT, PDF, MS Word, XML, HTML, images, audio and video.
- SMEs across domains: Indexing expertise in various domains, such as Science, Engineering, Technology, Medicine, Social Sciences, Economics, Humanities and Education.
- Enhanced discoverability: SEO-friendly keywords and deep indexing enable better content discoverability.
- Scalability: Ability to handle large volumes in a short time, thus reducing indexing bottlenecks.
- Customized output format: InDEXr can deliver output in client-preferred formats, including TEXT, MS Word or XML.