A free fortnightly newsletter on Taxonomy, Thesauri & Ontology and Semantic Publishing
Dataset Search—making it easier to discover datasets
Thousands of data repositories on the web provide access to millions of datasets, and local and national governments around the world are also publishing their data. To help scientists and journalists find and access this data, Google has launched Dataset Search. According to Natasha Noy, a research scientist at Google AI, the product will help anyone find the data they need for their work or simply satisfy their intellectual curiosity.
Dataset Search works much like Google Scholar: it lets you find datasets wherever they are hosted. To make this possible, Google has developed guidelines that help dataset providers describe their data in a way that search engines such as Google can understand. These descriptions capture salient information about a dataset: for instance, who created it, when it was published, how it was collected, and what the terms for using the data are. Google then collects and links this information, analyzes the different available sources of the same dataset, and finds relevant publications that may describe or discuss it.
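As a rough illustration of the kind of description these guidelines call for, the sketch below builds a schema.org `Dataset` record in JSON-LD (one of the formats that can be embedded in a page's HTML) and wraps it in a `<script>` element. The mapping of the facts listed above onto the properties `creator`, `datePublished`, `measurementTechnique` and `license` is our assumption; the helper names `dataset_jsonld` and `as_script_tag` and every value in the example are hypothetical. Google's own guidelines remain the authority on which fields are required or recommended.

```python
import json


def dataset_jsonld(name, description, creator, date_published, technique, license_url):
    """Build a schema.org Dataset description as a JSON-LD dict (hypothetical helper)."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Who created it
        "creator": {"@type": "Organization", "name": creator},
        # When it was published (ISO 8601 date)
        "datePublished": date_published,
        # How it was collected (assumed mapping onto measurementTechnique)
        "measurementTechnique": technique,
        # Terms for using the data
        "license": license_url,
    }


def as_script_tag(record):
    """Wrap the JSON-LD in the <script> element that is placed in the page's HTML."""
    return '<script type="application/ld+json">\n' + json.dumps(record, indent=2) + "\n</script>"


if __name__ == "__main__":
    # All values below are made up for illustration.
    record = dataset_jsonld(
        name="Example City Air Quality Readings",
        description="Hourly PM2.5 readings from hypothetical city monitoring stations.",
        creator="Example City Environmental Office",
        date_published="2018-09-05",
        technique="Optical particle counters",
        license_url="https://creativecommons.org/licenses/by/4.0/",
    )
    print(as_script_tag(record))
```

Keeping the record as a plain dictionary until the last step makes it easy to validate or extend (for example, with download links) before it is serialized into the page.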
Google's approach relies on schema.org, an open standard for describing this information, so any provider's datasets can be described in the same way. Consequently, both large and small dataset providers can adopt this common standard and make their datasets part of this robust ecosystem. The variety and coverage of datasets in Dataset Search will keep growing as more providers use schema.org to describe their data.
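Because schema.org is an open standard, the markup is not tied to any one search engine: any crawler can read it. The minimal sketch below, using only Python's standard library, pulls the JSON-LD blocks out of an HTML page and keeps those whose `@type` is `Dataset`. It works under simplifying assumptions (JSON-LD only, no Microdata or RDFa, no `@graph` handling) and is not a description of how Dataset Search itself works; `JsonLdExtractor` and `find_datasets` are hypothetical names.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so append rather than assign.
        if self._in_jsonld:
            self.blocks[-1] += data


def find_datasets(html):
    """Return the schema.org Dataset records embedded as JSON-LD in an HTML page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    datasets = []
    for text in parser.blocks:
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            continue  # skip malformed markup instead of failing the whole page
        items = data if isinstance(data, list) else [data]
        datasets.extend(d for d in items if isinstance(d, dict) and d.get("@type") == "Dataset")
    return datasets


if __name__ == "__main__":
    page = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Dataset", "name": "Example"}
</script></head></html>"""
    print([d["name"] for d in find_datasets(page)])  # ['Example']
```

Skipping malformed blocks rather than raising reflects the reality of crawling the open web, where one bad snippet should not hide the valid descriptions on the same page.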