A free fortnightly newsletter on Taxonomy, Thesauri & Ontology and Semantic Publishing
New Patent reveals Google is aiming for a more self-sufficient knowledge graph
When Google introduced the knowledge graph in 2012, it marked a shift: search results began to include a semantically structured body of knowledge about an entity. However, the results often appeared to be mechanically condensed versions of their sources, missing crucial nuances and even distorting facts.
Ever since the early days of the knowledge graph, Google has been working to improve it and expand its ambitions. This is evident from a recent analysis by Bill Slawski, editor of the prominent search engine optimization blog SEO by the Sea, of a patent issued to Google in February 2019. In the patent, Google details its methodology for extracting and classifying information about entities. The patent describes Google's attempt to record in a database the obvious types of entity classifications and the superclasses to which those classes belong. Equally, it describes attempts to capture subclasses such as lifespan, marital status, or important works.
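Purely as an illustration of the kind of record such a database might hold, consider the sketch below. The field names and the example entity are invented, not taken from the patent; the point is only the three levels the patent describes: classifications, the superclasses they belong to, and finer-grained subclasses.

```python
# Hypothetical entity record illustrating the patent's three levels:
# classifications, their superclasses, and finer-grained subclasses.
entity = {
    "name": "Ada Lovelace",
    "classes": ["mathematician", "writer"],       # entity classifications
    "superclasses": {                             # class -> superclass
        "mathematician": "person",
        "writer": "person",
    },
    "subclasses": {                               # finer-grained facts
        "lifespan": "1815-1852",
        "important_works": ["Notes on the Analytical Engine"],
    },
}

def superclass_of(record, cls):
    """Look up the superclass to which a classification belongs."""
    return record["superclasses"].get(cls)

print(superclass_of(entity, "mathematician"))  # person
```

A real knowledge graph would of course store these relations at web scale and cross-reference them between entities; the sketch only shows the shape of a single record.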
Furthermore, the ambitious aim outlined in the patent is to provide information about the relationships between each node of meaning. In Slawski's analysis, the true scope of the knowledge graph comes into view, and it is enormous: to map everything about an entity. To establish this, Google has to classify and cross-reference information as a native, self-sustaining activity on web pages themselves instead of relying on secondary sources. This is what sets the patent filing apart from earlier evidence of the knowledge graph.
When examined, it appears that Google, in collaboration with other search engines, had already created Schema markup to cover this ground. Schema, however, requires that human beings apply semantic tags to text on web pages, and because of this requirement it never scaled. In the context of the knowledge graph, then, Schema markup can never serve the graph's ultimate purpose: for the vast majority of content, classifications cannot be captured with prescriptive markup. Hence, to achieve the intent behind the knowledge graph, Google needs to train its technology to impose semantic structure on raw, unstructured text, much as human beings do.
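To make the contrast concrete, here is what hand-authored Schema markup looks like, serialized from Python as JSON-LD. The property names (`@type`, `name`, `starRating`, `telephone`) are genuine Schema.org vocabulary, but the values are invented; this is exactly the kind of prescriptive, human-applied markup the patent's approach would sidestep.

```python
import json

# A hand-authored Schema.org description of a hotel, as JSON-LD.
# Every field here had to be written by a person -- which is why
# this approach never scaled to the whole web.
hotel_markup = {
    "@context": "https://schema.org",
    "@type": "Hotel",
    "name": "Example Inn",                               # invented value
    "starRating": {"@type": "Rating", "ratingValue": "4"},
    "telephone": "+1-555-0100",
}

print(json.dumps(hotel_markup, indent=2))
```

A page carrying this markup tells a crawler its classifications explicitly; the patent instead describes inferring the same structure from unmarked text.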
The patent also covers how the knowledge graph would apply what it knows to any new content it encounters. Typically, the knowledge graph will learn about new entities by inference, setting aside what it already knows in order to isolate what it does not. If the process succeeds, it could eventually run itself, with the knowledge graph becoming better at learning new things the more new things it learns. However, the impact of this shift on actual search results has yet to be seen.
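The subtractive idea, setting aside the known to isolate the unknown, can be sketched in a toy form. Real entity recognition is far more sophisticated than the naive capitalization heuristic below; the example only illustrates the set-difference at the heart of the description above, and every name in it is invented.

```python
# Toy sketch of learning-by-inference: subtract already-known entities
# from a text's candidate entities to isolate the unknown ones.
known_entities = {"Google", "Wikipedia"}

def unknown_candidates(text, known):
    # Naive candidate extraction: capitalized words (illustrative only).
    candidates = {w.strip(".,") for w in text.split() if w[:1].isupper()}
    return candidates - known

text = "Google links Wikipedia articles to Wikidata items."
print(sorted(unknown_candidates(text, known_entities)))  # ['Wikidata']
```

Each newly resolved entity would then join the known set, which is what makes the process self-reinforcing.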
While this more ambitious way of surfacing information about entities is not yet standard, Google's new interface for hotels offers a real-world example. The Google Hotels interface contains structured information about each listed hotel, culled from a variety of internal and third-party sources.
Machine learning appears to play a significant role in this interface. In one instance, it is used to analyze the historical trend of hotel prices, helping Google highlight deals that are lower than the usual price. Recommendations are then adjusted as new pricing information is ingested. Additionally, the process by which the interface curates content like photos and reviews appears similar to the process of linking data to entities. Here, machine learning may help construct a dynamic user interface that presents the most engaging, useful, or popular reviews and photos from various sources.
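The price-trend check described above can be sketched in its simplest form: flag a nightly rate as a deal when it falls well below the historical mean. Google's actual models are unknown; the 15% threshold and the sample rates below are arbitrary assumptions for illustration.

```python
# Minimal sketch of deal detection against a price history.
# Threshold and data are invented; real systems would model
# seasonality, demand, and much more.
def is_deal(current_price, history, discount=0.15):
    """Return True if current_price is at least `discount` below the mean."""
    mean = sum(history) / len(history)
    return current_price <= mean * (1 - discount)

history = [120, 130, 125, 135, 140]   # invented historical nightly rates
print(is_deal(99, history))   # True: well below the usual price
print(is_deal(128, history))  # False: close to the historical mean
```

Ingesting new prices simply means appending to the history, which shifts the mean and hence future recommendations.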
Significantly, Google's Hotels interface treats the hotel property as the major entity, with other entities, such as contact information, hotel class ratings, and feedback from hoteliers, linked to it. It represents an expanded dataset with a more complex mix of sources. It is also a step away from Google's initial dependence on a pre-existing body of knowledge, and a step toward populating the knowledge graph self-sufficiently.