Google has created a new search engine to help users find open datasets.
Google is well known as a trailblazer of the technological era, consistently working to improve how the world’s information is organized on the commercial web. Now the company aims to do the same for the scientific community with a new search engine designed specifically for datasets. The tool is intended to help researchers find the data they need more efficiently.
Launched earlier this week, Google Dataset Search is the latest addition to Google’s set of specialized search engines. It is designed to make it much easier for developers and others to find files and databases that have been shared publicly.
How Does Dataset Search Work?
Natasha Noy, a research scientist with Google AI, recently discussed the platform’s potential and the plans for its future, noting that the idea behind Dataset Search is to “grant easier access to data,” thereby facilitating the work of “scientists, data journalists, data geeks, or anyone else…”
Dataset Search goes hand in hand with Google Scholar, Google’s academic search engine. Both let users search for information published by governments and by other institutions, such as universities, using metadata tags. These tags include details such as who created a dataset, how the data was collected, and who published it. This gives researchers and scientists a clearer picture of where the information comes from.
These metadata tags supply the information that Dataset Search indexes and combines with input from Google’s Knowledge Graph. Previously, results were typically scattered across multiple institutions and sources. The new search engine is expected to bridge those gaps by helping users find data that is openly available for use and reuse.
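The kind of metadata described above can be sketched as a schema.org `Dataset` description serialized as JSON-LD. This is a minimal illustration, not Google’s own tooling: the helper name `build_dataset_metadata`, the organization names, and the URL are all hypothetical, and `measurementTechnique` is used here on the assumption that it is a reasonable place to record how the data was collected.

```python
import json


def build_dataset_metadata(name, description, creator, publisher, method, url):
    """Return a schema.org Dataset description as a JSON-LD dictionary.

    Hypothetical helper for illustration; the property names come from
    the schema.org vocabulary, the values are placeholders.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Who created the dataset.
        "creator": {"@type": "Organization", "name": creator},
        # Who published it.
        "publisher": {"@type": "Organization", "name": publisher},
        # How the data was collected (assumed mapping, see lead-in).
        "measurementTechnique": method,
        "url": url,
    }


metadata = build_dataset_metadata(
    name="Example rainfall measurements",
    description="Daily rainfall totals from a hypothetical weather station.",
    creator="Example University",
    publisher="Example Open Data Portal",
    method="Automated rain gauge readings",
    url="https://data.example.org/rainfall",
)

# A publisher would typically embed this text in the dataset's web page
# inside a <script type="application/ld+json"> element.
json_ld = json.dumps(metadata, indent=2)
print(json_ld)
```

Building the description as a plain dictionary and serializing it with `json.dumps` keeps the sketch dependency-free. A real publisher would follow the schema.org documentation for the full list of supported properties.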
Moving Forward
Noy noted that dataset publication has until now been extremely fragmented. Combining these scattered sources in a single search should make results more accurate and efficient. “As more data repositories use the schema.org standard to describe their datasets, the variety and coverage of datasets that users will find in Dataset Search, will continue to grow,” she said. Additionally, Dataset Search can be used in multiple languages, and its developers plan to add more in the near future.
The new Dataset Search platform searches the thousands of open data repositories on the web, which together provide access to millions of datasets. It scans publisher sites, digital libraries, authors’ personal web pages, and other sources.
However, the new search engine relies on dataset publishers to label their datasets correctly with the appropriate information, otherwise known as metadata tags. All a user needs to do is enter what they are looking for, and Google will guide them to the published dataset on the corresponding repository provider’s site.
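Because so much depends on correct labeling, a publisher might want to check a page before releasing it. The sketch below is one possible way to do that using only Python’s standard library: it pulls JSON-LD blocks out of an HTML page and reports which fields a `Dataset` description lacks. The `CHECKED_FIELDS` list, the function name `missing_dataset_fields`, and the sample page are illustrative assumptions, not Google’s actual requirements or indexing logic.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip blocks that are not valid JSON.


# Hypothetical checklist chosen for this example only.
CHECKED_FIELDS = ("name", "description", "creator", "publisher")


def missing_dataset_fields(html):
    """Return the checked fields absent from the first Dataset block,
    or None if the page has no Dataset markup at all."""
    parser = JsonLdExtractor()
    parser.feed(html)
    for block in parser.blocks:
        if isinstance(block, dict) and block.get("@type") == "Dataset":
            return [field for field in CHECKED_FIELDS if field not in block]
    return None


SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset",
 "name": "Example rainfall measurements",
 "creator": {"@type": "Organization", "name": "Example University"}}
</script>
</head><body></body></html>
"""

print(missing_dataset_fields(SAMPLE_PAGE))  # ['description', 'publisher']
```

A result of `None` means no `Dataset` markup was found, while an empty list means every checked field is present. The sketch only handles a single top-level JSON-LD object per script tag, so pages that wrap their markup in an array or `@graph` would need extra handling.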