Technology companies have been taking action to prevent and control the epidemic.
Against the backdrop of the epidemic's global spread, Google launched a project called “COVID-19 Public Datasets” on March 31, hosting a public database of epidemic-related data and opening it to the outside world free of charge, so that users can freely access and analyze the data.
What is “COVID-19 Public Datasets”?
According to the project's official description, the data in “COVID-19 Public Datasets” includes the JHU CSSE (Johns Hopkins University Center for Systems Science and Engineering) dataset, the World Bank's global health data, and OpenStreetMap data, all stored on Google Cloud under the “COVID-19” tag. Researchers can access and query the data for free until September 15, 2020, and can use it directly to train machine learning models through the BigQuery ML service (a machine learning capability built into BigQuery, Google's fully managed data warehouse).
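To give a rough sense of what “training a model through BigQuery ML” looks like, the sketch below assembles a BigQuery ML `CREATE MODEL` statement as a SQL string. The project, dataset, table, and column names are hypothetical placeholders for illustration, not the actual schema of the public datasets.

```python
# Sketch of a BigQuery ML statement a researcher might submit against
# epidemic data hosted in BigQuery. All project/dataset/table/column
# names below are hypothetical placeholders, not real schema.

def build_create_model_sql(model_path: str, source_table: str) -> str:
    """Assemble a BigQuery ML CREATE MODEL statement (standard SQL)."""
    return f"""
CREATE OR REPLACE MODEL `{model_path}`
OPTIONS (model_type = 'linear_reg', input_label_cols = ['confirmed']) AS
SELECT
  deaths,     -- hypothetical feature column
  recovered,  -- hypothetical feature column
  confirmed   -- label column named in input_label_cols
FROM `{source_table}`
""".strip()

sql = build_create_model_sql(
    "my_project.covid_models.case_trend",  # hypothetical model path
    "my_project.covid_data.daily_summary",  # hypothetical source table
)
print(sql.splitlines()[0])
# → CREATE OR REPLACE MODEL `my_project.covid_models.case_trend`
```

In practice such a statement would be submitted to BigQuery (for example via the web console), which trains the model server-side so no data leaves the warehouse.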
Google said that “COVID-19 Public Datasets” was launched to better serve those working “for education and research purposes”; the company will not add or manage PHI (Protected Health Information) or PII (Personally Identifiable Information) data, and hopes to do its part to stop the spread of the epidemic.
Note, however, that if users combine “COVID-19 Public Datasets” with other, non-coronavirus datasets, queries are billed by the number of bytes processed under the BigQuery sandbox: the monthly free tier covers only 10 GB of storage and 1 TB of queries, and usage beyond that is charged by volume.
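The free-tier arithmetic above can be sketched as follows. The $5-per-TB on-demand query rate is an assumption about BigQuery's standard pricing at the time and is not stated in the article; only the 1 TB monthly free query volume comes from the text.

```python
# Sketch of BigQuery's byte-based query billing under the free tier
# described above: 1 TB of queries free per month, overage charged by
# volume. The $5/TB on-demand rate is an assumption, not from the article.

FREE_TB_PER_MONTH = 1.0   # monthly free query volume (from the free tier)
PRICE_PER_TB_USD = 5.0    # assumed on-demand rate per TB processed

def monthly_query_cost(tb_processed: float) -> float:
    """USD cost for a month's queries; only volume beyond 1 TB is billed."""
    billable_tb = max(0.0, tb_processed - FREE_TB_PER_MONTH)
    return billable_tb * PRICE_PER_TB_USD

print(monthly_query_cost(0.8))  # within the free tier → 0.0
print(monthly_query_cost(3.5))  # 2.5 TB over the free tier → 12.5
```

Queries that touch only the COVID-19 datasets remain free during the program; the charge applies to the combined, non-coronavirus usage.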
Amid the epidemic, what does it mean to open up datasets?
Statistics show that as of 2 p.m. yesterday, confirmed COVID-19 infections worldwide had exceeded 720,000, with the death toll reaching 34,000.
But at the same time, researchers face extremely demanding data analysis because the data are huge in volume and scattered across sources. Moreover, incomplete and only partially disclosed data have limited the public's understanding of the epidemic. Data that are open and accessible, complete, fine-grained, timely, machine-readable, and structured are therefore especially important.
Prior to this, to strengthen coordinated global epidemic prevention and control, multiple scientific and academic institutions jointly launched the public dataset “CORD-19”, which covers nearly 30,000 papers on the novel coronavirus as of March 13, along with SciSpacy (a text-processing toolkit optimized for scientific text), SciBERT (a BERT model pre-trained on scientific text), an open research corpus, an API, and more.
Commenting on Google’s “COVID-19 Public Datasets”, Sam Skillman, director of engineering at Descartes Labs, said: “Google opening up BigQuery and providing COVID-19 data will greatly advance researchers’ data analysis. The launch of free queries in particular will attract more people to the project, which is very helpful for global data sharing, improving data-analysis capability, and spreading information about the virus.”