Data Policy

Scientists will be able to study events such as tropical storm Karl, which developed in the Atlantic in September 2016, using the OpenIFS@home project. (Image: NASA Visible Earth, LANCE/EOSDIS Rapid Response team)

Data produced by the climateprediction.net (CPDN for short) client and model on the individual participants’ computers is the property of the computer’s owner. By running the software each participant agrees to return a subset of their simulations’ results, unaltered, to the CPDN upload servers and to acknowledge the CPDN project in any publication based thereon. As a matter of scientific etiquette, we would expect anyone wishing to publish results based on their own simulations (other than simply posting figures on the Web) to propose a diagnostic sub-project to the CPDN core team (represented by the University of Oxford on behalf of the participating institutions listed on the project web site), which would give them access to a much wider range of simulations.

In the future, we hope to develop additional software that will enable the project to undertake further analysis of the data held on the distributed personal computers after a simulation has been completed. Any such software will be installed on a PC, as part of a future software upgrade, only with the approval of the computer’s owner.

Data stored on the CPDN servers:

Data stored on any server associated with the CPDN project is the property of that server’s owner. By agreeing to participate in the project, server owners agree to provide access to that data in accordance with the following conditions as their resources allow.

Access to summary data on the CPDN central servers:

Summary data derived from all successfully uploaded simulations will be made publicly available on the CPDN web site. Summary results for individuals’ own simulations will also be provided on the web site for comparison with the whole dataset. We hope to allow team members to view summary statistics of their team’s simulations if resources permit.

Access to data on an individual upload server:

Organisations hosting an upload server agree to provide the project core team with access to that data in full. To keep the load on these servers to a minimum while distributed analysis software is developed and implemented, they are requested not to use the upload server itself for any data analysis. Instead, datasets should be copied and analysed on alternative machines.

Access to the complete dataset held on upload servers:

When software and computing resources are available, data collected by CPDN and held on upload servers will be made available to the climate research community.

Following the precedent of other modelling projects (e.g. the Coupled Model Intercomparison Project (CMIP)), access to the complete dataset for each set of experiments will initially be restricted to authorised diagnostic sub-projects. This is necessary to avoid potential duplication of research and overload of the distributed analysis computing resources.

Initially, priority will be given to diagnostic sub-projects that are proposed by organisations hosting CPDN upload servers and that involve a collaboration with at least one member of the core team.

No data other than the summary statistics described under "Access to summary data on the CPDN central servers" will be released to organisations external to the core team until suitable software has been developed and hardware put in place for distributed access and analysis (and approved by the owners of upload servers). Given the unpredictable scale of the dataset, we cannot provide a specific time-frame for this release.

Data held on upload servers will be made available without restriction when resources are available to do so without hampering the activities of the authorised diagnostic sub-projects.

For each new experiment, and each individual simulation, there will be a delay in the release of collected data due to the need to carry out fundamental analysis to verify the reliability of those data.

The datasets collected from the distributed PCs are limited by upload bandwidth. In order to include participants with modem connections, the amount of data sent to upload servers at the end of each simulation has been limited to O(5MB), although O(500MB) is produced on each distributed PC for each simulation. These figures will change as the project progresses and new experiments are released. However, by the standards of conventional climate modelling, the data collected from each run will remain very small. It is therefore necessary to include as much analysis as possible in the distributed clients. We strongly encourage those wishing to make use of CPDN in the future to contribute effort to the preparation of future experiments in collaboration with the core team. In this way, the data analysis they wish to have undertaken can be carried out on the larger, massively distributed datasets using distributed computing software.
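
To make the scale of this constraint concrete, the following is a rough sketch of the arithmetic behind the figures quoted above. The 56 kbit/s modem rate, the use of decimal megabytes, and the function names are illustrative assumptions, not part of the CPDN software or policy; the O(5MB) and O(500MB) values are order-of-magnitude figures only. The small summarise helper is a hypothetical example of the kind of reduction that could be performed on the client before upload.

```python
from statistics import fmean


def upload_fraction(uploaded_mb: float, produced_mb: float) -> float:
    """Fraction of the locally produced data that is actually uploaded."""
    return uploaded_mb / produced_mb


def transfer_minutes(size_mb: float, link_kbit_per_s: float) -> float:
    """Idealised transfer time in minutes, ignoring protocol overhead.

    Assumes decimal units: 1 MB = 8,000,000 bits, 1 kbit = 1,000 bits.
    """
    bits = size_mb * 8_000_000
    seconds = bits / (link_kbit_per_s * 1_000)
    return seconds / 60


def summarise(values: list[float]) -> dict[str, float]:
    """Hypothetical client-side reduction: keep only summary statistics."""
    return {"mean": fmean(values), "min": min(values), "max": max(values)}


if __name__ == "__main__":
    MODEM_KBIT_PER_S = 56  # assumed modem speed, not stated in the policy

    print(f"fraction uploaded: {upload_fraction(5, 500):.0%}")
    print(f"5 MB upload:   {transfer_minutes(5, MODEM_KBIT_PER_S):.1f} minutes")
    print(f"500 MB upload: {transfer_minutes(500, MODEM_KBIT_PER_S) / 60:.1f} hours")
    print(summarise([14.2, 14.5, 15.1, 14.8]))
```

Under these assumptions, roughly one hundredth of each simulation's output is uploaded, and uploading the full output over a modem would take on the order of a day rather than minutes, which is why analysis is pushed into the distributed clients.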