Computational Science Technical Note CSTN-008


Scientific Data Management in a Grid Environment

H. A. James and K. A. Hawick

8 July 2004.


Managing scientific data is by no means a trivial task, even in a single-site environment with only a small number of researchers involved. We discuss some issues concerned with posing well-specified experiments in terms of parameters or instrument settings, and the metadata framework that arises from doing so. We are particularly interested in parallel computer simulation experiments, which produce very large quantities of data suitable for warehousing. We consider databases and other framework technologies for manipulating experimental data and for controlling the outputs of parallel runs that arise from large cross products of parameter combinations. More complex issues arise when experimental data must be managed in a distributed grid context, where users at multiple sites share or exchange simulation outputs. Considerable additional value can be extracted from simulation output that is subsequently data-mined.

Keywords: distributed computing; cluster computing; metadata; data mining; data management.
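The abstract mentions controlling the outputs of parallel runs that arise from large cross products of parameter combinations. As a minimal sketch of that idea, the Python below enumerates such a cross product and attaches a metadata record to each run. It assumes a hypothetical three-parameter simulation. The parameter names, values and run-identifier scheme are illustrative assumptions only, not the authors' framework.

```python
import itertools
import json

# Hypothetical parameter space for a simulation campaign (illustrative values only).
PARAMETERS = {
    "lattice_size": [64, 128],
    "temperature": [2.0, 2.27, 2.5],
    "seed": [1, 2],
}


def enumerate_runs(parameters):
    """Yield one metadata record per point in the cross product of parameter values."""
    names = sorted(parameters)  # fixed ordering so run identifiers are reproducible
    for index, values in enumerate(itertools.product(*(parameters[n] for n in names))):
        settings = dict(zip(names, values))
        # Hypothetical naming scheme: sequence number plus the settings that define the run.
        run_id = "run%04d_" % index + "_".join(f"{n}={settings[n]}" for n in names)
        yield {"run_id": run_id, "parameters": settings}


if __name__ == "__main__":
    runs = list(enumerate_runs(PARAMETERS))
    print(len(runs))  # 2 * 3 * 2 = 12 runs
    print(json.dumps(runs[0]))
```

Deriving each run's identifier deterministically from its settings means that the output of any run can later be traced back to the exact parameter combination that produced it. This is the kind of metadata a database or a data-mining pass would need.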


BibTeX reference:

  author = {H. A. James and K. A. Hawick},
  title = {Scientific Data Management in a Grid Environment},
  journal = {Journal of Grid Computing},
  year = {2005},
  volume = {3},
  pages = {39--51},
  number = {1-2},
  month = {September},
  note = {ISSN: 1570-7873 (Paper) 1572-9814 (Online)},
  doi = {10.1007/s10723-005-5464-y},
  series = {CSTN-008}
