Health informatics relies, in part, upon computational simulation, modelling and data mining methods. These, in turn, rely upon information from multiple sources, which is currently organised without reference to universal standards of terminology, language or schema. A data warehouse is a repository for securely storing and maintaining data from diverse sources integrated semantically to enable reporting and analysis. It is not typically intended to be used as a live data store. Live data stores often purge old data which may be useful for analysis, and they need to support operational systems which have different quality of service needs.
We are presented with the challenge of taking heterogeneous data from multiple sources and presenting it in a consistent and standardised manner to predictive computational tools. The ampoule-pi data warehouse is an open-source, open-standards-based tool designed to do this. The warehouse is agnostic with respect to the types of data to be stored.
Federation: Multiple data warehouses can be linked into a federated network. Queries to a particular warehouse can be delegated to others, and results collated and returned, or structured data can be replicated in multiple data stores for faster query response.
Tools: Data mining and modelling tools benefit from a consistent data model which is centrally maintained, curated and is secure. They can also use the data warehouse to store intermediate and output data.
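The delegate-and-collate behaviour described under Federation can be illustrated with a minimal in-memory sketch. It is not the ampoule-pi implementation: the `Warehouse` class, its `query` method, the site names and the record fields are all hypothetical, and real warehouses would communicate over a network rather than hold direct references to one another.

```python
from dataclasses import dataclass, field


@dataclass
class Warehouse:
    """Hypothetical in-memory stand-in for one warehouse in a federation."""
    name: str
    records: list = field(default_factory=list)
    peers: list = field(default_factory=list)

    def query(self, predicate, _visited=None):
        """Return matching local records plus those delegated to peers."""
        visited = _visited if _visited is not None else set()
        if self.name in visited:
            return []  # already asked: avoids loops in a cyclic federation
        visited.add(self.name)
        # Answer from local data first, tagging each result with its origin.
        results = [dict(r, source=self.name) for r in self.records if predicate(r)]
        # Delegate the same query to each linked warehouse and collate.
        for peer in self.peers:
            results.extend(peer.query(predicate, visited))
        return results


# Illustrative data only; the field names are assumptions.
site_a = Warehouse("site_a", [{"patient": "p1", "heart_rate": 72}])
site_b = Warehouse("site_b", [{"patient": "p2", "heart_rate": 110}], peers=[site_a])
site_a.peers.append(site_b)  # links may be bidirectional

hits = site_b.query(lambda r: r["heart_rate"] > 60)
```

Tagging each result with its originating warehouse keeps the collated answer traceable to its source. The alternative mentioned above, replicating structured data across stores, would trade this per-query delegation for faster local reads.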
The warehouse is structured around four layers:
- The human web interface enables browsing by modellers and data curation, and RESTful programmatic interfaces are exposed for data access by domain-specific tools.
- Standard data access functions are provided. Structured data is queried via SPARQL. Files are accessed via standard protocols, and images via the DICOM protocol, which enables integration with hospital information systems (HIS).
- The integration layer ensures data linkage, quality and auditability. Logging keeps track of access and edits; History keeps historical values; Curation provides data annotation; and Provenance keeps track of the sources of data. Semantic integration is provided via additional tools.
- Storage of the underlying data can be distributed and provided by arbitrary suppliers, via local or cloud-based storage resources. Storage is based on existing file, image and triplestore (structured data) servers, which are integrated with authentication and authorisation services.
Ampoule-pi is designed to sit at the heart of a distributed health informatics infrastructure.
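As an illustration of the SPARQL access route in the layers above, the sketch below builds a query request following the SPARQL 1.1 Protocol's "query via GET" form. The endpoint URL, the `ex:` vocabulary and the helper name `build_sparql_request` are assumptions made for the example; the actual endpoint and schema depend on the deployment and are not specified here.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; the real URL depends on the deployment.
ENDPOINT = "https://warehouse.example.org/sparql"

# Hypothetical vocabulary: the prefix and property names are illustrative only.
QUERY = """
PREFIX ex: <http://example.org/schema#>
SELECT ?patient ?value
WHERE {
  ?obs ex:subject ?patient ;
       ex:heartRate ?value .
  FILTER (?value > 100)
}
LIMIT 10
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol 'query via GET' request (not sent here)."""
    # The protocol passes the query text in the URL-encoded 'query' parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialisation of SELECT results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
```

Against a live endpoint, the request could be sent with `urllib.request.urlopen(request)` and the response decoded with `json.load`. A secured deployment would presumably also need credentials, which this sketch omits.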