The adoption of a VAVP in PHS for IPC could be a sound solution to support the use of dynamic, large data sets from heterogeneous systems in surveillance; to conduct comprehensive data analysis; to produce effective visualisations that stimulate data exploration and understanding; and to improve information dissemination. A VAVP should comprise five major functional components (figure 2).

Figure 2 VAVP high-level system architecture and functional components.

The source systems/data sources component documents specifications about the data sources needed. PHS for IPC may involve data collected from diverse sources, such as death certificates, medical examiner and coroner records, child fatality reviews, hospital inpatient records, trauma registries, emergency department records and emergency medical services information. Information on risk factors can be collected through recurring population-based surveys. Police reports can be used for information on traffic crashes, violence and firearm events. Surveillance data for specific injury causes are available from fire departments, poison control centres and crime laboratory reports.1 This component should also include a data source and metadata repository and online searchable tools to facilitate content discovery by users. The Global Health Data Exchange is an example of such a repository and tool.11 The main challenge in this component is to identify external data sources that are not publicly available, for which interorganisational agreements establishing data access and exchange procedures should be put in place.
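A metadata repository of this kind can be sketched as a catalogue of data source records with a simple keyword search. The record fields, catalogue entries and search logic below are illustrative assumptions; a production repository such as the Global Health Data Exchange would use a richer metadata standard.

```python
from dataclasses import dataclass, field

@dataclass
class DataSource:
    name: str
    steward: str                       # organisation responsible for the data
    topics: list = field(default_factory=list)
    public: bool = True                # False flags sources needing agreements

# Hypothetical catalogue entries drawn from the source types listed above
CATALOGUE = [
    DataSource("Death certificates", "Vital statistics office",
               ["mortality", "injury"]),
    DataSource("Trauma registry", "Hospital network",
               ["injury", "inpatient"], public=False),
    DataSource("Police crash reports", "Police department",
               ["traffic", "violence"], public=False),
]

def search(keyword: str):
    """Return catalogue entries whose name or topics match the keyword."""
    kw = keyword.lower()
    return [s for s in CATALOGUE
            if kw in s.name.lower() or any(kw in t for t in s.topics)]

# Sources that require interorganisational agreements before use
restricted = [s.name for s in search("injury") if not s.public]
```

Searching the catalogue before designing data flows makes the non-public sources, and hence the agreements to negotiate, visible early.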

Data preparation and integration provides a database work area; instantiated data structures; and methods and functions for connecting to data sources, cleansing, transforming, validating and integrating data, and loading tidy data sets into the platform data warehouse. Data preparation consumes the most time in the analytic process, mainly because data are often collected by heterogeneous source systems with different structures, formats, models, standards and nomenclature. The inability to readily access, transform and integrate data can cause inefficiency and functional disruption. Creating an infrastructure that permits the free flow of data from source systems to the platform data repository, combined with self-service, easy-to-use and visual data preparation software, is a potential solution to this challenge. This solution empowers data analysts and allows for agile development of intuitive workflows where no programming code is needed. It also enables team collaboration, reuse of procedures and workflows, scheduling, automation and more efficient data preparation processes (figure 3). This component sits between the source systems and data storage components.

Figure 3 Data preparation workflow. This workflow illustrates data preparation procedures to extract, transform and integrate demographic data from World Population Prospects, UN Population Division and the International Database (IDB), US Census Bureau into the data storage and management component.
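The extract-transform-integrate steps of such a workflow can be sketched with two tiny in-memory files. The column names and mappings below are hypothetical stand-ins; the real World Population Prospects and IDB extracts have different layouts and many more fields.

```python
import csv
import io

# Two source extracts with different, source-specific schemas (assumed names)
wpp_csv = "Location,Time,PopTotal\nKenya,2020,53771\nKenya,2021,54985\n"
idb_csv = "country,year,population_k\nGhana,2020,31073\nGhana,2021,31732\n"

def extract(text, mapping):
    """Read a CSV and rename source-specific columns to a common schema."""
    return [{target: row[source] for target, source in mapping.items()}
            for row in csv.DictReader(io.StringIO(text))]

def transform(rows, source_name):
    """Cast types, tag provenance and drop rows failing a validation rule."""
    tidy = []
    for r in rows:
        year, pop = int(r["year"]), int(r["population_thousands"])
        if pop > 0:                          # simple validation check
            tidy.append({"country": r["country"], "year": year,
                         "population_thousands": pop, "source": source_name})
    return tidy

# Integrate: one tidy data set ready to load into the data warehouse
integrated = (
    transform(extract(wpp_csv, {"country": "Location", "year": "Time",
                                "population_thousands": "PopTotal"}), "WPP")
    + transform(extract(idb_csv, {"country": "country", "year": "year",
                                  "population_thousands": "population_k"}), "IDB")
)
```

Visual data preparation tools perform the same rename-validate-union steps through configurable, reusable workflows rather than code.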

Data storage and management involves a database management system (DBMS) with methods, procedures and tools for data storage, data querying and database management. It includes a data repository or data warehouse, where data are organised, stored and made available for direct access by analysts and analytic applications. Data tables must be modelled into a star schema based on dimensional design concepts and instantiated in tidy data views ready for analysis, reporting and visualisation.12 Relational DBMSs with capability for clustered-columnar indexation and in-memory models are recommended for improved performance when dealing with large databases. The fast data growth rate and the need for a high level of data querying performance could be potential challenges for this component.
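A minimal star schema can be illustrated in SQLite: a fact table keyed to two dimension tables, exposed through a tidy view for analysts. The table, column and view names below are assumptions for illustration, not taken from the article.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Dimension tables hold descriptive attributes
CREATE TABLE dim_cause  (cause_id  INTEGER PRIMARY KEY, cause_name  TEXT);
CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, region_name TEXT);

-- Fact table holds measures keyed to the dimensions
CREATE TABLE fact_injury (
    cause_id  INTEGER REFERENCES dim_cause,
    region_id INTEGER REFERENCES dim_region,
    year      INTEGER,
    deaths    INTEGER);

-- Tidy view: one denormalised row per observation, ready for analysis
CREATE VIEW v_injury_tidy AS
    SELECT c.cause_name, r.region_name, f.year, f.deaths
    FROM fact_injury f
    JOIN dim_cause  c USING (cause_id)
    JOIN dim_region r USING (region_id);

INSERT INTO dim_cause  VALUES (1, 'Road traffic'), (2, 'Falls');
INSERT INTO dim_region VALUES (1, 'Region A');
INSERT INTO fact_injury VALUES (1, 1, 2021, 120), (2, 1, 2021, 45);
""")

rows = con.execute(
    "SELECT cause_name, deaths FROM v_injury_tidy ORDER BY deaths DESC"
).fetchall()
```

Analysts and visualisation tools query the view directly, without needing to know the underlying join logic.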

The data exploration and visual analytic component provides the tools and methods for data exploration and analysis; the design of data visualisations, dashboards and reports; collaboration among team members; and content sharing. It also includes a repository of data visualisations and services for data visualisation management.

Based on a self-service and collaborative approach, surveillance practitioners and data analysts are able to use visual analytics tools to connect to large data sets from the data storage component and/or other sources, conduct data exploration and analysis and create interactive data visualisations. As a final step, data visualisations are published to a repository that is part of this component. Published data visualisations can keep live connections to databases, allowing them to show the most up-to-date data without being edited and republished. Other surveillance users (eg, subject-matter experts, managers, executives) are also able to access and interact with data visualisations from the visualisation repository.
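The live-connection idea can be sketched as a published visualisation that stores a query rather than a data snapshot, so each render reflects the current database contents. The class and table names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE injury_counts (year INTEGER, deaths INTEGER)")
con.execute("INSERT INTO injury_counts VALUES (2020, 100), (2021, 110)")

class PublishedVisualisation:
    """Holds a query captured at publish time, not a copy of the data."""

    def __init__(self, connection, query):
        self.connection, self.query = connection, query

    def render(self):
        # Re-run the query on every view, like a live dashboard refresh
        return self.connection.execute(self.query).fetchall()

viz = PublishedVisualisation(
    con, "SELECT year, deaths FROM injury_counts ORDER BY year")

before = viz.render()                                     # two rows
con.execute("INSERT INTO injury_counts VALUES (2022, 95)")
after = viz.render()                                      # three rows, no republish
```

Because only the query is stored, new surveillance data appear in every published view as soon as they are loaded into the warehouse.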

The web-based applications and services for data dissemination component refers to the set of applications for disseminating data and information online. Web applications can typically be implemented using a content management system combined with web-development frameworks and programming languages. Interoperability mechanisms allow sharing interactive data visualisations with end users within a specific thematic context.
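One common interoperability mechanism is embedding a published visualisation by URL inside a thematic page. The following sketch generates such an embed fragment; the URL pattern and markup are assumptions, not taken from any specific product.

```python
from html import escape

def embed_visualisation(viz_url: str, title: str) -> str:
    """Return an HTML fragment that frames a published visualisation."""
    return (
        f"<section>\n"
        f"  <h2>{escape(title)}</h2>\n"
        f'  <iframe src="{escape(viz_url)}" width="100%" height="480"\n'
        f'          loading="lazy" title="{escape(title)}"></iframe>\n'
        f"</section>"
    )

# Hypothetical visualisation URL for illustration
fragment = embed_visualisation(
    "https://example.org/viz/road-traffic-deaths",
    "Road traffic deaths by year")
```

A content management system can insert such fragments into pages alongside narrative text, keeping the visualisation itself managed in the repository.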

A PHS for IPC website can be enriched by featuring interactive data visualisations together with descriptive reports, data and policy briefs and other information products. Intranet web-based applications can be used for sharing internal data visualisations with sensitive and/or critical information, while internet web-based applications can be used for facilitating data access and dissemination to external users and the public.