Federated dataset specification
The specification describes how a structured study concept is converted into synthetic country-level datasets, local aggregate outputs, pooled summary tables, and R visualisation code.
The teaching workflow follows a federated principle: patient-level rows remain local, and only aggregate summaries are pooled. In this prototype, all data are synthetic and generated in the browser for education.
A federated study is not built by collecting everything first and thinking later. It is built by defining a common question, common variables, local derivation rules, local checks, allowed outputs, and pooled summaries before data extraction.
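The federated principle above can be sketched in a few lines of Python. This is a minimal illustration, not the builder's actual implementation: the `stage` variable, the country names, the row counts, and the function names `make_local_rows`, `local_aggregate`, and `pool` are all hypothetical. The point it shows is that the pooling step only ever receives aggregate dictionaries, never patient-level rows.

```python
import random
from collections import Counter


def make_local_rows(country, n, seed):
    """Create hypothetical synthetic patient-level rows (never real data)."""
    rng = random.Random(seed)
    return [
        {"country": country, "stage": rng.choice(["I", "II", "III", "IV"])}
        for _ in range(n)
    ]


def local_aggregate(rows):
    """Runs at the site: returns counts only, never the rows themselves."""
    counts = Counter(row["stage"] for row in rows)
    return {"n": len(rows), "stage_counts": dict(counts)}


def pool(aggregates):
    """Runs centrally: sees only the aggregate dictionaries from each site."""
    total = Counter()
    n = 0
    for agg in aggregates.values():
        total.update(agg["stage_counts"])
        n += agg["n"]
    return {"n": n, "stage_counts": dict(total)}


# Patient-level rows stay inside each site's own scope.
site_rows = {
    "Country A": make_local_rows("Country A", 120, seed=1),
    "Country B": make_local_rows("Country B", 80, seed=2),
}

# Only these aggregate outputs cross the site boundary.
local_outputs = {c: local_aggregate(rows) for c, rows in site_rows.items()}
pooled = pool(local_outputs)
print(pooled)
```

In a real network the two functions would run in different places; here they share one process only so the sketch stays self-contained.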
Data standard, governance & ownership
A federated Nordic dataset is, above all, a shared standard and a governance agreement, not just software. This is the part that belongs to the Nordic community.
Who owns what
The common data elements and the Nordic codebook are an NTOG asset (© Nordic Thoracic Oncology Group): a community standard that only the registry and society network can own and maintain. The interactive builder is stateless software powered by Vahtian. It generates a specification in your browser, retains nothing, and never holds, processes, or controls registry or patient data.
This separation is deliberate. It keeps data controllership and clinical authority with NTOG and the national registries, and keeps the software a replaceable instrument.
Governance requirements the specification must satisfy (before any real data are used)
- Legal basis & data protection: a GDPR legal basis per country and a Data Protection Impact Assessment (DPIA).
- Controller / processor roles: each national registry or site is the data controller for its own patients; any processor relationship is defined explicitly.
- Consent & ethics: a consent model or registry legal mandate, with ethics and Data Access Committee (DAC) approval per site.
- European Health Data Space (EHDS): align secondary-use plans with the emerging EHDS framework for cross-border health data.
Interoperability standards to align with
- OMOP CDM (OHDSI): map the codebook to the OMOP common data model for federated and distributed network analysis.
- DataSHIELD: follow its federated-analysis model, in which analyses run where the patient data reside and only aggregates are shared, as this teaching prototype illustrates.
- FAIR: make the data dictionary Findable, Accessible, Interoperable, and Reusable.
- Cancer-registry standards: align variable definitions with European Network of Cancer Registries (ENCR) and International Association of Cancer Registries (IACR) conventions.
This prototype is for education and early design only. It does not replace ethics review, data permissions, a DPIA, statistical review, or patient-level data governance. Do not enter patient-level, identifiable, or confidential data.
1. Study concept input
Paste the Concept JSON from the Study Design & Simulation Builder. This page uses that structured input to generate a federated-style teaching project.
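Pasted input is worth checking before anything is generated from it. The sketch below shows one way such a check could look. The field names in `REQUIRED_FIELDS` and the function name `parse_concept` are assumptions made for illustration: the real Concept JSON schema is defined by the Study Design & Simulation Builder and may differ.

```python
import json

# Hypothetical required fields and their expected types (an assumption,
# not the builder's actual schema).
REQUIRED_FIELDS = {
    "question": str,
    "population": str,
    "outcome": str,
    "variables": list,
}


def parse_concept(text):
    """Parse pasted text and return (concept, problems)."""
    try:
        concept = json.loads(text)
    except json.JSONDecodeError as exc:
        return None, [f"Not valid JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(concept, dict):
        return None, ["Top-level JSON value must be an object"]
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in concept:
            problems.append(f"Missing field: {field}")
        elif not isinstance(concept[field], expected):
            problems.append(f"Field {field} should be {expected.__name__}")
    return concept, problems


# Invented example input, for illustration only.
pasted = """
{
  "question": "Does practice pattern differ between countries?",
  "population": "Synthetic teaching cohort",
  "outcome": "treated",
  "variables": ["country", "centre", "treated"]
}
"""
concept, problems = parse_concept(pasted)
print(problems or "Concept JSON looks structurally complete")
```

Returning a list of problems rather than raising on the first one lets the page show every issue at once, which suits a teaching tool.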
2. Country dataset generation assumptions
These editable teaching assumptions define denominators, centre counts, practice-pattern heterogeneity, missingness, and the random seed.
What this section controls
The synthetic countries are intentionally different. This teaches that federated studies must pre-specify local denominators, missingness patterns, coding rules, and allowed aggregate outputs before analysis.
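To make the role of these assumptions concrete, here is a minimal sketch of seeded synthetic generation. Every number, country name, and key in `ASSUMPTIONS` is invented for illustration, as are the functions `generate_country`, `generate_all`, and `summarise`. The prototype itself runs in the browser and may generate data differently.

```python
import random

# Hypothetical teaching assumptions; all values are invented.
ASSUMPTIONS = {
    "seed": 2024,
    "countries": {
        "Country A": {"denominator": 500, "centres": 5,
                      "treated_rate": 0.60, "missing_rate": 0.05},
        "Country B": {"denominator": 300, "centres": 3,
                      "treated_rate": 0.45, "missing_rate": 0.15},
    },
}


def generate_country(name, spec, rng):
    """Generate synthetic rows for one country from its assumptions."""
    rows = []
    for _ in range(spec["denominator"]):
        treated = rng.random() < spec["treated_rate"]
        missing = rng.random() < spec["missing_rate"]
        rows.append({
            "country": name,
            "centre": rng.randrange(spec["centres"]) + 1,
            # None marks a value that is missing by design.
            "treated": None if missing else treated,
        })
    return rows


def generate_all(assumptions):
    """One seeded generator, so the same seed reproduces the same data."""
    rng = random.Random(assumptions["seed"])
    return {
        name: generate_country(name, spec, rng)
        for name, spec in assumptions["countries"].items()
    }


def summarise(rows):
    """Local aggregate: denominator, missing count, and treated count."""
    observed = [r["treated"] for r in rows if r["treated"] is not None]
    return {
        "denominator": len(rows),
        "missing": len(rows) - len(observed),
        "treated": sum(observed),
    }


datasets = generate_all(ASSUMPTIONS)
for country, rows in datasets.items():
    print(country, summarise(rows))
```

Because the countries differ in both practice pattern and missingness, their local summaries differ too, which is why the denominator and the handling of missing values need to be agreed before pooling.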