The Personal Genome Project UK

How Citizen Scientists Are Building the Future of Medicine


Your Genome, Our Shared Future

Imagine a world where medical treatments are tailored precisely to your genetic makeup, where diseases are predicted and prevented before symptoms even appear, and where our understanding of human biology is propelled forward not just by white-coated academics, but by everyday people.

This is the future that the Personal Genome Project UK (PGP-UK) is working to build.

At its heart, PGP-UK operates on a powerful principle: that the sharing of our most personal biological data—our genomes—can accelerate medical progress for everyone ¹ .

Launched in 2013 at University College London, this project is part of a global network that recognizes our collective genetic code as perhaps the most valuable resource for understanding health and disease in the 21st century ² . By blending rigorous scientific research with the participatory energy of citizen science, PGP-UK is creating an unprecedented, openly accessible resource of human multi-omics data, breaking down traditional barriers in research and inviting the public to become active partners in discovery ⁴ ⁵ .

Open Genomic Data

Creating freely accessible genetic information for researchers worldwide

Citizen Science

Engaging the public as active participants in research

Medical Advances

Accelerating the development of personalized medicine

The PGP-UK Approach: Openness, Consent, and Collaboration

The Radical Idea of Open Data

Traditional genetic research has often operated behind veils of anonymity and controlled access. PGP-UK turns this model on its head. The project advocates for making genomic data fully open and accessible to the global research community ¹ .

This approach, pioneered by the global PGP network, is grounded in the belief that scientific progress is stifled when data is siloed and restricted ² . By sharing genomic, health, and trait data without barriers, PGP-UK enables researchers worldwide to ask novel questions, validate findings across different populations, and develop tools for personalized medicine that might otherwise take decades to emerge.

The commitment to openness extends beyond just data. PGP-UK also embraces open consent—a transparent framework that ensures participants fully understand the implications of sharing their identifiable genetic information ⁵ . Unlike studies that promise confidentiality, PGP-UK addresses the possibility of re-identification directly and honestly during the enrollment process ² . Participants knowingly accept these risks for the greater good, becoming true partners in the research enterprise.

Citizens as Scientists

What truly sets PGP-UK apart is its citizen science model ⁴ . Participants are not merely subjects of study; they are actively engaged in the scientific process. They can choose to donate genomic data generated elsewhere through a novel "Genome Donation" mechanism, contribute to research design through feedback, and even help communicate findings to the public ⁵ .

This model fosters a unique research ecosystem where the traditional boundaries between researchers and the researched blur, creating a collaborative community dedicated to advancing genomic knowledge.

The project's ambassadors—individuals like Stephan, Laura, Momodou, and Colin—publicly share their experiences and data, putting human faces on the complex science of genomics ¹ . Their visible participation helps demystify genome research and inspires others to join the effort, accelerating the growth of this shared resource.

Figure: PGP-UK Data Sharing Model (panels: Data Openness, Participant Engagement, Research Accessibility)

A Landmark Pilot Study: The First Multi-Omics Decode

Designing a Comprehensive Blueprint

To validate its innovative approach, PGP-UK initiated a landmark pilot study, recruiting ten participants willing to comprehensively share their biological data ³ ⁵ . This wasn't just another genetic study—it aimed to create a detailed multi-omics reference panel, integrating different layers of biological information to provide a more complete picture of human biological function.

The study design was meticulously crafted to capture multiple dimensions of biological information simultaneously from each participant:

Whole-genome sequencing (WGS) to decode the complete DNA blueprint
Whole-genome bisulfite sequencing (WGBS) to map DNA methylation across the genome at single-base resolution
RNA sequencing (RNA-seq) to capture gene expression patterns
DNA methylation profiling using 450k arrays as a complementary, array-based measurement of methylation at CpG sites ³

Methodological Excellence

The path from biological sample to research-grade data followed standardized protocols to ensure quality and reproducibility. For whole-genome sequencing, DNA extracted from blood samples was prepared into libraries using Illumina's TruSeq Nano protocol and then sequenced on HiSeq X platforms to an average depth of 30x, meaning that each position in the genome was read about 30 times on average ³ .
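To make the 30x figure concrete, here is a rough back-of-the-envelope sketch of how average depth relates to the number of reads. The genome size (about 3.1 billion bases) and the 150-base read length are assumptions chosen for illustration; they are not values reported by PGP-UK, and the function names are hypothetical.

```python
import math

# Assumed values for illustration only (not taken from the PGP-UK methods).
GENOME_SIZE_BP = 3_100_000_000  # approximate size of the human genome
READ_LENGTH_BP = 150            # a common Illumina read length


def mean_depth(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average depth = total sequenced bases / genome size."""
    return n_reads * read_length_bp / genome_size_bp


def reads_for_depth(target_depth: float, genome_size_bp: int, read_length_bp: int) -> int:
    """Smallest number of reads whose total bases reach the target average depth."""
    return math.ceil(target_depth * genome_size_bp / read_length_bp)
```

Under these assumptions, `reads_for_depth(30, GENOME_SIZE_BP, READ_LENGTH_BP)` comes to 620 million reads. Real runs need somewhat more, because some reads are duplicates or fail to map.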

The bioinformatics pipeline was equally thorough. Raw sequences were trimmed and mapped to the human reference genome (GRCh37) using the BWA-MEM algorithm. Potential PCR duplicates were flagged, and ambiguously mapped reads were filtered out. Genomic variants were then called following the GATK Best Practices, a widely used set of recommended workflows for variant discovery ³ .
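The steps described above can be sketched as a sequence of command lines. This is a minimal illustration, not the project's actual pipeline: the file names, thread count, read-group string, and the helper `build_pipeline` are all hypothetical, the step that filters ambiguously mapped reads is left out, and the code only builds the commands without running them. It assumes gzipped paired-end FASTQ input and that Trim Galore, BWA, samtools, and GATK 4 are the tools in use.

```python
import shlex
from pathlib import Path


def build_pipeline(sample: str, r1: str, r2: str, reference: str,
                   outdir: str = "out", threads: int = 4) -> list[tuple[str, str]]:
    """Return (step name, shell command) pairs for one sample. Nothing is executed."""
    out = Path(outdir)
    # With --basename, Trim Galore names paired-end output <name>_val_1/_val_2.
    trimmed_1 = str(out / f"{sample}_val_1.fq.gz")
    trimmed_2 = str(out / f"{sample}_val_2.fq.gz")
    sorted_bam = str(out / f"{sample}.sorted.bam")
    dedup_bam = str(out / f"{sample}.dedup.bam")
    metrics = str(out / f"{sample}.dup_metrics.txt")
    vcf = str(out / f"{sample}.vcf.gz")
    # Hypothetical read group; GATK requires one to be present in the BAM.
    read_group = rf"@RG\tID:{sample}\tSM:{sample}\tPL:ILLUMINA"

    trim = ["trim_galore", "--paired", "--basename", sample,
            "--output_dir", str(out), r1, r2]
    align = ["bwa", "mem", "-t", str(threads), "-R", read_group,
             reference, trimmed_1, trimmed_2]
    sort = ["samtools", "sort", "-o", sorted_bam, "-"]
    markdup = ["gatk", "MarkDuplicates", "-I", sorted_bam,
               "-O", dedup_bam, "-M", metrics]
    index = ["samtools", "index", dedup_bam]
    call = ["gatk", "HaplotypeCaller", "-R", reference,
            "-I", dedup_bam, "-O", vcf]

    return [
        ("trim", shlex.join(trim)),
        # bwa mem writes SAM to stdout, so it is piped straight into the sort.
        ("align_sort", shlex.join(align) + " | " + shlex.join(sort)),
        ("mark_duplicates", shlex.join(markdup)),
        ("index", shlex.join(index)),
        ("call_variants", shlex.join(call)),
    ]
```

Printing the result of `build_pipeline("sampleA", "sampleA_R1.fastq.gz", "sampleA_R2.fastq.gz", "GRCh37.fa")` shows the order of operations: trim, align and sort, mark duplicates, index, call variants. Keeping the commands as data makes it easy to review them before anything is run.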

The team also implemented a genotype-based sample tracking system to prevent sample mix-ups, a critical concern when integrating multiple data types from the same individual ³ .
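The idea behind genotype-based tracking can be illustrated with a small sketch: if two datasets really come from the same person, their genotype calls at shared marker positions should agree almost everywhere. The code below is an assumption-laden illustration, not the method PGP-UK used; the marker names, the 0.9 threshold, and the function names are invented for the example.

```python
def genotype_concordance(calls_a: dict[str, str], calls_b: dict[str, str]) -> tuple[float, int]:
    """Fraction of shared markers with matching genotypes, and how many were compared.

    Genotypes are two-letter strings such as "AG"; allele order is ignored,
    so "AG" and "GA" count as a match.
    """
    shared = [m for m in calls_a if m in calls_b]
    if not shared:
        raise ValueError("no shared markers to compare")
    matches = sum(sorted(calls_a[m]) == sorted(calls_b[m]) for m in shared)
    return matches / len(shared), len(shared)


def looks_like_same_individual(calls_a: dict[str, str], calls_b: dict[str, str],
                               threshold: float = 0.9) -> bool:
    """Flag a pair as consistent when concordance meets a hypothetical threshold."""
    fraction, _ = genotype_concordance(calls_a, calls_b)
    return fraction >= threshold
```

In practice such a check would compare, for example, genotypes derived from the sequencing data against those derived from the array data for the same participant ID, and a low concordance would prompt a manual review of the sample labels.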

Pilot Study Participant Overview

Participant ID	Sample Types Collected	Data Types Generated	Self-Reported Phenotypes
uk35C650 (Stephan)	Blood, Saliva	WGS, WGBS, RNA-seq, Methylation	Available in public dataset
uk33D02F (Laura)	Blood, Saliva	WGS, WGBS, RNA-seq, Methylation	Available in public dataset
uk481F67 (Momodou)	Blood, Saliva	WGS, WGBS, RNA-seq, Methylation	Available in public dataset
uk4CA868 (Colin)	Blood, Saliva	WGS, WGBS, RNA-seq, Methylation	Available in public dataset
[6 other participants]	Blood, Saliva	WGS, WGBS, RNA-seq, Methylation	Available in public dataset

Groundbreaking Findings and Functional Variants

The analysis of this rich dataset yielded remarkable insights. Researchers identified 47 new variants predicted to affect gene function—potential additions to our understanding of genetic diversity ⁴ . Beyond mere cataloging, the project generated personalized genome and methylome reports for each participant, interpreting their genetic and epigenetic variants in the context of self-reported traits, ancestry, and environmental exposures ⁴ .

The multi-omics approach proved particularly powerful for observing biological relationships that would be invisible through genomics alone. By correlating genetic variation with epigenetic modifications and gene expression patterns, researchers could begin to understand how different layers of biological information interact to influence traits and disease susceptibility.
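One simple way to relate two data layers is to correlate them across participants, for instance the methylation level at a site near a gene against that gene's expression level. The sketch below computes a plain Pearson correlation; it is only an illustration of the idea, the function name is hypothetical, and it is not the analysis the PGP-UK researchers performed.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)
```

A value near -1 would suggest that higher methylation goes with lower expression at that site, while a value near 0 would suggest no linear relationship. With only ten participants, any such estimate is very noisy and should be treated as a starting point for further study rather than a finding.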

Category of Finding	Number Identified	Significance
Novel variants predicted to affect gene function	47	Expand understanding of functional genetic diversity
Integrated multi-omics profiles	10 participants	Enable study of interactions between genomic and epigenetic factors
Personalized genome reports	10	Demonstrate interpretation and reporting back to participants
Genetic variants linked to phenotypic traits	Multiple	Illustrate connections between genotype and observable characteristics

The data, exceeding 2 terabytes in volume, was deposited in public repositories like the European Nucleotide Archive and ArrayExpress, making it freely available to researchers worldwide ³ . To enhance accessibility, PGP-UK also collaborated with cloud platforms like Seven Bridges Genomics and Lifebit, allowing researchers to analyze the data without needing to download massive files—an innovation that dramatically lowers the barrier to working with complex genomic data.

The Scientist's Toolkit: Essential Research Reagents and Methods

The PGP-UK research relies on a combination of laboratory kits, sequencing platforms, and bioinformatics tools to transform biological samples into research-ready data. The main components of this "toolkit" are listed below.

Tool/Reagent	Type	Primary Function in PGP-UK
Illumina TruSeq Nano	Library Prep Kit	Prepares DNA fragments for sequencing on Illumina platforms
Illumina HiSeq X Ten	Sequencing Platform	Performs high-throughput whole-genome sequencing
Oragene OG-500	Saliva Collection Kit	Enables self-collection of DNA samples from saliva
BWA-MEM	Bioinformatics Algorithm	Aligns sequencing reads to the human reference genome
GATK (Genome Analysis Toolkit)	Software Package	Identifies genetic variants from sequencing data
TrimGalore	Bioinformatics Tool	Quality trimming of raw sequencing data
Picard	Bioinformatics Tool	Marks duplicate reads to improve variant calling accuracy
HumanMethylation450 BeadChip	Microarray	Profiles DNA methylation patterns at CpG sites

PGP-UK Research Workflow

Sample Collection

Participants provide blood and saliva samples using standardized collection kits.

DNA/RNA Extraction

Genetic material is isolated and prepared for sequencing.

Library Preparation

DNA fragments are prepared for sequencing using specialized kits.

Sequencing

High-throughput sequencing generates raw genetic data.

Bioinformatics Analysis

Computational tools process and analyze the sequencing data.

Data Sharing

Processed data is made openly available to researchers worldwide.

Beyond the Lab: Impact and Future Directions

The implications of PGP-UK extend far beyond the laboratory. The project has developed GenoME, a free, open-source educational app that allows laypersons to explore personal genomes ⁴ . This tool exemplifies the project's commitment to democratizing genomic knowledge and engaging the public as active participants rather than passive subjects.

PGP-UK's open approach aligns with broader national initiatives to advance genomic medicine. The UK government's Genome UK strategy aims to create "the most advanced genomic healthcare system in the world," with significant investments in newborn sequencing, cancer genomics, and efforts to address health inequalities in genomic medicine. The PGP-UK model provides a valuable template for how public participation and open science can support these ambitious goals.

Future Directions

Expanding participant diversity to better represent global populations
Integrating additional data types (proteomics, metabolomics)
Developing advanced tools for data interpretation and visualization
Enhancing educational resources for public engagement
Strengthening ethical frameworks for open genomic data

As the project continues to grow, it faces ongoing challenges and opportunities—navigating the ethical complexities of open genetic data, ensuring diversity among participants, and developing new methods for interpreting the vast and complex datasets being generated. Yet, its core mission remains constant: to advance human health through open genomic data sharing, and to demonstrate that the path to scientific progress is best walked together, with researchers and citizens as equal partners in discovery.

Through its innovative blend of cutting-edge science and participatory ethics, PGP-UK offers a powerful vision for the future of medical research—one where each person's genetic information becomes part of a collective resource that benefits all of humanity, and where the boundaries between research and society are bridged in pursuit of a common good.