What is Personal Genomics?
Welcome to the openSNP project. We’d like to use this first blog post to give you a general introduction to the project: what data we’d like to use and what the possible benefits and use cases of openSNP may be.
Companies that offer direct-to-consumer (DTC) genetic tests have now been around for about six years; 23andMe (founded in 2006) and deCODEme are two of the oldest companies on the market. Customers receive a test tube by mail, spit into it and send it back to the company, which then analyzes their genetic information. These DTC tests do not use the better-known technique of DNA sequencing; instead they rely on DNA microarrays, which are faster and still cheaper.
These microarrays screen for around 1 million genetic markers called single nucleotide polymorphisms (SNPs). A SNP is a genomic variation in which a single base differs at one position between members of a population. A SNP usually has only two alleles (variants), and the less common allele occurs with a frequency of at least 1% in the population. Across the whole human genome there are around 10 million such variable sites, and DTC tests cover about 10% of them. Many of these markers are known to be associated with certain conditions: some SNP variants, for example, are associated with an elevated risk of breast cancer or Alzheimer’s disease. Other SNPs can be used to predict how a person metabolizes chemicals or drugs.
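To make the definition above concrete, here is a minimal sketch that computes the minor allele frequency at one site and applies the 1% cutoff. It assumes each person’s genotype is given as a two-letter string such as "AG" (one letter per chromosome copy); the function names and example data are hypothetical and only illustrate the idea.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return the frequency of the less common allele at one site.

    `genotypes` is a list of two-letter strings such as "AA" or "AG"
    (an assumed encoding: one letter per chromosome copy).
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(counts) > 2:
        raise ValueError("expected a biallelic site")
    if len(counts) < 2:
        # Only one allele observed: the site does not vary in this sample.
        return 0.0
    total = sum(counts.values())
    return min(counts.values()) / total


def is_common_snp(genotypes, threshold=0.01):
    """Apply the usual 1% minor-allele-frequency cutoff for calling a SNP."""
    return minor_allele_frequency(genotypes) >= threshold


# 100 made-up people: 90 "AA", 9 "AG" and 1 "GG".
sample = ["AA"] * 90 + ["AG"] * 9 + ["GG"]
print(minor_allele_frequency(sample), is_common_snp(sample))
```

In this made-up sample the G allele appears 11 times out of 200 copies, a frequency of 5.5%, so the site passes the 1% cutoff.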
The Rise of Personal Genomics
In June 2011, 23andMe released an overview of its customer base. By then it had genotyped (as this kind of testing is also called) more than 100,000 customers. Over 70% of them had allowed 23andMe to use their genotyping data for research, and over 50% of all customers had taken part in surveys on medical conditions, drug metabolism and other traits.
23andMe uses these results to perform its own genome-wide association studies (GWAS). Such studies compare how often genetic variants occur in different groups. In a simple example, one group consists of people known to have Alzheimer’s and a control group of people who do not. Given enough participants, one can then look for genetic variants that are over- or underrepresented in one of the groups. Variants found this way can then be used as predictors for Alzheimer’s.
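The simplest version of this comparison is an allelic chi-square test on a single SNP: count the two alleles among cases and among controls, and ask whether the proportions differ more than chance would predict. The sketch below is a generic textbook test, not a description of 23andMe’s own methods, and the counts in the example are made up.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """2x2 allelic association test for one SNP.

    Each argument is a pair (count of allele 1, count of allele 2),
    summed over all individuals in that group.
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("an allele or group has zero counts")
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up data: 500 cases (1,000 alleles) and 500 controls.
chi2, p = allelic_chi_square((600, 400), (500, 500))
print(f"chi2 = {chi2:.1f}, p = {p:.2g}")
```

Here the statistic is about 20.2, giving a p-value well below 0.001 but still above 5 × 10⁻⁸, a commonly used genome-wide significance threshold. Because a GWAS repeats this test for every SNP on the array, it needs such a strict threshold, which is one reason the number of participants matters.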
In 2011, 23andMe published a couple of papers showing how it uses its datasets (with up to 30,000 individuals) to reproduce known associations and to find new predictors for Parkinson’s disease. The sheer number of datasets available to it, combined with customers who are willing to answer surveys on everything from diseases to how they metabolize coffee, gives 23andMe a great opportunity to do a lot of meaningful research. Unfortunately, this rich dataset is not made available to researchers outside 23andMe and its collaborators.
An Open Alternative?
While there may be many valid reasons not to publish these datasets, we feel that research projects all over the world, and science in general, would benefit from such a rich, freely available source of linked genetic data. And although genome-wide association studies need a minimum number of participants to find significant variations, you do not need 30,000 participants in your study. Many published studies have identified SNPs that are significant predictors for conditions ranging from obesity to asthma, and many of them had fewer than 5,000 participants in total.
Let’s apply this to 23andMe: given its total number of customers, only 5% of them (about 5,000 people) would need to share their genetic information freely, together with basic information on some medical conditions or other traits, to reach the critical mass needed for simple association studies! To our knowledge, only a few individuals worldwide currently share their 23andMe results freely, and nearly all of them do so without any linked data.
We set up a small survey on how many DTC customers would be willing to share both kinds of data with the general public. Of 88 respondents who are already customers of a DTC company, 15% had already shared their information and 75% in total would be willing to do so. In addition, of 72 respondents who plan to take a DTC test in the future, 61% would be willing to share their results along with some linked data. Granted, this is a small sample, and we will publish the full survey results as soon as possible. Still, these results suggest that enough DTC customers would be willing to share their data to make genome-wide association studies possible, which is why we started working on such an open alternative.
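How much can a sample of 88 tell us? One standard way to quantify this is a confidence interval for a proportion. The sketch below uses the Wilson score interval (our choice of method, not part of the original survey analysis); the count of 66 is simply 75% of 88. Note that it only captures sampling noise: survey respondents are self-selected, so the true share among all DTC customers could differ for reasons the interval does not reflect.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# 66 of 88 existing DTC customers (75%) said they would share their data.
low, high = wilson_interval(66, 88)
print(f"{low:.2f} to {high:.2f}")
```

For 66 out of 88, the interval runs from about 0.65 to 0.83: even at its lower end, a clear majority of respondents would be willing to share.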
The Idea of openSNP
openSNP aims to be a repository and an open platform for collecting this kind of data. Our vision is to enable everybody to perform crowd-sourced association studies and create new knowledge about our genes. We would also like to help everyone find out more about their own results.
Until now, people who wanted to share their genotyping data had to find a solution on their own: some put the data on their own web space, others on GitHub, others on FTP servers. Phenotypic data was usually missing, and there was no easy way to find and download these files. On openSNP, users, and especially customers of personal genomics companies, can easily upload their genotyping data and publish details on their phenotypes.
What’s in it for the Users?
(Citizen) scientists can easily add new conditions and phenotypes they are interested in and find DTC customers who are willing to answer questions on them. openSNP also allows easy mass downloads of all data, or of data partitioned into groups (for example, A: all users who have Alzheimer’s, B: all users who don’t), so the data is already in a basic shape for a GWAS. We also provide simple RSS feeds that deliver the latest data, either all datasets or only those for a condition of interest. This should make it really easy to get data out of openSNP.
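Splitting users into groups like A and B is essentially a group-by on one phenotype. The minimal sketch below shows the idea; the record layout (dicts with "id" and "phenotypes" keys) and the answer values are hypothetical and not openSNP’s actual data format.

```python
from collections import defaultdict


def partition_by_phenotype(users, phenotype):
    """Group user ids by their answer for one phenotype.

    `users` is a list of dicts with hypothetical keys "id" and
    "phenotypes"; users who did not answer are left out entirely.
    """
    groups = defaultdict(list)
    for user in users:
        answer = user["phenotypes"].get(phenotype)
        if answer is not None:
            groups[answer].append(user["id"])
    return dict(groups)


users = [
    {"id": 1, "phenotypes": {"Alzheimer's": "yes"}},
    {"id": 2, "phenotypes": {"Alzheimer's": "no"}},
    {"id": 3, "phenotypes": {"Alzheimer's": "yes"}},
    {"id": 4, "phenotypes": {}},  # did not answer this question
]
print(partition_by_phenotype(users, "Alzheimer's"))
```

Leaving out users who did not answer keeps them from being counted as controls by accident, which would blur the comparison between the two groups.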
Customers of DTC tests, on the other hand, can also benefit from using openSNP. According to our survey, one of the main reasons people want to publish their genotyping data freely is that they like to help open up science and to take part in approaches such as crowd-sourced GWAS. openSNP definitely supports this; it was one of the main reasons we built the platform.
Another big reason people like to share their data is the hope of some personal benefit, for example finding others to talk to about their results or finding primary literature on them. openSNP tries to deliver this: it lets users find others who share their variations and conditions, and it offers comments so that users can share personal experiences with conditions and variations.
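Finding users who share a variation boils down to comparing genotypes at one SNP. Here is a minimal sketch under our own assumptions: each user’s data is a dict mapping SNP IDs to two-letter genotypes, and the SNP IDs and user names are placeholders. Since array data is unphased, "AG" and "GA" are treated as the same genotype.

```python
def normalize(genotype):
    """Treat "AG" and "GA" as the same unphased genotype."""
    return "".join(sorted(genotype))


def users_sharing_genotype(genotypes_by_user, user_id, snp_id):
    """Return the other users whose genotype at `snp_id` matches `user_id`'s.

    `genotypes_by_user` maps a user id to a dict {SNP id: genotype}
    (a hypothetical layout used only for illustration).
    """
    target = genotypes_by_user[user_id].get(snp_id)
    if target is None:
        return []
    target = normalize(target)
    return sorted(
        other
        for other, calls in genotypes_by_user.items()
        if other != user_id
        and snp_id in calls
        and normalize(calls[snp_id]) == target
    )


genotypes = {
    "alice": {"rs1": "AG"},
    "bob": {"rs1": "GA"},
    "carol": {"rs1": "AA"},
    "dave": {},
}
print(users_sharing_genotype(genotypes, "alice", "rs1"))
```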
We have also integrated the APIs of the Public Library of Science (PLoS) and Mendeley to find the latest publications on the genetic variations covered by 23andMe and deCODEme. We rank these publications by their number of readers and by whether they are Open Access. We also crawl SNPedia to provide links to the user-generated content there. By combining Mendeley, PLoS and SNPedia, we can offer openSNP users a wealth of curated content on genetic variations.
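The two ranking criteria above have to be combined somehow, and this post does not say exactly how. The sketch below assumes one simple ordering: Open Access papers first, then by descending reader count. The `Publication` class and the example titles are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    title: str
    readers: int       # reader count reported for the paper
    open_access: bool


def rank_publications(publications):
    """Sort Open Access papers first, then by descending reader count.

    This ordering is an assumption made for illustration; a weighted
    score would be an equally plausible design.
    """
    return sorted(publications, key=lambda p: (not p.open_access, -p.readers))


papers = [
    Publication("Paper A", readers=50, open_access=False),
    Publication("Paper B", readers=10, open_access=True),
    Publication("Paper C", readers=30, open_access=True),
]
print([p.title for p in rank_publications(papers)])
```

A strict ordering like this always puts a freely readable paper above a paywalled one, even a much more widely read one; whether that is the right trade-off depends on how much weight one wants to give to accessibility.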
To give openSNP a try, you can simply start browsing the phenotypes and genetic variations. If you are interested in additional features (mass-downloading genotyping files, creating new phenotypes or just commenting), you can easily create an account. Please bear in mind that openSNP is currently in an early beta stage, so you might encounter bugs.
If you need help using openSNP, want to report bugs or just have some questions, you can read the FAQ, comment on this post or reach us via email or on IRC in #openSNP on freenode.
Bastian, Fabian, Helge and Philipp