In 2011, someone asked me how many genotypes I personally would expect users to upload, and if I remember correctly, I said 30. That was quite the underestimate: just a few days ago, on the 30th of May, openSNP received its 1000th genotyping!
On this happy occasion we thank the users and participants for their trust in the project and their continued support and interest.
Since 2011, people have used openSNP in research, art, and their own projects, have written additional software to interact with openSNP’s API, written great comments, and much more. We published a paper on openSNP a few months ago, which is for most of us the first publication in our careers.
openSNP has come a long way. Here are the first three commits from June 2011, with Basti’s oldest on the bottom:
That was almost exactly three years ago! Since then we’ve (among other things) learned to write proper commit messages.
So what does the future hold for openSNP?
- A better server: We recently received a grant from Bayer HealthCare that lets us move to bigger servers, so the site should load and respond much faster. Maybe we can even start hosting bigger datasets?
- Pre-given phenotypings: One of the biggest problems with openSNP’s data is the high amount of variation in phenotypes entered by users: researchers who want to work with the data still have some manual cleaning to do. We’ve prepared a set of phenotypes for which users can only choose their variation; this should greatly improve the speed with which researchers can start working on the data.
- Faster parsing: We’ve replaced the Ruby-based genotyping parser with a 99% complete implementation written in Go. So far it’s much, much faster, but it has only been marginally tested.
- A variety of smaller things: A stats-page, bug-fixes, genosets, etc. – have a look at the Issues page here.
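To give a rough idea of what the genotype parser mentioned above has to do, here is a minimal Python sketch. It is not our actual Ruby or Go code. It assumes a 23andMe-style raw file: comment lines starting with `#`, followed by tab-separated lines of rsid, chromosome, position, and genotype. The names `Genotype` and `parse_genotypes` are made up for illustration, and the sample values are placeholders, not real data.

```python
from typing import Iterable, Iterator, NamedTuple


class Genotype(NamedTuple):
    rsid: str
    chromosome: str
    position: int
    genotype: str


def parse_genotypes(lines: Iterable[str]) -> Iterator[Genotype]:
    """Yield one Genotype per data line, skipping comments and malformed lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or header comment
        fields = line.split("\t")
        if len(fields) != 4 or not fields[2].isdigit():
            continue  # assumption: silently skip lines that do not fit the layout
        rsid, chromosome, position, genotype = fields
        yield Genotype(rsid, chromosome, int(position), genotype)


if __name__ == "__main__":
    # Placeholder input, invented for this example.
    sample = [
        "# rsid\tchromosome\tposition\tgenotype",
        "rs0000001\t1\t12345\tAG",
        "not a valid line",
    ]
    for record in parse_genotypes(sample):
        print(record)
```

A real parser also has to cope with the other file formats users upload, which this sketch ignores.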
We thank you for your continued support and interest, and here’s to many more years! If you know of any other project that uses openSNP data, feel free to post it in the comments!
Congratulations, thanks for all the hard work – and for rounding up these projects, hadn’t heard of most of them, very cool.
I hope the ‘pre-given’ phenotypes will be in addition to, rather than replacing, the ‘Lanierian’ freedom we have now, though; it’s one of the things that excites me about the platform.
Be interesting to see how long until the next 1000…
In order to make any sense of it, you definitely should think about a more structured questionnaire à la 23andme for the environmental effects. Also think about pre-existing answer tags like on askubuntu.com, which can only be created by older users. Without consistency this project is of no help.
Hi Anastasius,
we have such a system in preparation, have a look here: https://github.com/gedankenstuecke/snpr/blob/curated_phenotypes/lib/tasks/add_curated_phenotypes.rake
We’ve had some problems with the actual implementation, but it should be working ‘soon’. I agree that in the current form, working with the given phenotypes still requires a lot of manual cleanup before you can do actual work.
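As a rough illustration of the cleanup involved, here is a small Python sketch. It is not the code from the rake task linked above. The phenotype, the allowed values, and the synonym table are all invented for the example.

```python
from typing import Optional

# Hypothetical curated choices for an "eye color" phenotype.
ALLOWED = {"blue", "brown", "green"}

# Hypothetical mapping from free-text variants to curated choices.
SYNONYMS = {"blau": "blue", "dark brown": "brown", "light blue": "blue"}


def normalize(raw: str) -> Optional[str]:
    """Map a free-text answer to a curated value, or None if it needs manual review."""
    # Lowercase and collapse runs of whitespace before looking anything up.
    cleaned = " ".join(raw.lower().split())
    cleaned = SYNONYMS.get(cleaned, cleaned)
    return cleaned if cleaned in ALLOWED else None
```

With these made-up tables, `normalize("  Dark  Brown ")` gives `"brown"`, while an answer like `"hazel-ish"` gives `None` and would still need a human to look at it. With pre-given choices, that second case should not come up in the first place.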
Hi phi1ipp,
thanks for getting back to me. Unfortunately, I only know some basic Python.
I’m torn. I’d be interested in helping you guys with disease research. But as it stands, most phenotypes seem to show correlations to physical appearance. So for me, the (privacy) risks of uploading my raw files outweigh the benefits at this stage.
I can fully understand that – there are many, many unknowns involved in this project.
As for the phenotypes: Some interesting disease related SNPs have popped up recently, here’s one interesting example for cancer-related microRNA SNPs:
http://www.biomedcentral.com/1471-2164/15/669/abstract
Are there definitive cancer markers that can be found on the level of a 23andme-dna-file?
Yes!
SNPedia has an overview of the SNPs linked to mutations in the BRCA1/BRCA2 genes, which are linked to breast cancer. http://www.snpedia.com/index.php/BRCA1
You may remember when Angelina Jolie was in the news for having a double mastectomy: she had a family history and she carries some risk alleles: http://www.nytimes.com/2013/05/14/opinion/my-medical-choice.html?_r=0
At least some of these SNPs are on the 23andme array:
https://opensnp.org/snps/rs28897696 – all users carrying the “no risk” allele
https://opensnp.org/snps/rs55770810 – all carrying the “no risk” allele
https://opensnp.org/snps/rs1799950 – a few carrying the “higher risk” allele
and many more.
Interestingly, all of these are on the opposite strand compared to the “known” risk alleles. That’s just from the SNP array, we can’t fix that.
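If you want to compare your own genotype with alleles reported on the other strand, you swap each base for its complement (A↔T, C↔G). A minimal Python sketch of that; the function name is made up, and it only handles plain A/C/G/T calls:

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def flip_strand(genotype: str) -> str:
    """Return the genotype as read from the opposite strand (per-base complement)."""
    if not set(genotype) <= set("ACGT"):
        # Assumption: anything else (no-calls, indels) is rejected rather than guessed at.
        raise ValueError(f"unsupported genotype call: {genotype!r}")
    return genotype.translate(COMPLEMENT)
```

So an `AG` call on the array corresponds to `TC` on the other strand.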
But if I understand correctly, that’s still just a correlation to a tendency/risk, relative to the average population, of getting a certain type of cancer. Is there any way to tell with high certainty that a certain type of cancer is already present in your DNA?
Aah I see – as far as I know, to find out whether you are actually carrying cancer right now you’d have to, by chance, sequence an actual cancer cell (as in for example http://www.ncbi.nlm.nih.gov/pubmed/17418407 )
That’s very hard to do, especially since the usual genotyping sets like 23andme just use oral swabs, and the chances of randomly picking up a few cancer cells there are slim. Furthermore, there are always cancer cells present in your body; they’re just being destroyed by your body. Any sequencing set might pick up these cells and you’d get false positive signals.
I see. I guess I have to attend this genetics intro course on Coursera. Hence these phenotype correlations are nothing more than a horoscope, if the standard deviation is not extraordinarily high.
[…] What if such data were opened up? Everyone has access to data contributed to the openSNP genome sharing platform. The catch with openSNP is that the number of people willing to make their DNA public is still small, but growing. […]