Google Summer of Code Wrap-Up

The Google Summer of Code (GSoC) 2016 is drawing to a close, so we thought it best to summarise our experience with GSoC from the mentors’ perspective. We originally came to GSoC through the Mozilla Science Lab. We’ve been having regular calls with Abby Cabunoc from Mozilla, who has kindly supported openSNP’s efforts to make open science more accessible. Together we quickly identified a need for more volunteers and people working on the project. From there, we got to the Open Bioinformatics Foundation (OBF), which has been participating in GSoC for years. The OBF kindly agreed to take openSNP under its umbrella; all we needed to do was come up with a few projects, which didn’t take long with the help of Abby (thanks again!!).
We came up with three projects that we had wanted to tackle for quite a while:
  1. Overhauling our frontend, which has been stuck in the non-responsive dark ages of Twitter Bootstrap basically since the start of openSNP in 2011, something that’s been rather annoying to many of us over the years.
  2. Connecting the Phenotype and SNP information. We do have annotations for genetic variants that basically already tie these genetic variants to the phenotypic data we’re storing, but so far there’s no easy way to go from one to the other, which limits the usefulness of openSNP in many cases.
  3. Getting more Quantified Self data from Fitness-Tracking devices into openSNP. We’ve been offering support for Fitbit for a while, but nothing for other devices. We wanted to change this and add some more support in general.
For project #1, the frontend work, we were able to win over Mateus, who is largely teaching himself how to work with CSS, JS and all that weird stuff. He’s from Brazil but recently moved to Toronto. See his final blog post on what he achieved.
For project #2 (phenotype <-> SNP link) we were able to win over Vivek, who is a biotechnology student at IIT Kharagpur (India). See his final blog post here.
For project #3 (more Quantified Self data) we were able to win over Graham, who is a computer science student at the University of Illinois. You can see his final blog post here.
We handled the communication by having a bi-weekly call with whoever had time to show up, which isn’t easy when you have one mentor in Germany, one in Australia, one student in the US, one in Canada, and one in India. It was rare that all students could make time for the call, so the majority of communication was handled via openSNP’s gitter channel, or via email. Surprisingly, the gitter channel is also where students started to help each other if mentors were still asleep in their respective time-zones. Towards the end of GSoC we increased the frequency of group calls to make sure that all students had support with their deliverables (blog posts similar to this one, code excerpts, etc.). It’s a bit daunting for a first-time GSoC project to accept three students, but we’ve been lucky with three students who were all well able to work on their own, and who always stayed on our radar and checked in with us to share their progress. We’ve also been lucky to have Abby help us with making sure that the projects are well-defined.
Most of the students’ projects are not 100% finished, but that’s not the end goal here. What we want is for more people to join us on this journey of open science, not to give us finished products and then disappear. We’re confident that the students will keep on collaborating, as we believe that participating in such a small, easy-to-change open source project is a great source of programming experience for novice programmers. If all goes well, openSNP will probably participate as mentors again next year. We’re still unsure how many students we’ll take next year; we were probably very lucky this year. It will depend on how many actual problems we’ll have then. For starters, the gender ratio of people involved in openSNP is still predominantly male, and which open source project is ever truly finished?
If you’re now interested in joining openSNP you can find all of us hanging out in our Gitter channel, where you’re most likely to find one of us being awake and ready for a chat. We also want to hold regular office hours via Google Hangouts once per month in the future; stay tuned for that.
Thanks to the OBF for making us part of this great experience and also to our three awesome students! Again, you can take a look at the final blog posts from the participating students here in order of the projects we proposed:

Our 2015-End-Of-The-Year Newsletter

It’s been a while since we sent out a regular newsletter with the latest updates on what’s happening behind the scenes, and it turns out that quite a bit has happened in 2015:

Financial support
You can now support openSNP on Patreon and pledge a small monthly amount to help us out in running the service.

Even if you just chip in a single dollar per month, it would already help us a lot. Some of you already do, thanks for that! To give this move some context: Since 2011 we – Helge, Philipp and Bastian – have been running openSNP largely from the money we earn in our day jobs. We do this because we believe in running a community project around personal genomics.

But with the growth of openSNP we had to move the project to more and larger servers, which means that we have to spend more money as well. This is why we’re now also taking the last step, marrying open source and open science with crowdfunding. Nearly 20 patrons are already contributing, including one of our larger contributors, Seven Bridges.

Hardware updates
So far everything needed to run openSNP was living on a single machine with slow spinning disks — this became more and more unsatisfactory the larger the database grew. We compartmentalized the architecture and moved the different parts to their own virtual servers.

The fact that the new servers are backed by SSD storage brought the most significant improvement in terms of performance, specifically for the database. The background jobs got their own machine as well, so they don’t interfere with other applications by eating up all the resources.

In order to be more flexible in the future, the Rails app is now running within Docker containers, so it can be moved around with less effort if needed. If you are interested in joining us in running openSNP: Feel free to get in touch through GitHub.

Phenotype Request: SAT scores
The genetic basis for intelligence has been a hot topic for ages by now. For a primer on the topic you can watch this video. While visiting China and Hong Kong, we met up with Laurent Tellier, who is working at the BGI in Shenzhen. His research is into the genetic basis of intelligence and uses SAT scores as a quantitative proxy. So if you’ve taken the SATs and are willing to share, please help him by entering your variation for those phenotypes:

Phenotype 1
Phenotype 2
Phenotype 3
Phenotype 4

With this we wish all of you happy holidays and some fun science in 2016!

Philipp, Bastian, Lore and Helge for the openSNP team

Support openSNP on Patreon

tl;dr: As openSNP has grown so much we will now really have to upgrade our servers – they’re already straining. If you want to chip in, you can support us with a small monthly contribution on Patreon!

Since our last post in April we’ve grown quite a bit: right now we’re hosting ~2100 genetic data sets. Those are largely 23andMe/Ancestry/FamilyTreeDNA genotypings with a few exome data sets. In total this translates to around 1.5 billion genetic variants stored! Add to this all the phenotypes stored in openSNP and we are talking about a substantial amount of data. While we are very happy with the growth of openSNP, it also means that our small server, which is running all of the project, is hitting its limits right now. You will have noticed that using the website can be tedious at times, to the point of your connection even timing out. This is largely because the web server is running on the same machine as the backend, and processing new uploads and maintaining the database backend consumes lots of resources.

To get around those limitations we will have to upgrade our technical infrastructure. Unfortunately doing so will not come for free. So far we have been running openSNP mainly on our own money that we make in our day jobs. This means we are somewhat limited in how much we can shell out in order to keep openSNP running. As the huge amount of data in openSNP already shows the power of crowdsourcing, we are now reaching out to you!

Using Patreon you can support us with as little as $1 per month. We estimate that with around $100 to $150 per month we could substantially improve the speed of openSNP, switching to servers with more RAM, more disk space and faster CPUs. Thanks a lot to those who have become our patrons already. And as usual: Feel free to get in touch any time you have further questions regarding the project!

We would like to again thank all volunteers who’ve made this possible. You rock!

#thedress, ethics for participant-led research, research hack days

It’s been a little less than a year since we reached out to you. Right now our ± yearly newsletter is going out and there are some things that might interest you as well: Since the last newsletter openSNP has grown in size quite drastically. Our database now holds over 1700 data sets! Thanks to everyone who made this possible! And today we have a couple of topics to present:


You might have followed the media buzz around #TheDress, the viral phenomenon that started around the end of February. It led to one of the new big questions of our time: Is the dress black & blue or is it white & gold? Lots of scientific explanations have been put forward and of course, genetics has also been thrown into the mix in order to explain why people so vehemently disagree about it.

23andMe recently started to investigate this question amongst their user base and thanks to Michael Whitehead you can now also contribute on openSNP. So if you want to participate in research about the genetics of #TheDress you can enter this phenotype.

Mike briefly spoke to us about his motivation to dive into the topic:

“We know that there are numerous elements of colour vision that are influenced by genetics. Red/green colour blindness is the most famous, but there are a variety of heritable colour-blindness phenotypes, as well as tetrachromats who can see a richer palette of colours than the rest of us. It is therefore conceivable that variation in our perception of the dress is influenced by underlying variation in our genes.”

23andMe already has some first results in, but having an open data source for this would still be fun. So please, help us out. So far it looks like team blue/black is taking the lead.


The people running are organising research hack days on June 5th & 6th at the ETH in Zurich and at the FORS in Lausanne. Participating in those days is free and genetic data will also play a role, with our own Bastian attending in Zurich. You can find more information on this on their websites.


Performing open, participant-led research is not only exciting for the science that is done, but also for a whole lot of ethical questions it brings up. The workshop “Towards developing best practice for ethical participant-led health research” asked what a new social contract for this kind of research should include. The results of this workshop are now published as an open access article in the Journal of Medical Ethics.


Helge Rausch has kindly updated openSNP’s code base to Ruby on Rails 4.2, so you might experience some problems on the site, as this update was only just done. As usual, if you encounter any problems, please send us a mail or just reply to this mail.


1000 genotypings, and 3 years of openSNP!

In 2011, someone asked me how many genotypes I personally would expect users to upload, and if I remember correctly, I said 30. That was quite the understatement: Just a few days ago on the 30th of May, openSNP received its 1000th genotyping!

On this happy occasion we thank the users and participants for their trust in the project and their continued support and interest.

Since 2011, people have used openSNP in research, art, and their own projects, have written additional software to interact with openSNP’s API, written great comments, and much more. We published a paper on openSNP a few months ago, which is for most of us the first publication in our careers.

openSNP has come a long way, here are the first three commits from June 2011, with Basti’s oldest on the bottom:


That was nearly exactly 3 years ago! Since then we’ve (among other things) learned to write proper commit messages.

So what does the future hold for openSNP?

  1. A better server: We recently received a grant from Bayer HealthCare that lets us move to bigger servers, so the site should load and react much faster. Maybe we can even start hosting bigger datasets?
  2. Pre-given phenotypings: One of the biggest problems with openSNP’s data is the large amount of variation in how users enter phenotypes, which means researchers who want to work with the data still have some manual cleaning to do. We’ve prepared a set of phenotypes for which users can only choose their variation from a fixed list; this should greatly reduce the time before researchers can start working on the data.
  3. Faster parsing: we’ve replaced the Ruby-based genotyping parser with a 99% complete implementation written in Go. So far, it’s much much faster, but still only marginally tested.
  4. A variety of smaller things: A stats-page, bug-fixes, genosets, etc. – have a look at the Issues page here.
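
To illustrate why the pre-given phenotypes in item 2 should help, here is a minimal Python sketch. The phenotype entries and the list of allowed variations are invented for illustration, and the function names are hypothetical; this is not how openSNP itself implements the feature.

```python
from collections import Counter

# Invented free-text entries for one phenotype, as users might type them.
FREE_TEXT = ["Blue", "blue ", "BLUE", "blue-grey", "brown", "Brown"]

# Hypothetical fixed set of variations offered for the same phenotype.
ALLOWED = ("blue", "brown", "green", "other")


def count_raw(entries):
    """Tally variations exactly as typed, without any cleaning."""
    return Counter(entries)


def validate_choice(entry, allowed=ALLOWED):
    """Accept an entry only if it is one of the pre-given variations."""
    if entry not in allowed:
        raise ValueError(f"{entry!r} is not one of {allowed}")
    return entry


# Six entries become six separate "variations" until someone cleans them up.
print(len(count_raw(FREE_TEXT)))
```

With a fixed list, the cleaning step disappears because every stored value is already one of a handful of known strings.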

We thank you for your continued support and interest, and here’s to many more years! If you know of any other project that uses openSNP data, feel free to post it in the comments!

Crowdfunding DTC Microbiomics & Proteomics

We hope that you all had nice holidays and made it safely into the new year. You may already be tired of all the donation campaigns which are frequent in this season and don’t worry, we won’t ask you for money for openSNP. We do however have some ideas on what you could spend your Christmas bonus on: Crowdfunding startups and products became en vogue in 2012 and the whole Quantified Self and Personal Genomics movement fortunately has jumped on the bandwagon as well. Right now there are some projects looking for funding on Indiegogo. For example uBiome, which wants to provide insights into the bacteria living inside you, and Talking20, which wants to do for your proteins what personal genotyping companies did for your genome.
We feel that those projects are picking up where the genotyping and genome/exome sequencing efforts of other companies are currently ending and thus might be of interest for you as well (I mean: Who wouldn’t like to get their microbiome sequenced and have regular metabolite tests? Right?). And of course, we hope that we can include sharing capabilities for willing participants of those projects in future versions of openSNP. We have reached out to both projects to get some more information about their ideas and plans for possible collaborations. Before we let the projects speak for themselves, just a standard disclaimer: We’re not involved with any of those projects and are not making any money out of this. We just like the idea of people having more biological data on themselves.


Hello, openSNP Community! I’m a PhD student at Oxford, and together with my co-founders from the University of California, San Francisco, I am honored to work with uBiome, the world’s first citizen science effort to map the human microbiome.

The project is funded through crowdfunding. Each gut kit is $69; gut and mouth kits are $139 for both (+$12 shipping outside the United States for each). We plan to send out the kits in May 2012 and return the results on our website once we get the kits back from everyone. Our kits are available in 196 countries. Data will be freely available to those who sign up, and available to the global research community on an “opt-in” basis. In the long run, we hope to integrate with 23andMe and other types of genetic and metagenetic data.

So far, our project has garnered over $55,000 in crowdfunding from over 480 participants in less than a month. Participants from twenty different countries spanning four continents have pledged their support, including the United States, United Kingdom, Australia, New Zealand, Canada, Finland, France, Germany, as well as India, Singapore, and Uruguay. We’ve been featured so far in Wired, Venture Beat, the Los Angeles Times, Scientific American, BoingBoing, and syndicated in 160 newspapers around the world through the Associated Press.

What is the microbiome, you ask?

The microbiome consists of the bacteria that live on and within us. It sounds kind of funny, but all of us are actually covered in helpful germs (or co-evolved symbionts if you prefer).

Like the rainforest, the healthy human microbiome is a balanced ecosystem. The correct balance of microbes serves to keep potential pathogens in check and regulate the immune system. Microbes also perform essential functions such as digesting food and synthesizing vitamins. Some research also suggests that microbial activity influences mammalian mood and behavior. Studies have linked microbiome imbalance to autism, depression, and anxiety, as well as many gut disorders, eczema, and chronic sinusitis. Infant health even appears to benefit from a proper seeding of microbes at birth, with health consequences ranging into adolescence. For some future-thinking commentary on the microbiome, check out this interesting editorial in Science by Leroy Hood.

uBiome brings this cutting edge technology directly to consumers for the first time through citizen science.

We provide participants with a catalogue of their own microbes, detailing the microbial composition of the body and explaining what is known about each genus of microbe. In addition, uBiome compares participants’ microbiomes with scientific studies on the role of the microbiome in health, diet and lifestyle. uBiome also provides personal analysis tools and data viewers so that users can anonymously compare their own data with crowd data as well as with the latest scientific research.

From a small sample on a cotton swab, your uBiome test helps you to learn more about your body, including:

  • Diet: Certain gut enterotypes are strongly associated with long-term diets, particularly protein and animal fat (Bacteroides) versus carbohydrates (Prevotella). Maybe you are not sticking to your diet as much as you think.
  • Diabetes: Does your gut microflora correlate with people who have diabetes? If you have other symptoms as well, you might want to talk with your doctor.
  • Sinusitis: Is your nasal microbiome associated with the profile of chronic sinusitis? Some studies have found that multiple, phylogenetically distinct lactic acid bacteria were depleted concomitant with an increase in the relative abundance of a single species, Corynebacterium tuberculostearicum.
  • Alcohol consumption: Do you drink a lot of alcohol? If your gut profile clusters with heavy drinkers, you might want to consider cutting back on the booze.
  • Bowel conditions: Do you have Irritable Bowel Disorder (or any other bowel condition)? You may want to purchase our specially designed kit and survey for bowel disorders.

Please join us and help us spread the word about this project. The more people that contribute, the more we can all learn about our health, and contribute to the advance of SCIENCE!


Talking20 is a simple blood test that can be ordered online and received in the mail. It’s quick and easy! Results are delivered online in a secure profile, and can even be viewed from a smartphone. We are ready to gather data and design experiments that will help us help you learn more about what’s going on inside your body!

Get involved with the Talking20 project by ordering a basic kit and sending it in. Tracking your biological data will help you view changes taking place inside your body, and it will help us design new tests that are useful to everyone. Want to know what eating a burger does to your cholesterol? Interested in tracking stress reduction over the course of a month-long yoga course? We are and we need your help. Your body is talking to you, let’s find out what it’s saying!

This is how Talking20 is measuring your metabolites:

Talking20 is taking proven technologies: (1) dried blood spot collection (done for every newborn in developed countries) and (2) mass spectrometry (used in labs everywhere), and innovatively bringing them together. We believe that actionable information about our bodies is available by looking at our proteins and metabolites. That’s where we need to be looking. Danny Hillis of Applied Minds does an amazing job of explaining this concept in his talk at TEDMED.

Mass spectrometry is a very sensitive method for counting molecules from blood, and is actually used as the ‘gold standard’ to set up hospital tests. With a bit of extra work, mass spec can also be used to measure every other molecule too. Without visiting a lab or even leaving your home, you can now just put a few drops of your blood on a card and mail it to us at room temperature from anywhere in the world! We can then do the analysis and tell you your results wherever you are!

Talking20’s commitment is to make this testing as economical as possible, so you can do it whenever you want. Heather has personally tracked her hormones and cholesterol every few days to learn more about her own life events and diet. Talking20 is also on the verge of completing a cortisol test so we can start watching stress levels in response to different types of exercise!

At Talking20, we want to make many different tests available, so you can start finding out about what is happening inside you. We are offering five of these ‘biomarkers’, to measure your cholesterol, vitamin D, estradiol (estrogen), progesterone, and testosterone. This would normally require you to go to the doctor, then to a lab to get a blood draw, and then back to the doctor, and would likely cost $300-500! We believe we can do better!

If you want to support them you should visit their Indiegogo page.

Now included: FitBit-data (plus minor design changes)

Hi there,

Basti has been hard at work linking the Fitbit-API to openSNP – if you’re a customer of Fitbit, you can now connect your Fitbit-data with your openSNP-profile to give researchers around the world an even better picture of how your genes interact with your health.

The new Fitbit-phenotype-overview – still kind of empty!

Why this data?

There are several SNPs associated with an increased risk of developing obesity; SNPedia has a very good overview here. For example, some variations are involved in how early your body starts to store fat, or how strong your appetite is.

Carrying these variants, however, does not mean that there’s nothing you can do about your weight – there are some studies showing that regular exercise helps alleviate these effects (see here, for example). This is where the Fitbit-data comes in – it gives researchers a detailed overview of your movement-patterns, the status of your body-weight and your sleep patterns. Using this data scientists can then perform association-studies by comparing weight and activity-data to known and unknown SNPs linked to obesity. You can also link your sleep-data (if you are tracking it) – this enables research into the genetics behind some sleep disorders.

Another reason why it’s great to collect data through a technical device instead of through surveys is that all data which comes through the Fitbit-API is normalized into standardized units. This makes it much easier to compare the data, as you don’t have to convert between units (metric & imperial system) or work around spelling mistakes etc., and it also circumvents the cognitive biases we all have while answering questions about ourselves. And of course it’s also more convenient for you, because you don’t have to update your BMI and activity-pages manually, as we will automatically get your latest Fitbit-data. This means there is one profile less you have to worry about!

As a bonus, the level of detail in this data is a level researchers usually can’t reach due to time and financial constraints.

Basti’s page

How to link your data

Linking a Fitbit-account to openSNP is a matter of just a few clicks – once you’re logged in, click on My Account, then on Settings and under the Fitbit-tab you’ll be able to go through the procedure. Alternatively, here’s the direct link to the page. Of course, you can specify which data is going to be shared with openSNP – we don’t automatically take all of it, because that wouldn’t be very nice of us. After you’ve approved that you want to link Fitbit and openSNP you will get redirected to a page where you can choose which categories should be shared and mirrored on openSNP. And yes, you have to submit the form which selects the categories you want to share; until you’ve done so we will not grab any data.

Note that: a) the data at openSNP is automatically updated with new data from Fitbit along with all the data you have put into Fitbit so far and b) you can always unlink your Fitbit account from openSNP in the settings, which also deletes your Fitbit-data on our end.

This is where we hid the bodies the link

For researchers

If you’re interested in this data, it is now available in all data-dumps in CSV-format for easy parsing (available on the genotypes-page, or directly here). Please note that the CSV lists 0 (that is zero) for missing data-points.
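
Because missing data-points are written as 0, it is worth converting them before computing any statistics. The following is a minimal Python sketch of one way to do that; the column names and the sample rows are made up for illustration, and the real dump’s layout may differ.

```python
import csv
import io


def read_fitbit_rows(handle, numeric_columns):
    """Parse CSV rows, turning 0 in the given columns into None (missing)."""
    rows = []
    for row in csv.DictReader(handle):
        for column in numeric_columns:
            value = float(row[column])
            row[column] = None if value == 0 else value
        rows.append(row)
    return rows


def mean_ignoring_missing(rows, column):
    """Average a column over the rows where a value is present."""
    values = [row[column] for row in rows if row[column] is not None]
    return sum(values) / len(values) if values else None


# Made-up sample in a hypothetical layout (columns are assumptions).
SAMPLE = (
    "date,steps,weight\n"
    "2012-10-01,8000,70.5\n"
    "2012-10-02,0,0\n"
    "2012-10-03,12000,70.1\n"
)

rows = read_fitbit_rows(io.StringIO(SAMPLE), ["steps", "weight"])
print(mean_ignoring_missing(rows, "steps"))
```

One caveat of this convention is that a genuine zero (for example a day with no recorded steps) cannot be told apart from a missing value, so the sketch treats both as missing.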

Slight re-design

We’ve also slightly updated the design of openSNP to the newest Bootstrap-version. You shouldn’t have been affected by that, except that things now look a bit nicer (for example, take a look at the SNP-pages, which now show pie charts for the allele & genotype-frequencies), but if you find anything wrong, please tell us in the comments!

Thanks for reading and have a wonderful weekend,

the openSNP-team


New API methods & Vagrant images

Additions to the API

We can announce some minor additions to our API:

  1. You can now grab annotations for a SNP by using$SNP_NAME.json. For example returns all Mendeley-, PLOS- and SNPedia-annotations we have for Rs7903146.
  2. You can now get additional information for phenotypes. You can get a list of all phenotypes by visiting You can use this call to find out the IDs of the phenotypes you are interested in. 
  3. And you can get all phenotypic variation and the description for a given phenotype by visiting$PHENOTYPE_ID.json. For example gives you the variation for each user at the phenotype with the ID 12.
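
The three calls above share one pattern: an endpoint prefix followed by an identifier and `.json`. Here is a minimal Python sketch of a client built on that pattern. The endpoint prefixes are not reproduced here, so the sketch takes the prefix as a parameter to be copied from the API documentation; `build_url`, `fetch_json` and the `example.org` address in the comment are hypothetical and only stand in for the real values.

```python
import json
from urllib.request import urlopen


def build_url(prefix, identifier):
    """Append '<identifier>.json' to an endpoint prefix from the API docs.

    Assumes the prefix and identifier are joined by a single slash.
    """
    return f"{prefix.rstrip('/')}/{identifier}.json"


def fetch_json(url, timeout=10):
    """Download a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (placeholder prefix, not the real endpoint):
#   url = build_url("https://example.org/annotation-endpoint", "rs7903146")
#   annotations = fetch_json(url)
# The same helper works for a phenotype ID such as 12.
```

Keeping the prefix as a parameter means the same two functions cover the annotation call and the phenotype call without hard-coding either address.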

We also added some more extensive documentation of the API to our Wiki at GitHub. The wiki also lists the attributes each call will return.

Vagrant Image

We compiled two Vagrant-images which should make running and developing the openSNP source much easier. Installing Ruby, Redis, PostgreSQL, Java, and all the right versions of the Ruby gems can be painful and often takes a significant amount of time – so we created the Vagrant-image, which is the whole server-runtime with all gems pre-installed inside a virtual environment for easy development. This Railscast gives you a nice idea of how to install and use Vagrant.

Our Vagrant-images (downloadable as 32 bit version and 64 bit version) are based on Lucid64 and Lucid32 and come with the Ruby version manager rbenv, Ruby 1.9.2 including bundler, Sun’s Java and Postgres 8.4 (the development-tables are already migrated) pre-installed. Getting the development-server running should now be really easy:

  1. Install the image using vagrant box add opensnp
  2. Initialize the image inside your openSNP-folder using vagrant init opensnp
  3. Run vagrant up to start your image and afterwards connect to it using vagrant ssh
  4. Go to the mounted directory using cd /vagrant
  5. Run bundle to see if all gems are working as expected and try bundle exec rails s to start the web-server

You can watch this Railscast to learn more about how you can put Vagrant to use and download our Vagrant-image as 32bit version or 64bit version. One caveat: You need to delete the .rvmrc file if you want to run vagrant from inside the directory which has your copy of the openSNP-source code. The file can be found in the root-folder of the openSNP-source code. If you keep the file your copy of RVM might lead to some problems with vagrant. Let us know if you encounter any problems or need help.

Recent Talks – re:publica & SIGINT

In the last post Philipp already told you that we’ve been giving talks on Personal Genetics and a possible future of Biomedicine over the last months. In May Fabian and I visited the re:publica in Berlin and the SIGINT in Cologne. Now the recordings of those conferences are available. So if you couldn’t make it to Berlin and/or Cologne you can watch those talks.

Unfortunately the talk on The Future of Genetics, which we gave at the re:publica, wasn’t recorded, but you can read and download our slides of the talk. At the same conference I was also part of a (German, sorry rest of the world!) panel discussion called “Die totale Selbstkontrolle als Wunsch und nicht Bedrohung” (total self-control as a wish rather than a threat), which mainly focussed on the Quantified Self-movement, but also covered a bit of personal genetics.

At the SIGINT we talked about Power to the Patient and how modern technology along with the web, think of personal health records, could be put to use to change how medicine is done. You can again download the slides or, as this talk was also recorded, watch the video.

Thanks to all of you who joined the conferences, listened to our talks and – most importantly – took the time to discuss the topics with us.

Milestone reached: 200 genotypings!

We’re happy to announce that openSNP has reached the magnificent number of 200 open genotypings! It’s great that you guys support open science so much; we didn’t expect to reach this number so fast. openSNP has been growing constantly, and here are some interesting stats:

Here’s the amount of data downloaded every day over the last 6 months, with the days on the x-axis and the traffic in megabytes on the y-axis:


As you can see, the amount of data jumps around wildly – that’s because the data-dump with all genotypings is fairly large, and there are lots of days in there where no-one downloaded any genotypings, and some days where a lot of people download data – nothing much in between!

And for fun, here’s the number of daily attacks on the server (usually automated scripts that look for open administrator-interfaces):

What happened there in May? Two articles were released on openSNP and Basti and Fabian held two talks (at the SIGINT and at the re:publica), so that increased positive and negative attention.

Some things that are coming soon:

– picture-uploads for phenotypes

It would be great if users could upload pictures of their phenotypes and variations, for example, the shape of their heads or the form and color of their eyes. We’re currently working on this, but it might take a while longer.

– more secure backend

In light of the recent attacks on LinkedIn and others, we’ve switched the password-storing system from salted SHA512 to salted bcrypt-hashes. If you’re interested in how that works, check out this article, Keeping passwords safe by staying up to date.
We’re currently testing how well the implementation works, but it’s looking good and is coming very soon. You’ll probably not notice any changes.
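
How existing hashes get moved over is a separate question. One common approach, sketched below in Python, is to upgrade each record the next time its owner logs in successfully, which would also explain why users notice nothing. This is an assumption about how such a switch can be done, not a description of openSNP’s implementation. To stay within the standard library the sketch uses PBKDF2 as a stand-in for bcrypt, and the record layout and function names are hypothetical.

```python
import hashlib
import hmac
import os


def sha512_hash(password, salt):
    """Legacy scheme: a single fast salted SHA-512 digest."""
    return hashlib.sha512(salt + password.encode("utf-8")).hexdigest()


def slow_hash(password, salt, iterations=200_000):
    """Stand-in for bcrypt: PBKDF2-HMAC-SHA256, deliberately slow."""
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return digest.hex()


def login(record, password):
    """Verify a password and upgrade a legacy record on success."""
    if record["scheme"] == "sha512":
        expected = sha512_hash(password, record["salt"])
    else:
        expected = slow_hash(password, record["salt"])
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected, record["hash"]):
        return False
    if record["scheme"] == "sha512":
        # The plain password is only available now, so re-hash it here.
        record["salt"] = os.urandom(16)
        record["hash"] = slow_hash(password, record["salt"])
        record["scheme"] = "slow"
    return True
```

The point of the slower function is that every guess costs an attacker far more work than a single SHA-512 round, while a legitimate login pays that cost only once.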

– a couple more talks

We’ve submitted some more applications to talk at different conferences but haven’t heard back from any. If we go and hold a talk somewhere, we’ll link to the videos!

Thank you for your interest & help and time, we couldn’t have done it without you!

The openSNP-team

