Author Archives: Bastian

Videos on openSNP & DAS-support

Another thing we have been procrastinating for far too long: Creating videos on the idea of openSNP and some screencasts that show how you can use openSNP to enter data about yourself and how you can get the data out again for your own research. So here we go. The first video is a small “self-interview” I did to tell you a bit how we started, what you should keep in mind privacy-wise before starting to use openSNP etc.

The next video shows you how you can use the openSNP-frontend to enter your phenotype-data, what kind of information the individual SNP-pages can show you and how you can subscribe to the openSNP-RSS-feeds to be notified about the latest genotyping-files etc. A small new feature which is missing from the video as we implemented it after recording the video: The news-page features a tab that includes the latest publications on the SNPs we have in the database. And for all info-junkies: You can also subscribe to the latest publications using RSS.

The last video shows how you can query the APIs which we have implemented. Bonus-Content: This video includes the first preview on how you can use the Distributed Annotation System to visualize individual genotyping files! In short: You can use http://opensnp.org/das/sources to get a list of all DAS-sources we have with openSNP. Each source represents all SNPs we have of a single user, regardless of how many genotyping files a user has provided.

If you want to use a DAS-source in a genome browser, for example in MyKaryoView, you can use the features-command of DAS. The link for this is http://opensnp.org/das/$user_id/features, where you have to replace $user_id by the ID of the user you are interested in. If you want to query SNPs between chromosomal positions using DAS you can use http://opensnp.org/das/$user_id/features?segment=$chromosome_name:start,stop. So http://opensnp.org/das/1/features?segment=1:1,1000000 will give you all my SNPs on Chromosome 1 between position 1 and 1000000.
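The DAS links described above can be put together with a few lines of Python. This is only a minimal sketch: the URL patterns come from this post, while the function and parameter names are made up for illustration and are not part of openSNP.

```python
from typing import Optional

# Base URL as given in the post; everything else here is illustrative.
DAS_BASE = "http://opensnp.org/das"


def das_sources_url() -> str:
    """URL that lists all DAS-sources (one source per user)."""
    return f"{DAS_BASE}/sources"


def das_features_url(user_id: int,
                     chromosome: Optional[str] = None,
                     start: Optional[int] = None,
                     stop: Optional[int] = None) -> str:
    """Build the features URL for one user, optionally limited to a segment."""
    url = f"{DAS_BASE}/{user_id}/features"
    if chromosome is None:
        # No segment requested: all SNPs of this user.
        return url
    if start is None or stop is None:
        raise ValueError("a segment needs both start and stop")
    if start > stop:
        raise ValueError("start must not be larger than stop")
    return f"{url}?segment={chromosome}:{start},{stop}"


if __name__ == "__main__":
    # Reproduces the example link from the post.
    print(das_features_url(1, "1", 1, 1000000))
```

The resulting string can be pasted into a DAS-aware genome browser or fetched with any HTTP client.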

If you want to see an example for a visualization using DAS look at the video below. The DAS features are still experimental. I will attend this DAS workshop to get some help with the final implementation, so if you have suggestions: Please let us know!

Enjoy playing around with these features. As usual: Let us know if you find any bugs we have missed!

Some progress on the API: JSON endpoints

Some weeks ago we stated that we are working on implementing the Distributed Annotation System into openSNP. And I’m sorry that I’ll have to announce that we are not finished with this yet. We just underestimated the amount of time it would take to finish this. But to make up for this we just released some JSON (JavaScript Object Notation) endpoints which you can use to get data out of openSNP. JSON can be easily parsed using software and is already widely used, especially in web applications. For a start we added JSON support for the user-index, for genotypes at single SNPs and for all phenotypes of a given user and I’ll briefly discuss how you can access the different JSON endpoints.

Let’s start with the user-index, which can easily be accessed at http://opensnp.org/users.json. This includes the complete list of all openSNP-users and their names, their unique user-IDs and all the genotyping-files (with the unique genotype-IDs and the download-links). We hope that this makes an ideal entry-point if you are looking for genotyping-files and the user-IDs to further query the openSNP-database.

If you want to get genotypes of single or multiple users for a given SNP you can use the JSON endpoint at http://opensnp.org/snps/json/$snpname/$userid.json. Just replace $snpname by the Rs-ID you are interested in and $userid by the unique ID of the user you are interested in. For example: http://opensnp.org/snps/json/rs9939609/1.json gives you my genotype at Rs9939609. If you are interested in the genotypes of multiple users you can concatenate this into a single query by either using commas to provide multiple User-IDs (for example http://opensnp.org/snps/json/rs9939609/1,6,8.json) or by giving a range of user-IDs (for example http://opensnp.org/snps/json/rs9939609/1-8.json).
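As a minimal sketch, the two ways of asking for several users at once (comma-separated IDs or a range) could be wrapped like this. The URL pattern is the one from this post; the helper names are invented for this example.

```python
from typing import Iterable

# URL prefix as given in the post.
SNP_JSON_BASE = "http://opensnp.org/snps/json"


def snp_genotype_url(snp_name: str, user_ids: Iterable[int]) -> str:
    """URL for the genotypes of one or more users, joined by commas."""
    ids = ",".join(str(user_id) for user_id in user_ids)
    if not ids:
        raise ValueError("at least one user ID is needed")
    return f"{SNP_JSON_BASE}/{snp_name}/{ids}.json"


def snp_genotype_range_url(snp_name: str, first_id: int, last_id: int) -> str:
    """URL for the genotypes of a range of user IDs, e.g. 1-8."""
    if first_id > last_id:
        raise ValueError("first_id must not be larger than last_id")
    return f"{SNP_JSON_BASE}/{snp_name}/{first_id}-{last_id}.json"


if __name__ == "__main__":
    print(snp_genotype_url("rs9939609", [1]))
    print(snp_genotype_url("rs9939609", [1, 6, 8]))
    print(snp_genotype_range_url("rs9939609", 1, 8))
```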

Similarly you can access all phenotypic information of a given user by using http://opensnp.org/phenotypes/json/$userid.json. Again: Just replace $userid by the unique ID of the user you are interested in. For example http://opensnp.org/phenotypes/json/1.json gives you all the phenotypic information I have entered about myself so far. Concatenating multiple users into one query works just as for the SNP/User-combinations by using commas (http://opensnp.org/phenotypes/json/1,6,8.json) or ranges (http://opensnp.org/phenotypes/json/1-8.json). In any case: If you request data of users or user/SNP-combinations that don’t exist the JSON-hash you will get back includes the key “error”, just like this.
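A small sketch of how a script might request phenotypes and check for the "error" key. The URLs are taken from this post. The helper names are hypothetical, and the check assumes that the error is reported at the top level of the returned JSON-hash; the exact layout of the response is not described here, so treat that as an assumption.

```python
import json
from typing import Any, Iterable
from urllib.request import urlopen

# URLs as given in the post.
USERS_INDEX_URL = "http://opensnp.org/users.json"
PHENOTYPE_JSON_BASE = "http://opensnp.org/phenotypes/json"


def phenotype_url(user_ids: Iterable[int]) -> str:
    """URL for the phenotypes of one or more users, joined by commas."""
    ids = ",".join(str(user_id) for user_id in user_ids)
    if not ids:
        raise ValueError("at least one user ID is needed")
    return f"{PHENOTYPE_JSON_BASE}/{ids}.json"


def has_error(payload: Any) -> bool:
    """True if the decoded JSON is a hash that contains the key "error".

    Assumption: the key sits at the top level of the response.
    """
    return isinstance(payload, dict) and "error" in payload


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """Download a URL and decode the body as JSON (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    data = fetch_json(phenotype_url([1]))
    if has_error(data):
        print("openSNP reported an error:", data["error"])
    else:
        print(data)
```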

These are all the options our JSON-endpoints support right now. There are no API-keys and no rate limits. We will just see how it turns out and whether any limiting of the access will be necessary in the future. We hope that this makes it easier to reuse the openSNP-data, and maybe you already have some nice ideas for remixes/browser plugins/younameit. If there are JSON-endpoints you need or would like us to add, just let us know. We are currently experimenting with this JSON-stuff and are open to any critique, comments, ideas etc. If you want to help us implement further features in openSNP: Please do so, we are open to everybody who wants to participate. The source code is freely available and there is a Google Group/mailinglist where we discuss bug-fixing, new features etc. So you might want to join us there.

Videos and Slides on the recent talks

A happy new year from the openSNP-team! Philipp and I are back from our talks. If you couldn’t make it to Berlin you can now watch the videos that were recorded during our talks. You can watch the recordings from our talk on crowdsourcing genome-wide association studies at the 28th Chaos Communication Congress at YouTube or in better quality here. If you are interested in our slides you can get them at SlideShare or as LaTeX-sources at GitHub.

For those of you who speak German: You might be interested in our talk on the privacy implications of the coming post-genomics era, which we gave at the 0. Spackeriade. You can watch it on YouTube as well or download the video. Again: The slides can be found at SlideShare or as LaTeX-sources on GitHub.

Thanks to all who helped with the slides, gave us their feedback and of course all of you who approached us after the talks (in real life or via email) and had some ideas for new features. We already started to work on those. Stay tuned to see some changes on openSNP in the next weeks.

Happy Holidays

We have some last news before we leave for our holidays. Let’s start with the biggest news: We were able to secure a little funding through the WissensWert-contest of the German Wikimedia Foundation (sorry, the posting is in German as well). This means that we will have up to 5000 Euros that we can spend to get some more people, who are as in love with sharing as we are, genotyped. We will release more details on this as soon as possible.

Additionally Philipp and I will be in Berlin between 12/27 and 12/31. As we have mentioned before we will give a talk on openSNP and crowd-sourced genome-wide association studies at the 28th Chaos Communication Congress. The talk will be on 12/28 at 11 pm. This talk will be in English and there should be day passes, so if you are in town you can pay us a visit. If you’re able to speak or understand German you can also pay a visit to the 0th Spackeriade which takes place on 12/29. We will talk about the implications of the post-genomics-era on privacy.

Thanks again for all your support, for voting for openSNP in the different contests we have entered, for sharing your data with us, for finding bugs, for spreading the word. Have some nice holidays and maybe we’ll see some of you for a beer in Berlin.

Results of the Binary Battle

We are happy to announce that we won the Mendeley / PLoS Binary Battle. This came unexpectedly, although we worked hard to achieve it. As we see openSNP as a community-driven project, we really want to thank all of you: For voting for openSNP in the final round of the Battle. For sharing your data. For finding bugs. For your critique. For your ideas and feature suggestions. For all of your support. This is a great source of motivation, especially if I think of implementing all the upcoming feature-ideas we have.

We also want to send our congratulations to PaperCritic, which came in second, and rOpenSci, which is the second runner-up. Both are definitely worth a look (as are all the other entries of the Binary Battle). It is really great to see what creative minds can build with open data and open APIs.

As Philipp is currently writing his master’s thesis (and I’m also studying for the last exam of this year) there hasn’t been much new in terms of features in the last weeks. But this should end in a week or two and we already have some plans. And we are also applying for some small funding via the German Wikimedia Foundation and their WissensWert-contest, which funds projects that support open knowledge. We are trying to get the funding in order to genotype more people who are into sharing their data (and may lack the financial resources to get it done). This could lead to some more data sets on openSNP.

Thanks a lot! And if you have any questions: Just contact us, we are really looking forward to getting in touch with you.

You can vote for us

The Mendeley/PLoS Binary Battle now also features a public vote where you can vote for the Top 10+1 submissions. The result of the public vote will count as one point to be added to the expert judges’ votes.

If you want to help us, give us your vote and spread the word. Thanks a lot!

First Results of the Survey on Sharing Genetic Information

General Information

We have finally taken the time to analyze some of the results of the survey on sharing genetic information we did before we started working on openSNP.

Some general information: Overall 229 people participated in this survey. About 25% of participants gave their chromosomal sex as XX, 74% as XY and there are no differences in terms of usage of DTC-companies between those groups. The mean age of the participants is ~33, the youngest being 15, the oldest being 70. Over 80% of participants gave their ethnicity as caucasian.

Nearly 40% of all participants have already used a DTC-company to get themselves genotyped, a further 30% plan to do so while 30% don’t plan to get genotyped. This high number of participants that got themselves genotyped seems to be the result of the ways we spread the survey: We posted it at the 23andMe-community, sent it to the DIYBio-mailing list and some bloggers out of the fields of genetics/personal genomics also wrote posts on the survey (again: Thanks a lot for your support). We also spread the survey using Twitter, Facebook and Google+. We chose this approach as our goal was not to survey a representative sample, but to assess the demand for a service like openSNP.

68% of all participants said they would agree to share data with their DTC-company, no matter if it shared the data with others, 26% would agree to share, given that the company didn’t distribute the data to others and about 7% were not willing to share at all. No real surprise here: Those who have already been genotyped or are planning to get genotyped are more willing to share than those who don’t plan to. It would be interesting to know if people don’t want to get genotyped because they don’t want to share their data with a company (e.g. because they don’t trust DTC-companies).

General reasons (not) to share

My girlfriend says I'm not allowed to display the mean of scaled answers, but then again, she also objects to the bars having shadows, so I wouldn't listen to her.

We also asked a few questions on why people would or wouldn’t share their data with others. Each question could be answered by making a selection on a five point scale, ranging from 1 (strongly disagree) to 5 (strongly agree). There are quite large differences in reasons why people would like to share. The most agreed upon answer is to help scientists with their work (mean = 4.53, median = 5), followed by personal benefits (mean = 3.64, median = 4) and curiosity (mean = 3.5, median = 4). Over half of all people strongly disagree with personalized advertising as a motivation for sharing data (mean = 1.72, median = 1).

There is less diversity in reasons not to publish the results, although the medians of “fear of discrimination” and targeted advertising show that over half of all participants at least agree on those questions (medians = 4), while the medians of the questions about consequences for close relatives and privacy breaches are in general more neutral (medians = 3).
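To see why a median of 1 on this scale means that at least half of the answers were "strongly disagree", here is a tiny sketch. The answers in it are invented for illustration and are NOT the survey data; the function name is made up as well.

```python
from statistics import mean, median
from typing import Sequence, Tuple


def summarize(answers: Sequence[int]) -> Tuple[float, float, float]:
    """Return mean, median and the share of answers equal to 1."""
    share_of_ones = sum(1 for answer in answers if answer == 1) / len(answers)
    return mean(answers), median(answers), share_of_ones


if __name__ == "__main__":
    # Invented answers on the 1-5 scale, not taken from the survey.
    toy_answers = [1, 1, 1, 2, 5]
    print(summarize(toy_answers))
```

Since 1 is the lowest possible answer, the median can only be 1 if the middle of the sorted answers is still a 1. A single high answer pulls the mean up but leaves the median untouched, which is why both values are reported.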

Differences between customers/non-customers?

We also used an ANOVA and Tukey’s range test to see if there are any differences in agreeing/disagreeing on those questions between survey participants who have already gotten genotyped, those who plan to get genotyped and those who don’t plan to get genotyped. On the topics why people would share their data we found significant differences for the questions regarding helping scientists, having personal benefits and curiosity. Participants who have already gotten genotyped do agree more on those questions, compared to those who don’t plan to get genotyped. For curiosity and for helping scientists this even holds when comparing the don’t plan to-group to the plan to-group, with the latter agreeing more.

Regarding reasons not to share genetic information we find similar results: Those who don’t plan to get genotyped agree significantly more on all four questions, compared to those who have gotten themselves genotyped.
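For readers who have not met an ANOVA before, here is a minimal sketch of the F statistic of a one-way ANOVA in plain Python. It is not the analysis script used for the survey, the three groups are invented toy numbers, and it stops at the F value: the p-value and Tukey’s range test are left out.

```python
from typing import Sequence


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """F statistic of a one-way ANOVA: between-group vs. within-group variance."""
    k = len(groups)
    n_total = sum(len(group) for group in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more values than groups")

    grand_mean = sum(sum(group) for group in groups) / n_total
    ss_between = 0.0  # variation of the group means around the grand mean
    ss_within = 0.0   # variation of the values around their own group mean
    for group in groups:
        group_mean = sum(group) / len(group)
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        ss_within += sum((value - group_mean) ** 2 for value in group)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Note: fails with ZeroDivisionError if all values within each group are identical.
    return ms_between / ms_within


if __name__ == "__main__":
    # Invented answers on the 1-5 scale for three groups, not the survey data.
    genotyped = [4, 5, 5]
    plan_to = [3, 4, 4]
    dont_plan_to = [2, 3, 3]
    print(one_way_anova_f([genotyped, plan_to, dont_plan_to]))
```

A large F value says that the group means differ more than the spread inside the groups would suggest. Which pairs of groups differ is what a post-hoc test such as Tukey’s range test answers.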

Summary (tl;dr)

Although there are no big surprises in those statistics, it is great to get some results regarding our own guesses:

  • People who are already customers of Direct-To-Consumer testing companies (or at least plan on becoming a customer) are more likely to share their data with the company, even if the company allows others to use the data.
  • Customers of DTC testing companies do agree more on questions regarding reasons to share genetic information than those participants who don’t plan to become customers.
  • Those who don’t plan to get genotyped do agree more on topics regarding reasons not to share their data than those who are already genotyped.

It seems that participants who (plan to) get genotyped are feeling more optimistic about the benefits of sharing their data with the DTC company as well as the public and see fewer problems in possible reasons not to share their data with others, compared to those who don’t plan to get themselves genotyped. And the same seems true vice versa, of course: those who do not plan to get themselves genotyped agree more with questions concerning the risks of sharing, while scoring lower on questions concerning the possible benefits of doing so.

It’s too bad that we can’t find out (given the current survey) if this is more than correlation. Do people feel more optimistic and lose some of their fears about sharing their data, after they’ve gotten genotyped? Or do they get themselves genotyped because they feel more optimistic about it in the first place (which seems more likely to me)?

We will explore the data set a bit more in the future. Do you have ideas what things we should take a look at?

Binary Battle, Wissenswert-Contest and planned features

Binary Battle & WissensWert-Contest

We are participating in the Mendeley/PLoS Binary Battle. Over 40 applications that make use of the Mendeley and PLoS APIs were submitted. A selection of 11 submissions will get reviewed by some great judges and those get the chance to win 10,001 US $. We’re happy that we made it into this final selection. But you should also check out the other applications, there are some great tools.

We are also participating in the WissensWert-Contest of the German Wikimedia Foundation. They fund ideas that make use of open licenses and try to support Free Knowledge with up to 5000 €. We applied for the funding to get some people genotyped that would like to make their results freely available, but lack the financial resources to pay for it themselves (this is a thing we quite often encountered). With the money we could get over 30 people genotyped by a DTC-company. Making those results available to the public would provide a great resource for people who are interested in personal genetics.

Features

There is not much new at openSNP in terms of features, but: Currently we are working on implementing the Distributed Annotation System (DAS) into openSNP. DAS is a protocol that has been around for ~10 years and it allows the delivery of genetic information in a way that can be easily reused and makes remixing the data really easy. For example the UCSC Genome Browser and ENSEMBL make heavy use of it to display sequences, along with their annotation (SNPs, genes, diseases etc.). Rafael Jimenez and Manuel Corpas also use the DAS-protocol for their MyKaryoView, which is a genome browser that is meant to be used with genotyping data.

We are also working on adding support for zipped files, but currently Philipp and I are facing a high workload at our universities. If you are interested in helping out and doing some coding in Ruby on Rails: Feel free to do so. All of our code can be found at GitHub and we have a mailing list.

On Crawling Efforts and Requesting Data

Some Statistics

We would love to share some more data on openSNP with all of you and now seems like a good time to do so.

  • Up to now our database stores a total number of 34 977 228 polymorphisms of 39 different users. Those are divided into 1 933 962 different SNPs.
  • Users have entered a total number of 412 phenotypes, split into 28 different categories.
  • Due to the great support of Mendeley (they relaxed the API-limit for us) we already finished crawling all papers on those SNPs we know of from their database. Those add up to 5940 papers. 698 of those are published as Open Access-papers, so they can be freely accessed by everybody.
  • We also finished crawling the SNPedia and were able to find 7760 different pages that contain information on SNPs that we have listed. This includes links to primary literature as well as summaries on the effects of specific SNPs.
  • While we did not finish crawling the Public Library of Science yet (259098 SNPs still need to be checked), we could already find 1135 publications that deal with SNPs listed on openSNP.
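Two numbers that follow directly from the list above are the average number of stored polymorphisms per user and the share of Open Access papers among the Mendeley results. A quick sketch of the arithmetic, using only the figures from the list (the variable names are just for this example):

```python
# Figures copied from the statistics above.
total_polymorphisms = 34_977_228
users = 39
mendeley_papers = 5940
open_access_papers = 698

# Average number of stored polymorphisms per user.
per_user = total_polymorphisms / users

# Fraction of the crawled Mendeley papers that are Open Access.
open_access_share = open_access_papers / mendeley_papers

print(f"{per_user:,.0f} polymorphisms per user on average")
print(f"{open_access_share:.1%} of the Mendeley papers are Open Access")
```

So each user contributes close to 900 000 polymorphisms on average, and a little under 12% of the papers found via Mendeley can be read by everybody.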

On Navigation

All this makes a nice source of information for everyone who is interested in SNPs (and their possible effects), as well as for everybody who likes to play around with personal genomics-data. Today we changed the URL-layout a little to make it a bit easier for those of you who are frequently interested in finding out about a specific SNP:

The old URLs just used the internal database-ID of the SNP to deliver the site you were looking for. So if you were interested in rs7903146 you had to visit http://opensnp.org/snps/445791, which was not that nice, as the URL is not informative and you always had to perform a search on openSNP to find the page of interest.

The new URL-layout uses the name of the SNP, so you can easily visit http://opensnp.org/snps/rs7903146 and find all the information you were looking for. But don’t panic if you bookmarked some of the old URLs, they still work, so you don’t have to change a thing.
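For scripts that link to SNP-pages, the two URL-layouts boil down to the following sketch. The URL patterns are the ones from this post; the function names are invented for illustration.

```python
# URL prefix as given in the post.
SNP_PAGE_BASE = "http://opensnp.org/snps"


def snp_page_url(snp_name: str) -> str:
    """New layout: the page is addressed by the name of the SNP."""
    name = snp_name.strip()
    if not name:
        raise ValueError("the SNP name must not be empty")
    return f"{SNP_PAGE_BASE}/{name}"


def legacy_snp_page_url(database_id: int) -> str:
    """Old layout: the page is addressed by the internal database-ID."""
    return f"{SNP_PAGE_BASE}/{database_id}"


if __name__ == "__main__":
    print(snp_page_url("rs7903146"))
    print(legacy_snp_page_url(445791))
```

With the new layout a link can be built from the Rs-ID alone, without looking up the internal ID first.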

Enjoy playing around!

28c3 Ticket Sale

The presale-dates for the 28c3 have just been announced. Tickets will be sold on these occasions:

  1. Sunday, November 06, 10:00PM CET (UTC+1) (½ of all tickets)
  2. Monday, November 14, 4:00PM CET (UTC+1) (¼ of all tickets)
  3. Tuesday, November 29, 10:00AM CET (UTC+1) (¼ of all tickets)

If you really would like to participate there is a tip: As tickets are always short you shouldn’t wait but be in front of your internet-device as the sale starts. In order to be able to buy yourself a ticket you need an account for the presale-system. The standard fee for tickets is 80€.
