This is awesome!
I'd love to have all SNPs on Wikidata as well, and I started a discussion about this on WikiProject Molecular biology: https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Molecular_biology#Le...
I think this would be amazing, because single nucleotide polymorphisms relate the genes to human diseases and traits, which are currently both on Wikidata.
So for instance, we now have the gene https://www.wikidata.org/wiki/Q18028243 which encodes the protein product https://www.wikidata.org/wiki/Q1738190, and we have the SNP https://www.wikidata.org/wiki/Q18341737 IN that gene, which is implicated in the disease https://www.wikidata.org/wiki/Q5712506.
This way we can get a fuller picture from Wikidata of how changes in genes and gene products relate to the traits and diseases recorded there.
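To make the chain concrete, here is a rough sketch of those four items as plain triples, with a lookup that walks from the gene to the disease via the SNP. The relation labels ("encodes", "located in", "implicated in") and the helper names are placeholders I made up for illustration, not actual Wikidata properties.

```python
# The Q-ids are the items mentioned above; the relation labels are
# hypothetical placeholders, not real Wikidata property names.
TRIPLES = [
    ("Q18028243", "encodes", "Q1738190"),        # gene -> protein product
    ("Q18341737", "located in", "Q18028243"),    # SNP -> gene
    ("Q18341737", "implicated in", "Q5712506"),  # SNP -> disease
]


def objects(subject, relation, triples=TRIPLES):
    """Return every object linked from `subject` by `relation`."""
    return [o for s, r, o in triples if s == subject and r == relation]


def subjects(relation, obj, triples=TRIPLES):
    """Return every subject linked to `obj` by `relation`."""
    return [s for s, r, o in triples if r == relation and o == obj]


def diseases_for_gene(gene, triples=TRIPLES):
    """Follow gene <- SNP -> disease and collect the diseases found."""
    found = []
    for snp in subjects("located in", gene, triples):
        found.extend(objects(snp, "implicated in", triples))
    return found


print(diseases_for_gene("Q18028243"))
```

Without the SNP item in the middle, there would be no path from the gene to the disease at all.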
There are some things I'm really not sure how to handle, however. Each SNP is a *location*, and in a diploid organism each location has two values, each one of four options (A, G, T or C); each combination of values may result in the same protein or a different one. In the case of the Kell antigen system, for example, the rs8176058 location can be either A or G. An A at this location codes for the 'K' antigen (protein), and a G codes for the 'k' antigen. This makes the information hard to represent in a single "table", because the common variations AT the location carry information that needs to be grouped together.
In this case it's simply the presence of an A or a G that determines the gene product, but of course it gets more complicated: we might not know the "value" of A or G individually, and may only have "values" for each genotype (AG, AA or GG) that need to be represented. These genotypes might not always point to a specific gene product either; they may instead point to a qualitative trait ("increased risk of glaucoma") or a quantitative trait ("vision was 0.2 diopters greater on average").
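As a small illustration of the allele-versus-genotype distinction, the sketch below enumerates the unordered diploid genotypes at a location and, for the simple Kell case, looks up which products each genotype encodes. The allele-to-antigen mapping is the one described above; the function names are just mine.

```python
from itertools import combinations_with_replacement

# Allele-to-product mapping for rs8176058, as described in the Kell example.
ALLELE_PRODUCT = {"A": "K antigen", "G": "k antigen"}


def genotypes(alleles):
    """Return every unordered diploid genotype for the given alleles."""
    pairs = combinations_with_replacement(sorted(alleles), 2)
    return ["".join(pair) for pair in pairs]


def products(genotype, allele_product):
    """Return the distinct products encoded by a genotype's two alleles."""
    return sorted({allele_product[allele] for allele in genotype})


for genotype in genotypes(ALLELE_PRODUCT):
    print(genotype, products(genotype, ALLELE_PRODUCT))
```

This only works because each allele here maps to a product on its own. When information is known only per genotype, the lookup would have to be keyed on "AA", "AG" and "GG" instead.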
The two options are:
Create a separate WD item for each "option", i.e. "rs8176058-A" to hold information about variation A at location rs8176058 (or, when information is known about the genotype, "AG genotype on rs8176058").
OR
Allow each option "A" or "AG" to be annotated with various fields. The complication is that each annotation may itself need to be annotated (and I don't think that's possible on WD) if we have multiple pieces of quantitative information associated with one genotype. Hard to say.
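Here is a minimal sketch of how the two options differ in shape, using plain Python data classes rather than anything Wikidata-specific. The class and field names (`Annotation`, `Snp`, `kind`, `value`) are hypothetical, and for simplicity every separate item is labelled in the "rs8176058-A" style.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """One piece of information about a variant (hypothetical shape)."""
    kind: str   # e.g. "gene product" or "trait"
    value: str


@dataclass
class Snp:
    """Second option: one item whose variants each carry annotations."""
    rsid: str
    variants: dict[str, list[Annotation]] = field(default_factory=dict)


def split_into_items(snp):
    """First option: one separate item per allele or genotype."""
    return {
        f"{snp.rsid}-{variant}": list(annotations)
        for variant, annotations in snp.variants.items()
    }


kell = Snp("rs8176058", {
    "A": [Annotation("gene product", "K antigen")],
    "G": [Annotation("gene product", "k antigen")],
})
items = split_into_items(kell)
print(sorted(items))
```

In the nested form the variants stay grouped under their location; in the split form each variant can carry as many annotations as needed, but the grouping has to be rebuilt from the labels.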
To see how this data is represented in table form elsewhere, you can browse the GWAS catalog:
http://www.genome.gov/page.cfm?pageid=26525384&clearquery=1#result_table
Importing that might be a good start. There it looks something like this:
Risk allele: rs1230666-A
Effect: 0.0269 [0.014-0.039] unit increase
Implicated in: Serum thyroid peroxidase antibody levels
p-value: 2 x 10^-8
Reference: Medici M, February 27, 2014, PLoS Genet, "Identification of novel genetic Loci associated with thyroid peroxidase antibodies and clinical thyroid disease."
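If we did import it, the free-text fields would need to be split into typed values first. A rough sketch, assuming the fields always look like the example above (I have not checked the rest of the catalog, so the patterns and function names are only illustrative):

```python
import re


def parse_risk_allele(text):
    """Split 'rs1230666-A' into the SNP id and the risk allele."""
    rsid, allele = text.rsplit("-", 1)
    return rsid, allele


def parse_effect(text):
    """Extract (estimate, lower bound, upper bound) from the effect field."""
    match = re.match(r"\s*([\d.]+)\s*\[([\d.]+)-([\d.]+)\]", text)
    if match is None:
        raise ValueError(f"unrecognised effect: {text!r}")
    return tuple(float(part) for part in match.groups())


def parse_p_value(text):
    """Turn '2 x 10-8' (or '2 x 10^-8') into a float."""
    match = re.fullmatch(r"\s*([\d.]+)\s*x\s*10\^?(-?\d+)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised p-value: {text!r}")
    mantissa, exponent = match.groups()
    return float(f"{mantissa}e{exponent}")


record = {
    "snp": parse_risk_allele("rs1230666-A"),
    "effect": parse_effect(".0269 [0.014-0.039] unit increase"),
    "trait": "Serum thyroid peroxidase antibody levels",
    "p_value": parse_p_value("2 x 10-8"),
}
print(record)
```

Note that the record is attached to the risk allele (rs1230666-A), not to the bare location, which is the same grouping question as above.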
On Fri, Oct 24, 2014 at 1:24 AM, Lydia Pintscher lydia.pintscher@wikimedia.de wrote:
Hey folks :)
Blog post is now available at http://blog.wikimedia.de/2014/10/22/establishing-wikidata-as-the-central-hub... Thanks Benjamin and Andra!
Cheers Lydia
-- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata