I thought folks might like to know that every human gene (according to the United States National Center for Biotechnology Information) now has a representative entity on Wikidata. I hope that these are the seeds for some amazing applications in biology and medicine.
Well done, Andra and ProteinBoxBot!
For example, here is one (of approximately 40,000), called "spinocerebellar ataxia 37": https://www.wikidata.org/wiki/Q18081265
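A minimal Python sketch of how an application might start from one of these items: it builds the JSON export URL for an item ID and reads the English label. It assumes the `Special:EntityData/<QID>.json` endpoint and the `entities/<QID>/labels/en/value` layout of that document; the sample is hand-written to show the shape, not a captured response, and no network request is made.

```python
import json
from typing import Optional

# Assumption: each item's JSON export lives at Special:EntityData/<QID>.json.
ENTITY_DATA_URL = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


def entity_data_url(qid: str) -> str:
    """Build the JSON export URL for an item ID such as 'Q18081265'."""
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not an item ID: {qid!r}")
    return ENTITY_DATA_URL.format(qid=qid)


def english_label(entity_json: str, qid: str) -> Optional[str]:
    """Return the English label from an EntityData JSON document, if present."""
    entity = json.loads(entity_json)["entities"][qid]
    label = entity.get("labels", {}).get("en")
    return label["value"] if label else None


# Offline usage: a hand-trimmed document, not a real API response.
sample = json.dumps(
    {"entities": {"Q18081265": {"labels": {"en": {"language": "en", "value": "spinocerebellar ataxia 37"}}}}}
)
print(entity_data_url("Q18081265"))
print(english_label(sample, "Q18081265"))
```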
-Ben
Wow! That's pretty cool work!
Do you have any plans to keep the data fresh?
On Mon Oct 06 2014 at 1:22:12 PM Benjamin Good ben.mcgee.good@gmail.com wrote:
-Ben _______________________________________________ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Yes, we plan to keep the data fresh. We are currently looking into optimising the bot to increase its throughput, and into ways to automate the update process.
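One common way to automate updates, sketched here under assumptions rather than as the bot's actual design, is to compare each upstream record with what the item currently holds and write only the fields that differ, so unchanged items cost no edits. The record keys and field names below are hypothetical.

```python
from typing import Dict

Record = Dict[str, str]


def changed_fields(source: Record, current: Record) -> Record:
    """Fields whose upstream value is missing from, or differs on, the item."""
    return {key: value for key, value in source.items() if current.get(key) != value}


def plan_updates(upstream: Dict[str, Record], on_wiki: Dict[str, Record]) -> Dict[str, Record]:
    """Map item key -> fields to write; items already in sync are left out."""
    plan = {}
    for key, source in upstream.items():
        diff = changed_fields(source, on_wiki.get(key, {}))
        if diff:
            plan[key] = diff
    return plan


# Hypothetical records: keys and field names are illustrative only.
upstream = {
    "gene-1": {"symbol": "GENE1", "chromosome": "7"},
    "gene-2": {"symbol": "GENE2", "chromosome": "X"},
}
on_wiki = {
    "gene-1": {"symbol": "GENE1", "chromosome": "7"},
    "gene-2": {"symbol": "OLDNAME", "chromosome": "X"},
}
print(plan_updates(upstream, on_wiki))
```

A scheduler would then only need to run this comparison periodically and hand the resulting plan to the editing code.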
On Mon, Oct 6, 2014 at 10:23 PM, Denny Vrandečić vrandecic@google.com wrote:
Wow! That's pretty cool work!
Do you have any plans to keep the data fresh?
Yes, indeed we do. Andra is working on a scheduler for the update bot now. (And thanks to the tips about the wbeditentity function, it should go faster now...)
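For readers unfamiliar with it, `wbeditentity` is the Wikibase API action that can set labels and several statements on an item in a single request, rather than one request per statement, which is presumably where the speed-up comes from. The sketch below only builds the URL-encoded POST body; login, fetching the CSRF token and sending the request are left out. The property IDs (P351, P353), the sandbox item Q4115189 and all values are illustrative assumptions.

```python
import json
from urllib.parse import parse_qs, urlencode


def string_claim(prop: str, value: str) -> dict:
    """A new statement in the JSON shape wbeditentity accepts for string-valued snaks."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": prop,
            "datavalue": {"value": value, "type": "string"},
        },
        "type": "statement",
        "rank": "normal",
    }


def wbeditentity_body(qid: str, label_en: str, claims: dict, token: str, summary: str) -> str:
    """URL-encoded POST body that sets a label and several claims in one request."""
    data = {
        "labels": {"en": {"language": "en", "value": label_en}},
        "claims": [string_claim(prop, value) for prop, value in claims.items()],
    }
    return urlencode({
        "action": "wbeditentity",
        "id": qid,
        "data": json.dumps(data),
        "token": token,  # CSRF token, obtained separately after logging in
        "summary": summary,
        "bot": 1,
        "format": "json",
    })


# Hypothetical values; P351/P353 are assumed gene ID / gene symbol properties.
body = wbeditentity_body("Q4115189", "example gene", {"P351": "0000", "P353": "EXAMPLE1"}, "TOKEN", "bot test")
fields = parse_qs(body)
payload = json.loads(fields["data"][0])
print(fields["action"][0], len(payload["claims"]))
```

Because these statements carry no `id`, each call adds new statements; a real update bot would need to match existing statements and reuse their IDs to avoid duplicates.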
We have a grant to expand the scope of the original Gene Wiki project (https://en.wikipedia.org/wiki/Gene_Wiki) to a) use Wikidata as its structural foundation and b) expand into other areas such as genes and diseases.
Looking forward to continuing to work with the community to make Wikidata a key hub of open biological and medical information on the Web.
-Ben
Hey Ben and Andra,
On Mon, Oct 6, 2014 at 10:30 PM, Benjamin Good ben.mcgee.good@gmail.com wrote:
Yes indeed we do. Andra is working on a scheduler for the update bot now. (And thanks to the tips about wbeditentity function it should go faster now...)
We have a grant to expand the scope of the original gene wiki project (https://en.wikipedia.org/wiki/Gene_Wiki) to a) use wikidata as its structural foundation and b) expand into other areas such as genes and diseases.
Looking forward to continuing to work with the community to make wikidata a key hub of open biological and medical information on the Web.
That's really great news! Would you be willing to write a blog post about what you did, your plans for the future and why this is awesome? Jens can work with you on getting it published on our blog. We've run a few lately, and your project sounds like it should get a bit more exposure ;-) http://blog.wikimedia.de/tag/wikidata/
Cheers Lydia
Would be happy to. Let me know the suggested length and how to get it over to you.
Thanks, -Ben
On Mon, Oct 6, 2014 at 1:40 PM, Lydia Pintscher <lydia.pintscher@wikimedia.de> wrote:
-- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata
Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
On Mon, Oct 6, 2014 at 10:48 PM, Benjamin Good ben.mcgee.good@gmail.com wrote:
Would be happy to. Let me know suggested size and how to get it over to you.
Sweet. As long as you want, but don't write a novel ;-) Pictures are a plus. One or two pages is a good goal. Just send it to Jens and CC me, and we'll get it published ASAP.
Cheers Lydia
Sounds good. Will do.
On Mon, Oct 6, 2014 at 1:53 PM, Lydia Pintscher <lydia.pintscher@wikimedia.de> wrote:
Cheers Lydia
This is absolutely awesome! Congratulations on your work!
L.
Andra, Chinmay, Ben, Andrew,
Kudos! This is a significant milestone, and showcases Wikidata's potential for structuring large sets of biological data. Thanks for your excellent work!
Cheers, Eric
https://www.wikidata.org/wiki/User:Emw
And thanks to you for your help, Eric! We still have a long way to go with this, but I do think it's an important milestone.
-Ben
On Oct 7, 2014, at 5:52 AM, Emw emw.wiki@gmail.com wrote:
Cheers, Eric
https://www.wikidata.org/wiki/User:Emw
Hey folks :)
The blog post is now available at http://blog.wikimedia.de/2014/10/22/establishing-wikidata-as-the-central-hub... Thanks, Benjamin and Andra!
Cheers Lydia
This is awesome!
I'd love to have all SNPs on Wikidata as well, and I started a discussion about this on WikiProject Molecular biology: https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Molecular_biology#Le...
I think this would be amazing, because single nucleotide polymorphisms (SNPs) relate genes to human diseases and traits, both of which are already on Wikidata.
So, for instance, we now have the gene https://www.wikidata.org/wiki/Q18028243, which encodes the protein product https://www.wikidata.org/wiki/Q1738190, and we have the SNP https://www.wikidata.org/wiki/Q18341737 *in* that gene, which is implicated in the disease https://www.wikidata.org/wiki/Q5712506.
This way we can get a fuller picture from Wikidata of how changes in genes and gene products relate to the traits and diseases on Wikidata.
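The gene → protein → SNP → disease chain above can be sketched as a tiny in-memory triple store. The item IDs are the ones from the message; the relation names are descriptive placeholders, not actual Wikidata property IDs, and the traversal is a hypothetical illustration of the kind of query such links would allow.

```python
from collections import defaultdict

# Item IDs come from the message; relation names are placeholders,
# not Wikidata property IDs.
TRIPLES = [
    ("Q18028243", "encodes", "Q1738190"),           # gene -> protein product
    ("Q18341737", "located in gene", "Q18028243"),  # SNP -> gene
    ("Q18341737", "implicated in", "Q5712506"),     # SNP -> disease
]


def build_indexes(triples):
    """Index triples by (subject, relation) and by (relation, object)."""
    outgoing, incoming = defaultdict(set), defaultdict(set)
    for subject, relation, obj in triples:
        outgoing[(subject, relation)].add(obj)
        incoming[(relation, obj)].add(subject)
    return outgoing, incoming


def proteins_for_disease(triples, disease):
    """Walk disease <- SNP -> gene -> protein and collect the protein items."""
    outgoing, incoming = build_indexes(triples)
    proteins = set()
    for snp in incoming[("implicated in", disease)]:
        for gene in outgoing[(snp, "located in gene")]:
            proteins |= outgoing[(gene, "encodes")]
    return proteins


print(proteins_for_disease(TRIPLES, "Q5712506"))
```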
There are some things I'm really not sure how to handle, however. Each SNP is a *location*, and in a diploid organism each location has two values (one per chromosome copy), each of which can be one of 4 nucleotides (A, G, T or C); each combination of values may then result in the same protein or a different one. For example, in the Kell antigen system the rs8176058 location can be either A or G: an A at this location codes for the 'K' antigen (protein), and a G codes for the 'k' antigen. This makes it difficult to represent the information in a single "table", because the common variations *at* the location carry information that needs to be grouped together.
In this case, the presence of an A or a G alone determines the gene product, but of course this gets more complicated: we might not know the "value" of A or G individually, and may only have "values" for each genotype (AG, AA or GG) that need to be represented. And these genotypes might not always point to a specific gene product; they may instead point to a qualitative trait ("increased risk of glaucoma") or a quantitative trait ("vision was .2 diopters greater on average").
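The counting in the two paragraphs above can be made concrete: with one allele per chromosome copy and order ignored, two observed alleles give three genotypes, and the four nucleotides give ten possible genotypes at a location. The A→K / G→k mapping is taken from the message; treating both alleles' antigens as expressed in a heterozygote is an assumption of this sketch.

```python
from itertools import combinations_with_replacement

# Allele -> antigen mapping for rs8176058 as described in the message (Kell system).
KELL_RS8176058 = {"A": "K", "G": "k"}


def genotypes(alleles):
    """Unordered diploid genotypes: one allele per chromosome copy, order ignored."""
    return ["".join(pair) for pair in combinations_with_replacement(sorted(alleles), 2)]


def antigens(genotype, allele_to_antigen):
    """Antigens present for a genotype, assuming every allele present is expressed."""
    return {allele_to_antigen[allele] for allele in genotype}


print(genotypes("AG"))                         # ['AA', 'AG', 'GG']
print(len(genotypes("ACGT")))                  # 10
print(sorted(antigens("AG", KELL_RS8176058)))  # ['K', 'k']
```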
The two options are:
Create a separate WD item for each "option", e.g. "rs8176058-A" to hold information about variant A at location rs8176058 (or, when information is known about the genotype, "AG genotype on rs8176058")
OR
Allow each option ("A" or "AG") to be annotated with various fields. The complication is that each annotation may itself need to be annotated if we have multiple pieces of quantitative information associated with one genotype (and I don't think that's possible on WD). Hard to say.
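The two options can be contrasted with a small data model. This is a sketch with made-up property names, not Wikidata's actual data model; qualifiers are modelled as a flat string dictionary to mirror the concern that an annotation cannot be annotated in turn.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Statement:
    prop: str  # descriptive placeholder, not a real property ID
    value: str
    # Qualifiers are plain strings, so a qualifier cannot carry annotations of its own.
    qualifiers: Dict[str, str] = field(default_factory=dict)


@dataclass
class Item:
    label: str
    statements: List[Statement] = field(default_factory=list)


# Option 1: one item per allele; its facts are ordinary (qualifiable) statements.
allele_item = Item("rs8176058-A", [
    Statement("variant at location", "rs8176058"),
    Statement("encodes antigen", "K"),
])

# Option 2: one item per location; each allele is a statement value, its facts are qualifiers.
snp_item = Item("rs8176058", [
    Statement("allele", "A", {"encodes antigen": "K"}),
    Statement("allele", "G", {"encodes antigen": "k"}),
])


def facts_option1(item: Item) -> Dict[str, str]:
    return {s.prop: s.value for s in item.statements if s.prop != "variant at location"}


def facts_option2(item: Item, allele: str) -> Dict[str, str]:
    for s in item.statements:
        if s.prop == "allele" and s.value == allele:
            return dict(s.qualifiers)
    return {}


print(facts_option1(allele_item), facts_option2(snp_item, "A"))
```

In option 1, a p-value for "encodes antigen" could hang off that statement as a qualifier; in option 2 the same fact would need a qualifier on a qualifier, which this model cannot express.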
To see how this data is represented in table form elsewhere, you can browse the GWAS catalog:
http://www.genome.gov/page.cfm?pageid=26525384&clearquery=1#result_table
Importing that might be a good start. There it looks something like this:
Risk allele: rs1230666-A
Effect: 0.0269 [0.014-0.039] unit increase
Implicated in: Serum thyroid peroxidase antibody levels
p-value: 2 x 10^-8
Reference: Medici M, February 27, 2014, PLoS Genet, "Identification of novel genetic Loci associated with thyroid peroxidase antibodies and clinical thyroid disease."
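As a first step towards importing such rows, a catalog record could be parsed into a typed structure. This is a hypothetical sketch: the field names are illustrative, the values are the ones quoted above, and the 5 x 10^-8 cutoff is the conventional genome-wide significance threshold rather than anything taken from the catalog itself.

```python
import re
from dataclasses import dataclass

# Conventional genome-wide significance threshold for GWAS (5 x 10^-8).
GENOME_WIDE_SIGNIFICANCE = 5e-8


def parse_risk_allele(text: str):
    """Split a catalog-style 'rs1230666-A' string into (rsid, allele)."""
    match = re.fullmatch(r"(rs\d+)-([ACGT])", text)
    if match is None:
        raise ValueError(f"unrecognised risk allele: {text!r}")
    return match.group(1), match.group(2)


@dataclass
class GwasAssociation:
    rsid: str
    risk_allele: str
    effect: float
    ci_low: float
    ci_high: float
    trait: str
    p_value: float

    def __post_init__(self):
        if not self.ci_low <= self.effect <= self.ci_high:
            raise ValueError("effect size lies outside its confidence interval")

    @property
    def genome_wide_significant(self) -> bool:
        return self.p_value < GENOME_WIDE_SIGNIFICANCE


rsid, allele = parse_risk_allele("rs1230666-A")
row = GwasAssociation(rsid, allele, 0.0269, 0.014, 0.039,
                      "Serum thyroid peroxidase antibody levels", 2e-8)
print(row.rsid, row.risk_allele, row.genome_wide_significant)
```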
On Fri, Oct 24, 2014 at 1:24 AM, Lydia Pintscher lydia.pintscher@wikimedia.de wrote:
Cheers Lydia
Hi Marielle!
I replied to your post at https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Molecular_biology#Le... but will reply quickly here as well.
I also want this data in Wikidata, but, having just attended the American Society of Human Genetics annual meeting, my suggestion is to be slightly patient with this. A lot of people are working hard to standardize the nomenclature for variant identification (facing all the problems you describe above), and I don't think it will take long for it to stabilize. (Famous last words... but a lot of people are building tools that depend on this happening.) Once this is accomplished, we ought to be able to use the standard IDs to anchor all the Wikidata items for variants.
In my opinion, this is a battle best fought over at the Human Genome Variation Society forum (http://www.hgvs.org/mutnomen/) and then applied within wikidata rather than the other way around.
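To illustrate what anchoring items on standard IDs could look like, here is a deliberately narrow sketch that recognises only simple single-nucleotide substitutions in HGVS notation (versioned RefSeq accession, coordinate type, position, reference>alternate). The example position is hypothetical, and real HGVS descriptions (intronic offsets, indels, protein changes) are far richer than this regular expression.

```python
import re
from typing import NamedTuple, Optional

# Covers only simple substitutions such as "NC_000001.11:g.12345A>G" (hypothetical position).
HGVS_SUBSTITUTION = re.compile(
    r"(?P<accession>[A-Z]{2}_\d+\.\d+)"  # versioned reference sequence
    r":(?P<kind>[cgmn])\."               # coding, genomic, mitochondrial or non-coding DNA
    r"(?P<position>\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])"
)


class Substitution(NamedTuple):
    accession: str
    kind: str
    position: int
    ref: str
    alt: str


def parse_substitution(description: str) -> Optional[Substitution]:
    match = HGVS_SUBSTITUTION.fullmatch(description)
    if match is None or match["ref"] == match["alt"]:
        return None
    return Substitution(match["accession"], match["kind"], int(match["position"]),
                        match["ref"], match["alt"])


print(parse_substitution("NC_000001.11:g.12345A>G"))
print(parse_substitution("rs8176058"))  # dbSNP IDs are not HGVS descriptions
```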
In the meantime, I'd encourage you to keep working on modeling all the claims you would want to see that use variant entities as you have already started doing.
My two cents. -Ben
On Sun, Oct 26, 2014 at 1:59 PM, Marielle Volz marielle.volz@gmail.com wrote:
This is awesome!
Great, Lydia,
I just added this to the "Other Noteworthy Stuff" in the summary: "Blog post is now available at: http://blog.wikimedia.de/2014/10/22/establishing-wikidata-as-the-central-hub... (See Summary #128)".
Is there a way to update the weekly date automatically, please? https://www.wikidata.org/wiki/Wikidata:Status_updates/Next
Thanks, Lydia, Benjamin and Andra! Scott