PubMed Central: https://en.wikipedia.org/wiki/PubMed_Central

On Thu, Oct 23, 2014 at 12:07 AM, Aaron Halfaker <ahalfaker@wikimedia.org> wrote:
Ahh. What are PMCs?

On Wed, Oct 22, 2014 at 5:06 PM, Maximilian Klein <isalix@gmail.com> wrote:
Out of interest, my regex was
pmc\s*\=\s*(.*?)[\|\}]
and then also
pmid\s*\=\s*(.*?)[\|\}]
with the ignorecase flag set on.

On Wed, Oct 22, 2014 at 12:48 PM, Aaron Halfaker <aaron.halfaker@gmail.com> wrote:
Hey folks,
Somehow I missed this thread, but I've already addressed this request on the Village Pump[1]. See: http://datasets.wikimedia.org/public-datasets/enwiki/etc/pmids.articles.20141008.tsv
I extracted PMIDs with the following regex: /\bpmid *= *[0-9]+\b/i
The file includes page_id, page_namespace, page_title, rev_id (most recent), and pmid as tab-separated values.
Let me know if you have questions or if you think the regex matching strategy is insufficient. It's pretty quick to take another pass.
1. https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Extracting_PMIDs

On Wed, Oct 22, 2014 at 1:27 PM, Maximilian Klein <isalix@gmail.com> wrote:
Jake,
I have a script that already does this for DOIs; it was a one-line change to make. These files should answer what you were looking for:
https://raw.githubusercontent.com/notconfusing/listiness/pmc/pmc_list.txt
https://raw.githubusercontent.com/notconfusing/listiness/pmc/pmid_list.txt
In the future you can tell them to use halfak's mediawiki-utilities: https://pythonhosted.org/mediawiki-utilities/
This is the code I used to get those lists: https://github.com/notconfusing/listiness/commit/e140ce9202b9c1098dec40ca1da3ff135fd8c520

On Mon, Oct 20, 2014 at 9:20 PM, Andrew G. West <west.andrew.g@gmail.com> wrote:
Jake,
Yes, it's a rather straightforward parse based on the citation format that Jeremy described. Doc James and I already have this coded up for a soon-to-be-published [[WP:MED]] readership/editorship paper.
Searching for PMIDs across the entire Wikipedia article base would be a bit time-consuming, but if one needs to pull down only articles in WikiProject Medicine, for example, I am also able to help on that front.
Perhaps we'll take this offline, but if anyone else is interested in the dirty details, feel free to contact one of us off-list. -AW
--
Andrew G. West, PhD
http://www.andrew-g-west.com
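The regexes Max and Aaron quoted above can be compared on a small sample. The following is a minimal Python sketch, not the code anyone in the thread actually ran: the sample wikitext is hypothetical, and the capture group around the digits in Aaron's pattern is an addition so the number can be read out directly.

```python
import re

# Max Klein's patterns from the thread: capture everything up to the next
# template delimiter ("|" or "}"), case-insensitively.
PMC_RE = re.compile(r"pmc\s*\=\s*(.*?)[\|\}]", re.IGNORECASE)
PMID_RE = re.compile(r"pmid\s*\=\s*(.*?)[\|\}]", re.IGNORECASE)

# Aaron Halfaker's pattern: "pmid", optional spaces, "=", optional spaces,
# then digits only. The capture group is an assumption added for this sketch.
PMID_DIGITS_RE = re.compile(r"\bpmid *= *([0-9]+)\b", re.IGNORECASE)


def extract(pattern, wikitext):
    """Return the stripped captures of `pattern`, dropping empty values."""
    return [m.strip() for m in pattern.findall(wikitext) if m.strip()]


# Hypothetical wikitext with three citation templates (illustrative only).
SAMPLE = (
    "{{cite journal |title=Example |PMID = 12345678 |pmc=PMC1234567}}\n"
    "{{cite journal |title=Another |pmid=87654321 }}\n"
    "{{cite journal |title=Third |pmid=123 <!-- check -->}}"
)

if __name__ == "__main__":
    print(extract(PMC_RE, SAMPLE))          # ['PMC1234567']
    print(extract(PMID_RE, SAMPLE))         # ['12345678', '87654321', '123 <!-- check -->']
    print(extract(PMID_DIGITS_RE, SAMPLE))  # ['12345678', '87654321', '123']
```

The third template shows the main difference: Max's lazy capture keeps whatever sits before the next `|` or `}` (here an HTML comment), while Aaron's keeps only the digits. Aaron's ` *` matches spaces only, whereas `\s*` also accepts tabs, and Max's `pmc` pattern has no leading `\b`, so it would also fire on any parameter name that merely ends in "pmc".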
On 10/20/2014 11:57 PM, Jake Orlowitz wrote:
Hi folks,
Relaying a question from a Stanford medical researcher:
"Do you know if it is possible to extract PubMed ID (PMID) or PMCIDs
from Wiki references? Furthermore, could you dump those IDs out into a
list for analysis?"
Best,
Jake Orlowitz (Ocaasi)
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
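Jake's original request was for a flat list of IDs. Assuming Aaron's TSV keeps the column order he describes (page_id, page_namespace, page_title, rev_id, pmid), a minimal Python sketch can reduce it to a deduplicated PMID list with a count of rows per PMID. The thread does not say whether the file has a header row or whether the pmid column holds just the number or the whole `pmid = ...` match, so the sketch skips rows whose page_id is not numeric and takes the first run of digits from the pmid column. The function name `count_pmids` and the sample rows are hypothetical.

```python
import csv
import io
import re
from collections import Counter

# Column order as described for Aaron's TSV. The header row and the exact
# formatting of the pmid column are assumptions, handled defensively below.
FIELDS = ["page_id", "page_namespace", "page_title", "rev_id", "pmid"]
DIGITS = re.compile(r"[0-9]+")


def count_pmids(lines):
    """Return a Counter mapping each PMID (as a string) to the rows citing it."""
    counts = Counter()
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip a possible header row and any short or malformed lines.
        if len(row) < len(FIELDS) or not row[0].isdigit():
            continue
        match = DIGITS.search(row[FIELDS.index("pmid")])
        if match:
            counts[match.group()] += 1
    return counts


# Hypothetical sample rows, for illustration only.
SAMPLE_TSV = (
    "page_id\tpage_namespace\tpage_title\trev_id\tpmid\n"
    "1\t0\tAspirin\t100\t12345678\n"
    "2\t0\tIbuprofen\t200\tpmid = 12345678\n"
    "2\t0\tIbuprofen\t200\t87654321\n"
)

if __name__ == "__main__":
    counts = count_pmids(io.StringIO(SAMPLE_TSV))
    # One PMID per line in numeric order, followed by its row count.
    for pmid in sorted(counts, key=int):
        print(pmid, counts[pmid], sep="\t")
```

For the real file, open it with `open(path, newline="", encoding="utf-8")` (the encoding is an assumption) and pass the file object to `count_pmids`; `csv.QUOTE_NONE` keeps any quote characters in page titles from being treated as field quoting.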