Re: [Wiki-research-l] Extracting PMIDs

23 Oct 2014

Thanks Jodi, but I know what PubMed Central is.  Here, I was (unclearly)
asking about the meaning of the "pmc" field.

I talked to Max on IRC.  He said that "pmc" is a legacy field name
that corresponds to "pmid", so the two can be used interchangeably.  I've
updated my regex to /\bpm(id|c) *= *([0-9]+)\b/i and restarted my run
over the 2014-10-08 XML dump.
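
For anyone who wants to try the pattern outside of a dump run, here is a minimal sketch in Python. The function name extract_ids and the sample citation are made up for illustration; only the regex itself comes from this message. Note that a value written with a non-digit prefix would not match, since the pattern requires digits right after the equals sign.

```python
import re

# Python spelling of /\bpm(id|c) *= *([0-9]+)\b/i
PM_RE = re.compile(r"\bpm(id|c) *= *([0-9]+)\b", re.IGNORECASE)

def extract_ids(wikitext):
    """Return (field, number) pairs; the field name is lowercased.

    extract_ids is a hypothetical helper, not part of any library.
    """
    return [("pm" + m.group(1).lower(), m.group(2))
            for m in PM_RE.finditer(wikitext)]

# Invented sample wikitext, for illustration only.
sample = "{{cite journal |title=Example |pmid=12345 |PMC = 67890}}"
print(extract_ids(sample))
```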

-Aaron

On Wed, Oct 22, 2014 at 5:58 PM, Jodi Schneider <jschneider(a)pobox.com>
wrote:

...
  PubMedCentral:
https://en.wikipedia.org/wiki/PubMed_Central

 On Thu, Oct 23, 2014 at 12:07 AM, Aaron Halfaker <ahalfaker(a)wikimedia.org>
 wrote:

  Ahh.  What are pmcs?

 On Wed, Oct 22, 2014 at 5:06 PM, Maximilian Klein <isalix(a)gmail.com>
 wrote:

  Out of interest, my regex was

 pmc\s*\=\s*(.*?)[\|\}]

 and then also

 pmid\s*\=\s*(.*?)[\|\}]

 with ignorecase flag set on.
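
 A rough sketch of how these two patterns behave in Python, assuming the re module with re.IGNORECASE. The helper name raw_values and the sample citation are invented for illustration. Because the lazy group captures everything up to the next "|" or "}", values can carry trailing whitespace or be empty, so the sketch strips them and drops empty ones.

```python
import re

# The two patterns from the message, compiled with the ignore-case flag.
PMC_RE = re.compile(r"pmc\s*\=\s*(.*?)[\|\}]", re.IGNORECASE)
PMID_RE = re.compile(r"pmid\s*\=\s*(.*?)[\|\}]", re.IGNORECASE)

def raw_values(pattern, wikitext):
    """Return the non-empty captured values with surrounding whitespace removed.

    raw_values is a hypothetical helper, not part of any library.
    """
    stripped = (value.strip() for value in pattern.findall(wikitext))
    return [value for value in stripped if value]

# Invented sample wikitext, for illustration only.
sample = "{{cite journal |title=Example |PMID = 12345 |pmc=67890}}"
print(raw_values(PMID_RE, sample))
print(raw_values(PMC_RE, sample))
```

 Unlike a digits-only pattern, this one keeps whatever text sits in the field, so the results may need a further numeric filter.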

 Make a great day,
 Max Klein ‽ http://notconfusing.com/

 On Wed, Oct 22, 2014 at 12:48 PM, Aaron Halfaker <
 aaron.halfaker(a)gmail.com> wrote:

  Hey folks,

 Somehow I missed this thread, but I've already addressed this request
 on the Village Pump[1].  See:
 http://datasets.wikimedia.org/public-datasets/enwiki/etc/pmids.articles.201…

 I extracted PMIDs with the following regex: /\bpmid *= *[0-9]+\b/i

 The file contains page_id, page_namespace, page_title, rev_id (most recent),
 and pmid as tab-separated values.
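
 A minimal sketch of reading a file in that layout with Python's csv module. It assumes the columns appear in the order listed above and that there is no header row; neither is confirmed here. The helper name read_rows and the sample row are invented for illustration.

```python
import csv
import io

# Column order is an assumption taken from the description above.
COLUMNS = ["page_id", "page_namespace", "page_title", "rev_id", "pmid"]

def read_rows(handle):
    """Parse tab-separated lines into dicts keyed by COLUMNS.

    Rows with an unexpected number of fields are skipped.
    read_rows is a hypothetical helper, not part of any library.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(zip(COLUMNS, row)) for row in reader
            if len(row) == len(COLUMNS)]

# Invented sample row, for illustration only.
sample = io.StringIO("12\t0\tExample_page\t345\t6789\n")
rows = read_rows(sample)
print(rows[0]["page_title"], rows[0]["pmid"])
```

 If the real file does start with a header row, that row would come back as the first dict and should be dropped.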

 Let me know if you have questions or if you think the regex matching
 strategy is insufficient.  It's pretty quick to take another pass.

 1.
 https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Extracting…

 On Wed, Oct 22, 2014 at 1:27 PM, Maximilian Klein <isalix(a)gmail.com>
 wrote:

> Jake,
> I have a script that already does this for DOIs; it was a one-line change
> to make. These files should answer what you were looking for.
>
>
> https://raw.githubusercontent.com/notconfusing/listiness/pmc/pmc_list.txt
>
> https://raw.githubusercontent.com/notconfusing/listiness/pmc/pmid_list.txt
>
> In the future you can tell them to use halfak's
> https://pythonhosted.org/mediawiki-utilities/
> This is the code I used to get those lists.
>
> https://github.com/notconfusing/listiness/commit/e140ce9202b9c1098dec40ca1d…
>
> Make a great day,
> Max Klein ‽ http://notconfusing.com/
>
> On Mon, Oct 20, 2014 at 9:20 PM, Andrew G. West <
> west.andrew.g(a)gmail.com> wrote:
>
>> Jake,
>>
>> Yes, it's a rather straightforward parse based on the citation format
>> which Jeremy described. Doc James and I already have this coded up for a
>> soon-to-be-published [[WP:MED]] readership/editorship paper.
>>
>> Searching for PMIDs in the entirety of the Wikipedia article base
>> would be a bit time-consuming -- but if one needs to pull down only
>> articles in WikiProject Medicine, for example, I am also able to help on
>> that front.
>>
>> Perhaps we'll take this offline, but if anyone else is interested in
>> the dirty details, feel free to contact one of us off-list. -AW
>>
>> --
>> Andrew G. West, PhD
>> http://www.andrew-g-west.com
>>
>>
>>
>> On 10/20/2014 11:57 PM, Jake Orlowitz wrote:
>>
>>> Hi folks,
>>>
>>> Relaying a question from a Stanford medical researcher:
>>>
>>> "Do you know if it is possible to extract PubMed ID (PMID) or PMCIDs
>>> from Wiki references?  Furthermore, could you dump those IDs out
>>> into a
>>> list for analysis?"
>>>
>>> Best,
>>> Jake Orlowitz (Ocaasi)
>>>
>>>
>>> _______________________________________________
>>> Wiki-research-l mailing list
>>> Wiki-research-l(a)lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>>>
>>>
