Ran into some issues with downloading large files and forgot to post this
earlier.
http://paws-public.wmflabs.org/paws-public/6877667/projects/headings/datasets/enwiki_20160204_headings.tsv.bz2

Columns:
 - "page_id" : int
   - The identifier of the article
 - "page_title"
   - The title of the article
 - "heading_level"
   - The level of the heading in question
 - "heading_text"
   - The text of the heading
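
In case it helps anyone get started, here is a minimal sketch of reading the file with the Python standard library once it is downloaded. It assumes the file is tab-delimited UTF-8 with the four columns in the order listed above. Whether the first line is a header row is not stated here, so the `has_header` flag is an assumption to adjust after looking at the file. The names `read_headings` and `COLUMNS` are made up for this example. Only "page_id" is converted to int, since that is the only type given above.

```python
import bz2
import csv

# Column order as listed above (assumed to match the file layout).
COLUMNS = ["page_id", "page_title", "heading_level", "heading_text"]


def read_headings(path, has_header=True):
    """Yield one dict per heading row from a bz2-compressed TSV file.

    has_header is an assumption: set it to False if the file turns out
    to start directly with data.
    """
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as f:
        # QUOTE_NONE keeps quote characters inside headings as plain text.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        if has_header:
            next(reader, None)  # skip the header line
        for row in reader:
            if len(row) != len(COLUMNS):
                continue  # skip rows that do not have exactly four fields
            record = dict(zip(COLUMNS, row))
            record["page_id"] = int(record["page_id"])
            yield record
```

With the file saved locally, something like `for rec in read_headings("enwiki_20160204_headings.tsv.bz2"): ...` streams the rows without decompressing the whole file to disk first.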
Enjoy!
-Aaron
On Mon, Mar 7, 2016 at 6:52 PM, Yuvi Panda <yuvipanda(a)gmail.com> wrote:
Just also wanted to note that these paws-public URLs will break in the
near-to-mid future :)
On Mon, Mar 7, 2016 at 4:22 PM, Aaron Halfaker <ahalfaker(a)wikimedia.org>
wrote:
Got some work done here. I'm using this as an opportunity to test out
PAWS. See
http://paws-public.wmflabs.org/paws-public/EpochFail/projects/headings/extr…
It's still running right now, but I should have an output file that we
can download and/or load into MySQL soon.
-Aaron
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
--
Yuvi Panda T
http://yuvi.in/blog