Hi,
We are researching the quality of posts on Wikipedia, and we would like to know how the page ID is linked to the content (the article text). We gather that we can mirror Wikipedia from the dumps; however, we want only a subset.
Please do reply if anyone knows how to go about this.
Thank you in advance.
Janani Krishnamurthy
Look at [1], [2] and [3]. Every entry in the revision table has a rev_page field corresponding to the page_id it belongs to, and a rev_text_id referring to an old_id in the text table.
Roan Kattouw (Catrope)
[1] http://www.mediawiki.org/wiki/Text_table [2] http://www.mediawiki.org/wiki/Revision_table [3] http://www.mediawiki.org/wiki/Page_table
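To make the linkage above concrete, here is a minimal sketch using an in-memory SQLite database in place of a real MediaWiki database. Only rev_page, page_id, rev_text_id and old_id come from the reply above; the other columns (page_title, page_latest, rev_id, old_text) are assumed from the schema pages linked in [1], [2] and [3], and the rows and the helper names latest_text and revisions_of are made up for illustration. On a real installation old_text may also be compressed or stored externally, which this sketch ignores.

```python
import sqlite3

# Toy stand-in for three MediaWiki tables; the real tables have many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript('''
CREATE TABLE page     (page_id INTEGER PRIMARY KEY, page_title TEXT, page_latest INTEGER);
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_text_id INTEGER);
CREATE TABLE "text"   (old_id INTEGER PRIMARY KEY, old_text TEXT);
''')

# Made-up sample rows: page 1 has two revisions, page 2 has one.
conn.executemany("INSERT INTO page VALUES (?, ?, ?)",
                 [(1, "Foo", 11), (2, "Bar", 12)])
conn.executemany("INSERT INTO revision VALUES (?, ?, ?)",
                 [(10, 1, 100), (11, 1, 101), (12, 2, 102)])
conn.executemany('INSERT INTO "text" VALUES (?, ?)',
                 [(100, "first draft of Foo"),
                  (101, "second draft of Foo"),
                  (102, "only draft of Bar")])


def latest_text(conn, page_id):
    """Return the text of the page's current revision, or None if the page is unknown."""
    row = conn.execute(
        'SELECT t.old_text FROM page p '
        'JOIN revision r ON r.rev_id = p.page_latest '
        'JOIN "text" t ON t.old_id = r.rev_text_id '
        'WHERE p.page_id = ?',
        (page_id,),
    ).fetchone()
    return row[0] if row else None


def revisions_of(conn, page_id):
    """Return (rev_id, old_text) for every revision of the page, oldest rev_id first."""
    return conn.execute(
        'SELECT r.rev_id, t.old_text FROM revision r '
        'JOIN "text" t ON t.old_id = r.rev_text_id '
        'WHERE r.rev_page = ? ORDER BY r.rev_id',
        (page_id,),
    ).fetchall()


print(latest_text(conn, 1))
print(revisions_of(conn, 1))
```

To work with only a subset, the same joins could be restricted to a chosen list of page_id values rather than run over the whole dump.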
Thank you so much for your help and time!
On Nov 7, 2007 1:17 PM, Roan Kattouw roan.kattouw@home.nl wrote:
Look at [1], [2] and [3]. Every entry in the revision table has a rev_page field corresponding to the page_id it belongs to, and a rev_text_id referring to an old_id in the text table.
Roan Kattouw (Catrope)
[1] http://www.mediawiki.org/wiki/Text_table [2] http://www.mediawiki.org/wiki/Revision_table [3] http://www.mediawiki.org/wiki/Page_table
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l