Would it be possible to search the full text of the Wikipedias for lines that start and end with "==", "===", or "====", plus lines that start with ";", then make a list of those strings and count how many times each title appears in the list?
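
As a rough sketch of the counting step described above, assuming the page text is already available as plain wikitext strings (extracting it from a dump is not shown here), something like the following Python could work. The names `HEADING_RE`, `TERM_RE`, and `count_section_titles` are made up for illustration. The patterns are a simplification: they only accept headings whose opening and closing "=" runs have the same length (two to four), and they treat everything after a leading ";" as the title.

```python
import re
from collections import Counter

# "== Title ==" through "==== Title ====". The backreference \1 requires the
# closing run of "=" to match the opening run, and the lookarounds reject
# deeper levels such as "===== Title =====".
HEADING_RE = re.compile(r"^(={2,4})(?!=)\s*(?P<title>\S.*?)\s*(?<!=)\1\s*$")

# Lines that start with ";" (often used as pseudo-headings).
TERM_RE = re.compile(r"^;\s*(?P<title>\S.*?)\s*$")


def count_section_titles(wikitext_pages):
    """Count heading-like titles across an iterable of wikitext strings."""
    counts = Counter()
    for text in wikitext_pages:
        for line in text.splitlines():
            match = HEADING_RE.match(line) or TERM_RE.match(line)
            if match:
                counts[match.group("title")] += 1
    return counts
```

Calling `count_section_titles(pages).most_common(50)` on such an iterable would then give the fifty most frequent titles with their counts.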

Pine

On Jul 13, 2015 10:29 AM, "Jonathan Morgan" <jmorgan@wikimedia.org> wrote:
Cross-posting this request to wiki-research-l. Anyone have data on frequently used section titles in articles (any language), or know of datasets/publications that examined this?

I'm not aware of any off the top of my head, Amir.

- Jonathan

---------- Forwarded message ----------
From: Amir E. Aharoni <amir.aharoni@mail.huji.ac.il>
Date: Sat, Jul 11, 2015 at 3:29 AM
Subject: [Wikitech-l] statistics about frequent section titles
To: Wikimedia developers <wikitech-l@lists.wikimedia.org>


Hi,

Has anybody ever tried to collect statistics about frequent section titles in
Wikimedia projects?

For Wikipedia, for example, titles such as "Biography", "Early life",
"Bibliography", "External links", "References", "History", etc., appear in
a lot of articles, and their counterparts appear in a lot of languages.

There are probably similar things in Wikivoyage, Wiktionary and possibly
other projects.

Has anybody ever tried to collect statistics on the most frequent section
titles in each language and project?

--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
‪“We're living in pieces,
I want to live in peace.” – T. Moore‬
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l



--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation


_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l