Is there any chance for progress to be made on this? I recently ran
into this problem again at a featured article candidate I was
reviewing. It has a very worthy 'National Historic Landmarks' set
of templates at the bottom, but unfortunately this leads to massive
template linkage bloat. Of the over 100 articles that link to this
article, I estimate that only three links are from within the text of
other articles - the rest are from the templates.
If I had been able to see at a glance that this article was linked
from two other articles, I would have been able to make a suggestion
to link back to those articles, and maybe link from other articles. As
it was, I was unable to do this and this caused some problems (which
it is best not to go into here).
So is there any way to encourage or help with whatever needs to be done here?
On Mon, Feb 7, 2011 at 4:10 AM, David Goodman <dggenwp(a)gmail.com> wrote:
> agreed. The footer templates are the biggest source of linkage bloat.
> the templates are useful, and we need some way of keeping track of
> what should be in them when we add or delete articles, but they make
> working with what links here for any practical purpose extremely
> difficult. They'd be much more helpful if they were separated.
> On Sun, Feb 6, 2011 at 9:52 PM, Carcharoth <carcharothwp(a)googlemail.com> wrote:
>> On Mon, Feb 7, 2011 at 1:34 AM, Tim Starling <tstarling(a)wikimedia.org> wrote:
>>> On 07/02/11 10:56, Carcharoth wrote:
>>>> On Sun, Feb 6, 2011 at 10:19 PM, Magnus Manske
>>>> <magnusmanske(a)googlemail.com> wrote:
>>>>> Many of these links are due to templates, which I can do little about.
>>>> Can *anyone*, even in principle, do something about that? It really
>>>> bugs me that the "what links here" function doesn't distinguish
>>>> between links arising from templates (often not directly relevant) and
>>>> links directly from the article wiki-text. If the answer is something
>>>> to do with parsers, please do explain!
>>> Yes, it's possible. It was necessary to register links from templates
>>> in the pagelinks table so that when a page is deleted or created, the
>>> HTML caches can be updated so that the link colour will change. With a
>>> schema change and some parser work, it would be possible to flag such
>>> links so that they are optional in "what links here".
>> That would be wonderful. It might even get me to create a bugzilla
>> account to vote for a bug if there is one open on this...(of course,
>> one problem is still that some templates are relevant to article
>> content and some are not - the ones that generate distracting links
>> are the navigational ones that tend to be at the bottom of pages, the
>> footer templates - and I'm not sure if infobox links would count as
>> template links or not - they are generated from parsing of a template
>> parameter, but don't appear in the template itself, unlike the footer
>> [In case anyone is confused, an example is the massive footer
>> templates that can lead to Nobel prize winners decades apart linking
>> to each other, or diverse topics within a broad area linking to each
>> other, though only through templates and not in the text. Oh, and some
>> links appear in both footer templates, infoboxes, and the article
>> 'text'. Not sure how that is handled.]
>> WikiEN-l mailing list
>> To unsubscribe from this mailing list, visit:
> David Goodman
> DGG at the enWP
"10:20, 29 April 2011 Jimbo Wales (talk | contribs) m (37,376 bytes)
(moved Kate Middleton to Catherine, Duchess of Cambridge over
redirect: Marriage to the Duke of Cambridge) (undo) "
He must have had his finger on the button waiting for Beardie[*] to
pronounce them man and wife...
[*] I can call him that; my mother knows him reasonably well
Good day Wikipedians,
I have of late got into a football management computer game. Don't
panic, I will be relating this post to Wikipedia, hang on. I'm really
enjoying the game. To such an extent that I've actually started to
follow football. I've never particularly liked football. I only
started playing the computer game cos there was a free demo. Now I
like the computer game so much I'm following football in the real
After quite a few hours of playing it struck me that all I was really
doing most of the time was evaluating numbers: player abilities rated
out of 5, 10 or 20 depending on the stat in question. Numbers of
goals. Numbered position in league. Tier of football I'm playing in.
I don't know why this should be so compelling. Watching numbers
change. But the game is incredibly successful (some editions have
broken records for fastest selling computer game according to our
The numbers are clearly giving us players an emotional response. They engage.
Last year, during the Strategy process and before I started playing
this game, I proposed that what Wikipedia needed was "more rewards"
for editors. I proposed a few things. In the end we got Wiki-love,
which I support and like, but it isn't really like what I proposed
at Strategy. To be honest I can barely remember what it was I proposed
I still think we could do with more rewards and maybe this damned game
has given me an answer.
More editor stats.
All of us who have been around for some time know that edit counts are
not very reliable indicators of effort. Nevertheless we still do keep
a public record of editors with high counts. I think there's a reason
for that. I think it's because we still, despite protestations, know
that an edit count does tell us *something* about a Wikipedian. Even
if it's just "(s)he edits a lot".
I believe I'm right in saying that the Foundation is in the process of
setting up something like Toolserver. I suggest we plan to put it to
work. I suggest we expand greatly the stats we keep on individual
editors and form league tables from them. I believe that aiming for a
place in a table will motivate people. I realise that a) this is
unproven and b) there will be objections, particularly regarding
'gaming the system' and 'unintended consequences' but perhaps we can
discuss those and mitigate them (more later).
New Stats that could be placed in league tables could include:
* Length of service (difference in days between first edit and last)
* Number of consecutive days/months/weeks where 5 or more edits have
been made (or 50 edits, or a hundred): in short there could be quite a
number of these tables that relate to consistency and number of edits
all of which, I feel, might spur people on to keep contributing.
* Most characters/bytes added (without being removed)
* Most blocks for admins
* Most welcomes, barn stars awarded
* Most reverts / undos
* Average reader-rating of articles user has edited at least ten times
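To make a couple of these concrete, here is a rough sketch of how "length of service" and "consecutive days with 5 or more edits" might be computed and ranked. It assumes each editor's history is available as a plain list with one calendar date per edit; the function names and the sample editors are made up for illustration and do not correspond to any existing tool.

```python
from collections import Counter
from datetime import date, timedelta


def length_of_service(edit_dates):
    """Difference in days between the first edit and the last."""
    return (max(edit_dates) - min(edit_dates)).days


def longest_streak(edit_dates, min_edits=5):
    """Longest run of consecutive calendar days with at least min_edits edits."""
    per_day = Counter(edit_dates)
    qualifying = sorted(d for d, n in per_day.items() if n >= min_edits)
    best = current = 0
    previous = None
    for day in qualifying:
        if previous is not None and day - previous == timedelta(days=1):
            current += 1  # the streak continues
        else:
            current = 1  # a new streak starts here
        best = max(best, current)
        previous = day
    return best


def league_table(editors, stat):
    """Editor names ordered by the given stat, highest first."""
    return sorted(editors, key=lambda name: stat(editors[name]), reverse=True)


# Hypothetical sample data: one date per edit.
editors = {
    "Alice": [date(2011, 1, 1)] * 5 + [date(2011, 1, 2)] * 6 + [date(2011, 3, 1)],
    "Bob": [date(2010, 6, 1), date(2011, 6, 1)],
}

print(league_table(editors, length_of_service))
print(league_table(editors, longest_streak))
```

The same `league_table` helper would work for any other stat that can be written as a function of an editor's history, which is one way the "quite a number of these tables" could share a single implementation.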
You could also have these as percentage of number of edits and rank
for those too, eg welcomes, blocks or reverts as a percentage of total
edits, (with a minimum number of edits to qualify for inclusion on the
Now, it could be (WILL be!) that someone decides "I'm going for the
revert league title" and starts doing things we wouldn't ideally like
(to put it mildly). However their presence at the head of the league,
I feel, will actually subject their edits to greater scrutiny. People
will look at their contributions and it may well result in needed
censure, showing their activity to be undesirable and action could be
taken accordingly. Also, you may have people in the top table who
aren't even *trying* and their presence at or near the top might cause
some examination of their contribs.
Perhaps you can think of some league tables that would really push
desirable behaviours at minimal risk of negative ones?
If you don't like this idea I'd like to hear the concerns, HOWEVER! I
would also like you to just entertain the idea and - even if you're
against - think of some individual editor stats that could be tracked
that you think *may* provide useful feedback, even if you ultimately don't
think we *should*.
So: I propose we greatly increase feedback on user performance to
drive people on. Support editor stats today.
I'm making a crossword-style word game, and I'm trying to automate the process of creating the puzzles, at least somewhat.
I am hoping to find or create a list of English Wikipedia page titles, sorted roughly by how "recognizable" they are, where by recognizable I mean something like, "how likely it is that the average American on the street will be familiar with the name/phrase/subject".
For instance, just to take a random example, on a recognizability scale from 0 to 100, I might score (just guessing here):
Lady_Gaga = 90
Lady_Jane_Grey = 10
Lady_and_the_Tramp = 90
Lady_Antebellum = 5
Lady-in-waiting = 70
Lady_Bird_Johnson = 65
Lady_Marmalade = 10
Ladysmith_Black_Mambazo = 10
One suggestion would just be to use the page length (either number of characters or physical rendered page length) as a proxy for recognizability. That might work, but it feels kind of crude, and certainly would get many false positives, such as Bose-Einstein_condensation.
Someone suggested to me that I might count incoming page links, and referred me to http://dumps.wikimedia.org/enwiki/latest/ and in particular the file enwiki-latest-pagelinks.sql.gz. I downloaded and looked at that file but couldn't understand whether/how the linking structure was represented.
So my questions are:
(1) Do you know if a list like the one I'm trying to make already exists?
(2) If you were going to make a list like this how would you do it? If it was based on page length, which files would you download and process to make it as efficient as possible? If it was based on incoming links, which files specifically would you use, and how would you determine the link count?
Thanks for any help.
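For the incoming-links approach, one possible way to get a count out of the pagelinks dump is sketched below. It rests on an assumption that should be checked against the CREATE TABLE statement at the top of the file: that each row inside the INSERT statements has the shape (pl_from, pl_namespace, 'pl_title'), meaning the ID of the linking page, the namespace of the target, and the title of the target, with namespace 0 being articles. The table layout has changed over time, so treat this as a starting point and not as the definitive format. The function names are made up.

```python
import gzip
import re
from collections import Counter

# Assumed row shape: (pl_from, pl_namespace, 'pl_title').
# The title group allows backslash-escaped characters such as \' inside the quotes.
ROW = re.compile(r"\((\d+),(-?\d+),'((?:[^'\\]|\\.)*)'\)")


def count_incoming_links(lines, namespace=0):
    """Count how many rows point at each title in the given namespace."""
    counts = Counter()
    for line in lines:
        if not line.startswith("INSERT INTO"):
            continue  # skip schema definitions and comments
        for _source_id, ns, title in ROW.findall(line):
            if int(ns) == namespace:
                counts[title] += 1  # titles keep their SQL escaping
    return counts


def count_from_dump(path):
    """Stream a gzipped dump line by line without unpacking it to disk."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return count_incoming_links(handle)


# Tiny made-up sample in the assumed format.
sample = [
    "INSERT INTO `pagelinks` VALUES (1,0,'Lady_Gaga'),(2,0,'Lady_Gaga'),"
    "(2,0,'Lady_Jane_Grey'),(3,1,'Lady_Gaga');",
]
counts = count_incoming_links(sample)
print(counts.most_common())
```

On the real file, `count_from_dump("enwiki-latest-pagelinks.sql.gz")` would be the entry point. Note that this counts every recorded link, including the ones that come from navigation templates, so heavily templated pages may score higher than their actual recognizability warrants.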
Yes, but for the purpose of creating a game that may not be an
issue. Michael asked how to get a list of recognisable topics to build a
game with, not how to list all 3.7 million article names in order of
On 30 September 2011 11:11, Scott MacDonald <doc.wikipedia(a)ntlworld.com> wrote:
> > -----Original Message-----
> > From: wikien-l-bounces(a)lists.wikimedia.org [mailto:wikien-l-
> > bounces(a)lists.wikimedia.org] On Behalf Of WereSpielChequers
> > Sent: 30 September 2011 10:56
> > To: Michael Katz; English Wikipedia
> > Subject: Re: [WikiEN-l] finding the "most recognizable" page names
> > I'd suggest using metrics of page views per article, and if you want a
> > specifically US product screen out articles that don't use American
> > English
> > spelling. Better still would be to get page views from the USA, or at
> > least
> > page views ignoring the 6 hours when the US is most likely to be asleep.
> > WereSpielChequers
> Removing non-US spellings would also distort. You would dismiss "Tony
> Blair", while keeping "Tony Boselli" and also remove articles like
> "Scotland" "Queen Elizabeth II" and "George III of the United Kingdom" -
> all of which might conceivably have some recognition in the US.
Re your comment "nobody seems to be committed to clearing backlogs of
articles that actually provide legal, if not journalistic, risk for WP and
its parent"
You might want to have a look at
Since Dashbot's run in January 2009 to the authors of unreferenced BLPs,
there has been a 21 month effort by hundreds, perhaps thousands of editors
involving hundreds of wiki projects. As a result our backlog of known
unreferenced BLPs has dropped by over 99% and the only month available to
the "random month of random unreferenced BLPs" squad is September 2011.
Meanwhile the Death anomalies project has now been rolled out to 14 different
language versions of Wikipedia, and has resolved thousands of anomalies.
OK one of the things we found fairly early in the unreferenced BLP cleanup
was that these are far from being the highest risk articles in the pedia.
But the initial focus on that backlog was because many thought it was one of
our highest risk areas (and at the time we didn't have many other higher
risk areas easily identified).
Yes we also have WikiProject Bacon and several other aspects of the pedia
that are less seriously focussed. Some things are as much about motivating
volunteers as anything else. If we hired a bunch of professional editors you
could probably dispense with some of that. But even with ten times the WMF
budget I'm not convinced that we could get as much edited, or as well. The
wonder of barnstars, wikilove, secret pages, userboxen and the like is that
so many volunteers have been motivated to do so much for so little reward.
BTW I was sorry to hear about your problems on EN wiki, I don't know the
details, but hope that in your case indef does not turn out to be permanent.
As far as I'm concerned discussion as to whether we are or are not
successfully improving the pedia is, or at least should be, well within the
scope of this list. I may not agree with your comment that "nobody seems to
be committed to clearing backlogs of articles" but I'm happy to defend your
right to say it.
On 22 September 2011 01:31, Phil Nash <phnash(a)blueyonder.co.uk> wrote:
> Carcharoth wrote:
> > On Sat, Sep 17, 2011 at 1:13 AM, Phil Nash <phnash(a)blueyonder.co.uk>
> > wrote:
> > <snip>
> >> [[User:Rodhullandemu]] - "still flying the flag for Wikipedia, for
> >> some inexplicable reason".
> > Does this refer to this?
> > I'm not going to comment further, but I think others who respond to
> > your posts should be aware of this.
> Actually, you did comment further, and on a personal level; see below. And
> the lack of response in nearly nine hours to your post amply demonstrates,
> to me at least, how you seem to have missed the point.
> > What the scope of this mailing list should be (given your recent posts
> > on BLP matters, all copied to Jimmy Wales), is something I'd like to
> > see discussed by the list moderators and those posting here. If there
> > is a reason or rationale behind the posts, attempting to demonstrate
> > something, then fine, but it would be courteous to state that rather
> > than just post randomly like this.
> Starting at the back, and working forward, my posts are not random. They
> are carefully selected examples based on my experience as (currently) a reader
> of Wikipedia and my responses to what I found. I take it as obvious that if
> I can read these articles, so can their subjects, and if they don't like
> what they see, they may make appropriate noises or (in extreme cases) litigate
> against the Foundation.
> We have BLP policies for that reason, and while I see editors on Wikipedia
> competing to provide articles about bacon(!), fiddling about with templates
> that are ostensibly fit for purpose as they are, and still arguing about
> trivial issues, nobody seems to be committed to clearing backlogs of
> articles that actually provide legal, if not journalistic, risk for WP and
> its parent. And there are myriad similar examples.
> My personal reasons are less important than making sure that this project
> does, and can, continue without unnecessary diversions into legalities-
> perhaps I've been spending too much time reading up Commons policies of
> late, one of which (to paraphrase) says that "just because nobody will
> notice a copyright violation is no reason to ignore policy"- and so it
> should be with any policy on any WMF project that may have consequences for
> the Foundation. I am available to discuss any non-apparent personal
> motivations PRIVATELY by email rather than on a public list. But don't
> assume that I don't have our project's viability at heart.
> As a lawyer by training, qualifications, experience, and observation, I've
> seen many operations thought to be acting blithely within the law crumble
> to the ground when the courts have upheld unexpected, but valid challenges.
> I'm not suggesting this is likely in our case; but neither is it beyond the
> bounds of possibility, and at least if I bring risks to the attention of
> others, my hands are clean.
> Hope that helps.
>  and consuming unnecessary resources in TfDs
If anybody recognizes this email address as coming from me originally (User:RickK), I would just like to say that the person calling themself User:RickK2 currently editing on Wikipedia is NOT me. Since my account has been locked from my access, I have no other way to address this problem.