Forwarded from Betsy Megas, who's subscribed under a different
address. Please read.
Austin
---------- Forwarded message ----------
From: Betsy Megas <Betsy(a)strideth.com>
Date: Tue, Aug 11, 2009 at 10:26 PM
Subject: Election vote strikes
To: "foundation-l-owner(a)lists.wikimedia.org"
<foundation-l-owner(a)lists.wikimedia.org>
Due to an error in a script that was used to generate the list of
authorized voters for this election, roughly 300 votes were cast by
users who were not qualified based on the posted election rules
(requiring that voters have made at least 600 edits before 01 June
2009 across Wikimedia wikis and have made at least 50 edits between 01
January and 01 July 2009). Those votes will be removed by the
election committee prior to the election being tallied by Software in
the Public Interest.
Once this is completed, the election results will be tallied and
announced shortly thereafter.
Questions regarding why a vote was struck can be addressed to
board-elections(a)lists.wikimedia.org.
For the committee,
Dvortygirl
On Wed, Aug 12, 2009 at 3:58 AM, Tim Starling<tstarling(a)wikimedia.org> wrote:
[snip]
> Brianna Laugher was receptive to the idea of having
> Wikimedia projects hotlink or cache images from galleries.
So there have been a number of statements against doing something like
this, but (unsurprisingly) I don't think they have been stated strongly
enough or hit all the arguments that I think are important. So please
humour me for a moment.
I think hotlinking images is something we ought not to do for several
independent reasons.
(1) There is no reason to do so.
The so far cited reasons for GLAM interest in this are Branding and Statistics.
Hotlinking or caching would do nothing to improve branding: most of
the time a hotlinked image looks just like a local one to users.
Whatever branding we'd find acceptable could be accomplished as well
or better locally.
Statistics gathering is something that is interesting to many of our
contributors; we can and should have good statistics for everything
(and caching would be useless for statistics), so hotlinking should
create no improvement.
GLAMs have spent money building their own databases, yes. But ours is
an additional copy, our problem, and not a significant cost.
The only other reason I can see for hotlinking would be collecting
resellable marketing data on Wikipedia viewers, and I do not believe
that this would be a use we'd wish to support. (I'm not making a value
judgement here: if that is indeed someone's goal, that's fine; only
that it's not one the WMF would intentionally support.) See below for more…
(2) Hotlinking has enormous privacy problems
When the rubber hits the road, NDAs are ineffective: people make
mistakes. Governments and ISPs snoop. Privacy policies are often bad
and allow things which would horrify people. Hotlinking would greatly
increase readers' exposure to information leaks.
Some random museum has no business knowing that I loaded the pederasty
article just because some art was placed in it.
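To make the leak concrete, here is a sketch of the kind of request a reader's browser would send directly to a third-party image host when a page hotlinks one of its images (the hostname and path are hypothetical); the host learns the reader's IP address and, via the Referer header, the exact article being read:

```http
GET /images/artwork.jpg HTTP/1.1
Host: media.example-museum.org
Referer: https://en.wikipedia.org/wiki/Pederasty
User-Agent: Mozilla/5.0 ...
```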
Wikimedia's handling of reader privacy ought to be leading-edge,
trend-setting stuff. That would be a nearly impossible goal if media
were inlined from many third-party sites.
(3) It significantly reduces the atomicity of the Wikimedia projects.
Today they are *things*: objects you can obtain (± temporary problems
with the dump system), archive, data-mine, etc. I have complete (though
not currently up to date) copies of Wikipedia in all languages along
with all images and other media, as well as the core software. Not just
partial bits and pieces, but the whole thing.
External links are a clear boundary between what is in Wikipedia and
what isn't... and the stuff *in* Wikipedia is all freely licensed
and available for download, all tracked with a common revision control
system, with common (if bad…) metadata.
External dependency would lower reliability and make the projects
generally less tractable. It would become more difficult to retain
backups and historical records.
Perhaps some day Wikipedia will be too big to maintain any singular
copy of for purely technical reasons, but we are a long long way away
from that now!
So basically I think there are a bunch of practical and principled
problems with hotlinking, and that hotlinking isn't actually needed.
Really good upload systems that preserve metadata and provide good
links to external resources? Statistics collection? These are good
and uncontroversial things. They don't require hotlinking.
Cheers—
As perhaps some people here will recall, I was always skeptical of Knol's
ability to enter the collaborative knowledge space. The reasons discussed
here, including SJ's mentions of the issues of structuring public
collaboration, are no doubt valid, but to me -- and of course it may be said
that this is my Lawyer Vision(tm) kicking in -- the primary problem for Knol
was lack of compatibility with the existing dominant free licenses used by
Wikimedia projects and others. In short, it was difficult for Knol to build
on the work of other collaborative freely licensed projects without, as a
practical matter, violating those licenses. (We saw countless examples of
people attempting to import Wikipedia content into Knol, for example, and
played a bit of whack-a-mole with those folks.)
But to me the takeaway from this error of Knol's licensing design is not
that Knol can't work -- it's that it actually could work, if properly
thought through. So my view right now is the Wikimedia community can't be
complacent about Knol's apparent failure -- properly adjusted and
redesigned, it could have quite an impact on us. We're going to have to
continue to give serious attention to all the issues, from quality to
community to legality, that give us an advantage in terms of fueling
creative collaboration, as we go forward.
The next Knol can't be relied upon to make the same mistakes.
--Mike
This paper is making the rounds:
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1446862
"This is a pilot study of the use of “Flash cookies” by popular
websites. We find that more than 50% of the sites in our sample are
using flash cookies to store information about the user. Some are
using it to "respawn" or re-instantiate HTTP cookies deleted by the
user. Flash cookies often share the same values as HTTP cookies, and
are even used on government websites to assign unique values to users.
Privacy policies rarely disclose the presence of Flash cookies, and
user controls for effectuating privacy preferences are lacking. "
Inside it says:
"We encountered Flash cookies on 54 of the top 100 sites. […]
Ninety-eight of the top 100 sites set HTTP cookies (only wikipedia and
wikimedia.org lacked HTTP cookies in our tests). These 98 sites set a
total of 3,602 HTTP cookies."
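The "respawn" technique the paper describes can be illustrated with a minimal, self-contained JavaScript sketch. Plain objects stand in for the browser's cookie jar and the Flash local shared object (LSO); all names here are hypothetical, not from the paper:

```javascript
// Conceptual sketch of cookie "respawning": a site stores the same
// tracking ID in both an HTTP cookie and a Flash local shared object
// (LSO). Clearing HTTP cookies does not touch the LSO, so script on
// the next visit can copy the value back.
// Plain objects stand in for the cookie jar and the LSO (hypothetical).

const httpCookies = { uid: "abc123" }; // what the user can clear
const flashLSO = { uid: "abc123" };    // survives normal cookie deletion

function respawnCookies(cookies, lso) {
  // Re-create any tracking value that is present in the LSO
  // but missing from the HTTP cookie jar.
  for (const key of Object.keys(lso)) {
    if (!(key in cookies)) {
      cookies[key] = lso[key];
    }
  }
  return cookies;
}

// The user clears their HTTP cookies...
delete httpCookies.uid;

// ...but on the next page load the site "respawns" the ID.
respawnCookies(httpCookies, flashLSO);
console.log(httpCookies.uid); // "abc123" — the deleted cookie is back
```

This is why the paper notes that deleting HTTP cookies alone does not defeat this kind of tracking; the LSO has to be cleared separately through the Flash Player settings.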
Kudos to the WMF for avoiding gratuitous reader tracking. Other
people *are* paying attention to the privacy implications of this kind
of user-invisible behavior.
(Adding to what Michael said.)
Yes, we're trying to catch up. May will be posted tomorrow, and June is being worked on right now.
These reports started off as simple staff activity reports (when I joined the Foundation two years ago), and when the staff was small they were fairly easy to put together quickly. Over time, we've added in new structured info such as the comScore Media Metrix data, lists of media interviews, fundraising totals, etc. That takes a little longer to gather -- for example, we don't have finalized fundraising totals until 20 days following the close of month, and comScore data can take even longer. Plus, growth in staff means it takes that much longer to collect and synthesize everyone's input.
Meantime, we've been working towards a parallel data-driven monthly report -- it would include comScore data, financial information, and metrics aimed at assessing participation and quality. The financial information for that report is now regularly produced on a monthly basis, and we are pretty close to having good-enough reach, quality and participation metrics regularly produced as well, thanks to Erik Zachte and others. The goal of the data-driven report is to focus less on staff activity, and more on a high-level assessment of the overall health of the Foundation and its projects.
Once we have the data-driven report in regular production, we can rethink reporting overall. For example, we might decide to publish the monthly data report + a richer text-based staff activities report once a quarter. That would mean the activities report could be less focused on small incremental changes (the staff worked on X, the staff continued Y) and more focused on providing greater detail about a small number of high-priority initiatives, e.g., the strategy project, the usability project, the bookshelf project, etc. Or, we could publish the data report, plus a lightweight, simple monthly activities report focused purely on staff work -- new hires and that kind of thing.
I definitely sympathize with people wanting to be connected and aware of what's going on with the staff. I'd be curious to know what kinds of information people find most useful of what we publish today, and what you'd like to see more of -- and also what you think of the other channels we publish through, e.g., the tech blog, the Foundation blog, press releases, etc. And I do also appreciate your patience as we get caught up on this most recent backlog :-)
Thanks,
Sue
------Original Message------
From: Benjamin Lees
Sender: foundation-l-bounces(a)lists.wikimedia.org
To: Wikimedia Foundation Mailing List
ReplyTo: Wikimedia Foundation Mailing List
Sent: Aug 11, 2009 6:58 PM
Subject: Re: [Foundation-l] Report to the Board April 2009
On Tue, Aug 11, 2009 at 6:46 PM, Sue Gardner <sgardner(a)wikimedia.org> wrote:
> Report to the Wikimedia Foundation Board of Trustees
>
> Covering: April 2009
> Prepared by: Sue Gardner, Executive Director, Wikimedia
> Foundation
> Prepared for: Wikimedia Foundation Board of Trustees
>
I really like these reports, but they'd be more useful if they came sooner
after the events they describe. Will you be able to catch up to a <1-month
delay in the near future? (I wouldn't mind if the reports for May, June, and
July were condensed, if that's what it took.)
_______________________________________________
foundation-l mailing list
foundation-l(a)lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
Hey All--
We've made some modifications to the process and time line for the
Donation Button Enhancement project.
You can find and comment on them here:
http://meta.wikimedia.org/wiki/Fundraising_2009/Donation_buttons_upgrade
I appreciate all the feedback so far.
-Rand
--
Rand Montoya
Head of Community Giving
Wikimedia Foundation
www.wikimedia.org
Email: rand(a)wikimedia.org
Phone: 415.839.6885 x615
Fax: 415.882.0495
Cell: 510.685.7030
“At some future time, I hope to have something witty,
intelligent, or funny in this space.”
Onion sourcing. That would be a nice improvement on simple cite styles.
On Tue, Aug 11, 2009 at 12:10 PM, Gregory Crane<gregory.crane(a)tufts.edu> wrote:
> There are various layers to this onion. The key element is that books and
> pages are artifacts in many cases. What we really want are the logical
> structures that splatter across pages.
And across and around works...
> First, we have added a bunch of content -- esp. editions of Greek and Latin
> sources -- to the Internet Archive holdings and we are cataloguing editions
> that are the overall collection, regardless of who put them there. This goes
> well beyond the standard book catalogue records -- we are interested in the
> content not in books per se. Thus, we may add hundreds of records for a
Is there a way to deep link to a specific page-image from one of these
works without removing it from the Internet Archive?
> We would like to have useable etexts from all of these editions -- many of
> which are not yet in our collections. Many of these are in Greek and need a
> lot of work because the OCR is not very good.
So bad OCR for them exists, but no usable etexts?
> To use canonical texts, you need book/chapter/verse markup and you need
> FRBR-like citations ... deep annotations... syntactic analyses, word sense,
> co-reference...
These are nice features, but perhaps you can develop a clean etext
first, and overlay this metadata in parallel or later on.
> My question is what environments can support contributions at various
> levels. Clearly, proofreading OCR output is standard enough.
>
> If you want to get a sense of what operations need ultimately to be
> supported, you could skim
> http://digitalhumanities.org/dhq/vol/3/1/000035.html.
That's a good question. What environments currently support OCR
proofreading and translation, and direct links to page-images of the
original source? This is doable, with no special software or tools,
via wikisource (in multiple languages, with interlanguage links and
crude paragraph alignment) and commons (for page images). The pages
could also be stored in other repositories such as the Archive, as
long as there is an easy way to link out to them or transclude
thumbnails. [maybe an InstaCommons plugin for the Internet Archive?]
That's quite an interesting monograph you link to. I see six main
sets of features/operations described there. Each of them deserves a
mention in Wikimedia's strategic planning. Aside from language
analysis, each has significant value for all of the Projects, not just
wikisource.
OCR TOOLS
* OCR optimization: statistical data, page layout hints
* Capturing page layout and logical structures
CROSS-REFERENCING
* Quote, source, and plagiarism identification
* Named entity identification (automatic for some entities? hints)
* Automatic linking (of URLs, abbreviated citations, &c.), markup projection
TEXT ALIGNMENT
* Canonical text services (chapter/verse equivalents)
* Version analysis between versions
* Translation alignment
TRANSLATION SUPPORT
* Automated translation (seed translations, hints for humans)
* Translation dictionaries (on mouseover?)
CROSS-LANGUAGE SEARCHING
* Cross-referencing across translations
* Quote identification across translations
LANGUAGE ANALYSIS
* Word analysis: word sense discovery, morphology
* Sentence analysis: syntactic, metrical (poetry)
> Greg
>
> John Vandenberg wrote:
>>
>> On Tue, Aug 11, 2009 at 3:00 PM, Samuel Klein<meta.sj(a)gmail.com> wrote:
>>
>>>
>>> ...
>>> Let's take a practical example. A classics professor I know (Greg
>>> Crane, copied here) has scans of primary source materials, some with
>>> approximate or hand-polished OCR, waiting to be uploaded and converted
>>> into a useful online resource for editors, translators, and
>>> classicists around the world.
>>>
>>> Where should he and his students post that material?
>>>
>>
>> I am a bit confused. Are these texts currently hosted at the Perseus
>> Digital Library?
>>
>> If so, they are already a useful online resource. ;-)
>>
>> If they would like to see these primary sources pushed into the
>> Wikimedia community, they would need to upload the images (or DjVu)
>> onto Commons, and the text onto Wikisource where the distributed
>> proofreading software resides.
>>
>> We can work with them to import a few texts in order to demonstrate
>> our technology and preferred methods, and then they can decide whether
>> they are happy with this technology, the community, and the potential
>> for translations and commentary.
>>
>> I made a start on creating a Perseus-to-Wikisource importer about a year
>> ago...!
>>
>> Or they can upload the djvu to Internet Archive.. or a similar
>> depositories... and see where it goes from there.
>>
>>
>>>
>>> Wherever they end up, the primary article about each article would
>>> surely link out to the OL and WS pages for each work (where one
>>> exists).
>>>
>>
>> Wikisource has been adding OCLC numbers to pages, and adding links to
>> archive.org when the djvu files came from there (these links contain
>> an archive.org identifier). There are also links to LibraryThing and
>> Open Library; we have very few rules ;-)
>>
>> --
>> John Vandenberg
>>
>
>
Mark W. wrote:
> It looks to me like Austin did exactly what he should've so I'm not
> sure why you're implying he made an incorrect decision. Exactly what
> did he do wrong in your opinion?
Austin may have done exactly right, but his lack of responsiveness - just
as with Arbcom - just as with Cary - made it an issue. As it currently
stands the list moderator has blocked three of my posts on different
threads, and is also ignoring my direct request to be unblocked.
Here's an idea: Arbcom - respond to case subject's questions and comments
and maybe organize some case-centered discussion. Here's an idea: Mailing
list creators - respond to requests for new list creation. Here's an idea:
Mailing list moderators - respond to requests for clarification about
blocks and state blocks openly.
Nathan wrote:
> Stevertigo is more interested in the debate, in my opinion, than any
> particular outcome.
I do love to argue, but this comment is not accurate. The truth is I just
like it better when people don't act like dicks. This includes angels,
supermodels, Presidents, founders, Arbcom members, foundation bureaucrats,
and myself (I'm admittedly feeling a bit forced into the concept).
> If you find that people don't take your side even after you have "utterly
> destroyed them, point by point" then perhaps you should pick a new approach.
I understand that people don't like having their pet concepts taken apart.
I mean nothing personal by it - simply separate from your defunct concept,
admit cordially that I might have a point, and there will be no issue.
Sources of bullshit will often think that the bull-fighter is evil. "What
of it? At least the [bullshit] is disposed of." (after Mencken)
-Stevertigo
Google has put a preview online of a new version of their search
engine, with a new infrastructure:
http://googlewebmastercentral.blogspot.com/2009/08/help-test-some-next-gene…
You can test it here:
http://www2.sandbox.google.com/
Things are a lot faster, and the results differ from the current
version. I'm wondering if this will have any impact on the number of
visitors on our projects, because so many of our visitors come through
Google links.
-- Hay
Many of us talk/think a lot about how to reduce conflict in our
projects. For those who're interested, I was recently pointed towards
two relevant videos -- I'm posting them here in hopes they might be
useful for others.
So, for whoever's interested, here is:
* Donnie Berkholz's recent talk at Open Source Bridge, titled
"Assholes are killing your project." Donnie, a council member at
Gentoo Linux, advocates establishment of a friendly culture including
a code of conduct, and maintenance of that culture via simple
mechanisms for problem reporting and resolution, plus a clear focus on
mission. Unfortunately the audio here isn't terrific, so it's not
super-easy to follow. http://blip.tv/file/2444432
* Two open source engineers at Google, Ben Collins-Sussman and Brian
Fitzpatrick, with a talk called "How Open Source Projects Survive
Poisonous People." Upshot: preserve your project's attention and
focus, build a healthy community and fortify it with good community
practices, be on the lookout for problems, and disinfect where
necessary, including marginalizing/ignoring difficult people, and
booting them out if you need to.
http://www.youtube.com/watch?v=ZSFDm3UYkeE
Personally, I have also gotten some good value out of Bill Eddy's book
High-Conflict People in Legal Disputes. Bill is a mediator, lawyer
and former social worker who found himself repeatedly encountering
destructive people in his work, and not knowing how to disarm them or
disengage from them. He wrote High Conflict People to help other
mental health and legal professionals recognize, understand and work
productively with various types of conflict-seeking personalities --
but IMO its usefulness extends way beyond the legal system; it's
relevant for our work too.
http://www.amazon.com/gp/product/0981509053/ref=cm_li_v_cr_self?tag=linkedi…
Thanks,
Sue
--
Sue Gardner
Executive Director
Wikimedia Foundation
415 839 6885 office
Imagine a world in which every single human being can freely share in
the sum of all knowledge. Help us make it a reality!
http://wikimediafoundation.org/wiki/Donate