Some notes that will be of interest to the community now that this
quarter's LevelUp is underway:
https://www.mediawiki.org/wiki/Mentorship_programs/LevelUp/Q1_2013
* If you have AbuseFilter patches that need review, please toss them to
Matthias Mullie to review, as he works toward AbuseFilter
maintainership! Similarly, Benny Situ wants to improve his MediaWiki
core review skills, so please go ahead and ask him to review your core
patches. (This isn't on the list for this quarter, but Waldir and
Liangent are also good candidates to ask for core reviews.)
* S Page is learning to puppetize things and is working to "document
mediawiki-install management", which sounds eminently useful!
* Some folks haven't precisely defined their goals yet, but once this
quarter is over, I predict that we'll have a few more volunteers
maintaining highly-used extensions, and more developers comfortable
writing and reviewing JavaScript.
* The fundraising team is using this time to productize the
DonationInterface extension and make it much more useful to the rest of
the open source community.
A process note: for next quarter, I'm going to hold a few IRC office
hours in late March and early April to get all this done. Crowdsourcing
and helping people find each other will be a lot more efficient than
my being a single point of failure (SPOF).
--
Sumana Harihareswara
Engineering Community Manager
Wikimedia Foundation
As many people know, our current search infrastructure has caused a
few problems with the site. It's an area that was greatly improved by
the work that Robert Stojnic (a.k.a. Rainman) did in 2008, but he
hasn't had the time to keep up with it, and to date, the WMF hasn't
invested a lot in further developing it.
When I started with WMF, RobLa asked me to learn our search system and
start adding debugging information, with a plan to start fixing search
problems. I've captured what I've learned about our search
infrastructure here (currently being updated daily):
http://wikitech.wikimedia.org/view/User:Ram/Search
While Rainman hasn't had a lot of time to dedicate to search, he offered
to spend some time with us to talk about it. We had a small
meeting today, and plan to have at least one or two more meetings
(including an Open Tech Talk soon). In addition to Rainman and me,
David Schoonover, Aaron Schulz, Tim Starling, and RobLa were there.
This meeting was helpful for us in understanding lsearchd better, and
starting to talk about a possible plan to move to Solr. Meeting notes
below:
*lsearchd deep dive*
What processes are running?
- searchfrontend & searchbackend
Indexer:
- an index is a collection of files in a certain format
- one index daemon (per server?) (avoids synchronization/locking)
- RMI is used as a wrapper for searching the indexes that manages
local/foreign index shard access transparently
- Indexes are sharded by namespace and further split into smaller parts
(each checked on query, map/reduce style)
Index updates:
- Initial index building for a wiki is via an XML dump using an
indexbuilder tool
- Incremental updates work via polling OAI
- There used to be a synchronous update triggered by the searchupdate
hook on article edit but that is disabled.
Misc notes:
- requests to the daemon use a /db/searchterm format; responses come back
in one of the opensearch/xml/json formats (see the sketch after these notes)
- the "prefix format" is used for lists of suggestions
- the search daemon uses 80 threads (class SearchServer), so it can run 80
search requests in parallel, well above normal load (~10?)
- one daemon runs on each server
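A hypothetical sketch of talking to the daemon, based only on the
/db/searchterm shape noted above; the host, port, and "format" parameter
are guesses, not the verified protocol:

# Hypothetical: only the /db/searchterm path shape comes from the notes
# above; the address and query parameter are assumptions.
import json
import urllib.parse
import urllib.request

DAEMON = "http://localhost:8123"  # assumed address of a local lsearchd

def search(db, term, fmt="json"):
    # GET /<db>/<searchterm>?format=..., following the request shape above
    url = "%s/%s/%s?%s" % (DAEMON, db, urllib.parse.quote(term),
                           urllib.parse.urlencode({"format": fmt}))
    with urllib.request.urlopen(url, timeout=5) as resp:
        body = resp.read().decode("utf-8")
    return json.loads(body) if fmt == "json" else body

# results = search("enwiki", "main page")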
Possible things to fix:
- better error handling? (e.g. on timeout)
- index opened multiple times and handles pooled. Searchers check
locally and then check foreign servers (index is partitioned). The pool
avoids synchronization around Files which would curtail concurrency. Solr
already makes optimizations for resource sharing.
- Current code is in searchpool (searchcache?) in the search package.
- RMI load balancing is not smart, just random (using Solr would probably
deal with this)
- XMLRPC not used anymore (not since the switch to OAI)
- Fix bugs in the disabled interwiki search code that caused it to hang
*Solr*
Current Lucene features to make sure a new Solr setup has:
- Custom ranking metric (we have custom MW logic for determining hit
score)
- "Did You Mean?" engine that can handle multi-word queries (e.g. for
spellchecking)
...potentially related Solr features (a query sketch follows the list):
http://lucene.apache.org/solr/features.html
- (Query) Function Query: influence the score by user-specified complex
functions of numeric fields or query relevancy scores
- (Core) Pluggable user functions for Function Query
- (Query) Auto-suggest functionality for completing user queries
- (Query) Dynamic search results clustering using Carrot2
- (Schema) Many additional text analysis components including word
splitting, regex and sounds-like filters
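As a quick sketch of how the two requirements above map onto those
features, here is a function query folding a numeric field into the score
plus multi-word spellcheck collation. The core name "wiki" and the field
"incoming_links" are assumptions, as is the spellcheck component being
configured:

# Sketch only: core name, field name, and spellcheck configuration are
# assumptions; the parameters themselves (edismax, bf, spellcheck.collate)
# are standard Solr request parameters.
import json
import urllib.parse
import urllib.request

SOLR = "http://localhost:8983/solr/wiki/select"

params = urllib.parse.urlencode({
    "q": "main page",
    "defType": "edismax",
    # Fold a numeric field into the relevancy score -- roughly how a
    # custom MW hit-score metric could be expressed as a function query.
    "bf": "log(sum(incoming_links,1))",
    "spellcheck": "true",          # "Did You Mean?"-style suggestions
    "spellcheck.collate": "true",  # reassemble multi-word corrections
    "wt": "json",
})
with urllib.request.urlopen(SOLR + "?" + params) as resp:
    data = json.load(resp)
print(data["response"]["numFound"])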
*Solr Links*
1. http://lucene.apache.org/solr/ -- single-node frontend for index
query/update
2. http://lucene.apache.org/solr/4_1_0/tutorial.html - 4.1.0 tutorial
3. http://wiki.apache.org/solr/SolrCloud -- Sharding indices and using a
federated group of solr instances to serve query responses
*OAI:*
http://www.mediawiki.org/wiki/Extension:OAIRepository
Are you going to FOSDEM? If so (or if you are considering going) please
add yourself to
http://www.mediawiki.org/wiki/Events/FOSDEM
I still don't know. Depends on whether we have a MediaWiki EU critical mass.
--
Quim Gil
Technical Contributor Coordinator
Wikimedia Foundation
Dear Wikimedians,
Wikimedia Commons is happy to announce that the 2012 Picture of the Year
competition is now open. To see the candidate images and vote right now,
go to the POTY 2012 page on Wikimedia Commons at
https://commons.wikimedia.org/wiki/Commons:Picture_of_the_Year/2012/Introdu…
Voting is open to established Wikimedia users who meet the following
criteria:
1. Users must have an account, at any Wikimedia project, which was
registered *before Tue, 01 Jan 2013 00:00:00 +0000* [UTC].
2. This user account must have more than *75 edits* on *any single*
Wikimedia project *before Tue, 01 Jan 2013 00:00:00 +0000* [UTC]. Please
check your account eligibility with the POTY 2012 Contest Eligibility tool
at https://toolserver.org/~pathoschild/accounteligibility/?user=&wiki=&event=27
(a rough sketch of this check follows the list).
3. Users must vote with an account meeting the above requirements either
on Commons or another SUL-related Wikimedia project (for other Wikimedia
projects, the account must be attached to the user's Commons account
through SUL <https://meta.wikimedia.org/wiki/Help:Unified_login>).
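For the technically inclined, the rules above are easy to script against
one wiki's API. This is only a rough illustration, not the Toolserver
tool's implementation:

# Illustration of the eligibility rules: registered before 2013-01-01 UTC
# and more than 75 edits on a single project, checked against one wiki.
import json
import urllib.parse
import urllib.request

CUTOFF = "2013-01-01T00:00:00Z"  # registration cutoff, UTC

def eligible_on(api_url, username):
    query = urllib.parse.urlencode({
        "action": "query",
        "list": "users",
        "ususers": username,
        "usprop": "registration|editcount",
        "format": "json",
    })
    with urllib.request.urlopen(api_url + "?" + query) as resp:
        user = json.load(resp)["query"]["users"][0]
    # Very old accounts may have no recorded registration date; treat a
    # missing date as early enough. ISO timestamps compare lexicographically.
    registered_early = (user.get("registration") or "") < CUTOFF
    return registered_early and user.get("editcount", 0) > 75

# eligible_on("https://commons.wikimedia.org/w/api.php", "ExampleUser")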
Hundreds of images that have been rated Featured Pictures by the
international Wikimedia Commons community in the past year are all entered
in this competition. From professional animal and plant shots to
breathtaking panoramas and skylines, restorations of historically relevant
images, images portraying the world's best architecture, maps, emblems,
diagrams created with the most modern technology, and impressive human
portraits, Commons features pictures of all flavors.
For your convenience, we have sorted the images into topic categories. Two
rounds of voting will be held: in the first round, you may vote for as many
images as you like. The first-round category winners and the top ten
overall will make it to the final. *In the final round, when a limited
number of images are left, you must decide on the one image that you want
to become the Picture of the Year.*
Wikimedia Commons celebrates our featured images of 2012 with this contest.
Your votes decide the Picture of the Year, so remember to vote in the first
round by *February 14, 2013*.
To see the candidate images, just go to the POTY 2012 page on Wikimedia
Commons at
https://commons.wikimedia.org/wiki/Commons:Picture_of_the_Year/2012/Introdu…
Thanks,
the Wikimedia Commons Picture of the Year committee
On 02/05/2013 02:35 AM, Jens Ohlig wrote:
>> I'm wondering if some of the specialized functionality can be avoided by
>> fetching JSON data from wikibase / wikidata through a web API. This
>> would be more versatile, and could be used by alternative templating
>> systems.
>
> This was actually my first idea! However, since the client (i.e.
> Wikipedia) currently must have access to the database at the repo (i.e.
> Wikidata) anyway, this would result in a huge performance loss without
> any obvious gain.
Jens,
I am not so sure about the potential performance loss. I am guessing
that you fear the overhead of JSON serialization, which tends to be
relatively low with current libraries. Moving or accessing PHP objects
to/from Lua involves some overhead too, which is avoided when the JSON
is decoded directly in Lua.
Apart from making the data generally available, using a web API means
that the execution can be parallelized / distributed and potentially
cached. It also tends to lead to narrow interfaces with explicit
handling of state. Is direct DB access just needed because an API is
missing, or are there technical issues that are hard to handle in a web API?
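As a concrete example of the narrow interface I have in mind, a client can
fetch an item's JSON over the web API rather than reading the repo's
database. Python here is just for illustration (in the Scribunto case the
decoding would happen in Lua), and Q64 is just an example item ID:

# Fetch an item's JSON from the repo over the web API and decode it.
import json
import urllib.request

url = ("https://www.wikidata.org/w/api.php"
       "?action=wbgetentities&ids=Q64&props=labels|claims&format=json")
with urllib.request.urlopen(url) as resp:
    entity = json.load(resp)["entities"]["Q64"]
print(entity["labels"]["en"]["value"])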
Adding specialized Wikidata methods to Lua has a cost for users and
developers. Users probably need to learn larger and less general APIs.
Developers need to continue supporting these methods once content relies
on them, which can be difficult if the specialized methods don't map
cleanly to a future web API.
Gabriel
Sorry for cross-posting, but for MediaWiki developers, this is a good
opportunity to ask any questions you might have about the newly-released
Extension:GuidedTour, and how to leverage it to build any tours yourself.
---------- Forwarded message ----------
From: Steven Walling <swalling(a)wikimedia.org>
Date: Mon, Feb 4, 2013 at 12:57 PM
Subject: IRC office hours with the Editor Engagement Experiments team
To: Wikimedia Mailing List <wikimedia-l(a)lists.wikimedia.org>
Hi all,
This Wednesday at 22:00 UTC,[1] there will be an open discussion in
#wikimedia-office with our team, Editor Engagement Experiments.[2]
We've launched several new features since our last office hours --
including interactive "guided tours" and a getting started page for
newly-registered Wikipedians. We'll likely discuss these projects,
including testing results so far, as well as any questions people might
have.
Thanks,
--
Steven Walling
https://wikimediafoundation.org/
1. https://meta.wikimedia.org/wiki/IRC_office_hours
2. https://meta.wikimedia.org/wiki/Editor_engagement_experiments
Forwarding to wikitech with RFC:
Currently the mobile site scrubs all elements with the noprint class.
It also scrubs elements with the nomobile class. Each "scrub" pass over
the page affects performance [citation needed], so dropping one of them
would be a good thing to do.
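To illustrate the cost: each scrubbed class is in effect another
parse-and-walk over the page HTML. A simplified Python sketch of such a
pass (an illustration only, not MobileFrontend's actual PHP
implementation):

# Simplified illustration of one class-based scrub pass over page HTML.
from html.parser import HTMLParser

class ClassScrubber(HTMLParser):
    """Drops any element subtree whose class attribute contains `cls`.
    Simplified: void elements (e.g. <br>) inside a scrubbed subtree
    are not handled."""
    def __init__(self, cls):
        super().__init__(convert_charrefs=True)
        self.cls = cls
        self.out = []
        self.skip_depth = 0  # >0 while inside a scrubbed subtree

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.skip_depth or self.cls in classes:
            self.skip_depth += 1
        else:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            self.skip_depth -= 1
        else:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

scrubber = ClassScrubber("noprint")
scrubber.feed('<div><span class="noprint">print UI</span><p>kept</p></div>')
print("".join(scrubber.out))  # <div><p>kept</p></div>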
It has been suggested that we stop scrubbing .noprint elements and
there is a patchset to do so:
https://gerrit.wikimedia.org/r/#/c/43852/
In the worst case, merging this patchset will suddenly reveal elements
with the noprint class on mobile. To me this is not a big deal, but I
understand it is of concern to some people, and some have requested that
this change be communicated.
If there are any objections to doing this, please voice them now.
Please note that if you do have objections, I will expect you to help us
come to a satisfactory solution to this problem!
I've included the discussion below for your reference.
Thanks!
On Mon, Feb 4, 2013 at 2:30 PM, Arthur Richards <arichards(a)wikimedia.org> wrote:
> I am inclined to agree with Jon and make the change - especially considering
> there has been radio silence about the change on this list, I suspect it
> will be ok, unless there is a better forum to discuss these kinds of changes
> (eg wikitech-l)?
>
>
> On Mon, Jan 28, 2013 at 6:46 PM, Jon Robson <jdlrobson(a)gmail.com> wrote:
>>
>> Any further thoughts on this? Should I abandon the patchset...? Does
>> anyone want to own this problem?
>>
>> On Thu, Jan 24, 2013 at 10:16 AM, Jon Robson <jdlrobson(a)gmail.com> wrote:
>> > I think this is a pessimistic view of things. I don't suspect this
>> > would cause a catastrophic change, and we could always add a boolean in
>> > LocalSettings.php in case we need to roll back.
>> > Realistically, the fallout here is going to be a nuisance more than
>> > anything, and certain content will appear that didn't use to.
>> >
>> > I personally think the best form of communication is to make the
>> > change and then deal with the fallout. If something is rendering
>> > strangely then people will notice and complain and we'll get that
>> > fixed. This will happen much more quickly than spending time
>> > exploring the impact and communicating and waiting for people to make
>> > their changes with no incentive.
>> >
>> > On Tue, Jan 22, 2013 at 10:31 AM, Max Semenik <maxsem.wiki(a)gmail.com>
>> > wrote:
>> >> While I certainly agree in principle, do we know how many pages are
>> >> relying on this feature? Also, some communication would be good.
>> >>
>> >> On 14.01.2013, 22:21 Jon wrote:
>> >>
>> >>> I agree.
>> >>> We already have the nomobile class and have had it for a while. The
>> >>> noprint class has been removed on mobile for some time and never got
>> >>> reevaluated after the addition of nomobile.
>> >>
>> >>> Thanks for pointing this out!
>> >>
>> >>> https://gerrit.wikimedia.org/r/43852
>> >>
>> >>> On Sun, Jan 13, 2013 at 6:42 PM, Danny B.
>> >>> <Wikipedia.Danny.B(a)email.cz> wrote:
>> >>>> Hello,
>> >>>>
>> >>>> I found out that items with the class noprint are not delivered to
>> >>>> the mobile version. Is that a bug or a feature? If it is a feature,
>> >>>> then I strongly suggest reconsidering it and setting up a new class
>> >>>> "nomobile" instead. Some stuff with noprint is useful on mobile; it
>> >>>> just does not make sense to *print* it.
>> >>>>
>> >>
>> >>
>> >> --
>> >> Best regards,
>> >> Max Semenik ([[User:MaxSem]])
>> >>
>> >
>> >
>> >
>> > --
>> > Jon Robson
>> > http://jonrobson.me.uk
>> > @rakugojon
>>
>>
>>
>> --
>> Jon Robson
>> http://jonrobson.me.uk
>> @rakugojon
>>
>
>
>
>
> --
> Arthur Richards
> Software Engineer, Mobile
> [[User:Awjrichards]]
> IRC: awjr
> +1-415-839-6885 x6687
--
Jon Robson
http://jonrobson.me.uk
@rakugojon
Back in December, there was discussion about needing a better method of
identifying disambiguation pages programmatically (bug 6754). I wrote
some core code to accomplish this, but was informed that disambiguation
functions should reside in extensions rather than in core, per bug
35981. I abandoned the core code and wrote an extension instead
(https://gerrit.wikimedia.org/r/#/c/41043/). Now, however, it has been
suggested that this code needs to reside in core after all
(https://www.mediawiki.org/wiki/Suggestions_for_extensions_to_be_integrated#…).
Personally, I don't mind implementing it either way, but would like to
have consensus on where this code should reside. The code is pretty
clean and lightweight, so it wouldn't increase the footprint of core
MediaWiki (it would actually decrease the existing footprint slightly
since it replaces more hacky existing core code). So core bloat isn't
really an issue. The issue is: where does it make the most sense for
disambiguation features to reside? Should disambiguation pages be
supported out of the box, or should full support require an extension?
The specific disambiguation features I'm talking about are listed below,
with a usage sketch after the list:
1. Make it easy to identify disambiguation pages via a page property in
the database (set by a templated magic word)
2. Provide a special page (and corresponding API) for seeing what pages
are linking to disambiguation pages
3. Assign a unique class to disambiguation links so that gadgets can
allow them to be uniquely colored or have special UI (not yet implemented)
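For example, feature 1 could be consumed by client code roughly like
this; the property key "disambiguation" is an assumption here, and the
extension defines the actual name:

# Assumed: the magic word stores a page property keyed "disambiguation".
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"
query = urllib.parse.urlencode({
    "action": "query",
    "prop": "pageprops",
    "titles": "Mercury|Berlin",
    "format": "json",
})
with urllib.request.urlopen(API + "?" + query) as resp:
    pages = json.load(resp)["query"]["pages"]
for page in pages.values():
    is_dab = "disambiguation" in page.get("pageprops", {})
    print(page["title"], "-> disambiguation:", is_dab)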
Ryan Kaldari
Last week we had our first Bug Day of the year.
--- *How it Went* ---
We looked at bugs (excluding enhancements) that had not seen any changes in
over a year, a little over 250 bugs. The bugs came from a number of
different products and components. We started looking at the bugs in
"General/Unknown." Attendance was lower than what we wanted. Andre,
Matanya, and I triaged bugs and had help from developers in #wikimedia-dev.
--- *What we Achieved* ---
We triaged about 30 bugs [1]. This included re-testing, prioritizing, and
closing old reports. Reports for which we requested more information will
be closed after 3 weeks if we receive no response. We also made note of
what we could improve for the coming Bug Days.
--- *What we can improve* ---
Some things we can improve:
- Better Landing Page
- We posted the announcement on [2], where it sits in the
middle of the page. Instead, we could have a dedicated Bug Day page where
the time and date of the next Bug Day are prominent. It should be friendly
for new users, as this is likely where they would look if they are
interested in joining a Bug Day.
- Hosting the event in a different IRC channel
- We held the event in #wikimedia-dev. We were able to get help from
developers on the channel, but it was hard to tell whether users joining
were there for the Bug Day. We felt greeting each user who joined could
have increased noise on the channel and could have been annoying to other
users. We may hold the next event in #wikimedia-office if there are no
other meetings scheduled for that time.
Thank you for your participation and support! We hope the coming Bug Days
will get better and better.
-Valerie Juarez
[1] http://www.mediawiki.org/wiki/Bug_management/Triage/20130129
[2] http://www.mediawiki.org/wiki/Bug_management/Triage
Heads-up to API developers and to deployment reviewers.
http://lists.wikimedia.org/pipermail/glam/2013-January/thread.html#310
is the full discussion thread.
-Sumana
-------- Original Message --------
Subject: Re: [GLAM] Wiki GLAM Toolset project (GLAM Digest, Vol 18,
Issue 11)
Date: Thu, 24 Jan 2013 11:47:54 +0100
From: Maarten Zeinstra <mz(a)kl.nl>
Reply-To: glam(a)lists.wikimedia.org
To: Wikimedia & GLAM collaboration [Public] <glam(a)lists.wikimedia.org>
Hi Ed,
This is as yet undecided.
We are currently developing on Wikimedia Labs, using the architecture of
an extension. It will either keep running on Wikimedia Labs and use
Commons' API to upload materials, or run as an extension on Wikimedia
Commons. Our team is aiming for the latter; however, this is something
that the sysadmins of Commons need to approve. This is also a reason why
we are looking for more eyes on the project: to ensure the quality of
our code and avoid any possible security risks.
Does this answer your question?
Cheers,
Maarten
On Jan 24, 2013, at 11:24 , Ed Summers <ehs(a)pobox.com> wrote:
> Sorry, by "Wikimedia instance" I of course meant "Mediawiki instance" :-)
>
> //Ed
>
> On Thu, Jan 24, 2013 at 5:23 AM, Ed Summers <ehs(a)pobox.com> wrote:
>> I guess I'm confused about the expected deployment of the GLAM
>> Toolset. Is the idea that the plugins you develop will be deployed on
>> the commons? Or will the GLAM Toolset be deployed in a separate
>> Wikimedia instance, and then (somehow) the data will be migrated from
>> it to the commons? Or something else entirely?
>>
>> //Ed
>>
>> On Thu, Jan 24, 2013 at 4:09 AM, Maarten Zeinstra <mz(a)kl.nl> wrote:
>>> Hi Ed,
>>>
>>> The short answer is no.
>>>
>>> Mass uploading files from cultural institutions has always been a
>>> tailor-made process, much of it thanks to the work of Maarten Dammers /
>>> Multichill. We are now seeing that more and more cultural institutions
>>> want to put parts of their collections online, which makes the original
>>> process unmanageable. This is mainly because different standards are
>>> used within Wikimedia Commons and within the cultural institutions. This
>>> project tries to solve that problem by giving these institutions an
>>> interface to upload data and map their internal standards to the
>>> standard that Wikimedia Commons uses. It also gives these institutions
>>> the option to do it without much assistance from a Wikipedian, which
>>> speeds things up in the long run.
>>>
>>> Does this answer your question?
>>>
>>> Cheers,
>>>
>>> Maarten Zeinstra
>>> Project member of the Wiki GLAM Toolset project
>>>
>>> On Jan 24, 2013, at 3:42 , Ed Summers <ehs(a)pobox.com> wrote:
>>>
>>> Is it not possible to upload metadata via the API?
>>>
>>> On Wednesday, January 23, 2013, Maarten Dammers wrote:
>>>>
>>>> Hi Ed,
>>>>
>>>> On 23-1-2013 11:14, Ed Summers wrote:
>>>>>
>>>>> Thanks for the update. I was wondering why you need to develop this
>>>>> new functionality as a mediawiki extension. Is there not enough
>>>>> functionality in the commons Web API [1]? Or is the plan to create
>>>>> plugins that will be deployed on commons.wikimedia.org?
>>>>
>>>> It's not about uploading large numbers of images (that's easy); it's
>>>> about uploading them with metadata properly sorted out.
>>>> The extension will be the interface for this metadata handling. Please
>>>> take a look at the sprint demos at
>>>> https://commons.wikimedia.org/wiki/Commons:GLAMwiki_toolset_project to get
>>>> an idea.
>>>>
>>>> Hope that answers your question,
>>>>
>>>> Maarten
>>>>
>>>>>
>>>>> //Ed
>>>>>
>>>>> [1] http://commons.wikimedia.org/w/api.php
>>>>>
>>>>> On Tue, Jan 22, 2013 at 7:50 AM, Geer Oskam <Geer.Oskam(a)kb.nl> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> As you might have heard, Wikimedia Nederland, Wikimedia UK, Wikimedia
>>>>>> France and Europeana are collaborating to provide a set of tools to
>>>>>> get material from GLAM institutions onto Wikimedia Commons. The Wiki
>>>>>> GLAM Toolset has to be created in a way that re-use can easily be
>>>>>> tracked, and that Commons materials can easily be integrated back
>>>>>> into the collection of the original GLAM. You can find out more about
>>>>>> the Wiki GLAM Toolset on our project page.
>>>>>>
>>>>>> Currently we are looking for developers who can help us in this
>>>>>> process; if you are interested, please let us know. You can find more
>>>>>> specific information about our needs on our discussion page.
>>>>>>
>>>>>> Thank you!
>>>>>>
>>>>>> Geer Oskam
>>>>>> www.europeana.eu
>>>>>>
>>>>>> Phone: +31 (0)70 31 40 972
>>>>>> Email: geer.oskam(a)kb.nl
>>>>>> Skype: GeerOskam
>>>>>>
>>>>>> Over 20 million cultural records for re-use via Europeana API: find out
>>>>>> more
>>>>>> and register for an API-key on http://bit.ly/Reuse_API