test2.wikimedia.org is now configured to act as a client to wikidata.org. It's
supposed to access data items by directly talking to wikidata.org's database.
But this fails: Revision::getRevisionText returns false. Any ideas why that
would be? I have documented the issue in detail here:
https://bugzilla.wikimedia.org/show_bug.cgi?id=42825
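For reference, here is a minimal sketch of the kind of cross-wiki text lookup
involved (the 'wikidatawiki' DB name and the $textId variable are illustrative
assumptions only, not our actual code):

  $dbr = wfGetDB( DB_SLAVE, array(), 'wikidatawiki' );
  $row = $dbr->selectRow(
      'text',
      array( 'old_text', 'old_flags' ),
      array( 'old_id' => $textId ), // $textId taken from the revision's rev_text_id
      __METHOD__
  );
  // If the text blob lives in External Storage, the store needs to know which
  // wiki's cluster to read from; a failed lookup makes getRevisionText()
  // return false.
  $text = Revision::getRevisionText( $row, 'old_', 'wikidatawiki' );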
Any help would be appreciated.
-- daniel
Hi guys!
SMWCon Fall 2012, the conference on Semantic MediaWiki, is now fully
available on YouTube.
Use this YouTube playlist:
http://www.youtube.com/playlist?list=PLwtfwT1GnUQRaLki-YcF-_n8ndayi--W5
And the conference page is still here:
http://semantic-mediawiki.org/wiki/SMWCon_Fall_2012
Here is a small review of the second day of the conference.
In the keynote, Peter Haase presented the Information Workbench
platform. It is essentially an enterprise semantic wiki with lots of
import/export features, visualizations, RDF support and a visual
editor. It is a nice, mature product that has proven that a semantic
wiki can be useful in big companies.
The talks about new features in well-known extensions excited me the
most: Jeroen (+Nischay, +MWJames) talked about Semantic Maps and
Semantic Result Formats. Lots of new plots and graphs, interactive
charts, SHAPES on the map, searching through markers, clusters... wow!
Stephan Gambke also presented his new 'filtered' format, which
immediately gained popularity in our community. Stephan made the
'filtered' format available for calendars and has great plans for
further development. A big hooray to the developers who have made SMW
even more functional and beautiful!
Search in SMW is a topic now being actively developed by the GESIS
institute, so we had two talks about SolrStore and SMW.
We also had two talks about new SMW extensions that may be of
interest: Semantic Image Annotator, which the guys from AIFB developed
to help annotate parts of scanned book pages (corpora analysis), and
Semantic Expressiveness, which allows you to use a much shorter syntax
for your queries; Daniel Werner described how it helps him develop RPG
Wiki. Among the lightning talks there was a presentation of Toneelstof -
a pretty impressive visualization with a clear use case that uses SMW in
the background.
At the end of the conference, Joel Natividad described how linked data
may help with city infrastructure.
P.S. Dear speakers, if you have time, please add the link to your talk
on YouTube to the Talk template, in the Video parameter. I'm quite a
slow guy, as you can see.
----
Yury Katkov, WikiVote, Program Chair
Hello all,
After the new version of LabeledSectionTransclusion (LST) was deployed on
itwikisource, performance issues popped up. itwikisource's main page makes
heavy use of LST, and the new version is clearly heavier than the old one.
In this mail, I'll try to describe the aims of the new version, how the old
version worked and how the new version works.
Aims
-------
In the old situation, it was possible to transclude sections of pages by
marking them with <section> tags. However, it was impossible
to include those tags from within a template. I.e. given
page P: something before <section begin='a' />something with a<section end='a' />
something after
page Q: {{#lst:P|a}}
then Q was rendered as
something with a
However, it was not possible to do something like:
page O: ===<section begin='header' />{{{1}}}<section end='header' />===
page P: {{O|Some header text}}
page Q: {{#lst:P|header}}
Changes in the #lst parser
--------------------------------------
This was because in the old situation, the #lst mechanism did something
along these lines:
1) get DOM using $parser->getTemplateDom( $title ); - note that this is a
non-expanded DOM, as in templates are not expanded
2) traverse this DOM, find section tags, and call
$parser->replaceVariables(....) on the relevant sections
In the new situation, the #lst mechanism does something like:
1) get expanded wikitext using
$parser->preprocess("{{:page_to_be_transcluded}}")
2) get the DOM by calling $parser->preprocessToDom() on the expanded
wikitext
3) traverse this DOM, find section tags, and call
$parser->replaceVariables(....)
on the relevant sections (unchanged)
One obvious performance issue is that (1) and (2) are not cached - not
within one response (so if a page {{#lst}}'s the same page twice, that page
is processed twice), and not between responses (no caching).
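For illustration, here is a minimal sketch of what per-request memoization of
steps (1) and (2) could look like (the class and method names are made up for
this example):

  class LstExpandCache {
      private static $cache = array();

      public static function getExpandedDom( Parser $parser, Title $title ) {
          $key = $title->getPrefixedDBkey();
          if ( !isset( self::$cache[$key] ) ) {
              // (1) expand the target page's wikitext
              $wikitext = $parser->preprocess(
                  '{{:' . $title->getPrefixedText() . '}}',
                  $title,
                  $parser->getOptions()
              );
              // (2) build the DOM from the expanded wikitext
              self::$cache[$key] = $parser->preprocessToDom( $wikitext );
          }
          return self::$cache[$key];
      }
  }

Caching between responses would additionally need invalidation when the
transcluded page changes, which the sketch above doesn't attempt.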
In general, I think it would be preferable not to do a full parse, but
just to expand the DOM of the templates. Unfortunately, I have not been
able to find a simple way to do this: PPFrame::Expand expands the templates
to their final form, not to an 'expanded DOM'.
I don't know MediaWiki caching well enough to say something about which
caches are used (or not), and what would be an effective caching strategy.
Any ideas on how to do LST without bluntly doing a full page parse for
every transcluded page, or on caching strategies, would be very welcome.
Best,
Merlijn
Hi!
Once wikidata.org allows for entry of arbitrary properties, we will need some
protection against spam. However, there is a nasty little problem with making
SpamBlacklist, AntiBot, AbuseFilter etc work with Wikidata content:
Wikibase implements editing directly via the API, not using EditPage. But the
spam filters usually hook into EditPage, typically using the EditFilter,
EditFilterMerged, or EditFilterMergedContent hooks.
Wikibase has a utility class called EditEntity which implements many things
otherwise done by the EditPage: token checks, conflict detection and resolution,
permission checks, etc. We could just trigger EditFilterMergedContent there,
and also EditFilterMerged and EditFilter, though we would have to fake the
"text" for these.
There is one problem with this though: these hooks take as their first parameter
an EditPage object, and the handler functions defined in the various extensions
make use of this. Often, just to get the context, like page title, etc - but
often enough also for non-trivial things, like calling EditPage::spamPage() or
even EditPage::spamPageWithContent().
How can we handle this? I see several possibilities:
1) change the definition of the hook so it just has a ContextSource as its
first parameter, and fix all extensions that use the hook. However, it is
unclear how functionality like EditPage::spamPageWithContent() can then be
implemented. EditPage::spamPage() could be moved to a utility class, or into
OutputPage.
2) emulate an EditPage object, using a proxy/stub/dummy object. This would need
a bit of coding, and it's prone to get out of sync with the real EditPage. But
things like spamPageWithContent() could be implemented nicely, in a content
model specific manner.
3) we could instantiate a dummy EditPage, and pass that to the hooks. But
EditPage doesn't support non-text content, and even if we force it, we are
likely to end up with an edit field full of json, if we are not very careful.
4) just add another hook, similar to EditFilterMergedContent, but more generic,
and call it in EditEntity (and perhaps also in EditPage!); see the sketch below.
If we want a spam filter extension to work with non-text content, it will have
to implement that new hook.
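To make (4) concrete, a rough sketch of what the call from EditEntity might
look like (the hook name 'EditFilterContent' and its signature are made up for
illustration; this is not an existing hook):

  $filterStatus = Status::newGood();

  $ok = wfRunHooks( 'EditFilterContent', array(
      $context,      // IContextSource: title, user, request
      $content,      // the Content object being saved (e.g. an EntityContent)
      $filterStatus, // filters report problems here to block the edit
      $summary,
      $user,
      $minoredit
  ) );

  if ( !$ok || !$filterStatus->isOK() ) {
      // abort the save; the API module would report $filterStatus to the caller
      return $filterStatus;
  }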
What's the best option, do you think?
There's another closely related problem, btw: showing captchas. How can that be
implemented at all for API based, atomic edits? Would the API return a special
error, which includes a link to the captcha image as a challenge? And then
require the captcha's solution via some special arguments to the module call?
How can an extension control this? How is this done for the API's action=edit
at present?
thanks,
daniel
Don't we have some sort of policy about an individual merging commits that
he/she uploaded? Because these three changes:
https://gerrit.wikimedia.org/r/36801
https://gerrit.wikimedia.org/r/36812
https://gerrit.wikimedia.org/r/36813
were all uploaded and submitted in a matter of minutes by the same person,
and each is a fix for errors in the commit before it. It kind of defeats
the point of having code review in the first place.
--
Tyler Romeo
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerromeo(a)gmail.com
Fellow Wikimedia Developers,
Matthias Mullie has been working hard to refactor the backend of
mediawiki/extensions/ArticleFeedbackv5 to add proper sharding support.
The original approach that he took was to rely on RDBStore that was
first introduced in Change-Id:
Ic1e38db3d325d52ded6d2596af2b6bd3e9b870fe
https://gerrit.wikimedia.org/r/#/c/16696 by Aaron Schulz.
Asher Feldman, Tim Starling and I reviewed the new RDBStore class
and determined that it wasn't really the best approach for our current
technical architecture and database environment. Aaron Schulz included
a lot of really good ideas in RDBStore, but it just seemed like it
wasn't a great fit right now. We decided collectively to abandon the
RDBStore work at this time.
So, we're now left with the need to provide Matthias Mullie with some
direction on what is the best solution for the ArticleFeedbackv5
refactor.
One possible solution would be to create a new database cluster for
this type of data. This cluster would be solely for data that is
similar to Article Feedback's and that has the potential of being
spammy in nature. The MediaWiki database abstraction layer could be
used directly via a call to the wfGetDB() function to retrieve a
Database object. A read limitation with this approach will be
particularly evident when we require a complex join. We will need to
eliminate any cross-shard joins.
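As a rough sketch only (the 'feedbackwiki' DB name and the aft_feedback table
and fields are placeholders, not the actual schema), the access pattern would
be plain keyed reads through the abstraction layer:

  $dbr = wfGetDB( DB_SLAVE, array(), 'feedbackwiki' );

  $res = $dbr->select(
      'aft_feedback',
      array( 'af_id', 'af_page_id', 'af_comment' ),
      array( 'af_page_id' => $pageId ), // keyed lookups only; no cross-shard joins
      __METHOD__
  );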
The reality is that database sharding is a very useful technology but,
like other approaches, there are many factors to consider to ensure a
successful implementation. Further, there are some limitations, and
database sharding will not work well for every type of application.
So, when we do truly implement sharding in the future, it will more
than likely be beneficial to focus on the places in core MediaWiki
where it will have the greatest impact, such as the pagelinks and
revision tables.
— Patrick
Second issue of the MediaWiki community metrics monthly report!
We have added a bunch of bug tracking data in order to highlight some of
the QA and testing activities. Hopefully next month we will show
mediawiki.org data to reflect the documentation work.
http://www.mediawiki.org/wiki/Community_metrics/November_2012
The monthly community metrics reports are still very much a work in
progress. Your feedback and help are welcome!
--
Quim Gil
Technical Contributor Coordinator
Wikimedia Foundation
Hey all,
For a while now we have had .jshintrc rules in the repository and have
been able to run node-jshint locally.
TL;DR: jshint is now running from Jenkins on mediawiki/core
(joining the linting sequence for php and puppet files).
I cleaned up the last old lint failures in the repo yesterday in
preparation to enable it from Jenkins (like we already do for PHP and
Puppet files). After some quick testing in a sandbox job on Jenkins to
confirm it passes/fails accordingly, this has now been enabled in the
main Jenkins job for mediawiki/core.
Right now only master and REL1_20 pass (REL1_19 and the wmf branches do
not; the next wmf branch will, however, pass).
Therefore it has only been enabled on the master branch for now.
Example success:
* https://gerrit.wikimedia.org/r/#/c/24249/
* https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/7730/console
22:16:41 Running "jshint" task
22:16:48 OK
22:16:48
22:16:48 Done, without errors.
Example failure:
* https://gerrit.wikimedia.org/r/#/c/34433/
* https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/7732/console
22:24:01 Running "jshint" task
22:24:08 >> resources/mediawiki/mediawiki.js: line 5, col 5, Identifier 'bla_bla' is not in camel case.
22:24:08 >> resources/mediawiki/mediawiki.js: line 5, col 12, 'bla_bla' is defined but never used.
22:24:08 >>
22:24:08 >> 2 errors
22:24:08 Warning: Task "jshint" failed.
So if your commit is marked as failure, just like with failures from
phplint, puppetlint or phpunit: Click the link from jenkins-bot and
follow the trail.
-- Timo Tijhof