TL;DR: Getting started with RESTBase is now easier thanks to a new SQLite
back-end
Hello,
The WMF Services team [0] is proud to announce the initial release of the
SQLite storage back-end [1] for RESTBase [2], the service that powers the
Wikimedia projects' public REST API [3]. Until now, RESTBase was limited
to Cassandra for persistence, which, while scalable, is a heavyweight and
difficult dependency for smaller installations and test environments. The
new module allows RESTBase administrators to replace Cassandra with
lightweight, file-based persistence backed by SQLite.
This is an exciting milestone in RESTBase's development. We expect the
option to run RESTBase atop an SQLite database to significantly widen its
audience, allowing users with small installations and limited resources to
take advantage of this valuable service. We also hope to attract more
contributors by offering a simpler, less resource-intensive alternative to
Cassandra; SQLite is now the storage module of choice in
MediaWiki-Vagrant [6] installations with the restbase role enabled.
Contributing is just a "vagrant up" away!
The abstract storage interface [7] we developed for RESTBase allows users
to switch seamlessly between storage back-ends (Cassandra or SQLite).
Because of the data volumes we face at the WMF, we are not using the SQLite
back-end in production. However, we test every RESTBase changeset against
both Cassandra and SQLite to ensure that both behave as expected.
Additionally, 93% code coverage gives us the confidence to recommend the
module for production use.
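To illustrate the idea behind that interface (a conceptual sketch in
TypeScript with hypothetical names, not the actual restbase-mod-table-spec
API), every back-end implements the same small contract, so the code above
it never needs to know which engine is underneath:

    // Conceptual sketch only -- hypothetical names, not the real
    // restbase-mod-table-spec API.
    interface TableStorage {
      put(table: string, key: string, value: string): Promise<void>;
      get(table: string, key: string): Promise<string | undefined>;
    }

    // Stand-in for a back-end that would wrap the sqlite3 driver; a
    // CassandraBackend would implement the exact same interface.
    class SqliteBackend implements TableStorage {
      private rows = new Map<string, string>(); // placeholder for a .db file
      async put(table: string, key: string, value: string): Promise<void> {
        this.rows.set(`${table}/${key}`, value);
      }
      async get(table: string, key: string): Promise<string | undefined> {
        return this.rows.get(`${table}/${key}`);
      }
    }

    // Callers depend only on the interface, so switching engines becomes
    // a configuration change rather than a code change.
    async function demo(storage: TableStorage): Promise<void> {
      await storage.put("revisions", "Main_Page", "<html>...</html>");
      console.log(await storage.get("revisions", "Main_Page"));
    }

    demo(new SqliteBackend());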
Bringing the SQLite storage module to a usable state took a significant
amount of team effort, but we would nevertheless like to extend a special
thank you to Petr Pchelko for his tireless efforts in driving this project
home. Thanks, Petr!
This is only the first step in providing better third-party and developer
support. One of the next challenges will be basing Parsoid [8] on
service-runner [9,10], a general-purpose library for running and managing
Node.js services. Amongst other things, it enables bundling and running
multiple services together. Packaging and distributing Parsoid and RESTBase
together will simplify their installation, configuration and administration
in small-setup and development environments.
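As a rough illustration of such bundling (a TypeScript sketch whose key
names are drawn from service-runner's example configuration; treat the
exact schema and module identifiers as assumptions, not a tested setup):

    // Hypothetical sketch of one service-runner process tree hosting
    // both services; key names are assumptions, not a verified schema.
    const bundledConfig = {
      num_workers: 2, // worker processes shared by the bundled services
      services: [
        { name: "parsoid", module: "parsoid" },   // hypothetical module id
        { name: "restbase", module: "restbase" }, // hypothetical module id
      ],
    };
    console.log(`One process tree would manage ${bundledConfig.services.length} services.`);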
Best,
Marko Obrovac, PhD
Senior Services Engineer
Wikimedia Foundation
[0] https://www.mediawiki.org/wiki/Wikimedia_Services
[1] https://github.com/wikimedia/restbase-mod-table-sqlite
[2] https://www.mediawiki.org/wiki/RESTBase
[3] https://en.wikipedia.org/api/rest_v1/?doc (for a complete list of
supported domains go to https://rest.wikimedia.org/)
[4] https://phabricator.wikimedia.org/tag/restbase/
[5] https://phabricator.wikimedia.org/tag/restbase-api/
[6] https://www.mediawiki.org/wiki/MediaWiki-Vagrant
[7] https://github.com/wikimedia/restbase-mod-table-spec
[8] https://www.mediawiki.org/wiki/Parsoid
[9] https://github.com/wikimedia/service-runner
[10] https://phabricator.wikimedia.org/T90668
== Apologies for Crossposting ==
Below you will find useful information and links about the program, our
satellite events and registration opportunities for
SEMANTiCS 2015 -- 11th International Conference on Semantic Systems
15-17 September 2015, Vienna / Austria
http://www.semantics.cc/ // #semantics2015 // #semanticsconf
*--- Conference Scope ---*
The annual SEMANTiCS conference is the meeting place for professionals
who make semantic computing work, understand its benefits, and know its
limitations. Every year, SEMANTiCS attracts information managers, IT
architects, software engineers, and researchers from organisations
ranging from NPOs, universities, and public administrations to the largest
companies in the world.
*--- Conference Program ---*
The 2015 edition offers a rich program consisting of 5 keynotes, 24
scientific presentations, 30 industry talks, 38 posters, and various
workshops and social events. For details, please visit our program page:
http://www.semantics.cc/programme
*--- Keynote Speakers ---*
· Jeanne Holm -- Chief Knowledge Architect at NASA
· Peter Mika -- Director of the Semantic Lab at Yahoo
· Oscar Corcho -- Associate Professor of Artificial Intelligence,
Universidad Politécnica de Madrid
· Klaus Tochtermann -- Leibniz Information Centre for Economics
· Sam Rehman -- CTO at EPAM Systems
*--- Workshops & Satellite Events ---*
*MeetUp: SMART DATA SOLUTIONS*
An outlook into the world of data-centric business, technologies and
innovations: data is everywhere these days, and efficient data management
has become THE key success factor in nearly all industries. McKinsey lists
data as a key factor of production, alongside labor and capital, in one of
its recent reports. Furthermore, data is produced in huge amounts by
sensors, social networks and mobile devices, and the amount of data
available worldwide is growing exponentially…
Place: Haus der Ingenieure, Eschenbachgasse 9, 1010 Wien
Date: 15.09.2015; doors open 18:30 CEST, event 19:30 - 22:30 CEST
*2nd International Workshop on Geospatial Linked Data*
In recent years, Semantic Web technologies have strengthened their
position in the areas of data and knowledge management. Standards for
organizing and querying semantic information, such as RDF(S) and SPARQL,
have been adopted by large academic communities, while corporate vendors
adopt semantic technologies to organize, expose, exchange and retrieve
their datasets as Linked Data.
Chairs: Alejandra Garcia-Rojas M. (Ontos AG), Robert Isele, Rene
Pietzsch, Jens Lehmann (AKSW, University of Leipzig)
Date: 15th of September 2015, 09.00 to 13.00 CEST
*The SEMANTIC EXPERIENCE Coffee & Cocktail LOUNGE*
Sponsored Event: Enjoy semantic technology inspiration in a relaxed
atmosphere. Refresh yourself with Viennese coffee or a cocktail and get
in touch with semantic industry experts.
Date: 15th of September 2015, 15.45 to 18.00 CEST
Room: LC Club Room
*Linked Data in Industry 4.0*
The overall goal of the workshop is to identify challenges and
limitations faced by the manufacturing engineering industry in the scope
of Industry 4.0's design principles, and to bring them together with
experts and solution approaches from the Linked Data community.
Chairs: Thomas Moser (FH St. Pölten), Stefan Hupe (IoT Austria)
Date: 15th of September 2015, 14.00 to 17.00 CEST
*European Data Economy Workshop - Focus on Data Value Chain & Big and Open
Data*
This workshop gives an overview of the state of the art of Big and Open
Data initiatives in Europe, their impact on the European economy, and
their benefits for European society. Representatives from the Big Data
Value Association, the annual European Data Forum and data-related
projects will participate in the first session of the workshop. The
workshop also presents the Austrian Big Data Study carried out in 2014 by
AIT and IDC.
Chairs: Nelia Lasierra (STI Innsbruck), Martin Kaltenböck (Semantic Web
Company)
Date: 15th of September 2015, 09.00 to 13.00 CEST
*1st Workshop on Data Science: Methods, Technology and Applications
(DSci15)*
This workshop is meant as an opportunity to bring together researchers
and practitioners interested in data science to present their ideas and
discuss the most important scientific, technical and socio-economic
challenges of this emerging field.
Chairs: Bernhard Haslhofer (AIT - Austrian Institute of Technology),
Elena Simperl (Univ. Southampton), Rainer Stütz (AIT - Austrian
Institute of Technology), Ingo Feinerer (FH Wiener Neustadt)
Date: 15th of September 2015, 09.00 to 17.00 CEST
*Workshop on Linked Data Strategies - Commercialisation of Interlinked Data*
In this workshop, we will give several demos and concrete examples of
how Linked Data can be used by enterprises in various industries. The
workshop aims to put valuable methods and best practices into the hands
of users and providers of Linked Data, helping them make well-founded
decisions in their Linked Data projects.
Chairs: Christian Dirschl (Wolters Kluwer), Andreas Blumauer (Semantic
Web Company), Tassilo Pellegrini (FH St. Pölten)
Date: 15th of September 2015, 14.00 to 15.30 CEST
*Hackathon on "The power of Linked Data in Agriculture and Food Safety"*
“Data + Need = Hack” -- that is the idea of a hackathon that brings
together like-minded people to develop, in a short time frame, novel
solutions to problems around the theme “Agriculture and Food Safety”.
Chairs: Christian Blaschke (Semantic Web Company, Vienna), Stasinos
Konstantopoulos (Institute of Informatics & Telecommunications of the
NCSR Demokritos, Athens)
Date: 18th of September 2015, 10.00 to 16.00 CEST
*--- Registration ---*
To register, please go to:
http://www.semantics.cc/registration
We look forward to meeting you at SEMANTiCS 2015!
In the version history of an image (or any attached file in MediaWiki), the
page displays "Date/Time" with a link to that version. The timestamp
displayed is the upload timestamp of that version. If you look closely, you
can see that the real filename includes a different timestamp. This turns
out to be the timestamp of when that file was superseded by a subsequent
version.
I have looked in the database tables and can see that in the oldimage
table, each row has an "oi_archive_name" with the timestamp of when that
version was superseded and an "oi_timestamp" of when that version was
actually uploaded.
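To make the mismatch concrete, here is a small TypeScript illustration;
the row values are hypothetical, following the "<timestamp>!<name>"
archive naming described above:

    // Hypothetical oldimage row values illustrating the question.
    const oiArchiveName = "20150820164500!Example.png"; // supersession time
    const oiTimestamp = "20150601120000";               // upload time

    // The archive filename's prefix is the supersession timestamp, not
    // the upload timestamp stored in oi_timestamp.
    const [archivedAt, baseName] = oiArchiveName.split("!");
    console.log(`${baseName}: uploaded ${oiTimestamp}, superseded ${archivedAt}`);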
Is there a reason to name the old versions of the files with the
superseding timestamp instead of the upload timestamp? It seems to me that
the timestamp of when that version was uploaded is more relevant.
Daren
--
__________________
http://enterprisemediawiki.org
http://mixcloud.com/darenwelsh
http://www.beatportfolio.com
FYI
---------- Forwarded message ----------
From: Yuvi Panda <yuvipanda(a)gmail.com>
Date: Thu, Aug 20, 2015 at 4:45 PM
Subject: Evaluation of opt-in alternatives to Grid Engine on Tool Labs
('clustering solution')
To: Wikimedia Labs <labs-l(a)lists.wikimedia.org>
Hello!
One of the Labs team's experimental goals for this quarter is to make
available a new, more modern alternative to GridEngine, just for
webservices, on Tool Labs. We are starting to evaluate which systems we
should use; this is tracked at
https://phabricator.wikimedia.org/T106475. The (still incomplete)
evaluation spreadsheet is at
https://docs.google.com/spreadsheets/d/1YkVsd8Y5wBn9fvwVQmp9Sf8K9DZCqmyJ-ew…
We are evaluating Kubernetes and Mesos/Marathon as alternatives.
GridEngine is also being scored alongside them, so if we find that it
wins, we'll abandon the experiment and continue using GridEngine only.
Do provide comments on the phab ticket and follow along :)
== WHY? ==
Because our current webservices setup is a pile of hacks on top of
GridEngine, causing... interesting problems due to the complexity
involved.
GridEngine doesn't support a lot of features that people using more
modern systems take for granted - like containerization + isolation, a
nice API, continuous deployment, autoscaling... Having an alternative to
play with allows us to build newer, better-featured and more robust
systems.
== OMG, WILL I HAVE TO CHANGE THE WAY MY CODE WORKS NOW?! ==
Nope. For now this is just an alternative - when completed, you will
be able to run your webservice on this cluster with something like:
webservice --provider=<something> start
And nothing else will change - everything else should still be
compatible. We'll eventually provide more features on the new setup,
but there will be no forced migration of any sort. If the alternative
becomes the default at any point, we'll ensure that things that worked
before continue working without any extra effort on the Tool
Author's part.
--
Yuvi Panda T
http://yuvi.in/blog
I'm elevating this task of mine to RFC status:
https://phabricator.wikimedia.org/T89331
Running the output of the MediaWiki parser through HTML Tidy always
seemed like a nasty hack. The effects on wikitext syntax are arbitrary
and change from version to version. When we upgrade our Linux
distribution, we sometimes see changes in the HTML generated by given
wikitext, which is not ideal.
Parsoid took a different approach. After token-level transformations,
tokens are fed into the HTML5 parsing algorithm, a complex but
well-specified procedure that generates a DOM tree from quirky input
text.
http://www.w3.org/TR/html5/syntax.html
We can get nearly the same effect in MediaWiki by replacing the Tidy
transformation stage with an HTML5 parse followed by serialization of
the DOM back to HTML. This would stabilize wikitext syntax and resolve
several important syntax differences compared to Parsoid.
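As a minimal sketch of that pipeline (in TypeScript, using parse5, one of
the JavaScript implementations of the HTML5 parsing algorithm; the library
choice and the input string are purely illustrative):

    // Minimal sketch: run tag-soup markup through an HTML5 parser and
    // serialize the DOM back to HTML, as the Tidy replacement would.
    import { parseFragment, serialize } from "parse5";

    // Mis-nested markup of the kind Tidy currently repairs.
    const dirty = "<b>bold <i>bold italic</b> italic?</i>";

    // The HTML5 tree-construction rules perform well-specified error
    // recovery (here, the "adoption agency algorithm" for mis-nesting).
    const fragment = parseFragment(dirty);
    console.log(serialize(fragment));
    // -> <b>bold <i>bold italic</i></b><i> italic?</i>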
However:
* I have not been able to find any PHP implementation of this
algorithm. Masterminds and Ressio do not even attempt it. Electrolinux
attempts it but does not implement the error recovery parts that are
of interest to us.
* Writing our own would be difficult.
* Even if we did write it, it would probably be too slow.
So the question is: what language should we use? Since this is the
standard programmer troll question, please bring popcorn.
The best implementation of this algorithm is in Java: the validator.nu
parser is maintained by Mozilla, and has source translation to C++,
which is used by Mozilla and could potentially be used for an HHVM
extension.
There is also a Rust port (also written by Mozilla), and notable
implementations in JavaScript and Python.
For WMF, a Java service would be quite easy to do, and I have
prototyped it already. An HHVM extension might also be possible. A
non-service fallback for small installations might be Node.js or a
compiled binary from Rust or C++.
-- Tim Starling
In the next RFC meeting, we will discuss the following RFC:
* Multi-Content Revisions
<https://phabricator.wikimedia.org/T107595>
The meeting will be on the IRC channel #wikimedia-office on
chat.freenode.net at the following time:
* UTC: Wednesday 21:00
* US PDT: Wednesday 14:00
* Europe CEST: Wednesday 23:00
* Australia AEST: Thursday 07:00
-- Tim Starling