Hi all! I'm the main developer of the ProveIt gadget
<https://commons.wikimedia.org/wiki/Help:Gadget-ProveIt>, a reference
manager for Wikipedia. The code is tracked via Phabricator, reviewed via
Gerrit, and served to the various Wikipedias from Commons. Each wiki has a
unique initialization code
<https://en.wikipedia.org/wiki/MediaWiki:Gadget-ProveIt.js> that sets some
local config and then requests the main code from Commons (JavaScript, CSS
and JSON). Every time I merge a new change via Gerrit, I need to manually
update the Commons pages so that the Wikipedias have the latest code.
This is sub-optimal. Ideally, the Wikipedias should request the code
directly from Diffusion, so that when developers merge new changes, they
are immediately available (and we don't need interface rights or manual
work in Commons). However, when I go to the Diffusion of the gadget
<https://phabricator.wikimedia.org/diffusion/1884/>, click on the main
proveit.js file, and click on "View Raw File", I get to a URL like the
following:
https://phab.wmfusercontent.org/file/data/iapd7kogqo5x2naywwlq/PHID-FILE-dk…
The URL of the raw file changes with every click and doesn't have the
proper MIME type header, so it's useless for serving the code.
I think it would be very useful, for my case and others, to have a stable
URL that serves the latest code with the proper MIME type header. In other
words, a CDN, which may or may not be integrated with Diffusion.
Thanks!
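To make the MIME type problem concrete: browsers that apply strict MIME checking (for example when the response carries "X-Content-Type-Options: nosniff") refuse to run a script whose Content-Type is not a JavaScript type. Below is a minimal sketch of that check. The function name and the short list of accepted types are illustrative assumptions, not part of ProveIt or Phabricator.

```python
# Sketch: would a response be usable as a <script> source under strict MIME checking?
# JAVASCRIPT_TYPES is a deliberately short, illustrative list, not the full set browsers accept.
JAVASCRIPT_TYPES = {
    "text/javascript",
    "application/javascript",
    "application/x-javascript",
}


def usable_as_script(headers):
    """Return True if the Content-Type header names a JavaScript MIME type."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    content_type = lowered.get("content-type", "")
    # Strip parameters such as "; charset=utf-8" before comparing.
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime in JAVASCRIPT_TYPES
```

A raw-file URL answering with "text/plain" (or with no Content-Type at all) would fail this check, which is why a stable URL alone is not enough.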
Sorry, I hit enter early by accident.
I noticed that the dump file for Wikidata is no longer in the format wikidatawiki-2017XXXX-pages-articles.xml.bz2.
It is now split into several separate dumps:
https://dumps.wikimedia.org/wikidatawiki/latest/wikidatawiki-latest-md5sums…
When did this happen, and what is the rationale behind it? Will it be permanent, or will we switch back to the original format soon?
Thank you,
Best regards,
Trung
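For anyone whose tooling assumed a single pages-articles file, here is a rough sketch of picking the relevant parts out of an md5sums listing. It assumes the usual md5sum layout of "<checksum>  <filename>" per line. The sample file names and checksums are made-up placeholders, not the real names of the split dumps.

```python
def pages_articles_files(md5sums_text):
    """Return the file names in an md5sums listing that belong to the pages-articles dump."""
    names = []
    for line in md5sums_text.splitlines():
        parts = line.split()
        # Expect "<checksum> <filename>"; skip anything else.
        if len(parts) != 2:
            continue
        _checksum, name = parts
        if "pages-articles" in name and name.endswith(".bz2"):
            names.append(name)
    return names


# Made-up sample; hashes and file names are placeholders for illustration only.
sample = """\
0123456789abcdef0123456789abcdef  wikidatawiki-20170401-pages-articles1.xml.bz2
fedcba9876543210fedcba9876543210  wikidatawiki-20170401-pages-articles2.xml.bz2
00112233445566778899aabbccddeeff  wikidatawiki-20170401-stub-articles.xml.gz
"""
```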
On 4/5/17, 9:57 PM, "Wikitech-l on behalf of Trung Dinh" <wikitech-l-bounces(a)lists.wikimedia.org on behalf of trd(a)fb.com> wrote:
Hi everyone,
I realized the dump file for wikidata is no longer in the format wikidatawiki-2017XXXX-pages-articles.xml.bz2 anymore.
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Hello!
The Analytics team would like to announce that we have migrated the
reportcard to a new domain:
https://analytics.wikimedia.org/dashboards/reportcard/#pageviews-july-2015-now
The migrated reportcard includes both legacy and current pageview data,
daily unique devices and new editors data. Pageview and device data are
updated daily, but editor data is still updated ad hoc.
The team is working at this time on revamping the way we compute edit data
and we hope to be able to provide monthly updates for the main edit metrics
this quarter. Some of those will be visible in the reportcard but the new
wikistats will have more detailed reports.
You can follow the new wikistats project here:
https://phabricator.wikimedia.org/T130256
Thanks,
Nuria
The Parsing team at the Wikimedia Foundation that develops the Parsoid
service is deprecating support for node 0.1x. Parsoid is the service
that powers VisualEditor, Content Translation, and Flow. If you don't
run a MediaWiki install that uses VisualEditor, then this announcement
does not affect you.
Node 0.10 reached end of life on October 31st, 2016 [1] and node
0.12 is scheduled to reach end of life on December 31st, 2016 [1].
Yesterday, we released a 0.6.1 debian package [2] and a 0.6.1 npm
version of Parsoid [3]. This will be the last release that will have
node 0.1x support. We'll continue to provide any necessary critical bug
fixes and security fixes for the 0.6.1 release till March 31st 2017 and
will be completely dropping support for all node versions before node
v4.x starting April 2017.
If you are running a Parsoid service on your wiki and are still using
node 0.1x, please upgrade your node version by April 2017. The Wikimedia
cluster runs node v4.6 right now and will soon be upgraded to node v6.x
[4]. Parsoid has been tested with node 0.1x, node v4.x and node v6.x and
works with all these versions. However, we are dropping support for node
0.1x from the master branch of Parsoid right away. Going forward, the
Parsoid codebase will adopt ES6 features that are available in node v4.x
and higher but aren't supported in node 0.1x, which constitutes a
breaking change.
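If you want to script a check across several hosts, the support rule above (nothing before node v4.x from April 2017) can be sketched as follows. The function names are hypothetical, and the version string is assumed to look like the output of "node --version", e.g. "v4.6.0".

```python
def node_major_minor(version):
    """Parse a version string such as 'v4.6.0' or '0.10.48' into (major, minor)."""
    parts = version.strip().lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def supported_from_april_2017(version):
    """True if the node version is v4.x or later, i.e. not in the 0.1x series."""
    major, _minor = node_major_minor(version)
    return major >= 4
```

A host reporting "v0.10.48" or "v0.12.18" would need an upgrade, while "v4.6.0" and "v6.9.1" would pass.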
Subramanya Sastry (Subbu),
Technical Lead and Manager,
Parsing Team,
Wikimedia Foundation.
[1] Node.js Long Term Support schedule @ https://github.com/nodejs/LTS
[2] https://www.mediawiki.org/wiki/Parsoid/Releases
[3] https://www.npmjs.com/package/parsoid
[4] https://phabricator.wikimedia.org/T149331
tl;dr: Search continues to expand functionality by displaying more
information on the search results page
Ever started searching for something on Wikipedia and wondered—*really*, is
that all that there is? Does it feel like you’re somehow playing hide and
seek with all the knowledge that’s out there? And...wouldn’t it be great to
see articles or categories that are similar to your search query and maybe
some related images or links to other languages in which to read that
article? Or, maybe you just want to read and contribute to projects other
than Wikipedia but need a jump start with a few short summaries from sister
projects.
The Discovery Search team has been testing out some really cool new
features that will enable some fun and fascinating clicking—down the rabbit
hole of Wikipedia.[1] But first, let’s recap what we’ve been doing recently.
We've been doing tons of work creating, updating, and finessing the search
back end to enhance search queries. Many complex changes have landed,
such as adding ascii-folding and stemming, detecting when a visitor
might be typing in a language different from that of the Wikipedia they
are on, switching from tf-idf to BM25, dropping trailing question marks,
and updating to Elasticsearch version 5.
[2][3][4][5][6][7] Whew!
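For the curious, here is a small sketch of three of the ideas just mentioned: ascii-folding, dropping trailing question marks, and the difference between tf-idf and BM25. This is textbook-style illustration only, not the code that runs on the Wikipedias. The function names are made up, the folding only handles characters that decompose into a base letter plus accent, and k1=1.2 and b=0.75 are commonly used default values, assumed here rather than taken from the production configuration.

```python
import math
import unicodedata


def normalize_query(query):
    """Fold accented Latin characters to ASCII and drop trailing question marks."""
    # Decompose characters (e.g. "é" -> "e" + combining accent), then drop the accents.
    decomposed = unicodedata.normalize("NFKD", query)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return folded.rstrip().rstrip("?").rstrip()


def tf_idf(tf, doc_count, docs_with_term):
    """Classic tf-idf: the score grows linearly with term frequency."""
    return tf * math.log(doc_count / docs_with_term)


def bm25(tf, doc_count, docs_with_term, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Okapi BM25 term score: term frequency saturates and is normalised by document length."""
    # One common IDF variant; others exist.
    idf = math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))
    length_norm = 1 - b + b * doc_len / avg_doc_len
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)
```

The practical difference: with tf-idf, a page that repeats a term ten times scores ten times as high as a page that uses it once, while with BM25 the gain levels off and long pages are not favoured simply for being long.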
We have much more planned in the coming months—machine learning with
‘learning to rank’, investigating and deploying new language analyzers,
and, after exhaustive analysis, removing quotes within queries by
default.[8][9][10][11] We’ll also be working closely with the new
Structured Data team in their brand new work on Commons.[12][13]
We also want to improve the part that our readers and editors interface
with: the search results page! We started brainstorming during the late
summer of 2016 on what we could do to make search results better—to easily
find interesting, relevant content and to create a more intuitive viewing
experience.[14] We designed and refined numerous ideas on how to improve
the search results page and received lots of good feedback from the
community.[15]
Empowered by the feedback, we began testing, starting with a display of
results from the Wikimedia sister projects next to the regular search
results.[16] The idea for this test was to enable discovery into other
projects—projects that our visitors might not have known about—by
displaying interesting results in small snippets. The sidebar display of
the sister projects borrows from a similar feature in use on the Italian,
Catalan and French Wikipedias. We've run two A/B tests on the sister
project search results, each with detailed analysis, and after a few final
touches to the code we will release the new functionality into production
on all Wikipedias near the end of April 2017.
Our next A/B test will add additional information and related results
for each search query. This will take the form of an ‘explore similar’
link; when someone interacts with it, an expanded display will
appear with related pages, categories and links to the article in other
languages, all of which might lead to further knowledge discovery.[17] We
know that not every search query will return exactly what folks were
looking for, but we feel that adding links to similar, related
information would be helpful and, possibly, super interesting!
We also plan on doing a few more A/B tests in the coming year:
* Test a new display that will show the pronunciation of a word with its
definition and part of speech—all from existing data in Wiktionary.
Initially this will be in English only.
* Test placing a small image (from the article) next to each search result
that is displayed on the page.
* Test an additional feature using a new auto completion metadata display in
the search box that is located on the top right of most pages in Wikipedia,
similar to what happens on the Wikipedia.org portal.[18]
For the more technically minded, there is a way to test out these new
features in your own browser. Displaying the sister project search results
requires a bit of URL manipulation; for the explore similar and
Wiktionary widgets, you can modify your common.js file to test an early
version of the features. Detailed information is available on
MediaWiki.org.[19]
Once the testing, analysis and feedback cycle is done for each new feature,
we’d like to slowly implement them into production on all Wikipedias
throughout the rest of the year. We’re really hoping that these
enhancements to how search works will further the usefulness of search and
make our readers and editors more productive.
Cheers from the Discovery Search team!
[1] https://xkcd.com/214/
[2] https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Re-Ordering_Stemming_and_Ascii-Folding_on_English_Wikipedia
[3] https://blog.wikimedia.org/2016/07/27/wikipedia-language-search/
[4] https://en.wikipedia.org/wiki/Tf%E2%80%93idf
[5] https://en.wikipedia.org/wiki/Okapi_BM25
[6] https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Dropping_Final_Question_Marks_in_the_Top_10_Wikipedias
[7] https://phabricator.wikimedia.org/T154501
[8] https://en.wikipedia.org/wiki/Learning_to_rank
[9] https://phabricator.wikimedia.org/T154511
[10] https://commons.wikimedia.org/wiki/File:From_Zero_to_Hero_-_Anticipating_Zero_Results_From_Query_Features,_Ignoring_Content.pdf
[11] https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Quotes_and_Questions
[12] https://commons.wikimedia.org/wiki/Commons:Structured_data
[13] https://blog.wikimedia.org/2017/01/09/sloan-foundation-structured-data/
[14] https://www.mediawiki.org/wiki/Cross-wiki_Search_Result_Improvements
[15] https://www.mediawiki.org/wiki/Talk:Cross-wiki_Search_Result_Improvements
[16] https://www.mediawiki.org/wiki/Cross-wiki_Search_Result_Improvements/Testing#A.2FB_test:_Add_cross-wiki_search_results_in_a_right_hand_sidebar
[17] https://www.mediawiki.org/wiki/Cross-wiki_Search_Result_Improvements/Testing#A.2FB_test:_Add_.27explore_similar.27_pages_and_categories_for_search_results
[18] https://www.wikipedia.org/
[19] https://www.mediawiki.org/wiki/Cross-wiki_Search_Result_Improvements/self-guided_testing
--
deb tankersley
irc: debt
Product Manager, Discovery
Wikimedia Foundation
Hello Wikimedia developers!
I've just deployed the latest batch of Phabricator updates. Normally I
wouldn't write an announcement for routine upgrades, but this update
is different. This week's update includes notable improvements to
Phabricator's global search functionality, which I have been working on for
the past week.
*Bugs Fixed:*
Several minor bugs have been resolved. Most notably, a longstanding bug that
prevented viewing results numbered 100+ has been fixed [1].
*Better Search Results:*
There have been many small improvements to search query parsing,
performance & reliability in the past few weeks. A few of these are
launching today but the most visible change is a significantly improved
search results page with document body highlighting[2]. This feature shows
a snippet of documents with the matching search terms highlighted in bold.
Previously, Phabricator displayed only the title of each result, and
matching terms were highlighted only if they appeared within the title. With
today's release, matching terms are highlighted in the body of the
document as well. This takes advantage of an Elasticsearch feature[3] to
accurately highlight the terms which actually led to the result being
included in the search results.
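To give a rough idea of what such a request looks like, here is a minimal sketch of an Elasticsearch request body with a "highlight" section. The "match" query and the "highlight" keys ("fields", "pre_tags", "post_tags") are standard Elasticsearch request syntax; the field name "body" and the helper function are assumptions for illustration and do not reflect Phabricator's actual index mapping or patch.

```python
import json


def highlighted_search(terms, field="body"):
    """Build an Elasticsearch request body that matches `terms` and highlights them in `field`."""
    return {
        "query": {"match": {field: terms}},
        "highlight": {
            # Wrap matches in <strong> instead of Elasticsearch's default <em> tags.
            "pre_tags": ["<strong>"],
            "post_tags": ["</strong>"],
            # Request highlighted snippets for the (hypothetical) body field.
            "fields": {field: {}},
        },
    }


print(json.dumps(highlighted_search("search highlighting"), indent=2))
```

Elasticsearch returns the snippets alongside each hit, with the matching terms already wrapped in the requested tags, so the results page only has to render them.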
*Welcome to The Future:*
Some of you might be thinking that this is just too much. Such unnecessary
features are just extravagant and wasteful. To that I say: why should we
let advanced technologies like cascading style sheets sit idle, neglected?
We can do better than a 1970s search experience. We deserve to have our
search terms rendered as stylized hypertext with bold, beautiful letters
and contextually accurate emphasis. We deserve modern conveniences and I
don't feel the least bit guilty about that. It's the 90s[4], after all.
*Upstream Status:*
This new functionality has been submitted upstream for inclusion in
Phabricator; however, as of today it remains in Differential pending code
review. The feature is likely to evolve further before finally making it
into the upstream. It is a fairly large patch which adds a new "Engine
Extension" infrastructure to Phabricator.
This foundation can be used to add various enhancements to the search
results views (e.g. customized views for each object type.) This also lays
the foundation for resolving https://secure.phabricator.com/T8646, although
that bug doesn't really affect Wikimedia's developers because we have
disabled Phabricator's integrated wiki.
1. https://phabricator.wikimedia.org/T92960
2. https://phabricator.wikimedia.org/T162284
3. https://www.elastic.co/guide/en/elasticsearch/reference/current/search-requ…
4. https://vimeo.com/29455771
That's all for now. I hope you enjoy these improvements to the Phabricator
search experience!
Mukunda Modell
Release Engineer & Phabricator Admin
Wikimedia Foundation, Inc.