Is there any way that extension developers can get some sort of notice for
breaking changes, e.g., https://gerrit.wikimedia.org/r/50138? Luckily my
extension's JobQueue implementation hasn't been merged yet, but if it had been,
I would have had no idea that it had been broken by core.
--
Tyler Romeo
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerromeo(a)gmail.com
Sorry, I forgot to mention that I have the English Wikipedia dump in mind.
wiki writes:
> Hello.
>
> I'm a newbie who wants to start playing with the XML dumps. I've found
> instructions here and there on how to import these. I'd like to seek
> guidance, though, as to how much free disk space one is required to have for
> the MySQL import to succeed. I.e., after I have already installed LAMP +
> MediaWiki and already allocated space for the bzip2 file and the converted
> import statements file, roughly how much more space is needed?
>
> Thank you!
>
> - sam -
>
Hello.
I'm a newbie who wants to start playing with the XML dumps. I've found
instructions here and there on how to import these. I'd like to seek
guidance, though, as to how much free disk space one is required to have for
the MySQL import to succeed. I.e., after I have already installed LAMP +
MediaWiki and already allocated space for the bzip2 file and the converted
import statements file, roughly how much more space is needed?
Thank you!
- sam -
Hi all!
I would like to ask for your input on the question of how non-wikitext content
can be indexed by LuceneSearch.
The background is that full-text search (Special:Search) is nearly useless
on wikidata.org at the moment; see
<https://bugzilla.wikimedia.org/show_bug.cgi?id=42234>.
The reason for the problem appears to be that when rebuilding a Lucene index
from scratch, using an XML dump of wikidata.org, the raw JSON structure used by
Wikibase gets indexed. The indexer is blind; it just takes whatever "text" it
finds in the dump. Indexing JSON does not work at all for full-text search,
especially not when non-ASCII characters are represented as Unicode escape
sequences.
Inside MediaWiki, in PHP, this works like this:
* wikidata.org (or rather, the Wikibase extension) stores non-text content in
wiki pages, using a ContentHandler that manages a JSON structure.
* Wikibase's EntityContent class implements Content::getTextForSearchIndex() so
that it returns the labels and aliases of an entity. Data items thus get indexed
by their labels and aliases.
* getTextForSearchIndex() is used by the default MySQL search to build an index.
It's also (ab)used by things that can only operate on flat text, like the
AbuseFilter extension.
* The LuceneSearch index gets updated live using the OAI extension, which in
turn knows to use getTextForSearchIndex() to get the text for indexing.
So, for anything indexed live, this works, but for rebuilding the search index
from a dump, it doesn't - because the Java indexer knows nothing about content
types, and has no interface for an extension to register additional content types.
To improve this, I can think of a few options:
1) Create a specialized XML dump that contains the text generated by
getTextForSearchIndex() instead of the actual page content. However, that only
works if the dump is created using the PHP dumper. How are the regular dumps
currently generated on WMF infrastructure? Also, would it be feasible to make an
extra dump just for LuceneSearch (at least for wikidata.org)?
2) We could re-implement the ContentHandler facility in Java, and require
extensions that define their own content types to provide a Java-based handler
in addition to the PHP one. That seems like a pretty massive undertaking of
dubious value, but it would allow maximum control over what gets indexed and how.
3) The indexer code (without plugins) should not know about Wikibase, but it may
have hard-coded knowledge about JSON. It could have a special indexing mode for
JSON, in which the structure is deserialized and traversed, and any values are
added to the index (while the keys used in the structure would be ignored). We
may still be indexing useless internals from the JSON, but at least there would
be a lot fewer false negatives.
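To make 3) a bit more concrete, here is a minimal sketch of what such a JSON
indexing mode might look like on the Java side. The class and method names are
made up, and Jackson is only assumed as the JSON library for the sake of the
example; the real indexer may well use something else:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;
import java.util.Iterator;

public class JsonTextExtractor {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Returns a flat, space-separated string of all scalar values found in
    // the JSON blob, suitable for feeding into the existing text indexing.
    public static String extractIndexText(String rawJson) throws IOException {
        StringBuilder out = new StringBuilder();
        collectValues(MAPPER.readTree(rawJson), out);
        return out.toString().trim();
    }

    private static void collectValues(JsonNode node, StringBuilder out) {
        if (node.isTextual() || node.isNumber()) {
            // the parser has already decoded \uXXXX escapes at this point
            out.append(node.asText()).append(' ');
        } else if (node.isContainerNode()) {
            // elements() iterates array entries and object field *values*,
            // so the structural keys are skipped automatically
            for (Iterator<JsonNode> it = node.elements(); it.hasNext();) {
                collectValues(it.next(), out);
            }
        }
    }
}

Since the JSON parser decodes the \uXXXX escapes while parsing, the values
collected this way would contain the actual non-ASCII characters rather than
escape sequences, which should already help with the problem described above.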
I personally would prefer 1) if dumps are created with PHP, and 3) otherwise. 2)
looks nice, but it would be hard to keep the Java and the PHP versions from diverging.
So, how would you fix this?
thanks
daniel
Hello!
This is your friendly weekly deployments highlight email.
For the week of March 11th (next week), here are some things to be aware
of:
* Scribunto (Lua) will be available on all wikis as of Wed the 13th
* HTTPS for all logged-in users
This is planned to happen next week, but the exact deployment window
is still to be determined. I will inform wikitech-l and -ambassadors
when it is scheduled.
See this bug for more info:
https://bugzilla.wikimedia.org/show_bug.cgi?id=39380
Best,
Greg
--
| Greg Grossmeier GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg A18D 1138 8E47 FAC8 1C7D |
On Fri, Mar 8, 2013 at 8:35 AM, Tyler Romeo <tylerromeo(a)gmail.com> wrote:
> Is there any way that extension developers can get some sort of notice for
> breaking changes, e.g., https://gerrit.wikimedia.org/r/50138? Luckily my
> extension's JobQueue implementation hasn't been merged yet, but if it had been,
> I would have had no idea that it had been broken by core.
Hi Tyler,
Sorry to hear that there might be a problem here. It's been a pet
peeve of mine that we seem to be a little too eager to break backwards
compatibility in places where it may not be necessary. That said,
let's try to avoid a meta-process discussion before we collectively
understand the example you are bringing up, and focus on the JobQueue.
As near as I can tell from a quick skim of the changeset you're
referencing, Aaron's changes here are purely additive. Am I reading
this wrong? Is there some other changeset that changes/removes
existing interfaces that you meant to reference instead?
Rob
As you probably know, the search in Wikidata sucks big time.
Until we have created a proper Solr-based search and deployed it on that
infrastructure, we would like to implement and set up a reasonable stopgap
solution.
The simplest and most obvious signal for sorting the items would be to
1) make a prefix search
2) weight all results by the number of Wikipedias they link to
This should usually provide the item you are looking for. Currently, the
search order is random. Good luck with finding items like California,
Wellington, or Berlin.
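Just to illustrate the intended ordering, here is a toy sketch in Java with
made-up names. It is not the actual wb_terms schema or query; in the real
implementation the prefix match and the ordering would of course happen in
MySQL:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class PrefixSearchSketch {

    // Hypothetical stand-in for a wb_terms row plus a precomputed "weight"
    // (the number of Wikipedias the item links to).
    public static class TermRow {
        public final String entityId;
        public final String label;
        public final int sitelinkCount;

        public TermRow(String entityId, String label, int sitelinkCount) {
            this.entityId = entityId;
            this.label = label;
            this.sitelinkCount = sitelinkCount;
        }
    }

    // 1) prefix search on the label, 2) order the matches by sitelink count.
    public static List<TermRow> search(List<TermRow> terms, String prefix) {
        String p = prefix.toLowerCase();
        List<TermRow> matches = new ArrayList<TermRow>();
        for (TermRow t : terms) {
            if (t.label.toLowerCase().startsWith(p)) {
                matches.add(t);
            }
        }
        Collections.sort(matches, new Comparator<TermRow>() {
            public int compare(TermRow a, TermRow b) {
                // heavier items (more sitelinks) come first
                return Integer.compare(b.sitelinkCount, a.sitelinkCount);
            }
        });
        return matches;
    }
}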
Now, what I want to ask is: what would be the appropriate index structure
for that table? The data is saved in the wb_terms table, which would need
to be extended with a "weight" field. There is already a suggestion (based on
discussions between Tim and Daniel K, if I understood correctly) to change
the wb_terms table index structure (see here <
https://bugzilla.wikimedia.org/show_bug.cgi?id=45529> ), but since we are
changing the index structure anyway it would be great to get it right this
time.
Can anyone jump in? (Looking especially at Asher and Tim.)
Any help would be appreciated.
Cheers,
Denny
--
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
GNU LibreJS blocks several JavaScript sources around Wikipedia. I was
sent to this list by Kirk Billund. My issue as well as Kirk's replies
follows. I hope you are okay with reading it in this form.
03/05/2013 11:16 - Alexander Berntsen wrote:
>>>> GNU LibreJS[0] reports that several of the JavaScript sources
>>>> embedded by different parts of Wikipedia are proprietary[1].
>>>> Is this a conscious anti-social choice[2], or have you merely
>>>> not set up your source files to properly show their
>>>> licence[3]?
>>>>
>>>> If the latter is the case, please remedy this. If the former
>>>> is the case... please remedy this. It is extremely
>>>> important.[4] In any event I hope to get a reply, as the
>>>> distinction is important to me.
>>>>
>>>> [0] https://www.gnu.org/software/librejs/
>>>> [1] https://www.gnu.org/philosophy/categories.html#ProprietarySoftware
>>>> [2] https://www.gnu.org/philosophy/javascript-trap.html
>>>> [3] https://www.gnu.org/software/librejs/free-your-javascript.html
>>>> [4] https://www.gnu.org/philosophy/why-free.html
On 05/03/13 11:38, Wikipedia information team wrote:
>>> All of the MediaWiki[1] code base that Wikipedia uses is licensed
>>> under the GPL[2], including the JavaScript. Also included in
>>> that is the freely-licensed (MIT) jQuery[3] library. However,
>>> some code is actually written by the individual users, like
>>> English Wikipedia's custom JavaScript[4], which is licensed as
>>> CC-BY-SA-3.0 since all content pages are automatically licensed
>>> that way[5].
>>>
>>> Additionally, our JavaScript is minified[6], so adding comments
>>> is not possible. If you have further concerns, you can respond
>>> to me, email the general Wikimedia technical list[7], or email
>>> a general MediaWiki help list[8].
>>>
>>>
>>> [1] https://www.mediawiki.org/wiki/MediaWiki
>>> [2] https://www.mediawiki.org/wiki/License
>>> [3] https://en.wikipedia.org/wiki/JQuery
>>> [4] https://en.wikipedia.org/wiki/MediaWiki:Common.js
>>> [5] https://en.wikipedia.org/wiki/Wikipedia:Copyrights
>>> [6] https://www.mediawiki.org/wiki/ResourceLoader
>>> [7] https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>> [8] https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
03/05/2013 11:16 - Alexander Berntsen wrote:
>> Is it not possible to insert the licence as part of your build
>> process? What I do with compiled or minified JavaScript is to
>> build everything, and then insert the licence into all files using
>> Bash.
On 05/03/13 12:41, Wikipedia information team wrote:
> Unfortunately I don't fully understand how the minification process
> works, so it would probably be better if you asked your question on
> our technical mailing list
> <https://lists.wikimedia.org/mailman/listinfo/wikitech-l> and
> someone there would be able to give you a more specific answer.
--
Alexander
alexander(a)plaimi.net
http://plaimi.net/~alexander
Interesting article I found about Redis and its poor performance with SSDs
as a swap medium. For whoever might be interested.
http://antirez.com/news/52
--
Tyler Romeo
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerromeo(a)gmail.com