I do a lot of work on Commons, using the preference that adds every page I
edit to my watchlist. In the past I was able to control the size of my
watchlist by occasionally purging it through the raw watchlist editor.
However, when the Cat-a-lot gadget became available I initially did not
configure it properly, and every file I edited with it also ended up on my
watchlist. Pretty soon my watchlist became too big to edit. I am now at 36k
pages and I definitely need to purge it.
What can be done about it? I asked a few people and asked on some talk
forums, but nobody knew the answer. I would be OK with deleting everything.
Thanks
Jarek T.
User:jarekt
The Language Engineering team is in the process of redesigning the Translate
extension <http://www.mediawiki.org/wiki/Extension:Translate>.
The Translate extension turns MediaWiki into a localisation platform; it
is used on Meta-Wiki, mediawiki.org and a few other Wikimedia wikis, as
well as by other open-source projects, to make them available in multiple
languages.
We are planning to do a walkthrough for the latest revision of the designs
tomorrow.
Since the extension is used by many different projects and users, we want
to make this meeting open to get feedback from anyone interested.
So we welcome you to join us in the discussion:
*When.* Wednesday, 27 February, at 8:30 PST (San Francisco), 16:30 UTC
(UK), 17:30 CET (most of Europe), and 22:00 IST (India).
*What.* During the meeting we'll discuss information from our design
specification<http://commons.wikimedia.org/wiki/File:Translate-workflow-spec.pdf>
and the current implementation available at
translatewiki.net<http://translatewiki.net/w/i.php?title=Special:Translate&tux=1>.
Feel free to take a look at the docs or try the new UI before the event.
*How to participate.* The event will be broadcast using Google Hangouts
On Air, so that it can be seen live or accessed later. We'll share the URL
as the event approaches. The #mediawiki-i18n IRC channel will be used to
get questions from the audience.
If you are interested in entering the hangout for face-to-face
participation, you can ping me and I'll send an invite as long as there are
free seats remaining.
Pau
--
Pau Giner
Interaction Designer
Wikimedia Foundation
Matthew Flaschen wrote:
> No, I was just talking about defining the indices (obviously, the query
> planner is out of luck if you don't define them properly). E.g. in the
> PostgreSQL tables.sql file:
>
> CREATE INDEX archive_name_title_timestamp ON archive
> (ar_namespace,ar_title,ar_timestamp);
>
> Even though this syntax is often the same across databases, since the whole
> file is db-specific (except that MySQL and SQLite share one), people have
> the option of db-specific index variants.
Ah, I see what you mean. Yes, some of that is tied to the capabilities
of the database itself (e.g. can it do a bitmap index scan). So perhaps
a suggested index in the abstract schema could be tied to such attributes,
and simply not created if the db cannot (or should not) use it. Or simply
tie it to a specific db type if absolutely needed; a rough sketch of what
that might look like is below. I can't recall seeing a case where there
would be a *choice* of indexes (e.g. if you can support this index, use it;
otherwise, use this other one), but that's a SMOP (simple matter of
programming) once we encounter that case, I suppose. :)
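To make that concrete, here is a minimal sketch of how such a conditional
index definition might look. The array layout and the supports() /
createIndex() helpers are made up for illustration; this is not the actual
RFC format and they are not existing Database methods:

    // $db: the Database handle for the wiki's configured backend.
    // Hypothetical abstract index definition: create the index only when
    // the backend reports the needed capability, or matches a listed type.
    $abstractIndexes = array(
        'archive_name_title_timestamp' => array(
            'table'    => 'archive',
            'columns'  => array( 'ar_namespace', 'ar_title', 'ar_timestamp' ),
            // Skip the index on backends that cannot make good use of it.
            'requires' => 'bitmap_index_scan',
            // Or pin it to specific backends if absolutely needed.
            'only'     => array( 'mysql', 'sqlite' ),
        ),
    );

    foreach ( $abstractIndexes as $name => $def ) {
        if ( isset( $def['only'] ) && !in_array( $db->getType(), $def['only'] ) ) {
            continue; // wrong backend for this index
        }
        if ( isset( $def['requires'] ) && !$db->supports( $def['requires'] ) ) {
            continue; // backend cannot benefit from it
        }
        $db->createIndex( $def['table'], $name, $def['columns'] );
    }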
--
Greg Sabino Mullane greg(a)endpoint.com
End Point Corporation
PGP Key: 0x14964AC8
Daniel Friesen let us know:
> For reference this is the RFC that was discussed in my thread on the
> subject.
>
> https://www.mediawiki.org/wiki/Requests_for_comment/Abstract_table_definiti…
>
> Should probably dig up some gmane/archive links for both this and that
> discussion and add them to the RFC page.
Excellent, thank you for that link. That mirrors a lot of my thinking.
I'm going to take a fresh look at my data type issues with that page
as a guideline.
--
Greg Sabino Mullane greg(a)endpoint.com
End Point Corporation
PGP Key: 0x14964AC8
Luke Welling asked:
> Specifically, do we use MySQL-specific syntax that is more efficient (but
> breaks elsewhere) or do we attempt to write lowest-common-denominator SQL
> that will run in more places, but not as efficiently on our primary target?
Neither: we use the already-existing methods, and have those examine
db-specific attributes to modify their behavior. The SQL itself stays
pretty basic: I don't know that I've ever seen SQL (in core, anyway)
that varied enough between backends to require a differentiation. If
we do encounter such a thing, it's probably best to pick the simplest
variation (if possible), or the MySQL one (if not), and have attributes
determine which variant is used, e.g. if ( $dbr->left_join_expensive() ) { ... }.
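As a rough illustration of that pattern (left_join_expensive() is the
hypothetical capability flag from above, not an existing Database method,
and the queries themselves are only a sketch, not actual core code):

    // $dbr: a replica Database handle; $conds: the query conditions.
    if ( $dbr->left_join_expensive() ) {
        // This backend handles the LEFT JOIN badly: run a plain select
        // and fetch the related rows with a second query instead.
        $res = $dbr->select( 'page', '*', $conds, __METHOD__ );
        // ... second query for the categorylinks rows ...
    } else {
        // Otherwise use the single LEFT JOIN query.
        $res = $dbr->select(
            array( 'page', 'categorylinks' ),
            '*',
            $conds,
            __METHOD__,
            array(),
            array( 'categorylinks' => array( 'LEFT JOIN', 'cl_from = page_id' ) )
        );
    }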
Matt Flaschen wrote:
> However, part of the optimization is choosing indices, which as you
> noted is db-specific (part of tables.sql).
Not sure what you mean - index hints? Yeah, that could be a little tricky,
but luckily the Postgres side, at any rate, doesn't have to worry about
those (as our planner is smart enough to pick the best index itself ;) ).
I can't think of a clean way to abstract that anyway, as needing
an index hint for MySQL doesn't mean the same is needed on Oracle, and
vice-versa. So you'd already have a very database-specific argument
for each query anyway, such that you would never have to worry whether
other dbs had the same index.
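A rough sketch of the kind of per-query, db-specific argument I mean (the
index name here is made up for illustration): if I remember right, a
MySQL-oriented hint can go through the normal select() options, and a
backend whose planner takes no hints can simply ignore it.

    // $dbr: a replica Database handle; $ns: the namespace being queried.
    $res = $dbr->select(
        'archive',
        array( 'ar_namespace', 'ar_title', 'ar_timestamp' ),
        array( 'ar_namespace' => $ns ),
        __METHOD__,
        // Honored on MySQL; backends without hint support can ignore it.
        array( 'USE INDEX' => array( 'archive' => 'name_title_timestamp' ) )
    );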
--
Greg Sabino Mullane greg(a)endpoint.com
End Point Corporation
PGP Key: 0x14964AC8
Hi,
I apologize, but it seems that Badoo, for some silly reason, scanned my
e-mail account and spammed all the contacts I have there with some kind of
invitation or whatever. I have no idea why it happened, but I will try
to investigate it.
Please discard or ignore that message. Thank you.
What is the number of the new namespace we got from Scribunto?
How can we localize its name? On huwiki it should be "Modul" rather
than "Module". We found only the core namespaces on translatewiki.net.
Thank you!
--
Bináris
We're getting back into the swing of weekly tech talks, and will have a
brown bag tomorrow at 12:30 Pacific Time / 20:30 UTC. The following
folks will present:
* Chris McMahon and Zeljko Filipin on automated browser testing
* Chad Horohoe on what's new in Gerrit
* Brad Jorsch on converting templates to Lua
* Sumana Harihareswara on LevelUp - how you can learn, teach, & get
stuff done
* And more?
https://www.mediawiki.org/wiki/Meetings/2013-02-28
Hope you can come in person (if you're in San Francisco) or watch via
YouTube live streaming. Check out #wikimedia-dev when the meeting
starts and we'll announce the stream URL.
Sumana Harihareswara
Engineering Community Manager
Wikimedia Foundation
Dear All,
Michael Shavlovky and I have been working on blame maps (authorship
detection) for the various Wikipedias.
We have code in the Wikimedia repository that was written with the
goal of obtaining a production system capable of attributing all content (not
just a research demo). Here are some pointers:
- Code <https://gerrit.wikimedia.org/r/#/q/blamemaps,n,z>
- Description of the blame maps mediawiki
extension<https://docs.google.com/document/d/15MEyu5tDZ3mhj_i1fDNFqNxWexK-B3BtbYKJlYE…>
- Detailed description of the underlying algorithm, with performance
evaluation<https://www.soe.ucsc.edu/research/technical-reports/ucsc-soe-12-21/download>
- Demo <http://blamemaps.wmflabs.org/mw/index.php/Main_Page>
These are also all available from
https://sites.google.com/a/ucsc.edu/luca/the-wikipedia-authorship-project
In brief, for each page we store metadata that summarizes the entire text
evolution of the page; this metadata, compressed, is about three times the
size of a typical revision. Each time a new revision is made, we read this
metadata, attribute every word of the revision, store the updated metadata,
and store authorship data for the revision. The process takes 1-2 seconds,
depending on the average revision size (most of the time is actually
devoted to deserializing and reserializing the metadata). Comparing against
all previous revisions takes care of things like content that is deleted
and later re-inserted, and various other attacks that might happen
once authorship is displayed. I should also add that these algorithms are
independent of the ones in WikiTrust, and should be much better.
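As a very rough sketch of that per-revision update (the helper functions
here are illustrative names, not the actual extension code):

    // Sketch of the per-revision attribution step.
    function attributeNewRevision( $pageId, $newRevText, $newRevAuthor ) {
        // Load and decompress the summary of the page's whole text evolution.
        $metadata = unserialize( gzinflate( loadBlameMetadata( $pageId ) ) );

        // Attribute every word of the new revision against that history, so
        // content that is deleted and later re-inserted keeps its original author.
        list( $authorship, $metadata ) =
            attributeWords( $metadata, $newRevText, $newRevAuthor );

        // Store the updated metadata and the per-word authorship for this revision.
        storeBlameMetadata( $pageId, gzdeflate( serialize( $metadata ) ) );
        storeAuthorship( $pageId, $authorship );
    }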
We have NOT developed a GUI for this: our plan was just to provide a data
API that gives information on the authorship of each word. There are many
ways to display the information, from page-level summaries of authorship to
detailed word-by-word information, and we thought that surely others would
want to play with the visualization aspect.
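Purely as a strawman for discussion, not a committed format, the data API
might return, for a given revision, something along the lines of:

    // Hypothetical response shape: one author entry per word of the revision.
    $authorship = array(
        'page'     => 'Example_page',
        'revision' => 123456,
        'words'    => array(
            array( 'word' => 'Lorem', 'author' => 'UserA', 'origin_rev' => 120001 ),
            array( 'word' => 'ipsum', 'author' => 'UserB', 'origin_rev' => 123456 ),
            // ...
        ),
    );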
I am writing this message as we hope this might be of interest, and as we
would be quite happy to find people willing to collaborate. Is anybody
interested in developing a GUI for it and talking to us about what API we
should have for retrieving this authorship information? Is anybody
interested in helping to move the code to a production-ready stage?
I also would like to mention that Fabian Floeck has developed another very
interesting algorithm for attributing the content, reported in
http://wikipedia-academy.de/2012/w/images/2/24/23_Paper_Fabian_Fl%C3%B6ck_A…
Fabian and I are now starting to collaborate: we want to compare the
algorithms, and work together to obtain something we are happy with, and
that can run in production.
Indeed, I think a reasonable first goal would be to:
- Define a data API
- Define some coarse requirements of the system
   - Have a look at the above results / algorithms / implementations and
   advise us.
I am sure that the algorithm details can be fine-tuned and changed to no
end in a collaborative effort, once the first version is up and running.
The problem is one of putting together a bit of effort to get to that first
running version.
Luca