Hi all,
Some of you might have heard of it, and some of you probably know it or
even use it regularly: Huggle is a very fast diff browser for MediaWiki,
written in C++, intended for dealing with vandalism on Wikimedia
projects (though it can be used with any MediaWiki installation).
It is used on a number of Wikimedia projects and helps revert hundreds
of instances of vandalism every day; see this chart for an overview:
http://tools.wmflabs.org/huggle/toolstat/daily.php
It may not look like it, but Huggle is very effective. Compared to
Twinkle and other tools that are used by thousands of users, Huggle is
typically used by fewer than ten users a day on the English Wikipedia,
yet it reverts almost the same amount of vandalism as Twinkle and
similar tools (on some days even more). That makes Huggle users hundreds
of times more effective than users of other tools.
Huggle is currently developed primarily by me, Adam Shorland
(addshore), and a few other developers:
https://github.com/huggle/huggle3-qt-lx/graphs/contributors
If any of you are interested in helping us, want to contribute, or just
want to find out more, Adam and I will be at the Hackathon, so don't
hesitate to contact us there! Huggle3 is not all about C++: it contains
an embedded Python interpreter and can be extended with Python and C/C++
extensions.
Hi,
Is anyone against the idea of taking the current HEAD and making a
release candidate of it? There are a number of unresolved bugs, but in
my opinion none of them should be considered blocking.
Thanks
Hi,
I think there are some features that Huggle 2 had that we don't need
anymore. One of them is the user list [1]: restricting an open-source
tool by a user list makes no sense when anyone can change the source to
bypass it. The feature itself is of no use, and there is little point
in implementing it in Huggle 3.
If someone really wants it, they could create a plugin for it.
So I think this, and maybe some other features, should not be included
in Huggle 3.
Is there anything else that Huggle 2 can do that you think is of no use
at the moment and that we don't really need in later versions?
== References ==
[1] - en.wikipedia.org/wiki/WP:Huggle/Users
... you all suck :P
Because nobody has made a single commit in the past few months except
me and islandmonkey. I created a list of things that need to be done on
Bugzilla: https://bugzilla.wikimedia.org/show_bug.cgi?id=34892
So if you want to contribute and don't know what to work on, that list
contains a lot of things that need to be done.
Hi,
I would like to summarize the changes I have made so far to Huggle's
edit-processing behaviour.
From now on (Huggle 3.0.x), edits aren't sorted by type but by score.
The edit score is a number: the higher it is, the closer the edit gets
to the top of the queue; the lower it is, the closer to the bottom.
Edits with very low scores may be automatically trimmed out of the
queue.
=== How does it work ===
Every edit, just as in old Huggle, goes through two kinds of
processing: preprocessing and postprocessing. Unlike in Huggle 2, edits
are put into the queue after postprocessing, which makes the initial
load a little slower, but overall everything works a lot better.
==== Preprocessing ====
Preprocessing is lightweight and uses only the CPU: it doesn't require
anything to be downloaded. Part of preprocessing is the initial
filtering of edits we are certainly not interested in (whitelisted
users, etc.).
==== Postprocessing ====
Postprocessing is a slow process during which a lot of data is
retrieved from the wiki. It takes several seconds for all processes to
finish and for the diffs to be parsed. After postprocessing, the edit
is scored (the diff is stored in memory, so you never need to download
the page again once the edit is in the queue).
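The two stages described above can be sketched roughly as follows. This is a hypothetical illustration, not Huggle's implementation: `preprocess` only checks a whitelist, and the `fetchDiff` and `scoreEdit` callbacks are assumed stand-ins for the wiki requests and the real scoring logic.

```cpp
#include <functional>
#include <optional>
#include <set>
#include <string>

// Hypothetical edit record used only for this sketch.
struct Edit {
    std::string user;
    std::string page;
    std::string diff;  // filled in by postprocessing, then kept in memory
    long score = 0;
};

using DiffFetcher = std::function<std::string(const Edit&)>;
using Scorer = std::function<long(const Edit&)>;

// Preprocessing: cheap, CPU-only checks; nothing is downloaded.
// Returns false for edits we are certainly not interested in.
bool preprocess(const Edit& edit, const std::set<std::string>& whitelist) {
    return whitelist.count(edit.user) == 0;
}

// Postprocessing: the slow step. fetchDiff stands in for the network
// request(s) to the wiki; the result is stored on the edit itself.
void postprocess(Edit& edit, const DiffFetcher& fetchDiff, const Scorer& scoreEdit) {
    edit.diff = fetchDiff(edit);
    edit.score = scoreEdit(edit);
}

// Full pipeline: only edits that survive preprocessing are fetched and
// scored; the returned edit is ready to be put into the queue.
std::optional<Edit> processEdit(Edit edit, const std::set<std::string>& whitelist,
                                const DiffFetcher& fetchDiff, const Scorer& scoreEdit) {
    if (!preprocess(edit, whitelist)) {
        return std::nullopt;  // filtered out before any download happens
    }
    postprocess(edit, fetchDiff, scoreEdit);
    return edit;
}
```

Filtering before fetching means whitelisted users never cost a network round trip, which is why the cheap stage runs first.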
After this, the edit is queued. Thanks to the score system, two
important changes happened:
People who do something that appears to you as vandalism are flagged as
vandals only for you, in Huggle's internal cache, not for everyone as
before, when the only information came from the warning templates the
user had received. This helps prevent good editors from being blocked:
it happened many times that someone accidentally reverted a good edit,
which resulted in the target user being flagged with a "1" symbol in
the queue, so that on every subsequent edit people assumed the user was
a vandal and reverted even something that only looked a little bit
wrong. With scoring, people who performed vandalism are still scored
higher than people who didn't, but on the other hand, people who, for
example, insert some score words may get an even higher score even if
they didn't do anything wrong. So people who were already warned may
still end up lower in the queue than new vandals.
Thanks to the score system, good edits can also be filtered out. In
Huggle 2, most users were used to the combination of Space and R (or
Space and Q): next or revert. In Huggle 3 we have one more button:
* next (space) - for edits you aren't sure about
* revert (R or Q) - for edits you are totally sure are vandalism
* good edit (G) - for edits that aren't vandalism
For every good edit, the user's score is decreased by 200; every
registered user with a score below -800 gets whitelisted.
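A minimal sketch of the good-edit rule, using the -200 step and the -800 threshold given above. The `UserScores` class and the way it stores users are hypothetical; only the two numbers and the "registered users only" condition come from the description.

```cpp
#include <map>
#include <set>
#include <string>

// Values taken from the description above.
constexpr long kGoodEditDelta = -200;
constexpr long kWhitelistBelow = -800;

// Hypothetical per-user record; anonymous users are never whitelisted.
struct UserInfo {
    long score = 0;
    bool registered = false;
};

class UserScores {
public:
    // Called when the reviewer presses G (good edit) on this user's edit.
    void markGoodEdit(const std::string& user, bool registered) {
        UserInfo& info = users_[user];
        info.registered = registered;
        info.score += kGoodEditDelta;
        // Strictly below the threshold, and only for registered users.
        if (info.registered && info.score < kWhitelistBelow) {
            whitelist_.insert(user);
        }
    }

    bool isWhitelisted(const std::string& user) const { return whitelist_.count(user) > 0; }

    long score(const std::string& user) const {
        auto it = users_.find(user);
        return it == users_.end() ? 0 : it->second.score;
    }

private:
    std::map<std::string, UserInfo> users_;
    std::set<std::string> whitelist_;
};
```

With these numbers, a registered user is whitelisted on the fifth good edit (score -1000), because -800 itself is not below the threshold.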
Are you good at swearing? WE NEED YOU
Huggle 3 comes with vandalism prediction, as it precaches the diffs,
including their contents, even before they are enqueued. Each edit has
a so-called "score", a numerical value: the higher it is, the more
likely the edit is vandalism.
If you want to help us improve this feature, we need a "score words"
list defined for every wiki where Huggle is going to be used, for
example the English Wikipedia.
Each list has the following syntax
(see https://en.wikipedia.org/w/index.php?title=Wikipedia:Huggle/Config&diff=573…):
score-words(score):
a list of words separated by commas; it can contain newlines, but the
commas must be present
Example:
score-words(200):
these, are, some, words, which, presence, of, increases, the, score,
each, word, by, 200,
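The sketch below shows one way such lists could be parsed and applied to the text of a diff. It is not Huggle's config parser: it assumes that a list continues onto the next line only while the current line ends with a comma, that matching is case-insensitive on whole words, and that each listed word counts once per edit. `parseScoreWords` and `scoreText` are hypothetical names.

```cpp
#include <cctype>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Trim spaces, tabs and carriage returns from both ends.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r";
    auto b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    auto e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

static std::string lower(std::string s) {
    for (char& c : s) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Parses every "score-words(N):" block into a word -> score map.
// Assumption: a list continues onto the next line while the current line ends with ','.
std::map<std::string, long> parseScoreWords(const std::string& config) {
    std::map<std::string, long> words;
    std::istringstream in(config);
    std::string line;
    bool inList = false;
    long current = 0;
    const std::string prefix = "score-words(";
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.rfind(prefix, 0) == 0) {
            // Header such as "score-words(200):"; the score sits between the parentheses.
            auto close = line.find(')', prefix.size());
            inList = false;
            if (close == std::string::npos) continue;  // tolerate a broken header
            try {
                current = std::stol(line.substr(prefix.size(), close - prefix.size()));
                inList = true;
            } catch (...) {
                // not a number: skip this block instead of failing
            }
            continue;
        }
        if (!inList) continue;
        std::istringstream items(line);
        std::string word;
        while (std::getline(items, word, ',')) {
            word = lower(trim(word));
            if (!word.empty()) words[word] = current;
        }
        inList = !line.empty() && line.back() == ',';
    }
    return words;
}

// Adds the score of every listed word that appears in the text (each word counted once).
long scoreText(const std::string& text, const std::map<std::string, long>& words) {
    std::set<std::string> seen;
    std::string token;
    long total = 0;
    auto flush = [&]() {
        if (token.empty()) return;
        auto it = words.find(token);
        if (it != words.end() && seen.insert(token).second) total += it->second;
        token.clear();
    };
    for (char c : text) {
        if (std::isalnum(static_cast<unsigned char>(c))) {
            token += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        } else {
            flush();
        }
    }
    flush();
    return total;
}
```

Skipping malformed headers instead of aborting mirrors the idea of a parser that tolerates syntax errors, although how Huggle itself recovers is not specified here.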
So if you know English better than I do, which you likely do, go ahead
and improve the configuration file there. No worries: Huggle's config
parser is very tolerant of syntax errors.
If you have any other suggestions for how to improve Huggle's
prediction, go ahead and tell us!
Hi,
Our primary repository was such a mess that I started to get confused
about what is where. For that reason, I moved the sources of Huggle3
QT-LX to a clean repository: https://github.com/huggle/huggle3-qt-lx
So if you ever want to work on the Qt version, you won't have to check
out that whole huge mess, just this C++ repository.
Hi,
I know these e-mails are becoming quite boring :-) but we are still
looking for developers who can help us revive the Huggle project.
Huggle is a high-speed anti-vandalism utility that allows Wikipedians
to review a huge number of articles in a short time.
We have been working on Huggle 3 for much longer than we should have,
and now we have used our patented technology called
"lets-write-it-from-scratch-once-again" ™
So we are, kind of, writing Huggle 3 LX from scratch again. This time
it's going to be different, though.
After careful consideration of what the best possible programming
language would be (please don't bring this topic up again, it's not
funny, especially for Pythonists, shush pls), I decided to go back to
C++, which the original Huggle 3 was written in, but this time using the
Qt framework instead of wxWidgets. This has several cons and also some
pros :-)
== Cons ==
- It will have about 10% of the original features (without plugins)
- It will take more time to finish, because we have to start over
== Pros ==
- It will be natively cross-platform
- It will be faster than Huggle 2
- Qt has a nice IDE that allows us to sketch the interface quickly and
easily (unlike wx)
- It comes with a nice WebKit-based browser, which is an improvement
given that the current Huggle is IE-based
- It could eventually run even on tablets (or mobile phones; why the
heck did people request that?)
So, although this is (I think) the 4th time we are starting to write
Huggle 3 from scratch, it's hopefully the last attempt. Also, please
note the "QT-LX" in the name. LX means "light extendable", which means
that Huggle 3 is supposed to be very lightweight (just the basic
functions to provide review-revert technology) plus a set of APIs and
hooks so that it can be easily extended with loadable plugins (we might
even implement a Python interpreter for Pythonists, so that it would be
even easier to make plugins for it). That will make sure it won't take
too long for us to finish it, and it will also make it possible for
Huggle 3 to have most of the original features in the future (through
plugins).
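To make the "light core plus hooks" idea concrete, here is a minimal C++ sketch of a hook registry. All names (`Hooks`, `onEditPostProcessed`, `registerExamplePlugin`) are hypothetical, and loading plugins from shared libraries is not shown; the point is only that the core calls registered callbacks at fixed points, so extra features can live in plugins.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical edit type passed to hooks.
struct Edit {
    std::string user;
    std::string page;
    long score = 0;
};

// Minimal hook registry: the lightweight core only calls hooks at fixed
// points, and plugins add behaviour by registering callbacks.
class Hooks {
public:
    using EditHook = std::function<void(Edit&)>;

    // Called by a plugin when it is loaded.
    void onEditPostProcessed(EditHook hook) { postProcessed_.push_back(std::move(hook)); }

    // Called by the core after postprocessing, before the edit is queued.
    void editPostProcessed(Edit& edit) const {
        for (const auto& hook : postProcessed_) hook(edit);
    }

private:
    std::vector<EditHook> postProcessed_;
};

// Illustrative "plugin": raises the score of edits to one page.
void registerExamplePlugin(Hooks& hooks) {
    hooks.onEditPostProcessed([](Edit& edit) {
        if (edit.page == "Main Page") edit.score += 500;
    });
}
```

The core stays small because it only needs to know where the hook points are; the behaviour behind them can come from C++ plugins or, eventually, an embedded scripting layer.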
TL;DR:
Are you a C++ programmer? WE NEED YOU! If you know something about
C++, open https://github.com/huggle/huggle/tree/master/huggleQT/, fork
it, work on it, and push, please.
If you are more serious about it, also join #huggle on freenode and the
Huggle mailing list.
Thank you