Hi Pierce!

You're right that the wp10 model is based on Warncke-Wang's work, though we've since extended the feature set and changed the modeling strategy.

If you want to see the features a model uses in a basic form, you can query ORES with the "?features" parameter. E.g. https://ores.wikimedia.org/v3/scores/enwiki/779679551/wp10/?features returns:

{
  "enwiki": {
    "models": {
      "wp10": {
        "version": "0.5.0"
      }
    },
    "scores": {
      "779679551": {
        "wp10": {
          "features": {
            "feature.english.stemmed.revision.stems_length": 11621,
            "feature.enwiki.main_article_templates": 0,
            "feature.enwiki.revision.category_links": 11,
            "feature.enwiki.revision.cite_templates": 11,
            "feature.enwiki.revision.cn_templates": 2,
            "feature.enwiki.revision.image_links": 1,
            "feature.enwiki.revision.infobox_templates": 1,
            "feature.wikitext.revision.chars": 19241,
            "feature.wikitext.revision.content_chars": 12961,
            "feature.wikitext.revision.external_links": 24,
            "feature.wikitext.revision.headings_by_level(2)": 11,
            "feature.wikitext.revision.headings_by_level(3)": 0,
            "feature.wikitext.revision.ref_tags": 23,
            "feature.wikitext.revision.templates": 30,
            "feature.wikitext.revision.wikilinks": 66
          },
          "score": {
            "prediction": "C",
            "probability": {
              "B": 0.13747039004562459,
              "C": 0.8331703672870666,
              "FA": 0.007180710735104919,
              "GA": 0.005799232485106759,
              "Start": 0.015370319423127086,
              "Stub": 0.0010089800239699196
            }
          }
        }
      }
    }
  }
}

This does not represent the *exact* feature vector. The exact feature vector also includes controlling features that adjust for scale, such as ratios and log transforms (e.g. content_chars / chars or log(ref_tags)). The best way to get the exact feature set is to install the appropriate library (wikiclass, editquality, etc.) from https://github.com/wiki-ai/ and ask it for the feature list. E.g.:

$ python
Python 3.5.1+ (default, Mar 30 2016, 22:46:26)
[GCC 5.3.1 20160330] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from editquality.feature_lists.enwiki import damaging
>>> for f in damaging:
...     print(f)
...
feature.revision.page.is_articleish
feature.revision.page.is_mainspace
feature.revision.page.is_draftspace
feature.log((wikitext.revision.parent.chars + 1))
feature.log((len(<datasource.tokenized(datasource.revision.parent.text)>) + 1))
feature.log((len(<datasource.wikitext.revision.parent.words>) + 1))
feature.log((len(<datasource.wikitext.revision.parent.uppercase_words>) + 1))
feature.log((wikitext.revision.parent.headings + 1))
feature.log((wikitext.revision.parent.wikilinks + 1))
feature.log((wikitext.revision.parent.external_links + 1))
feature.log((wikitext.revision.parent.templates + 1))
feature.log((wikitext.revision.parent.ref_tags + 1))
feature.revision.parent.chars_per_word
feature.revision.parent.words_per_token
feature.revision.parent.uppercase_words_per_word
feature.revision.parent.markups_per_token
feature.wikitext.revision.diff.markup_delta_sum
feature.wikitext.revision.diff.markup_delta_increase
feature.wikitext.revision.diff.markup_delta_decrease
feature.wikitext.revision.diff.markup_prop_delta_sum
feature.wikitext.revision.diff.markup_prop_delta_increase
feature.wikitext.revision.diff.markup_prop_delta_decrease
feature.wikitext.revision.diff.number_delta_sum
feature.wikitext.revision.diff.number_delta_increase
feature.wikitext.revision.diff.number_delta_decrease
feature.wikitext.revision.diff.number_prop_delta_sum
feature.wikitext.revision.diff.number_prop_delta_increase
feature.wikitext.revision.diff.number_prop_delta_decrease
feature.wikitext.revision.diff.uppercase_word_delta_sum
feature.wikitext.revision.diff.uppercase_word_delta_increase
feature.wikitext.revision.diff.uppercase_word_delta_decrease
feature.wikitext.revision.diff.uppercase_word_prop_delta_sum
feature.wikitext.revision.diff.uppercase_word_prop_delta_increase
feature.wikitext.revision.diff.uppercase_word_prop_delta_decrease
feature.revision.diff.chars_change
feature.revision.diff.tokens_change
feature.revision.diff.words_change
feature.revision.diff.words_change
feature.revision.diff.headings_change
feature.revision.diff.external_links_change
feature.revision.diff.wikilinks_change
feature.revision.diff.templates_change
feature.revision.diff.ref_tags_change
feature.revision.diff.longest_new_token
feature.revision.diff.longest_new_repeated_char
feature.revision.user.is_bot
feature.revision.user.has_advanced_rights
feature.revision.user.is_admin
feature.revision.user.is_trusted
feature.revision.user.is_patroller
feature.revision.user.is_curator
feature.revision.user.is_anon
feature.log((temporal.revision.user.seconds_since_registration + 1))
feature.revision.comment.suggests_section_edit
feature.revision.comment.has_link
feature.english.badwords.revision.diff.match_delta_sum
feature.english.badwords.revision.diff.match_delta_increase
feature.english.badwords.revision.diff.match_delta_decrease
feature.english.badwords.revision.diff.match_prop_delta_sum
feature.english.badwords.revision.diff.match_prop_delta_increase
feature.english.badwords.revision.diff.match_prop_delta_decrease
feature.english.informals.revision.diff.match_delta_sum
feature.english.informals.revision.diff.match_delta_increase
feature.english.informals.revision.diff.match_delta_decrease
feature.english.informals.revision.diff.match_prop_delta_sum
feature.english.informals.revision.diff.match_prop_delta_increase
feature.english.informals.revision.diff.match_prop_delta_decrease
feature.english.dictionary.revision.diff.dict_word_delta_sum
feature.english.dictionary.revision.diff.dict_word_delta_increase
feature.english.dictionary.revision.diff.dict_word_delta_decrease
feature.english.dictionary.revision.diff.dict_word_prop_delta_sum
feature.english.dictionary.revision.diff.dict_word_prop_delta_increase
feature.english.dictionary.revision.diff.dict_word_prop_delta_decrease
feature.english.dictionary.revision.diff.non_dict_word_delta_sum
feature.english.dictionary.revision.diff.non_dict_word_delta_increase
feature.english.dictionary.revision.diff.non_dict_word_delta_decrease
feature.english.dictionary.revision.diff.non_dict_word_prop_delta_sum
feature.english.dictionary.revision.diff.non_dict_word_prop_delta_increase
feature.english.dictionary.revision.diff.non_dict_word_prop_delta_decrease

You can also ask ORES for details about each model, such as its test statistics and modeling algorithm. E.g. https://ores.wikimedia.org/v3/scores/enwiki/?models=wp10&model_info=type|params|version returns:
{
  "enwiki": {
    "models": {
      "wp10": {
        "type": "GradientBoosting",
        "version": "0.5.0",
        "params": {
          "balanced_sample": true,
          "balanced_sample_weight": false,
          "center": true,
          "init": null,
          "learning_rate": 0.01,
          "loss": "deviance",
          "max_depth": 7,
          "max_features": "log2",
          "max_leaf_nodes": null,
          "min_samples_leaf": 1,
          "min_samples_split": 2,
          "min_weight_fraction_leaf": 0.0,
          "n_estimators": 700,
          "presort": "auto",
          "random_state": null,
          "scale": true,
          "subsample": 1.0,
          "verbose": 0,
          "warm_start": false
        }
      }
    }
  }
}

For details on how features are extracted, see the revscoring documentation: http://pythonhosted.org/revscoring

For the full process by which models are built, see the Makefile in the appropriate repository, e.g.:
https://github.com/wiki-ai/wikiclass/blob/master/Makefile#L111
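If you'd rather script these queries, here is a minimal Python sketch (standard library only) that builds the two kinds of ORES v3 URLs used above, fetches the JSON, and pulls out the prediction, probabilities and feature values. It assumes the response layout shown in the examples above; the helper names and the User-Agent string are hypothetical placeholders, not part of any wiki-ai library.

```python
import json
import urllib.parse
import urllib.request

# Base endpoint as used in the example URLs above.
ORES_BASE = "https://ores.wikimedia.org/v3/scores"


def features_url(wiki, rev_id, model):
    """URL asking ORES for a model's score plus its basic feature values."""
    return "{}/{}/{}/{}/?features".format(ORES_BASE, wiki, rev_id, model)


def model_info_url(wiki, model, fields=("type", "params", "version")):
    """URL asking ORES for model metadata (algorithm, parameters, version)."""
    query = urllib.parse.urlencode(
        {"models": model, "model_info": "|".join(fields)},
        safe="|",  # keep the pipe separators readable, as in the example URL
    )
    return "{}/{}/?{}".format(ORES_BASE, wiki, query)


def fetch_json(url, user_agent="ores-example/0.1 (hypothetical contact)"):
    """Fetch a URL and decode its JSON body (requires network access).

    The User-Agent value is a placeholder; put your own contact info there.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_score(doc, wiki, rev_id, model):
    """Return (prediction, probabilities, features) from a ?features response."""
    result = doc[wiki]["scores"][str(rev_id)][model]
    score = result["score"]
    return score["prediction"], score["probability"], result.get("features", {})
```

For example, extract_score(fetch_json(features_url("enwiki", 779679551, "wp10")), "enwiki", 779679551, "wp10") would give you the prediction, probability table and features for the revision above, assuming the service and model version are unchanged.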
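To make the point about controlling features concrete, here is a rough Python sketch that derives two such values from the basic wp10 features for revision 779679551: the share of characters that are content (content_chars / chars) and a log-scaled reference count. The log(x + 1) form mirrors the log transforms in the damaging feature list printed above; whether wp10 uses exactly these transforms is an assumption, so check the wikiclass feature lists for the real set.

```python
import math

# Basic feature values copied from the ?features response for revision 779679551.
basic = {
    "feature.wikitext.revision.chars": 19241,
    "feature.wikitext.revision.content_chars": 12961,
    "feature.wikitext.revision.ref_tags": 23,
}


def derive_controlling_features(feats):
    """Hypothetical derived features; not the exact wikiclass transforms."""
    chars = feats["feature.wikitext.revision.chars"]
    content_chars = feats["feature.wikitext.revision.content_chars"]
    ref_tags = feats["feature.wikitext.revision.ref_tags"]
    return {
        # Share of the wikitext that is readable content rather than markup;
        # guard against empty pages to avoid dividing by zero.
        "content_chars / chars": content_chars / chars if chars else 0.0,
        # Adding 1 keeps the log defined for pages with no references.
        "log(ref_tags + 1)": math.log(ref_tags + 1),
    }
```

With the values above this gives roughly 0.674 for the content ratio and 3.178 for the log-scaled reference count. Ratios and logs like these let the model compare long and short articles on a similar footing.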
-Aaron

On Fri, Jun 9, 2017 at 10:50 AM, Pierce Edmiston <pedmiston@wisc.edu> wrote:

Hello,

I'm wondering how to find out the details of the edit and article quality models, specifically the reverted and damaging edit quality models and the wp10 article quality model. I'd like to know what algorithm is being used and what features the models are trained on.

I believe the wp10 model may have originated with Warncke-Wang, Cosley, & Riedl (2013), "Tell me more: An actionable quality model for Wikipedia", in which case I can figure out the model specification and features from the paper. But I'm not sure whether the details of the edit quality models have been similarly summarized in any papers or online documentation.

Thanks for your help!
Pierce

_______________________________________________
AI mailing list
AI@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/ai