Hi everybody,
(With apologies for cross-posting...)
You may have seen the recent communication[1] about the product and tech tune-up that went live the week of June 5th, 2017. In that communication, we promised an update on the future of the Discovery projects; this email provides that update.
The Discovery team structure has now changed, but the new teams will still work together to complete the goals listed in the draft annual plan.[2] A summary of their anticipated work, as we finalize these changes, is below. We plan to check in at the end of the calendar year to see how our goals are progressing under the new structure of smaller, separate teams.
Here is a list of the various projects under the Discovery umbrella, along with the goals that they will be working on:
Search Backend
Improve search capabilities:
- Implement ‘learning to rank’ [3] and other advanced machine learning methodologies
- Improve support for languages using new analyzers
- Maintain and expand power user search functionality
Search Frontend
Improve user interface of the search results page with new functionality:
- Implement ‘explore similar’ [4]
- Update the completion suggester box [5]
- Investigate the usage of a Wiktionary widget for English Wikipedia [6]
Wikidata Query Service
Expand and scale:
- Improve ability to support power features on-wiki for readers
- Improve full text search functionality
- Implement SPARQL federation support
Portal
Create and implement automated language statistics and translation updates for Wikipedia.org
Analysis
Provide in-depth analytics support:
- Perform experimental design, data collection, and data analysis
- Perform ad-hoc analyses of Discovery-domain data
- Maintain and augment the Discovery Dashboards,[7] which allow the teams to track their KPIs and other metrics
Maps
Map support:
- Implement a new map style
- Increase the frequency of OSM data replication
- Assist, as needed, with individual language Wikipedias' implementation of mapframe [8]
Note: There is a possibility that we can do more with maps in the coming year; we are currently evaluating strategic, partnership, and resourcing options.
Structured Data on Commons
Extend structured data search on Commons, as part of the structured data grant [9] via:
- Research and implement advanced search capabilities
- Implement new elements, filters, relationships
Graphs and Tabular Data on Commons
We will be re-evaluating this functionality against other Commons initiatives such as the structured data grant. As with maps, we will provide updates when we know more.
We are still working out all the details of the new team structure and there may be some turbulence; please let us know if you have any concerns and we will do our best to address them.
Best regards,
Deborah Tankersley, Product Manager, Discovery
Erika Bjune, Engineering Manager, Search Platform
Jon Katz, Reading Product Lead
Toby Negrin, Interim Vice President of Product
Victoria Coleman, Chief Technology Officer
[1] https://www.mediawiki.org/wiki/Wikimedia_Engineering/June_2017_changes
[2] https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2017-2018/D...
[3] https://en.wikipedia.org/wiki/Learning_to_rank
[4] https://www.mediawiki.org/wiki/Cross-wiki_Search_Result_Improvements/Testing...
[5] https://www.mediawiki.org/wiki/Extension:CirrusSearch/CompletionSuggester
[6] https://www.mediawiki.org/wiki/Cross-wiki_Search_Result_Improvements/Testing...
[7] https://discovery.wmflabs.org/
[8] https://www.mediawiki.org/wiki/Maps/how_to:_embedded_maps
[9] https://commons.wikimedia.org/wiki/Commons:Structured_data
On Wed, Jun 14, 2017 at 5:25 AM, Deborah Tankersley dtankersley@wikimedia.org wrote:
Search Backend
Improve search capabilities:
Implement ‘learning to rank’ [3] and other advanced machine learning methodologies ... [3] https://en.wikipedia.org/wiki/Learning_to_rank
How will the Foundation's approach to machine learning of search results ranking guard against overfitting?
For example, if most searches on "rent" do not pertain to "rent seeking", then how will the machine learning approach to search results for "rent" guard against never presenting any results on "rent seeking"?
James Salsman wrote:
How will the Foundation's approach to machine learning of search results ranking guard against overfitting?
Overfitting, for those who aren't familiar with the term, describes the situation where a machine learning model inappropriately learns very specific details about its training set that don't generalize to the real world. From the point of view of training, the model seems to be getting better and better, while real-world performance is actually decreasing. As a somewhat silly example, a model could learn that queries that have exactly 38 words in them are 100% about baseball—because there is only one example of a query in the training set that is 38 words long, and it is about baseball. For more on overfitting, see Wikipedia.[1]
We employ the usual safeguards against overfitting. Certain parameters that control how a specific type of model is built can discourage overfitting. For example, not allowing a decision inside the model to be made on too little data—so rather than 1 or 2 examples to base a decision on, the model can be told it needs to see 5, or 50, or 500.
We also have separate training and testing data sets. So we build a model on one set of data, then evaluate the model on another set. The estimate of model performance from the training set will always be at least a bit optimistic, but the testing set—which is large enough to be representative and which does not overlap with the training set—gives a more realistic estimate. We choose the model that performs the best on the testing set. Overfitted models will do worse on the testing set, and we won't use them.
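As a rough illustration (a toy sketch, not our actual training pipeline), the two safeguards above might look something like this in Python, with scikit-learn standing in for whatever tooling is actually used:

    # Toy sketch only -- not the production pipeline. It shows two common
    # safeguards: a minimum-samples constraint on each decision in a tree
    # model, and evaluation on a held-out test set the model never trained on.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    # Stand-in data; real training data would be query/result features
    # with click-derived relevance labels.
    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    # min_samples_leaf=50 forbids decisions based on only 1 or 2 examples.
    model = GradientBoostingClassifier(min_samples_leaf=50, random_state=0)
    model.fit(X_train, y_train)

    # The training score is always optimistic; the held-out test score is
    # the honest estimate, and an overfitted model shows a large gap here.
    print("train accuracy:", model.score(X_train, y_train))
    print("test accuracy: ", model.score(X_test, y_test))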
We have other methods of validating our models as well.
We have a set of machines and software that we collectively call Relevance Forge (a.k.a. RelForge) that we can use to run large sets of queries against different versions of the same index. We can compare the before and after results, both automatically and manually. RelForge lets us easily gauge the *impact* of a change. For example, a 1% net improvement could come from making 1% of queries a bit better, or from making 49% a bit worse and 50% a bit better. So, we can easily see whether 1% or 99% of results change. If we see a 2% improvement but a 99% impact, something weird is happening, and we'd investigate more deeply.
We also have many definitions of "results change" that we can evaluate: #1 result changes, top 3 results change (ordered or unordered), number of results changes, number of queries getting zero results changes. And for each of these we can manually inspect a random selection of affected queries to decide whether the results are generally better or not.
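For a sense of what those comparisons amount to (a toy sketch, not RelForge itself, and the query data here is made up):

    # Toy sketch of the kind of before/after comparison RelForge automates.
    # 'before' and 'after' map each query to its ordered list of result titles.
    def impact_metrics(before, after):
        total = len(before)
        changed = {"top1": 0, "top3_ordered": 0, "result_count": 0, "zero_results": 0}
        for query in before:
            b, a = before[query], after[query]
            if b[:1] != a[:1]:
                changed["top1"] += 1
            if b[:3] != a[:3]:
                changed["top3_ordered"] += 1
            if len(b) != len(a):
                changed["result_count"] += 1
            if (len(b) == 0) != (len(a) == 0):
                changed["zero_results"] += 1
        # Report each kind of change as a fraction of all queries compared.
        return {name: count / total for name, count in changed.items()}

    before = {"rent": ["Rent (musical)", "Rent", "Renting"], "apple": ["Apple", "Apple Inc."]}
    after = {"rent": ["Rent (musical)", "Renting", "Rent"], "apple": ["Apple", "Apple Inc."]}
    print(impact_metrics(before, after))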
We also run A/B tests, where we let a small sample of users get the proposed change, while a similar number get the standard results. We do statistical analyses on user engagement with results and various other click metrics that let us compare the control and experimental conditions. For more on how we test search changes in general, see Testing Search on mediawiki.org.[2]
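To give a flavor of the statistics involved (a minimal sketch assuming a simple clickthrough-rate comparison with made-up numbers; the real analyses look at several metrics, not just this one):

    # Minimal sketch: compare clickthrough rates between control and test buckets.
    from statsmodels.stats.proportion import proportions_ztest

    clicks = [4210, 4480]      # sessions with at least one click: control, test
    sessions = [10000, 10000]  # total sessions per bucket

    stat, p_value = proportions_ztest(clicks, sessions)
    print(f"z = {stat:.2f}, p = {p_value:.4f}")
    if p_value < 0.05:
        print("The difference in clickthrough is unlikely to be chance alone.")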
In both of these cases—RelForge testing and A/B testing in production—overfitted models would perform poorly, and that would become apparent.
For example, if most searches on "rent" do not pertain to "rent seeking", then how will the machine learning approach to search results for "rent" guard against never presenting any results on "rent seeking"?
Your wording has left me a bit confused, and I'm not sure whether your concern is (a) that a query of "rent" should never return "rent seeking", and so the machine learning model should never present it, or (b) that we should guard against building a model that *never* presents results on "rent seeking" for a query of "rent". I'll briefly address each.
Case (a): "rent" should *never* return "rent seeking"
It's not clear to me that returning "rent seeking" for a query of "rent" is necessarily a case of overfitting per se, but in general the click models that we use would take note that users who search for "rent", say, click on the musical 70% of the time and the disambiguation page 29% of the time. Those would be the "good" results and the model would prioritize moving them to the top of the list.
*Never* presenting results on "rent seeking" would be an error. The word is present in the article, and in the title, so it should be somewhere in the results. Moving it up or down the results list is a question of ranking, which is what the machine learning model is trying to figure out.
Case (b): "rent" should not be *prevented* from returning "rent seeking"
Our click data shows that about 80% of clicks on search results are on one of the first two results, and more than 90% are on the top 10. Our click models for scoring the order of results reflect that. All of the value then, from the machine learning model's point of view, comes from getting the top 3 to 5 results in the best possible order. There's not a lot of value in pushing down any particular result much farther than that. For a single word query like "rent", title matches are the best. There are only 138 results for intitle:rent, vs over 44K for just rent—however, the first page of results for both is the same.
We are interested in use cases other than searchers who are looking for a particular article or particular information, though that tends to predominate. Editors might want to find all the articles with a particular word (e.g., a misspelling) and no result would be excluded by the machine learning model, just possibly ranked lower.
Hope that helps, —Trey
[1] https://en.wikipedia.org/wiki/Overfitting
[2] https://www.mediawiki.org/wiki/Wikimedia_Discovery/Search/Testing_Search
Trey Jones, Software Engineer, Discovery, Wikimedia Foundation
(via Deb Tankersley's email address, as Trey's original email got moderated)
Hi Trey,
Thanks for your very detailed reply. I have a follow-up question.
How do you determine search intents? For example, if you see someone searching for "rents", how do you know whether they are looking for economic rents or property rents when evaluating the quality of the search results? If you're training machine learning models from "5, 50, or 500" examples, you need labels on each of those examples indicating whether the results are good or not.
Do you interview searchers after the fact? Ask people to search and record the terms they search on? What kind of infrastructure do you have to make sure the intents you infer are correct and robust enough to score the example results? Maybe surveys on some small fraction of searches, asking users to describe in greater detail exactly what they were trying to find?
Best regards, Jim
Hi Jim,
Determining the intent of a particular search is indeed very difficult, and it is not really feasible to attempt at the scale needed for machine learning (unless you have an immense budget, like some for-profit search engine companies).
For our machine learning training data, we use click models suggested by academic research. These models allow us to score the results for a given query based on which results users actually clicked on (and didn't click on). The results aren't perfect, but they are good, and they can be automatically generated for millions of training examples taken from real user queries and clicks.
These scores serve as a proxy for user intent, without needing to actually understand it. As an example, if 35% of people click on the first result for a particular query, and 60% on the second result, the click scores would indicate that the order should be swapped, even without knowing the intent of the query or the content of the results.
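A stripped-down illustration of that idea (real click models are more sophisticated and correct for position bias, among other things; the log data here is invented):

    # Stripped-down illustration: turn a click log into per-result scores.
    # Real click models also correct for position bias and other effects.
    from collections import Counter, defaultdict

    # (query, clicked_result) pairs from search sessions -- invented data.
    click_log = [
        ("rent", "Rent (musical)"), ("rent", "Rent (musical)"),
        ("rent", "Rent (musical)"), ("rent", "Rent (disambiguation)"),
        ("rent", "Rent (disambiguation)"), ("rent", "Renting"),
    ]

    clicks_per_query = defaultdict(Counter)
    for query, result in click_log:
        clicks_per_query[query][result] += 1

    # Each result's share of clicks becomes the score the ranker is trained
    # to reproduce, without any notion of what the query "means".
    for query, counts in clicks_per_query.items():
        total = sum(counts.values())
        for result, n in counts.most_common():
            print(query, result, round(n / total, 2))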
Swapping the top two results isn't really a big win, but the hope is that by identifying features of the query (e.g., number of words), of the articles (e.g., popularity), and of the relationship between them (e.g., number of words in common between the query and the article title) we will learn something that is more generally true. If we do, then we may move a result for a different query from, say, position 8 (where few people ever click) to position 3 (where there is at least a chance of a click). Iterating the whole process will allow us to detect that the result newly in position 3 is actually a really popular result so we should adjust the model to boost it even more, or that it's not that great and we should adjust the model to put something better in the #3 slot. Of course, all of the "adjusting" of the model happens automatically during training.
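To sketch what "features of the query, the article, and the relationship between them" might look like as input to a ranking model (the feature names and numbers are hypothetical, and a generic gradient-boosted ranker from LightGBM stands in for whatever the production setup actually uses):

    # Hypothetical (query, article) feature rows and a generic ranker as a
    # stand-in for the production learning-to-rank model.
    import numpy as np
    import lightgbm as lgb

    # Columns: query word count, article popularity, words shared with title.
    X = np.array([
        [1, 0.90, 1],  # "rent" -> Rent (musical)
        [1, 0.40, 1],  # "rent" -> Rent (disambiguation)
        [1, 0.05, 1],  # "rent" -> Rent-seeking
        [2, 0.70, 2],  # "economic rent" -> Economic rent
        [2, 0.05, 1],  # "economic rent" -> Rent (musical)
    ])
    # Click-derived relevance labels (higher = better), as in the sketch above.
    y = np.array([3, 2, 0, 3, 0])

    # Rows are grouped by query: 3 candidates for "rent", 2 for "economic rent".
    ranker = lgb.LGBMRanker(n_estimators=10, min_child_samples=1)
    ranker.fit(X, y, group=[3, 2])

    # Higher predicted scores should be ranked earlier for unseen pairs.
    print(ranker.predict(np.array([[1, 0.90, 1], [1, 0.05, 1]])))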
Through this iterative process of modeling, training, evaluation, and deployment, we are attempting to take into account the relationship between the user's intent and the search results—inferred from the user's behavior—to improve the search results.
Cheers, —Trey
Trey Jones Software Engineer, Discovery Wikimedia Foundation