Dear Jimmy Wales,
Could you ask those in charge to make the wikimedia.org mailing lists searchable again?
There may be good reasons why they aren't,
but of course one cannot find them,
because they are not searchable!
Thanks.
jidanni@jidanni.org wrote:
Dear Jimmy Wales,
Could you ask those in charge to make the wikimedia.org mailing lists searchable again?
There may be good reasons why they aren't,
but of course one cannot find them,
because they are not searchable!
Thanks.
The reason was people posting private data to the mailing list, then pestering sysadmins to remove it when they found it on Google.
On Sat, Aug 22, 2009 at 4:14 PM, Platonides Platonides@gmail.com wrote:
jidanni@jidanni.org wrote:
Dear Jimmy Wales,
Could you ask those in charge to make the wikimedia.org mailing lists searchable again?
There may be good reasons why they aren't,
but of course one cannot find them,
because they are not searchable!
Thanks.
The reason was people posting private data to the mailing list, then pestering sysadmins to remove it when they found it on Google.
False. The reason is that people can post entirely reasonable things on these lists and employers can then be hyper-discriminative about their personal interests and choose not to hire them, or to fire them.
2009/8/22 Brian Brian.Mingus@colorado.edu:
False. The reason is that people can post entirely reasonable things on these lists and employers can then be hyper-discriminative about their personal interests and choose not to hire them, or to fire them.
Pity they're all over gmane and nabble, then.
Suggestion: a Google search on gmane and/or nabble linked from the archive pages.
- d.
On Sat, Aug 22, 2009 at 5:01 PM, David Gerard dgerard@gmail.com wrote:
2009/8/22 Brian Brian.Mingus@colorado.edu:
False. The reason is that people can post entirely reasonable things on these lists and employers can then be hyper-discriminative about their personal interests and choose not to hire them, or to fire them.
Pity they're all over gmane and nabble, then.
Suggestion: a Google search on gmane and/or nabble linked from the archive pages.
- d.
Yes, those sites fail to respect robots.txt and have the outrageous policy that they will only withhold emails from their archive that have the X-No-Archive header in them. Which you, as a gmail user, could not possibly include in one of your e-mails unless you used a very new feature which allows you to host your own smtp server and route messages through that.
Furthermore it's a double standard that the foundation makes the list archives available to non-members but they "agree" with the logic of putting the lists in robots.txt exclusion.
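For anyone who has not met the X-No-Archive header mentioned above, here is a minimal sketch of setting it with Python's standard email library. The addresses and the helper name are made up for illustration, and the value "yes" is only the usual convention; whether a given archive honours it is up to that archive.

```python
from email.message import EmailMessage


def build_no_archive_message(sender, recipient, subject, body):
    """Build a plain-text message that carries an X-No-Archive header.

    Hypothetical helper for illustration only; it does not send anything.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # Assumption: "yes" is the conventional value archives look for.
    msg["X-No-Archive"] = "yes"
    msg.set_content(body)
    return msg


# Placeholder addresses, not real list addresses.
msg = build_no_archive_message(
    "someone@example.org", "some-list@example.org", "Test", "Hello, list."
)
print(msg["X-No-Archive"])
```

The message still has to leave through a mail client or SMTP server that does not strip custom headers, which is the limitation being complained about here.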
Suggestion: a Google search on gmane and/or nabble linked from the archive pages.
B> Yes, those sites fail to respect robots.txt
Huh, people subscribed the lists to gmane. Gmane does not spider sites. Brion even told the Gmane admins it was OK to not encrypt addresses upon my request... (but you'll have to look that thread up yourself :-), if you can, Muhahaha)
B> and have the outrageous policy that they will only withhold emails from their archive that have the X-No-Archive header in them. Which you, as a gmail user, could not possibly include in one of your e-mails unless you used a very new feature which allows you to host your own smtp server and route messages through that.
Or just use a different Mail User Agent. I mean if the user has such special needs, they can at least learn to use a different Mail User Agent, instead of playing like President Kim, and outlawing searching for everybody.
B> Furthermore it's a double standard that the foundation makes the list archives available to non-members but they "agree" with the logic of putting the lists in robots.txt exclusion.
All I know is I don't know of any other examples of "security through obscurity" on mailing lists. Wasn't Jimbo inventing a new search engine? I don't know though... can't search for the announcement.
On Sat, Aug 22, 2009 at 11:20 PM, jidanni@jidanni.org wrote:
All I know is I don't know of any other examples of "security through obscurity" on mailing lists. Wasn't Jimbo inventing a new search engine? I don't know though... can't search for the announcement.
Download the gzipped mbox files from when you were not subscribed, for example http://lists.wikimedia.org/pipermail/foundation-l/2009-July.txt.gz
Import this into the client software of your choice. Enjoy your new-found ability to search.
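If you would rather not import the archive into a mail client, the same idea can be sketched with Python's standard library. This is only a rough sketch under assumptions: the function name and the sample messages are made up, and it assumes the downloaded .txt.gz file is close enough to mbox format for the mailbox module to split it into messages.

```python
import gzip
import mailbox
import os
import shutil
import tempfile


def search_gzipped_mbox(gz_path, term):
    """Return (subject, sender) pairs for messages mentioning `term`.

    Hypothetical helper: unpacks the gzipped archive to a temporary
    mbox file, then does a case-insensitive substring match on the
    subject and the (non-multipart) body of each message.
    """
    term = term.lower()
    hits = []
    with tempfile.TemporaryDirectory() as tmp:
        mbox_path = os.path.join(tmp, "archive.mbox")
        with gzip.open(gz_path, "rb") as src, open(mbox_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        box = mailbox.mbox(mbox_path, create=False)
        try:
            for msg in box:
                subject = str(msg.get("Subject", ""))
                # get_payload(decode=True) returns None for multipart mail.
                payload = msg.get_payload(decode=True) or b""
                text = subject + "\n" + payload.decode("utf-8", errors="replace")
                if term in text.lower():
                    hits.append((subject, str(msg.get("From", ""))))
        finally:
            box.close()
    return hits


# Made-up sample standing in for a downloaded monthly archive.
SAMPLE = (
    "From alice@example.org Sat Aug 22 16:14:00 2009\n"
    "From: alice@example.org\n"
    "Subject: htdig search is down\n"
    "\n"
    "The archive search stopped updating.\n"
    "\n"
    "From bob@example.org Sat Aug 22 17:01:00 2009\n"
    "From: bob@example.org\n"
    "Subject: unrelated\n"
    "\n"
    "Nothing to see here.\n"
)

with tempfile.TemporaryDirectory() as demo_dir:
    gz_path = os.path.join(demo_dir, "2009-July.txt.gz")
    with gzip.open(gz_path, "wt", encoding="utf-8") as fh:
        fh.write(SAMPLE)
    hits = search_gzipped_mbox(gz_path, "htdig")

print(hits)
```

To use it on a real archive, download a monthly file such as the one linked above and pass its local path instead of the sample.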
"GM" == Gregory Maxwell gmaxwell@gmail.com writes:
GM> On Sat, Aug 22, 2009 at 11:20 PM, jidanni@jidanni.org wrote:
All I know is I don't know of any other examples of "security through obscurity" on mailing lists. Wasn't Jimbo inventing a new search engine? I don't know though... can't search for the announcement.
GM> Download the gzipped mbox files from when you were not subscribed, for example http://lists.wikimedia.org/pipermail/foundation-l/2009-July.txt.gz
GM> Import this into the client software of your choice. Enjoy your new-found ability to search.
Why have each user jump through such hoops, and still leave this door open to "the bad guys", whoever they are?
Anyway my preferred client is http://www.google.com/ so that won't help anyway.
I don't see why all the years and years of technical discussion must be held at ransom just because of one article where someone said one thing that he is afraid a future employer will see, or something.
Just remove that one article for heaven's sake, and ask the user concerned to be more careful in the future.
Or say: "We here at Wikimedia are happy to announce that beginning 09/09/2009 the Wikimedia mailing lists will again allow search engines to index them. Any users who wish an article they wrote to be removed can contact us at any time..."
What if the Linux Kernel list had to be held at ransom just because of one little article? How could anybody look up a technical problem that had been encountered in the past?
How can one teach the good netiquette of first checking whether a problem has been solved in the past before posting a question, if there is no way to check? (Other than hoping that something got indexed elsewhere anyway due to "leaks". I.e., gmane: telling us to look there while not being willing to index the primary archive is cheating.)
How can one user's one personal problem hold all those technical references at ransom? What other organization blackholes its entire technical discussions just because of one person's one-time personal problem? Remove the thing that is bothering that user, and then get these lists back into Google where they belong.
The usual way organizations deal with sensitive discussions is to have a separate closed "personal problems" list that is not indexed, instead of taking down all the other lists, North Korean style.
(Mr. Peachy, I have left the formatting in this time. Thanks.)
We used to provide a search using htdig, but it failed to update and finally got deactivated. What about adding a new search with lucene, just like the wiki search? Then the mediawiki.org search could incorporate a 'search mediawiki-l' checkbox. :) Seems like a neat project for the codeathon.
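To give a feel for what such a search does underneath, here is a toy inverted index in Python. It is not Lucene and not how the wiki search is implemented; the message IDs and texts are made up, and it only illustrates the basic idea of mapping words to the messages that contain them.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(messages):
    """Map each word to the set of message IDs that contain it."""
    index = defaultdict(set)
    for msg_id, text in messages.items():
        for word in tokenize(text):
            index[word].add(msg_id)
    return index


def search(index, query):
    """Return the IDs of messages containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Made-up messages standing in for archived list posts.
messages = {
    "msg-1": "The htdig search failed to update and was deactivated.",
    "msg-2": "Could we add a search checkbox for the list archives?",
    "msg-3": "Unrelated discussion about parser tests.",
}

index = build_index(messages)
print(sorted(search(index, "search")))
print(sorted(search(index, "htdig search")))
```

A real engine adds ranking, stemming and incremental updates on top of this, which is where the actual work for a codeathon project would be.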
On Mon, Aug 24, 2009 at 1:16 AM, jidanni@jidanni.org wrote:
Why have each user jump through such hoops, and still leave this door open to "the bad guys", whoever they are?
[snip]
If you wish to have a productive discussion with people you'll be most successful if you try to understand and empathize with their concerns, so that you can find a solution which satisfies everyone. You won't go far with scare-quoted phrases like "the bad guys" and hyperbole like "held for ransom" and "North Korean style".
The current behaviour was established as the result of experience: it's not something that was done speculatively, but as a solution to real problems which were occurring. Removing messages from archives was found to be time-consuming and ineffective, because once a message was out, removal often did nothing. The annoyance of dealing with it was magnified because it had to be done by someone with shell access and because it was, naturally, always urgent.
People make mistakes, both the "clicked the wrong button" type and the "failed to consider the consequence" type, and people often play fast and loose with other people's privacy. As an example, an issue we've had in the past is people responding with private details to a message which included a public list buried in its carbon-copy chain. So admonishing "be more careful" really doesn't solve it: the lack of Google indexing is intended to address the cases where "be careful" failed.
The intent isn't to stop people from searching for information in the lists, which would be an impossible goal, but to prevent material from the lists from showing up at the top of google when people perform random searches for various people's names and to make removals actually effective. So the availability of archive files is not a problem.
Perhaps this is more of a problem for the Wikimedia Lists than many others due to the high search placement of the Wiki(p|m)edia sites in general. I think the comparison to LKML is entirely inappropriate: not only can you make an entirely different set of assumptions about the users' technical prowess, but LKML is open for posting to non-subscribers … the level of SPAM received through it in the past has exceeded the volume of some of our lists. It's like arguing that we shouldn't wear underwear because the nice folks at the nudist colony don't either. :) Different culture, different issues, different solutions.
Other people do have the same problems and concerns, though obviously you're less likely to see them if they aren't indexed by Google! Being able to keep your messages out of the search indexes while remaining open to anyone who is willing to click a few buttons is a primary attraction of the yahoo-groups service. Be thankful that we don't force you through an infuriating web interface like they do.
I think everyone would like better search than we currently have available. It should be possible to provide a solid search interface without increasing the level of exposure.
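For readers unfamiliar with the robots.txt exclusion discussed in this thread, the following sketch shows how a well-behaved crawler would interpret such a rule, using Python's standard urllib.robotparser. The rule text below is an assumption made up for illustration; the actual contents of the lists.wikimedia.org robots.txt are not quoted anywhere in this thread.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule for illustration: keep crawlers out of the pipermail archives.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pipermail/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

archive_url = "http://lists.wikimedia.org/pipermail/foundation-l/2009-July.txt.gz"
listinfo_url = "https://lists.wikimedia.org/mailman/listinfo/wikitech-l"

# A crawler that honours robots.txt skips the archive but may fetch other pages.
archive_allowed = parser.can_fetch("Googlebot", archive_url)
listinfo_allowed = parser.can_fetch("Googlebot", listinfo_url)

print(archive_allowed, listinfo_allowed)
```

The file only asks crawlers to stay away; anyone can still download the archive directly, which matches the point above that availability of the archive files is not the problem.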
On Mon, Aug 24, 2009 at 6:44 AM, Gregory Maxwell gmaxwell@gmail.com wrote:
I'd like to echo the last point. I'd certainly like to see a decent search function for the mailing lists. (Though given the number of sites that already archive some of our mailing lists, even opening them to Google doesn't seem likely to increase exposure by all that much.)
How difficult would it be for someone to set up Lucene (or similar) to go through the collective mailing list archives and provide some form of centralized search interface?
If someone is feeling really ambitious, one might even look at replacing the pipermail archive with something more stable (links can break when the index gets rebuilt) and easier to manage with respect to things like removing private info. (There might even be workable alternatives already in existence somewhere.) We've been using what appears to be a more or less generic Mailman install for ages. Seems like a good target for improvements.
-Robert Rohde
On 8/24/09 2:09 PM, Robert Rohde wrote:
On Mon, Aug 24, 2009 at 6:44 AM, Gregory Maxwell gmaxwell@gmail.com wrote:
I think everyone would like better search than we currently have available. It should be possible to provide a solid search interface without increasing the level of exposure.
I'd like to echo the last point. I'd certainly like to see a decent search function for the mailing lists. (Though given the number of sites that already archive some of our mailing lists, even opening them to Google doesn't seem likely to increase exposure by all that much.)
How difficult would it be for someone to set up Lucene (or similar) to go through the collective mailing list archives and provide some form of centralized search interface?
It's not hard in theory, just needs an interested party and some elbow grease to replace the horror that is pipermail. :) The existing solutions we've tried (htdig integration) have not been very stable, hence the status quo.
In the meantime, the wide availability of searchable archive copies through multiple third-party list aggregation services means most of our lists are *already* searchable via Google etc, so we're not missing much functionality.
-- brion
In the meantime, the wide availability of searchable archive copies through multiple third-party list aggregation services means most of our lists are *already* searchable via Google etc, so we're not missing much functionality.
-- brion
The key exception of course being our private mailing lists like stewards-l and in particular Checkuser-l, which desperately needs to be searchable.
-Mike
Thank you everybody for your comments. Pipermail is fine, if you would only let Google™ index it. Every mailing list in the world occasionally sees the accidental slip of the cut and paste finger, and the need for an administrator to remove the spilled beans, which he should then do. But there is no need to not let Google index it. When we think search, we think Google. I hate proprietary software, and RMS http://jidanni.org/comp/index.html#rms is my idol, but when I think search, I think Google, and will not remember to use a special search for a special list. I end up reposting this thread every time I realize I can't find something again due to someone's arbitrary decision, so this time Cc'd jwales to perhaps get a second arbitrary opinion. Anyway, you are throwing the baby out with the bathwater, and as you mention the stuff is mostly in Google indirectly anyway, why do this crippleware concept of not letting it be indexed? Also no need to reinvent the wheel of a substitute search engine... OK to have it alongside Google, but don't block Google. There are a lot of tools, noindex, nofollow, just don't block entirely. OK, maybe you all operate on some higher logic.
P.S., if removing a message will cause renumbering, just leave a stub message.
I have an idea, put a message on each subscription page, and post to all subscribers: "Starting 9.9.2009 all lists will once again be open to indexing in Google™. This means due to Mediawiki.org ranking, anything you say can and will end up at the top of search engine results, you have been warned."
Tim, good job finding that. I in fact long ago have given up on searching for anything related to these mailing lists.
On Mon, Aug 24, 2009 at 3:40 PM, jidanni@jidanni.org wrote:
If you have such a huge crush on Google then use Gmail to search these lists. You've already been shown how to import the list archives.
[snip]
As already noted in this thread and your several previous repetitions of it, all the public lists you're talking about are already searchable: a Google search for "wikitech-l jidanni" returns 3,010 results.
I'm placing Jidanni under moderation on this list; further repetitions of this previously-answered question will not be let through.
-- brion
On Sun, Aug 23, 2009 at 1:20 PM, jidanni@jidanni.org wrote:
Can you please not alter the way quotes are done by including letters at the start of the line because most clients pick up the ">" at the start of the line and indicate that it's a quote. The best way to indicate who said it is leave the bit up the top that indicates the day/date and sender.
- Peachey
Platonides wrote:
jidanni@jidanni.org wrote:
Dear Jimmy Wales,
Could you ask those in charge to make the wikimedia.org mailing lists searchable again?
There may be good reasons why they aren't,
but of course one cannot find them,
because they are not searchable!
Thanks.
The reason was people posting private data to the mailing list, then pestering sysadmins to remove it when they found it on Google.
I think he already knows the reasons why, and the archive sites where search is available, he found out that information last time he posted to this list on this subject:
http://marc.info/?l=wikitech-l&m=124348095926993&w=2
-- Tim Starling
On Mon, Aug 24, 2009 at 11:54 AM, Tim Starling tstarling@wikimedia.org wrote:
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Jidanni, for what it's worth you brought up this question back in May and were told the exact same things as in this thread. Did you forget the previous discussion? Or did you think by CC'ing Jimmy you'd get a different answer this time around?
-Chad