WikiDE-l December 2006

wikide-l@lists.wikimedia.org

56 participants
32 discussions

Facts, Facts, Facts
by Mathias Schindler 05 Jan '07

05 Jan '07

If everything is correct, tomorrow's issue of Focus will carry a somewhat longer article about Wikipedia. At the airport in Milan there was an older issue in which this article is listed among the three main topics of the coming week. Last week Focus interviewed and photographed several individuals. In other news: http://www.faz.net/s/Rub117C535CDF414415BB243B181B8B60AE/Doc~EF3BE7A913F2C4… Just discovered. Orhan Pamuk's article in the Turkish-language Wikipedia and the story of how it came about. Looks like the admins have quite a bit of work on their hands....

27 55

the friends of democracy
by Klaus Graf 30 Dec '06

30 Dec '06

What really makes me sick are these totalitarian types here, like Schindler, whom one could also easily picture under fascism. As soon as even the slightest criticism of the authorities is voiced, or the trigger word democracy comes up (a more or less accepted principle even in the CSU), the big swatter is pulled out. Quod omnes tangit, ab omnibus approbari debet. This piece of wisdom does not have its origin in a democracy. histo

7 8

[PRESS] Q&A With Jimmy Wales On Search Wikia
by Mathias Schindler 30 Dec '06

30 Dec '06

http://searchengineland.com/061229-193718.php

Dec. 29, 2006 at 7:37pm Eastern

Q&A With Jimmy Wales On Search Wikia

News came out earlier this week that Wikipedia cofounder Jimmy Wales had a new project in mind, to build a community-driven "Google-killer" search engine. I've just finished talking with Jimmy about his plans. Here's a rundown on his vision and what may come as his Search Wikia project grows over the course of the next year or two. Note that in the Q&A, I've had to recreate my questions as best I remember asking them. I was focused more on getting down Jimmy's responses.

Q. Since the news emerged, there's been some confusion about Amazon and Wikipedia in relation to the Search Wikia project. What's the situation?

We recently completed a funding round with Amazon [for Wikia], but other than that, they don't have anything to do with the search project. [The project] is a Wikia project [the for-profit company that Wales is chairman of], not a Wikipedia project [the separate community-driven encyclopedia he co-founded].

Q. Was the search project formally announced, or did the Search Wikia site come online as a result of The Times article discussing it?

It was a combination of them both. I've been working on this for a long time. We didn't actually intend to announce per se just yet, but me and my big mouth, the reporter asked me if I ever thought about search.

Q. It's been said the search engine would launch in the first quarter of 2007. That's fast. Is that really just when you expect active development work to begin?

During Q1, we're going to set up a project to get developers involved with building the site, writing the code and getting the search engine going. We're going to rely initially on Nutch and Lucene [related open-source search software that's been developed over the past few years]. We'll start from scratch on how to apply the Wikipedia principles to keep it as simple as possible and move forward. It's just the development starting. We're not producing a Google killing search engine in three months. I only wish I were that good of a programmer. We'll have some servers open, some development, maybe a pre-pre-alpha demo site up. We'd really anticipate it would be a year or two until we're able to launch a viable search engine.

Q. How do you see this improving on what's out there?

There are a lot of things that we've learned in the wiki world on how to get communities involved and engaged to build trusted networks in communities. A lot of the people who have tried to do this in the past have stumbled not on technical issues but on community issues ... dmoz [The Open Directory] was too closed ... that was their response because of the pressure of spammers ... others have thought in terms of ranking algorithms. That's not the right approach. The right approach allows for open dialog and debate and discussion.

Q. How do you envision the community participating? Will they be selecting sites? Will this leverage material in Wikipedia? Will they rate sites?

This will be completely independent of Wikipedia. Exactly how people can be involved is not yet certain. If I had to speculate about it, I would say it's several of those things, not just community involved with rating URLs but also community rating for whole web sites, what to include or not to include and also the whole algorithm ... That's a human type process that we can empower people to guide the spider.

Q. Do you see humans reviewing the most popular queries, perhaps picking the right answers to come up?

Part of it might be a human review of queries. For the narrow subset of the really popular queries, I think it's important to apply humans .... if someone types Ford Motor Company, there is a correct answer for that. There's no reason to beat our brains out to train our algorithm to do that.

Q. Search engines have actually gotten much better over time with these type of navigational requests. You don't need humans so much to make sure the right answer shows up.

Those kinds are not too difficult. The harder one is if you type ford, did you mean President Ford or do you mean the Ford Motor Company? That's the type of thing where human disambiguation pages like we have at Wikipedia are helpful.

Q. Search engines already do a lot of this type of stuff. Ask has its Zoom suggestions, others have clusterings or related searches. Do you imagine people being forced to make a query refinement choice before they actually get search results?

If you type ford, you should get some disambiguation terms that humans have collected, then some search results....this is one of the places where I think human intelligence is most important.

[NOTE: For more on query refinement, see some of my past posts such as Robert Scoble Wants What We Had -- Better Query Refinement. So Do I!, Hello Natural Language Search, My Old Over-Hyped Search Friend and Why Search Sucks & You Won't Fix It The Way You Think. The first link in particular discusses how Microsoft used to have disambiguation created by editors very similar to what Wales hopes to recreate. Sadly, it was killed in the quest to chase Google on the algorithmic front.]

Q. Are you planning to crawl the entire web, billions and billions of pages? Or will you go after a subset of important ones?

The number of pages is yet to be determined. Obviously we won't be doing that initially [gathering everything], but we'll invest in the hardware. Not to belittle the investment required to do a full crawl of the web on a regular basis, but I think it's fairly commoditized.

Q. Crawling is one thing. Serving up millions of queries per day is an entire other issue. Wikipedia handles a lot of traffic, but not at a Google scale. How's it going with that?

The traffic's not too bad. Servers are getting more and more powerful. Bandwidth is getting cheaper. It's all pretty much off the shelf. It's pretty efficient.

Q. Will you be selling ads, and if so, how will that work?

There are no immediate plans to sell ads, so for now we're not too focused on that. If we don't build something useful, selling ads on it is sort of a moot point.

Q. Why do this at all? What do you see wrong with search?

For certain types of searches, search engines are very good. But I still see major failures, where they aren't delivering useful results. I think at a deeper almost political level, I think it's important that we as a global society have some transparency in search. What are the algorithms involved? What are the reasons why one site comes up over another one? [Wales also raised the issue of how ads might influence regular listings, perhaps search engines trying to keep commercial sites out of the free listings to make money. From there, he went on....] Those types of incentives are problematic in search. The only solution I know to that is to be transparent.

Q. How are you going to keep the community from being gamed? Wikipedia is very good at keeping out spam, but it's not perfect. And despite its size, it's dealing with far fewer topics than unique searches that will happen on any particular day. How do you police all those searches?

You have to recognize the difference between the way community is often used on the internet, which is shorthand for millions of people clicking on some stuff as compared to community in the wiki world, which is people who actually know each other. It's one thing to say if you have millions of spammers out there trying to game and trick an algorithm .... but it's not the number of queries. It's the web sites themselves. A lot of numbers are thrown about for sites on the web, but the number of legitimate pages that are not coming from affiliate sites and spammers is a much more finite number. It's much easier for a community to ban the bad stuff.

Q. But what if someone gets into a "good" domain? We've had cases where bad content gets shoved into "trusted" sites or even places like university sites. Do you ban those entire domains? How do they get back in?

At Wikipedia, we'd have a big discussion. [Wales then explained that people might realize a domain had done something accidentally wrong or without thinking about spam issues and so might be allowed back in.]

Q. You probably already search a lot, probably mostly with Google. Is it not finding what you want already most of the time, without a flood of spam or crud in your way?

Usually I'm looking for pages on Wikipedia, so they do a good job with that. It depends on the types of searches you are doing. If you're doing a factual search, then Wikipedia [in the results] would be good. In other areas, I think there's a strong commercial incentive.

Why is it bad if I search for tampa hotels?

[NOTE: I then did this search on Google, which we discussed. I noted I saw plenty of good hotels listed, and that if I clicked through to the local search results, I got an even better experience of hotels listed. Wales replied that he's often after reviews of hotels, not the hotels themselves. That took me back to the original results, where I pointed out the top listing was from TripAdvisor, exactly the type of review site he mentioned liking -- and that I often found them listed on these types of queries. I also noted that Google even offers refinement categories at the top of the page similar to the disambiguation he wanted, with lodging guides as one of the categories. Unfortunately for Google, I didn't find that the results from that refinement did a good job bringing back trusted hotel guides.]

Q. Back to transparency. People keep saying they want more of this. But can you name some exact examples of what you want to see? Do you want Google to say that using a term in bold text adds X percent of a score to the ranking criteria? And if you do that, don't you think spammers will just abuse the recipe that's been published?

If your search relies on some secret factors that you hope people won't discover, you haven't really come up with a good solution to the problem.

Q. Microsoft has spent millions of dollars and years now of effort to try and be a Google killer and haven't made it. You're coming into this fresh with fewer resources and no real prior experience. Can you really do it?

I have no idea. I only do whatever sounds like it is fun.

Q. What type of funding do you have behind this?

Wikia's initial round was 4 million from a variety of angels, then there was a second round from Amazon, but the amount wasn't announced.

Closing Comments

When I first heard of the plans, I was pretty dubious the project would have much success. For one thing, the idea of the "open source" search engine to take on the world and provide more transparency is old news. Consider this from back when Nutch first came out, out of New Scientist in 2003:

The project "is about providing free technology that should not be controlled by private, commercial, secretive organisations," says Doug Cutting, veteran web search engineer, and a Nutch founder.

Three years on, nothing really changed despite the reasoning behind such a project being the same. And this was despite Nutch having some big names behind it. In 2004, Nutch got another round of attention in an ACM article looking at how it works. My comment at that time was:

Interesting read especially for the efforts that are involved to defeat spam. The argument is that though Nutch is open, revealing secrets won't hurt because spammers will batter down any defenses, no matter how tightly protected. OK, so what will stop spam? Nutch hopes that an open, public discussion may reveal new methods. Perhaps. But the real test will only come if Nutch is deployed by a major, highly-trafficked site. Spammers aren't going to bother trying the defenses of other places. It's not worth the time. That's also a positive for those considering Nutch. If you operate a small, vertical site or just want Nutch to be used on your own content, then spam concerns are much less an issue.

The spam test simply hasn't happened with Nutch. And every new search engine project I've looked at coming in over the years completely underestimates the spam problem they face. When I looked at the Search Wikia site, comments like this almost seemed laughable:

search active for spammer sites
* trying to simulate user-typos (ie. "yaoho.com" rather than "yahoo.com"); see also: Microsoft's URL Tracer
* blacklist domains, where spammails are linking to; create actively honeypods to get spam; use a pattern like <domain-where-we-have-registered>@myhoneypod.com to identify the spam networks; shell the common user get the possibility to register such a mail-adress?

Seek out the spam sites? Hey, don't worry -- if you're popular, they'll find you fast enough. And as you blacklist one, two more throwaway domains will show up in their place.

I also tend to think Wales is completely underestimating how crawling a big chunk of the web, keeping those pages fresh, ranking them quickly to provide answers and doing so for millions each day isn't an off-the-shelf commodity.

Still, I find myself oddly hopeful. I don't think a Google killer will emerge, but perhaps some new ways of a community to be involved with search will come out of it. I wouldn't have thought Wikipedia would work. Certainly it's flawed, but it's also an incredible resource. Maybe something useful will come from the Search Wikia project.

At the very least, I've long wanted humans to be back in the role of reviewing queries and actually looking to see if they make sense, rather than so much reliance on algorithms. Maybe the mere concept of the Search Wikia project will encourage the major search engines to do more in this area.

1 0

Perceived halftime in the fundraising appeal
by Mathias Schindler 29 Dec '06

29 Dec '06

In the last few hours, the Wikimedia Foundation's donation counter passed the "750,000" dollar mark. This is the conservative figure, in that it does not yet include funds that are, for example, still in the mail, or the matching of yesterday's donations by Virgin Unite (AFAIK). The donation bar has thus also passed 50%, which suggests a kind of halftime. "Officially" there is no longer a fundraising goal of 1.5 million, even if there may once have been one. Thanks to all donors. Mathias

1 0

Facts, Facts, Facts and colorful graphics
by Mathias Schindler 22 Dec '06

22 Dec '06

The article about Wikipedia can now be found in the current Focus (no. 52) on pages 114 to 116. On the cover it is flagged with the logo and "Wikipedia.... Manipulationen kratzen am Ruf" ("Manipulations tarnish its reputation"). The article itself runs in the Media section, Internet department, under the title "Mehr Qualität, bitte!" ("More quality, please!").

Content: measured against the good standards set by the magazine from Hamburg, the opening is rather dull and factually open to attack. It claims there are no editors and no quality control. If the word "paid" were in there, one could more readily agree. The directmedia DVD is mentioned (what makes them think it is the second edition?). The office of the German association is mentioned, and Jaron Lanier gets his upgrade to "computer scientist". In between, Arne gets to explain what the concept of "stable versions" is. A Prof. Andreas Dengel is quoted as full of praise, *despite* our refusal to give a "quality guarantee". What made my day was the sentence (...in a few years one will be able to work sensibly with the work (Dengel)): "Then the classic encyclopedia publishers would have to think seriously about future business models." Ah, only then. I can't speak for Brockhaus, but as an encyclopedia publisher one should not put up with the insinuation that the publishers will do nothing until then, imho. Florian Langenscheidt, by contrast, is quoted with the promise of "knowledge with a seal of quality". Unfortunately without a "5 euros back for every error found" guarantee. There is also a reference to Larry's Citizendium.

Focus-like, there are colorful graphics again:

1. "The world knowledge of the masses": bar chart of articles by language; German is red, the rest blue.
2. Nielsen Netratings figures: unique users per month from Germany: 5.4 million in October 200, 11.6 million in November 2006.
3. Box: "Holes in the system":
   1. Kleinfeld
   2. DTH
   3. Süddeutsche
   4. "Wikipedia adopts false report from the Tagesspiegel, corrects error within a few minutes"
   5. Borat
4. Box: "Tips: using the free encyclopedia properly"
   1. Always read critically
   2. Lots of room for trivia (that is not a tip but a remark, at least)
   3. Check the authors' opinions
   4. Double-check hot topics
   5. Solid material from science (meaning: technology articles are great, history too)
   6. A good starting point for research

I couldn't find any gross errors, but maybe I am simply hardened after Deutsche Welle's "Manfred Schindler"... Mathias

1 0

Wikipedia DVD at Aldi Süd
by elvis＠chan.de 22 Dec '06

22 Dec '06

Aldi Süd currently has 3 different "Wissenswürfel" (knowledge cubes); one of them contains the DVD from Directmedia. (OK, I was probably the last one to notice.)

2 1

Not all thumbs are displayed
by S. Heinz 21 Dec '06

21 Dec '06

Hello! For a few days now, not all thumbs have been displayed for me, with the emphasis on "all"; instead, an [AD] is shown as a placeholder. E.g. http://commons.wikimedia.org/wiki/Beagle first row: image...[AD]...[AD]...image, in IE 7, Firefox and Opera on Win XP. It can't be the firewall (I switched to a different one, so I had none at all for a few hours; some of them do come with an ad blocker). Steffen -- Greetings from the Eifel

3 3

Username Blacklist
by Marco Schuster 21 Dec '06

21 Dec '06

Hi list. Today the username blacklist was activated for us; it allows admins to exclude users from registration by regexp. Regards, Marco
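The regexp-based exclusion mentioned in this announcement can be illustrated with a minimal sketch. It is not the actual blacklist code or configuration; the patterns, the function names `compile_blacklist` and `is_blocked`, and the choice of case-insensitive matching are all assumptions made for illustration.

```python
import re

# Hypothetical blacklist entries, one regular expression each.
# These patterns are invented for this sketch, not taken from any real list.
BLACKLIST_PATTERNS = [
    r"admin",        # any name containing "admin"
    r"^wiki.*bot$",  # names starting with "wiki" and ending with "bot"
]


def compile_blacklist(patterns):
    """Compile each pattern once; case-insensitive matching is an assumption."""
    return [re.compile(p, re.IGNORECASE) for p in patterns]


def is_blocked(username, compiled):
    """Return True if any blacklist pattern matches somewhere in the username."""
    return any(rx.search(username) for rx in compiled)


compiled = compile_blacklist(BLACKLIST_PATTERNS)
```

A registration check would call `is_blocked(name, compiled)` before creating the account and reject the name when it returns True.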

12 15

Copy and paste from Wikipedia instead of scholarly work
by Jakob 18 Dec '06

18 Dec '06

Hi, at http://www.heise.de/tp/r4/artikel/24/24221/1.html you can find the third part of Stefan Weber's excerpts from his book "Das Google-Copy-Paste-Syndrom. Wie Netzplagiate Ausbildung und Wissen gefährden", which repeatedly points to Wikipedia and to the bad habit of copying, which is wrong academically and morally even where it is not a copyright violation. I do not entirely share the author's conclusions (what matters is not where a source comes from but how it is used), but we should invite Stefan Weber as a speaker to the next event on quality and Wikipedia. Best regards, Jakob

4 4

Adapting the central navigation - structuring Wikipedia
by Eneas 18 Dec '06

18 Dec '06

Hello, over the last two days the adjustment of the sidebar has drawn a great deal of criticism, which ultimately concerns the structure that was put in place months ago (http://de.wikipedia.org/wiki/Wikipedia:Sitemap). To adapt this structure to the wishes and needs of Wikipedians as well as readers and newcomers, the WikiProjekt Usability now has a proposal page where suggestions for improvement and comments can be contributed: http://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Usability/Strukturierung… I ask for broad participation and constructive criticism. Regards, Eneas

1 0
