Well,
I finally got around to it. I have completed and replicated Wikipedia, got around all of Erik's issues with 1.6.8, and I am offering turn key systems with automated site mirroring software, image sync tools, and machine translators for Wikipedia. The knowledge is free, the software is free, however, hardware costs money ....
Anyone needing 7 x 24 support and systems to host Wikipedia and provide Star Trek universal translator support is welcome as a customer to support Wikipedia collaboration.
Back to work on Cherokee thesaurus and Otali dialect issues. I am selling the hardware at near cost + my time and expense to support folks who need to mirror Wikipedia in a format that works. Let me know if there are issues with Wikimedia trademarks. I am not using them other than to say I am installing the Wikipedia encyclopedia on the systems.
All my Wikilove,
Jeff
URL is
http://www.wolfmountaingroup.com
Jeff
Hi!
and I am offering turn key systems with automated site mirroring software, image sync tools, and machine translators for Wikipedia. The knowledge is free, the software is free, however, hardware costs money ....
I'm not sure if running that hardware to support wikipedia is best way to do it ;-) I'd better support WMF to run the site! ;-)
Disagree. I've talked to a lot of organizations and groups who love WMF content but want to filter it for in-house use, similar to CleanFlix. Most educational facilities and religious groups who really love Wikipedia want a "clean" version of the content, with most of the sexual revolution/pro-alternate lifestyle propaganda materials removed from the dumps. WMF does a great job allowing all points of view and all knowledge in the dumps -- I respect this -- unfortunately, a lot of conservative groups who like WMF content don't want the information overload from much of the topics, or perceived erotic imagery in many of the articles. My tools let you filter import XML dumps for certain classes of content -- so there's a "clean" version of the content. I doubt the LDS church wants to host WMF materials with articles on bestiality, same sex marriage, etc. It's a matter of degrees. Making WP more pervasive also means accommodating folks who have differing views and values. We offer a family oriented version that's customizable as well as Native American translation support.
The tribes hosting translations will consume a large number of these units along with conservative religious groups and family oriented groups. I have included a "donate now" button pointing back to WMF in the main pages.
Jeff
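For readers wondering what filtering a dump by "classes of content" could look like in practice, here is a minimal sketch. It is not Jeff's tool, whose internals are not described in this thread. It assumes a MediaWiki-style XML export with `page`, `title` and `text` elements, and it assumes that a page's class is expressed through `[[Category:...]]` links in its wikitext. The blocklist entry and the function names are hypothetical.

```python
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical blocklist; the category name is a placeholder, not from the thread.
BLOCKED_CATEGORIES = {"Example blocked topic"}

# Matches the category name in links such as [[Category:Name]] or [[Category:Name|sort key]].
CATEGORY_LINK = re.compile(r"\[\[Category:([^\]|]+)")


def page_categories(wikitext):
    """Return the set of category names linked from a page's wikitext."""
    return {name.strip() for name in CATEGORY_LINK.findall(wikitext or "")}


def local_name(tag):
    """Strip an XML namespace prefix such as '{uri}page' down to 'page'."""
    return tag.rsplit("}", 1)[-1]


def filter_dump(xml_source, blocked=BLOCKED_CATEGORIES):
    """Yield (title, text) for every page that is in none of the blocked categories."""
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = text = ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "text":
                text = child.text or ""  # with several revisions, the last one wins
        if not (page_categories(text) & blocked):
            yield title, text
        elem.clear()  # release memory held by pages already processed
```

A caller would pass a file name or an open binary file, for example `for title, text in filter_dump("dump.xml"): ...`, and write the surviving pages wherever the import step expects them. Streaming with `iterparse` is chosen here because full dumps are far too large to load at once.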
Hi!!!!
revolution/pro-alternate lifestyle propaganda materials removed from the dumps.
You forgot China and their needs!
Anyway, I'm somewhat confused. You disagree with idea that foundation is more worthy cause to support Wikipedia, than buying filtering hardware.
from much of the topics, or perceived erotic imagery in many of the articles. My tools let you filter import XML dumps
Do your filters try to count how much pink is in picture? :)
Anyway, I'm not sure censoring Wikipedia is helping Wikipedia. And I'm not sure anyone limiting access to proper wikipedia content is helping Wikipedia either.
I guess there're more books which contain violence (say... smiting in Bible), which should be definitely filtered.
Domas
Domas Mituzas wrote:
Hi!!!!
revolution/pro-alternate lifestyle propaganda materials removed from the dumps.
You forgot China and their needs!
Good point. I don't know where to start on that one, so some suggestions would be welcome. Perhaps I should post the filtering code somewhere and let folks who know better than I on the China issue create a categories template for filtering China content.
Anyway, I'm somewhat confused. You disagree with idea that foundation is more worthy cause to support Wikipedia, than buying filtering hardware.
from much of the topics, or perceived erotic imagery in many of the articles. My tools let you filter import XML dumps
Do your filters try to count how much pink is in picture? :)
They use an algorithm developed at U of U that detects pixel ratios of human skin tone (the ratios are the same no matter what complexion) and tags them for removal. From what I have seen it's only a handful of images (about 100) in most of the dumps, and medical articles like "penis" and "vagina" I think are OK. Certain art images are objectionable to folks, along with articles on words like F__K, etc., which need cleaning up.
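To make the idea of a skin-tone pixel ratio concrete, here is a minimal sketch. It is not the U of U algorithm mentioned above, which is not described in this thread. It swaps in a simple, widely cited RGB threshold rule for daylight skin tones, and the 0.4 review threshold is an arbitrary assumption chosen only for illustration.

```python
def is_skin_rgb(r, g, b):
    """Rough RGB skin-tone rule (a common heuristic, not the U of U method)."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_ratio(pixels):
    """Fraction of pixels classified as skin; pixels is an iterable of (r, g, b)."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    return sum(1 for p in pixels if is_skin_rgb(*p)) / len(pixels)


def flag_for_review(pixels, threshold=0.4):
    """Flag an image when its skin-pixel fraction exceeds a hypothetical threshold."""
    return skin_ratio(pixels) > threshold
```

A rule like this looks only at colour, so it says nothing about what an image depicts. Flagged images would presumably still need a human look before anything is removed.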
Anyway, I'm not sure censoring Wikipedia is helping Wikipedia. And I'm not sure anyone limiting access to proper wikipedia content is helping Wikipedia either.
Well, that's also my view as well, I hate censorship, but like it or not, the conservative element of the US comprises 98% of the folks who run these organizations and have real $$$ to support Wikimedia, and the criticisms I get when I talk to folks about supporting Wikipedia are:
1). It's in the press due to inaccurate bios
2). It hosts radical left wing content
3). It is used as a lobbying platform by various political groups
Saying we support "filtering" and a "clean" version, whether it's really clean or not, shuts these people up and at a minimum provides some level of comfort they can use the content and have a degree of control over certain categories of content use.
Were it up to me, I would say "let them eat Wikipedia as is", but it's not up to me, and I have to deploy Native Translations in areas like the Cherokee Nation which is 99% hard core Southern Baptist, and other areas which do not have such an open view.
Jeff
I guess there're more books which contain violence (say... smiting in Bible), which should be definitely filtered.
Domas
foundation-l mailing list foundation-l@wikimedia.org http://mail.wikipedia.org/mailman/listinfo/foundation-l
On 7/26/06, Jeff V. Merkey jmerkey@wolfmountaingroup.com wrote:
They use an algorithm developed at U of U that detects pixel ratios of human skin tone (the ratios are the same no matter what complexion) and tags them for removal.
I doubt that would work too well on say:
http://en.wikipedia.org/wiki/Yiff
Domas Mituzas wrote:
Anyway, I'm not sure censoring Wikipedia is helping Wikipedia. And I'm not sure anyone limiting access to proper wikipedia content is helping Wikipedia either.
I guess there're more books which contain violence (say... smiting in Bible), which should be definitely filtered.
Yes, the Bible is full of violence, sex, and lots of other "nasty stuff"; and so is Wikipedia. Sure, you can filter them, but you're going to end up with a lot of holes.
A better use of time/effort would be not to filter articles on their subject area, but on their quality; if you have the time, going through an article to find which is the "best" revision and then letting everyone else know, and/or making a list of what the "best" articles are, will ultimately be more productive than filtering out a (hopefully NPOV) decent article because you don't think that your readers should have access to this kind of information.
Jeffrey V. Merkey wrote:
Disagree. I've talked to a lot of organizations and groups who love WMF content but want to filter it for in-house use, similar to CleanFlix. Most educational facilities and religious groups who really love Wikipedia want a "clean" version of the content, with most of the sexual revolution/pro-alternate lifestyle propaganda materials removed from the dumps.
You're confusing the issue. Those groups *don't* love Wikipedia. We're not here to accommodate and please.
Just consider the "conservative group" in Germany only 70 years ago that didn't want all this "propaganda" for interracial marriage, etc. How would you adapt Wikipedia to suit them?
Lars Aronsson wrote:
Jeffrey V. Merkey wrote:
Disagree. I've talked to a lot of organizations and groups who love WMF content but want to filter it for in-house use, similar to CleanFlix. Most educational facilities and religious groups who really love Wikipedia want a "clean" version of the content, with most of the sexual revolution/pro-alternate lifestyle propaganda materials removed from the dumps.
You're confusing the issue. Those groups *don't* love Wikipedia. We're not here to accommodate and please.
It sounds like you are not here to accommodate and please, but I doubt this reflects everyone's views.
Just consider the "conservative group" in Germany only 70 years ago that didn't want all this "propaganda" for interracial marriage, etc. How would you adapt Wikipedia to suit them?
Since I am married to a German Citizen I can answer that. I am not adapting it to serve anyone, I simply provide filtering mechanisms they can use to filter the dumps. That's up to the users, not me. I personally don't care what's in it.
Jeff
Jeffrey V. Merkey wrote:
I simply provide filtering mechanisms they can use to filter the dumps. That's up to the users, not me. I personally don't care what's in it.
I find this deeply problematic. Are you providing your audience with a product that says "this is Wikipedia" and where the edit history for an article says "this part was written by user:LA2" but where the stuff I wrote might be filtered out?
On 7/26/06, Lars Aronsson lars@aronsson.se wrote:
I find this deeply problematic. Are you providing your audience with a product that says "this is Wikipedia" and where the edit history for an article says "this part was written by user:LA2" but where the stuff I wrote might be filtered out?
Do you also find it problematic that someone can remove what you wrote with a later edit and yet your username still remains in the history?
Lars Aronsson wrote:
Jeffrey V. Merkey wrote:
I simply provide filtering mechanisms they can use to filter the dumps. That's up to the users, not me. I personally don't care what's in it.
I find this deeply problematic. Are you providing your audience with a product that says "this is Wikipedia" and where the edit history for an article says "this part was written by user:LA2" but where the stuff I wrote might be filtered out?
That's what the GFDL is all about **ANYONE** may edit. This also means **ANYONE** can edit or filter out materials.
Jeffrey V. Merkey wrote:
I find this deeply problematic. Are you providing your audience with a product that says "this is Wikipedia" and where the edit history for an article says "this part was written by user:LA2" but where the stuff I wrote might be filtered out?
That's what the GFDL is all about **ANYONE** may edit. This also means **ANYONE** can edit or filter out materials.
Yes of course, but then the article history shows that someone else had edited the text after I was there, and the diff (included in the distribution or available by link to the Wikipedia website) shows exactly what my contribution was. Does your filter program show its own modifications in the article history?
On 7/26/06, Lars Aronsson lars@aronsson.se wrote:
Yes of course, but then the article history shows that someone else had edited the text after I was there, and the diff (included in the distribution or available by link to the Wikipedia website) shows exactly what my contribution was. Does your filter program show its own modifications in the article history?
The GFDL doesn't require that a distributor make every previous version available.
Folks,
Jeff is actually trying to build a business based on reusing Wikipedia content, and yet a good half of these posts attack him for making use of the content. Nowhere in our charter does it say that we're building an immutable work. Let people who want to tweak it, rename it, change my username to a foul word, rename all mentions of Jimmy Wales to "Evil Dictator" or whatever do so. The source material is Wikipedia. The end users can't call theirs Wikipedia, since that's the WMF's trademark. Beyond that, we should be encouraging all uses --
Jeff is making his own business with these? I thought he was just trying to help Wikipedia with his assistance.
Either way, I am pleased with his efforts.
Ilya Haykinson wrote:
Folks,
Jeff is actually trying to build a business based on reusing Wikipedia content, and yet a good half of these posts attack him for making use of the content. Nowhere in our charter does it say that we're building an immutable work. Let people who want to tweak it, rename it, change my username to a foul word, rename all mentions of Jimmy Wales to "Evil Dictator" or whatever do so. The source material is Wikipedia. The end users can't call theirs Wikipedia, since that's the WMF's trademark. Beyond that, we should be encouraging all uses --
Yes. The more of these there are the better. This includes the right to use all or part of the Wikimedia material to start a new wiki. The result still has to be consistent with GFDL, and trademark limitations should not be that much of a problem. These new wikis aren't bound by NPOV, and they can be as liberal or conservative as they want around copyright law. If they want to avoid wikilove and establish a flame-wiki that's up to them too.
This all seems consistent with freedom of information.
Ec
Well,
I don't have plans to call Jimbo an "Evil Dictator" and slam anyone else ever again in the community.
This project is for Native Language Preservation with each tribe getting their own appliance and translator to translate content and host it locally. I plan to help out the Foundation with my translation work, and we are actually one of the few groups using wikibooks for an actual education program, so the fit is a good one.
As for wikilove, I have my good share of it, after all, I am one of the few community members still here and doing my best to help out after the flame war of the century with the Wikipedia Community. I just hope everyone can forgive and forget and we can all work together moving forward.
Jeff
Hoi, In the past encyclopedias have played a certain role. They explained how things are. They did this with quite some disregard to what the powers that be thought about this. When information is filtered from Wikipedia, you explicitly allow people to be blindsided because of an intentional lack of information.
Statistics show that kids who are exposed to sexual education, in an atmosphere where the functionality of sexuality is not a taboo subject, are less likely to get prematurely pregnant. They are also less likely to get venereal diseases.
Indeed the GFDL allows you to do this "service" to your communities. It is as they say: the law is an ass.
Thanks, GerardM
On Wed, Jul 26, 2006 at 10:15:00AM -0600, Jeffrey V. Merkey wrote:
Making WP more pervasive also means accommodating folks who have differing views and values.
That's what NPOV was supposed to do. Ut oh!
read you soon, Kim Bruning
On 7/27/06, Kim Bruning kim@bruning.xs4all.nl wrote:
On Wed, Jul 26, 2006 at 10:15:00AM -0600, Jeffrey V. Merkey wrote:
Making WP more pervasive also means accommodating folks who have differing views and values.
That's what NPOV was supposed to do. Ut oh!
NPOV, used properly, makes life bearable for a broad spectrum of *editors*.
NPOV does nothing to help someone who has decided that an image of a penis will scar their children for life.
Gregory Maxwell wrote:
On 7/27/06, Kim Bruning kim@bruning.xs4all.nl wrote:
On Wed, Jul 26, 2006 at 10:15:00AM -0600, Jeffrey V. Merkey wrote:
Making WP more pervasive also means accommodating folks who have differing views and values.
That's what NPOV was supposed to do. Ut oh!
NPOV, used properly, makes life bearable for a broad spectrum of *editors*.
NPOV does nothing to help someone who has decided that an image of a penis will scar their children for life.
Hoi, would that someone have only daughters? And is she a woman? Thanks, GerardM
"Single or Dual Xeon or AMD64 based system with up to 6.0 Terabytes of storage and 12 GB of DDR memory. Comes fully installed with the Wikipedia Encyclopedia and a full suite of automated site mirroring and update tools and 1 year of service and support."
Fantastic. Wikipedia Weekly News couldn't have put it better. Jeff, congratulations on getting Wolf Mountain ready in time for the national holiday*.
SJ
On 7/26/06, Samuel Klein meta.sj@gmail.com wrote:
"Single or Dual Xeon or AMD64 based system with up to 6.0 Terabytes of storage and 12 GB of DDR memory. Comes fully installed with the Wikipedia Encyclopedia and a full suite of automated site mirroring and update tools and 1 year of service and support."
Why so much storage?
Uncompressed enwiki text is 680GB. All images in total are about 300GB. Loaded into postgresql with a normal set of indexes enwiki takes about 400GB; I can't imagine that a mysql load would be any larger (in mysql mediawiki uses zlib batch compression).
I guess the 3TB makes sense to have some growth room.. but 18TB?
:)
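The sizing argument above can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted in the message. It makes the pessimistic assumption that the uncompressed text, the images and a loaded database all sit on the same disks at once, and it treats 1 TB as 1000 GB. The `headroom` helper is hypothetical.

```python
# Figures quoted in the message above, in gigabytes.
ENWIKI_TEXT_GB = 680
ALL_IMAGES_GB = 300
POSTGRES_LOAD_GB = 400


def headroom(capacity_tb, used_gb):
    """Unused capacity in GB, treating 1 TB as 1000 GB."""
    return capacity_tb * 1000 - used_gb


# Assumption: all three data sets are kept on the appliance simultaneously.
total_gb = ENWIKI_TEXT_GB + ALL_IMAGES_GB + POSTGRES_LOAD_GB

for capacity_tb in (3, 6):
    print(f"{capacity_tb} TB leaves {headroom(capacity_tb, total_gb)} GB free")
```

Even under that assumption the total comes to well under half of 3 TB, which is the point of the question.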
So that we may never have to buy another server for decades... or until we need more processing power.
On 7/26/06, James Hare messedrocker@gmail.com wrote:
So that we may never have to buy another server for decades... or until we need more processing power.
I wish you luck in keeping the disks running that long. ;)
I wouldn't know -- I think the longest I kept a computer was 3 years before I got a new one. I'm highly impatient with old computers.
Gregory Maxwell wrote:
On 7/26/06, James Hare messedrocker@gmail.com wrote:
So that we may never have to buy another server for decades... or until we need more processing power.
I wish you luck in keeping the disks running that long. ;)
3 x 1 power supplies and high output fans. These are the same units I sell through Solera Networks (I am using the same manufacturer I contracted with for Solera). They are installed with Wikipedia instead of Solera Networks software (forensics tracking software is optional), and these units seem to go for about 4 years before needing hard drive swapouts in only 2% of cases. The massive airflow keeps the drives cool and they seem to go over 48 months before I start to see any sector failure issues, even in the most severe writing cases (DSFS, the file system I wrote for Solera, is extremely write intensive).
Jeff
On 7/26/06, Gregory Maxwell gmaxwell@gmail.com wrote:
Why so much storage?
Uncompressed enwiki text is 680GB. All images in total are about 300GB. Loaded into postgresql with a normal set of indexes enwiki takes about 400GB; I can't imagine that a mysql load would be any larger (in mysql mediawiki uses zlib batch compression).
I guess the 3TB makes sense to have some growth room.. but 18TB?
:)
If you are doing mass computerised translations you could burn through it pretty fast.
geni wrote:
On 7/26/06, Gregory Maxwell gmaxwell@gmail.com wrote:
Why so much storage?
Uncompressed enwiki text is 680GB. All images in total are about 300GB. Loaded into postgresql with a normal set of indexes enwiki takes about 400GB; I can't imagine that a mysql load would be any larger (in mysql mediawiki uses zlib batch compression).
I guess the 3TB makes sense to have some growth room.. but 18TB?
:)
If you are doing mass computerised translations you could burn through it pretty fast.
I have to host three instances of Wikipedia, so my storage for WikiGadugi is at 2 TB at present. I use a shared images setup just like the Commons, and an automated synchronizer that checks for new images in an unobtrusive way.
Wikimedia may want to consider setting up a subscription service through cogento for folks using these appliances to increase their revenues by offering a private mirror link for images and content. Folks could buy appliances then purchase a subscription to Wikimedia's images and commons setup via rsync or http in a similar setup to Red Hat's setup for support. This would allow a lot of MediaWiki appliances to move and also provide Wikimedia additional sources of revenue with little to no investment on their end.
Jeff
And one more thing. If we are truly going to challenge google and yahoo at some point for internet search engine supremacy, this is the path to go. Google got there by seeding the internet with appliances which skulk and harvest content from the web with massive storage. I am currently adding search engine capabilities combined with automated translation and wiki formatting so we can construct a search engine over time. It will take us about two years to get where google is, but I have more powerful technology than they use, so over time, I think we can eventually get there and wikify the internet. That's where this is going long term. Native Language preservation is a huge deal right now -- one step at a time.
Jeff
On 7/26/06, Jeffrey V. Merkey jmerkey@wolfmountaingroup.com wrote:
And one more thing. If we are truly going to challenge google and yahoo at some point for internet search engine supremacy, this is the path to go.
No real reason to do that. In any case the Open Directory Project has been going since 1998 (about the same length as google) and has never really challenged for the lead position in search technology.
On 7/26/06, geni geniice@gmail.com wrote:
On 7/26/06, Jeffrey V. Merkey jmerkey@wolfmountaingroup.com wrote:
And one more thing. If we are truly going to challenge google and yahoo at some point for internet search engine supremacy, this is the path to go.
No real reason to do that. In any case the Open Directory Project has been going since 1998 (about the same length as google) and has never really challenged for the lead position in search technology.
A directory isn't a search engine, so I wouldn't expect it to do so.
A wiki is also not a search engine... But what I think Jeff is thinking about isn't so much directly competing in the search space but instead supplanting the world wide web with the world wide wiki.
It is interesting to ponder the social implications of a massive switch to highly participatory systems like Wiki... perhaps delivering on some of the promises that people used to promote 'blogging'... In any case, it's interesting.. but off topic.
Gregory Maxwell wrote:
On 7/26/06, geni geniice@gmail.com wrote:
On 7/26/06, Jeffrey V. Merkey jmerkey@wolfmountaingroup.com wrote:
And one more thing. If we are truly going to challenge google and yahoo at some point for internet search engine supremacy, this is the path to go.
No real reason to do that. In any case the Open Directory Project has been going since 1998 (about the same length as google) and has never really challenged for the lead position in search technology.
A directory isn't a search engine, so I wouldn't expect it to do so.
A wiki is also not a search engine... But what I think Jeff is thinking about isn't so much directly competing in the search space but instead supplanting the world wide web with the world wide wiki.
It is interesting to ponder the social implications of a massive switch to highly participatory systems like Wiki... perhaps delivering on some of the promises that people used to promote 'blogging'... In any case, it's interesting.. but off topic.
Greg got it ....
Jeff
Hi!!!
And one more thing. If we are truly going to challenge google and yahoo at some point for internet search engine supremacy, this is the path to go. Google got there by seeding the internet with appliances which skulk and harvest content from the web with massive storage.
I never thought the appliances they sold (and that was their revenue..) did ever participate in their global search. Is this another myth used, or do you really believe they used spiders they _sold_ to harvest information for them? Why did they build their datacenters at all? They did market their appliances as company-wide search boxen. Stuff we do with lucenes.
I am currently adding search engine capabilities combined with automated translation and wiki formatting so we can construct a search engine over time.
Search engine of what? "Distributed" in search engine context doesn't mean John in Alaska runs a cluster with Jeb in Texas. It means that people run datacenters with lots of servers stacked there, and a sub-millisecond medium connecting all that. Unless, of course, you are following BitTorrent's lead.
It will take us about two years to get where google is, but I have more powerful technology than they use, so over time, I think we can eventually get there and wikify the internet.
Um. You could share your technology, for common good, sure!
With all revision pages it's around 3 TB total.
That really requires advanced tech. At Wikipedia revision pages are compressed, and a proper compression run shrinks the whole dataset to 0.5 TB or so (or less).
You forgot China and their needs!
Good point. I don't know where to start on that one, so some suggestions would be welcome. Perhaps I should post the filtering code somewhere and let folks who know better than I on the China issue create a categories template for filtering China content.
irony 1 |ˈīrənē; ˈiərnē| |ˌaɪrəni| |ˌʌɪrəni| noun ( pl. -nies) the expression of one's meaning by using language that normally signifies the opposite, typically for humorous or emphatic effect : “Don't go overboard with the gratitude,” he rejoined with heavy irony. See note at wit . • a state of affairs or an event that seems deliberately contrary to what one expects and is often amusing as a result : [with clause ] the irony is that I thought he could help me. • (also dramatic or tragic irony) a literary technique, originally used in Greek tragedy, by which the full significance of a character's words or actions are clear to the audience or reader although unknown to the character.
Well, that's also my view as well, I hate censorship, but like it or not, the conservative element of the US comprises 98% of the folks who run these organizations and have real $$$ to support Wikimedia
hypocrisy |hiˈpäkrisē| |həˌpɑkrəsi| |hɪˌpɒkrɪsi| noun ( pl. -sies) the practice of claiming to have moral standards or beliefs to which one's own behavior does not conform; pretense.
Greg got it ....
I did not :(
I just opened Five Pillars article, and I know that I may fail at 'code of conduct' sometimes, sorry, but, Wikipedia has neutral point of view. No 'please donate' buttons will make filtered forks (a.k.a. mirrors) worth mentioning as "helping Wikipedia". Filtered resource/mirror does not give full power to edit the free content. Sure, $$$ attracts business people, and we have few business people here, but we have the pride of not taking that with strings attached.
We have pride to make free information resource ;-)
It would be an easy task to make a list of countries worth visiting to provide wikifilter (I'm quite amused to find wikifilter.org is still free ;-) - there're lots of things people should not know (because it is all very very depressing (c) Colbert ;-) But I hope foundation has more noble causes to exist.
Cheers, Domas
Domas Mituzas wrote:
Um. You could share your technology, for common good, sure!
What does this mean exactly? Whose good? Commerce is not evil, Wikipedia was born from it .... and is supported by it.
I did not :(
That's clear....
I just opened Five Pillars article, and I know that I may fail at 'code of conduct' sometimes, sorry, but, Wikipedia has neutral point of view. No 'please donate' buttons will make filtered forks (a.k.a. mirrors) worth mentioning as "helping Wikipedia".
Wikipedia usage as a credible resource is the standard for measurement of its success. If folks don't think it's credible, or view it as a novelty, then it has failed. Personal soapboxes aside.
Filtered resource/mirror does not give full power to edit the free content.
I can think of nothing more powerful than every person on the planet one day owning their own personal wikipedia mirror. Now that's freedom to edit.
Sure, $$$ attracts business people, and we have few business people here, but we have the pride of not taking that with strings attached.
Food costs money, rent costs money, computers cost money, hosting costs money, internet connectivity costs money -- the strings are that Danny and Jimbo have to pay everyone's bills to keep it going. How about those who wish to funnel $$$ into their efforts by creating some viable revenue generating initiatives around it?
We have pride to make free information resource ;-)
I have pride in creating translations to bring it to every native tribe in the US and Hawaii. And I can back up what I promise with action and solid delivery -- not hot air :-).
It would be an easy task to make a list of countries worth visiting to provide wikifilter (I'm quite amused to find wikifilter.org is still free ;-) - there're lots of things people should not know (because it is all very very depressing (c) Colbert ;-)
But I hope foundation has more noble causes to exist.
You have missed the point and I don't think you understand my intentions.
Jeff
Thread taken off line ...
Cheers, Domas
Hi!
What does this mean exactly? Whose good? Commerce is not evil, Wikipedia was born from it .... and is supported by it.
Um again.
Wikipedia usage as a credible resource is the standard for measurement of its success. If folks don't think it's credible, or view it as a novelty, then it has failed. Personal soapboxes aside.
I don't see direct link between [automated] censorship and credibility.
I can think of nothing more powerful than every person on the planet one day owning their own personal wikipedia mirror. Now that's freedom to edit.
I somehow stay at the idea that there's more freedom, when you can edit wikipedia without owning a mirror. Unless you want to fork ;-)
Food costs money, rent costs money, computers cost money, hosting costs money, internet connectivity costs money -- the strings are that Danny and Jimbo have to pay everyone's bills to keep it going.
Now strings are defined by community, as community supports Wikipedia. I'm not CFO, so I'm quite happy about that ;-)
And I'm happy if there's anyone else who would like to support without strings attached (especially on content), let it be some nice big foundation.
On the other hand, currently computers and hosting are enjoying economy of scale - we run quite efficient system, and cost to run personal mirror (cost per view) may be somewhat higher than the cost wmf is facing. This is where community joins and assists with keeping the site up.
And I can back up what I promise with action and solid delivery -- not hot air :-).
Well, it is really nice that wikipedia activities attract people who maintain the solid delivery!
You have missed the point and I don't think you understand my intentions.
Wikify the world? Intentions are clear, now the methods are not. I guess concepts of distributed wikis might be better discussed at wikitech-l rather than foundation-l. There it would be really more on topic!
BR,
Domas Mituzas wrote:
I somehow stay at the idea that there's more freedom, when you can edit wikipedia without owning a mirror. Unless you want to fork ;-)
I think that's what I was talking about. I forked and now there exists a Cherokee Wikipedia and soon a Uto-Aztecan Wikipedia. Read http://www.wikipedia.org/wiki/Cherokee_Clans. Forking is how we established settlements all over the Southeast in ancient times - banishment was a good thing -- a sign of new beginnings. Forking seeds new communities -- without the conflict.
Now strings are defined by community, as community supports Wikipedia. I'm not CFO, so I'm quite happy about that ;-)
I thought they were defined by Society and Economics? The community unfortunately apparently operates in a vacuum (based on my empirical observation) and does not seem to listen to the outside world much. Society seems to awaken it once in a while with USA Today articles, then it scratches, rolls over, and goes back to sleep to the outside world. Not a bad model for keeping focus, but one that prevents its members from seeing the forest for all the trees -- reminds me of the old Novell culture in the mid 1990s which was also a self contained isolated island in the middle of Utah Valley.
Wikify the world? Intentions are clear, now the methods are not. I guess concepts of distributed wikis might be better discussed at wikitech-l rather than foundation-l. There it would be really more on topic!
BR,
I find I work better alone on projects like these until the dinner is prepared and ready to serve. I am happy to share, but I want to get 98% of the way there before I have to deal with the organizational issues. I've done this stuff for 25 years and people slow me down at times.
All my Wikilove,
Jeff
On 7/27/06, Domas Mituzas midom.lists@gmail.com wrote:
Now strings are defined by community, as community supports Wikipedia. I'm not CFO, so I'm quite happy about that ;-)
And although you might disagree, I have no reason to think you are not Hardware Officer either. I know Delphine calls herself "the former" Chapter Officer (and now Chair of LCCom). Danny thinks himself GrO still, Elian resigned, as for other officers, I have no clue.
Related to a certain website, I would like to know the exact contact Wikimedia Officers or their successors concerning updates of reports from each field. Is there any latest chart of the Wikimedia Foundation Organisation? Or we can say "the officers are still in office, unless they resigned"
Um, however, as for Hardware, I think we need to argue with whom and on what a comcom member should talk to get a new update? <g>
Hi!
And although you might disagree, I have no reason to think you are not Hardware Officer either. I know Delphine calls herself "the former" Chapter Officer (and now Chair of LCCom). Danny thinks himself GrO still, Elian resigned, as for other officers, I have no clue.
Right now the tech committee has the authority I had before (even more of it!). I still handle the work of 'hardware officer', but I guess I did that before we even invented officers. And if you want formal hierarchy, then I guess Brion is our chairman ;-)
Related to a certain website, I would like to know the exact contact Wikimedia Officers or their successors concerning updates of reports from each field. Is there any latest chart of the Wikimedia Foundation Organisation?
I'm sure those reports might be called quite voluntary - at least I did them before to describe our activities (then someone thought it was good idea to use them as semi-official documents).
Or we can say "the officers are still in office, unless they resigned"
Or we can say - we have community of nice guys who do stuff. Titles are nice, but they're not tags that make us act in some specific ways. :)
Um, however, as for Hardware, I think we need to argue with whom and on what a comcom member should talk to get a new update? <g>
We already discussed that, didn't we? :)
On 7/28/06, Domas Mituzas midom.lists@gmail.com wrote:
Hi!
And although you might disagree, I have no reason to think you are not Hardware Officer either. I know Delphine calls herself "the former" Chapter Officer (and now Chair of LCCom). Danny thinks himself GrO still, Elian resigned, as for other officers, I have no clue.
Right now the tech committee has the authority I had before (even more of it!). I still handle the work of 'hardware officer', but I guess I did that before we even invented officers. And if you want formal hierarchy, then I guess Brion is our chairman ;-)
/me notes "the bureaucracy and hierarchy of WMF is as complicated as ... some christian denominations which claim their apostolic traditions"
Related to a certain website, I would like to know the exact contact Wikimedia Officers or their successors concerning updates of reports from each field. Is there any latest chart of the Wikimedia Foundation Organisation?
I'm sure those reports might be called quite voluntary - at least I did them before to describe our activities (then someone thought it was good idea to use them as semi-official documents).
You make a good point. Let me explain our situation in general. Currently _all_ of its editors are volunteers, and I understand committee members are too; that is, none of us is obliged to edit or update the website (IIRC the resolution which demanded periodic reports from the committee(s) is still pending, so nothing binds us). That is one pole of our world. The other pole is that it is the official website of the WMF, and external people, including donors, visit it. And most of the information provided to them on that site is outdated. We are sustained by those good people, but they might know nothing in detail about what they contributed to the Foundation. And as far as I know, many of them believe they donate for hardware purchases. It would be good to inform them how their former donations were used.
That is why I think the Report on Budget and Hardware Purchase, those two items, would be crucial parts of that website. And not only hardware: other activities matter too, as far as the Foundation budget is spent on them or they are done under the name of WMF. Other kinds of reports are therefore informative and helpful to develop the Foundation's and consequently our own activities on the projects.
Formerly Quarto provided its readers quarterly reports but it hasn't been issued since last year. A report twice a year (it's currently just my personal thought, I need to discuss this idea on Comcom later) for example might be easier to make than quarterly ones, and more visitor-friendly than webpages not updated for about a year.
Or we can say "the officers are still in office, unless they resigned"
Or we can say - we have community of nice guys who do stuff. Titles are nice, but they're not tags that make us act in some specific ways. :)
Yep. And our current potential problems are, I presume, 1) there is no particular community on that site [and perhaps that is one reason most of its editors are inactive] and 2) many people who have an account there seem to think they need titles or similar tags to make a significant contribution [and that makes sense in some cases; but personally I think some people treat it as too big a deal, as if they were disqualified even from drafting an update. The tag "Official" might scare them].
Um, however, as for Hardware, I think we need to argue with whom and on what a comcom member should talk to get a new update? <g>
We already discussed that, didn't we? :)
So we expect you remember other present and important issues clearly, including the informally proposed Houserule for Wikimania?
Hi!
/me notes "the bureaucracy and hierarchy of WMF is as complicated as ... some christian denominations which claim their apostolic traditions"
:-) yeah, all that stuff to prove there is no cabal..
nothing in details for what they contributed to the Foundation. And as far as I know, many of them believe they donate for hardware purchase. It would be good to inform them how their former donations were used.
Haha, well, the best report for what tech team is doing is that site is still up. ;-) All our major hardware purchases (well, other stuff than a meter of UTP cable) are shown on meta pages, and reports used to be more of an overview of problems and visions.
is spent for those. Or it was done under the name of WMF. Other kinds of reports are therefore informative and helpful to develop the Foundation's and consequently our own activities on the projects.
I sure agree that it is nice to communicate :) I also believe that method for that should be part of our common sense :)
Anyway, I think that at least tech stuff is pretty transparent for those who care. We're communicating (and always online :) in IRC, there's active mailing list, our wikis (and logs in them) are public, yadda yadda. Some guys even read village pump ;-) If only everyone around would provide that much of information about activities ;-)
So we expect you remember other present and important issues clearly, including the informally proposed Houserule for Wikimania?
I'm rebellious, surely not following that one. :)
Hi,
On Fri, 28 Jul 2006, Domas Mituzas wrote:
nothing in details for what they contributed to the Foundation. And as far as I know, many of them believe they donate for hardware purchase. It would be good to inform them how their former donations were used.
Haha, well, the best report for what tech team is doing is that site is still up. ;-) All our major hardware purchases (well, other stuff than a meter of UTP cable) are shown on meta pages, and reports used to be more of an overview of problems and visions.
Problems and visions : useful things to report on.
We could use a designated reporting setup -- either a person or a place/schedule -- for each committee and major project, with some standard stats/report-elements that would help minimize scaling pains.
SJ
Domas Mituzas wrote:
Hi!
/me notes "the bureaucracy and hierarchy of WMF is as complicated as ... some christian denominations which claim their apostolic traditions"
:-) yeah, all that stuff to prove there is no cabal..
nothing in details for what they contributed to the Foundation. And as far as I know, many of them believe they donate for hardware purchase. It would be good to inform them how their former donations were used.
Haha, well, the best report for what tech team is doing is that site is still up. ;-) All our major hardware purchases (well, other stuff than a meter of UTP cable) are shown on meta pages, and reports used to be more of an overview of problems and visions.
is spent for those. Or it was done under the name of WMF. Other kinds of reports are therefore informative and helpful to develop the Foundation's and consequently our own activities on the projects.
I sure agree that it is nice to communicate :) I also believe that method for that should be part of our common sense :)
Anyway, I think that at least tech stuff is pretty transparent for those who care. We're communicating (and always online :) in IRC, there's active mailing list, our wikis (and logs in them) are public, yadda yadda. Some guys even read village pump ;-) If only everyone around would provide that much of information about activities ;-)
So we expect you remember other present and important issues clearly, including the informally proposed Houserule for Wikimania?
I'm rebellious, surely not following that one. :)
I missed a step. What is this Houserule ?
As for hardware purchase reports, they have never been as clear as today, since precise purchases are detailed and voted in a resolution. They may all be found on the wikimediafoundation site. Now, just a thought : if you are a visitor not familiar with this website, I wish you good luck to find the resolutions :-) They are very much hidden. It would be nice that someone helps to make them more visible. Perhaps even putting a [[resolutions]] link in the toolbar.
I would love to see reports of the techco, at least to explain what they are working on right now. I have no idea if Brad gets feedback on this.
The comcom is providing regular reports to the board. They are sent by the chair (Michael), though I am unsure if they are written by himself or by a collective. I do not think these reports (or part of them) are public.
The spcom only provided three reports. The first two ones were not publicly published; They mostly concerned the set up. I published the third one very recently : http://meta.wikimedia.org/wiki/SPC_Report_April-July_2006. I omitted some parts on purpose :-)
We got one or two reports of the trademark committee. For obvious reasons, our strategy with regards to brand is not openly published.
ant
On 7/29/06, Anthere Anthere9@yahoo.com wrote:
Domas Mituzas wrote:
So we expect you remember other present and important issues clearly, including the informally proposed Houserule for Wikimania?
I'm rebellious, surely not following that one. :)
I missed a step. What is this Houserule ?
ah it's a joke. Danny proposed "R35 for alcohol" and certain people like our Lithuanian friend don't like it.
As for hardware purchase reports, they have never been as clear as today, since precise purchases are detailed and voted in a resolution. They may all be found on the wikimediafoundation site. Now, just a thought : if you are a visitor not familiar with this website, I wish you good luck to find the resolutions :-) They are very much hidden. It would be nice that someone helps to make them more visible. Perhaps even putting a [[resolutions]] link in the toolbar.
The toolbar definitely needs improvement. It can be changed through MediaWiki:Sidebar, which only sysops can edit.
Other potential addition is "policies" -> Privacy policy & Visual guideline. The latter is hardly found too, I'm afraid.
It is fine to know the Board gets reports steadily. As the next step, would the Board be interested in re-releasing some of those reports to the Foundation wiki or passing them to Comcom as potential material for the Foundation wiki? Since 1) it is suggested some of those reports might contain things not suitable to be made public, and 2) those reports haven't been passed to Comcom [even by itself, IIRC], I think we had better bother the Board in this area.
Delphine permitted me to reuse her reports on her early summer European tour as a report from Chapter Com (see my meta talk), but it would be nicer if much newer material could be published.
We got one or two reports of the trademark committee. For obvious reasons, our strategy with regards to brand is not openly published.
Aphaia wrote:
On 7/29/06, Anthere Anthere9@yahoo.com wrote:
Domas Mituzas wrote:
So we expect you remember other present and important issues clearly, including the informally proposed Houserule for Wikimania?
I'm rebellious, surely not following that one. :)
I missed a step. What is this Houserule ?
ah it's a joke. Danny proposed "R35 for alcohol" and certain people like our Lithuanian friend don't like it.
As for hardware purchase reports, they have never been as clear as today, since precise purchases are detailed and voted in a resolution. They may all be found on the wikimediafoundation site. Now, just a thought : if you are a visitor not familiar with this website, I wish you good luck to find the resolutions :-) They are very much hidden. It would be nice that someone helps to make them more visible. Perhaps even putting a [[resolutions]] link in the toolbar.
The toolbar definitely needs improvement. It can be changed through MediaWiki:Sidebar, which only sysops can edit.
Why do I feel...
Okay, remind me of that issue after Wikimania please :-)
Other potential addition is "policies" -> Privacy policy & Visual guideline. The latter is hardly found too, I'm afraid.
It is fine to know the Board gets reports steadily. As the next step, would the Board be interested in re-releasing some of those reports to the Foundation wiki or passing them to Comcom as potential material for the Foundation wiki? Since 1) it is suggested some of those reports might contain things not suitable to be made public, and 2) those reports haven't been passed to Comcom [even by itself, IIRC], I think we had better bother the Board in this area.
okay. I'll see what we can do. Ant
Delphine permitted me to reuse her reports on her early summer European tour as a report from Chapter Com (see my meta talk), but it would be nicer if much newer material could be published.
We got one or two reports of the trademark committee. For obvious reasons, our strategy with regards to brand is not openly published.
On 7/26/06, Domas Mituzas midom.lists@gmail.com wrote: [snip]
With all revision pages it's around 3 TB total.
That really requires advanced tech. At Wikipedia revision pages are compressed, and a proper compression run shrinks the whole dataset to 0.5 TB or so (or less).
A minor nit.. with braindead stupid compression (toasted columns in PGsql which use a modified LZ algo which gets less compression than gzip -3 but is much faster and compresses a single row at a time) you can get the whole of english wikipedia into 0.4TB including the needed indexes and the (not insubstantial) DB overhead.
With state of the art compression (lzma) you can get all the revisions into 6 GB, but you lose random access. At Wikimania tech days I'll be presenting a system which achieves similar compression performance but preserves random access... Which is at least a mildly interesting subject, although perhaps without practical implications for wikimedia until the disk/cpu performance gap widens a bit further. :)
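The trade-off described above, row-at-a-time compression that keeps random access versus one solid stream that compresses better but must be decompressed as a whole, can be sketched roughly as follows. This is only an illustration under loud assumptions: the revision data is synthetic, zlib at level 3 stands in for the per-row compressor, and none of the names (make_revisions, per_row_size, solid_size, read_row) come from MediaWiki or PostgreSQL.

```python
import lzma
import zlib


def make_revisions(count=200):
    """Build synthetic revisions; each differs from the previous by one line."""
    lines = ["Line %d of a hypothetical article." % i for i in range(50)]
    revisions = []
    for rev in range(count):
        lines = list(lines)
        lines[rev % len(lines)] = "Edited in revision %d." % rev
        revisions.append("\n".join(lines).encode("utf-8"))
    return revisions


def per_row_size(revisions):
    """Total bytes when every revision is compressed on its own."""
    return sum(len(zlib.compress(rev, 3)) for rev in revisions)


def solid_size(revisions):
    """Total bytes when all revisions are compressed as one LZMA stream."""
    return len(lzma.compress(b"".join(revisions)))


def read_row(compressed_rows, index):
    """Random access: decompress only the requested row."""
    return zlib.decompress(compressed_rows[index])


if __name__ == "__main__":
    revs = make_revisions()
    print("raw bytes:     ", sum(len(r) for r in revs))
    print("per-row zlib:  ", per_row_size(revs))
    print("solid lzma:    ", solid_size(revs))
```

The solid stream wins because it can refer back to earlier revisions, but fetching one revision from it means decompressing from the start, which is the random-access loss mentioned above.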
Gregory Maxwell wrote:
On 7/26/06, Samuel Klein meta.sj@gmail.com wrote:
"Single or Dual Xeon or AMD64 based system with up to 6.0 Terabytes of storage and 12 GB of DDR memory. Comes fully installed with the Wikipedia Encyclopedia and a full suite of automated site mirroring and update tools and 1 year of service and support."
Why so much storage?
Uncompressed enwiki text is 680GB. All images in total are about 300GB. Loaded into postgresql with a normal set of indexes enwiki takes about 400GB, I can't imagine that a mysql load would be any larger (in mysql mediawiki uses zlib batch compression).
I guess the 3TB makes sense to have some growth room.. but 18TB?
With all revision pages it's around 3 TB total. Room for growth and mirroring of 5 + 1 or 0 + 1 configurations. Using RAID 1 + 5 (mirrored RAID 5) arrays really boosts read performance by about 60%, so the extra storage is for these fault-tolerant and performance options.
Jeff
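As a rough illustration of why a mirrored RAID 5 layout needs far more raw disk than the data it holds, the arithmetic can be sketched like this. The disk count and disk size are hypothetical, chosen only so that the usable figure comes out at the 3 TB mentioned above; they are not taken from the actual systems.

```python
def raid51_capacity(disks_per_array, disk_tb):
    """Return (raw_tb, usable_tb) for two RAID 5 arrays mirrored against each other.

    Each RAID 5 array gives up one disk's worth of space to parity,
    and the mirror then halves what is left.
    """
    if disks_per_array < 3:
        raise ValueError("RAID 5 needs at least three disks per array")
    raw_tb = 2 * disks_per_array * disk_tb
    usable_tb = (disks_per_array - 1) * disk_tb
    return raw_tb, usable_tb


if __name__ == "__main__":
    # Hypothetical example: two arrays of seven 0.5 TB disks each.
    raw, usable = raid51_capacity(7, 0.5)
    print("raw: %.1f TB, usable: %.1f TB" % (raw, usable))
```

With those assumed numbers, 7 TB of raw disk yields 3 TB of usable space, so more than half of the purchased storage goes to redundancy.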
:)