Hi,
At de.wikipedia, we are using the MediaWiki namespace for things like current events.
Has anyone tried to export those parts into an RSS/RDF file, e.g. [[MediaWiki:Hauptseite Aktuelle Ereignisse]] on de or [[MediaWiki:In the news]] on en?
I think such a file could be useful for external web sites, and it could also help Wikipedia.
Mathias
On Sat, 29 May 2004 11:16:28 +0200, Mathias Schindler neubau@presroi.de wrote:
At de.wikipedia, we are using the MediaWiki namespace for things like current events.
Has anyone tried to export those parts into an RSS/RDF file, e.g. [[MediaWiki:Hauptseite Aktuelle Ereignisse]] on de or [[MediaWiki:In the news]] on en?
I think such a file could be useful for external web sites, and it could also help Wikipedia.
[... de-lurking, since I have worked on various RSS and Atom syndication projects before.]
1. Not to start a holy war or anything, but if you are going to be syndicating, you really should use Atom. Most major newsfeed readers support it; those that don't yet most likely will soon, and in the interim people can run scripts which convert Atom to RSS using a simple XSLT transformation. Atom is *more* flexible and has a single standard (rather than 9 different, ad hoc, and incompatible versions--see http://diveintomark.org/archives/2004/02/04/incompatible-rss).
2. That said, you may very well *not* want to do any syndication on WikiPedia at the moment, whatever the format. Why? Well, because delivery mechanisms for syndicated feeds are currently *terrible*. WikiPedia has enough load-related performance problems as it is; imagine the same situation, except with a significant portion of your users reloading one page over and over again every 10 minutes to see if something has changed (most newsreaders *default* to polling somewhere around every 5-15 minutes).
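For concreteness, a feed like this can be produced with nothing more than a scraper and an XML serializer. The sketch below uses Python's standard library to build a minimal Atom document; the feed id, titles, and URLs are made-up placeholders, and the element set is roughly the Atom spec's required minimum (title, id, updated per feed and per entry):

```python
# Sketch: build a minimal Atom feed for "In the news" items using only the
# standard library. All ids/URLs below are hypothetical examples.
from xml.etree import ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

def build_feed(title, feed_id, entries):
    """entries: list of (title, link, entry_id, updated) tuples."""
    ET.register_namespace("", ATOM_NS)
    feed = ET.Element("{%s}feed" % ATOM_NS)
    ET.SubElement(feed, "{%s}title" % ATOM_NS).text = title
    ET.SubElement(feed, "{%s}id" % ATOM_NS).text = feed_id
    # Feed-level 'updated' is the newest entry timestamp.
    ET.SubElement(feed, "{%s}updated" % ATOM_NS).text = max(e[3] for e in entries)
    for e_title, link, e_id, updated in entries:
        entry = ET.SubElement(feed, "{%s}entry" % ATOM_NS)
        ET.SubElement(entry, "{%s}title" % ATOM_NS).text = e_title
        ET.SubElement(entry, "{%s}link" % ATOM_NS, href=link)
        ET.SubElement(entry, "{%s}id" % ATOM_NS).text = e_id
        ET.SubElement(entry, "{%s}updated" % ATOM_NS).text = updated
    return ET.tostring(feed, encoding="unicode")

xml = build_feed(
    "Wikipedia: In the news",
    "urn:example:wikipedia-itn",
    [("Example headline", "http://en.wikipedia.org/wiki/Example",
      "urn:example:entry-1", "2004-05-29T12:00:00Z")],
)
```

A real feed would also carry author and summary elements, but the skeleton above is the whole trick: serialize the scraped items, write the file, done.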
-C
Rad Geek schrieb:
http://diveintomark.org/archives/2004/02/04/incompatible-rss).
Thank you for this link.
Why? Well, because delivery mechanisms for syndicated feeds are currently *terrible*.
This is not an argument. Since there is no dynamic content, putting it somewhere else is quite easy. I could host those files; that's not a problem.
WikiPedia has enough load-related performance problems as it is;
Wikipedia's fine servers would not be affected.
Yours, Mathias
On Sat, 29 May 2004 16:39:25 +0200, Mathias Schindler neubau@presroi.de wrote:
Rad Geek schrieb:
. . .
[problems with delivery of syndicated feeds by polling ...]
Why? Well, because delivery mechanisms for syndicated feeds are currently *terrible*.
This is not an argument. Since there is no dynamic content, putting it somewhere else is quite easy. I could host those files; that's not a problem.
O.K.; I misunderstood the proposal; mea culpa. If there is some kind of system of replicating the data to mirror servers, and people are polling *those* servers instead of the WikiPedia servers, then of course there should be no performance hit.
How would users be directed to the right server? Would they just go to the front page of the data that the feed is syndicating, and find a link to the (offsite) URI? Or would there be some other mechanism?
-C
Rad Geek schrieb:
How would users be directed to the right server? Would they just go to the front page of the data that the feed is syndicating, and find a link to the (offsite) URI? Or would there be some other mechanism?
This reply might reveal me as the greatest greenhorn in the IT world ever, but I wasn't thinking of load balancing at all. I have a few gigabytes of traffic per month to spend and some hard disk space, and the feed file for the current headlines would be less than 5 KByte. A refresh rate of 5-15 minutes does make use of the HTTP 304 status code, doesn't it?
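The HTTP 304 mechanism referred to here is conditional GET: a client that has already fetched the feed sends back the Last-Modified value it saw in an If-Modified-Since header, and if nothing has changed the server answers 304 with an empty body instead of resending the file. A sketch of the server-side decision (function name hypothetical):

```python
# Sketch: the server-side logic behind HTTP 304 (conditional GET) for a
# static feed file.
from email.utils import parsedate_to_datetime

def respond(if_modified_since, last_modified, body):
    """Return (status, payload) for a conditional GET of a static file.

    if_modified_since: header value sent by the client, or None.
    last_modified:     HTTP-date when the feed file last changed.
    """
    if if_modified_since is not None:
        client_time = parsedate_to_datetime(if_modified_since)
        server_time = parsedate_to_datetime(last_modified)
        if server_time <= client_time:
            return 304, b""   # nothing new: only headers go over the wire
    return 200, body          # changed (or first fetch): send the full feed
```

So a 5 KByte file polled every 5-15 minutes mostly costs a few hundred bytes of headers per poll, as long as both the server and the newsreader implement conditional GET correctly.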
Yours, Mathias
On Sat, 29 May 2004 17:24:53 +0200, Mathias Schindler neubau@presroi.de wrote:
Rad Geek schrieb:
How would users be directed to the right server? Would they just go to the front page of the data that the feed is syndicating, and find a link to the (offsite) URI? Or would there be some other mechanism?
This reply might reveal me as the greatest greenhorn in the IT world ever, but I wasn't thinking of load balancing at all. I have a few gigabytes of traffic per month to spend and some hard disk space, and the feed file for the current headlines would be less than 5 KByte. A refresh rate of 5-15 minutes does make use of the HTTP 304 status code, doesn't it?
Unless your server is odd, sure, this won't be much of a worry if you have lots of traffic to spare. But I'm still not sure I understand what you're proposing. And, in any case, I think I wasn't clear when I was expressing my worry.
The worry is this: the point of having a syndicated feed is for users who consume the normal content ("In the News"-style content on WikiPedia in this case) to be able to find it, and put it into a feed reader or news aggregator of some kind. So the question is: if (1) you are going to be creating feeds for WikiPedia (en: or de:), but (2) they are going to be hosted on *your* server rather than WikiPedia's servers, then that raises the question of (3) how are the prospective users going to find out where the feeds *are* hosted?
When you first made the proposal, I thought that what you meant was that the MediaWiki software would generate a feed of the content in this section, and that it would supply a link (through an <a href="...">...</a> element, a <link rel="alternate" .../> element, or both) to the location where the file can be retrieved from WikiPedia. After you clarified that you were thinking of hosting it on *your* server, I assumed that you meant something more along the lines of WikiPedia generating the file, your server mirroring it from WikiPedia, and directing users over to the feed on your server *instead of* directing them to the feed on WikiPedia's servers.
*If* that's what you're proposing, then you solve the problem of adding extra load to WikiPedia's servers (since only a few sources--your mirror server, and anyone else who runs one--are accessing the feed, and everyone else is getting it from your server instead of WikiPedia's). But it creates a new problem: WikiPedia now needs to know the URI of the feed on *your* server, and present it to people reading the "In the news" section, in order for users to be able to get the feed from where you want them to get it.
That's not a very tricky problem; I was just wondering what you had in mind to solve it: if, for example, you intended to have those who maintain the "In the news" section insert a link to your mirror of the newsfeed manually, or if you had some more automated method in mind.
But, as I said, I don't know anymore if I've even clearly understood what it was that you had in mind. If this is all based on a misreading on my part, I apologize.
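For reference, the autodiscovery mechanism mentioned earlier in this message is a single element in the page's <head>; newsreaders look for it to locate the feed automatically. A minimal fragment, with a purely hypothetical mirror URL:

```html
<head>
  <title>Wikipedia: In the news</title>
  <!-- hypothetical mirror URL; the type would be application/rss+xml for an RSS feed -->
  <link rel="alternate" type="application/atom+xml"
        title="In the news (Atom)"
        href="http://example.org/mirrors/wikipedia/in-the-news.atom" />
</head>
```

Whether the href points at WikiPedia's own servers or at a mirror is invisible to the reader software, which is what would make offsite hosting workable.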
-C
Rad Geek wrote:
On Sat, 29 May 2004 11:16:28 +0200, Mathias Schindler neubau@presroi.de wrote:
At de.wikipedia, we are using the MediaWiki namespace for things like current events.
Has anyone tried to export those parts into an RSS/RDF file, e.g. [[MediaWiki:Hauptseite Aktuelle Ereignisse]] on de or [[MediaWiki:In the news]] on en?
I think such a file could be useful for external web sites, and it could also help Wikipedia.
[... de-lurking, since I have worked on various RSS and Atom syndication projects before.]
- Not to start a holy war or anything, but if you are going to be syndicating, you really should use Atom. Most major newsfeed readers support it; those that don't yet most likely will soon, and in the interim people can run scripts which convert Atom to RSS using a simple XSLT transformation. Atom is *more* flexible and has a single standard (rather than 9 different, ad hoc, and incompatible versions--see http://diveintomark.org/archives/2004/02/04/incompatible-rss).
- That said, you may very well *not* want to do any syndication on WikiPedia at the moment, whatever the format. Why? Well, because delivery mechanisms for syndicated feeds are currently *terrible*. WikiPedia has enough load-related performance problems as it is; imagine the same situation, except with a significant portion of your users reloading one page over and over again every 10 minutes to see if something has changed (most newsreaders *default* to polling somewhere around every 5-15 minutes).
-C
On fr, there are now both RSS and Atom feeds...
Is that risky in terms of load?
On Sat, 29 May 2004 16:45:44 +0200, Anthere anthere9@yahoo.com wrote:
. . .
On fr, there are now both RSS and Atom feeds...
Is that risky in terms of load?
Probably not just *yet*, but it could become risky *soon*, and it's hard to predict just how quickly that will happen.
The problem is that the way feeds are delivered right now is by polling with an HTTP GET according to an interval set by the newsreader. The way that this is implemented, in every newsreader that's available right now, so far as I know, is by using a fixed interval, which usually defaults to somewhere between 5 and 15 minutes, and which the user can manually set to something else (so that, for example, they're not wasting cycles checking in on a site that never updates more than once daily). Of course, most users *don't* bother to change the interval, so most feed readers end up polling your site every 5-15 minutes.
This isn't a big deal if you've got relatively few visitors--particularly if the server is set up to just send a 304 if there have been no modifications. But it doesn't scale very well at all: in terms of load, it's not much different from having one user press the "Reload" button every few minutes to see whether there have been any changes. As RSS and Atom syndication become more ubiquitous, and feed readers become more popular, more and more users are going to want to use services like this on highly trafficked sites such as WikiPedia. So there's a distinct risk that the current model won't be sustainable for very much longer.
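To put a number on the scaling worry, here is the back-of-the-envelope arithmetic behind it (the subscriber count is a hypothetical illustration, not a measurement):

```python
# Back-of-the-envelope arithmetic for the fixed-interval polling model
# described above.
def poll_rate(subscribers, interval_minutes):
    """Average requests per second when every subscriber polls on a fixed interval."""
    return subscribers / (interval_minutes * 60)

# 50,000 feed subscribers on the common 10-minute default come to
# 50000 / 600 ≈ 83 requests per second, around the clock, whether or not
# anything on the page has changed.
rate = poll_rate(50_000, 10)
```

Conditional GET keeps each of those requests cheap, but the request *count* grows linearly with subscribers, which is the part that does not scale.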
-C
On May 29, 2004, at 1:06 PM, Rad Geek wrote:
On Sat, 29 May 2004 16:45:44 +0200, Anthere anthere9@yahoo.com wrote:
. . .
On fr, there are now both RSS and Atom feeds...
Is that risky in terms of load?
Probably not just *yet*, but it could become risky *soon*, and it's hard to predict just how quickly that will happen.
The problem is that the way feeds are delivered right now is by polling with an HTTP GET according to an interval set by the newsreader. The way that this is implemented, in every newsreader that's available right now, so far as I know, is by using a fixed interval, which usually defaults to somewhere between 5 and 15 minutes, and which the user can manually set to something else (so that, for example, they're not wasting cycles checking in on a site that never updates more than once daily). Of course, most users *don't* bother to change the interval, so most feed readers end up polling your site every 5-15 minutes.
This isn't a big deal if you've got relatively few visitors--particularly if the server is set up to just send a 304 if there have been no modifications. But it doesn't scale very well at all: in terms of load, it's not much different from having one user press the "Reload" button every few minutes to see whether there have been any changes. As RSS and Atom syndication become more ubiquitous, and feed readers become more popular, more and more users are going to want to use services like this on highly trafficked sites such as WikiPedia. So there's a distinct risk that the current model won't be sustainable for very much longer.
Wouldn't Squid take care of this? It seems like very much a non-issue to me.
Lightning
I am glad there is all this discussion anyway (thanks for your answer, Rad, and others). I hope there is a technical solution to the potential load issues. I am convinced that, at least right now, syndication is becoming a major issue for journalists and documentation experts, and I am glad we are participating in that evolution.
Rad Geek wrote:
On Sat, 29 May 2004 16:45:44 +0200, Anthere anthere9@yahoo.com wrote:
. . .
On fr, there are now both RSS and Atom feeds...
Is that risky in terms of load?
Probably not just *yet*, but it could become risky *soon*, and it's hard to predict just how quickly that will happen.
The problem is that the way feeds are delivered right now is by polling with an HTTP GET according to an interval set by the newsreader. The way that this is implemented, in every newsreader that's available right now, so far as I know, is by using a fixed interval, which usually defaults to somewhere between 5 and 15 minutes, and which the user can manually set to something else (so that, for example, they're not wasting cycles checking in on a site that never updates more than once daily). Of course, most users *don't* bother to change the interval, so most feed readers end up polling your site every 5-15 minutes.
This isn't a big deal if you've got relatively few visitors--particularly if the server is set up to just send a 304 if there have been no modifications. But it doesn't scale very well at all: in terms of load, it's not much different from having one user press the "Reload" button every few minutes to see whether there have been any changes. As RSS and Atom syndication become more ubiquitous, and feed readers become more popular, more and more users are going to want to use services like this on highly trafficked sites such as WikiPedia. So there's a distinct risk that the current model won't be sustainable for very much longer.
-C
On Saturday 29 May 2004 16:06, Rad Geek wrote:
On Sat, 29 May 2004 11:16:28 +0200, Mathias Schindler neubau@presroi.de wrote:
At de.wikipedia, we are using the MediaWiki namespace for things like current events.
Has anyone tried to export those parts into an RSS/RDF file, e.g. [[MediaWiki:Hauptseite Aktuelle Ereignisse]] on de or [[MediaWiki:In the news]] on en?
I think such a file could be useful for external web sites, and it could also help Wikipedia.
[... de-lurking, since I have worked on various RSS and Atom syndication projects before.]
- Not to start a holy war or anything, but if you are going to be syndicating, you really should use Atom. Most major newsfeed readers support it; those that don't yet most likely will soon, and in the interim people can run scripts which convert Atom to RSS using a simple XSLT transformation. Atom is *more* flexible and has a single standard (rather than 9 different, ad hoc, and incompatible versions--see http://diveintomark.org/archives/2004/02/04/incompatible-rss).
- That said, you may very well *not* want to do any syndication on WikiPedia at the moment, whatever the format. Why? Well, because delivery mechanisms for syndicated feeds are currently *terrible*. WikiPedia has enough load-related performance problems as it is; imagine the same situation, except with a significant portion of your users reloading one page over and over again every 10 minutes to see if something has changed (most newsreaders *default* to polling somewhere around every 5-15 minutes).
I wonder: could a site such as www.myrss.com do the job?
Mathias Schindler wrote:
At de.wikipedia, we are using the MediaWiki namespace for things like current events.
Has anyone tried to export those parts into an RSS/RDF file, e.g. [[MediaWiki:Hauptseite Aktuelle Ereignisse]] on de or [[MediaWiki:In the news]] on en?
Somebody's running an RSS feed from [[Wikipedia:Announcements]] on en; this is done with a scraper and the file hosted on an external site: http://jeays.net/wikipedia/announcements.xml
There is built-in RSS2 and Atom feed generation support now for Special:Recentchanges and Special:Newpages. IIRC both pages are cacheable, though of course RC changes very frequently on en! If they turn out to be problematic, we can have them update only periodically.
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
Mathias Schindler wrote:
Has anyone tried to export those parts into an RSS/RDF file, e.g. [[MediaWiki:Hauptseite Aktuelle Ereignisse]] on de or [[MediaWiki:In the news]] on en?
Somebody's running an RSS feed from [[Wikipedia:Announcements]] on en; this is done with a scraper and the file hosted on an external site: http://jeays.net/wikipedia/announcements.xml
He's added a Current Events feed now too: http://jeays.net/wikipedia/current.xml
-- brion vibber (brion @ pobox.com)
wikitech-l@lists.wikimedia.org