(I have posted this on [[Larry Sanger/Estimating article numbers]].)
So Wikipedia has almost 10,000 pages. This represents a heck of a lot of
work and a heck of a lot of content, and we can all be proud, but...I think
there's a bit of a problem. I have a puzzle for you: how many of these
pages are *articles*?
''We'' all know "10,000 pages" does not mean "10,000 encyclopedia articles."
There are a lot of redirection pages, Talk pages, member pages and subpages,
commentary pages, Wikipedia project pages, and other non-articles. But the
new reader doesn't know this, and if some news media source comes along (as
they inevitably will--it's only a matter of time now), I think we might be
blasted for misrepresenting the extent of our achievement. Not only would
that be shameful, it would lose us the participation of potential new contributors
who ''care'' about how accurately we represent our achievement, though they
don't care if we say we have 10,000 or, instead, a mere 6,000 articles. :-)
I wouldn't envy anyone the task of counting the ''actual'' number of
articles. But we could estimate the number.
Anyone care to give it a shot, and report the results?
For purposes of this exercise, I don't think we need to draw a distinction
between one-sentence articles to the effect that so-and-so was a famous
novelist, and treatise-length pages. Both of those can, for our purposes,
be called articles, or perhaps "entries." What ''can't'' be called articles are:
* redirection pages
* Talk pages
* member pages and subpages thereof
* pages describing the Wikipedia project (e.g., the FAQ, news,
* pages that consist *only* of links to other articles, with virtually no
content of their own
* any other categories?
What I'd like to do with your estimate is to make the assumption that the
''ratio'' of present pages to present articles will remain roughly the same
for the next 5,000 or so ''pages.'' (Or perhaps you can tell me how long
the ratio can probably be relied upon.)
Then, we can (honestly) boast "over 5,000 articles" (notice, articles, not
pages) or "over 5,000 entries" on the front page. This will make our work
seem more substantial, more real, and that's important if we're going to
make this a reasonably serious project.
I'll bet the present number is right around 5,000, but I really don't know!
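One way to make the estimate described above is to classify a random sample of pages by hand and scale the observed ratio up to the total page count. The following is only a minimal sketch under stated assumptions: the category labels, the function names and the sample data are all hypothetical, and the error margin uses a simple normal approximation.

```python
import math
import random

# Hypothetical labels for the non-article categories listed above.
NON_ARTICLE_KINDS = {"redirect", "talk", "member", "project", "links_only"}


def draw_sample(page_titles, size, seed=None):
    """Pick `size` page titles at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(page_titles, size)


def estimate_articles(total_pages, sample_kinds, z=1.96):
    """Scale the article ratio found in a hand-classified sample up to all pages.

    Returns (estimate, low, high); low and high bound an approximate 95%
    interval when z is 1.96.
    """
    n = len(sample_kinds)
    if n == 0:
        raise ValueError("sample must not be empty")
    articles = sum(1 for kind in sample_kinds if kind not in NON_ARTICLE_KINDS)
    ratio = articles / n
    margin = z * math.sqrt(ratio * (1 - ratio) / n)
    low = max(0.0, ratio - margin)
    high = min(1.0, ratio + margin)
    return (round(total_pages * ratio),
            round(total_pages * low),
            round(total_pages * high))


# Made-up classification of 100 sampled pages, for illustration only.
sample = ["article"] * 55 + ["redirect"] * 20 + ["talk"] * 15 + ["member"] * 10
estimate, low, high = estimate_articles(10000, sample)
print(f"about {estimate} articles (roughly {low} to {high})")
```

A larger sample narrows the interval, and the ratio can be re-checked with a fresh sample every few thousand pages to see how long it stays reliable.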
----- Forwarded message from "Krzysztof P. Jasiutowicz" <kpjas(a)ceti.pl> -----
Does Wikipedia need these add-ons to the periodic table?
If another layout is needed, please let me know. I have the data
in an XML file, so it can be parsed as one pleases.
For the past two days I haven't been able to post anything to Wikipedia;
besides, the file is rather large, so it probably wouldn't get through anyway.
See the attached file (ZIP).
Krzysztof P. Jasiutowicz, M.D | One would need a hundred eyes to be able to
Czestochowa, Poland ... | close them to everything. Stanisław Jerzy Lec
More quotes: http://www.cytaty.phg.pl
I was just thinking about an e-mail interface to Wikipedia.
What for, you might ask.
Sometimes it is hard for some people to get onto the Wikipedia server.
I also think it would boost authors' productivity.
Off the top of my head I can think of three commands that can be placed in
the subject line of an e-mail:
* GET - raw text of the page
* RECEIVE - formatted text of the page
Of course there should also be an 'article a day' service.
POSTing should be reserved for active members of this mailing list.
There should be some way to safeguard against accidental overwriting
of existing pages.
When POSTing raw text in body before '-- ', and author's id from e-mail
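The subject-line commands sketched above could be parsed roughly as follows. This is only an illustration under several assumptions: the subject is taken to have the form "COMMAND PageName", POST is assumed to be the third command because POSTing is mentioned above, the author's id is read from the From header, and the function name is hypothetical. Only Python's standard email module is used.

```python
from email import message_from_string

# GET and RECEIVE come from the list above; POST is assumed to be the third.
COMMANDS = {"GET", "RECEIVE", "POST"}
SIGNATURE_DELIMITER = "-- "  # the conventional e-mail signature separator line


def parse_request(raw_email):
    """Turn a raw e-mail into a dict describing the requested wiki operation."""
    msg = message_from_string(raw_email)
    subject = (msg["Subject"] or "").strip()
    command, _, page = subject.partition(" ")
    command = command.upper()
    page = page.strip()
    if command not in COMMANDS or not page:
        raise ValueError(f"unrecognised subject line: {subject!r}")

    request = {"command": command, "page": page, "author": msg["From"]}
    if command == "POST":
        # Keep only the body text that precedes the signature delimiter.
        kept = []
        for line in msg.get_payload().splitlines():
            if line == SIGNATURE_DELIMITER:
                break
            kept.append(line)
        request["text"] = "\n".join(kept).strip()
    return request


raw = (
    "From: author@example.org\n"
    "Subject: POST Periodic table\n"
    "\n"
    "Hydrogen is the first element.\n"
    "-- \n"
    "signature goes here\n"
)
print(parse_request(raw))
```

A real service would still need the safeguards mentioned above, such as checking that the sender is a list member and refusing to overwrite an existing page without confirmation.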
Krzysztof P. Jasiutowicz, M.D | Hell is paved with good intentions.
Czestochowa, Poland ... | Samuel Johnson
More quotes: http://www.cytaty.phg.pl
> I have an ancient copy of the Rubaiyat of Omar Khayyam (book doesn't
> even have a date or copyright page).
It was published in 1913. The illustrations are by Edmund Sullivan,
who died in 1933, so the copyright on them has indeed expired. I
would recommend that you convert them to PNG for better compression.
Since they are black-and-white line drawings, converting them en
masse from the existing GIF will be lossless and not sacrifice any
If someone with a Chinese browser can tell you exactly what character
encoding is commonly used for their text fields, then you just need
to modify bomis's server configs to change the "Character-Encoding"
line to that. Same for Japanese. If Chinese and/or Japanese
browsers can be configured to use UTF-8, that would probably be best
since that would enable simultaneous display of all languages, but I
think most older browsers use ISO-2022 or something.
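To illustrate the encoding point, here is a small sketch. It assumes GB2312 as the legacy Chinese encoding purely for the sake of example (nobody in this thread has confirmed what the browsers actually send), and the function names are hypothetical. It shows that legacy-encoded Chinese text uses only bytes in the 0x80 - 0xFF range, that reading those bytes under the wrong charset garbles them, and that UTF-8 can carry several languages in one page.

```python
def legacy_bytes(text, encoding="gb2312"):
    """Encode text in a legacy double-byte encoding (GB2312 is an assumption)."""
    return text.encode(encoding)


def misread(data, wrong_encoding="latin-1"):
    """Decode bytes under the wrong charset, as a misconfigured server would."""
    return data.decode(wrong_encoding)


chinese = "中文"
raw = legacy_bytes(chinese)

# Every byte of the legacy-encoded Chinese text is in the 0x80 - 0xFF range.
print([hex(b) for b in raw])

# Interpreted as Latin-1, the same bytes come out as unrelated characters.
print(misread(raw))

# UTF-8 carries Chinese, Japanese and accented Latin text together.
mixed = "中文 日本語 Częstochowa"
utf8 = mixed.encode("utf-8")
print(utf8.decode("utf-8") == mixed)
```

The practical consequence is that the charset the server declares has to match the bytes the browser submits; with UTF-8 on both sides, one declaration covers every language.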
> We intend to support everything. Unfortunately, we may not always
> know how.
>> As far as I know, Chinese words are not supported by
>> wikipedia. Range 0x80 - 0xFF in a byte was used by
>> Chinese operation system to represent Chinese words.
>> However, wikipedia will not render those codes "as is".
>> Any plan to support it?
> I've a list of all the winners of major sporting events in an
> encyclopedia. Obviously, the accompanying text is copyright, but
> how about the lists themselves? Can I just copy them verbatim?
> What if I scan them?
The _information_ is not copyrightable. Any _creative_ expression
involved is, but that includes such things as the selection of which
data to present, what order it is presented in, what other data it is
combined with, the physical layout, etc. See "Feist v. Rural" in
As long as your list includes ALL the winners of certain events (so
there's no "selection" issue), and lists them in some obvious way
such as by year, and you list only the information and none of the
accompanying text, and you arrange the data in an obvious way (don't
copy a special table format or something), there is no problem.
Scanning and then OCRing is fine--but presenting the scan as an image
might run into choice of visuals like fonts and colors and such.
I've a list of all the winners of major sporting events in an encyclopedia.
Obviously, the accompanying text is copyright, but how about the lists
themselves? Can I just copy them verbatim? What if I scan them?
We would like to solicit your help, hoping that some of you might be
On our "International Wikipedia" wikis, we have not yet translated the
copyright warning into the target languages. This is very important,
because we want to do what we can to prevent people from uploading
copyrighted content. This could be damaging to Bomis legally and therefore
to both Nupedia and Wikipedia.
Here is the text of the warning:
Please notice that all contributions to Wikipedia are considered to be
released under the GNU Free Documentation License. If you don't want your
writing to be edited mercilessly and redistributed at will, then don't hit
submit. You are also promising us that you wrote this yourself, or copied
it from a public domain resource. DO NOT USE COPYRIGHTED WORK WITHOUT
Here are the languages we need this in:
http://ca.wikipedia.com/ Catalan (Catalan)
http://zh.wikipedia.com/ Chinese (Hanyu)
http://de.wikipedia.com/ German (Deutsch)
http://fr.wikipedia.com/ French (Français)
http://he.wikipedia.com/ Hebrew (Ivrit)
http://it.wikipedia.com/ Italian (Italiano)
http://ja.wikipedia.com/ Japanese (Nihongo)
http://pt.wikipedia.com/ Portuguese (Português)
http://ru.wikipedia.com/ Russian (Russkiy)
http://es.wikipedia.com/ Spanish (Castellano)
http://sv.wikipedia.com/ Swedish (Svensk)
Please, send translations of this very important text to jasonr(a)bomis.com.
Thanks and best regards,
What's going on with the Wikipedia site?
I cannot go to any page anymore except for the homepage of the Int. WP.
When I analyzed it, I found that all pages
which show up as (or are called by) a path, e.g. www.wikipedia.com/wiki/Item, are
All pages which are somehow called by www.wikipedia.com/wiki.cgi?Item
I've tested that the other way round on one German page too.
Server rights changed?
With kind regards,
StefanRybo in Wikipedia
apologies for any language errors (please correct)