Can someone help me help Roberto? He would like to use Wikipedia
texts in a magazine that helps people to learn English. He would
take our texts and give line-by-line translations. He is happy to
release his translations under the GNU FDL but wants advice on just
what he needs to do. What sort of notice should he put on the
articles?
--Jimbo
----- Forwarded message from roberto casiraghi <robertocasiraghi(a)iol.it> -----
From: "roberto casiraghi" <robertocasiraghi(a)iol.it>
Date: Thu, 27 Feb 2003 15:08:13 +0100
To: <jwales(a)bomis.com>
Subject: Can Wikipedia articles be translated or "manipulated" to teach
English?
Hello.
I am the publisher of English4Life, an Italian magazine that teaches English by providing a double translation into Italian (word by word and in good Italian) and a complete pronunciation guide.
My question is: can I use Wikipedia texts for my purpose? I would leave the texts unchanged but supply suitable translations and comments. Do I have to ask for special permission in order to do that?
If this particular use of your texts is permitted under the Open Content license, will I lose my copyright on all the translations and comments (i.e. will they become available under the Open Content License too)?
Kind regards.
Roberto Casiraghi - English4Life
Casiraghi Jones Publishing srl
Via Marconi 28
20091 Bresso (Milano)
website: www.linguefaidate.com
----- End forwarded message -----
> I am not sure if I understand the idea correctly, but here are
> my thoughts.
No, I don't think you understood my idea correctly. I
simply wanted a bot to go through the English
Wikipedia, count the interlanguage links for each
article, and then list which articles have the most. I
know it's not going to be perfect, but it's an
estimate. I mean, our article count isn't perfect
either. ;-) The statistics could also measure
how many interlanguage links there are for each
language. Could be interesting.
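For what it's worth, the counting step of such a bot could be sketched roughly as below. This is only an illustration in Python, not an existing tool: it assumes the article wikitext has already been fetched somehow, that interlanguage links look like [[xx:Title]] with a two- or three-letter lowercase language code, and the function names (interlanguage_links, rank_by_languages, links_per_language) are made up for the example.

```python
import re
from collections import Counter

# Assumption: an interlanguage link is written [[xx:Title]], where xx is a
# two- or three-letter lowercase language code. Capitalised namespaces such
# as [[Wikipedia:About]] do not match this pattern.
INTERLANG = re.compile(r"\[\[([a-z]{2,3}):([^\]]+)\]\]")


def interlanguage_links(wikitext):
    """Return the set of language codes linked from one article's wikitext."""
    return {match.group(1) for match in INTERLANG.finditer(wikitext)}


def rank_by_languages(articles):
    """articles maps title -> wikitext; return (title, count) pairs, most links first."""
    counts = {title: len(interlanguage_links(text)) for title, text in articles.items()}
    # Sort by descending count, then by title so ties are stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def links_per_language(articles):
    """Count how many articles link to each language."""
    tally = Counter()
    for text in articles.values():
        tally.update(interlanguage_links(text))
    return tally


if __name__ == "__main__":
    # Hypothetical sample data, only to show the call shape.
    sample = {
        "Dog": "A dog. [[de:Hund]] [[eo:Hundo]] [[fr:Chien]]",
        "Titanium": "A metal. [[de:Titan]]",
    }
    print(rank_by_languages(sample))
    print(links_per_language(sample))
```

As noted above, the result is only an estimate: it can be no more complete than the interlanguage links themselves.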
> In its early stage, I guess, users of a non-English wikipedia
> may be dominantly bilinguals, non-native speakers, etc. (Well,
> Esperanto wiki and maybe some others will always remain so. :-)
Yeah, I think we only have one native speaker of
Esperanto in the Wikipedia. He really helped out on
our article about native Esperanto speakers, though.
:)
Chuck
=====
Learn Esperanto! - http://www.lernu.net/
Enciklopedio: http://eo.wikipedia.org/
___________________________________________________
Yahoo! Móviles
Personaliza tu móvil con tu logo y melodía favorito
en http://moviles.yahoo.es
>>I asked this before, but I don't think anyone replied.
>>Is there any way to see a list of articles based on
>>how many languages they're in? It would be interesting
>>to see which articles are the most international or
>>the most important...
I am not sure if I understand the idea correctly, but here are
my thoughts.
1.
There is perhaps no easy way to see the list. It would take a
well-developed Wiktionary to find out how "United Nations,"
"Dog," or any other good candidate is spelled in all the
different languages.
We can use inter-language links, but they are not complete.
The only way is to build a list through an interlingual
collaborative research project.
It could be daunting, but the scope of the research could
be limited to
a) the list of articles in the smallest wikis
and/or
b) just the top 100 most-viewed
articles on each site.
2.
I have some guesses about which articles exist in the
greatest number of languages.
- Wikipedia probably exists on most sites.
- GNU_FDL, because it is linked from most pages, may be
as prevalent.
- Basic academic disciplines such as mathematics, linguistics,
and sociology, because they are likely to be linked from
main pages. Lists of world countries, languages, etc.
would perhaps be as popular.
While these are not unimportant subjects, they do not
represent "what's globally important" well. (I'm assuming
that's what Chuck wanted to observe.)
Beyond that, I expect two types:
Some articles are placed on very inactive sites because a
user wants to distribute an article multilingually. I
have seen at least two such articles in the Japanese wiki,
perhaps machine-translated. In its early stage, I guess,
users of a non-English wikipedia may be dominantly
bilinguals, non-native speakers, etc. (Well, Esperanto
wiki and maybe some others will always remain so. :-)
Then computer-related topics would be popular because
of the demographics of early adopters.
With many wikis still in their infancy, those may turn
out to be the ultimate winners, scoring points from the
small wikis.
But I'm not saying the idea of a list of articles based
on # of languages is uninteresting.
If we really construct one, something else may come up,
say, "Beatles" or "European Union," that would be
interesting. I'm curious, indeed.
I would be even more interested if different lists existed for
people, countries, artists, movies, novels, music, etc.
Did I answer your question? If my take is correct and
you initiate an interlingual research project,
please let me know.
Tomos
_________________________________________________________________
MSN 8 with e-mail virus protection service: 2 months FREE*
http://join.msn.com/?page=features/virus
On Monday 17 February 2003 12:24 pm, Magnus Manske wrote:
> So, it seems (if I interpret Jimbo's mail on wikitech and the discussion
> here correctly) that most of us would like *some kind* of category
> scheme in wikipedia. I do, too! But, we seem to differ on the details
> (shocked silence!).
>
> So far, I saw three concepts:
> 1. Simple categories like "Person", "Event", etc.; about a dozen total.
> 2. Categories and subcategories, like
> "Science/Biology/Biochemistry/Proteomics", which can be "scaled down" to
> #1 as well ("Humankind/Person" or something)
> 3. Complex object structures with machine-readable meta-knowledge
> encoded into the articles, which would allow for quite complex
> queries/summaries, like "biologists born after 1860".
>
> Pros:
> 1. Easy to edit (the wiki way!)
> 2. Still easy to edit, but making wikipedia browseable by category,
> fine-tune Recent Changes, etc.
> 3. Strong improvement in search functions, meta-knowledge available for
> data-mining.
>
> Cons:
> 1. Not much of a help...
> 2. We'd need to agree on a category scheme, and maintenance might get a
> *little* complicated.
> 3. Quite complex to edit (e.g., "<category type='person'
> occupation='biologist' birth_month='5' birth_day='24' birth_year='1874'
> birth_place='London' death_month=.....>")
>
> For a wikipedia I'd have to write myself, I'd choose #3, but with
> respect to the wiki way, #2 seems more likely to achieve consensus (if
> there is such a thing;-)
>
> Magnus
Hm. I agree that #1 would be nearly useless and #3 is asking too much of mere
mortals, but #2 smells a lot like subpages. I remember one of the arguments
against subpages was that there are multiple ways to express the hierarchy of
most subjects.
So, for example, an alternate hierarchy for [[proteomics]] might focus more on
historical development instead: Science/Biology/Molecular
biology/Genetics/Proteomics or, depending on your opinion, even
Science/Biology/Molecular biology/Biochemistry/Proteomics. Biographies would
be even more difficult: Biographies/Science/Physics/Albert Einstein vs
Biographies/Science/Physics/Theoretical physics/Albert Einstein vs
Science/Physics/Albert Einstein. But he was also a peace activist, so:
Biographies/Politics/Peace activists/Albert Einstein or even
Biographies/Politics/Political movements/Peace activism/Albert Einstein.
Having hierarchies is also bad database design for the above reasons and
because novel and interesting relationships cannot be searched for unless a
human has already created a hierarchy specific to that relationship.
What would be much better is to allow a spider to follow links starting from
the Main Page and automatically classify articles based on what they link to
and what links to them. The spider would check the classification of articles
linked to and from the unclassified article in order to classify it.
Additional tweaks could be added for the spider to determine the
classification of the article based on the presentation of its content and
whether or not they are linked from certain pages that are given more weight.
Our many lists would be very useful here; if something that looks like a
biography to the spider is listed on [[List of astronomers]] then the spider
would classify that article as "astronomer". The classification "astronomer",
in turn, would already be classified under "astronomy", "science" and
probably many other things at varying "weights of relevance".
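To make the idea a bit more concrete, here is a rough Python sketch of one way such a spider might spread weighted classifications through the link graph. It is a plain label-propagation toy, not a worked-out design: the link graph and the seed categories (for example, taken from a page like [[List of astronomers]]) are assumed to be available already, and the function name classify and the damping and rounds parameters are invented for the illustration.

```python
from collections import defaultdict


def classify(links, seeds, rounds=3, damping=0.5):
    """Spread category weights from seed pages to pages linked to or from them.

    links: dict mapping a title to the set of titles it links to.
    seeds: dict mapping a title to {category: weight}, assigned by humans.
    Returns a dict mapping titles to {category: weight}.
    """
    # An article is informed both by what it links to and by what links to it,
    # so the graph is treated as undirected.
    neighbors = defaultdict(set)
    for source, targets in links.items():
        for target in targets:
            neighbors[source].add(target)
            neighbors[target].add(source)

    scores = {title: dict(weights) for title, weights in seeds.items()}
    for _ in range(rounds):
        updated = {title: dict(weights) for title, weights in scores.items()}
        for title, linked in neighbors.items():
            if title in seeds:
                continue  # seed pages keep their human-assigned weights
            total = defaultdict(float)
            for other in linked:
                for category, weight in scores.get(other, {}).items():
                    total[category] += weight
            if total:
                # Average over all neighbours and damp, so the weight fades
                # with distance from the seed pages.
                updated[title] = {
                    category: damping * weight / len(linked)
                    for category, weight in total.items()
                }
        scores = updated
    return scores
```

A page linked directly from a seed list ends up with a higher "astronomer" weight than a page that is two links away. A real spider would also need the content-based hints and the human feedback loop.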
Biographies, for example, have a certain format (at least in en.wiki) whereby
the birth and death dates are in parentheses after the name on the first
line. Most of them also state on the first line the country of origin of the
person and their occupation(s). And many are listed on the list of
biographies and lists of people pages and in the birth and death sections of
year and day articles.
All this can be used to classify the article so that the types of queries that
you mention in #3 could be done. Another example is the use of the <math> tag
for writing formulas. So having that tag in an article would be /one/ thing
that would be considered by the spider in determining its classification (but
the spider wouldn't categorize the article under "mathematics" unless other
articles linked to and from it are already categorized that way).
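As a small illustration of mining the first-line convention, the Python sketch below pulls birth and death years out of the first parenthetical on an article's first line and then answers a "born after 1860" style query. It assumes the dates really do follow the convention described above and that both years have four digits; the names life_years and born_after are hypothetical.

```python
import re

PAREN = re.compile(r"\(([^()]*)\)")   # first parenthetical, no nesting
YEAR = re.compile(r"\b(\d{4})\b")     # four-digit years only (an assumption)


def life_years(article_text):
    """Return (birth_year, death_year) from the first line, or None if not found."""
    lines = article_text.strip().splitlines()
    if not lines:
        return None
    match = PAREN.search(lines[0])
    if not match:
        return None
    years = [int(year) for year in YEAR.findall(match.group(1))]
    # Require exactly two years in a plausible order; anything else is ambiguous.
    if len(years) != 2 or years[0] > years[1]:
        return None
    return years[0], years[1]


def born_after(articles, year):
    """articles maps title -> text; return sorted titles whose birth year is after `year`."""
    found = []
    for title, text in articles.items():
        years = life_years(text)
        if years is not None and years[0] > year:
            found.append(title)
    return sorted(found)
```

Living people, BC dates, and articles that do not follow the format would all fall through as None here, which is one reason human feedback would still be needed.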
And as time goes by we can make this categorization spider more and more
sophisticated. But it will be rather crude at first so a human-powered
feedback mechanism would have to be put into place to tweak the spider. The
many lists that we already have can be a very useful way to prime the spider
to quickly improve its accuracy.
In short, let the complex linking and standardized article formatting already
present in Wikipedia determine its own "weighted relationship" categorization
with a minimal amount of human intervention. There is a goldmine of
categorization information already in Wikipedia that hasn't been tapped yet
(like in the BBC program Connections, Wikipedia could eventually be used to
find odd but interesting connections between disparate subjects along with
the more predictable relationships).
Yeah, I know - I'm just dreaming. But it would be a very neat thing to have
and it would greatly minimize the amount of work (and inconsistent guesswork)
that humans would have to do.
But the above is probably already patented by somebody.... Software patents
are evil!
-- Daniel Mayer (aka mav)
WikiKarma:
I expanded and converted [[Titanium]] over to the WikiProject Elements format.
On Tuesday 25 February 2003 06:53 pm, wikipedia-l-request(a)wikipedia.org wrote:
> Wikipedia isn't just in English, folks. An English-dependent syntax is
> *not* acceptable.
>
> -- brion vibber (brion @ pobox.com)
*cough* HTML (<table> <font size="{}"> <center> <small> <br>...) *cough*
We should be better than that but how can you make something human readable if
it isn't written in any human language? Translating the syntax would work
except for the fact that contributors would not be able to copy and adapt
metadata from one language to another. That is, unless we allowed all
translated syntax to be in one "basement" - but then we are back to the
unreadable aspect.
--mav
WikiKarma
Added a bunch of events to [[February 19]]; updated all the year pages and
many of the other articles linked from that page.
As far as language links go, can't it be a simple matter of categorizing by
script?
<LANGA> - would be understood by almost any reader of a Latin-script language
as a potential starting point... This would lead to a disambiguation page of
Latin-based languages...
<Simplified Kanji here> - is readable by Chinese, Japanese, Koreans, etc...
That alone takes care of probably 85 percent of the world...
<Arabic-based text here> - could even be Aramaic! For an icon, it is
similar enough, and could be the basis for all Aramaic-based languages:
Hebrew, Arabic... Urdu?
Three icons, and we've taken care of 92 percent of the world...
The rest... what, Cyrillic? It might be combined with the Latin, with a
backwards N or something...
Thai and various other types of script can be incorporated: Thai into
Semitic or Asian, etc. In other words, icons are the way to go; you
just have to understand how they work to elicit a response. To someone who
prefers to read Kanji, the (middle sign) or the (character sign) is an
island in the ocean...
Stevertigo...
I asked this before, but I don't think anyone replied.
Is there any way to see a list of articles based on
how many languages they're in? It would be interesting
to see which articles are the most international or
the most important...
Thanks,
Chuck
=====
Learn Esperanto! - http://www.lernu.net/
Enciklopedio: http://eo.wikipedia.org/
Regarding Japanese article count:
yes, the number is kind of inaccurate. The problem is not unique to
Japanese, but shared with Chinese and others, I believe. The reason is as
pointed out: Japanese writing can go on at length without a Western-style
(ASCII) comma.
The problem has been known to some Japanese users, because one user
translated [[Wikipedia:What is an article]] and noticed the article count's
behavior. I also experimented with a fairly long article (10921 bytes). As
soon as I deleted all the commas and saved, the article count dropped by one.
When I reverted, the count went up by one.
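The behavior in that experiment is consistent with a counting rule along the lines of "a page counts as an article only if it contains a comma". The Python sketch below is only a guess at that rule, inferred from the experiment rather than taken from the actual software, and it shows how accepting the ideographic and fullwidth commas as well would change the result. The names counts_as_article and CJK_COMMAS are made up.

```python
ASCII_COMMA = ","
# ASCII comma, U+3001 IDEOGRAPHIC COMMA, U+FF0C FULLWIDTH COMMA
CJK_COMMAS = ",\u3001\uff0c"


def counts_as_article(text, commas=ASCII_COMMA):
    """Assumed rule: a page is counted if it contains at least one 'comma' character."""
    return any(char in commas for char in text)


if __name__ == "__main__":
    japanese = "ウィキペディアは、百科事典です。"
    print(counts_as_article(japanese))              # ASCII-only rule
    print(counts_as_article(japanese, CJK_COMMAS))  # rule extended to CJK commas
```

Under the ASCII-only rule a long Japanese page can go uncounted, while the extended rule would count it.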
I then wrote about it in [[Wikipedia_talk:What is an article]], but I guess
it didn't get much attention. I am glad to find out that elian brought this
up on this list.
Regarding Embassy, I don't mind setting one up on English wiki (I'm a
native Japanese speaker and understand English okay), but I am afraid I
cannot fully function as an ambassador. Would a thinly staffed embassy with a
part-time ambassador be acceptable?
Tomos
> Question is : what is the *most* important message to
> convey, that we are multilingual, or that we are an
> encyclopedia ?
Of course, it is that we are an encyclopedia, but it
is fine to have one list of the non-English
Wikipedias, because we've already been relegated to
using subdomain names instead of the prestigious www.
> Most people read only one language. Nobody can read 30 languages.
> (very few anyway). Even people who can read more than one language
> have a preference.
You're an American, right? I live in Rotterdam
(although I'm an American too) and I don't know anyone
here who would only read Wikipedia in one language. I
know people who go first to Wikipedia simply
*because* it's the only multilingual encyclopedia in
existence (from what I know, correct me if I'm wrong).
It's incredibly enlightening to see the different
POV that articles get in different languages, and my
European friends think it's funny to see how often
the English language articles in Wikipedia are biased
toward an American POV.
*Everyone* on the Esperanto Wikipedia can (and does)
read two languages and most can read three or four.
Most of the people who would have Internet access and
would read Wikipedia can and will read more than one
language! The majority of people who would only read
Wikipedia in one language would be in the United
States, England and Australia, which together make up
roughly 6% of the world population...
> I think the French wikipedia is correct in not placing language lists
> on three sides of the main page the way we do in English. I can find
> my four languages on one list. I don't need three lists.
The three lists on the English Wikipedia are a
temporary measure until we can make www a multilingual
portal; then we can eliminate the extra language
lists on en.wikipedia.org and cut down to one
language list. The other question is, how many first
time visitors will see their languages on just one
list? How many would even notice the language list at
all if it weren't shown three times?
The language list appears three times to
appease the people who find it offensive that www
isn't a multilingual portal. Don't forget that it was
one of the reasons why the Spanish Wikipedia broke off
from us AND why it won't rejoin.
> I don't understand why I seem to be coming across like some kind of
> rude crank on this. I don't mean to. I apologize again to everyone.
> I have nothing against anyone's language or anyone's wikipedia. I
> just don't see the point of the redundant, inconsistent information on
> available languages.
Well, it's because you don't see language
discrimination around you every day. People in French
Canada are getting their native language shoved aside
by English and they needed laws to protect them. Most
countries in Europe don't have laws to protect their
languages and are getting run over by English. Some
job offers are only for native English speakers, which
means that even if your English is perfect, but you
aren't a native speaker, you're not getting the job.
This can be quite an emotional issue for some,
especially those who are fighting against language
discrimination full-time. That's why I'm here in
Rotterdam volunteering for TEJO (www.tejo.org).
Language discrimination is a tough issue and it's not
going to go away anytime soon... I just want to add,
that I don't think that you personally are
discriminating, but that you're simply not aware of
the problem. I hope I wasn't too harsh, it's just
that I feel quite strongly about these issues...
Chuck
=====
Learn Esperanto! - http://www.lernu.net/
Enciklopedio: http://eo.wikipedia.org/
At the end of August last year, anonymous contributor 24.80.230.145
contributed this,
http://www.wikipedia.org/w/wiki.phtml?title=Gardner_Fox&oldid=191349
sprung fully formed from the head of Zeus, and a few other comic book writer
profiles, then abruptly vanished.
Well, that's uncannily like this:
http://www.google.com/search?q=cache:_L8AdgxKTRwC:www.geocities.com/Athens/…
Firstly, the wiki article is clearly based on the geocities one (or vice
versa, but I'd say that's unlikely), and the wording is very similar but
not identical. How close must texts be to be a breach of copyright?
--
Gareth Owen
"I love the wikipedia, but sometimes I get the impression that certain people
on this list are very bored, and so argue about something when there's really
bugger all to argue about. Edit some articles, for god's sake." -- LP