Hey,
I am very interested in the idea of a Taxobox. I have a method for generating it with a basic Python script. The script would gather all the data and store it in the taxonomy templates, and we could then display it through the display templates.
I will look into the merits and demerits of automatic Taxobox generation, and you will receive my proposal in the following week.
I just need to know whether I am on the right track here.
Regards, Ashwin
Hey Ashwin,
Where are you getting the data for the taxobox? Does it need human supervision? If not, and depending on how heavy the processing is, this could make a very cool Lua template that generates the taxobox dynamically.
On Mon, Apr 2, 2012 at 1:47 PM, Ashwin Ravichandran ashwin107@gmail.com wrote:
-- Ashwin.S.Ravichandran _______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Hey Martijn,
The data for the Taxobox can be obtained from Wikipedia itself, can't it? Imagine we had to generate the template for the animal "Elephant" (http://en.wikipedia.org/wiki/Elephant).
We could write a Python script that extracts the data from that page and stores it in the Taxonomy template. We could then have it rendered correspondingly by the display template.
We can generate the taxobox using a Lua template, or we can derive our own template, which would make it more user friendly.
Regards, Ashwin
Could you clarify how exactly you want to generate the data for the Taxobox [1]? Do you plan to use Natural Language Processing? Or do you want to do the opposite and parse the template, like the guys from dbpedia.org do?
[1] http://en.wikipedia.org/wiki/Template:Taxobox
-----
Yury Katkov
Yury,
I was thinking more along the lines of what the DBpedia people do. They have a strong algorithm for making use of the data. But I hadn't thought about NLP, thanks for the input. :)
I was looking more at deriving the information from the page itself: checking the whole page for all the info we need and then storing it.
Cheers, Ashwin
Ah, that will certainly need human supervision for automatic generation. The chances that we will be able to interpret the line
Elephants are large land mammals in two extant genera of the family Elephantidae: Elephas and Loxodonta, with the third genus Mammuthus extinct.
as
Elephant := genus ( Elephantidae Elephas || Elephantidae Loxodonta || Elephantidae Mammuthus)
and then resolve the genera from there with 100% accuracy seem pretty slim, no matter how sophisticated the script is. But writing a script that assists a human user in making a taxobox should be possible.
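To make the "assist a human" idea concrete, here is a rough sketch of what such a helper might look like. It is not a proposal for the actual implementation: the function name `suggest_taxa` and the two heuristics (the word after "family", and any other capitalised word) are assumptions made up for this example, and they are tuned to the one sentence quoted above. The point is that the script only proposes candidates and always flags them for review.

```python
import re


def suggest_taxa(sentence):
    """Propose taxa from a lead sentence using naive heuristics.

    Hypothetical helper: the result is a suggestion for a human editor,
    not a verified classification.
    """
    # Heuristic 1: the capitalised word right after "family" is the family.
    family_match = re.search(r"\bfamily\s+([A-Z][a-z]+)", sentence)
    family = family_match.group(1) if family_match else None

    # Heuristic 2: every other capitalised word, except the word that
    # starts the sentence, is a candidate genus.
    words = sentence.split()
    first_word = words[0] if words else ""
    capitalised = re.findall(r"\b[A-Z][a-z]+\b", sentence)
    genera = [w for w in capitalised if w != family and w != first_word]

    return {"family": family, "candidate_genera": genera, "needs_review": True}


lead = ("Elephants are large land mammals in two extant genera of the family "
        "Elephantidae: Elephas and Loxodonta, with the third genus Mammuthus extinct.")
suggestion = suggest_taxa(lead)
```

Heuristics this crude will misfire on most other articles (place names, people, and common names are capitalised too), which is exactly why the output carries `needs_review` instead of being written to a template directly.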
Agreed, but we will be diving into further classification, won't we?
Imagine:
Elephant := (Elephantidae Elephas || Elephantidae Loxodonta || Elephantidae Mammuthus)
But here we didn't specify which type of elephant.
Imagine we have the Asian Elephant:
Then we know that Asian Elephant := (Elephantidae Elephas)
Whereas African Elephant := (Elephantidae Loxodonta)
and the extinct genus, Extinct := (Elephantidae Mammuthus).
With a script like the above, we might not be 100% correct, but at least we are aiming for 100%. Taxobox generation should be quite easy after that.
Cheers, Ashwin
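The mappings in the message above could be held in a small lookup table along these lines. This is only an illustrative sketch: `Taxon`, `CLASSIFICATION`, and `resolve` are names invented for the example, and the table contains nothing beyond the three mappings mentioned in the thread.

```python
from typing import NamedTuple


class Taxon(NamedTuple):
    family: str
    genus: str
    extinct: bool = False


# Hypothetical lookup table holding only the mappings from the message above.
CLASSIFICATION = {
    "Elephant": [
        Taxon("Elephantidae", "Elephas"),
        Taxon("Elephantidae", "Loxodonta"),
        Taxon("Elephantidae", "Mammuthus", extinct=True),
    ],
    "Asian Elephant": [Taxon("Elephantidae", "Elephas")],
    "African Elephant": [Taxon("Elephantidae", "Loxodonta")],
}


def resolve(name):
    """Return the single taxon for a name, or None if it is unknown or ambiguous."""
    candidates = CLASSIFICATION.get(name, [])
    return candidates[0] if len(candidates) == 1 else None
```

With this table, `resolve("Asian Elephant")` yields one taxon, while `resolve("Elephant")` returns `None` because three genera match, so the general name would still have to go to a human.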
Virtually all Wikipedia articles that need one already have a taxobox, which will be far easier to process than the lead sentence, so I'm not sure where the need for natural language processing comes in. Also, are you aware of the existing automatic taxobox system on en.wikipedia (https://en.wikipedia.org/wiki/Template:Automatic_taxobox)?
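As a rough illustration of why an existing taxobox is easier to process than prose, the sketch below pulls the `key = value` parameters out of a template call with a few lines of Python. The sample wikitext is made up for the example, and `parse_taxobox` is a hypothetical helper: real taxoboxes contain wiki links and nested templates, which this naive split on `|` does not handle.

```python
import re

# Made-up sample for illustration; not copied from any article.
SAMPLE_WIKITEXT = """{{Taxobox
| name = Asian elephant
| regnum = Animalia
| ordo = Proboscidea
| familia = Elephantidae
| genus = Elephas
}}"""


def parse_taxobox(wikitext):
    """Return the parameters of the first {{Taxobox ...}} call as a dict.

    Naive sketch: assumes no nested templates or piped links in the values.
    """
    match = re.search(r"\{\{\s*Taxobox\b(.*?)\}\}", wikitext,
                      re.DOTALL | re.IGNORECASE)
    if match is None:
        return {}
    fields = {}
    # Parameters are separated by "|"; the text before the first "|" is skipped.
    for part in match.group(1).split("|")[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


fields = parse_taxobox(SAMPLE_WIKITEXT)
```

For anything beyond a sketch, a proper wikitext parser would be a safer choice than a regular expression.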
Jelle,
I saw the Automatic taxobox. But it doesn't seem very user friendly, and isn't that what we are working on, so that it becomes more feasible to use? We can actually use NLP to a major extent, through text extraction, which can be quite helpful.
Cheers, Ashwin
I definitely agree that the Automatic taxobox needs a better user interface - that is the biggest obstacle to its adoption. Right now there is a significant learning curve to being able to use it. I would support a project to improve the user interface of the existing Automatic taxobox, but frankly I don't see much value in using NLP to populate the data. In fact, I would hesitate to automatically populate any of the data for any taxobox from article content. The taxonomy in Wikipedia articles is notoriously unreliable and outdated and very frequently contradictory. Just look through the mess we have in our Zebra articles and you'll see what I mean. And if we can't get Zebras right, imagine what our taxonomy is like for arthropods!
Ryan Kaldari
Ryan,
*I definitely agree that the Automatic taxobox needs a better user interface - that is the biggest obstacle to its adoption. Right now there is a significant learning curve to being able to use it. I would support a project to improve the user interface of the existing Automatic taxobox,*
Thanks a ton for clearing that up. Now I know what to put in the proposal.
*but frankly I don't see much value in using NLP to populate the data. In fact, I would hesitate to automatically populate any of the data for any taxobox from article content. The taxonomy in Wikipedia articles is notoriously unreliable and outdated and very frequently contradictory. Just look through the mess we have in our Zebra articles and you'll see what I mean. And if we can't get Zebras right, imagine what our taxonomy is like for arthropods!*
So what you are suggesting is that we should generate the taxonomy using other sites? Wiki = Unreliable?
Cheers, Ashwin
wikitech-l@lists.wikimedia.org