I have been pondering this for some time, and I would like some feedback. I figure there are many programmers on this list, but I think others might find it interesting as well.
Are you satisfied with our progress in increasing software sophistication as compared to, say, increasing the size of datacenters? Personally, I think there is still too much "reinventing the wheel" going on, and the best way to get to software that is complex enough to do things like high-fidelity simulations of virtual worlds is to essentially crowd-source the translation of Wikipedia into code. The existing structure of the Wikipedia articles would serve as a scaffold for a large, consistently designed, open-source software library. Then, whether I was making weather-prediction software and needed code to slowly simulate physically accurate clouds, or I was making a game and needed code to quickly draw stylized clouds, I could just go to the article for clouds, click on C++ (or whatever programming language is appropriate), and find some useful chunks of code. Every article could link to useful algorithms, data structures, and interface designs relevant to the subject of the article. You could also find data-centric programs, like a JavaScript weather statistics browser and visualizer that accesses Wikidata. The big advantage would be that constraining the design of the library to the structure of Wikipedia would handle the encapsulation and modularity aspects of the software engineering, so that the components could improve independently. Creating a simulation or visualization where you zoom in from a whole cloud to see its constituent microscopic particles is certainly doable right now, but it would be a lot easier with a function library like this.
If you look at the existing Wikicode and Rosetta Code, the code samples are small and isolated. They will show, for example, how to open a file in 10 different languages. However, the search engines already do a great job of helping us find those types of code samples across the blog posts of people who have had to do that specific task before. A problem that I run into frequently, and that the search engines don't help me solve, is this: if I read a nanoelectronics paper and I want to simulate the physical system it describes, I often have to go to the websites of several different professors and do a fair bit of manual work to assemble their different programs into a pipeline, and then the result of my hacking is not easy to extend to new scenarios. We've made enough progress on Wikipedia that I can often just click on a couple of articles to get an understanding of the paper, but if I want to experiment with the ideas in a software context I have to do a lot of scavenging and gluing.
I'm not yet convinced that this could work. Maybe Wikipedia works so well because the internet reached a point where there was so much redundant knowledge listed in many places that there was immense social and economic pressure to utilize knowledgeable people to summarize it in a free encyclopedia. Maybe the total amount of software that has been written is still too small, there are still too few programmers, and it's still too difficult compared to writing natural languages for the crowdsourcing dynamics to work. There have been a lot of successful open-source software projects, of course, but most of them are focused on creating software for a specific task instead of library components that cover all of the knowledge in the encyclopedia.
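To make that concrete, here is a rough, purely hypothetical sketch of the kind of listing the clouds article might link to under C++. None of these names exist anywhere; the point is only that one article-scoped interface could serve both the slow/accurate and the fast/stylized needs:

// Purely hypothetical sketch: what a "Cloud" article's C++ listing might
// expose. One article-scoped interface, two very different consumers.
#include <vector>

namespace cloud_article {

struct Atmosphere {                 // shared input parameters
    double temperature_kelvin;
    double relative_humidity;       // 0..1
    double altitude_metres;
};

struct DensityGrid {                // output for physically accurate work
    int nx, ny, nz;
    std::vector<float> density;     // nx * ny * nz samples
};

struct Sprite {                     // output for quick stylized rendering
    float x, y, z, radius, opacity;
};

// Weather-prediction use case: slow but physically motivated.
DensityGrid simulateAccurate(const Atmosphere& a, double dt_seconds);

// Game use case: fast, stylized billboards.
std::vector<Sprite> generateStylized(const Atmosphere& a, unsigned seed);

}  // namespace cloud_article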
Thanks for sharing your thoughts, Michael. This is something that has been bothering me for a while too, and not only in programming but also in other technical domains like electronics.
In my opinion, the reason programming (or technical design in general) hasn't followed the wiki model is that it has structural differences that require a different approach. To start with, there is the problem of integration: code solutions are usually part of a larger system, and they cannot be isolated or combined with other blocks as easily as you would combine text fragments in Wikipedia. I'm sure all those ten file-opening examples have particularities regarding the operating system, method, supporting libraries, etc. The scavenging and gluing will always be there unless you follow the approach used in hardware design (wp: semiconductor intellectual property core).
Since that kind of modularity is hard to set up at a large scale beyond what is already established, it would be more practical to focus on what can be improved more easily, which is the scavenging. Instead of copying code fragments, it would be better to point to the fragment in the source code project itself, while at the same time providing the semantic tags needed to describe that fragment. This can be done (more or less) with existing semantic annotation technology (see thepund.it and DBpedia Spotlight).
If this has not been done before, it is perhaps because semantic tools are now in transition from "adaptation of an emerging technology" to "social appropriation of that technology". It took six years for the wiki concept to be transformed into Wikipedia, and more or less the same number of years between SMW and Wikidata. Semantic annotation of code will eventually happen; how fast will depend on interest in such a tool and the success of the supporting technologies.
Micru
I'm glad you mentioned that the same issue applies to electronics. I suppose I could have just referred to Moore's law instead of the relatively recent growth in the size of datacenters. I like asking computers to work hard, but I find it hard to think of valuable things for them to do. You can play a new game or donate time to BOINC, but not very many great games are produced each year, and BOINC typically runs algorithms that benefit humanity but not specifically you. For example, my genetic tests say I have an increased risk of prostate cancer, so I'd like to be able to tell Folding@home to focus on the proteins that are most relevant for the diseases I'm most likely to get.
I still have hope that a more wiki-like model could work for developing software libraries, though. The problems of technical design in software and hardware are similar, but software can be developed more fluidly and rapidly due to the lower barrier to entry and non-existent manufacturing costs. Essentially all electronics are designed and simulated with software prior to constructing physical prototypes these days.
I've thought about the integration problem some, but I haven't ironed out how it would all work yet. I think standard object-oriented programming and modeling techniques have been absorbed by enough programmers that it might be worth a shot, though. Essentially, each article would have a standard class and supporting data structures or file formats for the inputs and outputs of its algorithms. It would be like the typical flow-chart or visual programming languages you can use with libraries like Modelica, but on a larger scale, and the formats would often be more complex. So, for example, you would have a class representing a cloud, with flags for different representations (density values in a cubic grid, collections of point particles, polygonal shape approximations, etc.) that are used by different algorithms. Then you would have code that can convert between all of the representations, code for generating random clouds (with potentially lots of optional parameters to specify atmospheric conditions), code for outputting images of the generated clouds in different styles, and algorithms for manipulating them through time. (A rough sketch of what such a class might look like is at the end of this message.) If I wanted to see the effects on a specific cloud I've made drifting over the ocean in different atmospheric conditions, I could grab the code to instantiate 3D Euclidean space with a virtual camera, add some gravity, add some ground, add some water, add an atmosphere, add my cloud, and then simulate it with adjustable parameters for the accuracy and speed of computation. Now, that leaves out a lot of details, but I don't know of another way to easily mix capabilities from high-end graphics software and various specialized simulation algorithms. Graphics software typically gives you some simulation capabilities, and simulation software typically gives you some graphics functionality, but I want lots of both.
I think having more semantic annotation tools will be great, but I don't spend most of my time doing searches. There is an astounding amount of information, data, and media on the internet, but it's not hard to find the edge if you really try. It's pretty crazy how many results come up if you search for images of "blue bear", but if you search for "blue bear and green gorilla" you don't get anything useful. Then you get to face the craziness of how many options you have for combining a picture of a blue bear and a different picture of a green gorilla into one picture. I think it's interesting what they are trying with the Wolfram Alpha website, but they will always have to limit the time of the computations they allow you to do on their servers, so that's why I think we need better libraries to more easily program the computers we have direct control over.
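As promised above, here is a rough sketch of the multi-representation cloud class I have in mind. Everything is hypothetical and heavily simplified; it is only meant to show the shape of the idea:

// Hypothetical sketch only: a multi-representation Cloud class along the
// lines described above. A real design would need far more detail.
#include <array>
#include <string>
#include <vector>

class Cloud {
public:
    // The representations different algorithms work with.
    enum Representation { DensityGrid, PointParticles, PolygonMesh };

    // Which representations are currently populated.
    bool has(Representation r) const { return populated_[r]; }

    // Converters keep the representations consistent, so a renderer that
    // wants particles can still consume a cloud produced as a grid.
    void convertTo(Representation target);

    // Generation with optional atmospheric parameters.
    static Cloud randomCloud(double relativeHumidity, double temperatureKelvin,
                             unsigned seed);

    // Output in different styles, and evolution through time with an
    // adjustable accuracy/speed trade-off.
    void renderImage(const std::string& style, const std::string& outputPath) const;
    void step(double dtSeconds, double accuracy);  // 0 = fast, 1 = physical

private:
    bool populated_[3] = {false, false, false};
    std::vector<float> densityGrid_;                 // DensityGrid samples
    std::vector<std::array<float, 3> > particles_;   // PointParticles
    // ... mesh data for PolygonMesh would go here
};

The converters are the part that would make the library composable: a weather model could fill in the density grid, and a game renderer could still ask for particles.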
On Sat, Jul 6, 2013 at 10:16 PM, Michael Hale hale.michael.jr@live.com wrote:
[...] I think we need better libraries to more easily program the computers we have direct control over.
There is another interesting approach, http://livecode.com/, where code is generated from sort-of-natural-language statements. Maybe a semantic Wiktionary would be able to help there some day. As for attaching physical effects to 3D entities, it has been done many times. The closest thing to that kind of modularity is Unity3D. It is maybe not advanced enough for scientific purposes, but it is widely used for video games.
Micru
Yes, I'm just trying to do my part to help turn "someday" into "someday soon". The problem you quickly run into with approaches like LiveCode is that some natural words are much more difficult to translate into code than others. We ultimately want to be able to program with very high-level verbs like "draw", "visualize", "simulate", etc. Most of those are transitive, though, so they will require immense amounts of code to specify the details of the objects passed to them if we want programming to approach the fluidity of our thoughts. So we have to figure out how much of that detail the computer can learn in a somewhat automated way versus how much we have to specify manually at a lower level. I track the progress of video game engines rather closely. Unity is nice, but like all game engines it leans toward looking good rather than being physically accurate. I'm just looking for a way to get the best of both worlds. Of course, a true software library that attempted to provide functionality covering all subjects in the encyclopedia would be useful for a lot more than just 3D systems.
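To illustrate the point about transitive verbs, the verb itself can stay a one-liner in C++; it is the nouns passed to it that have to carry all the detail. The names here are purely hypothetical:

// Hypothetical sketch: the high-level verb stays one call, but the objects
// handed to it need an enormous amount of description behind them.
#include <string>

struct SimulationSettings {
    double dtSeconds;   // time step
    double accuracy;    // 0 = fast and stylized, 1 = physically accurate
};

struct Scene {
    // In a real library this would aggregate richly described objects:
    // gravity, ground, water, an atmosphere, a Cloud instance, a camera, ...
};

// The verbs we would like to program with.
void simulate(Scene& scene, const SimulationSettings& settings);
void draw(const Scene& scene, const std::string& stylePreset);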
I think software and most other engineering products need a much higher level of coherence than artefacts that are consumed by humans, like Wikipedia. Wikipedia is full of inconsistencies and even contradictions. When I browse Wikipedia, I often stumble upon statements on one page that are contradicted by another page. That's not a big deal - sometimes I can easily tell which statement is correct (and fix the other pages), sometimes I can't, but either way, I am not a computer: I don't follow these statements blindly. In a computer program, such inconsistencies would lead to erratic behavior, i.e., bugs. This means that a completely open wiki process will not work for software.
In a way, a lot of open source software is developed in a restricted wiki way: someone proposes a change, but before it is merged, it is checked by people who (hopefully) know all the nooks and crannies of the existing code. A bit like edit-protected pages in Wikipedia: everyone can propose changes on the talk page, but only admins can actually make these changes.
Christopher
I don't think there is anything fundamentally different about writing code, as opposed to a natural language, that should prevent such a project. I just think that in practice we aren't as good at it yet. All software has bugs, but we can often use it for extended periods of time without encountering one, and I can read many Wikipedia articles before I encounter an inconsistent statement. I think a project like this would just start with the source-control restrictions common to open-source and proprietary software: you have to have good code coverage in the test cases, and you can't check in changes that break the tests. That would require users to understand the code before they change it. People know that Wikipedia isn't perfect (neither are/were traditional encyclopedias), but it provides incomparable value regardless. Studies show that commercial software averages 20-30 bugs per 1000 lines of code. http://www.wired.com/software/coolapps/news/2004/12/66022
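To make the check-in gate concrete, here is a self-contained toy of the kind of test I mean. The converters here are trivial stand-ins invented for the example, not real library code, but the rule would be the same: a change doesn't merge unless tests like this still pass.

// Toy, self-contained check-in test: changes to the (stand-in) grid<->particle
// converters must not break conservation of total density.
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Particle { float x, y, z, mass; };

std::vector<Particle> gridToParticles(const std::vector<float>& grid) {
    std::vector<Particle> particles;
    for (std::size_t i = 0; i < grid.size(); ++i) {
        Particle p = {0.0f, 0.0f, static_cast<float>(i), grid[i]};
        particles.push_back(p);
    }
    return particles;
}

std::vector<float> particlesToGrid(const std::vector<Particle>& particles,
                                   std::size_t cells) {
    std::vector<float> grid(cells, 0.0f);
    for (std::size_t i = 0; i < particles.size(); ++i)
        grid[static_cast<std::size_t>(particles[i].z)] += particles[i].mass;
    return grid;
}

int main() {
    std::vector<float> before;
    before.push_back(0.1f);
    before.push_back(0.4f);
    before.push_back(0.0f);
    before.push_back(0.7f);
    std::vector<float> after = particlesToGrid(gridToParticles(before), before.size());

    float sumBefore = 0.0f, sumAfter = 0.0f;
    for (std::size_t i = 0; i < before.size(); ++i) sumBefore += before[i];
    for (std::size_t i = 0; i < after.size(); ++i) sumAfter += after[i];
    assert(std::fabs(sumBefore - sumAfter) < 1e-6f);  // round trip conserves mass
    return 0;
}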
There are currently 105,000 bugs in the Ubuntu bug database. https://bugs.launchpad.net/ubuntu
I am all for a "dictionary of code snippets", but as with all dictionaries, you need a way to group them, either by alphabetical order or "birth date". It sounds like you have an idea how to group those code samples, so why don't you share it? I would love to build my own "pipeline" from a series of algorithms that someone else published for me to reuse. I am also for more sharing of datacentric programs, but where would the data be stored? Wikidata is for data that can be used by Wikipedia, not by other projects, though maybe someday we will find the need to put actual weather measurements in Wikidata for some oddball Wikisource project tp do with the history of global warming or something like that.
I just don't quite see how your idea would translate in the Wiki(p/m)edia world into a project that could be indexed.
But then I never felt the need for "high-fidelity simulations of virtual worlds" either.
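As an aside, the kind of pipeline described above needs very little machinery once the published pieces agree on a calling convention. Here is a minimal sketch in Python, with made-up step names standing in for algorithms somebody else might have published:

# Sketch: chain independently written algorithms into one pipeline,
# assuming each step is a plain function from one value to the next.
# The step functions below are hypothetical placeholders.
from functools import reduce

def make_pipeline(*steps):
    """Compose the given functions left to right into a single callable."""
    return lambda data: reduce(lambda value, step: step(value), steps, data)

def parse_measurements(text):
    return [float(x) for x in text.split(",")]

def drop_outliers(values, limit=100.0):
    return [v for v in values if abs(v) <= limit]

def monthly_average(values):
    return sum(values) / len(values) if values else float("nan")

pipeline = make_pipeline(parse_measurements, drop_outliers, monthly_average)
print(pipeline("12.5, 14.0, 999.0, 13.2"))  # outlier dropped, prints ~13.23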
There are lots of code snippets scattered around the internet, but most of them can't be wired together in a simple flowchart manner. If you look at object libraries designed specifically for that purpose, like Modelica, you can do all sorts of neat engineering tasks, like simulating the thermodynamics and power usage of a new refrigerator design. Then, if your company is designing a new insulation material, you would make a new "block" with the experimentally determined properties of your material to include in the programmatic flowchart, so you can quickly calibrate other aspects of the refrigerator's design. To my understanding, Modelica is as large and capable as it gets for code libraries that represent physically accurate objects; the visual representation of those objects often has to be handled separately. As far as general-purpose, standard programming libraries go, Mathematica is the best one I've found for quickly prototyping new functionality. A typical "web mashup" app or site will combine functionality and/or data from 3 to 6 APIs. Mobile apps will typically use the phone's built-in functionality, an extra library for better graphics support, a proprietary library or two made by the company, and a couple of web APIs. The story is similar for desktop media-editing programs, business software, and high-end games, except the libraries are often larger. But there aren't many software libraries that I would describe as huge, and there are even fewer that manage to scale their usefulness in proportion to the size they occupy on disk.
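The "block with experimentally determined properties" idea can be sketched outside Modelica as well. A toy Python illustration of swapping one material block for another inside a larger model follows; the property values are invented placeholders, not measurements:

# Toy illustration of wiring a material "block" into a larger model:
# steady-state heat leak through a refrigerator wall, Q = k * A * dT / d.
from dataclasses import dataclass

@dataclass
class InsulationMaterial:
    name: str
    conductivity_w_per_m_k: float  # thermal conductivity k (invented values below)

@dataclass
class RefrigeratorWall:
    material: InsulationMaterial
    area_m2: float
    thickness_m: float

    def heat_leak_watts(self, inside_c, outside_c):
        k = self.material.conductivity_w_per_m_k
        return k * self.area_m2 * (outside_c - inside_c) / self.thickness_m

polyurethane = InsulationMaterial("polyurethane foam", 0.025)
new_material = InsulationMaterial("hypothetical new panel", 0.014)

wall = RefrigeratorWall(polyurethane, area_m2=4.0, thickness_m=0.05)
print(wall.heat_leak_watts(inside_c=4, outside_c=22))  # about 36 W
wall.material = new_material                           # swap in the new block
print(wall.heat_leak_watts(inside_c=4, outside_c=22))  # about 20 W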
Platform fragmentation (the increase in the number and popularity of smartphones and tablets) has proven to be a tremendous challenge for continuing to improve libraries. I now have 15 different ways to draw a circle on different screens. The attempts to provide virtual machines with write-once, run-anywhere functionality (Java and .NET) have failed, often as much for customer lock-in reasons as for platform fragmentation. Flash isn't designed to grow much beyond its current scope. The web standards can only progress as quickly as the least common denominator of functionality available by other means, which is better than nothing, I suppose. Mathematica's developers have continued to improve their library (that's essentially what they sell), but they don't try to cover many platforms. They also aren't open source and don't attempt to make the entire encyclopedia interactive and programmable. Open-source attempts like the Boost C++ libraries don't seem to grow very quickly. But I think using Wikipedia articles as a scaffold for a massive open-source, object-oriented library might be what is needed.
I have a few approaches I use to decide what code to write next, ranging from most useful as an exercise to stay sharp in the long term to most immediately useful for a specific project. Sometimes I just write code in a vacuum: I will pick a simple task, like making a 2D ball bounce around some stairs interactively, and spend a few hours writing it and rewriting it to be more efficient and easier to expand. It always gives me a greater appreciation for the kinds of details that can be specified to a computer (and hence the scope of the computational universe, the space of all computer programs). With the ball-bouncing example you can get lost defining interesting options for the ball and the ground, or in the geometry logic for calculating the intersections (if the ball doesn't deform, or if the stairs have certain constraints on their shape, there are optimizations you can make). At the end of the exercise I still just have a ball bouncing down some stairs, but my mind feels like it has been on a journey. Sometimes I try to write code that I think a group of people would find useful: I will browse the articles in the areas-of-computer-science category by popularity and start writing the first things I see that aren't already in the libraries I use. So I'll expand Mathematica's FindClusters function to support density-based methods, or I'll expand the RandomSample function to support files that are too large to fit in memory using a reservoir sampling algorithm. Finally, I write code for specific projects. I'm trying to genetically engineer turf grass that doesn't need to be cut, so I need to automate some of the work I do for GenBank imports and sequence comparisons. For all of these, if there were an organized place to put my code afterwards so it would fit into a larger useful library, I would gladly do a little bit of gluing work to help fit it all together.
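For the out-of-memory RandomSample case, the reservoir idea fits in a few lines. A sketch in plain Python rather than Mathematica, not tied to any particular library:

# Reservoir sampling (Algorithm R): draw k items uniformly at random from a
# stream or file too large to fit in memory, in a single pass.
import random

def reservoir_sample(iterable, k):
    reservoir = []
    for i, item in enumerate(iterable):
        if i < k:
            reservoir.append(item)
        else:
            j = random.randint(0, i)  # keep item i with probability k / (i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir

# Usage: sample 5 lines from a large file without loading it all, e.g.
# with open("huge_sequences.txt") as f: print(reservoir_sample(f, 5))
print(reservoir_sample(range(1_000_000), 5))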
Just a quick add-on to Jane and Paul about the scope of data in Wikidata. I think it is inevitable that Wikidata will start holding excess data that isn't being used in Wikipedia. Take the climate boxes on many city pages that show the average high and low per month for the last 5 years and so on. If we make a Lua module and template that regenerate those tables each time a new statement is added to the relevant properties in Wikidata, then over time Wikidata will accumulate a lot of historical weather data that isn't currently displayed in Wikipedia. I think that's a good thing. No one deletes stuff these days; they just let the databases grow, because storage is so cheap.
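If such statements did end up in Wikidata, reading them back out for a table or a chart is a small job against the standard API. Here is a sketch that uses the real wbgetclaims action but a made-up property ID ("P9999") standing in for something like "monthly average high temperature", with Q64 (Berlin) as the example item:

# Sketch: read the statements for one item/property pair from the Wikidata API.
import json
import urllib.parse
import urllib.request

def get_claims(entity_id, property_id):
    params = urllib.parse.urlencode({
        "action": "wbgetclaims",
        "entity": entity_id,
        "property": property_id,
        "format": "json",
    })
    url = "https://www.wikidata.org/w/api.php?" + params
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    return data.get("claims", {}).get(property_id, [])

# Usage (swap the placeholder property ID for a real one):
# for claim in get_claims("Q64", "P9999"):
#     print(claim["mainsnak"]["datavalue"]["value"])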
Here's my approach to software code problems: we need less code, not more. We need to remove domain logic from source code and move it into data, which can be managed and on which UIs can be built. That way we can build generic, scalable software agents. That is the way to the Semantic Web.
Martynas graphityhq.com
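One way to picture the "domain logic as data" point is a rule set that lives in an ordinary data structure plus a single generic engine that interprets it. A rough Python sketch, purely illustrative (not how Graphity or any particular Semantic Web stack works), with invented rules:

# The rules are plain data: they could be stored, edited, and queried like
# any other data, while one small generic agent interprets them.
RULES = [
    {"if": {"field": "temperature_c", "op": ">", "value": 30}, "then": "issue heat warning"},
    {"if": {"field": "wind_kph", "op": ">", "value": 90}, "then": "issue storm warning"},
]

OPS = {">": lambda a, b: a > b, "<": lambda a, b: a < b, "==": lambda a, b: a == b}

def apply_rules(record, rules=RULES):
    """Generic agent: evaluate the data-defined rules against one record."""
    actions = []
    for rule in rules:
        cond = rule["if"]
        value = record.get(cond["field"])
        if value is not None and OPS[cond["op"]](value, cond["value"]):
            actions.append(rule["then"])
    return actions

print(apply_rules({"temperature_c": 34, "wind_kph": 20}))  # ['issue heat warning']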
In the functional programming language family (think Lisp) there is no fundamental distinction between code and data.
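Python can only approximate that, but a tiny interpreter makes the point: the "program" below is an ordinary nested list that can be built, stored, and rewritten like any other data, and then evaluated.

# Illustration of the Lisp-style "code is data" idea in Python: the program
# is a plain nested list, and a small evaluator treats it as code.
import math

OPERATORS = {
    "add": lambda *args: sum(args),
    "mul": lambda *args: math.prod(args),
}

def evaluate(expr):
    """Evaluate a nested-list expression such as ["add", 1, ["mul", 2, 3]]."""
    if not isinstance(expr, list):
        return expr  # a literal number
    op, *args = expr
    return OPERATORS[op](*(evaluate(a) for a in args))

program = ["add", 1, ["mul", 2, 3]]  # just data: a list we could edit or store
print(evaluate(program))             # 7
program[2][0] = "add"                # rewriting the code with list surgery
print(evaluate(program))             # 6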
http://en.wikipedia.org/wiki/Homoiconicity
Yes, that is one of the reasons functional languages are getting popular: https://www.fpcomplete.com/blog/2012/04/the-downfall-of-imperative-programmi... With PHP and JavaScript being the most widespread (and still misused) languages, however, we will not get there soon.
On Mon, Jul 8, 2013 at 10:57 PM, Michael Hale hale.michael.jr@live.com wrote:
In the functional programming language family (think Lisp) there is no fundamental distinction between code and data.
Date: Mon, 8 Jul 2013 22:47:46 +0300 From: martynas@graphity.org
To: wikidata-l@lists.wikimedia.org Subject: Re: [Wikidata-l] Accelerating software innovation with Wikidata and improved Wikicode
Here's my approach to software code problems: we need less of it, not more. We need to remove domain logic from source code and move it into data, which can be managed and on which UI can be built. In that way we can build generic scalable software agents. That is the way to Semantic Web.
Martynas graphityhq.com
On Mon, Jul 8, 2013 at 10:13 PM, Michael Hale hale.michael.jr@live.com wrote:
There are lots of code snippets scattered around the internet, but most of them can't be wired together in a simple flowchart manner. If you look at object libraries that are designed specifically for that purpose, like Modelica, you can do all sorts of neat engineering tasks like simulating the thermodynamics and power usage of a new refrigerator design. Then, if your company is designing a new insulation material, you would make a new "block" with the experimentally determined properties of your material to include in the programmatic flowchart to quickly calibrate other aspects of the refrigerator's design. To my understanding, Modelica is as big and good as it gets for code libraries that represent physically accurate objects. Often, the visual representation of those objects needs to be handled separately. As far as general-purpose, standard programming libraries go, Mathematica is the best one I've found for quickly prototyping new functionality. A typical "web mashup" app or site will combine functionality and/or data from 3 to 6 APIs. Mobile apps will typically use the phone's functionality, an extra library for better graphics support, a proprietary library or two made by the company, and a couple of web APIs. It's a similar story for desktop media-editing programs, business software, and high-end games, except the libraries are often larger. But there aren't many software libraries that I would describe as huge. And there are even fewer that manage to scale the usefulness of the library in proportion to the size it occupies on disk.
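For a feel of the "block" idea (this is not Modelica, just a toy Python analogue with invented component names and numbers), swapping one block changes the simulated design without touching the rest:

    from dataclasses import dataclass

    # Toy "block" composition: each wall layer is a component with its own
    # parameters; connecting them in series gives the refrigerator's heat leak.
    @dataclass
    class Layer:
        name: str
        thickness_m: float       # layer thickness in metres
        conductivity: float      # thermal conductivity k in W/(m*K)

    def heat_leak_watts(layers, area_m2, delta_t_kelvin):
        """Steady-state conduction through layers in series: Q = dT / sum(R_i),
        with each layer's resistance R_i = thickness / (k * area)."""
        r_total = sum(l.thickness_m / (l.conductivity * area_m2) for l in layers)
        return delta_t_kelvin / r_total

    wall = [
        Layer("steel skin", 0.001, 50.0),
        Layer("foam insulation", 0.04, 0.025),   # swap this block to test a new material
    ]
    print(heat_leak_watts(wall, area_m2=2.0, delta_t_kelvin=20.0))   # ~25 W

Modelica's acausal, equation-based connections are far richer than this, but the encapsulation benefit is the same: the block boundary is where components can be improved independently.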
Platform fragmentation (increase in number and popularity of smart phones and tablets) has proven to be a tremendous challenge for continuing to improve libraries. I now just have 15 different ways to draw a circle on different screens. The attempts to provide virtual machines with write-once run-anywhere functionality (Java and .NET) have failed, often due to customer lock-in reasons as much as platform fragmentation. Flash isn't designed to grow much beyond its current scope. The web standards can only progress as quickly as the least common denominator of functionality provided by other means, which is better than nothing I suppose. Mathematica has continued to improve their library (that's essentially what they sell), but they don't try to cover a lot of platforms. They also aren't open source and don't attempt to make the entire encyclopedia interactive and programmable. Open source attempts like the Boost C++ library don't seem to grow very quickly. But I think using Wikipedia articles as a scaffold for a massive open source, object-oriented library might be what is needed.
I have a few approaches I use to decide what code to write next. They can be arranged from most useful as an exercise to stay sharp in the long term to most immediately useful for a specific project. Sometimes I just write code in a vacuum. I will choose a simple task, like making a 2D ball bounce around some stairs interactively, and spend a few hours writing it and rewriting it to be more efficient and easier to expand. It always gives me a greater appreciation for the types of details that can be specified to a computer (and hence the scope of the computational universe, or space of all computer programs). With the ball-bouncing example you can get lost defining interesting options for the ball and the ground, or in the geometry logic for calculating the intersections (if the ball doesn't deform, or if the stairs have certain constraints on their shape, there are optimizations you can make). At the end of the exercise I still just have a ball bouncing down some stairs, but my mind feels like it has been on a journey. Sometimes I try to write code that I think a group of people would find useful. I will browse the articles in the "areas of computer science" category by popularity and start writing the first things I see that aren't already in the libraries I use. So I'll expand Mathematica's FindClusters function to support density-based methods, or I'll expand the RandomSample function with a reservoir-sampling algorithm to support files that are too large to fit in memory. Finally, I write code for specific projects. I'm trying to genetically engineer turf grass that doesn't need to be cut, so I need to automate some of the work I do for GenBank imports and sequence comparisons. For all of those, if there were an organized place to put my code afterwards so it would fit into a larger useful library, I would totally be willing to do a little bit of gluing work to help fit it all together.
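The reservoir-sampling case is small enough to sketch; this is the classic Algorithm R written in Python rather than Mathematica, and the file name is only a placeholder:

    import random

    def reservoir_sample(stream, k):
        """Algorithm R: k items drawn uniformly from a stream of unknown length,
        in one pass and O(k) memory."""
        reservoir = []
        for i, item in enumerate(stream):
            if i < k:
                reservoir.append(item)       # fill the reservoir first
            else:
                j = random.randint(0, i)     # inclusive on both ends
                if j < k:
                    reservoir[j] = item      # keep item with probability k/(i+1)
        return reservoir

    # Usage sketch: sample 5 lines from a file too large to load into memory
    # ("huge.txt" is a placeholder name).
    with open("huge.txt") as f:
        print(reservoir_sample(f, 5))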
Date: Mon, 8 Jul 2013 19:13:54 +0200 From: jane023@gmail.com To: wikidata-l@lists.wikimedia.org
Subject: Re: [Wikidata-l] Accelerating software innovation with Wikidata and improved Wikicode
I am all for a "dictionary of code snippets", but as with all dictionaries, you need a way to group them, either by alphabetical order or "birth date". It sounds like you have an idea of how to group those code samples, so why don't you share it? I would love to build my own "pipeline" from a series of algorithms that someone else published for me to reuse. I am also for more sharing of data-centric programs, but where would the data be stored? Wikidata is for data that can be used by Wikipedia, not by other projects, though maybe someday we will find the need to put actual weather measurements in Wikidata for some oddball Wikisource project to do with the history of global warming or something like that.
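On the "pipeline" point, a minimal Python sketch of what chaining independently published algorithms could look like; the stage functions and file name here are placeholders, not real published code:

    from functools import reduce

    # A pipeline is just an ordered list of functions: each stage's output is
    # the next stage's input, so stages written by different people can be reused.
    def load_measurements(source):
        # placeholder stage standing in for someone else's published loader
        return [12.1, 14.3, 13.8, 15.0]

    def smooth(values, window=2):
        return [sum(values[i:i + window]) / window
                for i in range(len(values) - window + 1)]

    def summarize(values):
        return {"n": len(values), "mean": sum(values) / len(values)}

    pipeline = [load_measurements, smooth, summarize]
    result = reduce(lambda data, stage: stage(data), pipeline, "weather.csv")
    print(result)   # {'n': 3, 'mean': 13.88...}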
I just don't quite see how your idea would translate in the Wiki(p/m)edia world into a project that could be indexed.
But then I never felt the need for "high-fidelity simulations of virtual worlds" either.
All positive change is gradual. In the meantime, for those of us with ample free time for coding, it'd be nice to have a place to check in code and unit tests that are organized roughly in the same way as Wikipedia. Maybe such a project already exists and I just haven't found it yet.
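One hypothetical shape for such a project (the directory layout, function, and numbers below are purely illustrative assumptions): modules named after articles, each carrying its own unit tests.

    # Hypothetical layout: library/meteorology/cloud.py mirrors the "Cloud"
    # article and ships with its own unit tests.
    import unittest

    def droplet_terminal_velocity(radius_m, air_viscosity=1.8e-5,
                                  droplet_density=1000.0, air_density=1.2):
        """Stokes-law settling speed (m/s) for a small spherical droplet."""
        g = 9.81
        return 2 * radius_m ** 2 * g * (droplet_density - air_density) / (9 * air_viscosity)

    class TestCloudModule(unittest.TestCase):
        def test_larger_droplets_fall_faster(self):
            self.assertGreater(droplet_terminal_velocity(1e-5),
                               droplet_terminal_velocity(1e-6))

    if __name__ == "__main__":
        unittest.main()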
Wikidata seems like a good platform for functional computing; it "just" needs Lisp-like lists (which would be an expansion of queries/tree-searches) and processing capabilities. What you say is also true: it would be ahead of its time, because high-level languages have never spread as widely as imperative languages (probably because the processing power and the need were not there yet).
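A rough Python sketch of the "Lisp-like lists" idea; the item structure below is invented rather than the real Wikidata data model, but once results are plain nested lists, ordinary functional operations become the processing layer:

    # Hypothetical nested-list view of a few items; filtering and mapping over
    # the tree then stand in for the "processing capabilities".
    items = [
        ["Q1", "cumulus cloud", [["instance of", "cloud"], ["altitude", 2000]]],
        ["Q2", "cirrus cloud",  [["instance of", "cloud"], ["altitude", 8000]]],
        ["Q3", "granite",       [["instance of", "rock"]]],
    ]

    def has_statement(item, prop, value):
        return any(p == prop and v == value for p, v in item[2])

    clouds = [item for item in items if has_statement(item, "instance of", "cloud")]
    altitudes = [dict(item[2]).get("altitude") for item in clouds]
    print([item[1] for item in clouds], altitudes)   # ['cumulus cloud', 'cirrus cloud'] [2000, 8000]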
Wikidata as an AI... how far away is that singularity? :)
Micru