Bootstrap.  You always have to bootstrap a system, so needing labels, aliases, and documentation for now makes sense to me.  It would be nice to eventually reach the point where we no longer need to bootstrap and instead have a running system that can abstractly describe itself, but I wouldn't make that a firm goal.

Regarding short descriptions (docstrings), I think they might indeed be needed.
But let me play devil's advocate with a scenario that I've run across in one such language that shall be left unnamed. Node.js ;-) ...

Imagine that we have our tools in place, code completion working, and so on.  We type in "fingerprint" and hit CTRL-ENTER and we get matches on about 22 different kinds of functions... ALL named "fingerprint" or "hash", whether through a label or an alias.  Some use Rabin's algorithm, some use the Rabin-Karp algorithm, others use the Knuth-Morris-Pratt algorithm or Boyer-Moore, but all have different internal keys, and so each one satisfies the "uniqueness" constraint?  But you have to click through and read 22 full docs to disambiguate them?  I think I'll grab some tea, thank you.

Help me understand this part a bit more.  Say we have 2 Boyer-Moore search functions, each unique, where one applies the Galil rule and the other does not (the type is the same, the arguments are the same).  Please explain how this would work so that I can quickly assess which one uses the Galil rule... I think a nice docstring would work wonderfully, no?
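
To make the scenario concrete, here is a minimal sketch of a completion list with and without one-line descriptions. Everything in it is an assumption made for illustration: the Z-IDs, the `Candidate` fields, and the display format are hypothetical and not part of any planned design.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    zid: str                     # hypothetical Z-ID
    matched_name: str            # the label or alias that matched the query
    type_signature: str
    short_description: str = ""  # the optional one-liner under discussion


def completion_lines(candidates, with_descriptions):
    """Render one line per candidate, as a completion pop-up might."""
    lines = []
    for c in candidates:
        line = f"{c.matched_name} : {c.type_signature}"
        if with_descriptions and c.short_description:
            line += f"  -- {c.short_description}"
        lines.append(line)
    return lines


# Two distinct objects with the same type and the same matching name.
candidates = [
    Candidate("Z20001", "boyer-moore", "(string, string) -> integer",
              "applies the Galil rule"),
    Candidate("Z20002", "boyer-moore", "(string, string) -> integer",
              "does not apply the Galil rule"),
]

for line in completion_lines(candidates, with_descriptions=False):
    print(line)
for line in completion_lines(candidates, with_descriptions=True):
    print(line)
```

Without the one-liners the two entries render identically, since name and type are the same; with them, the difference is visible without opening either documentation page.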



On Thu, Oct 22, 2020 at 3:15 PM Denny Vrandečić <dvrandecic@wikimedia.org> wrote:
A version of this newsletter with links and formatting is available on-wiki here: https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2020-10-22

In this edition of the weekly posts, I want to discuss the different places where we will use unstructured text in the objects of the wiki of functions. The following is a plan and a request for comments. Besides the labels, nothing is implemented yet, so we would really appreciate your feedback.

Labels. Every object in the wiki of functions will be identified by a Z-ID, similar to Q-IDs identifying items in Wikidata. But just like Q-IDs, we don't expect Z-IDs to be widely visible and used. Instead, every object will have labels, one per language.

But unlike Wikidata items, every object will be an instance of a specific type. For example, there might be an object representing the addition of two integer numbers. So the English label for this object could be “add”. Other good labels for this object could be “addition”, “sum”, or “plus”. An inappropriate name for the function would be “multiplication”, as that would be terribly confusing.

Uniqueness. There would likely also be other objects representing functions that do addition, for example the addition of two floating point numbers, or of two complex numbers, or of two matrices. In the wiki of functions labels will not need to be unique overall - but they will need to be unique for each type. So there can only be one function with the label “add” that takes two integers and returns one integer. Or only one type with the label “integer”. Per type, each label must be unique.
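
The per-type uniqueness rule can be sketched as follows. This is only an illustration under stated assumptions: the `LabelRegistry` class, the Z-IDs, and the way type signatures are written as strings are all hypothetical, and nothing here reflects an actual implementation.

```python
class LabelRegistry:
    """Toy registry: a label may be used once per (language, type)."""

    def __init__(self):
        # Maps (language, type_signature, label) -> Z-ID of the owner.
        self._by_key = {}

    def set_label(self, zid, language, type_signature, label):
        key = (language, type_signature, label)
        owner = self._by_key.get(key)
        if owner is not None and owner != zid:
            raise ValueError(
                f"label {label!r} is already used by {owner} "
                f"for type {type_signature}"
            )
        self._by_key[key] = zid

    def owner(self, language, type_signature, label):
        return self._by_key.get((language, type_signature, label))


registry = LabelRegistry()
registry.set_label("Z10001", "en", "(integer, integer) -> integer", "add")
# Same label, different type: allowed.
registry.set_label("Z10002", "en", "(float, float) -> float", "add")

# Same label, same type, different object: rejected.
try:
    registry.set_label("Z10003", "en", "(integer, integer) -> integer", "add")
    clash = False
except ValueError:
    clash = True
```

Keying on the language as well as the type reflects that labels are given per language, so the check applies separately to each language.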

Now does every object need a label? No. There will be many objects where a label won’t be strictly necessary. Not every test for a function will need a label, nor will every implementation of a function. They may have labels, but they won’t be necessary - this is another difference to Wikidata, where items without labels are almost always problematic.

One more note on labels - labels do not have to be direct translations across languages. So in one language, two functions might have the same label if they have a different type, but in another language the two functions might have different labels as well. For example, in English “length” might be an appropriate label for a function that returns the number of elements in a list, but also for a function that returns the length of a river, or a function that returns the duration of a movie. In Croatian, on the other hand, all three of these might have different names (“broj elemenata”, “duljina”, “trajanje”). Each language can decide what pattern works best for it. Whether verbs in the imperative mood (“add”) or a description of the result (“sum”) or the name of the operation (“addition”) work best can be decided from language to language, and independent style guides for each language may evolve.

Aliases. Besides labels, every object can have additional aliases per language. Aliases are helpful when searching for an object. The above function, labeled “add” in English, could have all the other alternative names given above - “sum”, “plus”, “addition” - as aliases, so that when someone searches for one function by a different name they still find the right one as a result.
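
How aliases help search can be sketched like this. It is a minimal illustration, assuming objects are stored as plain dictionaries with per-language labels and aliases; the Z-IDs, the `search` function, and exact case-insensitive matching are all hypothetical choices.

```python
def search(objects, language, query):
    """Return the Z-IDs whose label or any alias matches the query."""
    q = query.casefold()
    hits = []
    for obj in objects:
        names = [obj["labels"].get(language, "")]
        names += obj["aliases"].get(language, [])
        if any(name and name.casefold() == q for name in names):
            hits.append(obj["zid"])
    return hits


objects = [
    {"zid": "Z10001",
     "labels": {"en": "add"},
     "aliases": {"en": ["sum", "plus", "addition"]}},
    {"zid": "Z10004",
     "labels": {"en": "multiply"},
     "aliases": {"en": ["times", "product"]}},
]

print(search(objects, "en", "plus"))
```

Searching for “plus” finds the object labeled “add”, because the alias matches even though the label does not.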

Documentation. Every object may also have documentation in each language. Documentation is some wikitext that further describes the given object. Many objects will not have any documentation at all, but many will have some. If you ever had the opportunity to read The Art of Computer Programming, you will know that there is often a lot to say about a function! But besides this kind of background story, we can also have some documentation describing a given implementation, or some explanation for why a given test is useful. It could also link to a Phabricator task describing an error that used to be there, which this test now checks for in order to catch it before it resurfaces, or to other resources such as a textbook on algorithms.

Keys and arguments. At least types and functions will also have labels for each key of the type and for each argument of the function. These will be used in creating the user interface to display and edit values and function calls.

Short descriptions. One specific question we have is - should we have optional short descriptions for each object? In many IDEs, and sometimes even directly in the programming language (think docstrings in Python) there is support for short one-liners that give a bit of extra information for a function, beyond just the name of the function and the arguments and the types.

We are going to have a strong type system, and we will have plenty of space for the documentation. So do we really also need a place for short descriptions? What would their use case be? How would they be used in the UX? On the website, something akin to the pop-up previews in Wikipedia, previewing the whole documentation, seems even more useful than a one-liner.

Originally I assumed per default that we would have short descriptions, given their importance and usefulness in Wikidata, and so they also feature in the AbstractText prototype. But they were never useful there. Also, the type took over the role of the disambiguator, as described above, so there was really no technical need for a short description. I currently lean towards not having them, but I would like to hear more input.

The good thing is that no decision will be carved in stone. The data model of the wiki of functions will be much more flexible than the data model of Wikidata, and if we figure out that we do need short descriptions, we can just introduce them later. It is much harder to remove things though, because almost everything that is there gets used some way or the other, so I am more wary of introducing features without a good reason.

Dogfooding. One obvious question is: hey, we are developing this architecture to create multilingual content, why even have all this documentation and labels and all that in actual languages, why not use our own functions to build all of this up?

And yes, agreed, that would be best. It’s just that we’re not there yet, and until then we will still need labels and documentation and aliases. So eventually I would very much look forward to using abstract content to describe the objects of the wiki of functions themselves. It will also be interesting to see if we can then roll back some of the local content, and how open we will be to doing so. This will give us a lot of interesting insight into how to approach similar goals in the other Wikimedia projects: from descriptions (and maybe even labels? or sense glosses?) to stubby Wikipedia articles, there will be plenty of potential to make our content across projects easier to maintain and to provide a more uniform baseline coverage across languages.

New video introducing Abstract Wikipedia. A new presentation, given for Wikidata’s Eighth Birthday, organized by the community of WikiProject:India, is available: https://www.youtube.com/watch?v=GAb1HylGemA

Naming contest. Next week, Tuesday 27 October, the second round of voting for the name of the new wiki of functions will begin. The proposals are currently being vetted by legal, and we should know which names will be in the final round soon.

_______________________________________________
Abstract-Wikipedia mailing list
Abstract-Wikipedia@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia