Tpt took a few datasets of sufficiently high quality from the Freebase dump and uploaded them directly. These statements do not appear in the Primary Sources Tool because they were uploaded directly - each set going through the normal community process.
The Primary Sources Tool is left with the datasets for which we were not able to establish a high enough level of quality. For any dataset where this quality can be demonstrated to the community, I assume they will agree to a direct upload.
I am not sure what else to do here.
I am very thankful to Nemo for rephrasing the discussion and pulling it to a constructive and actionable level.
Gerard, regarding your arguments:
- why would someone work on data in the primary sources tool when it is more effective to add data directly
Can you explain what you mean by "add data directly"? I am really not sure what you mean by this argument. Are you suggesting uploading the whole dataset without further review?
- why is data that is over 90% good denied access to Wikidata (ie as good as Wikidata itself)
But it is not over 90% good! We have a rejection rate of almost 20%. And even a 10% error rate across 12 million statements would mean more than a million errors. I have yet to see consensus to upload this.
- how do you justify the pst when so little data was included in Wikidata
The tool has been used to add thousands of statements and references to Wikidata, and that by a rather small group of people (because the tool has to be installed intentionally). I would expect that if we switched it on by default, the throughput would grow considerably. Nemo identified a few issues standing in the way of that, and it would be good to work on them. Everyone is invited to help out with that.
- why not have Kian learn from the data set of Freebase and Wikidata and have smart suggestions
Kian is free to learn from the datasets. The Freebase data has been available for years, and Kian would be far from the first ML tool to use it for training purposes. If there is anything hindering Kian from using the Freebase data, let me know and I will try to fix it.
- why waste people's time adding one item/statement at a time when you can focus on the statements that are in doubt (either in Freebase or in Wikidata)
Because we don't know which ones are which. If you could tell me which of the 12 million statements are good and which are not, and if there is consensus about that assessment, I'd be happy to upload them.
I hope that this answers your arguments.
Again, I do not understand what your proposal is. I am going through the process of releasing the data in an easy-to-use way. If the community agrees with that, it can then be imported directly into Wikidata - I certainly won't stop anyone from doing so, and never have.
My feeling is that you are frustrated by what you perceive as slow progress. You keep yelling at people that their ideas and work are not good. I remember how much you attacked me over Wikidata and all the things I was supposedly doing wrong with it. Gerard, if you think you are motivating me with your constant attacks, I have to tell you: you are not. I am not speaking for anyone else, but I am getting tired of this. I appreciate a critical voice, but not the tone in which you often deliver it.
So, instead of telling everyone how we are supposed to spend our volunteer time in order to get things done better, and how we are doing things wrong, why don't you lead by example and do it right? All the data and all the tools for anything you want to get done are available to you for free. It is a pretty amazing world - all you need is a click away. So go ahead and do what you want to get done.