Hello Sophie,

Thank you for reaching out and sharing this interesting project with us. I am intrigued by the potential of using Wikidata to generate content in different languages and appreciate your efforts in developing open-source systems for Natural Language Generation.

I am particularly interested in the tool you are designing to assist people in writing Wikipedia articles. The idea of providing seed texts generated from structured data sources like DBpedia and Wikidata aligns well with Wikipedia's aim of providing accurate and reliable information.

I would love to learn more about the project and explore potential collaborations. 

Thank you again for sharing this exciting initiative. I look forward to hearing from you soon.

Best regards,
Aliyu Shaba.


On Tue, Mar 26, 2024, 4:50 PM Sophie Fitzpatrick <sophiefitzpatrick@wikimedia.ie> wrote:
Hello Wikimedians,

My name is Sophie and I am the Project and Communications Manager at Wikimedia Community Ireland. 

I am reaching out to draw your attention to an interesting Natural Language Processing project at DCU here in Ireland that uses Wikidata to generate content in different languages. We have been collaborating with Simon Mille from the Adapt Centre recently, and I thought it might be good to make some connections with the wider Wiki community who are specifically interested in or involved with AI. 

Please let me introduce the project below, and if you would like to learn more or connect with Simon, I would be delighted to introduce you. 

Kind regards,
Sophie Fitzpatrick 

Project description: At DCU-NLG, one of our main research topics is the automatic generation of text from structured data. We work with structured repositories such as DBpedia and Wikidata (among other resources), which contain millions of triples that can be used to generate texts about targeted entities in a particular language. Many techniques exist for generating text from triple sets, the most famous (and probably best) one being prompting a GPT model. However, closed-source models such as the GPT series have some important drawbacks: they are very resource-hungry, they are not easily controllable, and they do not give researchers access to their code. At DCU-NLG, we develop open-source systems that aim to address these issues in the domain of Natural Language Generation. We build (i) generators based on Large Language Models (LLMs), which can achieve very high-quality results but still require a large amount of energy to run, (ii) fully rule-based systems, which are extremely energy-efficient but struggle to reach the quality level of LLMs, and (iii) hybrid systems, which aim to combine the strengths of LLMs, rule-based systems and neural systems.

We are also interested in the real-world use of these systems, and are currently building a tool that could help people write Wikipedia articles: we are designing an interface that, given an entity and a language, returns small seed texts generated using several of the techniques mentioned above, always drawing on DBpedia or Wikidata information to ensure the traceability of the source. People can then use these seed texts as a starting point for editing a new Wikipedia page.
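
For anyone curious what "generating text from triple sets" looks like in practice, here is a minimal, illustrative sketch in Python. It is not DCU-NLG code: the Wikidata SPARQL endpoint is real, but the function names, templates and entity choice are invented for illustration. It pulls a handful of triples about an entity and then either verbalises them with naive templates (the rule-based route) or packs them into a prompt that an LLM-based generator might consume (the prompting route).

    # Illustrative sketch only (not the DCU-NLG tool): triples in, seed text out.
    import requests

    WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"  # public endpoint

    def fetch_triples(qid, lang="en", limit=10):
        """Return (property label, value label) pairs for a Wikidata entity."""
        query = f"""
        SELECT ?propLabel ?valueLabel WHERE {{
          wd:{qid} ?prop ?value .
          ?p wikibase:directClaim ?prop .
          ?p rdfs:label ?propLabel . FILTER(LANG(?propLabel) = "{lang}")
          SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}". }}
        }}
        LIMIT {limit}
        """
        resp = requests.get(
            WIKIDATA_SPARQL,
            params={"query": query, "format": "json"},
            headers={"User-Agent": "seed-text-sketch/0.1 (example only)"},
            timeout=30,
        )
        resp.raise_for_status()
        rows = resp.json()["results"]["bindings"]
        return [(r["propLabel"]["value"], r["valueLabel"]["value"]) for r in rows]

    def verbalise(entity_label, triples):
        """Rule-based route: one template sentence per triple, no model needed."""
        return " ".join(f"The {prop} of {entity_label} is {value}." for prop, value in triples)

    def build_prompt(entity_label, triples, lang="English"):
        """Prompting route: pack the same triples into an LLM prompt (model call omitted)."""
        facts = "\n".join(f"- {prop}: {value}" for prop, value in triples)
        return (f"Write a short, factual {lang} paragraph about {entity_label}, "
                f"using only these facts:\n{facts}")

    if __name__ == "__main__":
        triples = fetch_triples("Q142")              # Q142 = France on Wikidata
        print(verbalise("France", triples))          # crude but fully traceable seed text
        print(build_prompt("France", triples))       # input a neural generator might consume

Real systems of course go far beyond templates like these (grammar-driven realisers, fine-tuned LLMs, hybrid pipelines), but the contract is the same: triples in, a traceable seed text out.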

Some resources:
  • RTE brainstorm article (by the way, it's funny how they use the word "translate" in their title, given the time I spend talking about how NLG is not translation xD)
  • Papers from our group about using GPT and a rule-based system for the generation of Irish text from DBpedia.
  • The GEM shared task about generation from DBpedia and Wikidata, which I co-organise.


--
Sophie Fitzpatrick
Project and Communications Manager
Pobal Wikimedia na hÉireann | Wikimedia Community Ireland
_______________________________________________
Languages mailing list -- languages@lists.wikimedia.org
To unsubscribe send an email to languages-leave@lists.wikimedia.org