Hello Wikimedians,
My name is Sophie and I am the Project and Communications Manager at
Wikimedia Community Ireland.
I am reaching out to draw your attention to an interesting Natural Language
Processing project at DCU here in Ireland that uses Wikidata to generate
content in different languages. We have been collaborating with Simon
Mille from the Adapt Centre <https://www.adaptcentre.ie/> recently and I
thought it might be good to make some connections with the wider Wiki
Community who are specifically interested in or involved with AI.
Please let me introduce the project below. If you would like to learn
more or connect with Simon, I would be delighted to introduce you.
Kind regards,
Sophie Fitzpatrick
*Project description:* At DCU-NLG <https://dcu-nlg.github.io/>, one of our
main research topics is the automatic generation of text from structured
data. We work with structured repositories such as DBpedia and Wikidata (among
other resources), which contain millions of triples that can be used to
generate texts about targeted entities in a particular language. A lot of
techniques exist for generating text from triple sets, the most famous (and
probably best) one being prompting a GPT model. However, closed-source
models such as the GPT series have some important drawbacks: they are very
resource-hungry, they are not easily controllable, and they do not
give researchers access to their code. At DCU-NLG, we develop open-source
systems that aim to address these issues in the domain of Natural Language
Generation. We build (i) generators based on Large Language Models (LLMs),
which can achieve very high-quality results but still require a large
amount of energy to work, (ii) fully rule-based systems, which are
extremely energy-efficient but struggle to get to the quality level of
LLMs, and (iii) hybrid systems, which aim to combine the strengths of
LLMs, rule-based systems and neural systems. We are also interested in the
real-world use of these systems, and are currently making a tool that could
help people write Wikipedia articles: we are designing an interface that,
given an entity and a language, returns small seed texts generated using
several of the techniques mentioned above, always using DBpedia or Wikidata
information to ensure the traceability of the source. People can then use
these seed texts as a starting point for editing a new Wikipedia page.
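As a rough illustration of the rule-based idea in (ii), and not the DCU-NLG
implementation, the sketch below turns a few triples into a short seed text
using fixed sentence templates. Everything in it is an assumption made for
the example: the property names, the templates, the function names and the
hand-written triples, which are not retrieved from DBpedia or Wikidata. Each
sentence is returned together with the triple it came from, which is one
simple way to keep the source traceable.

```python
# Illustrative sketch only: hypothetical templates and hand-written triples,
# not code or data from DCU-NLG, DBpedia or Wikidata.
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)

# Hypothetical property names mapped to English sentence templates.
TEMPLATES: Dict[str, str] = {
    "country": "{subject} is located in {object}.",
    "river": "The {object} flows through {subject}.",
}


def verbalise(triples: List[Triple],
              templates: Dict[str, str]) -> List[Tuple[str, Triple]]:
    """Return (sentence, source_triple) pairs so each sentence is traceable."""
    results = []
    for triple in triples:
        subject, prop, obj = triple
        template = templates.get(prop)
        if template is None:
            continue  # no rule for this property: skip it rather than guess
        results.append((template.format(subject=subject, object=obj), triple))
    return results


def seed_text(triples: List[Triple], templates: Dict[str, str]) -> str:
    """Join the generated sentences into one short seed text."""
    return " ".join(sentence for sentence, _ in verbalise(triples, templates))


# Hand-written example input; the last triple has no matching template.
EXAMPLE_TRIPLES: List[Triple] = [
    ("Dublin", "country", "Ireland"),
    ("Dublin", "river", "River Liffey"),
    ("Dublin", "unknown_property", "x"),
]

if __name__ == "__main__":
    print(seed_text(EXAMPLE_TRIPLES, TEMPLATES))
```

A template table like this is cheap to run and fully controllable, but it
only covers the properties someone has written a rule for, which matches the
quality trade-off described above.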
*Some resources:*
- RTE brainstorm article
<https://www.rte.ie/brainstorm/2023/1206/1420417-gaeilge-irish-translation-a…>
(by the way, it's funny how they use the word "translate" in their title,
knowing the time I spend talking about how NLG is not translation xD)
- Papers from our group about using GPT
<https://aclanthology.org/2023.mmnlg-1.9/> and a rule-based system
<https://aclanthology.org/2023.pandl-1.4/> for the generation of Irish
text from DBpedia.
- The GEM shared task <https://gem-benchmark.com/shared_task> about
generation from DBpedia and Wikidata, which I co-organise.
--
Sophie Fitzpatrick
*Project and Communications Manager*
Pobal Wikimedia na hÉireann | Wikimedia Community Ireland
https://wikimedia.ie/