Wikidata Lexicodays, 28-30 June 2024

Discussions related to the Lexicodays are welcome here in the language of your choice. If you have any questions or need support (for example, to propose a session), feel free to contact Lea Lacroix (WMDE) directly (or @Auregann on Telegram). Announcements about the event are posted here as well as in the Lexicographical Data Telegram group.

Lexicodays, online event dedicated to Lexicographical Data, on June 28-30, 2024


- Indonesian version below -

Hello all,

Have you ever wondered how Wikidata stores and models words? How to create and improve Lexemes in your languages? Or even why it is useful and which projects could benefit from it?

The Lexicodays 2024 will answer these questions, and many more. During this online event, you will be able to learn more about Lexicographical Data on Wikidata, to discover how to model words in your languages, and to try out various tools that make it easier to work on Lexemes. It offers a space for editors involved in creating and maintaining Lexemes to discuss their ideas, challenges and best practices.

The online event will take place on June 28, 29 and 30, with sessions replicated in different languages and at different times across time zones. It is co-organized by Wikimedia Deutschland and the Software Collaboration Team in Indonesia, and we will focus on the languages of Indonesia and the Wikidata community in Indonesia. The event is open to everyone regardless of their knowledge of Lexemes. Most sessions will be recorded and published after the event.

On the main event page, you can discover the structure of the program, which will keep evolving in the upcoming weeks. We are also welcoming proposals for the program until June 20th - we are particularly interested in introductions to Lexicographical Data in different languages, and discussions run by community members on how to improve modelling and documentation in a specific language.

We will launch registration for the event in the upcoming days - if you’re interested, stay tuned by following the talk page or joining the Lexicographical Data Telegram group.

If you have any questions, feel free to write on the talk page of the event. See you soon, Léa (Lea Lacroix (WMDE)) and Raisha (Fexpr).


Halo, teman-teman!

Pernahkah Anda bertanya-tanya bagaimana Wikidata menyimpan dan memodelkan kata-kata? Bagaimana cara membuat dan meningkatkan Leksem dalam bahasa yang Anda tuturkan? Kenapa Leksem itu bermanfaat? Proyek-proyek apa yang akan terbantu dengan adanya Leksem ini?

Lexicodays 2024 akan menjawab pertanyaan-pertanyaan tersebut, dan masih banyak lagi. Selama acara daring ini, Anda akan dapat mempelajari lebih lanjut mengenai Data Leksikografis di Wikidata, menemukan cara memodelkan kata-kata dalam bahasa Anda, dan mencoba berbagai perkakas yang memudahkan Anda dalam menyunting Leksem. Acara ini membuka ruang bagi para penyunting yang terlibat dalam pembuatan dan pemeliharaan Leksem untuk saling berdiskusi mengenai ide, tantangan, maupun praktik-praktik terbaik.

Acara daring ini akan berlangsung pada tanggal 28, 29, dan 30 Juni, dengan waktu penyelenggaraan yang tersebar dalam beberapa zona waktu dan sesi-sesi serupa yang diantarkan dalam bahasa-bahasa yang berbeda. Acara ini diselenggarakan bersama oleh Wikimedia Deutschland dan Tim Kolaborasi Perangkat Lunak di Indonesia. Fokus dari acara ini adalah untuk bahasa-bahasa yang dituturkan di Indonesia dan komunitas Wikidata di Indonesia. Acara ini terbuka untuk siapa saja, terlepas dari seberapa akrab Anda dengan Leksem. Kami akan merekam sebagian besar sesi dan mempublikasikannya setelah acara selesai.

Anda dapat mengakses jadwal kegiatan pada halaman beranda acara, yang akan terus kami perbarui dalam beberapa pekan ke depan. Kami juga mengadakan panggilan terbuka untuk pengajuan proposal kegiatan hingga tanggal 20 Juni. Kami sangat tertarik dengan pengenalan Data Leksikografis dalam berbagai bahasa, dan diskusi yang dilakukan oleh anggota komunitas mengenai cara meningkatkan pemodelan dan dokumentasi dalam bahasa tertentu.

Kami akan membuka pendaftaran untuk acara ini dalam beberapa hari mendatang. Apabila Anda tertarik, silakan pantau terus laman pembicaraan ini atau bergabunglah dengan grup Telegram Data Leksikografis.

Jika Anda memiliki pertanyaan, jangan ragu untuk menulis di laman pembicaraan acara Lexicodays 2024. Sampai jumpa, Léa (Lea Lacroix (WMDE)) dan Raisha (Fexpr). Lea Lacroix (WMDE) (talk) 08:56, 3 June 2024 (UTC)

Ordia Indonesian text-to-lexeme


For some reason Indonesian (language code id) was not enabled in Ordia (Q63379419)'s text-to-lexeme tool. I have now fixed that. Example:

Hi @Fnielsen: Thanks a lot for your message and for adding Indonesian to Text-to-Lexeme! Would you be interested in giving a quick demo of this tool during the event?
Ideally, it should be a pre-recorded video of 5-10 minutes that you would send me this week, so that we have time to add captions in English and Indonesian. But if that doesn't work for you, we could also do it live in English, for example during one of the tools sessions on June 29th at 16:00 UTC or June 30th at 16:00 UTC.
Let me know if that would work for you! Thanks a lot, Lea Lacroix (WMDE) (talk) 05:53, 10 June 2024 (UTC)
Hi @Lea Lacroix (WMDE): Sorry, I did not see your response before. I will see whether I can pre-record a small session. BTW I have just fixed an embarrassing error for Indonesian in Ordia's text-to-lexemes. Finn Årup Nielsen (fnielsen) (talk) 19:14, 19 June 2024 (UTC)

Template for session proposal


Here's a template you can use to propose a session for the Lexicodays 2024. Feel free to copy its content and create a new section below, using the session title as the section title. Please note that sessions are not automatically accepted: because the schedule has a limited number of slots, we will select which sessions make it into the program of the event. You can propose a session until June 20th.

  • Session title:
  • Format:
  • Speaker(s) or facilitator(s):
  • Short description of the session:
  • Prerequisite knowledge:
  • Suggestions of time and date:
  • Language(s):

How many lexemes does a language really need?

  • Speaker: Mahir256
  • Short description of the session: This session will discuss the question of how many lexemes a language really needs to have in order to be sufficient for natural language generation purposes. (Some spoilers: 1) it depends on how the language works, and 2) depending on what other languages can bring to the table, this number may actually be smaller than you think!)
  • Suggestions of time and date: 28 June, 19 UTC
  • Language(s): English

Modeling proverbs and sayings as lexemes

  • Speaker: Mahir256
  • Short description of the session: This session will discuss how sentences (specifically proverbs and sayings) in different languages can be modeled on Wikidata, using information on their "combines lexeme" statements to capture syntactic and semantic information.
  • Suggestions of time and date: 28 June, 20 UTC
  • Language(s): English
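
To make the idea in the description above a little more concrete, here is a rough Python sketch of a proverb recorded as an ordered list of the lexemes it combines. It is only an illustration under stated assumptions, not the modelling that the session will present: the class names, the placeholder IDs (L-A, L-B, ...) and the idea of storing an explicit position next to each component are all hypothetical and are not taken from Wikidata itself.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Lexeme:
    """A single word entry. The IDs used below are made-up placeholders."""
    lexeme_id: str
    lemma: str
    category: str


@dataclass
class Proverb:
    """A saying stored as one entry that points at its component lexemes."""
    lexeme_id: str
    lemma: str
    # (position, component) pairs, loosely mirroring "combines lexeme"
    # statements; the explicit position is an assumption of this sketch.
    combines: List[Tuple[int, Lexeme]] = field(default_factory=list)

    def add_component(self, position: int, lexeme: Lexeme) -> None:
        self.combines.append((position, lexeme))

    def components_in_order(self) -> List[Lexeme]:
        # Sort by the stored position so insertion order does not matter.
        return [lex for _, lex in sorted(self.combines, key=lambda pair: pair[0])]


# Hypothetical example data.
practice = Lexeme("L-A", "practice", "noun")
make = Lexeme("L-B", "make", "verb")
perfect = Lexeme("L-C", "perfect", "adjective")

proverb = Proverb("L-D", "practice makes perfect")
# Components are added out of order on purpose.
proverb.add_component(3, perfect)
proverb.add_component(1, practice)
proverb.add_component(2, make)

ordered_lemmas = [lex.lemma for lex in proverb.components_in_order()]
ordered_categories = [lex.category for lex in proverb.components_in_order()]
```

Keeping the position next to each component means the word order of the saying can be recovered even when the components are entered in any order, and the grammatical category of each part stays available through the component entries.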

Modeling compound word lexemes (and helping other languages in the process)

  • Speaker: Mahir256
  • Short description of the session: This session will discuss how lexemes for compound words (particularly nouns composed of other nouns) can be set up so that other languages can take advantage of the information in them.
  • Suggestions of time and date: 28 June, 21 UTC
  • Language(s): English

Modeling predicate lexemes on Wikidata

  • Speaker: Mahir256 and عُثمان
  • Short description of the session: In this session, we will look at some senses for verbs or other verb phrases in some under-resourced languages and then determine what information would be necessary to model a proposition item for them.
  • Suggestions of time and date: 29 June, 17 UTC
  • Language(s): English

Fun generating sentences with lexemes!

  • Speaker: Mahir256
  • Short description of the session: This is a live coding session in which Ninai/Udiron, a natural language generation system using Wikidata lexemes and items, will be worked on. Anything could be accomplished here: the functionality for some grammatical constructs might be fixed; some new abstract content types might be introduced; even a new language might be added! Feel free to hop on or off the session whenever you like; I will greatly appreciate your live input and feedback! If there are no scheduled sessions after 20:00 UTC, then this session may be extended beyond that time.
  • Suggestions of time and date: 29 June, 19 UTC
  • Language(s): English, with some text in Norwegian and Turkish (and possibly other languages?)