A Semantic Search Toolbox for the Retrieval of Similar Patterns in Luxembourgish Documents

Funded by the University of Luxembourg

From Left to Right: C. Schommer, J. Sirajzade, D. Gierschek, C. Purschke, P. Gilles.


What is it about?

The aim of STRIPS is to develop a toolbox of semantic search algorithms for Luxembourgish. We want to implement search algorithms that retrieve and monitor patterns in Luxembourgish texts, for example temporal patterns of named entities. Here, the term semantic refers not only to keywords or bag-of-words features such as names or geographic identifiers, but also to more complex structures such as concepts (e.g., topics or themes) and a document’s sentiment (e.g., a positive or negative polarity). The main focus of STRIPS lies in three areas: the linguistic processing of texts written in Luxembourgish (in particular stemming, phonetic dictionaries, tagged word lists for Luxembourgish, and a part-of-speech-tagged text corpus); similarity learning to allow fuzziness in search queries; and the identification of temporal cross-dependencies within the Luxembourgish text corpus. To validate the project, RTL has provided heterogeneous text sources (official news items and user-contributed comments).

Project Members

Former participants

  • Elisabeth Joy (Department of Computer Science)
  • Elida van Nierop (Department of Mathematics)
  • Rik Lamesch (Department of Mathematics)

Publications

  • Joshgun Sirajzade, Christoph Schommer. The LuNa Open Toolbox for the Luxembourgish Language. In: Advances in Data Mining, Applications and Theoretical Aspects (Conference Proceedings). New York (2019).
  • Joshgun Sirajzade, Daniela Gierschek, Christoph Schommer and Peter Gilles. Component analysis of adjectives in Luxembourgish for detecting sentiments. Computational Linguistics in the Netherlands (CLIN 29) (2019).
  • Daniela Gierschek. Automatic Detection of Sentiment in Luxembourgish User Comments. CL Poster Session at the 41st Annual Conference of the German Linguistic Society (2019).
  • Daniela Gierschek, Peter Gilles, Christoph Purschke, Christoph Schommer, Joshgun Sirajzade. A Temporal Warehouse for Modern Luxembourgish Text Collections. DH Benelux (2019).
  • Elida van Nierop. Improving LDA Topic Modelling Using Word Embeddings. Master’s Thesis (2018).
  • Joshgun Sirajzade, Christoph Schommer. Mind and Language. AI in an Example of Similar Patterns of Luxembourgish Language. Proceedings International Conference on Artificial Intelligence and Humanities. Seoul, Korea (2018).
  • Daniela Gierschek. Automatic Detection of Emotions in Luxembourgish User Comments. PhD Forum at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD) (2018).
  • Ekaterina Kamlovskaya, Christoph Schommer, Joshgun Sirajzade. A Dynamic Associative Memory for Distant Reading. Proceedings International Conference on Artificial Intelligence and Humanities. Seoul, Korea (2018).
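  • Joshgun Sirajzade. Korpusbasierte Untersuchung der Wortbildungsaffixe im Luxemburgischen. Technische Herausforderungen und linguistische Analyse am Beispiel der Produktivität [A corpus-based investigation of word-formation affixes in Luxembourgish: technical challenges and linguistic analysis using the example of productivity]. Zeitschrift für Wortbildung = Journal of Word Formation (2018), 2(1).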

In the press