Projects

 

PERSEUS: Personalized concept-based Sentiment Analysis

In the research project PERSEUS, we aim to discover individual differences in how sentiment is expressed in text. To study both the diversity between individuals and the consistency within each individual, we have built a personalized framework that takes user-related text from social platforms, such as Twitter and Facebook, and uses Deep Learning techniques to investigate and improve sentiment categorization. The project goes beyond purely understanding the meaning of text and focuses on integrating the preferences and tendencies of users to provide user-sensitive predictions. Cooperation Partner: Lenovo AI Research Beijing.

 

Approaching Indigenous Australian History With Text Mining Methods

Despite their remarkable value, autobiographies appear to remain one of the most under-utilized historical resources. The proposed research project in digital humanities will apply computational distant reading methods (natural language processing in general and topic modeling in particular) as a complement to the traditional “close reading” of Indigenous Australian autobiographies, aiming to identify meaningful patterns of language use in the context of social environment and historical events. Cooperation Partners: C2DH, University of Sydney.

 

STRIPS: A Semantic Search Toolbox for the Retrieval of Similar Patterns in Luxembourgish Documents

The aim of STRIPS is to develop a toolbox of semantic search algorithms for Luxembourgish. We want to implement search algorithms that retrieve and monitor, for example, temporal patterns of named entities in Luxembourgish texts. Here, the term ‘semantic’ refers not only to the use of keywords or Bag-of-Words representations (for example, names and geographic identifiers), but also to more complex structures such as concepts (e.g., topics or themes) and a document’s sentiment (e.g., its positive or negative polarity). STRIPS focuses on three areas: the linguistic processing of texts written in Luxembourgish (particularly stemming, the use of phonetic dictionaries and tagged word lists for Luxembourgish, and a part-of-speech-tagged text corpus); similarity learning, to allow fuzziness in search queries; and the identification of temporal cross-dependencies within the Luxembourgish text corpus. To validate the project, we have been given heterogeneous text sources (official news items and user-contributed comments). Cooperation Partner: RTL.