FinUgRevita Project

Research Focus:

The FinUgRevita Project develops computational tools to aid language learning and to support endangered languages.
The main goals of our tools are to:
  1. Stimulate active linguistic competence: rather than passively absorbing linguistic knowledge, learners exercise their ability to construct correct word forms based on the given context. The context is provided by a story. Learners can choose stories from different genres: fiction, journalism, poetry, etc.
  2. Provide an unlimited variety of learning material: the stories can come from our on-line library of pre-selected texts, or from any other source that the user specifies. The goal is that the user learns from stories that s/he finds interesting.
  3. Model the state of user competence and the user's progress toward complete fluency. The principal challenge for the computational learning environment is to present the student with questions at the appropriate level of difficulty: neither too easy nor too difficult.
    • If they are too easy, the user will become bored and will quit learning.
    • If they are too difficult, the user will become discouraged and will quit learning.

Thus the system must assess the learner's current state of competence and provide questions at exactly the correct level.
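
The project description does not say how learner competence is modeled. Purely as an illustration of the idea, the sketch below assumes an Elo-style scheme: the learner has a single skill estimate, each question has a difficulty rating on the same scale, and the system picks the question whose predicted success rate is closest to a target. The names `Question`, `p_correct`, `pick_question`, `update_skill`, the target of 0.7, and the update rate are all hypothetical and are not taken from the FinUgRevita tools.

```python
import math
from dataclasses import dataclass


@dataclass
class Question:
    text: str
    difficulty: float  # hypothetical rating, on the same scale as learner skill


def p_correct(skill: float, difficulty: float) -> float:
    """Logistic estimate of the chance that the learner answers correctly."""
    return 1.0 / (1.0 + math.exp(difficulty - skill))


def pick_question(skill: float, questions: list, target: float = 0.7) -> Question:
    """Choose the question whose predicted success rate is closest to target.

    The target of 0.7 is an assumed value: high enough to avoid
    discouragement, low enough to avoid boredom.
    """
    return min(questions, key=lambda q: abs(p_correct(skill, q.difficulty) - target))


def update_skill(skill: float, difficulty: float, correct: bool, rate: float = 0.5) -> float:
    """Move the skill estimate toward the observed outcome (Elo-style update)."""
    outcome = 1.0 if correct else 0.0
    return skill + rate * (outcome - p_correct(skill, difficulty))
```

In this sketch, a correct answer to a question the learner was expected to miss raises the skill estimate sharply, while a correct answer to an easy question barely changes it, so the questions selected next drift toward the learner's current level.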

The Project is developed by the Computational Linguistics and Language Technology Research Group, at the Department of Computer Science, University of Helsinki.
The Project is jointly supported by:

  1. FinUgRevita Project of the Academy of Finland (Project number: AKA 267097)
  2. Hungarian National Research Fund: OTKA (Project number: OTKA FNN107883)

People:

  • Vuokko Hangaslahti
    University of Helsinki
    Master's student (2014-)
  • Javad Nouri
    University of Helsinki
    Master's student
  • Kirill Reshetnikov
    Russian Academy of Sciences
    Institute of Linguistics, Moscow
  • Marjo Sutinen
    University of Helsinki
    Master's student (2014-)
  • Roman Yangarber
    University of Helsinki
    Project Lead

Collaboration:

  • The Hungarian partner team:
    University of Szeged, Hungary.
    Project Lead: Prof Anna Fenyvesi.
  • Elävä Kieli -- Живой Язык -- Living Language:
    Helsinki, Finland.
    Organization for the revitalization and
    strengthening of minor Finno-Ugric languages
  • Tibidiscis, Oy:
    Helsinki company specializing in computer-assisted language learning
  • Giellatekno: Platform for language technology tools,
    focusing on Saami and other Uralic languages;
    University of Tromsø, Norway.
  • The Russian Academy of Sciences (RAS), Institute of Linguistics. The StarLing Project, a collection of large etymological databases for many language families of the world.
  • KOTUS: analysis and enhancement of data in the Finnish etymological dictionary "Suomen Sanojen Alkuperä". (This database is proprietary, and will be released for public access soon. Please contact us or KOTUS to request permission to access.)

Resources:

Cover: Family network

Project Publications:

    Conference and Journal Papers, Book Chapters, Theses

  1. Measuring Language Closeness by Modeling Regularity   (pdf)
    Javad Nouri, Roman Yangarber
    In Proceedings of the EMNLP 2014 Workshop on Language Technology for Closely Related Languages and Language Variants
    (2014) Doha, Qatar

  2. MDL-based Models for Transliteration Generation   (pdf)
    Javad Nouri, Lidia Pivovarova, Roman Yangarber
    SLSP 2013: International Conference on Statistical Language and Speech Processing
    Springer Verlag, Lecture Notes in Artificial Intelligence (LNAI) Volume 7978, (2013) Tarragona, Spain

  3. Information-theoretic modeling of etymological sound change   (abstract)
    Hannes Wettig, Javad Nouri, Kirill Reshetnikov and Roman Yangarber
    Invited chapter in Approaches to measuring linguistic differences (Lars Borin, Anju Saxena, eds.) Trends in Linguistics Series, Volume 265.
    (2013) Mouton de Gruyter

  4. Probabilistic, Information-Theoretic Models for Etymological Alignment   (pdf)
    Ph.D. Thesis:
    Hannes Wettig
    (2013) University of Helsinki, Department of Computer Science

  5. Information-theoretic Methods for Analysis and Inference in Etymology   (pdf)
    Hannes Wettig, Javad Nouri, Kirill Reshetnikov and Roman Yangarber
    In Proceedings of WITMSE-2012: the Fifth Workshop on Information-theoretic Methods in Science and Engineering   (Steven de Rooij, Wojciech Kotłowski, Jorma Rissanen, Petri Myllymäki, Teemu Roos & Kenji Yamanishi, eds.)
    (2012) Amsterdam, the Netherlands

  6. Minimum Description Length Modeling of Etymological Data    (pdf)
    Master's Thesis:
    Suvi Hiltunen
    (2012) University of Helsinki, Department of Computer Science

  7. Using Context and Phonetic Features in Models of Etymological Sound Change   (pdf)
    Hannes Wettig, Kirill Reshetnikov and Roman Yangarber
    In Conference of the European Chapter of the Association for Computational Linguistics (EACL) Workshop on Visualization of Linguistic Patterns and Uncovering Language History from Multilingual Resources
    (2012) Avignon, France

  8. MDL-based models for alignment of etymological data   (pdf)
    Hannes Wettig, Suvi Hiltunen, Roman Yangarber
    RANLP-2011: Conference on Recent Advances in Natural Language Processing
    (2011) Hissar, Bulgaria

  9. MDL-based modeling of etymological sound change in the Uralic language family   
    Hannes Wettig, Suvi Hiltunen, Roman Yangarber
    WITMSE-2011: The Fourth Workshop on Information Theoretic Methods in Science and Engineering
    (2011) Helsinki, Finland

  10. Probabilistic models for alignment of etymological data   (pdf)
    Hannes Wettig, Roman Yangarber
    Nodalida-2011: Nordic Conference on Computational Linguistics
    (2011) Riga, Latvia

  11. Hidden Markov models for induction of morphological structure of natural language   
    Hannes Wettig, Suvi Hiltunen, Roman Yangarber
    WITMSE-2010: Workshop on Information Theoretic Methods in Science and Engineering
    (2010) Tampere, Finland

  12. A Database of the Uralic language family for etymological research   
    Yangarber, R., Salmenkivi, M., Välisalo, M.
    University of Helsinki, Technical Report Series C; C-2008-38.
    (2008) Helsinki, Finland