FinUgRevita Project

Research Focus:

The FinUgRevita Project develops computational tools to aid language learning and to support endangered languages.
The main goals of our tools:

Stimulate active linguistic competence: rather than passively absorbing linguistic knowledge, learners exercise their ability to construct correct word forms in a given context. The context is provided by a story, which the learner can choose from different genres: fiction, journalism, poetry, etc.
Provide an unlimited variety of learning material: the stories can come from our on-line library of pre-selected texts, or from any other source that the user specifies. The goal is that the user learns from stories that s/he finds interesting.
Model the state of user competence and the user's progress toward complete fluency. The principal challenge for the computational learning environment is to present the student with questions at the appropriate level of difficulty, neither too easy nor too difficult:
- If they are too easy, the user will become bored and will quit learning.
- If they are too difficult, the user will become discouraged and will quit learning.

Thus the system must assess the learner's current state of competence and provide questions at exactly the correct level.
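
The idea of matching question difficulty to the learner's competence can be illustrated with a small sketch. This page does not describe how the project's tools actually model competence; the Elo-style logistic update below is only one possible approach, and all names here (p_correct, update_skill, pick_question, the rate and target parameters) are hypothetical.

```python
import math


def p_correct(skill: float, difficulty: float) -> float:
    """Logistic estimate of the chance that the learner answers correctly."""
    return 1.0 / (1.0 + math.exp(difficulty - skill))


def update_skill(skill: float, difficulty: float, correct: bool, rate: float = 0.4) -> float:
    """Move the skill estimate toward the observed outcome (Elo-style update).

    Assumption: a single scalar skill value; a real system would likely
    track competence per linguistic concept.
    """
    outcome = 1.0 if correct else 0.0
    return skill + rate * (outcome - p_correct(skill, difficulty))


def pick_question(skill: float, difficulties: list[float], target: float = 0.7) -> int:
    """Return the index of the question whose predicted success rate is closest to target.

    Assumption: a target below 1.0 keeps questions from being too easy,
    while a target well above 0.0 keeps them from being too hard.
    """
    return min(
        range(len(difficulties)),
        key=lambda i: abs(p_correct(skill, difficulties[i]) - target),
    )


# Usage: a learner with skill 0.0 chooses among four candidate questions,
# answers the chosen one correctly, and the skill estimate rises.
difficulties = [-2.0, -1.0, 0.0, 2.0]
chosen = pick_question(0.0, difficulties)
new_skill = update_skill(0.0, difficulties[chosen], correct=True)
```

Under these assumptions, a correct answer raises the skill estimate and a wrong answer lowers it, so later selections drift toward questions the learner is expected to answer correctly about 70% of the time.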

The Project is developed by the Computational Linguistics and Language Technology Research Group, at the Department of Computer Science, University of Helsinki.
The Project is jointly supported by:

FinUgRevita Project of the Academy of Finland (Project number: AKA 267097)
Hungarian National Research Fund: OTKA (Project number: OTKA FNN107883)

People:

Vuokko Hangaslahti
University of Helsinki
Master's student (2014-)
Javad Nouri
University of Helsinki
MSc student
Kirill Reshetnikov
Russian Academy of Sciences
Institute of Linguistics, Moscow
Marjo Sutinen
University of Helsinki
Master's student (2014-)
Roman Yangarber
University of Helsinki
Project Lead

Collaboration:

The Hungarian partner team:
University of Szeged, Hungary.
Project Lead: Prof Anna Fenyvesi.
Elävä Kieli -- Живой Язык -- Living Language:
Helsinki, Finland.
Organization for revitalization and
strengthening of minor Finno-Ugric Languages
Tibidiscis, Oy:
Helsinki company specializing in computer-assisted language learning
Giellatekno: Platform for language technology tools,
focusing on Saami and other Uralic languages;
University of Tromsø, Norway.
The Russian Academy of Sciences (RAS), Institute of Linguistics. The StarLing Project, a collection of large etymological databases for many language families of the world.
KOTUS: analysis and enhancement of data in the Finnish etymological dictionary "Suomen Sanojen Alkuperä". (This database is proprietary, and will be released for public access soon. Please contact us or KOTUS to request permission to access.)

Resources:

Project Wiki
(Internal use, registration needed)
Wiki: Computational historical linguistics
and modeling population history
Web-based etymological database,
Suomen Sanojen Alkuperä:
Finnish Etymological Dictionary
(To be published on the Web by KOTUS;
please contact us to request access)

Project Publications:

Conference and Journal Papers, Book Chapters, Theses

Measuring Language Closeness by Modeling Regularity (pdf)

Javad Nouri, Roman Yangarber
In Proceedings of the EMNLP 2014 Workshop on Language Technology for Closely Related Languages and Language Variants
(2014) Doha, Qatar
MDL-based Models for Transliteration Generation (pdf)

Javad Nouri, Lidia Pivovarova, Roman Yangarber
SLSP 2013: International Conference on Statistical Language and Speech Processing
Springer Verlag, Lecture Notes in Artificial Intelligence (LNAI) Volume 7978, (2013) Tarragona, Spain
Information-theoretic modeling of etymological sound change (abstract)

Hannes Wettig, Javad Nouri, Kirill Reshetnikov and Roman Yangarber
Invited chapter in Approaches to measuring linguistic differences (Lars Borin, Anju Saxena, eds.) Trends in Linguistics Series, Volume 265.
(2013) Mouton de Gruyter
Probabilistic, Information-Theoretic Models for Etymological Alignment (pdf)
Ph.D. Thesis:
Hannes Wettig
(2013) University of Helsinki, Department of Computer Science
Information-theoretic Methods for Analysis and Inference in Etymology (pdf)

Hannes Wettig, Javad Nouri, Kirill Reshetnikov and Roman Yangarber
In Proceedings of WITMSE-2012: the Fifth Workshop on Information-theoretic Methods in Science and Engineering (Steven de Rooij, Wojciech Kotłowski, Jorma Rissanen, Petri Myllymäki, Teemu Roos & Kenji Yamanishi, eds.)
(2012) Amsterdam, the Netherlands
Minimum Description Length Modeling of Etymological Data (pdf)
Master's Thesis:
Suvi Hiltunen
(2012) University of Helsinki, Department of Computer Science
Using Context and Phonetic Features in Models of Etymological Sound Change (pdf)

Hannes Wettig, Kirill Reshetnikov and Roman Yangarber.
In Conference of the European Chapter of the Association for Computational Linguistics (EACL) Workshop on Visualization of Linguistic Patterns and Uncovering Language History from Multilingual Resources
(2012) Avignon, France
MDL-based models for alignment of etymological data (pdf)

Hannes Wettig, Suvi Hiltunen, Roman Yangarber.
RANLP-2011: Conference on Recent Advances in Natural Language Processing
(2011) Hissar, Bulgaria
MDL-based modeling of etymological sound change in the Uralic language family

Hannes Wettig, Suvi Hiltunen, Roman Yangarber.
WITMSE-2011: The Fourth Workshop on Information Theoretic Methods in Science and Engineering
(2011) Helsinki, Finland
Probabilistic models for alignment of etymological data (pdf)

Hannes Wettig, Roman Yangarber.
Nodalida-2011: Nordic Conference on Computational Linguistics
(2011) Riga, Latvia
Hidden Markov models for induction of morphological structure of natural language

Hannes Wettig, Suvi Hiltunen, Roman Yangarber.
WITMSE-2010: Workshop on Information Theoretic Methods in Science and Engineering
(2010) Tampere, Finland
A Database of the Uralic language family for etymological research

Yangarber, R., Salmenkivi, M., Välisalo, M.
University of Helsinki, Technical Report Series C; C-2008-38.
(2008) Helsinki, Finland