Computerised Bilingual Oral Corpus

This project touches on the following areas:

  • Corpus Linguistics: building corpora for linguistic research helps us understand linguistic phenomena using real language data. Because the corpus is computerised, the material is always available for machine processing, which saves a great deal of work.
  • Study of Speech: the oral corpus is built from real recorded face-to-face conversations. The varieties found in it are mainly colloquial.
  • Bilingualism: our main interest is bilingual speech. In Galician oral varieties, speakers frequently adopt a bilingual conversational style. Hybrid structural varieties are also frequent.
  • Language Contact: the corpus may be a very useful database for understanding processes of contact between Spanish and Galician.
  • LIDES: the LIPPS group (Language Interaction in Plurilingual and Plurilectal Speakers) has been designing the foundations of the LIDES system, which we will adopt in order to present the data using the more or less standard conventions they propose (in fact, CHAT from CHILDES). Adopting it will also make it easier to share our corpus in the future international database and to use the tools that the wider community will be developing.