Computerised Bilingual Oral Corpus
This project touches on the following areas:
- Corpus Linguistics:
building corpora for linguistic research makes it possible to study
linguistic phenomena using real language data. Because the corpus is
computerised, the material is always available for machine processing,
which saves a great deal of manual work.
- Study of Speech:
the oral corpus is built from recordings of real face-to-face conversations.
The varieties found in it are mainly colloquial.
- Bilingualism:
our main interest is bilingual speech. In Galician oral varieties
it is common to observe speakers adopting a bilingual conversational
style. Structurally hybrid varieties are also frequent.
- Language Contact:
the corpus may be a very useful database for understanding the processes
of contact between Spanish and Galician.
- LIDES:
the LIPPS group (Language Interaction in Plurilingual and Plurilectal
Speakers) has been designing the foundations of the LIDES system, which we
will adopt in order to present the data using the more or less standard
conventions they propose (in practice, CHAT from CHILDES).
Adopting it will also make it easier to share our corpus through the
planned international database and to use the tools that the wider
community will be developing.
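
As a rough illustration of what adopting CHAT-style conventions means for machine processing, the sketch below reads a transcript in which header lines begin with "@", speaker (main) tiers with "*" and dependent tiers with "%". The sample transcript, the speaker codes ANA and XOA, and the function name parse_chat are hypothetical and are not taken from the corpus or from any LIDES tool. The sketch also ignores continuation lines and the rest of the CHAT specification.

```python
import re

# Hypothetical sample in a CHAT-like layout; NOT real corpus data.
SAMPLE = (
    "@Begin\n"
    "@Participants:\tANA Speaker, XOA Speaker\n"
    "*ANA:\tbos días .\n"
    "%com:\tillustrative comment tier\n"
    "*XOA:\tbuenos días .\n"
    "@End\n"
)

# Leading marker, then a label, then optionally ":" + tab + content.
LINE_RE = re.compile(r"^([@*%])([^:\t]+)(?::\t?(.*))?$")

KINDS = {"@": "header", "*": "main", "%": "dependent"}


def parse_chat(text):
    """Return a list of (kind, label, content) tuples, one per tier line.

    Assumption: every tier fits on one line. Lines that do not start
    with "@", "*" or "%" (e.g. tab-indented continuations) are skipped.
    """
    records = []
    for raw in text.splitlines():
        match = LINE_RE.match(raw)
        if match is None:
            continue
        marker, label, content = match.groups()
        records.append((KINDS[marker], label, content or ""))
    return records


if __name__ == "__main__":
    for kind, label, content in parse_chat(SAMPLE):
        print(f"{kind:<10} {label:<14} {content}")
```

Keeping each tier as a (kind, label, content) record is only one possible design. It keeps the speaker turns and their annotations easy to filter, for example to list every main tier produced by a given speaker.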