Linguistics Stimuli

The following is forked with permission from (and almost identical to) The Language Goldmine.

Title | Description | Tags | Languages | Associated Publication
- | 9611 concepts from 51 different concept lists, linked to 2206 different concept sets; 243 relations between concepts are defined | semantics, concepts, lexicon structure, vocabulary | multilingual | -
of Cross-Linguistic Colexifications | Gives polysemy information for 221 different languages covering 64 families (more than 300,000 words and 10,000 concepts) | semantics, concepts, polysemy, lexicon structure, vocabulary, typology | multilingual | List, J.-M., Terhalle, A., & Urban, M. (2013). Using network approaches to enhance the analysis of cross-linguistic polysemies. Proceedings of the 10th International Conference on Computational Semantics (pp. 347-353). Association for Computational Linguistics.
TV News Archive | Contains more than 705,000 captioned and searchable news programs spanning over 4 years of U.S. television network broadcasts | semantics, gesture, phonetics, corpus, TV, media, politics, news, multimodal corpus | English, Spanish | -
- | Word frequencies based on 44 million words from film and television subtitles | word frequency, contextual diversity | Dutch | Keuleers, E., Brysbaert, M., & New, B. (2010). SUBTLEX-NL: A new frequency measure for Dutch words based on film subtitles. Behavior Research Methods, 42(3), 643-650.
- | Word frequencies based on 33.5 million words from film and television subtitles | word frequency, part of speech (POS), lexical decision task, reaction times (RT), response latency | Chinese | Cai, Q., & Brysbaert, M. (2010). SUBTLEX-CH: Chinese word and character frequencies based on film subtitles. PLoS One, 5(6), e10729.
- | Greek word frequencies based on 23 million words from film and television subtitles | word frequency, orthographic neighborhood density, orthographic Levenshtein distance, contextual diversity | Greek | Dimitropoulou, M., Duñabeitia, J., Avilés, A., Corral, J., & Carreiras, M. (2010). Subtitle-based word frequencies as the best estimate of reading behaviour: the case of Greek. Frontiers in Psychology, 1:218, 1-12.
- | Word frequencies based on 101 million words from film and television subtitles | word frequency | Polish | Mandera, P., Keuleers, E., Wodniecka, Z., & Brysbaert, M. (2014). Subtlex-pl: subtitle-based word frequency estimates for Polish. Behavior Research Methods, 47(2), 471-483.
- | English word frequencies based on 201.3 million words from 45,099 BBC broadcasts | word frequency, contextual diversity, word frequency in children's programs, part of speech (POS), bigram frequencies | English | Van Heuven, W.J.B., Mandera, P., Keuleers, E., & Brysbaert, M. (2014). Subtlex-UK: A new and improved word frequency database for British English. Quarterly Journal of Experimental Psychology, 67, 1176-1190.
- | Word frequencies based on 25.4 million words from film and television subtitles | word frequency | German | Brysbaert, M., Buchmeier, M., Conrad, M., Jacobs, A.M., Bölte, J., & Böhl, A. (2011). The word frequency effect: A review of recent developments and implications for the choice of frequency estimates in German. Experimental Psychology, 58, 412-424.
Woerterbuch der deutschen Sprache (dlexDB) | Over 100 million German word tokens, with neighborhood densities and bigram and trigram probabilities, based on different registers | word frequency, bigram probability, trigram probability, neighborhood density, conditional probability | German | Heister, J., Wuerzner, K. M., Bubenzer, J., Pohl, E., Hanneforth, T., Geyken, A., & Kliegl, R. (2011). dlexDB - A lexical database for the psychological and linguistic research. Psychologische Rundschau, 62(1), 10-20.
Wortschatz Lexicon | German thesaurus and lexical network | thesaurus, lexical network | German | -
Corpus Collection | Contains frequencies and co-occurrence information for 219 languages | word frequency, corpus | multilingual | Quasthoff, U., Richter, M., & Biemann, C. (2006). Corpus portal for search in monolingual corpora. Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006), Genoa, pp. 1799-1802.
English Age-of-acquisition ratings | Age-of-acquisition ratings for 30,000 English words | age of acquisition (AOA) | English | Kuperman, V., Stadthagen-Gonzalez, H., & Brysbaert, M. (2012). Age-of-acquisition ratings for 30,000 English words. Behavior Research Methods, 44(4), 978-990.
English Affective Ratings | Valence, arousal and dominance ratings for 13,915 English words | emotion, valence, dominance, arousal, affect, positive, negative | English | Warriner, A.B., Kuperman, V., & Brysbaert, M. (2013). Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods, 45, 1191-1207.
English Concreteness Ratings | Concreteness ratings for 40,000 English words | concreteness | English | Brysbaert, M., Warriner, A.B., & Kuperman, V. (2014). Concreteness ratings for 40 thousand generally known English word lemmas. Behavior Research Methods, 46, 904-911.
Dutch Age-of-acquisition & Concreteness ratings | Age-of-acquisition and concreteness ratings for 30,000 Dutch words | concreteness, age of acquisition (AOA) | Dutch | Brysbaert, M., Stevens, M., De Deyne, S., Voorspoels, W., & Storms, G. (2014). Norms of age of acquisition and concreteness for 30,000 Dutch words. Acta Psychologica, 150, 80-84.
Dutch Affective Ratings | Valence, arousal and dominance ratings for 4,300 Dutch words | emotion, valence, dominance, arousal, affect, positive, negative | Dutch | Moors, A., De Houwer, J., Hermans, D., Wanmaker, S., van Schie, K., Van Harmelen, A. L., De Schryver, M., De Winne, J., & Brysbaert, M. (2013). Norms of valence, arousal, dominance, and age of acquisition for 4,300 Dutch words. Behavior Research Methods, 45(1), 169-177.
Lexicon Project | Lexical decision data for 38,840 French words and 38,840 pseudowords | lexicon project, psycholinguistic database, reaction times (RT), response latency, word frequency | French | Ferrand, L., New, B., Brysbaert, M., Keuleers, E., Bonin, P., Méot, A., Augustinova, M., & Pallier, C. (2010). The French Lexicon Project: Lexical decision data for 38,840 French words and 38,840 pseudowords. Behavior Research Methods, 42, 488-496.
- | Lexical database for 135,000 words | lexicon project, psycholinguistic database, reaction times (RT), response latency, word frequency | French | -
Lexicon Project | Malay lexical database for 9,592 words | lexicon project, psycholinguistic database, reaction times (RT), response latency, word frequency | Malay | Yap, M. J., Liow, S. J. R., Jalil, S. B., & Faizal, S. S. B. (2010). The Malay Lexicon Project: A database of lexical statistics for 9,592 words. Behavior Research Methods, 42(4), 992-1003.
Lexicon Project (ELP) | English lexical database for 40,481 words | lexicon project, psycholinguistic database, reaction times (RT), response latency, lexical decision task, word naming, contextual diversity, neighborhood density, bigram probability, part of speech (POS), Levenshtein distance, SUBTLEX | English | Balota, D. A., Yap, M. J., Hutchison, K. A., Cortese, M. J., Kessler, B., Loftis, B., Neely, J. H., Nelson, D. L., Simpson, G. B., & Treiman, R. (2007). The English Lexicon Project. Behavior Research Methods, 39(3), 445-459.
Lexicon Project | Dutch lexical database for 14,000 words | lexicon project, psycholinguistic database, reaction times (RT), response latency, word frequency | Dutch | Keuleers, E., Diependaele, K., & Brysbaert, M. (2010). Practice effects in large-scale visual word recognition studies: A lexical decision study on 14,000 Dutch mono- and disyllabic words and nonwords. Frontiers in Psychology, 1, 174.
Lexicon Project (BLP) | British English lexical database for 28,000 words | lexicon project, psycholinguistic database, reaction times (RT), response latency, bigram probability, trigram probability, word frequency | English | Keuleers, E., Lacey, P., Rastle, K., & Brysbaert, M. (2012). The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words. Behavior Research Methods, 44(1), 287-304.
Word Knowledge & Prevalence | Word prevalence values for 54,319 Dutch words from nearly 300,000 participants | word prevalence, word knowledge, lexical knowledge | Dutch | Keuleers, E., Stevens, M., Mandera, P., & Brysbaert, M. (2015). Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment. The Quarterly Journal of Experimental Psychology, (ahead-of-print), 1-28.
of Semantic Shifts in the Languages of the World | 3,690 semantic connections in the world's languages (polysemy, semantic changes) | polysemy, semantic change, semantics | multilingual | Zalizniak, A. A., Bulakh, M., Ganenkov, D., Gruntov, I., Maisak, T., & Russo, M. (2012). The catalogue of semantic shifts as a database for lexical semantic typology. Linguistics, 50, 633-669.
Word Association Norms | Free word association data for 72,000 word pairs | word association, semantics | English | Nelson, D. L., McEvoy, C. L., & Schreiber, T. A. (2004). The University of South Florida free association, rhyme, and word fragment norms. Behavior Research Methods, Instruments, & Computers, 36, 402-407.
Exclusivity Norms for Adjectives | Degree to which 423 adjectives are associated with different modalities/senses (e.g., vision, hearing) | modality exclusivity, senses, semantics, adjectives | English | Lynott, D., & Connell, L. (2009). Modality exclusivity norms for 423 object properties. Behavior Research Methods, 41, 558-564.
Psycholinguistics Database | Lexical database for English | lexicon project, psycholinguistic database, reaction times (RT), response latency, word frequency, concreteness, familiarity, imageability, meaningfulness, part of speech (POS), lexical category, number of phonemes, syllables, letters, stress-marked phonetic transcription | English | Coltheart, M. (1981). The MRC Psycholinguistic Database. Quarterly Journal of Experimental Psychology, 33A, 497-505; Wilson, M.D. (1988). The MRC Psycholinguistic Database: Machine Readable Dictionary, Version 2. Behavior Research Methods, Instruments, & Computers, 20, 6-11.
- | Word frequencies from 51 million word tokens | word frequency, contextual diversity | English | Brysbaert, M., & New, B. (2009). Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41, 977-990; Brysbaert, M., New, B., & Keuleers, E. (2012). Adding part-of-speech information to the SUBTLEX-US word frequencies. Behavior Research Methods, 44(4), 991-997.
National Corpus (BNC) | Corpus of 100 million words | corpus | English | -
Atlas of Language Structures (WALS) | Typological database | typology, grammatical database, syntax, morphology, phonology | multilingual | Dryer, M. S., & Haspelmath, M. (Eds.) (2013). The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology.
World Loanword Database (WOLD) | Loanword database with mini-dictionaries for 41 languages; words are coded for likelihood of being a loanword | loanwords, borrowing, language contact | multilingual | Haspelmath, M., & Tadmor, U. (Eds.) (2009). Loanwords in the world's languages: a comparative handbook. Walter de Gruyter.
Feature Norms | Semantic feature norms for 541 concepts from 725 participants | semantic features, semantics, feature norms, distinctive features, objects and events, properties | English | McRae, K., Cree, G. S., Seidenberg, M. S., & McNorgan, C. (2005). Semantic feature production norms for a large set of living and nonliving things. Behavior Research Methods, 37(4), 547-559.
Ngram | Large POS-tagged corpora of books with word frequencies and ngram frequencies for English, German, French, Italian, Spanish, Russian, Chinese and Hebrew | ngrams, word frequency | multilingual | Michel, J. B., Shen, Y. K., Aiden, A. P., Veres, A., Gray, M. K., Pickett, J. P., ... & Aiden, E. L. (2011). Quantitative analysis of culture using millions of digitized books. Science, 331(6014), 176-182.
- | Dictionary with semantic relations for English | wordnet, lexical database, dictionary, vocabulary, semantic relations, semantics, semantic hierarchies | English | Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38, 39-41; Fellbaum, C. (1998). WordNet: An Electronic Lexical Database. MIT Press.
Associative Thesaurus | English word association norms | word association, semantics | English | Kiss, G.R., Armstrong, C., Milroy, R., & Piper, J. (1973). An associative thesaurus of English and its computer analysis. In Aitken, A.J., Bailey, R.W., & Hamilton-Smith, N. (Eds.), The Computer and Literary Studies. Edinburgh: University Press.
WordNet | Wordnets for several European languages | wordnet, lexical database, dictionary, vocabulary, semantic relations, semantics, semantic hierarchies | multilingual | -
of Contemporary American English (COCA) | 440-million-word corpus of contemporary American English | corpus | English | -
of Historical American English (COHA) | 385-million-word corpus of historical American English | corpus | English | Davies, M. (2011). The Corpus of Contemporary American English as the first reliable monitor corpus of English. Literary and Linguistic Computing, 25, 447-465.
Dictionary Series (IDS) | Comparative lexical database | lexicon, dictionary, vocabulary | multilingual | -
The world's largest database of phonological inventories | Cross-linguistic phoneme inventory data | typology, phonemes, phonetics, phonology, phoneme inventory | multilingual | Moran, S., McCloy, D., & Wright, R. (Eds.) (2014). PHOIBLE Online. Leipzig: Max Planck Institute for Evolutionary Anthropology.
- | of several thousand sound patterns in 500+ languages | phonetics, phonology, typology | multilingual | -
Similarity database | Similarity ratings for 51 segments | phonetics, phonology | multilingual | -
- | Database | language family, language area, genealogy, geography, genealogical data, typology, language history | multilingual | Bickel, B., & Nichols, J. (2002). Autotypologizing databases and their use in fieldwork. In Proc. Int. LREC Workshop on Resources and Tools in Field Linguistics. Las Palmas, 25-26 May 2002.
Korean Corpus | Corpus of 70 million Korean eojeol | corpus | Korean | -
Parallel Corpus (EPC) | Parallel text with up to 60 million words for 20 languages | parallel corpus, corpus, translation | multilingual | -
Bible Corpus (PBC) | Parallel corpus of the Bible in around 900 languages from around 80 different language families | parallel corpus, corpus, translation | multilingual | -
Declaration of Human Rights | Parallel corpus of the Declaration of Human Rights in 400 languages | parallel corpus, corpus, translation | multilingual | -
Corpus | Syntactically annotated or POS-tagged corpora with up to 2 billion words for English, French, German and Italian; also includes an Italian Wikipedia corpus | corpus, syntax, treebank, part of speech (POS), Wikipedia | English, French, German, Italian | -
Similarity Judgment Project (ASJP) | Word lists for around 6000 languages | word list, typology, vocabulary | multilingual | Wichmann, Søren, André Müller, Annkathrin Wett, Viveka Velupillai, Julia Bischoffberger, Cecil H. Brown, Eric W. Holman, Sebastian Sauppe, Zarina Molochieva, Pamela Brown, Harald Hammarström, Oleg Belyaev, Johann-Mattis List, Dik Bakker, Dmitry Egorov, Matthias Urban, Robert Mailhammer, Agustina Carrizo, Matthew S. Dryer, Evgenia Korovina, David Beck, Helen Geyer, Patience Epps, Anthony Grant, and Pilar Valenzuela. 2013. The ASJP Database.
- | Catalogue of the world's languages | catalogue, typology | multilingual | -
- | Reference information for the world's languages | catalogue, typology | multilingual | -
parallel corpus | A collection of parallel corpora including 71 million sentences for about 30 languages | parallel corpus, corpus, translation | multilingual | -
Collection in The Internet Archive | Media files and documents about the languages of the world collected by the Rosetta foundation | reference grammar, word list | multilingual | -
Bank Open Data | Demographic and geographical data on the world's countries | demographic data, country, migration, bilingualism, language use | non-linguistic | -
Exclusivity Norms for Nouns | Modality norms for 400 English nouns | perceptual attributes, modality exclusivity, senses, semantics, vision, sight, hearing, touch, taste, smell | English | Lynott, D., & Connell, L. (2013). Modality exclusivity norms for 400 nouns: The relationship between perceptual experience and surface word form. Behavior Research Methods, 45(2), 516-526.
Exclusivity Norms for Adjectives | Modality norms from 400 American English participants for 387 adjectives | perceptual attributes, modality exclusivity, senses, semantics, vision, sight, hearing, touch, taste, smell | English | van Dantzig, S., Cowell, R. A., Zeelenberg, R., & Pecher, D. (2011). A sharp image or a sharp knife: Norms for the modality-exclusivity of 774 concept-property items. Behavior Research Methods, 43(1), 145-154.
and motor attribute ratings | Perceptual and motor attribute ratings for 559 concepts based on 376 American English participants | graspability, perceptual attributes, semantics | English | Amsel, B. D., Urbach, T. P., & Kutas, M. (2012). Perceptual and motor attribute ratings for 559 object concepts. Behavior Research Methods, 44(4), 1028-1041.
experience ratings | Sensory experience ratings for 5857 English words based on 63 participants | perceptual attributes, semantics | English | Juhasz, B. J., & Yap, M. J. (2013). Sensory experience ratings for over 5,000 mono- and disyllabic words. Behavior Research Methods, 45(1), 160-168.
and naming norms for photographs | Manipulability ratings and naming RT norms for photographs | perceptual attributes, manipulability | non-linguistic | -
familiarity and AOA for photographs | Manipulability, familiarity and AOA for photographs | perceptual attributes, manipulability, age of acquisition (AOA), familiarity | non-linguistic | Salmon, J. P., McMullen, P. A., & Filliter, J. H. (2010). Norms for two types of manipulability (graspability and functional usage), familiarity, and age of acquisition for 320 photographs of objects. Behavior Research Methods, 42(1), 82-95.
norms for photographs | 140 color images normed by 106 Spanish speakers on age of acquisition, familiarity, manipulability and other measures | age of acquisition (AOA), perceptual attributes, manipulability | Spanish | Moreno-Martínez, F. J., Montoro, P. R., & Laws, K. R. (2011). A set of high quality colour images with Spanish norms for seven relevant psycholinguistic variables: The Nombela naming test. Aging, Neuropsychology, and Cognition, 18(3), 293-327.
acronym norms | Psycholinguistic norms for French acronyms | acronyms, reading time (RT), age of acquisition ratings (AOA), subjective frequency, imageability | French | Bonin, P., Méot, A., Millotte, S., & Bugaiska, A. (2014). Norms and reading times for acronyms in French. Behavior Research Methods, 47(1), 251-267.
AOA norms | Subjective age-of-acquisition norms for 7,039 Spanish words | age of acquisition (AOA) | Spanish | Alonso, M. A., Fernandez, A., & Díez, E. (2015). Subjective age-of-acquisition norms for 7,039 Spanish words. Behavior Research Methods, 47(1), 268-274.
emotional speech | Emotional speech from 470 sentences normed by 1,126 Persian native speakers | emotion, emotional speech | Persian | Keshtiari, N., Kuhlmann, M., Eslami, M., & Klann-Delius, G. (2015). Recognizing emotional speech in Persian: A validated database of Persian emotional speech (Persian ESD). Behavior Research Methods, 47(1), 275-294.
Property Norms | Feature norms for 866 concrete concepts by 123 native speakers of British English | semantic features, semantics | English | Devereux, B. J., Tyler, L. K., Geertzen, J., & Randall, B. (2014). The Centre for Speech, Language and the Brain (CSLB) concept property norms. Behavior Research Methods, 46(4), 1119-1127.
German affectiveness ratings | Valence, arousal, dominance and other ratings for 1,003 German words | emotion, valence, dominance, arousal, affect, positive, negative | German | Schmidtke, D. S., Schröder, T., Jacobs, A. M., & Conrad, M. (2014). ANGST: Affective norms for German sentiment terms, derived from the affective norms for English words. Behavior Research Methods, 46(4), 1108-1118.
affective norms | Affective norms for 1,031 French words by 469 French speakers | emotion, valence, dominance, arousal, affect, positive, negative | French | Monnier, C., & Syssau, A. (2014). Affective norms for French words (FAN). Behavior Research Methods, 46(4), 1128-1137.
stereotypicality norms | Gender stereotypicality norms for role nouns in 7 European languages | gender, stereotypes, stereotypicality, sociolinguistics | Czech, English, French, German, Italian, Norwegian, Slovak | Misersky, J., Gygax, P. M., Canal, P., Gabriel, U., Garnham, A., Braun, F., ... & Sczesny, S. (2014). Norms on the gender perception of role nouns in Czech, English, French, German, Italian, Norwegian, and Slovak. Behavior Research Methods, 46(3), 841-871.
- | representations for 120,000 Italian word forms | phonological representation, phonology, transcription | Italian | Goslin, J., Galluzzi, C., & Romani, C. (2014). PhonItalia: a phonological lexicon for Italian. Behavior Research Methods, 46(3), 872-886.
affective norms | Affective norms for 1,121 Italian words | emotion, valence, dominance, arousal, affect, positive, negative | Italian | Montefinese, M., Ambrosini, E., Fairfield, B., & Mammarella, N. (2014). The adaptation of the affective norms for English words (ANEW) for Italian. Behavior Research Methods, 46(3), 887-903.
ASL frequency | Subjective frequency ratings for 432 ASL signs from 59 native deaf signers | subjective frequency, familiarity, American Sign Language (ASL) | ASL | Mayberry, R. I., Hall, M. L., & Zvaigzne, M. (2014). Subjective frequency ratings for 432 ASL signs. Behavior Research Methods, 46(2), 526-539.
similarity for translation equivalents | 193 Japanese-English word pairs rated for phonological and semantic similarity | phonological similarity, phonetics, phonology, semantics, semantic similarity, translation, translation equivalent | Japanese, English | Allen, D., & Conklin, K. (2014). Cross-linguistic similarity norms for Japanese–English translation equivalents. Behavior Research Methods, 46(2), 540-563.
Free Association norms | Free association norms for 139 Portuguese words from children of various ages | children, language acquisition, word association, free association | Portuguese | Comesaña, M., Fraga, I., Moreira, A. J., Frade, C. S., & Soares, A. P. (2014). Free associate norms for 139 European Portuguese words for children from different age groups. Behavior Research Methods, 46(2), 564-574.
image norms | Turkish AOA, familiarity and other norms for 260 pictures from 277 native Turkish speakers | familiarity, age of acquisition (AOA), word frequency | Turkish | Raman, I., Raman, E., & Mertan, B. (2014). A standardized set of 260 pictures for Turkish: Norms of name and image agreement, age of acquisition, visual complexity, and conceptual familiarity. Behavior Research Methods, 46(2), 588-595.
Lexicon project | Reaction times for 2500 single characters and associated lexical norms (frequency, contextual diversity, etc.) | contextual diversity, word frequency, lexical decision task, reaction times (RT), response latency | Chinese | Sze, W. P., Liow, S. J. R., & Yap, M. J. (2014). The Chinese Lexicon Project: A repository of lexical decision behavioral responses for 2,500 Chinese characters. Behavior Research Methods, 46(1), 263-273.
action norms | Dutch AOA, word frequency and other norms for 124 line drawings | age of acquisition (AOA), perceptual attributes, action | Dutch | Shao, Z., Roelofs, A., & Meyer, A. S. (2014). Predicting naming latencies for action pictures: Dutch norms. Behavior Research Methods, 46(1), 274-283.
object norms | Manipulability, graspability and pantomimability norms by French speakers for 560 photographs | iconicity, manipulability, movability, perceptual attributes, graspability, semantics | non-linguistic | Guérard, K., Lagacé, S., & Brodeur, M. B. (2014). Four types of manipulability ratings and naming latencies for a set of 560 photographs of objects. Behavior Research Methods, 47(2), 443-470.
Basic Vocabulary Database | 210 vocabulary items in almost 1000 Austronesian languages | basic vocabulary, dictionary, Austronesian | Austronesian | Greenhill, S. J., Blust, R., & Gray, R. D. (2008). The Austronesian basic vocabulary database: from bioinformatics to lexomics. Evolutionary Bioinformatics Online, 4, 271-283.
Basic Vocabulary Database | 430 vocabulary items from 10 Bantu languages | basic vocabulary, dictionary, Bantu | Bantu | -
Lexical Cognacy Database | 207 vocabulary items in 150 Indo-European languages | basic vocabulary, dictionary, Indo-European, cognates | Indo-European | -
A digital library of language relationships | Resource for language relatedness and genealogy; contains trees for many language families | language family, genealogy, linguistic history, reconstruction, protolanguage | multilingual | -
Reference Lexicon | Around 60,000 lexical entries for around 500 African languages with phonotactic and cognacy coding | Africa, cognacy, basic vocabulary, dictionary, language history | multilingual | -
World Phonotactics Database | Phonotactic data for over 2000 languages and segmental data for around 4700 languages | typology, phonemes, phoneme inventory, phonology, phonotactics, segments | multilingual | Donohue, M., Hetherington, R., McElvenny, J., & Dawson, V. (2013). World phonotactics database. Department of Linguistics, The Australian National University.
Value Survey | Data on socioeconomic and demographic variables, including language background, for over 85,000 respondents in 57 countries | language use, bilingualism, demographic data | multilingual | World Values Survey Association (2009). World Values Survey 1981-2008 Official Aggregate v. 20090901. Madrid: ASEP/JDS.
Corpus of Casual French | 35 hours of orthographically annotated, high-quality recordings of 46 French speakers conversing among friends | phonetics, video, speech, annotated corpus | French | Torreira, F., Adda-Decker, M., & Ernestus, M. (2010). The Nijmegen Corpus of Casual French. Speech Communication, 52, 201-221.
Lexicon | Multi-accent dictionary of English | English dialects, lexicon, accents | English | Fitt, S. (2002). Unisyn lexicon release. The Centre for Speech Technology Research, University of Edinburgh.
speech technology lexicon | Multi-accent dictionary of English | English dialects, lexicon, accents | English | Fitt, S., Richmond, K., & Clark, R. Combilex.
dataset | Articulatory EMA, MRI, video, audio and 3D scan data from one British male speaker | articulation, articulatory data, MRI, EMA, video, phonetics, speech production | English | Richmond, K. (2011). Announcing the electromagnetic articulography (day 1) subset of the mngu0 articulatory corpus. In Proc. Interspeech, pages 1505-1508, Florence, Italy, August 2011; Steiner, I., Richmond, K., Marshall, I., & Gray, C. D. (2012). The magnetic resonance imaging subset of the mngu0 articulatory corpus. Journal of the Acoustical Society of America, 131(2), 106-111.
adjective norms | 306 words categorized for haptic properties such as roughness and weight | perceptual attributes, semantics, feeling | English | Stadtlander, L. M., & Murdoch, L. D. (2000). Frequency of occurrence and rankings for touch-related adjectives. Behavior Research Methods, Instruments, & Computers, 32(4), 579-587.
Communicative Development Inventories (MCDI) | Database of children's early vocabulary development and gestures | developmental, language acquisition, early vocabulary, age of acquisition (AOA), gesture, multimodal | English, Danish, Norwegian, Turkish, Spanish, Russian, Mandarin, Swedish, German, Cantonese, Italian, Croatian, Hebrew | Jørgensen, R. N., Dale, P. S., Bleses, D., & Fenson, L. (2010). CLEX: A cross-linguistic lexical norms database. Journal of Child Language, 37(02), 419-428.
Phonotactic Online Dictionary (IPhOD) | Collection of English words and pseudowords described in terms of a number of phonological variables | phonotactics, biphoneme probability, bigram probability, triphoneme probability, trigram probability, segments, phonemes, syllables | English | Vaden, K.I., Halpin, H.R., & Hickok, G.S. (2009). Irvine Phonotactic Online Dictionary, Version 2.0.
- | Database with stress and accent patterns of 750 languages | typology, stress, accent | multilingual | -
Phonology Lab Stress Pattern Database | Dominant stress patterns of the world's languages | typology, stress, accent | multilingual | -
Typology Database | Anaphora database with example sentences | typology, anaphora, syntax | multilingual | Dimitriadis, A., Everaert, M., Reinhart, T., & Reuland, E. (2005). Anaphora Typology Database.
Personal Pronoun System database | Personal pronoun systems of the world's languages | typology, syntax, morphology, personal pronoun, morphosyntax | multilingual | -
Database on Reduplication | Reduplication patterns of the world's languages | typology, reduplication, syntax, morphology, morphosyntax | multilingual | Hurch, B. (2005-). Graz Database on Reduplication.
Database of Intensifiers and Reflexives | Intensifier and reflexive patterns of the world's languages | typology, intensifier, reflexives, syntax | multilingual | Gast, V., Hole, D., König, E., Siemund, P., & Töpper, S. (2007). Typological Database of Intensifiers and Reflexives. Version 2.0.
Phonological Segment Inventory Data (UPSID) | Contains phonological inventories for 451 languages | phonology, phoneme inventory, typology | multilingual | Maddieson, I. (1984). Patterns of sounds. Cambridge Studies in Speech Science and Communication. Cambridge: Cambridge University Press.
of Pidgin and Creole Language Structures (APiCS) | Grammatical and lexical structures of 75 pidgin and creole languages | phonology, lexicon, negation, syntax, morphology, morphosyntax, typology, pidgin & creole languages, word order | multilingual | Michaelis, S. M., Maurer, P., Haspelmath, M., & Huber, M. (Eds.) (2013). Atlas of Pidgin and Creole Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology.
Patterns Leipzig Online Database (ValPal) | Valency patterns of 36 languages | typology, syntax | multilingual | Hartmann, I., Haspelmath, M., & Taylor, B. (Eds.) (2013). Valency Patterns Leipzig. Leipzig: Max Planck Institute for Evolutionary Anthropology.
World Atlas of Varieties of English | Over 235 linguistic features mapped for 50 varieties of English | English dialects, phonology, lexicon, morphology, syntax, discourse, word order, tense, aspect | multilingual | Kortmann, B., & Lunkenheimer, K. (Eds.) (2013). The Electronic World Atlas of Varieties of English. Leipzig: Max Planck Institute for Evolutionary Anthropology.
Systems of the World's Languages | Data on numeral systems for about 4000 languages of the world | typology, numeral systems | multilingual | -
world-wide survey of affix borrowing | A database of 101 languages in which affixes have been borrowed (657 affixes in total) | typology, affixes, morphology, morphosyntax | multilingual | Seifart, F. (2013). AfBo: A world-wide survey of affix borrowing. Leipzig: Max Planck Institute for Evolutionary Anthropology.
American Indigenous Language Structures (SAILS) | A database of 604 linguistic features from 167 American Indigenous languages | typology, syntax, morphology, morphosyntax, phonology, tense, aspect, evidentiality, word order, agreement | multilingual | Muysken, Pieter, Harald Hammarström, Olga Krasnoukhova, Neele Müller, Joshua Birchall, Simon van de Kerke, Loretta O'Connor, Swintha Danielsen, Rik van Gijn & George Saad. 2014. South American Indigenous Language Structures (SAILS) Online. Leipzig: Online Publication of the Max Planck Institute for Evolutionary Anthropology.
- | List of English homophones | homophones, lexicon | multilingual | -
Squared | Political debates with transcripts and votes by audience members | politics, debate, argument, corpus | English | -
Affective Word List (NAWL) for Polish | Emotional valence, arousal and imageability ratings for 2,902 Polish words | emotion, valence, arousal, positive, negative, imageability, word frequency, part of speech (POS), word length | Polish | Riegel, M., Wierzba, M., Wypych, M., Żurawski, Ł., Jednoróg, K., Grabowska, A., & Marchewka, A. (2015). Nencki Affective Word List (NAWL): the cultural adaptation of the Berlin Affective Word List–Reloaded (BAWL-R) for Polish. Behavior Research Methods, 1-15.
emotion norms for German (DENN-BAWL) | Discrete emotion ratings for about 2000 German nouns | emotion, valence, arousal, positive, negative | German | Briesemeister, B. B., Kuchinke, L., & Jacobs, A. M. (2011). Discrete emotion norms for nouns: Berlin affective word list (DENN–BAWL). Behavior Research Methods, 43(2), 441-448.
Affective Word List Reloaded (BAWL-R) | Emotional arousal and valence ratings for about 2900 German nouns | emotion, valence, arousal, positive, negative | German | Võ, M. L., Conrad, M., Kuchinke, L., Urton, K., Hofmann, M. J., & Jacobs, A. M. (2009). The Berlin affective word list reloaded (BAWL-R). Behavior Research Methods, 41(2), 534-538.
Language Acquisition Resources | Contains transcribed corpora with audio from several languages relevant for second language acquisition research | second language acquisition (SLA), L2, bilingualism | multilingual
Database for Phonological Development | Contains corpora and phonological information on child language development | language development, language acquisition, phonetics, phonology, clinical corpora | English, French, Portuguese, German, Swedish, Dutch, Indonesian, Japanese, Taiwanese, Cantonese, Greek, Arabic, Berber, Romanian, Polish
Child Language Data Exchange System (CHILDES) | Transcribed and annotated child language corpora for several languages | language development, language acquisition, annotated corpus, corpora, clinical corpora | Celtic, Irish, Welsh, Cantonese, Chinese, Indonesian, Japanese, Korean, Taiwanese, Thai, English, Afrikaans, Dutch, Danish, German, Icelandic, Norwegian, Swedish, Catalan, Spanish, French, Italian, Portuguese, Romanian, Croatian, Polish, Russian, Serbian, Slovenian
CHILDES frequency tool | Online tool for accessing CHILDES word frequency data | language development, language acquisition, word frequency | English | Baath, R. (2014). ChildFreq: An online tool to explore word frequencies in child language.
Subtitles | Over three million subtitle files in several languages | subtitles, corpus, parallel corpus | multilingual
Arabic Corpus | Morphological annotation, syntactic treebank and semantic ontology for the entire Holy Quran | Quran, corpus, treebank, morphological annotation, syntax, semantics, ontology | Arabic
A Representative Corpus of Historical English Registers | A multi-genre English corpus covering 1600 to 1999 | language history, historical corpus, registers | English
Corpus of Late Modern English Texts | 34 million words of running text from 1710 to 1920 | language history, historical corpus, registers | English | Diller, H., De Smet, H., & Tyrkkö, J. (2011). A European database of descriptors of English electronic texts. The European English Messenger, 19, 21-35.
Corpus of English Texts | Contains 1.5 million words of English texts from 730 AD to 1710 AD | Old English, Middle English, Early Modern English, language history, historical corpus | English
Helsinki Corpus of Older Scots (HCOS) | Contains 0.8 million words of Scottish English from 1450 AD to 1700 AD | Scottish, language history, historical corpus | English | The Helsinki Corpus of Older Scots (1995). Department of Modern Languages, University of Helsinki. Compiled by Anneli Meurman-Solin.
Parsed Corpus of Old English | Syntactically annotated and POS-tagged corpus of Old English | Old English, language history, historical corpus, syntax | English
Parsed Corpus of Old English Prose (YCOE) | A corpus of 1.5 million words of Old English texts, syntactically annotated and POS-tagged | Old English, prose, language history, historical corpus, syntax | English
Parsed Corpus of Old English Poetry | Corpus of Old English poetry, syntactically annotated and POS-tagged | Old English, poetry, language history, historical corpus, syntax | English
Corpora of Historical English (PPCME2, PPCEME, PPCMBE) | Middle English, Early Modern English and Modern English corpora, syntactically annotated and POS-tagged | Middle English, Early Modern English, language history, historical corpus, syntax | English
University of Oxford Text Archive (OTA) | Text archives (with some audio and video data) for many English texts from many different time periods | historical corpus, text archive, corpora, language history | English
Corpus of Early English Correspondence (CEEC) | Compiled with historical sociolinguistics in mind: a corpus of more than 6 million words of English correspondence (1410-1800) from thousands of writers | historical corpus, letters, language history, correspondence, sociolinguistics | English
Lampeter Corpus of Early Modern English Texts | English texts from 1640 to 1740 in the categories religion, politics, economy, science, law and miscellaneous | historical corpus, language history, Early Modern English | English
Corpus of Contemporary Arabic (CCA) | A corpus of 0.8 million Arabic words | corpus | Arabic
Latin Library | A collection of Latin texts from several authors | corpus | Latin
Norwegian Web as Corpus (NoWaC) | A web-based corpus of 700 million Norwegian words | corpus | Norwegian | Guevara, E. R. (2010). NoWaC: A large web-based corpus for Norwegian. In Proceedings of the NAACL HLT 2010 Sixth Web as Corpus Workshop (pp. 1-7). Association for Computational Linguistics.
TIGER Corpus | German news corpus of 0.9 million tokens from the Frankfurter Rundschau | corpus, news, media | German | Brants, S., Dipper, S., Eisenberg, P., Hansen, S., König, E., Lezius, W., Rohrer, C., Smith, G., & Uszkoreit, H. (2004). TIGER: Linguistic interpretation of a German corpus. Journal of Language and Computation, 2004(2), 597-620.
Russian Corpus | Russian corpus of different genres with transliterations | corpus | Russian
Arabic Learner Corpus (ALC) | 0.2 million Arabic words produced by 942 students from 66 different L1 backgrounds | learner corpus, second language acquisition (SLA), bilingualism | Arabic
Gazette de Renaudot | Historical corpus of French gazettes/newspapers | historical corpus, language history | French
Student English Corpus (USE) | Corpus of 1500 essays written by 440 Swedish university students | learner corpus, second language acquisition (SLA), bilingualism | Swedish
Corpus of English Dialogues (CED) | Corpus of 1.1 million words of English dialogues (spoken interactions) from 1560-1760 | historical corpus, dialogue, language history, corpora, Early Modern English | English
Gutenberg | A collection of 50000 free ebooks | book collection, text archive | English
New York Times Article Archive | A collection of all New York Times articles from 1851 to the present | text archive, news, media, newspaper | English
Age of Acquisition and Imageability ratings | Imageability and age of acquisition norms for a set of 2645 English words | age of acquisition (AOA), imageability | English
A list of 6800 positive and negative English opinion words | opinion mining, sentiment analysis, emotional valence, positive, negative | English
Product Review Data | More than 5.8 million reviews of Amazon products | corpus, media, Amazon, reviews | English
Challenge Review Data | About 1.6 million reviews from 360000 Yelp users | corpus, media, Yelp, reviews | English
A lexical resource for opinion mining | emotional valence, affect, opinion mining, sentiment analysis, wordnet, positive, negative | English
Dialect Topography | Cross-regional dialect topography, largely focused on Canada | sociolinguistics, dialects, Canada, Canadian English, regional variants | English
Australian Indigenous Languages Database | Classification and language information on Australian languages, including maps | Australian languages, classification, geography, map, speaker information, language use | multilingual
Bayrische Dialektdatenbank | Database of Bavarian German dialects | dialects, Bavaria, Germany, sociolinguistics, map | German
Collection Pangloss | Database of audio materials from several of the world's languages | audio recordings, typology, world's languages | multilingual
TIME Magazine Corpus | 100 million word corpus of TIME magazine | corpus, media, news, magazine | English
1.9 billion word corpus from Wikipedia (4.4 million articles) | corpus, Wikipedia | English
Corpus of Global Web-Based English (GloWbE) | 1.9 billion word corpus from 1.8 million web pages | corpus, web language, web corpus | English
Corpus of Canadian English (STRATHY) | 50 million word corpus of Canadian English ranging from 1920 to 2000 | corpus, Canada, historical corpus, language history | English
Corpus del Español | 100 million word corpus from 20000 Spanish texts spanning 1200 to the 1900s | corpus, language history, historical corpora | Spanish
Corpus do Português | 45 million word corpus of Portuguese spanning 1300 to 1900 | corpus, language history, historical corpora | Portuguese
Corpus of Sanskrit (DCS) | 3.2 million words of Sanskrit with collocates | corpus, language history, historical corpora | Sanskrit
Tonal Database (Xtone) | Information on lexical tone from 82 different languages | typology, phonology, lexical tone, tonal systems | multilingual
Linguae Sericae | A historical and comparative encyclopedia of Chinese conceptual schemes, with corpora and semantic relations | language history, historical corpus, semantic relations, encyclopedia, historical phonology | Chinese
Tibeto-Burman language corpora | Transcribed and translated texts of Tibeto-Burman languages | Tibeto-Burman, corpus, corpora, endangered languages | Ahom, Aiton, Khamti, Khamyang, Singpho, Turung, Tangsa
Parsed corpus of 360000 Chinese words | treebank, corpus, syntax | Chinese
Romani Morpho-Syntax Database (RMS) | Database of linguistic features of Romani | grammatical database, syntax, morphology | Romani
English-Irish corpus of legal texts | 4.5 English words of legal texts with Irish translations | parallel corpus, corpus, translation, law, legal texts | English, Irish
Nouveau Corpus d'Amsterdam | Old French literary texts from the 11th to the 14th century | language history, historical corpus, Old French | French
Familienamenbank | 300000 Dutch names and their locations in the Netherlands | onomastics, names, geography | Dutch
Synchronous Corpus | 550 million word Chinese corpus | corpus | Chinese
A wordnet of the languages of India | wordnet, lexical database, dictionary, vocabulary, semantic relations, semantics, semantic hierarchies | Hindi, Assamese, Bengali, Bodo, Gujarati, Kannada, Kashmiri, Konkani, Malayalam, Manipuri, Marathi, Nepali, Odiya, Punjabi, Sanskrit, Tamil, Telugu, Urdu
Hindi WordNet | A wordnet of Hindi | wordnet, lexical database, dictionary, vocabulary, semantic relations, semantics, semantic hierarchies | Hindi
A 150 million word corpus of Hebrew | corpus | Hebrew
A wordnet of German | wordnet, lexical database, dictionary, vocabulary, semantic relations, semantics, semantic hierarchies | German
Languages Database | Frisian database containing audio and written corpora, including historical ones | historical corpus, spoken corpus, audio, language history, Germanic languages | Frisian
Archives Programme | A text archive of endangered languages | text archive, endangered languages | multilingual
Translations for 21 million expressions in about 10,000 language varieties, including Swadesh lists for about 2000 language varieties | vocabulary, word list, dictionary, Swadesh list | multilingual
Language Materials Project | Contains teaching and learning materials for over 150 less commonly taught languages, including speaker and other information about the languages | speaker information, bilingualism, language use, demographic data | multilingual
Buckeye Speech Corpus | Corpus of high-quality recordings from 40 speakers in Columbus, Ohio, orthographically transcribed and phonetically labelled | audio corpus, annotated corpus, phonetics, phonology, speech | English
Switchboard Corpus in NXT | Updated annotations of the Switchboard corpus of telephone conversations | annotated corpus, prosody, syntax, speech, conversational speech, telephone conversation | English
Corpus of English, Dutch and German with additional lexical information | corpus, word frequency | English, Dutch, German
Acoustic-Phonetic Continuous Speech Corpus | Audio corpus of 630 speakers of eight American English dialects with time-aligned orthographic, phonetic and word transcriptions | annotated corpus, speech, phonetics, phonology, audio corpus, English dialects | English
FSD First Story Detection Corpus | Corpus of "first stories" (new events) from Twitter | corpus, web language, social media, web corpus | English
Twitter N-Gram Corpus | N-grams (up to 6-grams) for 75 million English tweets | corpus, ngrams, word frequency, web language, social media | English
A corpus of tweets collected in January and February 2011 | corpus, web language | English
FSD First Story Detection Corpus | A corpus of "first stories" (new events) from newswire | corpus, web language, newspaper, media, political language | English
Repubblica Corpus | A corpus of 380 million tokens of Italian newspaper texts, POS-tagged, lemmatized and categorized by genre | corpus, genre, topic, syntax, part of speech (POS), newspaper, media, political language | Italian
New Zealand English (ONZE) Corpus | A corpus of various stages of New Zealand English | audio corpus, phonetics, phonology, language history, historical corpus | English
Corpus of noise-induced Spanish misperceptions/confusions | A corpus of 3235 noise-induced robust misperceptions in Spanish | corpus, phonetics, phonology, speech perception | Spanish
Evaluation benchmarks | Human similarity ratings for over 3000 word pairs, including syntactic relations | semantics, similarity, semantic relatedness | English, German, French, Arabic, Romanian, Spanish
English word vectors | Large lexicon with thesaurus, antonym, color, connotation and valence information extracted through NLP procedures | semantics, lexicon, sentiment analysis, affect, emotional valence, antonyms | English
Relations from Wikipedia | A dataset of semantic relations automatically extracted from the multilingual Wikipedia corpus | semantics, semantic relations | French, Russian, Chinese, Arabic, Hindi, Indonesian, Tagalog, Latvian, Swahili, Georgian
Formal/Informal Address Corpus | Corpus of English and German sentences from novels, tagged for formal and informal connotations, tokenized, lemmatized and POS-tagged | annotated corpus, politeness, formal language | German, English
SALSA Corpus | A large frame-based lexicon for German with semantic roles | semantic roles, frames, framenet | German
Projection of semantic roles | Parallel corpora annotated for semantic roles | parallel corpus, corpus, translation, semantic roles | German, English
A lexical database of English that specifies semantic frames and semantic roles, with more than 10000 senses | framenet, lexical database, dictionary, vocabulary, semantic relations | English
Reference Corpus | Pragmatically annotated corpus with information about coreference and bridging | reference, discourse, pragmatics, annotated corpus, entailment inference, coreference | English
Memory semantic database | Semantic database of English based on distributional information | lexicon, semantic relatedness, relations, corpus-based semantics, co-occurrence | English
Entailment Search Task Dataset for German | A corpus of 3000 text/hypothesis pairs derived from web forum posts | textual entailment, semantic inference, pragmatics, corpus, web language | English
German Derivational Lexicon | A derivational lexicon for German | morphology, lexicon, dictionary, lemma | German
Memory for Croatian | Semantic database of Croatian based on distributional information | lexicon, semantic relatedness, relations, corpus-based semantics | Croatian
Semantic Relatedness Dataset | A dataset of normed semantic similarity (rather than just word associations) | semantics, semantic similarity, semantic relatedness, relations, concreteness, word association | English
Word Associations | A dataset of word associations in Dutch | semantics, word association | Dutch
A variety of corpora and datasets built into the NLTK Python library | natural language processing, python, brown, australian broadcasting, alpino dutch treebank, treebank, CONLL, Europarl, Genesis, bible, gazetteer, C-Span, Gutenberg, KNB corpus, sentiment, NPS chat, opinion lexicon, multilingual wordnet, penn treebank, sentiwordnet | English, Portuguese, Spanish, Basque, Old English, Mandarin Chinese, Polish, Brazilian Portuguese
Recordings with EPG, laryngograph, nasal airflow and audio | articulatory phonetics, articulation, speech production, Rothenberg mask | Catalan, English, French, German, Irish Gaelic, Italian, Swedish
Audio, laryngograph and EMA recordings for English, constructed with the intention of training an automatic speech recognition system | articulation, articulatory phonetics, speech production, electromagnetic articulography, tongue recording | English
Diphone Perceptual Database | Phoneme categorizations based on a gated listening task | speech perception, phonetics, phonology, psycholinguistics, phonetic information over time | English
Diphone Perceptual Database | A total of 488,520 phoneme categorizations based on a gated listening task with 1,179 Dutch diphones | speech perception, phonetics, phonology, psycholinguistics, phonetic information over time | Dutch
San Carlos Corpora | Collection of corpora of contemporary Portuguese with part-of-speech (POS) tags | corpus, annotated corpus | Portuguese
Comparative Portuguese corpus | Large corpus containing texts from several varieties of Portuguese (European, Brazil, Angola, Cape Verde, Guinea-Bissau, Mozambique, Sao Tome and Principe, Goa, Macau, Timor-Leste) | corpus, dialectal corpus, sociolinguistics | Portuguese
Medieval Portuguese corpus | Historical corpus of medieval Portuguese | historical corpus, language history, classical & medieval Portuguese | Portuguese
Spoken Portuguese Corpus | Spoken corpus of Brazilian Portuguese | spoken corpus, audio recordings, phonetics, phonology | Portuguese
Historical Portuguese corpus | Official historical corpus of the "A history of Brazilian Portuguese" project | historical corpus, language history | Portuguese
Little Red Hen Lab Databases | Resource directory for the UCLA NewsScape Library of International Television News, a TV news archive of news programs | semantics, gesture, phonetics, corpus, television (TV), media, politics, news, multimodal corpus | English
Corpus of Spoken Chinese | Spoken corpus of Chinese | spoken corpus, audio recordings, phonetics, phonology | Mandarin Chinese
Australian National Database of Spoken Language (ANDOSL) | Phonetically annotated spoken language corpus of Australian English | spoken corpus, audio recordings, phonetics, phonology, phonetically annotated | Australian English
Pedagogic Corpus of Video-Recorded Interviews | Spoken interviews with video recordings for several European languages, including second language recordings | spoken corpus, multimodal corpus, video, second language acquisition (SLA), bilingualism | English, French, German, Polish, Spanish, Turkish
Atlante Lessicale Toscano (ATL, Lexical Atlas of Tuscany) | Lexical atlas and demographic data; a dialectal resource for Tuscan dialects in Italy | sociolinguistics, dialects, lexical atlas, language geography, dialectology, Italian dialects | Italian
1T 5-gram n-grams for 10 European languages | N-grams (up to 5-grams) and frequency counts for 10 European languages | n-grams, word frequency, Google | Swedish, Spanish, Romanian, Portuguese, Polish, Dutch, Italian, French, German, Czech
1T 5-gram database for Dutch | N-grams and frequency counts for Dutch | n-grams, word frequency, Google | Dutch
Twitter Corpus | Dutch Twitter corpus of approximately 2.6 billion tweets and 28 billion tokens collected between January and December 2014, with n-grams up to 5-grams | n-grams, twitter, web corpus, web language, social media | Dutch
Santa Barbara Corpus of Spoken American English (SBCSAE) | 249,000 words with transcriptions, audio and timestamps | spoken corpus, audio recordings, phonetics, phonology, phonetically annotated | English
GECO Phonetic Convergence database | 46 dialogs (ca. 25 min long) between female German speakers, in speaker-visible and speaker-invisible conditions, for the study of phonetic convergence | spoken corpus, audio recordings, phonetics, phonology, multimodal corpus, phonetic convergence, accommodation, interpersonal synchrony, sociolinguistics, sociophonetics | German
Michigan Corpus of Academic Spoken English (MICASE) | 152 transcripts totaling 1.8 million words of academic spoken English | spoken corpus, university language, registers, formal language | English
Authorship corpus | 681,288 posts totaling 140 million words from 19,320 bloggers, collected in 2004 and balanced for gender, with age, gender and industry/occupation information | corpus, web language, social media, web corpus, demographic data, sociolinguistics | English
Australian National Corpus | Collection of Australian English corpora (including ACE, ART, AusLit, Braided Channels, COOEE, GCSAusE, ICE Corpus, MD Corpus, Monash Corpus); covers many registers and time periods and includes transcribed speech from sociolinguistic interviews with gender information | corpus, Australian English, dialects, spoken corpus, written language, literature, poetry, historical corpus, language history, varieties of English | English
Taiwan Corpus of Child Mandarin (TCCM) | Taiwan Corpus of Child Mandarin | corpus, child language, language acquisition, L1, children, learner corpus | Chinese
2007 Danger and Usefulness Norms | A published research article that includes ratings of the danger and usefulness of words | semantics, danger, usefulness, semantic norms, meaning, perceptual attributes | English
Feature Production Norms | Semantic feature production norms for 456 words (objects and events) | semantic features, semantics, feature norms, distinctive features, objects and events, properties | English
Perceptual Attribute Ratings Database (MCWisc) | Perceptual attribute norms for four sensory domains (sound, color, manipulation, motion) for 1402 words, including emotion ratings reflecting intensity and valence | perceptual attributes, manipulability, semantics, concepts, perception, valence, affect, feeling | English
Paivio norms | Extension of the Paivio et al. (1968) lexical norms | gender ladenness, sexual language, stereotypes, age of acquisition (AOA), number of meanings, number of associates, emotionality, pleasantness, emotional valence, children's dictionaries, concreteness, meaningfulness, goodness, word frequency, imagery, imageability, language acquisition, children's word knowledge, lexical knowledge, word knowledge | English
Marques personnelles dans les langues africaines | Database of personal pronouns of African languages | typology, morphosyntax, morphology, syntax, personal pronouns, person marking | multilingual
Konstanz Universals Archive | A list of proposed typological universals | language typology, Greenbergian universals, morphology, syntax, morphosyntax, word order | multilingual
Grammatische Raritätenkabinett | Informal list of grammatical rarities (typologically rare features) | typology, universals, rare features, syntax, morphosyntax | multilingual
Sound Comparisons (http://www.soundcomparisons.com) | Comparative atlas and map with audio samples of Germanic, Romance, Slavic, Celtic and Andean languages and Mapudungun | language geography, comparative linguistics, dialectology, Indo-European languages, pronunciation, sound patterns, phonetics, phonology, cognates, cognacy | multilingual, including English, German, French, Italian, Spanish and Portuguese
Map / linguistic atlas showing the location of languages and visualizing linguistic diversity across the globe | language geography, linguistic diversity, typology, map, endangered languages, atlas | multilingual
Syntactic Structures of the World's Languages | Typological database with syntactic features for 250+ languages of the world | typology, morphology, morphosyntax, syntax, word order, universals | multilingual
Mon-Khmer Etymological Dictionary | Dictionary for comparative and historical linguistics of Mon-Khmer languages | etymology, dictionary, lexical data, language history, historical linguistics, phylogenetics, comparative dictionaries, Asian languages | multilingual
Polynesian Lexicon Project (Pollex Online) | Large-scale comparative dictionary of Polynesian languages | Polynesian, Austronesian, lexical data, comparative dictionary, cognacy, cognates, historical linguistics, Pacific languages, word lists | multilingual
Languages from the Trans-New Guinea family and friends, encompassing 900+ languages and information on 1000+ words | Pacific languages, Trans-New Guinea family (TNG), Papua New Guinea (PNG), language history, historical linguistics, linguistic diversity, Austronesian | multilingual
Global Lexicostatistical Database (GLD) | Basic word lists for many of the world's languages | comparative linguistics, historical linguistics, phylogenetics, lexicostatistics, basic vocabulary, word lists, Swadesh list | multilingual
Phonological Systems Database (LAPSyD) | Searchable database of basic phonological information on a wide sample of the world's languages | phonological typology, phoneme inventory, phonology, phonetics, consonants, vowels, syllable structure, linguistic stress, lexical tones | multilingual
Glasgow Norms | Normative ratings for 5,553 English words on nine psycholinguistic dimensions: arousal, valence, dominance, concreteness, imageability, familiarity, age of acquisition, semantic size and gender association | English | Scott, G. G., Keitel, A., Becirspahic, M., Yao, B., & Sereno, S. C. (2018). The Glasgow Norms: Ratings of 5,500 words on nine scales. Behavior Research Methods, 51, 1258-1270.
ARC Nonword Database | 358,534 nonwords | Rastle, K., Harrington, J., & Coltheart, M. (2002). 358,534 nonwords: The ARC Nonword Database. Quarterly Journal of Experimental Psychology, 55A, 1339-1362.
Canadian conceptual familiarity norms | 3,596 nouns and online data about them from 313 Canadian French speakers | French
World of Words | Word association and participant data for 100 primary, secondary and tertiary responses to 12,292 cues in English and 12,571 cues in Dutch | Dutch