Linguistics Stimuli

The following is forked with permission from (and almost identical to) The Language Goldmine.

URL	Title	Description	Tags	Languages	Associated Publication
http://concepticon.clld.org/	Concepticon	Links 9611 concepts from 51 different concept lists to 2206 different concept sets; 243 relations between concepts are defined	semantics, concepts, lexicon structure, vocabulary	multilingual
http://clics.lingpy.org/	Database of Cross-Linguistic Colexifications	Gives polysemy information for 221 different languages covering 64 families (more than 300000 words and 10000 concepts)	semantics, concepts, polysemy, lexicon structure, vocabulary, typology	multilingual	List, J.-M., Terhalle, A., & Urban, M. (2013). Using network approaches to enhance the analysis of cross-linguistic polysemies. Proceedings of the 10th International Conference on Computational Semantics (pp. 347-353). Association for Computational Linguistics.
https://archive.org/details/tv	The TV News Archive	Contains more than 705,000 captioned and searchable news programs from over 4 years of U.S. television networks	semantics, gesture, phonetics, corpus, TV, media, politics, news, multimodal corpus	English, Spanish
http://crr.ugent.be/programs-data/subtitle-frequencies/subtlex-nl	SUBTLEX-NL	Dutch word frequencies based on 44 million words from film and television subtitles	word frequency, contextual diversity	Dutch	Keuleers, E., Brysbaert, M. & New, B. (2010). SUBTLEX-NL: A new frequency measure for Dutch words based on film subtitles. Behavior Research Methods, 42(3), 643-650.
http://crr.ugent.be/programs-data/subtitle-frequencies/subtlex-ch	SUBTLEX-CH	Chinese word frequencies based on 33.5 million words from film and television subtitles	word frequency, part of speech (POS), lexical decision task, reaction times (RT), response latency	Chinese	Cai, Q., & Brysbaert, M. (2010). SUBTLEX-CH: Chinese word and character frequencies based on film subtitles. PLoS One, 5(6), e10729.
http://www.bcbl.eu/databases/subtlex-gr/	SUBTLEX-GR	Modern Greek word frequencies based on 23 million words from film and television subtitles	word frequency, orthographic neighborhood density, orthographic Levenshtein distance, contextual diversity	Greek	Dimitropoulou, M., Duñabeitia, J., Avilés, A., Corral, J., & Carreiras, M. (2010). Subtitle-based word frequencies as the best estimate of reading behaviour: the case of Greek. Frontiers in Psychology, 1:218, 1-12.
http://crr.ugent.be/programs-data/subtitle-frequencies/subtlex-pl	SUBTLEX-PL	Polish word frequencies based on 101 million words from film and television subtitles	word frequency	Polish	Mandera, P., Keuleers, E., Wodniecka, Z., & Brysbaert, M. (2014). Subtlex-pl: subtitle-based word frequency estimates for Polish. Behavior research methods, 47(2), 471-483.
http://crr.ugent.be/archives/1423	SUBTLEX-UK	British English word frequencies based on 201.3 million words from 45,099 BBC broadcasts	word frequency, contextual diversity, word frequency in children's programs, part of speech (POS), bigram frequencies	English	Van Heuven, W.J.B., Mandera, P., Keuleers, E., & Brysbaert, M. (2014). Subtlex-UK: A new and improved word frequency database for British English. Quarterly Journal of Experimental Psychology, 67, 1176-1190.
http://crr.ugent.be/archives/534	SUBTLEX-DE	German word frequencies of 25.4 million words from film and television subtitles	word frequency	German	Brysbaert, M., Buchmeier, M., Conrad, M., Jacobs, A.M., Bölte, J., & Böhl, A. (2011). The word frequency effect: A review of recent developments and implications for the choice of frequency estimates in German. Experimental Psychology, 58, 412-424
http://www.dlexdb.de/	Digitales Woerterbuch der deutschen Sprache (dlexDB)	Over 100 million German word tokens, neighborhood densities and bigram and trigram probabilities based on different registers	word frequency, bigram probability, trigram probability, neighborhood density, conditional probability	German	Heister, J., Wuerzner, K. M., Bubenzer, J., Pohl, E., Hanneforth, T., Geyken, A., & Kliegl, R. (2011). dlexDB-A lexical database for the psychological and linguistic research. Psychologische Rundschau, 62(1), 10-20.
http://wortschatz.uni-leipzig.de/	Leipzig Wortschatz Lexicon	German thesaurus and lexical network	thesaurus, lexical network	German
http://corpora2.informatik.uni-leipzig.de/	Leipzig Corpus Collection	Contains frequencies and co-occurrence information for 219 languages	word frequency, corpus	multilingual	Quasthoff, U., Richter, M., Biemann, C. (2006). Corpus Portal for Search in Monolingual Corpora. Proceedings of the fifth international conference on Language Resources and Evaluation, LREC 2006, Genoa, pp. 1799-1802.
http://crr.ugent.be/archives/806	Kuperman English Age-of-acquisition ratings	Age-of-acquisition ratings for 30,000 English words.	age of acquisition (AOA)	English	Kuperman, V., Stadthagen-Gonzalez, H., & Brysbaert, M. (2012). Age-of-acquisition ratings for 30,000 English words. Behavior Research Methods, 44(4), 978-990.
http://crr.ugent.be/archives/1003	Warriner English Affective Ratings	Valence, arousal and dominance ratings for 13,915 English words	emotion, valence, dominance, arousal, affect, positive, negative	English	Warriner, A.B., Kuperman, V., & Brysbaert, M. (2013). Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods, 45, 1191-1207.
http://crr.ugent.be/archives/1330	Brysbaert English Concreteness Ratings	Concreteness ratings for 40,000 English words	concreteness	English	Brysbaert, M., Warriner, A.B., & Kuperman, V. (2014). Concreteness ratings for 40 thousand generally known English word lemmas. Behavior Research Methods, 46, 904-911.
http://crr.ugent.be/archives/1602	Brysbaert Dutch Age-of-acquisition & Concreteness ratings	Age-of-acquisition and concreteness ratings for 30,000 Dutch words	concreteness, age of acquisition (AOA)	Dutch	Brysbaert, M., Stevens, M., De Deyne, S., Voorspoels, W., & Storms, G. (2014). Norms of age of acquisition and concreteness for 30,000 Dutch words. Acta Psychologica, 150, 80-84.
http://crr.ugent.be/archives/878	Moors Dutch Affective Ratings	Valence, arousal and dominance ratings for 4,300 Dutch words	emotion, valence, dominance, arousal, affect, positive, negative	Dutch	Moors, A., De Houwer, J., Hermans, D., Wanmaker, S., van Schie, K., Van Harmelen, A. L., De Schryver, M., De Winne, J., & Brysbaert, M. (2013). Norms of valence, arousal, dominance, and age of acquisition for 4,300 Dutch words. Behavior research methods, 45(1), 169-177.
https://sites.google.com/site/frenchlexicon/results	French Lexicon Project	Lexical decision data for 38,840 French words and 38,840 pseudowords	lexicon project, psycholinguistic database, reaction times (RT), response latency, word frequency	French	Ferrand, L., New, B., Brysbaert, M., Keuleers, E., Bonin, P., Méot, A., Augustinova, M., & Pallier, C. (2010). The French Lexicon Project: Lexical decision data for 38,840 French words and 38,840 pseudowords. Behavior Research Methods, 42, 488-496.
http://www.lexique.org/	Lexique	French lexical database for 135,000 words	lexicon project, psycholinguistic database, reaction times (RT), response latency, word frequency	French
http://link.springer.com/article/10.3758%2FBRM.42.4.992	Malay Lexicon Project	Malay lexical database for 9,592 words	lexicon project, psycholinguistic database, reaction times (RT), response latency, word frequency	Malay	Yap, M. J., Liow, S. J. R., Jalil, S. B., & Faizal, S. S. B. (2010). The Malay Lexicon Project: A database of lexical statistics for 9,592 words. Behavior research methods, 42(4), 992-1003.
http://elexicon.wustl.edu/	English Lexicon Project (ELP)	English lexical database for 40,481 words	lexicon project, psycholinguistic database, reaction times (RT), response latency, lexical decision task, word naming, contextual diversity, neighborhood density, bigram probability, part of speech (POS), levenshtein distance, SUBTLEX	English	Balota, D. A., Yap, M. J., Hutchison, K. A., Cortese, M. J., Kessler, B., Loftis, B., Neely, J. H., Nelson, D. L., Simpson, G. B., & Treiman, R. (2007). The English lexicon project. Behavior research methods, 39(3), 445-459.
http://journal.frontiersin.org/article/10.3389/fpsyg.2010.00174/abstract	Dutch Lexicon Project	Dutch lexical database for 14,000 words	lexicon project, psycholinguistic database, reaction times (RT), response latency, word frequency	Dutch	Keuleers, E., Diependaele, K., & Brysbaert, M. (2010). Practice effects in large-scale visual word recognition studies: A lexical decision study on 14,000 Dutch mono-and disyllabic words and nonwords. Frontiers in Psychology, 1, 174.
http://crr.ugent.be/programs-data/lexicon-projects	British Lexicon Project (BLP)	British English lexical database for 28,000 words	lexicon project, psycholinguistic database, reaction times (RT), response latency, bigram probability, trigram probability, word frequency	English	Keuleers, E., Lacey, P., Rastle, K., & Brysbaert, M. (2012). The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words. Behavior Research Methods, 44(1), 287-304.
http://crr.ugent.be/programs-data/word-prevalence-values	Dutch Word Knowledge & Prevalence	Word prevalence values for 54,319 Dutch words from nearly 300,000 participants	word prevalence, word knowledge, lexical knowledge	Dutch	Keuleers, E., Stevens, M., Mandera, P., & Brysbaert, M. (2015). Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment. The Quarterly Journal of Experimental Psychology, (ahead-of-print), 1-28.
http://semshifts.iling-ran.ru/	Database of Semantic Shifts in the Languages of the World	3,690 semantic connections in the world's languages (polysemy, semantic changes)	polysemy, semantic change, semantics	multilingual	Zalizniak, A. A., Bulakh, M., Ganenkov, D., Gruntov, I., Maisak, T., & Russo, M. (2012). The catalogue of semantic shifts as a database for lexical semantic typology. Linguistics, 50, 633-669.
http://w3.usf.edu/FreeAssociation/	USF Word Association Norms	Free word association data for 72,000 word pairs	word association, semantics	English	Nelson, D. L., McEvoy, C. L., & Schreiber, T. A. (2004). The University of South Florida free association, rhyme, and word fragment norms. Behavior Research Methods, Instruments, & Computers, 36, 402-407.
http://link.springer.com/article/10.3758/BRM.41.2.558	Modality Exclusivity Norms for Adjectives	How much 423 adjectives are associated with different modalities/senses (e.g., vision, hearing)	modality exclusivity, senses, semantics, adjectives	English	Lynott, D., & Connell, L. (2009). Modality exclusivity norms for 423 object properties. Behavior Research Methods, 41, 558-564.
http://www.psych.rl.ac.uk/	MRC Psycholinguistics Database	Lexical database for English	lexicon project, psycholinguistic database, reaction times (RT), response latency, word frequency, concreteness, familiarity, imageability, meaningfulness, part of speech (POS), lexical category, part of speech, number of phonemes, syllables, letters, stress-marked phonetic transcription	English	Coltheart, M. (1981). The MRC Psycholinguistic Database. Quarterly Journal of Experimental Psychology, 33A, 497-505.; Wilson, M.D. (1988). The MRC Psycholinguistic Database: Machine Readable Dictionary, Version 2. Behavioural Research Methods, Instruments and Computers, 20, 6-11.
http://www.ugent.be/pp/experimentele-psychologie/en/research/documents/subtlexus/overview.htm	SUBTLEX-US	Frequencies from 51 million word tokens	word frequency, contextual diversity	English	Brysbaert, M., & New, B. (2009). Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41, 977-990.; Brysbaert, M., New, B., & Keuleers, E. (2012). Adding part-of-speech information to the SUBTLEX-US word frequencies. Behavior research methods, 44(4), 991-997.
http://www.natcorp.ox.ac.uk/	British National Corpus (BNC)	Corpus of 100 million words	corpus	English
http://wals.info/	World Atlas of Language Structures (WALS)	Typological database	typology, grammatical database, syntax, morphology, phonology	multilingual	Dryer, Matthew S. & Haspelmath, Martin (eds.) (2013). The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology.
http://wold.clld.org/	The World Loanword Database (WOLD)	Loanword database with mini-dictionaries for 41 languages; words are coded for likelihood of being a loanword	loanwords, borrowing, language contact	multilingual	Haspelmath, M., & Tadmor, U. (Eds.). (2009). Loanwords in the world's languages: a comparative handbook. Walter de Gruyter.
https://sites.google.com/site/kenmcraelab/norms-data	Semantic Feature Norms	Semantic feature norms for 541 concepts from 725 participants	semantic features, semantics, feature norms, distinctive features, objects and events, properties	English	McRae, K., Cree, G. S., Seidenberg, M. S., & McNorgan, C. (2005). Semantic feature production norms for a large set of living and nonliving things. Behavior research methods, 37(4), 547-559.
https://books.google.com/ngrams	Google Ngram	Large corpora of books with word frequencies and ngram frequencies from English, German, French, Italian, Spanish, Russian, Chinese and Hebrew, POS-tagged	ngrams, word frequency	multilingual	Michel, J. B., Shen, Y. K., Aiden, A. P., Veres, A., Gray, M. K., Pickett, J. P., ... & Aiden, E. L. (2011). Quantitative analysis of culture using millions of digitized books. science, 331(6014), 176-182.
http://wordnet.princeton.edu/	WordNet	Machine-readable dictionary with semantic relations for English	wordnet, lexical database, dictionary, vocabulary, semantic relations, semantics, semantic hierarchies	English	Miller GA. WordNet: a lexical database for English. Communications of the ACM. 1995;38:39-41.; Fellbaum C. WordNet: An Electronic Lexical Database. MIT Press; 1998.
http://www.eat.rl.ac.uk/	Edinburgh Associative Thesaurus	English word association norms	word association, semantics	English	Kiss, G.R., Armstrong, C., Milroy, R., and Piper, J. (1973) An associative thesaurus of English and its computer analysis. In Aitken, A.J., Bailey, R.W. and Hamilton-Smith, N. (Eds.), The Computer and Literary Studies. Edinburgh: University Press.
http://www.illc.uva.nl/EuroWordNet/	Euro WordNet	Wordnets for several European languages	wordnet, lexical database, dictionary, vocabulary, semantic relations, semantics, semantic hierarchies	multilingual
http://corpus.byu.edu/coca/	Corpus of Contemporary American English (COCA)	440 million word corpus of contemporary American English	corpus	English
http://corpus.byu.edu/coha/	Corpus of Historical American English (COHA)	385 million word corpus of historical American English	corpus	English	Davies, M. (2011). The Corpus of Contemporary American English as the First Reliable Monitor Corpus of English. Literary and Linguistic Computing 25: 447-65.
http://lingweb.eva.mpg.de/ids/	Intercontinental Dictionary Series (IDS)	Comparative lexical database	lexicon, dictionary, vocabulary	multilingual
http://phoible.org/	PHOIBLE: The world's largest database of phonological inventories	Cross-linguistic phoneme inventory data	typology, phonemes, phonetics, phonology, phoneme inventory	multilingual	Moran, S., & McCloy, D., & Wright, R. (eds.) 2014. PHOIBLE Online. Leipzig: Max Planck Institute for Evolutionary Anthropology.
http://137.122.133.199/~Jeff/pbase/index.html	P-base	Database of several thousand sound patterns in 500+ languages	phonetics, phonology, typology	multilingual
http://137.122.133.199/~Jeff/phonetic_similarity/index.html	Phonetic Similarity database	Similarity ratings for 51 segments	phonetics, phonology	multilingual
http://www.autotyp.uzh.ch/	AUTOTYP	Typological database	language family, language area, genealogy, geography, genealogical data, typology, language history	multilingual	Bickel, B. & J. Nichols, 2002. Autotypologizing databases and their use in fieldwork. In Proc. Int. LREC Workshop on Resources and Tools in Field Linguistics. Las Palmas, 25-26 May 2002.
http://semanticweb.kaist.ac.kr/home/index.php/KAIST_Corpus	KAIST Korean Corpus	70 million Korean eojeol corpus	corpus	Korean
http://www.statmt.org/europarl/	Europarl Parallel Corpus (EPC)	Parallel text with up to 60 million words for 20 languages	parallel corpus, corpus, translation	multilingual
http://paralleltext.info/data/	Parallel Bible Corpus (PBC)	Parallel corpus of the Bible with around 900 languages from around 80 different language families	parallel corpus, corpus, translation	multilingual
http://unicode.org/udhr/	Universal Declaration of Human Rights	Parallel corpus of the declaration of human rights for 400 languages	parallel corpus, corpus, translation	multilingual
http://wacky.sslmit.unibo.it/doku.php?id=corpora	Wacky Corpus	Syntactically annotated or POS-tagged corpora with up to 2 billion words for English, French, German and Italian, also includes Italian Wikipedia corpus	corpus, syntax, treebank, part of speech (POS), Wikipedia	English, French, German, Italian
http://asjp.clld.org/	Automated Similarity Judgment Project (ASJP)	Word lists of around 6000 languages	word list, typology, vocabulary	multilingual	Wichmann, Søren, André Müller, Annkathrin Wett, Viveka Velupillai, Julia Bischoffberger, Cecil H. Brown, Eric W. Holman, Sebastian Sauppe, Zarina Molochieva, Pamela Brown, Harald Hammarström, Oleg Belyaev, Johann-Mattis List, Dik Bakker, Dmitry Egorov, Matthias Urban, Robert Mailhammer, Agustina Carrizo, Matthew S. Dryer, Evgenia Korovina, David Beck, Helen Geyer, Patience Epps, Anthony Grant, and Pilar Valenzuela. 2013. The ASJP Database.
https://www.ethnologue.com/	Ethnologue	Comprehensive catalogue of the world's languages	catalogue, typology	multilingual
http://glottolog.org/	Glottolog	Comprehensive reference information for the world's languages	catalogue, typology	multilingual
http://opus.lingfil.uu.se/	Open parallel corpus	A collection of parallel corpora including 71 million sentences for about 30 languages	parallel corpus, corpus, translation	multilingual
http://archive.org/browse.php?field=subject&mediatype=texts&collection=rosettaproject	Rosetta Collection in The Internet Archive	Media files and documents about the languages from the world collected by the Rosetta foundation	reference grammar, word list	multilingual
http://data.worldbank.org/	World Bank Open Data	Demographic and geographical data on the world's countries	demographic data, country, migration, bilingualism, language use	non-linguistic
http://link.springer.com/article/10.3758/s13428-012-0267-0	Modality Exclusivity Norms for Nouns	Modality norms for 400 English nouns	perceptual attributes, modality exclusivity, senses, semantics, vision, sight, hearing, touch, taste, smell	English	Lynott, D., & Connell, L. (2013). Modality exclusivity norms for 400 nouns: The relationship between perceptual experience and surface word form. Behavior research methods, 45(2), 516-526.
http://link.springer.com/article/10.3758/s13428-010-0038-8	Modality Exclusivity Norms for Adjectives	Modality norms from 400 American English participants for 387 adjectives	perceptual attributes, modality exclusivity, senses, semantics, vision, sight, hearing, touch, taste, smell	English	van Dantzig, S., Cowell, R. A., Zeelenberg, R., & Pecher, D. (2011). A sharp image or a sharp knife: Norms for the modality-exclusivity of 774 concept-property items. Behavior Research Methods, 43(1), 145-154.
http://link.springer.com/article/10.3758/s13428-012-0215-z	Perceptual and motor attribute ratings	Perceptual and motor attribute ratings for 559 concepts based on 376 American English participants	graspability, perceptual attributes, semantics	English	Amsel, B. D., Urbach, T. P., & Kutas, M. (2012). Perceptual and motor attribute ratings for 559 object concepts. Behavior research methods, 44(4), 1028-1041.
http://link.springer.com/article/10.3758/s13428-012-0242-9	Sensory experience ratings	Sensory experience ratings for 5857 English words based on 63 participants	perceptual attributes, semantics	English	Juhasz, B. J., & Yap, M. J. (2013). Sensory experience ratings for over 5,000 mono-and disyllabic words. Behavior research methods, 45(1), 160-168.
http://link.springer.com/article/10.3758/s13428-014-0488-5	Manipulability and naming norms for photographs	Manipulability ratings and naming RT norms for photographs	perceptual attributes, manipulability	non-linguistic
http://link.springer.com/article/10.3758/BRM.42.1.82	Manipulability, familiarity and AOA for photographs	Manipulability, familiarity and AOA for photographs	perceptual attributes, manipulability, age of acquisition (AOA), familiarity	non-linguistic	Salmon, J. P., McMullen, P. A., & Filliter, J. H. (2010). Norms for two types of manipulability (graspability and functional usage), familiarity, and age of acquisition for 320 photographs of objects. Behavior Research Methods, 42(1), 82-95.
http://www.tandfonline.com/doi/abs/10.1080/13825585.2010.540849	Spanish norms for photographs	140 color images that have been normed by 106 Spanish speakers on age of acquisition, familiarity, manipulability and other measures	age of acquisition (AOA), perceptual attributes, manipulability	Spanish	Moreno-Martínez, F. J., Montoro, P. R., & Laws, K. R. (2011). A set of high quality colour images with Spanish norms for seven relevant psycholinguistic variables: The Nombela naming test. Aging, Neuropsychology, and Cognition, 18(3), 293-327.
http://link.springer.com/article/10.3758/s13428-014-0466-y	French acronym norms	Psycholinguistic norms for French acronyms	acronyms, reading time (RT), age of acquisition ratings (AOA), subjective frequency, imageability	French	Bonin, P., Méot, A., Millotte, S., & Bugaiska, A. (2014). Norms and reading times for acronyms in French. Behavior research methods, 47(1), 251-267.
http://link.springer.com/article/10.3758/s13428-014-0454-2	Spanish AOA norms	Subjective age-of-acquisition norms for 7,039 Spanish words	age of acquisition (AOA)	Spanish	Alonso, M. A., Fernandez, A., & Díez, E. (2015). Subjective age-of-acquisition norms for 7,039 Spanish words. Behavior research methods, 47(1), 268-274.
http://link.springer.com/article/10.3758/s13428-014-0467-x	Persian emotional speech	Emotional speech from 470 sentences normed by 1,126 Persian native speakers	emotion, emotional speech	Persian	Keshtiari, N., Kuhlmann, M., Eslami, M., & Klann-Delius, G. (2015). Recognizing emotional speech in Persian: A validated database of Persian emotional speech (Persian ESD). Behavior research methods, 47(1), 275-294.
http://www.csl.psychol.cam.ac.uk/propertynorms/	CSLB Property Norms	Feature norms for 866 concrete concepts by 123 native speakers of British English	semantic features, semantics	English	Devereux, B. J., Tyler, L. K., Geertzen, J., & Randall, B. (2014). The Centre for Speech, Language and the Brain (CSLB) concept property norms. Behavior research methods, 46(4), 1119-1127.
http://link.springer.com/article/10.3758/s13428-013-0426-y	ANGST German affectiveness ratings	Valence, arousal, dominance and other ratings for 1,003 German words	emotion, valence, dominance, arousal, affect, positive, negative	German	Schmidtke, D. S., Schröder, T., Jacobs, A. M., & Conrad, M. (2014). ANGST: Affective norms for German sentiment terms, derived from the affective norms for English words. Behavior Research Methods, 46(4), 1108-1118.
http://link.springer.com/article/10.3758/s13428-013-0431-1	French affective norms	Affective norms for 1,031 French words by 469 French speakers	emotion, valence, dominance, arousal, affect, positive, negative	French	Monnier, C., & Syssau, A. (2014). Affective norms for French words (FAN). Behavior research methods, 46(4), 1128-1137.
http://link.springer.com/article/10.3758/s13428-013-0409-z	Gender stereotypicality norms	Gender stereotypicality norms for role nouns in 7 European languages	gender, stereotypes, stereotypicality, sociolinguistics	Czech, English, French, German, Italian, Norwegian, Slovak	Misersky, J., Gygax, P. M., Canal, P., Gabriel, U., Garnham, A., Braun, F., ... & Sczesny, S. (2014). Norms on the gender perception of role nouns in Czech, English, French, German, Italian, Norwegian, and Slovak. Behavior research methods, 46(3), 841-871.
http://link.springer.com/article/10.3758/s13428-013-0400-8	PhonItalia	Phonological representations for 120,000 Italian word forms	phonological representation, phonology, transcription	Italian	Goslin, J., Galluzzi, C., & Romani, C. (2014). PhonItalia: a phonological lexicon for Italian. Behavior research methods, 46(3), 872-886.
http://link.springer.com/article/10.3758/s13428-013-0405-3	Italian affective norms	Affective norms for 1,121 Italian words	emotion, valence, dominance, arousal, affect, positive, negative	Italian	Montefinese, M., Ambrosini, E., Fairfield, B., & Mammarella, N. (2014). The adaptation of the affective norms for english words (ANEW) for Italian. Behavior research methods, 46(3), 887-903.
http://link.springer.com/article/10.3758/s13428-013-0370-x	Subjective ASL frequency	Subjective frequency ratings for 432 ASL signs from 59 native deaf signers	subjective frequency, familiarity, American Sign Language (ASL)	ASL	Mayberry, R. I., Hall, M. L., & Zvaigzne, M. (2014). Subjective frequency ratings for 432 ASL signs. Behavior research methods, 46(2), 526-539.
http://link.springer.com/article/10.3758/s13428-013-0389-z	Japanese-English similarity for translation equivalents	193 Japanese-English word pairs are rated for phonological and semantic similarity	phonological similarity, phonetics, phonology, semantics, semantic similarity, translation, translation equivalent	Japanese, English	Allen, D., & Conklin, K. (2014). Cross-linguistic similarity norms for Japanese–English translation equivalents. Behavior research methods, 46(2), 540-563.
http://link.springer.com/article/10.3758/s13428-013-0388-0	Portuguese Free Association norms	Free association norms for 139 Portuguese words from children of various ages	children, language acquisition, word association, free association	Portuguese	Comesaña, M., Fraga, I., Moreira, A. J., Frade, C. S., & Soares, A. P. (2014). Free associate norms for 139 European Portuguese words for children from different age groups. Behavior research methods, 46(2), 564-574.
http://link.springer.com/article/10.3758/s13428-013-0376-4	Turkish image norms	Turkish AOA, familiarity and other norms for 260 pictures from 277 native Turkish speakers	familiarity, age of acquisition (AOA), word frequency	Turkish	Raman, I., Raman, E., & Mertan, B. (2014). A standardized set of 260 pictures for Turkish: Norms of name and image agreement, age of acquisition, visual complexity, and conceptual familiarity. Behavior research methods, 46(2), 588-595.
http://link.springer.com/article/10.3758/s13428-013-0355-9	Chinese Lexicon project	Reaction times for 2500 single characters and associated lexical norms (frequency, contextual diversity etc.)	contextual diversity, word frequency, lexical decision task, reaction times (RT), response latency	Chinese	Sze, W. P., Liow, S. J. R., & Yap, M. J. (2014). The Chinese Lexicon Project: A repository of lexical decision behavioral responses for 2,500 Chinese characters. Behavior research methods, 46(1), 263-273.
http://link.springer.com/article/10.3758/s13428-013-0358-6	Dutch action norms	Dutch AOA, word frequency and other norms for 124 line drawings	age of acquisition (AOA), perceptual attributes, action	Dutch	Shao, Z., Roelofs, A., & Meyer, A. S. (2014). Predicting naming latencies for action pictures: Dutch norms. Behavior research methods, 46(1), 274-283.
http://link.springer.com/article/10.3758/s13428-014-0488-5	French object norms	Manipulability, graspability and pantomimability norms by French speakers for 560 photographs	iconicity, manipulability, movability, perceptual attributes, graspability, semantics	non-linguistic	Guérard, K., Lagacé, S., & Brodeur, M. B. (2014). Four types of manipulability ratings and naming latencies for a set of 560 photographs of objects. Behavior research methods, 47(2), 443-470.
http://language.psy.auckland.ac.nz/austronesian/	Austronesian Basic Vocabulary Database	210 vocabulary items in almost 1000 Austronesian languages	basic vocabulary, dictionary, Austronesian	Austronesian	Greenhill, S. J., Blust, R., & Gray, R. D. (2008). The Austronesian basic vocabulary database: from bioinformatics to lexomics. Evolutionary bioinformatics online, 4, 271-283.
http://language.psy.auckland.ac.nz/bantu/	Bantu Basic Vocabulary Database	430 vocabulary items from 10 Bantu languages	basic vocabulary, dictionary, Bantu	Bantu
http://ielex.mpi.nl/	Indo-European Lexical Cognacy Database	207 vocabulary items in 150 Indo-European languages	basic vocabulary, dictionary, Indo-European, cognates	Indo-European
http://multitree.org/	MultiTree: A digital library of language relationships	Resource for language relatedness and genealogy; contains trees for many language families	language family, genealogy, linguistic history, reconstruction, protolanguage	multilingual
https://sites.google.com/site/referencelexicon/	RefLex - Reference Lexicon	Around 60,000 lexical entries for around 500 African languages with phonotactic and cognacy coding	Africa, cognacy, basic vocabulary, dictionary, language history	multilingual
http://phonotactics.anu.edu.au/	ANU World Phonotactics Database	Phonotactic data for over 2000 languages and segmental data for around 4700 languages	typology, phonemes, phoneme inventory, phonology, phonotactics, segments	multilingual	Donohue, M., Hetherington, R., McElvenny, J., & Dawson, V. (2013). World phonotactics database. Department of Linguistics, The Australian National University.
http://www.worldvaluessurvey.org/wvs.jsp	World Value Survey	Data on socioeconomic and demographic variables, including language background, for over 85,000 respondents in 57 countries	language use, bilingualism, demographic data	multilingual	World Values Survey Association (2009). World Values Survey 1981-2008 Official Aggregate v. 20090901. Madrid: ASEP/JDS.
http://www.mirjamernestus.nl/Ernestus/NCCFr/	Nijmegen Corpus of Casual French	35 hours of orthographically annotated high-quality recordings with 46 French speakers conversing among friends.	phonetics, video, speech, annotated corpus	French	Torreira, F., Adda-Decker, M., & Ernestus, M. (2010). The Nijmegen Corpus of Casual French. Speech Communication, 52, 201-221.
http://www.cstr.ed.ac.uk/projects/unisyn/	Unisyn Lexicon	Multi-accent dictionary of English	English dialects, lexicon, accents	English	Fitt, S. (2002). Unisyn lexicon release. The Center for Speech Technology Research, University of Edinburgh.
http://homepages.inf.ed.ac.uk/korin/sitenew/Research/Combilex/index.html	Combilex speech technology lexicon	Multi-accent dictionary of English	English dialects, lexicon, accents	English	Fitt, S., & Richmond, K., & Clark, R. Combilex.
http://www.mngu0.org/	mngu0 dataset	Articulatory EMA, MRI, video, audio and 3D scan data from one British male speaker	articulation, articulatory data, MRI, EMA, video, phonetics, speech production	English	Richmond, K. (2011). Announcing the electromagnetic articulography (day 1) subset of the mngu0 articulatory corpus. In Proc. Interspeech, pages 1505-1508, Florence, Italy, August 2011.; Steiner, I., Richmond, K., Marshall, I., & Gray, C. D. (2012). The magnetic resonance imaging subset of the mngu0 articulatory corpus. Journal of the Acoustical Society of America, 131(2), 106-111.
http://link.springer.com/article/10.3758/BF03200831	Touch-related adjective norms	306 words that are categorized for various haptic properties such as roughness and weight	perceptual attributes, semantics, feeling	English	Stadtlander, L. M., & Murdoch, L. D. (2000). Frequency of occurrence and rankings for touch-related adjectives. Behavior Research Methods, Instruments, & Computers, 32(4), 579-587.
http://wordbank.stanford.edu/	MacArthur-Bates Communicative Development Inventories (MCDI)	Database of children's early vocabulary development and gestures	developmental, language acquisition, early vocabulary, age of acquisition (AOA), gesture, multimodal	English, Danish, Norwegian, Turkish, Spanish, Russian, Mandarin, Swedish, German, Cantonese, Italian, Croatian, Hebrew	Jørgensen, R. N., Dale, P. S., Bleses, D., & Fenson, L. (2010). CLEX: A cross-linguistic lexical norms database. Journal of child language, 37(02), 419-428.
http://www.iphod.com/	Irvine Phonotactic Online Dictionary (IPhOD)	Collection of English words and pseudowords with values for a number of phonological variables	phonotactics, biphoneme probability, bigram probability, triphoneme probability, trigram probability, segments, phonemes, syllables	English	Vaden, K.I., Halpin, H.R., Hickok, G.S. (2009). Irvine Phonotactic Online Dictionary, Version 2.0.
http://st2.ullet.net/	StressTyp2	Typological database with stress and accent patterns for 750 languages	typology, stress, accent	multilingual
http://phonology.cogsci.udel.edu/dbs/stress/	UD Phonology Lab Stress Pattern Database	Dominant stress patterns of the world's languages	typology, stress, accent	multilingual
http://languagelink.let.uu.nl/anatyp/	Anaphora Typology Database	Anaphora database with example sentences	typology, anaphora, syntax	multilingual	Dimitriadis, A., Everaert, M., Reinhart, T., & Reuland, E. (2005). Anaphora Typology Database.
http://languagelink.let.uu.nl/fpps/	Free Personal Pronoun System database	Personal pronoun systems of the world's languages	typology, syntax, morphology, personal pronoun, morphosyntax	multilingual
http://reduplication.uni-graz.at/	Graz Database on Reduplication	Database that contains reduplication patterns of the world's languages	typology, reduplication, syntax, morphology, morphosyntax	multilingual	Hurch, B. (2005-). Graz Database on Reduplication. http://reduplication.uni-graz.at/
http://www.personal.uni-jena.de/~mu65qev/tdir/	Typological Database of Intensifiers and Reflexives	Database that contains intensifiers and reflexive patterns of the world's languages	typology, intensifier, reflexives, syntax	multilingual	Gast, V., D. Hole, E. König, P. Siemund, S. Töpper (2007). Typological Database of Intensifiers and Reflexives. Version 2.0. http://www.tdir.org.
http://web.phonetik.uni-frankfurt.de/upsid.html	UCLA Phonological Segment Inventory Data (UPSID)	Contains phonological inventories for 451 languages	phonology, phoneme inventory, typology	multilingual	Maddieson, I. (1984). Patterns of sounds. Cambridge studies in speech science and communication. Cambridge: Cambridge University Press.
http://apics-online.info/	Atlas of Pidgin and Creole Language Structures (APiCS)	Grammatical and lexical structures of 75 pidgin and creole languages	phonology, lexicon, negation, syntax, morphology, morphosyntax, typology, pidgin & creole languages, word order	multilingual	Michaelis, S. M., Maurer, P., Haspelmath, M., & Huber, M. (eds.) (2013). Atlas of Pidgin and Creole Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology.
http://valpal.info/	Valency Patterns Leipzig Online Database (ValPal)	Valency patterns of 36 languages	typology, syntax	multilingual	Hartmann, I., Haspelmath, M., & Taylor, B. (Eds.) (2013). Valency Patterns Leipzig. Leipzig: Max Planck Institute for Evolutionary Anthropology.
http://ewave-atlas.org/	Electronic World Atlas of Varieties of English	Over 235 linguistic features mapped for 50 varieties of English	English dialects, phonology, lexicon, morphology, syntax, discourse, word order, tense, aspect	multilingual	Kortmann, B., & Lunkenheimer, K. (Eds.) (2013). The Electronic World Atlas of Varieties of English. Leipzig: Max Planck Institute for Evolutionary Anthropology.
http://lingweb.eva.mpg.de/numeral/	Numeral Systems of the World's Languages	Data on numeral systems for about 4000 languages of the world	typology, numeral systems	multilingual
http://afbo.info/	A world-wide survey of affix borrowing	A database of 101 languages where affixes have been borrowed (total of 657 affixes)	typology, affixes, morphology, morphosyntax	multilingual	Seifart, F. (2013). AfBO: A world-wide survey of affix borrowing. Leipzig: Max Planck Institute for Evolutionary Anthropology.
http://sails.clld.org/	South American Indigenous Language Structures (SAILS)	A database of 604 linguistic features from 167 South American Indigenous languages	typology, syntax, morphology, morphosyntax, phonology, tense, aspect, evidentiality, word order, agreement	multilingual	Muysken, Pieter, Harald Hammarström, Olga Krasnoukhova, Neele Müller, Joshua Birchall, Simon van de Kerke, Loretta O'Connor, Swintha Danielsen, Rik van Gijn & George Saad. 2014. South American Indigenous Language Structures (SAILS) Online. Leipzig: Online Publication of the Max Planck Institute for Evolutionary Anthropology. (Available at http://sails.clld.org)
http://www.homophone.com/	Homophone.com	Informal list of English homophones	homophones, lexicon	multilingual
http://intelligencesquaredus.org/	Intelligence Squared	Political debates with transcripts and votes by audience members	politics, debate, argument, corpus	English
http://link.springer.com/article/10.3758/s13428-014-0552-1	Nencki Affective Word List (NAWL) for Polish	Emotional valence, arousal and imageability ratings for 2,902 Polish words	emotion, valence, arousal, positive, negative, imageability, word frequency, part of speech (POS), word length	Polish	Riegel, M., Wierzba, M., Wypych, M., Żurawski, Ł., Jednoróg, K., Grabowska, A., & Marchewka, A. (2015). Nencki Affective Word List (NAWL): the cultural adaptation of the Berlin Affective Word List–Reloaded (BAWL-R) for Polish. Behavior research methods, 1-15.
http://link.springer.com/article/10.3758/s13428-011-0059-y	Discrete emotion norms for German (DENN-BAWL)	Discrete emotion ratings for about 2000 German nouns	emotion, valence, arousal, positive, negative	German	Briesemeister, B. B., Kuchinke, L., & Jacobs, A. M. (2011). Discrete emotion norms for nouns: Berlin affective word list (DENN–BAWL). Behavior research methods, 43(2), 441-448.
http://www.lehoa.macmate.me/MelissaVo/BAWL-R.html	Berlin Affective Word List Reloaded (BAWL-R)	Emotional arousal and valence ratings for about 2900 German nouns	emotion, valence, arousal, positive, negative	German	Võ, M. L., Conrad, M., Kuchinke, L., Urton, K., Hofmann, M. J., & Jacobs, A. M. (2009). The Berlin affective word list reloaded (BAWL-R). Behavior research methods, 41(2), 534-538.
http://talkbank.org/SLA/	Second Language Acquisition Resources	Contains transcribed corpora with audio from several languages relevant for second language acquisition research	second language acquisition (SLA), L2, bilingualism	multilingual
http://childes.psy.cmu.edu/phon/	PhonBank Database for Phonological Development	Contains corpora and phonological information on child language development	language development, language acquisition, phonetics, phonology, clinical corpora	English, French, Portuguese, German, Swedish, Dutch, Indonesian, Japanese, Taiwanese, Cantonese, Greek, Arabic, Berber, Romanian, Polish
http://childes.psy.cmu.edu/	Child Language Data Exchange System (CHILDES)	Transcribed and annotated child language corpora for several languages	language development, language acquisition, annotated corpus, corpora, clinical corpora	Celtic, Irish, Welsh, Cantonese, Chinese, Indonesian, Japanese, Korean, Taiwanese, Thai, English, Afrikaans, Dutch, Danish, German, Icelandic, Norwegian, Swedish, Catalan, Spanish, French, Italian, Portuguese, Romanian, Croatian, Polish, Russian, Serbian, Slovenian
http://childfreq.sumsar.net/	ChildFreq: CHILDES frequency tool	Online access tool to CHILDES word frequency data	language development, language acquisition, word frequency	English	Baath, R. (2014). ChildFreq: An online tool to explore word frequencies in child language.
http://www.opensubtitles.org/en/search	Open Subtitles	Over three million subtitle files in several languages	subtitles, corpus, parallel corpus	multilingual
http://corpus.quran.com/	Quranic Arabic Corpus	Morphological annotation, syntactic treebank and semantic ontology for the entire Holy Quran	Quran, corpus, treebank, morphological annotation, syntax, semantics, ontology	Arabic
http://www.alc.manchester.ac.uk/subjects/lel/research/projects/archer/	ARCHER: A Representative Corpus of Historical English Registers	A multi-genre English corpus ranging from 1600 to 1999	language history, historical corpus, registers	English
https://perswww.kuleuven.be/~u0044428/clmet3_0.htm	CLMET: Corpus of Late Modern English Texts	34 million words of running text from 1710 to 1920	language history, historical corpus, registers	English	Diller, H., De Smet, H., Tyrkkö, J. (2011). A European database of descriptors of English electronic texts. The European English Messenger 19, 21-35.
http://www.helsinki.fi/varieng/CoRD/corpora/HelsinkiCorpus/	Helsinki Corpus of English Texts	Contains 1.5 million words from English texts ranging from 730 AD to 1710 AD	Old English, Middle English, Early Modern English, language history, historical corpus	English
http://www.helsinki.fi/varieng/CoRD/corpora/HCOS/index.html	Helsinki Corpus of Older Scots (HCOS)	Contains 0.8 million words of Scottish English from 1450 AD to 1700 AD	Scottish, language history, historical corpus	English	The Helsinki Corpus of Older Scots (1995). Department of Modern Languages, University of Helsinki. Compiled by Anneli Meurman-Solin.
http://www-users.york.ac.uk/~sp20/corpus.html	Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English	Syntactically annotated and POS-tagged corpus of Old English	Old English, language history, historical corpus, syntax	English
http://www-users.york.ac.uk/~lang22/YcoeHome1.htm	York-Toronto-Helsinki Parsed Corpus of Old English Prose (YCOE)	A corpus of 1.5 million words of Old English texts, syntactically annotated and POS-tagged	Old English, prose, language history, historical corpus, syntax	English
http://www-users.york.ac.uk/~lang18/pcorpus.html	York-Helsinki Parsed Corpus of Old English Poetry	Corpus of Old English poetry, syntactically annotated and POS-tagged	Old English, poetry, language history, historical corpus, syntax	English
http://www.ling.upenn.edu/hist-corpora/	Penn Corpora of Historical English (PPCME2, PPCEME, PPCMBE)	Middle English, Early Modern English and Modern English corpora, syntactically annotated and POS-tagged	Middle English, Early Modern English, language history, historical corpus, syntax	English
http://ota.ox.ac.uk/	The University of Oxford Text Archive (OTA)	Text archives (with some audio and video data) for lots of English texts from many different time periods	historical corpus, text archive, corpora, language history	English
http://www.helsinki.fi/varieng/domains/CEEC.html	Corpus of Early English Correspondence (CEEC)	Compiled with historical sociolinguistics in mind, a corpus of more than 6 million words of English correspondence (1410-1800) from thousands of writers	historical corpus, letters, language history, correspondence, sociolinguistics	English
https://www.tu-chemnitz.de/phil/english/sections/linguist/real/independent/lampeter/lamphome.htm	The Lampeter Corpus of Early Modern English Texts	English texts from 1640 to 1740 within the categories religion, politics, economy, science, law and miscellaneous	historical corpus, language history, Early Modern English	English
http://www.comp.leeds.ac.uk/eric/latifa/research.htm	Corpus of Contemporary Arabic (CCA)	A corpus of 0.8 million Arabic words	corpus	Arabic
http://www.thelatinlibrary.com/	The Latin Library	A collection of Latin texts from several authors	corpus	Latin
http://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/nowac/index.html	Norwegian Web as Corpus (NoWaC)	A web-based corpus of 700 million Norwegian words	corpus	Norwegian	Guevara, Emiliano Raul (2010). NoWaC: a large web-based corpus for Norwegian. In Proceedings of the NAACL HLT 2010 Sixth Web as Corpus Workshop, Association for Computational Linguistics, 1 - 7.
http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger.en.html	TIGER Corpus	German news corpus of 0.9 million tokens from the Frankfurter Rundschau	corpus, news, media	German	Brants, S., Dipper, S., Eisenberg, P., Hansen, S., König, E., Lezius, W., Rohrer, C., Smith, G., & Uszkoreit, H. (2004). TIGER: Linguistic Interpretation of a German Corpus. Journal of Language and Computation, 2004 (2), 597-620.
http://www.moderna.uu.se/slaviska/ryska/corpus/	Uppsala Russian Corpus	Russian corpus of different genres with transliterations	corpus	Russian
http://www.arabiclearnercorpus.com/	Arabic Learner Corpus (ALC)	0.2 million Arabic words produced by 942 students from 66 different L1 backgrounds	learner corpus, second language acquisition (SLA), bilingualism	Arabic
http://www.unicaen.fr/gazette/	La Gazette de Renaudot	Historical corpus of French gazettes/newspapers	historical corpus, language history	French
http://www.engelska.uu.se/Research/English_Language/Research_Areas/Electronic_Resource_Projects/USE-Corpus/?languageId=1	Uppsala Student English Corpus (USE)	Corpus of 1500 essays written by 440 Swedish university students	learner corpus, second language acquisition (SLA), bilingualism	Swedish
http://www.engelska.uu.se/Research/English_Language/Research_Areas/Electronic_Resource_Projects/A_Corpus_of_English_Dialogues/	Corpus of English Dialogues (CED)	Corpus of 1.1 million words of English dialogues (spoken interactions) from 1560-1760	historical corpus, dialogue, language history, corpora, Early Modern English	English
http://www.gutenberg.org/	Project Gutenberg	A collection of 50000 free ebooks	book collection, text archive	English
http://www.nytimes.com/ref/membercenter/nytarchive.html	New York Times Article Archive	A collection of all New York Times articles from 1851 to the present	text archive, news, media, newspaper	English
http://link.springer.com/article/10.3758/BF03195349	Bird Age of Acquisition and Imageability ratings	Imageability and age of acquisition norms for a set of 2645 English words	age of acquisition (AOA), imageability	English
http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html	Opinion Lexicon	A list of 6800 positive and negative English opinion words	opinion mining, sentiment analysis, emotional valence, positive, negative	English
http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html	Amazon Product Review Data	More than 5.8 million reviews of Amazon products	corpus, media, Amazon, reviews	English
http://www.yelp.com/dataset_challenge	Yelp Challenge Review Data	About 1.6 million reviews from 360000 Yelp users	corpus, media, Yelp, reviews	English
http://sentiwordnet.isti.cnr.it/	SentiWordNet	A lexical resource for opinion mining	emotional valence, affect, opinion mining, sentiment analysis, wordnet, positive, negative	English
http://dialect.topography.chass.utoronto.ca/dt_atlas.php	Atlas of Dialect Topography	Cross-regional dialect topography, largely focused on Canada	sociolinguistics, dialects, Canada, Canadian English, regional variants	English
http://austlang.aiatsis.gov.au/disclaimer.php	AUSTLANG: Australian Indigenous Languages Database	Classification and language information on Australian languages, including maps	Australian languages, classification, geography, map, speaker information, language use	multilingual
http://www.baydat.uni-wuerzburg.de:8080/cocoon/baydat/baydat	BayDat: Bayrische Dialektdatenbank	Database of Bavarian German dialects	dialects, Bavaria, Germany, sociolinguistics, map	German
http://lacito.vjf.cnrs.fr/pangloss/	La Collection Pangloss	Database of audio materials from several of the world's languages	audio recordings, typology, world's languages	multilingual
http://corpus.byu.edu/time/	Time Magazine Corpus	100 million word corpus of TIME magazine	corpus, media, news, magazine	English
http://corpus.byu.edu/wiki/	Wikipedia corpus	1.9 billion word corpus from Wikipedia (4.4 million articles)	corpus, Wikipedia	English
http://corpus.byu.edu/glowbe/	Corpus of Global Web-Based English (GloWbE)	1.9 billion word corpus from 1.8 million web pages	corpus, web language, web corpus	English
http://corpus.byu.edu/can/	Corpus of Canadian English (STRATHY)	50 million word corpus of Canadian English ranging from 1920 to 2000	corpus, Canada, historical corpus, language history	English
http://www.corpusdelespanol.org/	CORPUS DEL ESPAÑOL	100 million word corpus from 20000 Spanish texts spanning a time range from 1200 to the 1900s	corpus, language history, historical corpora	Spanish
http://www.corpusdoportugues.org/	O CORPUS DO PORTUGUES	45 million word corpus of Portuguese spanning from 1300 to 1900	corpus, language history, historical corpora	Portuguese
http://kjc-fs-cluster.kjc.uni-heidelberg.de/dcs/index.php	Digital Corpus of Sanskrit (DCS)	3.2 million words of Sanskrit with collocates	corpus, language history, historical corpora	Sanskrit
http://xtone.linguistics.berkeley.edu/about.php	Cross-Linguistic Tonal Database (Xtone)	Information on lexical tone from 82 different languages	typology, phonology, lexical tone, tonal systems	multilingual
http://tls.uni-hd.de/home_en.lasso	Thesaurus Linguae Sericae	A historical and comparative encyclopedia of Chinese conceptual schemes, with corpora and semantic relations	language history, historical corpus, semantic relations, encyclopedia, historical phonology	Chinese
http://sealang.net/assam/	Tai and Tibeto-Burman language corpora	Transcribed and translated texts of Tibeto-Burman languages	Tibeto-Burman, corpus, corpora, endangered languages	Ahom, Aiton, Khamti, Khamyang, Singpho, Turung, Tangsa
http://turing.iis.sinica.edu.tw/treesearch/	Sinica Treebank	Parsed corpus of 360000 Chinese words	treebank, corpus, syntax	Chinese
http://romani.humanities.manchester.ac.uk/rms/	Romani Morpho-Syntax Database (RMS)	Database of linguistic features of Romani	grammatical database, syntax, morphology	Romani
http://www.gaois.ie/crp/en/	Parallel English-Irish corpus of legal texts	4.5 English words of legal texts with Irish translations	parallel corpus, corpus, translation, law, legal texts	English, Irish
http://www.uni-stuttgart.de/lingrom/stein/corpus/	Le Nouveau Corpus d'Amsterdam	Old French literary texts between 11th and 14th century	language history, historical corpus, Old French	French
http://www.meertens.knaw.nl/nfb/	Nederlandse Familienamenbank	300000 Dutch names and their locations in the Netherlands	onomastics, names, geography	Dutch
http://www.livac.org/	LIVAC Synchronous Corpus	550 million word Chinese corpus	corpus	Chinese
http://www.cfilt.iitb.ac.in/indowordnet/	IndoWordNet	A wordnet of the languages of India	wordnet, lexical database, dictionary, vocabulary, semantic relations, semantics, semantic hierarchies	Hindi, Assamese, Bengali, Bodo, Gujarati, Kannada, Kashmiri, Konkani, Malayalam, Manipuri, Marathi, Nepali, Odiya, Punjabi, Sanskrit, Tamil, Telugu, Urdu
http://www.cfilt.iitb.ac.in/wordnet/webhwn/	Hindi WordNet	A wordnet of Hindi	wordnet, lexical database, dictionary, vocabulary, semantic relations, semantics, semantic hierarchies	Hindi
http://hebrewcorpus.nmelrc.org/	HebrewCorpus	A 150 million word corpus of Hebrew	corpus	Hebrew
http://www.sfs.uni-tuebingen.de/GermaNet/	GermaNet	A wordnet of German	wordnet, lexical database, dictionary, vocabulary, semantic relations, semantics, semantic hierarchies	German
http://argyf.fryske-akademy.eu/files/tdb/	Frisian Languages Database	Frisian database containing audio and written corpora, including historical ones	historical corpus, spoken corpus, audio, language history, Germanic languages	Frisian
http://eap.bl.uk/	Endangered Archives Programme	A text archive of endangered languages	text archive, endangered languages	multilingual
http://www.panlex.org/	PanLex	Vocabulary and translations for 21 million expressions in about 10,000 language varieties, including Swadesh lists for about 2000 language varieties	vocabulary, word list, dictionary, Swadesh list	multilingual
http://www.lmp.ucla.edu/	UCLA Language Materials Project	Contains teaching and learning materials for over 150 less commonly taught languages, including speaker and other information about the languages	speaker information, bilingualism, language use, demographic data	multilingual
http://buckeyecorpus.osu.edu/	The Buckeye Speech Corpus	Corpus of high-quality recordings from 40 speakers in Columbus, Ohio, orthographically transcribed and phonetically labelled	audio corpus, annotated corpus, phonetics, phonology, speech	English
http://groups.inf.ed.ac.uk/switchboard/index.html	The Switchboard Corpus in NXT	Updated annotations of the Switchboard corpus of telephone conversations	annotated corpus, prosody, syntax, speech, conversational speech, telephone conversation	English
https://catalog.ldc.upenn.edu/LDC96L14	CELEX2 Corpus	Corpus of English, Dutch and German with additional lexical information	corpus, word frequency	English, Dutch, German
https://catalog.ldc.upenn.edu/LDC93S1	TIMIT Acoustic-Phonetic Continuous Speech Corpus	Audio corpus of 630 speakers of eight American English dialects with time-aligned orthographic, phonetic, and word transcriptions	annotated corpus, speech, phonetics, phonology, audio corpus, English dialects	English
http://demeter.inf.ed.ac.uk/cross/publications.html	Twitter FSD First Story Detection Corpus	Corpus of "first stories" (new events) from Twitter	corpus, web language, social media, web corpus	English
http://clic.cimec.unitn.it/amac/twitter_ngram/	Rovereto Twitter N-Gram Corpus	N-grams (up to 6-grams!) for 75 million English tweets	corpus, ngrams, word frequency, web language, social media	English
http://trec.nist.gov/data/tweets/	Tweets2011 corpus	A corpus of tweets collected in January and February 2011	corpus, web language	English
http://demeter.inf.ed.ac.uk/cross/publications.html	Newswire FSD First Story Detection Corpus	A corpus of "first stories" (new events) from newswire	corpus, web language, newspaper, media, political language	English
http://dev.sslmit.unibo.it/corpora/corpus.php?path=&name=Repubblica	La Repubblica Corpus	A corpus of 380 million tokens of Italian newspaper texts, POS-tagged, lemmatized and genre categorized	corpus, genre, topic, syntax, part of speech (POS), newspaper, media, political language	Italian
http://www.nzilbb.canterbury.ac.nz/onze.shtml	Origins of New Zealand English (ONZE) Corpus	A corpus of various stages of New Zealand English	audio corpus, phonetics, phonology, language history, historical corpus	English
http://laslab.org/resources/confusions/	Corpus of noise-induced Spanish misperceptions/confusions	A corpus of 3235 noise-induced robust misperceptions in Spanish	corpus, phonetics, phonology, speech perception	Spanish
http://www.cs.cmu.edu/~mfaruqui/suite.html	WordSim353 evaluation benchmarks	Human similarity ratings for over 3000 word pairs, including syntactic relations	semantics, similarity, semantic relatedness	English, German, French, Arabic, Romanian, Spanish
https://github.com/mfaruqui/non-distributional	Non-distributional English word vectors	Large lexicon with thesaurus, antonyms, color, connotations and valence information extracted through NLP procedures	semantics, lexicon, sentiment analysis, affect, emotional valence, antonyms	English
https://console.developers.google.com/storage/browser/wikipedia_multilingual_relations_v1/	Semantic Relations from Wikipedia	A dataset of automatically extracted semantic relations from the multilingual Wikipedia corpus	semantics, semantic relations	French, Russian, Chinese, Arabic, Hindi, Indonesian, Tagalog, Latvian, Swahili, Georgian
http://www.nlpado.de/~sebastian/data/tv_data.shtml	Bilingual Formal/Informal Address Corpus	Corpus of English and German sentences from novels tagged for formal and informal connotations, tokenized, lemmatized, POS-tagged	annotated corpus, politeness, formal language	German, English
http://www.coli.uni-saarland.de/projects/salsa/corpus/	German SALSA Corpus	A large frame-based lexicon for German with semantic roles	semantic roles, frames, framenet	German
http://www.nlpado.de/~sebastian/data/srl_data.shtml	Cross-lingual projection of semantic roles	Parallel corpora annotated for semantic roles	parallel corpus, corpus, translation, semantic roles	German, English
https://framenet.icsi.berkeley.edu/fndrupal/	FrameNet	A lexical database of English that specifies semantic frames and semantic roles, more than 10000 senses	framenet, lexical database, dictionary, vocabulary, semantic relations	English
http://u.cs.biu.ac.il/~nlp/resources/downloads/annotation-of-discourse-references-relevant-for-entailment-inference/	Discourse Reference Corpus	Pragmatically annotated corpus with information about coreference and bridging	reference, discourse, pragmatics, annotated corpus, entailment inference, coreference	English
http://clic.cimec.unitn.it/dm/	Distributional Memory semantic database	Semantic database of English based on distributional information	lexicon, semantic relatedness, relations, corpus-based semantics, co-occurrence	English
http://www.cl.uni-heidelberg.de/~zeller/res/te-ger/index.mhtml	Textual Entailment Search Task Dataset for German	A corpus of 3000 text/hypothesis pairs derived from web forum posts	textual entailment, semantic inference, pragmatics, corpus, web language	German
http://www.ims.uni-stuttgart.de/permalink/56cc6c89-c421-11e4-a5e6-000e0c3db68b.html	DErivBase German Derivational Lexicon	A derivational lexicon for German	morphology, lexicon, dictionary, lemma	German
http://takelab.fer.hr/data/dmhr/	Distributional Memory for Croatian	Semantic database of Croatian based on distributional information	lexicon, semantic relatedness, relations, corpus-based semantics	Croatian
http://www.cl.cam.ac.uk/~fh295/simlex.html	SimLex999 Semantic Relatedness Dataset	A dataset of normed semantic similarity (rather than just word associations)	semantics, semantic similarity, semantic relatedness, relations, concreteness, word association	English
http://www.kuleuven.be/semlab/interface/index.php	Dutch Word Associations	A dataset of word associations in Dutch	semantics, word association	Dutch
http://www.nltk.org/nltk_data/	NLTK Corpora	Variety of corpora and datasets built into the NLTK Python library	natural language processing, python, brown, australian broadcasting, alpino dutch treebank, treebank, CONLL, Europarl, Genesis, bible, gazetteer, C-Span, Gutenberg, KNB corpus, sentiment, NPS chat, opinion lexicon, multilingual wordnet, penn treebank, sentiwordnet	English, Portuguese, Spanish, Basque, Old English, Mandarin Chinese, Polish, Brazilian Portuguese
http://www.cstr.ed.ac.uk/research/projects/artic/accor.html	EUR-ACCOR	Cross-language recordings with EPG, laryngograph, nasal airflow, and audio	articulatory phonetics, articulation, speech production, Rothenberg mask	Catalan, English, French, German, Irish Gaelic, Italian, Swedish
http://www.cstr.ed.ac.uk/research/projects/artic/mocha.html	MOCHA-TIMIT	Dataset with audio, laryngograph and EMA recordings for English, constructed with the intention of training an automatic speech recognition system	articulation, articulatory phonetics, speech production, electromagnetic articulography, tongue recording	English
http://www.u.arizona.edu/~nwarner/WarnerMcQueenCutler.html	English Diphone Perceptual Database	Phoneme categorizations based on a gated listening task	speech perception, phonetics, phonology, psycholinguistics, phonetic information over time	English
http://www.mpi.nl/world/dcsp/diphones/	Dutch Diphone Perceptual Database	A total of 488,520 phoneme categorizations based on a gated listening task of 1,179 Dutch diphones	speech perception, phonetics, phonology, psycholinguistics, phonetic information over time	Dutch
http://www.linguateca.pt/acesso/corpus.php?corpus=SAOCARLOS	NILC / San Carlos Corpora	Collection of corpora of contemporary Portuguese, with part of speech tags (POS-tagged)	corpus, annotated corpus	Portuguese
http://www.clul.ul.pt/pt/recursos/183-reference-corpus-of-contemporary-portuguese-crpc	CRPC Comparative Portuguese corpus	Large corpus containing texts from several varieties of Portuguese (European, Brazil, Angola, Cape Verde, Guinea-Bissau, Mozambique, Sao Tome and Principe, Goa, Macau, Timor-Leste)	corpus, dialectal corpus, sociolinguistics	Portuguese
http://cipm.fcsh.unl.pt/	CIPM Medieval Portuguese corpus	Historical corpus of medieval Portuguese	historical corpus, language history, classical & medieval Portuguese	Portuguese
http://www.letras.ufrj.br/nurc-rj/	NURC-RJ Spoken Portuguese Corpus	Spoken corpus of Brazilian Portuguese	spoken corpus, audio recordings, phonetics, phonology	Portuguese
http://www.letras.ufrj.br/laborhistorico/	LaborHistorico Historical Portuguese corpus	Official historical corpus of the "A history of Brazilian Portuguese" project	historical corpus, language history	Portuguese
https://sites.google.com/site/distributedlittleredhen/home	Distributed Little Red Hen Lab Databases	Resource directory for the UCLA NewsScape Library of International Television News; a TV News Archive that contains news programs	semantics, gesture, phonetics, corpus, television (TV), media, politics, news, multimodal corpus	English
http://spokenchinesecorpus.nccu.edu.tw/	NCCU Corpus of Spoken Chinese	Spoken corpus of Chinese	spoken corpus, audio recordings, phonetics, phonology	Mandarin Chinese
http://andosl.anu.edu.au/andosl/	Australian National Database of Spoken Language (ANDOSL)	Phonetically annotated spoken language corpus of Australian English	spoken corpus, audio recordings, phonetics, phonology, phonetically annotated	Australian English
http://projects.ael.uni-tuebingen.de/backbone/moodle/	BACKBONE Pedagogic Corpus of Video-Recorded Interviews	Spoken interviews with video recordings for several European languages, including second language recordings	spoken corpus, multimodal corpus, video, second language acquisition (SLA), bilingualism	English, French, German, Polish, Spanish, Turkish
http://serverdbt.ilc.cnr.it/altweb/	Atlante Lessicale Toscano (ATL Lexical Atlas of Tuscany)	Lexical atlas and demographic data; dialectal resource for Tuscan dialects in Italy	sociolinguistics, dialects, lexical atlas, language geography, dialectology, Italian dialects	Italian
https://catalog.ldc.upenn.edu/LDC2009T25	Web 1T 5-gram ngrams for 10 European languages	N-grams (up to 5-grams) and frequency counts for 10 European languages	n-grams, word frequency, Google	Swedish, Spanish, Romanian, Portuguese, Polish, Dutch, Italian, French, German, Czech
http://www.let.rug.nl/gosse/bin/Web1T5_freq.perl	Web 1T 5-gram database for Dutch	N-grams and frequency counts for Dutch	n-grams, word frequency, Google	Dutch
http://rugtest16.service.rug.nl/gosse/Ngrams/	Groningen Twitter Corpus	Dutch Twitter corpus containing approximately 2.6 billion tweets and 28 billion tokens collected between January 2014 and December 2014, n-gram parsed up to 5-grams	n-grams, twitter, web corpus, web language, social media	Dutch
http://www.linguistics.ucsb.edu/research/santa-barbara-corpus	Santa Barbara Corpus of Spoken American English (SBCSAE)	249,000 words with transcriptions, audio and timestamps	spoken corpus, audio recordings, phonetics, phonology, phonetically annotated	English
http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/IMS-GECO.en.html	IMS GECO Phonetic Convergence database	46 dialogs (ca. 25 min long) between female German speakers, in speaker-visible and speaker-invisible contexts for the study of phonetic convergence	spoken corpus, audio recordings, phonetics, phonology, multimodal corpus, phonetic convergence, accommodation, interpersonal synchrony, sociolinguistics, sociophonetics	German
http://quod.lib.umich.edu/cgi/c/corpus/corpus?c=micase;page=mbrowse	MICASE Michigan Corpus of Academic Spoken English	152 transcripts totaling 1.8 million words of academic spoken English	spoken corpus, university language, registers, formal language	English
http://u.cs.biu.ac.il/~koppel/BlogCorpus.htm	Blog Authorship corpus	681,288 posts totaling 140 million words from 19,320 bloggers, collected in 2004, balanced for gender; with age, gender and industry/occupation information	corpus, web language, social media, web corpus, demographic data, sociolinguistics	English
https://www.ausnc.org.au/	AusNC Australian National Corpus	Collection of Australian English corpora (including ACE, ART, AusLit, Braided Channels, COOEE, GCSAusE, ICE Corpus, MD Corpus, Monash Corpus); includes many registers and different time periods and transcribed speech from sociolinguistic interviews with gender information	corpus, Australian English, dialects, spoken corpus, written language, literature, poetry, historical corpus, language history, varieties of English	English
http://taiccm.org/	Taiwan Corpus of Child Mandarin (TCCM)	Taiwan Corpus of Child Mandarin (TCCM)	corpus, child language, language acquisition, L1, children, learner corpus	Chinese
http://link.springer.com/article/10.3758/BF03193116	Wurm 2007 Danger and Usefulness Norms	A published research article that includes ratings for the danger and usefulness of words	semantics, danger, usefulness, semantic norms, meaning, perceptual attributes	English
http://link.springer.com/article/10.3758/BRM.40.1.183#page-1	Semantic Feature Production Norms	Semantic feature production norms for 456 words (objects and events)	semantic features, semantics, feature norms, distinctive features, objects and events, properties	English
http://www.neuro.mcw.edu/ratings/	Wisconsin Perceptual Attribute Ratings Database (MCWisc)	Perceptual attribute norms for four sensory domains: sound, color, manipulation, motion; for 1402 words, including emotion ratings reflecting intensity and valence	perceptual attributes, manipulability, semantics, concepts, perception, valence, affect, feeling	English
http://link.springer.com/article/10.3758/BF03195584	Extension of Paivio norms	Extension of Paivio et al. (1968) lexical norms	gender ladenness, sexual language, stereotypes, age of acquisition (AOA), number of meanings, number of associates, emotionality, pleasantness, emotional valence, children's dictionaries, concreteness, meaningfulness, goodness, word frequency, imagery, imageability, language acquisition, children's word knowledge, lexical knowledge, word knowledge	English
http://sumale.vjf.cnrs.fr/pronoms/	Les marques personnelles dans les langues africaines	Database of personal pronouns of African languages	typology, morphosyntax, morphology, syntax, personal pronouns, person marking	multilingual
http://typo.uni-konstanz.de/archive/intro/	The Konstanz Universals Archive	A list of proposed typological universals	language typology, Greenbergian universals, morphology, syntax, morphosyntax, word order	multilingual
http://typo.uni-konstanz.de/rara/intro/index.php	Das grammatische Raritätenkabinett	Informal list of grammatical rarities / typologically rare features	typology, universals, rare features, syntax, morphosyntax	multilingual
http://www.soundcomparisons.com	Sound comparisons	Comparative atlas and map with audio samples of Germanic, Romance, Slavic, Celtic, Andean and Mapudungun	language geography, comparative linguistics, dialectology, Indo-European languages, pronunciation, sound patterns, phonetics, phonology, cognates, cognacy	multilingual, including English, German, French, Italian, Spanish and Portuguese
http://langscape.umd.edu/	Langscape	World map / linguistic atlas showing the location of languages and visualizing linguistic diversity across the globe	language geography, linguistic diversity, typology, map, endangered languages, atlas	multilingual
http://sswl.railsplayground.net/	SSWL Syntactic Structures of the World's Languages	Typological database with syntactic features for 250+ languages of the world	typology, morphology, morphosyntax, syntax, word order, universals	multilingual
http://sealang.net/monkhmer/dictionary/	SEAlang Mon-Khmer Etymological Dictionary	Dictionary for comparative and historical linguistics of Mon-Khmer languages	etymology, dictionary, lexical data, language history, historical linguistics, phylogenetics, comparative dictionaries, Asian languages	multilingual
http://pollex.org.nz/	Polynesian Lexicon Project (Pollex Online)	Large-scale comparative dictionary of Polynesian languages	Polynesian, Austronesian, lexical data, comparative dictionary, cognacy, cognates, historical linguistics, Pacific languages, word lists	multilingual
http://transnewguinea.org/	TransNewGuinea.org	Database of languages from the Trans-New Guinea family and friends, encompassing 900+ languages and info on 1000+ words	Pacific languages, Trans-New Guinea family (TNG), Papua New Guinea (PNG), language history, historical linguistics, linguistic diversity, Austronesian	multilingual
http://starling.rinet.ru/new100/main.htm	The Global Lexicostatistical Database (GLD)	Basic word lists for many of the world's languages	comparative linguistics, historical linguistics, phylogenetics, lexicostatistics, basic vocabulary, word lists, Swadesh list	multilingual
http://www.lapsyd.ddl.ish-lyon.cnrs.fr/	Lyon-Albuquerque Phonological Systems Database (LAPSyD)	Searchable database of basic phonological information on a wide sample of the world's languages	phonological typology, phoneme inventory, phonology, phonetics, consonants, vowels, syllable structure, linguistic stress, lexical tones	multilingual
https://doi.org/10.3758/s13428-018-1099-3	The Glasgow Norms	Normative ratings for 5,553 English words on nine psycholinguistic dimensions: arousal, valence, dominance, concreteness, imageability, familiarity, age of acquisition, semantic size, and gender association		English	Scott, G. G., Keitel, A., Becirspahic, M., Yao, B., & Sereno, S.C. (2018). The Glasgow Norms: Ratings of 5,500 words on nine scales. Behavior Research Methods, 51, 1258–1270.
https://www.cogsci.mq.edu.au/research/resources/nwdb/nwdb.html	ARC Nonword Database	358,534 nonwords			Rastle, K., Harrington, J., & Coltheart, M. (2002). 358,534 nonwords: The ARC Nonword Database. Quarterly Journal of Experimental Psychology, 55A, 1339-1362.
https://lingualab.ca/en/project/norms-familiarity-perceptual-strength	French Canadian conceptual familiarity norms	3,596 nouns and online data about them from 313 Canadian French speakers		French
https://smallworldofwords.org/en/project	Small World of Words	Word association and participant data for 100 primary, secondary and tertiary responses to 12,292 cues in English and 12,571 cues in Dutch		English, Dutch