Electronic dictionary projects/corpus studies
The lab is building two electronic dictionaries/corpora: one for Modern Hebrew and one for Maltese. An electronic lexicon of Modern Hebrew has been created from the most widely recognized authoritative Hebrew dictionary (Even-Shoshan), using a transcription system designed to allow easy electronic textual manipulation and analysis.
The goals of this electronic dictionary include supporting chi-square analyses of consonant co-occurrence statistics and developing programs that calculate an item's neighborhood density, morphological family size, and lexical uniqueness point. The resulting corpus will be a useful tool for selecting items for psycholinguistic experiments. A similar project is planned for Maltese.
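As an illustration of two of these measures, here is a minimal Python sketch. It rests on common working definitions that the lab's own programs may not share: neighborhood density is taken as the number of lexicon entries that differ from an item by one substitution, insertion, or deletion of a segment, and the uniqueness point as the first position at which an item's initial string matches no other entry. The function names and the toy lexicon are hypothetical, and each character is treated as one segment.

```python
def is_neighbor(a: str, b: str) -> bool:
    """True if a and b differ by exactly one substitution, insertion, or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    # Lengths differ by one: removing one segment from the longer form
    # must yield the shorter form (insertion/deletion).
    shorter, longer = sorted((a, b), key=len)
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def neighborhood_density(word: str, lexicon) -> int:
    """Count the distinct lexicon entries that are neighbors of `word`."""
    return sum(is_neighbor(word, other) for other in set(lexicon))


def uniqueness_point(word: str, lexicon):
    """Return the 1-based position at which `word` diverges from every other
    entry, or None if the whole word is an initial substring of another entry."""
    others = [w for w in set(lexicon) if w != word]
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        if not any(other.startswith(prefix) for other in others):
            return i
    return None


# Toy strings for illustration only, not entries from the actual lexicon.
LEXICON = ["katav", "kataf", "katan", "kata", "gadol"]
```

With this toy lexicon, `neighborhood_density("katav", LEXICON)` counts `kataf`, `katan`, and `kata` as neighbors, and `uniqueness_point("kata", LEXICON)` returns `None` because `kata` is fully contained in the beginning of longer entries.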
PsyCoLbot - Spider/WebCrawler
PsyCoLbot is a project of the Psycholinguistics and Computational Linguistics Laboratory at the University of Arizona. The bot is based on GNU Wget, a utility for retrieving files over the HTTP and FTP protocols. Wget works non-interactively and can retrieve HTML pages and FTP trees recursively. It can be used for mirroring Web pages and FTP sites, or for traversing the Web to gather data. It is run by the end user or archive maintainer.
PsyCoLbot is a web-based tool designed to collect data on Maltese language usage via the World-Wide Web and to compile frequency-ranked lists of roots and words for purely academic purposes.
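The word-level part of this compilation step can be sketched in Python as follows. This is only an illustration under simplifying assumptions: a word is any run of Unicode letters (so a hyphenated form such as "il-kelb" is split into two tokens), text is lowercased, and ties are broken alphabetically. Root extraction needs morphological analysis and is not attempted here. The function name and sample sentences are hypothetical.

```python
import re
from collections import Counter

# Runs of Unicode letters; digits, underscores, and punctuation act as separators.
WORD_RE = re.compile(r"[^\W\d_]+")


def frequency_ranked(texts):
    """Return (word, count) pairs sorted by descending count, then alphabetically."""
    counts = Counter()
    for text in texts:
        counts.update(WORD_RE.findall(text.lower()))
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


# Toy input standing in for the text of downloaded pages.
SAMPLE_PAGES = ["il-kelb u l-qattus", "Il-kelb kbir"]
RANKED = frequency_ranked(SAMPLE_PAGES)
```

Here `RANKED` begins with `("il", 2)` and `("kelb", 2)`, followed by the words that occur once.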
PsyCoLbot should:
- follow robots.txt and robot conventions
- declare the proper User-Agent identifier (User-Agent: PsyCoLbot (http://psycol.sbs.arizona.edu/research/?v=4))
- download multiple text pages from one site
- make all requests from the same IP address (research.sbs.arizona.edu)
PsyCoLbot should not:
- make consecutive visits within a 3-second time span
- download files not associated with language content (e.g., images, scripts)
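The rules above can be expressed as a few checks, sketched here in Python with the standard-library `urllib.robotparser`. PsyCoLbot itself is based on GNU Wget, so this is not its actual configuration. The constant names, the extension list, and the example robots.txt rules are assumptions made for illustration; the User-Agent string and the 3-second interval come from the lists above.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

ROBOT_NAME = "PsyCoLbot"  # short token matched against robots.txt records
USER_AGENT = "PsyCoLbot (http://psycol.sbs.arizona.edu/research/?v=4)"  # HTTP header value
MIN_DELAY_SECONDS = 3.0   # minimum gap between consecutive visits
# Assumed list of non-language file types; a real crawl may need more.
SKIPPED_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".js", ".css")


def load_robots(lines):
    """Build a parser from the lines of an already downloaded robots.txt."""
    robots = RobotFileParser()
    robots.parse(lines)
    return robots


def is_language_content(url: str) -> bool:
    """Reject URLs whose path ends in an image, script, or stylesheet extension."""
    return not urlparse(url).path.lower().endswith(SKIPPED_EXTENSIONS)


def may_fetch(robots: RobotFileParser, url: str) -> bool:
    """Allow a URL only if it looks like text and robots.txt permits it."""
    return is_language_content(url) and robots.can_fetch(ROBOT_NAME, url)


def seconds_to_wait(last_visit, now: float) -> float:
    """Seconds still to wait before the next request to the same site."""
    if last_visit is None:
        return 0.0
    return max(0.0, MIN_DELAY_SECONDS - (now - last_visit))


# Hypothetical robots.txt content for an example site.
EXAMPLE_ROBOTS = load_robots(["User-agent: *", "Disallow: /private/"])
```

A crawl loop would call `may_fetch` before each request, send `USER_AGENT` in the request headers, and pause for `seconds_to_wait(last_visit, time.monotonic())` seconds (for example with `time.sleep`) between visits to the same site.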
To contact the project members, please email ussishki at email dot arizona dot edu