To my linguistics homepage

LING 388
Computers and Language
Spring 2019

This is an introductory course in computational linguistics for undergraduates. There are no prerequisites. There is no textbook. Students will learn to program using Python (3.x) and also learn to use computational tools such as NLTK for language analysis. Both classroom lectures and computer laboratory exercises will be used.

Software

We will use Python (freely available) in the computer laboratory classes.

Instructor: Sandiway Fong sandiway@email.arizona.edu
Office: 311 Douglass

Administrivia

Location: McClelland Park, Rm 102 (uaccess is not right: ECE 102)
Time: Tuesday-Thursday 12:30-1:45 pm

Syllabus

See lecture 1 slides

Lecture Notes

Available in both Adobe PDF and Microsoft PowerPoint formats.

January

Date  Lecture Notes (PDF)  Lecture Notes (Powerpoint)  Number of Slides  Panopto  Topic
1/10 lecture1.pdf lecture1.pptx 21 link Syllabus, Homework 1: install Python 3 on your computer.
pythonbook.pdf
1/15 lecture2.pdf lecture2.pptx 47 link link Numbers in Python. Berkeley Parser. Illinois Named Entity Recognizer. Google N-grams. Quick Homework 2.
1/17 lecture3.pdf lecture3.pptx 26 link Homework 2 review. Complex math: cmath. Homework 3. The computer representation of numbers and characters. 2's complement arithmetic. Unicode UTF-8.
1/22 lecture4.pdf lecture4.pptx Homework 2 review. Python tutorial (section 3): numbers, the cmath library (e.g. cmath.sqrt(-1)), strings.
1/24 lecture5.pdf lecture5.pptx link Introduction to Python: lists. Lists as stacks and queues.
1/29 lecture6.pdf lecture6.pptx link Introduction to Python: recap of lists as queues and stacks. Tuples. Dictionaries. for-loop. range().
1/31 lecture7.pdf lecture7.pptx link Introduction to Python. Classroom exercises 1-3. str.split(). List comprehension. re.sub (regular expression set substitution). Counter. Homework 4.
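
The January topics can be tried out in a few lines. The following is a minimal sketch, not taken from the lecture slides: the sample sentence and the helper name `word_counts` are made up for illustration. It touches `cmath`, two's complement, UTF-8, lists as stacks and queues, `str.split()`, a list comprehension, `re.sub` and `Counter`.

```python
import cmath
import re
from collections import Counter

# Numbers and characters
root = cmath.sqrt(-1)            # complex square root of -1
low_byte = -1 & 0xFF             # low 8 bits of -1 in two's complement
encoded = "é".encode("utf-8")    # UTF-8 encodes é as two bytes

# A list used as a stack (last in, first out)
stack = ["a", "b"]
stack.append("c")
top = stack.pop()

# A list used as a queue (first in, first out). pop(0) is simple but
# slow on long lists, where collections.deque is the usual choice.
queue = ["a", "b"]
queue.append("c")
first = queue.pop(0)


def word_counts(text):
    """Lowercase, delete punctuation with re.sub, split, and count words."""
    cleaned = re.sub(r"[^\w\s]", "", text.lower())
    words = [w for w in cleaned.split() if len(w) > 0]  # list comprehension
    return Counter(words)


counts = word_counts("The cat saw the dog. The dog saw the cat!")
print(root, low_byte, encoded, top, first)
print(counts["the"], counts["dog"])
```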

February

Date  Lecture Notes (PDF)  Lecture Notes (Powerpoint)  Number of Slides  Panopto  Topic
2/5 lecture8.pdf lecture8.pptx link Introduction to Python: def, input(), sys.argv[1], try/except exception handling, formatted output
2/7 lecture9.pdf lecture9.pptx link Introduction to Python: file I/O.
Example data file.
2/12 lecture10.pdf lecture10.pptx link Named Entity Recognition (NER). The Illinois NER system.
Wall Street Journal data:
2/14 lecture11.pdf lecture11.pptx link Regex in Python.
2/19 lecture12.pdf lecture12.pptx link
2/21 lecture13.pdf lecture13.pptx link
2/26 lecture14.pdf lecture14.pptx link
2/28 lecture15.pdf lecture15.pptx link
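
The early February topics (def, sys.argv[1], try/except, formatted output, file I/O and regex) fit together in one small script. This is a sketch under assumptions, not the code from the lectures: the function names `count_matches` and `main`, the script name in the usage message, and the capitalized-word pattern are all hypothetical choices.

```python
import re


def count_matches(path, pattern):
    """Return how many lines of the file at `path` contain a regex match."""
    regex = re.compile(pattern)
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if regex.search(line))


def main(argv):
    """Take the file name from argv[1]; return 0 on success, 1 on error."""
    try:
        path = argv[1]
    except IndexError:
        print("usage: python count_matches.py FILE")
        return 1
    try:
        # Hypothetical pattern: a capital letter followed by lowercase letters
        n = count_matches(path, r"\b[A-Z][a-z]+\b")
    except FileNotFoundError:
        print(f"cannot open {path}")
        return 1
    print(f"{n:5d} lines contain a capitalized word")
    return 0
```

In a script, `main(sys.argv)` (after `import sys`) would pass the command-line arguments through; a missing argument or a missing file is reported instead of raising an exception.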

March

Date  Lecture Notes (PDF)  Lecture Notes (Powerpoint)  Number of Slides  Panopto  Topic
3/5 Spring recess: no class.
3/7 Spring recess: no class.
3/12 lecture16.pdf lecture16.pptx link
3/14 lecture17.pdf lecture17.pptx link Installing nltk and nltk data on macOS and Windows 10 (64-bit Python 3).
3/19 lecture18.pdf lecture18.pptx link nltk book: preface + chapter 1 section 1: Computing with Language: Texts and Words.
3/21 lecture19.pdf lecture19.pptx link nltk book: chapter 1 section 2-4.
3/26 lecture20.pdf lecture20.pptx link nltk book: chapter 1 section 4 and chapter 2 section 1. Accessing Gutenberg corpora. Example: "Emma" by Jane Austen. Simple statistics: #letters/word, #words/sentence, #times words used. Surprize vs. surprise. Free indirect style.
3/28 lecture21.pdf lecture21.pptx link nltk book: chapter 2 contd. Other corpora.
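
The simple statistics listed for 3/26 (letters per word, words per sentence, word counts) can be sketched in plain Python. The example below deliberately avoids NLTK so that it runs without any corpus download; the function name `simple_stats` and the two toy sentences are invented for illustration and are not quotations from "Emma". With NLTK installed, the word and sentence lists would instead come from the Gutenberg corpus reader.

```python
from collections import Counter


def simple_stats(sentences):
    """Compute basic statistics for a list of tokenized sentences.

    `sentences` is a list of sentences, each a list of word strings.
    Returns (letters per word, words per sentence, word frequencies).
    """
    words = [w for sent in sentences for w in sent]
    letters_per_word = sum(len(w) for w in words) / len(words)
    words_per_sentence = len(words) / len(sentences)
    freq = Counter(w.lower() for w in words)
    return letters_per_word, words_per_sentence, freq


# Toy data (made up), already split into sentences and words
sample = [["Emma", "was", "happy"], ["She", "was", "clever", "and", "rich"]]
lpw, wps, freq = simple_stats(sample)
print(f"{lpw:.3f} letters/word, {wps:.1f} words/sentence, 'was' x {freq['was']}")
```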

April

Date  Lecture Notes (PDF)  Lecture Notes (Powerpoint)  Number of Slides  Panopto  Topic
4/2 lecture22.pdf lecture22.pptx link Term project proposals.
4/4 lecture23.pdf lecture23.pptx link nltk book: chapter 2 contd. Importing your own corpus from online sources. Using request.urlopen(). Using .get_text() in BeautifulSoup. Reading local files. Searching already tokenized text in nltk: angle bracket notation.
4/9 lecture24.pdf lecture24.pptx link nltk book: chapter 3 contd. Useful Applications of Regular Expressions. Frequency distribution and vowel sequences. Leaving out word-internal vowels and readability of English. Stemming. Stemming for collocations. Word tokenization from scratch.
4/11 lecture25.pdf lecture25.pptx link nltk book: chapter 3 contd. Sentence tokenization of raw text. Stream of consciousness and Virginia Woolf.
More Python: pickle, formatted output, textwrap
nltk book: chapter 4: cool examples
4/16 lecture26.pdf lecture26.pptx link
4/18 lecture27.pdf lecture27.pptx link
4/23 lecture28.pdf lecture28.pptx link
4/25 lecture29.pdf lecture29.pptx link
4/30 lecture30.pdf lecture30.pptx link
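
Two of the April regular-expression topics, leaving out word-internal vowels and word tokenization from scratch, can be illustrated with the standard library alone. This is a rough sketch, not the lecture code: the names `compress` and `tokenize`, both patterns, and the sample sentence are assumptions chosen for the example, and `textwrap` is used only to wrap the output.

```python
import re
import textwrap

# Keep vowel runs at the start or end of a word, plus every non-vowel
KEEP = r"^[AEIOUaeiou]+|[AEIOUaeiou]+$|[^AEIOUaeiou]"


def compress(word):
    """Drop word-internal vowels, e.g. 'vowels' becomes 'vwls'."""
    return "".join(re.findall(KEEP, word))


def tokenize(text):
    """Split text into words (with an optional 's or 't part) and punctuation."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)


text = "Leaving out word-internal vowels keeps English surprisingly readable."
tokens = tokenize(text)
compressed = " ".join(compress(t) for t in tokens)
print(textwrap.fill(compressed, width=40))
```

The tokenizer is intentionally crude: it treats the hyphen in "word-internal" as a separate token, which is one of the design decisions a from-scratch tokenizer has to make.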

