To my linguistics homepage

LING 388
Computers and Language
Spring 2019

This is an introductory course in computational linguistics for undergraduates. There are no prerequisites and no textbook. Students will learn to program in Python (3.x) and to use computational tools such as NLTK for language analysis. The course combines classroom lectures with computer laboratory exercises.


We will use Python (freely available) in the computer laboratory classes.

Instructor: Sandiway Fong
Office: 311 Douglass


Location: McClelland Park, Rm 102 (uaccess is not right: ECE 102)
Time: Tuesday-Thursday 12:30-1:45 pm


See lecture 1 slides

Lecture Notes

Available in both Adobe PDF and Microsoft Powerpoint formats.


Date   Lecture Notes (PDF)   Lecture Notes (Powerpoint)   Number of Slides   Panopto   Topic
1/10 lecture1.pdf lecture1.pptx 21 link Syllabus, Homework 1: install Python 3 on your computer.
1/15 lecture2.pdf lecture2.pptx 47 link link Numbers in Python. Berkeley Parser. Illinois Named Entity Recognizer. Google N-grams. Quick Homework 2.
1/17 lecture3.pdf lecture3.pptx 26 link link Homework 2 review. Complex math: cmath. Homework 3. The computer representation of numbers and characters. 2's complement arithmetic. Unicode UTF-8.
1/22 lecture4.pdf lecture4.pptx 22 link Homework 3 review. The computer representation of numbers and characters. 2's complement arithmetic.
1/24 lecture5.pdf lecture5.pptx 20 link Introduction to Python: floating point numbers, character sets. Homework 4
1/29 sample.pdf sample.pptx link Introduction to Python: strings, lists as queues and stacks. Tuples. Dictionaries. for-loop. range(). Introduction to Python, led by Colton Michael Flowers.
1/31 link Introduction to Python, led by Colton Michael Flowers.
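
The January topics on 2's complement arithmetic and Unicode UTF-8 can be tried out directly in Python. The following is a minimal sketch, not taken from the lecture slides; the helper name twos_complement and the 8-bit default width are assumptions made for illustration.

```python
def twos_complement(value, bits=8):
    """Return value as a two's complement bit string of the given width.

    Hypothetical helper for illustration; not part of the course materials.
    """
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {bits} bits")
    # Masking with 2**bits - 1 maps a negative number onto its
    # two's complement bit pattern.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


positive = twos_complement(5)
negative = twos_complement(-5)
print(positive, negative)

# UTF-8 encodes a non-ASCII character such as "é" (U+00E9) in more than one byte.
encoded = "é".encode("utf-8")
print(encoded, len(encoded))
```

Adding the two bit patterns for 5 and -5 and discarding the carry out of the top bit gives all zeros, which is the property that makes 2's complement convenient for arithmetic.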


Date   Lecture Notes (PDF)   Lecture Notes (Powerpoint)   Number of Slides   Panopto   Topic
2/5 link Introduction to Python, led by Colton Michael Flowers.
2/7 link Introduction to Python, led by Colton Michael Flowers.
2/12 lecture10.pdf lecture10.pptx 18 None Homework 4 review. Text Summarization. Quick Homework 5. List comprehensions.
2/14 lecture11.pdf lecture11.pptx 12 link List comprehensions. Counter objects. File input. Homework 5 revisited.
2/19 lecture12.pdf lecture12.pptx 17 link File input, input(), eval(). sys.argv. Formatted output revisited.
File: falconheavylaunch.txt
Slides updated: working directory info from os
2/21 lecture13.pdf lecture13.pptx 12 link A note on working directories. Regex in Python.
2/26 lecture14.pdf lecture14.pptx 13 link Regex in Python contd., match object methods .group(), .start(), .end(), .span(). finditer() and looping. Homework 7.
Update: terminal session from lecture 14: lecture14.txt
2/28 lecture15.pdf lecture15.pptx 10 link re.sub(), 7 Python regex exercises with the Brown corpus wordlist.
Brown corpus wordlist for class practice:
(note: unzip the file manually if your browser doesn't do it for you automatically.)
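
The regex topics listed for late February (match object methods, finditer() and re.sub()) fit together roughly as in the sketch below. The sample sentence and the four-digit pattern are made up for illustration and are not drawn from the lecture files.

```python
import re

# Made-up sample text; any string would do.
text = "Falcon Heavy launched in 2018; Falcon 9 first flew in 2010."
pattern = re.compile(r"\d{4}")  # four consecutive digits

years = []
spans = []
# finditer() yields one match object per non-overlapping match.
for m in pattern.finditer(text):
    years.append(m.group())   # the matched substring
    spans.append(m.span())    # the pair (m.start(), m.end())
    print(m.group(), m.start(), m.end(), m.span())

# re.sub() replaces every match with the given replacement string.
redacted = re.sub(r"\d{4}", "YEAR", text)
print(redacted)
```

Note that .span() is simply .start() and .end() packaged as a tuple, so text[m.start():m.end()] always equals m.group().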


Date   Lecture Notes (PDF)   Lecture Notes (Powerpoint)   Number of Slides   Panopto   Topic
3/5 Spring recess: no class.
3/7 Spring recess: no class.
3/12 lecture16.pdf lecture16.pptx 20 link Homework 8: install nltk and nltk_data on your computer.
3/14 lecture17.pdf lecture17.pptx 12 link nltk book: preface + chapter 1 section 1: Computing with Language: Texts and Words. .concordance(), .similar(), .common_contexts(). Lexical diversity.
Panopto crashed several times. Here is the transcript of what we did at the terminal:
P.S. install this too:
3/19 lecture18.pdf lecture18.pptx 22 link nltk contd. nltk.FreqDist(). Stylometrics. Homework 9 on Mendenhall's Characteristic Curves of Composition.
Slides updated: 1:30pm
3/21 lecture19.pdf lecture19.pptx 15 link Term project proposal. nltk contd. Accessing Gutenberg corpora. Example: "Emma" by Jane Austen. Simple statistics: #letters/word, #words/sentence, #times words used. Surprize vs. surprise. Free indirect style. Brown corpus.
Slides updated: 1:45pm
Terminal: lecture19.txt
3/26 lecture20.pdf lecture20.pptx 26 link Homework 9 Review. Conditional Frequency Distribution. Generating random text using bigrams. Function random.choices().
Slides updated: 1:45pm
3/28 lecture21.pdf lecture21.pptx 16 link nltk book: chapter 2 contd. Stopword list. (First) names corpus: final letters of male vs. female names. Presidential Inaugural Address corpus. Universal Declaration of Human Rights corpus. Brown corpus: days of week and categories news and romance.
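
The idea behind generating random text from bigrams with random.choices() (3/26) can be sketched with the standard library alone. This is an assumption-laden stand-in: a dictionary of Counter objects plays the role of nltk's conditional frequency distribution, and the function names bigram_table and generate are hypothetical, not the ones used in class.

```python
import random
from collections import Counter, defaultdict


def bigram_table(words):
    """Map each word to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for first, second in zip(words, words[1:]):
        table[first][second] += 1
    return table


def generate(table, start, length=10, seed=None):
    """Random walk over the table, choosing each next word by its frequency."""
    rng = random.Random(seed)
    word, output = start, [start]
    for _ in range(length - 1):
        followers = table.get(word)
        if not followers:  # dead end: this word was never followed by anything
            break
        # random.choices() samples with replacement, proportional to the weights.
        word = rng.choices(list(followers), weights=list(followers.values()))[0]
        output.append(word)
    return output


words = "the cat sat on the mat and the cat slept".split()
table = bigram_table(words)
sample = generate(table, "the", length=8, seed=0)
print(" ".join(sample))
```

With a real corpus, the word list would come from a tokenized text instead of the toy sentence used here.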


Date   Lecture Notes (PDF)   Lecture Notes (Powerpoint)   Number of Slides   Panopto   Topic
4/2 lecture22.pdf lecture22.pptx 20 link Importing your own corpus from online sources. Using request.urlopen(). Using .get_text() in BeautifulSoup. Reading local files.
Slides updated: 1:45pm
4/4 lecture23.pdf lecture23.pptx 21 link Searching already tokenized text in nltk: angle bracket notation. Useful Applications of Regular Expressions. Frequency distribution and vowel sequences. Leaving out word-internal vowels and readability of English. Stemming. Stemming for collocations. Word tokenization from scratch: different regexes.
4/9 lecture24.pdf lecture24.pptx 18 link nltk book: chapter 3 contd. Sentence tokenization of raw text. Stream of consciousness and Virginia Woolf.
More Python: pickle, formatted output, textwrap
nltk book: chapter 4: cool examples, e.g. WordNet visualization.
4/11 lecture25.pdf lecture25.pptx 8 link link More on WordNet and nltk.
Terminal session: terminal25.txt
4/16 lecture26.pdf lecture26.pptx 11 link Code case study: using NetworkX and a fancier graph display for WordNet.
Terminal session: terminal26.txt
4/18 lecture27.pdf lecture27.pptx 23 link link Last words on WordNet. Sentiment Analysis via Microsoft Azure and Google Cloud Natural Language.
4/23 lecture28.pdf lecture28.pptx 14 link link Chapter 5: nltk book. POS tagging: universal tagset.
Slides updated: 2:30pm
4/25 lecture29.pdf lecture29.pptx 12 link Chapter 5: nltk book. Searching corpora with words and tags etc. Use of nltk.bigrams.
Slides updated: 1:45pm
4/30 lecture30.pdf lecture30.pptx 10 link Chapter 5: nltk book. Use of nltk.trigrams.
