To my linguistics homepage

LING 388
Computers and Language
Spring 2019

This is an introductory course in computational linguistics for undergraduates. There are no prerequisites and no textbook. Students will learn to program in Python (3.x) and to use computational tools such as NLTK for language analysis. The course combines classroom lectures with computer laboratory exercises.

Software

We will use Python (freely available) in the computer laboratory classes.

Instructor: Sandiway Fong sandiway@email.arizona.edu
Office: 311 Douglass

Administrivia

Location: McClelland Park, Rm 102 (uaccess is not right: ECE 102)
Time: Tuesday-Thursday 12:30-1:45 pm

Syllabus

See lecture 1 slides

Lecture Notes

Available in both Adobe PDF and Microsoft PowerPoint formats.

January

Date  Lecture Notes (PDF)  Lecture Notes (PowerPoint)  Number of Slides  Panopto  Topic
1/10 lecture1.pdf lecture1.pptx 21 link Syllabus, Homework 1: install Python 3 on your computer.
pythonbook.pdf
1/15 lecture2.pdf lecture2.pptx 47 link link Numbers in Python. Berkeley Parser. Illinois Named Entity Recognizer. Google N-grams. Quick Homework 2.
1/17 lecture3.pdf lecture3.pptx 26 link link Homework 2 review. Complex math: cmath. Homework 3. The computer representation of numbers and characters. 2's complement arithmetic. Unicode UTF-8.
1/22 lecture4.pdf lecture4.pptx 22 link Homework 3 review. The computer representation of numbers and characters. 2's complement arithmetic.
1/24 lecture5.pdf lecture5.pptx 20 link Introduction to Python: floating point numbers, character sets. Homework 4
hw4.xlsx
1/29 sample.pdf sample.pptx link Introduction to Python: strings, lists as queues and stacks. Tuples. Dictionaries. for-loop. range(). Introduction to Python, led by Colton Michael Flowers.
1/31 link Introduction to Python, led by Colton Michael Flowers.

February

Date  Lecture Notes (PDF)  Lecture Notes (PowerPoint)  Number of Slides  Panopto  Topic
2/5 link Introduction to Python, led by Colton Michael Flowers.
2/7 link Introduction to Python, led by Colton Michael Flowers.
2/12 lecture10.pdf lecture10.pptx 18 None Homework 4 review. Text Summarization. Quick Homework 5. List comprehensions.
2/14 lecture11.pdf lecture11.pptx 12 link List comprehensions. Counter objects. File input. Homework 5 revisited.
2/19 lecture12.pdf lecture12.pptx 17 link File input, input(), eval(). sys.argv. Formatted output revisited.
File: falconheavylaunch.txt
Slides updated: working directory info from os
2/21 lecture13.pdf lecture13.pptx 12 link A note on working directories. Regex in Python.
2/26 lecture14.pdf lecture14.pptx 13 link Regex in Python contd.: match object methods .group(), .start(), .end(), .span(). finditer() and looping. Homework 7.
hw7.txt
Update: terminal session from lecture 14: lecture14.txt
2/28 lecture15.pdf lecture15.pptx 10 link re.sub(), 7 Python regex exercises with the Brown corpus wordlist.
Brown corpus wordlist for class practice: wordlist.py.zip
(note: unzip the file manually if your browser doesn't do it for you automatically.)

March

Date  Lecture Notes (PDF)  Lecture Notes (PowerPoint)  Number of Slides  Panopto  Topic
3/5 Spring recess: no class.
3/7 Spring recess: no class.
3/12 lecture16.pdf lecture16.pptx 20 link Homework 8: install nltk and nltk_data on your computer.
3/14 lecture17.pdf lecture17.pptx 12 link nltk book: preface + chapter 1 section 1: Computing with Language: Texts and Words. .concordance(), .similar(), .common_contexts(). Lexical diversity.
Panopto crashed several times. Here is the transcript of what we did at the terminal:
lecture17.txt
P.S. install this too:
3/19 lecture18.pdf lecture18.pptx 22 link nltk contd. nltk.FreqDist(). Stylometrics. Homework 9 on Mendenhall's Characteristic Curves of Composition.
Mendenhall1887.pdf
Slides updated: 1:30pm
3/21 lecture19.pdf lecture19.pptx 15 link Term project proposal. nltk contd. Accessing Gutenberg corpora. Example: "Emma" by Jane Austen. Simple statistics: #letters/word, #words/sentence, #times words used. Surprize vs. surprise. Free indirect style. Brown corpus.
Slides updated: 1:45pm
Terminal: lecture19.txt
3/26 lecture20.pdf lecture20.pptx link nltk book: chapter 1 section 4 and chapter 2 section 1. Accessing Gutenberg corpora. Example: "Emma" by Jane Austen. Simple statistics: #letters/word, #words/sentence, #times words used. Surprize vs. surprise. Free indirect style.
3/28 lecture21.pdf lecture21.pptx link nltk book: chapter 2 contd. Other corpora.

April

Date  Lecture Notes (PDF)  Lecture Notes (PowerPoint)  Number of Slides  Panopto  Topic
4/2 lecture22.pdf lecture22.pptx link Term project proposals.
4/4 lecture23.pdf lecture23.pptx link nltk book: chapter 2 contd. Importing your own corpus from online sources. Using request.urlopen(). Using .get_text() in BeautifulSoup. Reading local files. Searching already tokenized text in nltk: angle bracket notation.
4/9 lecture24.pdf lecture24.pptx link nltk book: chapter 3 contd. Useful Applications of Regular Expressions. Frequency distribution and vowel sequences. Leaving out word-internal vowels and readability of English. Stemming. Stemming for collocations. Word tokenization from scratch.
4/11 lecture25.pdf lecture25.pptx link nltk book: chapter 3 contd. Sentence tokenization of raw text. Stream of consciousness and Virginia Woolf.
More Python: pickle, formatted output, textwrap
nltk book: chapter 4: cool examples
4/16 lecture26.pdf lecture26.pptx link
4/18 lecture27.pdf lecture27.pptx link
4/23 lecture28.pdf lecture28.pptx link
4/25 lecture29.pdf lecture29.pptx link
4/30 lecture30.pdf lecture30.pptx link

