LING 581: Advanced Computational Linguistics

Prerequisites

This is the follow-on to the introductory course LING 538: Computational Linguistics.
(Note: 538 is offered in Fall semesters only.)

Required

Both LING 538 and 581 are required for students enrolled in the HLT Master's Program.

Classroom: Place and Time

Spring semester 2019: Tuesdays and Thursdays 3:30-4:45pm. McClelland Park, Room 102.

Course Objectives and Description

This course continues LING/C SC/PSYC 538 Computational Linguistics. It is designed to give students more in-depth knowledge of, and hands-on experience with, techniques and software than is possible in 538.

Students are expected to become familiar enough with these packages to install and run them, and to carry out project work with them, on their own machines.

Projects to be tackled in this course are themed around the topic of language understanding:

  1. Treebanks (phrase-structure/dependency-based): e.g. Penn Treebank, lookup software.
  2. Part-of-speech taggers.
  3. The use and modification of statistical parsers trained on Treebanks.
  4. Advanced linguistic theories.
  5. Ontologies and Semantic Networks: WordNet etc.
  6. Question-Answering (QA).
  7. more...

Grading

Students will be given a series of tasks to accomplish. Completion of all tasks will result in a satisfactory grade.

Reading and Computational Resources

Required reading will come from the 538 course textbook, Speech and Language Processing (Jurafsky & Martin), and from project documentation (manuals), papers, and/or dissertations to be made available online.

Students are expected to install the required software (all freely available) on their own machines.


Lecture Schedule

January

Date   Lecture Notes (PDF, PowerPoint)   Number of Slides   Panopto   Topic
1/10 lecture1.pdf lecture1.pptx 26 link Syllabus, Homework 1: install Python 3 and nltk (if not already present).
1/15 lecture2.pdf lecture2.pptx 22 link Homework 2 on nltk. Loading your own corpus. Example: Mrs. Dalloway by Virginia Woolf.
1/17 lecture3.pdf lecture3.pptx 13 link Named Entity Recognition and Google Cloud Natural Language. Sentiment/magnitude scores. Dependency parses. Quick Homework 3.
1/22 lecture4.pdf lecture4.pptx 26 link Homework 3 remark. Homework 2 review. Homework 4: Install full Penn Treebank into nltk and test.
Slides corrected and modified: 10pm
1/24 lecture5.pdf lecture5.pptx 8 link link Homework 3 Review. Homework 5.
1/29 lecture6.pdf lecture6.pptx link None.
1/31 lecture7.pdf lecture7.pptx 35 link Replaces lecture 6 as well. Homework 6: Install Princeton WordNet (database and wnb browser). Install WordNet::QueryData. Hypernyms, hyponyms, meronyms, etc.

February

Date   Lecture Notes (PDF, PowerPoint)   Number of Slides   Panopto   Topic
2/5 link
2/7 link
2/12 lecture10.pdf lecture10.pptx 5 link Review of Lecture 7 materials.
wnquerydata.perl
bfs.perl bfs4.perl
2/14 scheffler.pdf scheffler.pptx 42 link Guest lecture: Hidden Markov Models, Tatjana Scheffler.
2/19 lecture12.pdf lecture12.pptx 33 link WordNet verbs and adjectives. FrameNet. bfs.perl searching.
2/21 zampieri.pdf zampieri.pptx 46 link Guest lecture: Text categorization, Marcos Zampieri.
annotation.pdf
2/26 lecture14.pdf lecture14.pptx 28 link More on WordNet. Homework 6. Word2vec examples.
code2.py
2/28 picoral.pdf 4 link link Guest lecture: Word embeddings, Adriana Picoral.
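
The bfs.perl scripts listed above are not reproduced on this page. As a rough illustration of the kind of breadth-first search over WordNet-style relations covered in the 2/12 and 2/19 lectures, here is a minimal Python sketch. It makes several assumptions: the graph is a tiny hand-made hypernym fragment rather than real WordNet data, hypernym links are treated as undirected edges, and the names TOY_HYPERNYMS, neighbors and shortest_path are hypothetical.

```python
from collections import deque

# Hypothetical toy fragment of a hypernym hierarchy (NOT real WordNet data).
# Each key maps a word to its direct hypernyms.
TOY_HYPERNYMS = {
    "dog": ["canine"],
    "cat": ["feline"],
    "canine": ["carnivore"],
    "feline": ["carnivore"],
    "carnivore": ["mammal"],
    "mammal": ["animal"],
}


def neighbors(graph, node):
    """Return hypernyms and hyponyms of node, treating links as undirected."""
    out = list(graph.get(node, []))
    out += [child for child, parents in graph.items() if node in parents]
    return out


def shortest_path(graph, start, goal):
    """Breadth-first search: return a shortest path as a list, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbors(graph, path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(shortest_path(TOY_HYPERNYMS, "dog", "cat"))
```

Because the search expands paths in order of length, the first path that reaches the goal is a shortest one. With real WordNet data the graph would be far larger, so the seen set matters for avoiding repeated work.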

March

Date   Lecture Notes (PDF, PowerPoint)   Number of Slides   Panopto   Topic
3/5 Spring recess: no class.
3/7 Spring recess: no class.
3/12 lecture16.pdf lecture16.pptx 17 link Cosine similarity. GloVe: Global Vectors for Word Representation. Homework 7.
Papers to read: firth.pdf, langendoen1964.pdf
Code: cosines.py to use with the GloVe data.
Updated (4:45pm) vectors.txt
3/14 lecture17.pdf lecture17.pptx link Guest lecture: Odin, Gus Hahn-Powell.
odin-slides.pdf
3/19 lecture18.pdf lecture18.pptx 45 link Some possible test cases for WordNet and Distributional Semantics: Similarity (gloss vs. word), Semantic Opposition, Semantic Bleaching and Logical Metonymy.
3/26 lecture19.pdf lecture19.pptx 20 link Langendoen on Firth. Stanford tregex and the Penn Treebank.
3/28 lecture20.pdf lecture20.pptx 20 link More on tregex operators and syntax. Homework 8.
Command line options for tregex: here
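
The cosines.py file listed for 3/12 is not shown on this page. The following is only a minimal sketch of cosine similarity between word vectors, under stated assumptions: the vectors are made-up three-dimensional toys rather than real GloVe values, and the helper names parse_vector_line and cosine_similarity are hypothetical. The parsing helper assumes the plain-text GloVe layout of one word per line followed by space-separated numbers.

```python
import math


def parse_vector_line(line):
    """Parse one 'word v1 v2 ...' line into (word, list of floats)."""
    parts = line.split()
    return parts[0], [float(x) for x in parts[1:]]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Made-up toy lines in the assumed text layout (NOT real GloVe values).
toy_lines = [
    "king 1.0 2.0 3.0",
    "queen 2.0 4.0 6.0",
    "apple 3.0 0.0 -1.0",
]
vectors = dict(parse_vector_line(line) for line in toy_lines)

print(cosine_similarity(vectors["king"], vectors["queen"]))
print(cosine_similarity(vectors["king"], vectors["apple"]))
```

Cosine similarity depends only on direction, not length, which is why the two parallel toy vectors score as maximally similar even though one is twice as long as the other.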

April


Slides updated: see slide 4 etc. 6pm
Date   Lecture Notes (PDF, PowerPoint)   Number of Slides   Panopto   Topic
4/2 lecture21.pdf lecture21.pptx 27 link On free relatives in the PTB WSJ using tregex. Compare analyses with Google Cloud Natural Language.
4/4 lecture22.pdf lecture22.pptx 46 link Google Linguist talk. Homework 8 Review. The PTB in detail. Bikel's Parser. Homework: install it and verify.
Bikel Parser: dbp.zip
Compressed obj file: wsj-02-21.obj.gz (keep it compressed!)
4/9 lecture23.pdf lecture23.pptx 16 link link Parsing and training with Bikel Collins on the PTB WSJ.
4/11 lecture24.pdf lecture24.pptx 43 link A note on tagging: jmx, Flair. EvalB.
jmx.tar.zip
4/16 lecture25.pdf lecture25.pptx 7 link Abbreviated class slides. Homework 9.
PTB WSJ section 23 input: wsj_23.txt.zip
4/18 lecture26.pdf lecture26.pptx 27 link link Homework 9 Part 3. (Submit together with parts 1 and 2.)
Sensitivity to perturbation of training data.
Code: evalb.c
4/23 link No lecture today.
4/25 hulden.pdf 59 link Teaching demo: Mans Hulden.
4/30 lecture28.pdf lecture28.pptx 5 link GPT-2 in the news.
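
EvalB itself is the C program linked above (evalb.c). As a rough illustration of the labeled bracketing precision, recall and F1 measures commonly used to compare parser output with Treebank gold parses, here is a simplified Python sketch. It assumes each parse has already been reduced to a list of (label, start, end) constituent spans, and it does not model any of the normalization that EvalB applies through its parameter file. The function name bracket_scores and the example spans are hypothetical.

```python
from collections import Counter


def bracket_scores(gold, test):
    """Return (precision, recall, f1) over labeled (label, start, end) spans."""
    gold_counts = Counter(gold)
    test_counts = Counter(test)
    # Multiset intersection: a repeated bracket is matched at most as many
    # times as it occurs in both parses.
    matched = sum((gold_counts & test_counts).values())
    precision = matched / len(test) if test else 0.0
    recall = matched / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up spans for a five-word sentence (not taken from the PTB).
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
test = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]

print(bracket_scores(gold, test))
```

Here three of the four brackets agree, so precision and recall coincide. They differ whenever the two parses contain different numbers of brackets.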

