LING581: Advanced Computational Linguistics

Prerequisites

This is the follow-on course to LING 538: Computational Linguistics. (538 is offered in Fall semesters only.)

Required

Both LING 538 and 581 are required for students enrolled in the HLT Master's Program.

Classroom: Place and Time

One class per week. Spring semester 2016: Wednesdays 9:30am-12:00pm, McClelland Park 102.

Course Objectives and Description

This course continues LING/C SC/PSYC 538 Computational Linguistics. It is designed to give students more in-depth knowledge of techniques, and more hands-on experience with software, than is possible in 538.

Students are expected to become familiar enough with these software packages to install and run them, and to carry out project work with them, on their own machines.

Projects to be tackled in this course are themed around the topic of language understanding:

  1. Treebanks: Penn Treebank, lookup software.
  2. Part-of-speech taggers.
  3. The use and modification of statistical parsers trained on treebanks.
  4. Advanced linguistic theories.
  5. Ontologies and semantic networks: WordNet, etc.
  6. Question-Answering (QA).
  7. More...

Grading

Students will be given a series of tasks to accomplish. Completion of all tasks will result in a satisfactory grade.

Reading and Computational Resources

Required reading will come from the 538 course textbook, Speech and Language Processing (Jurafsky & Martin), as well as from project documentation (manuals) and from papers and/or dissertations to be made available online.

Students are expected to install the required software (all freely available) on their own machines.


Lecture Schedule

January 13th

Initial meeting. Syllabus. Parsing methods contd.: LR parsing.

Lecture Notes: lecture1.pdf / lecture1.pptx (34 slides)
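To make the LR-parsing topic concrete, here is a minimal Python sketch of the LR(0) canonical-collection construction: closure and goto over dotted items, where an item is a (rule index, dot position) pair. The small augmented grammar (S' -> S, S -> S + T | T, T -> id) is a hypothetical example chosen for illustration; this is not a translation of the course's Prolog files.

```python
# Hypothetical toy grammar, augmented with S' -> S (rule 0).
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("S", "+", "T")),
    ("S", ("T",)),
    ("T", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    """LR(0) closure: for each item A -> x . B y, add B -> . z for every B rule."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for rule, dot in list(result):
            rhs = GRAMMAR[rule][1]
            if dot < len(rhs) and rhs[dot] in NONTERMINALS:
                for i, (lhs, _) in enumerate(GRAMMAR):
                    if lhs == rhs[dot] and (i, 0) not in result:
                        result.add((i, 0))
                        changed = True
    return frozenset(result)

def goto(items, symbol):
    """Advance the dot over symbol in every item that allows it, then close."""
    moved = {(rule, dot + 1) for rule, dot in items
             if dot < len(GRAMMAR[rule][1]) and GRAMMAR[rule][1][dot] == symbol}
    return closure(moved) if moved else frozenset()

def canonical_collection():
    """All LR(0) item sets reachable from the closure of S' -> . S."""
    start = closure({(0, 0)})
    states, worklist = [start], [start]
    symbols = {s for _, rhs in GRAMMAR for s in rhs}
    while worklist:
        state = worklist.pop()
        for sym in symbols:
            nxt = goto(state, sym)
            if nxt and nxt not in states:
                states.append(nxt)
                worklist.append(nxt)
    return states

print(len(canonical_collection()))
```

For this grammar the construction yields six item sets; a parse table would then be read off from the goto transitions (shifts and gotos) and the completed items (reductions).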

January 20th

Topics: LR(0) and LR(1) grammars contd. Homework 1. Corpora and N-gram language models. Colorless green ideas and parsing models.

Lecture Notes: lecture2.pdf / lecture2.pptx (51 slides)
Slides updated: 1:18pm Jan 20th

Panopto: link

Files: grammar0.pl / lr0.pl / parse.pl / lr1.pl / parse1.pl
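The N-gram topic can be illustrated with an unsmoothed maximum-likelihood bigram model. The Python sketch below is illustrative only (it is not one of the course files), and the three-sentence toy corpus is made up in the style of the familiar "I am Sam" example. Because unsmoothed MLE gives zero probability to any unseen bigram, a novel but grammatical sentence such as "colorless green ideas sleep furiously" gets probability 0, which is one way into the Colorless green ideas discussion.

```python
from collections import Counter

def train_bigram(sentences):
    """Count bigram contexts and bigrams, padding each sentence with <s> and </s>."""
    contexts, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        contexts.update(tokens[:-1])              # every token that starts a bigram
        bigrams.update(zip(tokens, tokens[1:]))   # adjacent word pairs
    return contexts, bigrams

def sentence_prob(sent, contexts, bigrams):
    """Unsmoothed MLE: product over bigrams of c(w1 w2) / c(w1)."""
    tokens = ["<s>"] + sent.split() + ["</s>"]
    prob = 1.0
    for w1, w2 in zip(tokens, tokens[1:]):
        if contexts[w1] == 0:                     # unseen context: treat as probability 0
            return 0.0
        prob *= bigrams[(w1, w2)] / contexts[w1]
    return prob

# Toy corpus (made up for illustration).
corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
contexts, bigrams = train_bigram(corpus)
print(sentence_prob("I am Sam", contexts, bigrams))   # about 0.111, i.e. 2/3 * 2/3 * 1/2 * 1/2
print(sentence_prob("colorless green ideas sleep furiously", contexts, bigrams))  # 0.0
```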

January 27th

Homework 1 Review. Homework 2: install tregex with the Penn Treebank.

Lecture Notes: lecture3.pdf / lecture3.pptx (27 slides)
Slides updated: 12:15pm 1/27.

Panopto: link

February 3rd

Colorless Green Ideas revisited. Tregex. Homework 3: small clauses and tregex.

Lecture Notes: lecture4.pdf / lecture4.pptx (30 slides)
The_Wonderful_World_of_Tregex.ppt

Panopto: link

February 10th

No lecture.

February 17th

Homework 3 review. Statistical parsing and treebanks. Homework 4: install Bikel-Collins.

Lecture Notes: lecture5.pdf / lecture5.pptx (59 slides)
Dan Bikel's reimplementation of Collins' parser: dbp.zip

Panopto: link

February 24th

MXPOST tagger. Bikel-Collins parser/trainer. EVALB.

Lecture Notes: lecture6.pdf / lecture6.pptx (39 slides)

jmx.tar.zip

Panopto: link
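EVALB scores a parser by comparing labeled brackets in the test and gold trees. The Python sketch below shows the core idea: collect (label, start, end) spans, match them as multisets, and compute precision, recall and F1. It is a simplification, assuming trees as nested tuples, skipping preterminal (POS) nodes, and ignoring the options in EVALB's parameter file (for example punctuation handling), so its numbers will not always agree with EVALB's.

```python
from collections import Counter

def spans(tree, start=0):
    """Return (end position, labeled spans) for a tree whose first word is at start.

    Trees are nested tuples (label, child, ...); words are plain strings.
    Preterminals such as ("NN", "dog") advance the position but are not scored.
    """
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return start + 1, []
    pos, found = start, []
    for child in children:
        pos, sub = spans(child, pos)
        found.extend(sub)
    found.append((label, start, pos))
    return pos, found

def bracket_scores(gold, test):
    """Labeled bracket precision, recall and F1, matching brackets as multisets."""
    gold_b = Counter(spans(gold)[1])
    test_b = Counter(spans(test)[1])
    matched = sum((gold_b & test_b).values())
    precision = matched / sum(test_b.values()) if test_b else 0.0
    recall = matched / sum(gold_b.values()) if gold_b else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1

gold = ("S", ("NP", ("DT", "the"), ("NN", "dog")),
             ("VP", ("VBD", "saw"), ("NP", ("DT", "a"), ("NN", "cat"))))
test = ("S", ("NP", ("DT", "the"), ("NN", "dog")),
             ("VP", ("VBD", "saw"), ("DT", "a"), ("NN", "cat")))   # flat VP: object NP missing
print(bracket_scores(gold, test))   # precision 1.0, recall 0.75, F1 about 0.857
```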

March 2nd

Section 23 of the Penn Treebank WSJ corpus (the standard parser test section) explained. Homework: training data run.

Lecture Notes: lecture7.pdf / lecture7.pptx (34 slides)

Section 23 sentences: tregex_wsj_23.txt

Panopto: link

March 9th

Recap: Bikel-Collins, tregex and EVALB.
WordNet. Perl package WordNet::QueryData.

Lecture Notes: lecture8.pdf / lecture8.pptx (66 slides)
Slides updated: 12pm 3/9

Panopto: link

March 16th

Spring break. No class.

March 23rd

WordNet on OS X: installation issue. Searching WordNet programmatically. WordNet homework exercise.

Lecture Notes: lecture9.pdf / lecture9.pptx (35 slides)

Panopto: link
stubs.c
bfs.perl
bfs2.perl
bfs3.perl
bfs4.perl
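Searching WordNet programmatically often comes down to breadth-first search over synset relations. The bfs*.perl files are Perl and their internals are not reproduced here; the Python sketch below only illustrates generic BFS shortest-path search, assuming a small hand-made toy graph of hypernym links (the node names are illustrative, not real WordNet synset identifiers).

```python
from collections import deque

def bfs_path(graph, start, goal):
    """Breadth-first search; return the shortest path from start to goal, or None."""
    parent = {start: None}              # doubles as the set of visited (marked) nodes
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:     # follow parent links back to start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in graph.get(node, []):
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return None

# Toy hypernym links (hypothetical data), stored in both directions
# so the search can go up to a shared ancestor and back down.
edges = [("dog", "canine"), ("canine", "carnivore"), ("cat", "feline"),
         ("feline", "carnivore"), ("carnivore", "mammal")]
graph = {}
for a, b in edges:
    graph.setdefault(a, []).append(b)
    graph.setdefault(b, []).append(a)

print(bfs_path(graph, "dog", "cat"))   # ['dog', 'canine', 'carnivore', 'feline', 'cat']
```

Because BFS expands nodes in order of distance from the start, the first path that reaches the goal is a shortest one in number of links.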

March 30th

WordNet Homework review. 2nd WordNet homework: GRE word/definition matching. Other WordNet topics.

Lecture Notes: lecture10.pdf / lecture10.pptx (56 slides)
senses.perl

Panopto: link
Slides updated: 12pm 3/30/16

April 6th

WordNet homework hints: WordNet::Similarity online and Perl module versions. FrameNet: lexical units (LUs) and frames. On the generative capacity of natural languages.

Panopto: link

Lecture Notes: lecture11.pdf / lecture11.pptx (42 slides)

April 13th

Solutions to the GRE WordNet homework. Factoid Question-Answering. TREC-9 Database. Homework: use parsing, syntactic transformation, internet search and WordNet.

Panopto: link

Lecture Notes: lecture12.pdf / lecture12.pptx (33 slides)

Update (for easier use, the code now uses m@rk instead of mark, and sub found now takes two arguments):
bfs.perl
bfs2.perl
bfs3.perl
bfs4.perl

April 20th

Word vector models and the GRE homework. GloVe.

Panopto: link

Lecture Notes: lecture13.pdf / lecture13.pptx (48 slides)
Slides updated: 12pm 4/20
cosines.py (Compute vector cosines)
matlab.zip (Matlab supporting functions)
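Word vector models compare words by the cosine of the angle between their vectors, cos(u, v) = (u . v) / (|u| |v|). The Python sketch below is not the course's cosines.py; it assumes tiny made-up 3-dimensional vectors in place of real GloVe embeddings, and shows the nearest-neighbor lookup that a word-matching task could build on.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, vectors):
    """Return the other word whose vector has the highest cosine with word's vector."""
    target = vectors[word]
    return max((w for w in vectors if w != word),
               key=lambda w: cosine(target, vectors[w]))

# Made-up toy vectors (hypothetical; real GloVe vectors have 50-300 dimensions).
vectors = {
    "king":  [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.2, 0.9],
}
print(most_similar("king", vectors))   # queen
```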

April 27th

Lecture Notes: lecture14.pdf / lecture14.pptx (45 slides)

Quick overview of logic. Propositional logic. Phrase structure grammar in Prolog. Solving the left recursion problem by lookahead.
Slides updated: 12:30pm 4/27
Panopto: link
Grammar developed in class: g.pl
Propositional logic evaluator:
plogic.pl / plogic2.pl / plogic3.pl
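The propositional logic evaluators above are in Prolog. As a language-neutral illustration of the same idea, here is a small Python sketch; the formula representation (atoms as strings, nested tuples with the operator names "not", "and", "or", "implies") is our own assumption and is not taken from plogic.pl. A tautology check simply evaluates the formula under every truth assignment, as in a truth table.

```python
from itertools import product

def evaluate(formula, valuation):
    """Truth value of formula under valuation (a dict from atom name to bool)."""
    if isinstance(formula, str):
        return valuation[formula]
    op, *args = formula
    if op == "not":
        return not evaluate(args[0], valuation)
    if op == "and":
        return evaluate(args[0], valuation) and evaluate(args[1], valuation)
    if op == "or":
        return evaluate(args[0], valuation) or evaluate(args[1], valuation)
    if op == "implies":
        return (not evaluate(args[0], valuation)) or evaluate(args[1], valuation)
    raise ValueError(f"unknown operator: {op}")

def atoms(formula):
    """Set of atom names occurring in formula."""
    if isinstance(formula, str):
        return {formula}
    return set().union(*(atoms(arg) for arg in formula[1:]))

def is_tautology(formula):
    """True if formula is true under every assignment of truth values to its atoms."""
    names = sorted(atoms(formula))
    return all(evaluate(formula, dict(zip(names, values)))
               for values in product([True, False], repeat=len(names)))

print(is_tautology(("or", "p", ("not", "p"))))   # True
```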

May 4th

Lecture Notes: lecture15.pdf / lecture15.pptx (49 slides)
Panopto: link

Lambda calculus. Semantic grammars in Prolog. Montague-style quantifiers vs. generalized quantifiers. Upward/downward entailment and WordNet.
Grammar developed previously in class: g.pl
Combined Syntax/Semantics grammar developed in class: g2.pl (Updated 12:30pm 5/4/2016)
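In the generalized-quantifier view, a determiner denotes a relation between a restrictor set A and a scope set B. The Python sketch below writes determiners as curried functions, echoing lambda-calculus denotations such as [[every]] = λA.λB. A ⊆ B, and checks upward/downward monotonicity exhaustively over a small universe. It is a generic textbook-style illustration, assuming finite sets as denotations; it is not the semantics used in g2.pl.

```python
from itertools import combinations

# Determiners as curried functions: restrictor set A, then scope set B.
def every(a):
    return lambda b: a <= b          # every A is B: A is a subset of B

def some(a):
    return lambda b: bool(a & b)     # some A is B: A and B overlap

def no(a):
    return lambda b: not (a & b)     # no A is B: A and B are disjoint

def powerset(universe):
    """All subsets of a finite universe, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def upward_in_scope(q, universe):
    """Check that Q(A)(B) and B <= B2 always imply Q(A)(B2)."""
    subsets = powerset(universe)
    return all(q(a)(b2) for a in subsets for b in subsets for b2 in subsets
               if b <= b2 and q(a)(b))

def downward_in_restrictor(q, universe):
    """Check that Q(A)(B) and A2 <= A always imply Q(A2)(B)."""
    subsets = powerset(universe)
    return all(q(a2)(b) for a in subsets for a2 in subsets for b in subsets
               if a2 <= a and q(a)(b))

universe = {1, 2, 3}
for name, q in [("every", every), ("some", some), ("no", no)]:
    print(name, upward_in_scope(q, universe), downward_in_restrictor(q, universe))
```

One way to connect this to WordNet: if hyponymy is read as set inclusion (dog ⊆ animal), then because every is downward monotone in its restrictor, "every animal breathes" entails "every dog breathes", whereas "some animal barks" does not entail "some dog barks".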