Lab02: NLTK

It is easy to get our hands on millions of words of text. What can we do with it, assuming we can write some simple programs? In this lab we’ll address the following questions:

  • What can we achieve by combining simple programming techniques with large quantities of text?

  • How can we automatically extract key words and phrases that sum up the style and content of a text?

  • What tools and techniques does the Python programming language provide for such work?

  • What are some of the interesting challenges of natural language processing?
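
To give a flavour of the first two questions, here is a minimal sketch that counts word frequencies using only the Python standard library. The sample text, the function names `tokenize` and `most_frequent_words`, and the `min_length` cut-off are all assumptions made for illustration; they are not part of NLTK, and raw frequency is only a rough stand-in for "key words".

```python
import re
from collections import Counter

# Hypothetical sample text; a real corpus would be far larger.
SAMPLE_TEXT = (
    "The whale swam past the ship. The crew watched the whale "
    "until the whale dived beneath the ship."
)

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def most_frequent_words(text, n=2, min_length=4):
    """Return the n most frequent words with at least min_length letters.

    Dropping short words is a crude (assumed) way to skip function
    words such as "the"; it is not a substitute for a stopword list.
    """
    tokens = [word for word in tokenize(text) if len(word) >= min_length]
    return Counter(tokens).most_common(n)

top_words = most_frequent_words(SAMPLE_TEXT)
print(top_words)  # [('whale', 3), ('ship', 2)]
```

Even this crude count picks out the words that say what the passage is about, which is the kind of result the "computing with language" sections build on.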

This lab is divided into sections that alternate between two quite different styles.

  • In the “computing with language” sections we will take on some linguistically motivated programming tasks without necessarily explaining how they work.

  • In the “closer look at Python” sections we will systematically review key programming concepts.

You can skip Sections 2 and 4 if you are confident with Python, but do attempt the exercises, and refer back to those sections if you find some of the questions challenging.

We’ll flag the two styles in the section titles, but later labs will mix both styles without being so up-front about it. We hope this style of introduction gives you an authentic taste of what will come later, while covering a range of elementary concepts in linguistics and computer science.

Online reference material for NLTK is at http://nltk.org/.

This lab may raise more questions than it answers; we will try to address some of them in the rest of this unit.

Original content credit: Chapter 1 of the online NLTK Book.