I want to analyze the structure of fairly arbitrary English sentences with NLTK. There seem to be lots of classes for doing this (e.g. PCFGs, ProbabilisticProjectiveDependencyParser), but they all seem to require data to train on. Does NLTK come with data that can be used to train such parsers for arbitrary English? (That is, I don't need exotic words, but basic English sentences should work.)

The demo for the ProbabilisticProjectiveDependencyParser, for instance, seems to use a data set for Dutch. Further, this data set seems incomplete: it doesn't seem to be able to parse sentences containing 'Ik' ('I' in Dutch, according to Google Translate).
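
To make the goal concrete, here is a minimal sketch of the kind of parse I am after. It assumes a tiny hand-written PCFG: the grammar, the example sentences, and the helper name best_parse are all made up for illustration and are not taken from any NLTK data set. It also shows the coverage problem described above, where a word missing from the grammar means no parse at all.

```python
import nltk

# Toy grammar, invented for illustration only; a grammar for real English
# would have to be induced from training data such as a treebank.
toy_grammar = nltk.PCFG.fromstring("""
S -> NP VP [1.0]
NP -> Det N [0.6] | 'I' [0.4]
VP -> V NP [1.0]
Det -> 'the' [1.0]
N -> 'dog' [0.5] | 'cat' [0.5]
V -> 'saw' [1.0]
""")

parser = nltk.ViterbiParser(toy_grammar)


def best_parse(sentence):
    """Return the most probable tree, or None if the sentence cannot be parsed."""
    tokens = sentence.split()
    try:
        # parse() yields trees lazily; take the first one if there is any.
        return next(parser.parse(tokens), None)
    except ValueError:
        # Raised when a token is not covered by the grammar's lexicon.
        return None


tree = best_parse("I saw the dog")
print(tree)

# 'Ik' is not in the toy lexicon, so this yields no parse.
print(best_parse("Ik saw the dog"))
```

What I am missing is a way to get a grammar like this with broad English coverage without writing it by hand.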