-
Notifications
You must be signed in to change notification settings - Fork 4
NltkRoadmap
This page describes the plans and actions required for getting DELPH-IN data and processing into the NLTK. For some context, please see this discussion from the 2016 Stanford Summit.
Contents
The NLTK (Natural-Language ToolKit) is a large and widely used (particularly in education) Python package supporting a number of NLP tasks, but currently it only has limited support for semantic representations, and nothing for representing/accessing DELPH-IN data (aside from a REPP wrapper). This is a good opportunity for us to expand our presence.
There are three kinds of additions we can provide to the NLTK:
-
Data representations (e.g. modules for representing MRS, Derivation trees, etc.)
-
Data (e.g. make Redwoods available through nltk.download() and provide necessary CorpusReaders)
-
Processors (e.g. ACE or RESTful server interfaces)
Specifically, the following:
- Data representations
- MRS
- DMRS
- EDS
- DM (bilexical dependencies)
- Derivation (and labeled) trees
- Data
-
Package Redwoods 9th growth or later
-
Provide CorpusReader for [incr tsdb()] profiles
-
- Processors
- ACE interface
- RESTful client
We should see if NLTK's DependencyGraph or FeatureStructure classes can be used for the data representations.
There are some non-programming tasks that need to be done, as well.
- Contact the NLTK maintainers (Ewan Klein, Liling Tan, or the
nltk-devs mailinglist)
- if our plans are appropriate for the NLTK (or some subset of them)
- how to proceed with implementations
- Provide unit tests
- Write or collaborate on writing new book sections for the functionality
Home | Forum | Discussions | Events