Scripts to extract clustering datasets from Open Directory Project (ODP), similar to ODP 239
Updated Oct 9, 2010 - Python
Natural language processing (NLP) is a field of computer science that studies how computers process and interact with human language. In 1950, Alan Turing published "Computing Machinery and Intelligence", an article that proposed a measure of machine intelligence now called the Turing test. More recent techniques, such as deep learning, have produced strong results in language modeling, parsing, and other natural-language tasks.
ODP TR-30: Turkish dataset for search result clustering (SRC) studies, extracted from Open Directory Project (ODP), as in ODP-239.
k-Met is a phonetic clustering algorithm for grouping words by their approximate pronunciation. It uses fuzzy matching techniques and the double metaphone indexing algorithm.
NaiveSumm is a naive summarization approach based on Luhn's 1958 paper "The Automatic Creation of Literature Abstracts". It uses word frequencies in the document to score sentences and extract those that contain the most frequent words.
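The frequency-based idea described for NaiveSumm can be sketched in a few lines of Python. This is not NaiveSumm's actual code: the function names, the tiny stop-word list, and the regex-based sentence splitter are all assumptions made for illustration, and the score is a plain sum of word frequencies rather than Luhn's cluster-based significance factor.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "on", "the", "to"}


def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace (a rough heuristic)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Return lowercase word tokens with stop words removed."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))
    # Score each sentence by the summed document frequency of its words.
    scored = [(sum(freq[w] for w in tokenize(s)), i) for i, s in enumerate(sentences)]
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:max_sentences]
    return [sentences[i] for _, i in sorted(top, key=lambda pair: pair[1])]


if __name__ == "__main__":
    sample = "Cats chase mice. Dogs chase cats and cats run. Birds sing."
    print(summarize(sample))
```

Summing raw frequencies favors long sentences, so a practical variant might normalize each score by sentence length.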
Berlinifyer is a CGI script written in Perl that translates HTML documents to the dialect spoken in Berlin, Germany. Migrated from SourceForge.
Example code on NSLinguisticTagger
Turkish Natural Language Toolkit
"POSTagger" package helps predict fine-grained and/or coarse-grained POS tags for a dependency treebank. This package has been developed by Mojtaba Khallash from Iran University of Science and Technology (IUST).
"WordClustering" package is used for adding unsupervised features to a dependency parser. This package has been developed by Mojtaba Khallash from Iran University of Science and Technology (IUST).
"VerbSpectralCluster" package is used for adding verb spectral cluster IDs to dependency parsing. This package has been developed by Mojtaba Khallash from Iran University of Science and Technology (IUST).
"TreebankTransform" package is a helper tool for transforming an input CoNLL file into the desired formats. This package has been developed by Mojtaba Khallash from Iran University of Science and Technology (IUST).
"SemanticTagger" package is used for adding semantic features to dependency parsing. This package has been developed by Mojtaba Khallash from Iran University of Science and Technology (IUST).
A client-server system to parse and answer questions in German using Prolog and Apache Lucene. Comes with a web client and goodies.
Metaphone is a phonetic algorithm, published in 1990, for indexing words by their English pronunciation. It fundamentally improves on the Soundex algorithm by using information about variations and inconsistencies in English spelling and pronunciation to produce a more accurate encoding, which does a better job of matching words and …
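For context, here is a minimal sketch of Soundex, the baseline that Metaphone is described as improving on. It follows the common American Soundex rules and assumes plain ASCII English words; the `soundex` function and `CODES` table are illustrative names, not taken from the repository listed here.

```python
# Map each coded consonant to its Soundex digit (1 through 6).
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word):
    """Return the 4-character Soundex code of a word, or '' if it has no letters."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # 'h' and 'w' do not separate letters that share a code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is emitted again
    return (result + "000")[:4]  # pad with zeros, then truncate to 4 characters


if __name__ == "__main__":
    for name in ["Robert", "Rupert", "Knight", "Night"]:
        print(name, soundex(name))
```

Because Soundex always keeps the first letter, "Knight" and "Night" receive different codes even though they sound alike. Spelling-aware rules such as Metaphone's handling of a silent initial "K" target exactly this kind of case.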
An English Grammar Dictionary and a Natural Language Parser --- in Common Lisp and Prolog
SMS processing and key data extraction
Vietnamese tokenizer (Maximum Matching and CRF)
Automated timeline for news events built at Insight Data Science
C++ Implementation for Affinity Propagation
Created by Alan Turing