tokenizers - Fast, Consistent Tokenization of Natural Language Text
Convert natural language text into tokens. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, shingled characters, lines, Penn Treebank, regular expressions, as well as functions for counting characters, words, and sentences, and a function for splitting longer texts into separate documents, each with the same number of words. The tokenizers have a consistent interface, and the package is built on the 'stringi' and 'Rcpp' packages for fast yet correct tokenization in 'UTF-8'.
Last updated 1 year ago
nlp, peer-reviewed, text-mining, tokenizer, cpp
13.33 score 186 stars 81 dependents 1.1k scripts 36k downloads
linevis - Interactive Time Series Visualizations
Create interactive time series visualizations. 'linevis' includes an extensive API to manipulate time series after creation, and supports getting data out of the visualization. Based on the 'timevis' package and the 'vis.js' Timeline 'JavaScript' library <https://visjs.github.io/vis-timeline/docs/graph2d/>.
Last updated 1 month ago
4.40 score 167 downloads