R World News - Episode 1
The forecast package (https://cran.rstudio.com/web/packages/forecast/index.html) by Rob Hyndman is now at version 7.1 and includes bug fixes along with some improvements, such as support for multivariate linear models. One of the more notable additions is built-in support for plotting forecast objects using ggplot2. There are 11 autoplot() S3 methods for various forecast objects, 8 “gg” methods for directly starting ggplot2 plot constructs, 2 fortify() methods for transforming forecast objects into data.frames and a new geom_forecast() which lets you easily combine forecast objects with other geoms or annotations.
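As a rough illustration of the ggplot2 integration described above, here is a minimal sketch. It assumes the forecast and ggplot2 packages are installed and uses the built-in USAccDeaths series with an ets() model; both are illustrative choices and not something the release notes prescribe.

```r
library(forecast)
library(ggplot2)

# Fit a simple exponential smoothing model to a built-in monthly series
fit <- ets(USAccDeaths)

# Forecast 12 months ahead
fc <- forecast(fit, h = 12)

# autoplot() dispatches on the forecast object and returns a ggplot
p1 <- autoplot(fc) + ggtitle("USAccDeaths: 12-month forecast")

# geom_forecast() adds a forecast layer to a plot of the raw series,
# so it can be combined with other geoms or annotations
p2 <- autoplot(USAccDeaths) + geom_forecast(h = 12)
```

Printing p1 or p2 draws the plot. Because both are ordinary ggplot objects, themes, scales and annotations can be added with the usual + syntax.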
The ggnetwork package (https://briatte.github.io/ggnetwork/) by Francois Briatte has made the jump from devtools-only installation to CRAN and provides support for the graphical display of virtually anything you can build with the network or igraph packages. This is the second package featured in today’s episode that takes advantage of the newly enhanced object model of ggplot2, which makes it straightforward to add scales, Geoms, Stats and even Coords (coordinate systems). The package provides a number of Geoms and themes, including the core ones: geom_edges() and geom_nodes().
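To make the two core Geoms concrete, here is a minimal sketch. It assumes the ggnetwork, network and ggplot2 packages are installed; the four-node adjacency matrix is made up purely for illustration.

```r
library(ggplot2)
library(network)
library(ggnetwork)

# A small undirected graph described by a symmetric adjacency matrix
adj <- matrix(c(0, 1, 1, 0,
                1, 0, 1, 0,
                1, 1, 0, 1,
                0, 0, 1, 0),
              nrow = 4, byrow = TRUE)
net <- network(adj, directed = FALSE)

# ggnetwork() lays out the graph and returns a data.frame with
# x/y (node position) and xend/yend (edge end point) columns
df <- ggnetwork(net)

p <- ggplot(df, aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_edges(color = "grey50") +  # draw the edges first
  geom_nodes(size = 4) +          # then the nodes on top
  theme_blank()                   # theme shipped with ggnetwork
```

Node positions come from a randomized layout algorithm, so the picture may differ slightly from run to run unless a seed is set.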
rprojroot (http://krlmlr.github.io/rprojroot/) is a new utility package by Kirill Müller designed to ease the pain of referencing scripts or files in project subdirectories. Whether you’re building a package, working in an RStudio project or just in a git-managed directory, rprojroot offers a simple interface for finding the project root directory and building subdirectory and file references from that point. As the package author says, this solves a seemingly trivial but annoying problem that most of us encounter at one time or another.
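The idea can be sketched as follows, assuming rprojroot is installed. The throwaway directory layout and the .projectroot marker file are hypothetical and exist only to keep the example self-contained; in a real project you would more likely use one of the package's predefined criteria such as is_rstudio_project, is_r_package or is_git_root.

```r
library(rprojroot)

# Build a throwaway project: <tmp>/demo-project/{.projectroot, R/, data/}
proj <- file.path(tempdir(), "demo-project")
dir.create(file.path(proj, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(proj, "data"), showWarnings = FALSE)
file.create(file.path(proj, ".projectroot"))

# Criterion: the root is the nearest parent directory containing this file
crit <- has_file(".projectroot")

# Start the search from a subdirectory, as a script living in R/ would
start <- file.path(proj, "R")
root <- find_root(crit, path = start)

# Build a file reference relative to the root, independent of where
# the search started
csv_path <- find_root_file("data", "input.csv", criterion = crit, path = start)
```

The same csv_path results whether the search starts in R/, data/ or the project root itself, which is the point of the package.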
The next two packages work great together when you want to process a corpus or three in R. tokenizers (https://cran.rstudio.com/web/packages/tokenizers/index.html), by Lincoln Mullen & Dmitriy Selivanov, provides a consistent interface for breaking up a corpus into components such as n-grams, words, word stems, lines, sentences, paragraphs and more. It uses the robust stringi package for much of its core functionality and it returns plain R vectors rather than custom objects, making the transformed texts easy to use and manipulate.
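A short sketch of that consistent interface, assuming the tokenizers package is installed. The sample sentence is made up; each tokenizer returns a list with one character vector per input document, so [[1]] extracts the tokens for the single document used here.

```r
library(tokenizers)

text <- "The quick brown fox jumps. The lazy dog sleeps."

# Words: lowercased and stripped of punctuation by default
words <- tokenize_words(text)[[1]]

# Bigrams: every run of two consecutive words
bigrams <- tokenize_ngrams(text, n = 2)[[1]]

# Sentences: split on sentence boundaries, original case preserved
sentences <- tokenize_sentences(text)[[1]]
```

Because the results are plain character vectors, they can be passed straight to table(), unique() or any other base R function.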
The tidytext package (https://cran.rstudio.com/web/packages/tidytext/index.html) by Julia Silge, David Robinson & Gabriela De Queiroz uses tokenizers (and a few other packages) to transform a corpus into tidy data.frames that enable the use of dplyr and dplyr-like idioms in further processing. tidytext also provides tools for sentiment analysis and for converting objects to and from term/document matrix objects.
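Here is a minimal sketch of the tidy workflow, assuming tidytext and dplyr are installed. The two-document corpus is invented for illustration.

```r
library(dplyr)
library(tidytext)

# A tiny corpus: one row per document
docs <- data.frame(
  id   = 1:2,
  text = c("R makes text mining tidy.", "Tidy text is easy to count."),
  stringsAsFactors = FALSE
)

# unnest_tokens() yields one row per word (lowercased, punctuation removed);
# `word` is the new output column, `text` is the input column
words <- unnest_tokens(docs, word, text)

# From here on, ordinary dplyr verbs apply
counts <- count(words, word, sort = TRUE)
```

The counts table can then be filtered, joined against a stop-word or sentiment lexicon, or plotted like any other data.frame.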
Finally, Kurt Hornik & Florian Schwendinger beat hrbrmstr to the next package: pandocfilters (https://cran.rstudio.com/web/packages/pandocfilters/README.html). This works with something called the abstract syntax tree (AST) generated each time pandoc is called to transform one document format to another. The AST is a JSON file with a node for each token. You can write transformation functions for one or more node types in plain R code and then have pandoc process the resulting, modified AST into the desired format. One of the basic examples shown by the authors is a filter that transforms all text nodes to lower-case, but you can do anything to any node type and even create ASTs from scratch, all with R code. We suspect this will be something that can be easily used with knitr/rmarkdown in the not-too-distant future.
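To show the idea behind a lower-casing filter without depending on pandoc being installed, here is a base R sketch. It does not use the pandocfilters API: walk_ast() and lower_filter() are hypothetical helpers written for this illustration, and the hand-built list only approximates the shape of a parsed pandoc AST node, where "t" holds the node type and "c" its content.

```r
# Hypothetical helper: visit every node in a nested list and let `fun`
# replace it. `fun` returns a new node, or NULL to leave the node alone.
walk_ast <- function(x, fun) {
  if (!is.list(x)) return(x)
  if (!is.null(names(x)) && "t" %in% names(x)) {
    replaced <- fun(x[["t"]], x[["c"]])
    if (!is.null(replaced)) return(replaced)
  }
  lapply(x, walk_ast, fun = fun)
}

# Hypothetical filter: lower-case the content of every "Str" (text) node
lower_filter <- function(type, content) {
  if (type == "Str") list(t = "Str", c = tolower(content)) else NULL
}

# A hand-built paragraph node standing in for parsed pandoc JSON
para <- list(t = "Para", c = list(
  list(t = "Str", c = "Hello"),
  list(t = "Space"),
  list(t = "Str", c = "World")
))

out <- walk_ast(para, lower_filter)
```

In a real filter, the AST would arrive as JSON from pandoc and be written back as JSON; the package documentation describes the functions it provides for that round trip.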
Plus a featurette on the feather (https://blog.rstudio.org/2016/03/29/feather/) package.