Business Science EARL SF 2017 Presentation: tidyquant, timekit, and more!

Written by Matt Dancho on June 18, 2017

The EARL SF 2017 conference was just held June 5-7 in San Francisco, CA. There were some amazing presentations illustrating how R is truly being embraced in enterprises. We gave a three-part presentation on tidyquant for financial data science at scale, timekit for time series machine learning, and Business Science enterprise applications. We’ve uploaded the EARL presentation to YouTube. Check out the presentation, then read our announcements and follow us on social media to stay up on the latest Business Science news, events, and information!

EARL 2017 Presentation

If you’re interested in financial analysis, forecasting, and business applications, check out our 30-minute presentation from EARL SF 2017! The presentation is three-in-one:

  1. Financial data science at scale with tidyquant (0:45)
  2. Time series machine learning with timekit (9:10)
  3. Enterprise applications with Business Science (23:00)

Forecasting daily CRAN downloads

One of the big areas of interest on Twitter leading up to the presentation was a tweet from Hadley showing that daily CRAN downloads have grown to 1.25M per day, along with our response showing that it’s quite possible to exceed 2M downloads per day by the end of the year!

How we made the CRAN daily download forecast graph

Several people in the #rstats community wanted to know how this forecast was made.

It turns out that it’s actually a combination (or ensemble) of four separate predictions:

  1. prophet with linear growth
  2. prophet with logistic growth
  3. timekit using a linear regression on the time series signature
  4. timekit using a spline first to track the trend, then a linear regression on the augmented data frame that includes the time series signature and the spline

We first applied a log transformation and then fit the four separate models. The key takeaway is that individually, none of the forecasts was a silver bullet! Each had issues with either the training set or the test set. The prophet models tended to detect trend better, while the timekit models tended to detect pattern better.
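For readers new to the term, a time series signature is a set of calendar features derived from the date index, which a regression can then use to pick up weekly or seasonal patterns. The sketch below is a rough illustration in Python, not timekit's R implementation: the `signature` function and its column names are hypothetical, and it covers only a small subset of the features a real signature would contain.

```python
from datetime import date, timedelta


def signature(d: date) -> dict:
    """Expand one date into a few calendar features (illustrative subset only)."""
    iso_year, iso_week, iso_wday = d.isocalendar()
    return {
        "index_num": d.toordinal(),         # numeric version of the date, useful for trend
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,
        "month": d.month,
        "day": d.day,
        "wday": iso_wday,                   # 1 = Monday ... 7 = Sunday
        "yday": d.timetuple().tm_yday,      # day of the year
        "week": iso_week,                   # ISO week number
        "is_weekend": int(iso_wday >= 6),
    }


# One row of features per day, starting on the first day of EARL SF 2017.
start = date(2017, 6, 5)
rows = [signature(start + timedelta(days=i)) for i in range(7)]

for row in rows:
    print(row)
```

Each row could then be joined to the (log-transformed) download count for that day and passed to a linear regression, which is the general idea behind the timekit models listed above.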

However, when combined via a simple average of the models, the ensemble prediction exhibited both low training and test error.
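To make the averaging step concrete, here is a minimal sketch in Python (not the R code from the presentation). All numbers are made up, the model names are placeholders, and the helpers `rmse` and `ensemble_mean` are hypothetical. It also assumes the average is taken on the log scale before transforming back, which is one reasonable reading of the workflow but may not be exactly what cran_dload_prediction.R does.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def ensemble_mean(log_forecasts):
    """Average log-scale forecasts point by point, then back-transform with exp."""
    n_models = len(log_forecasts)
    return [math.exp(sum(step) / n_models) for step in zip(*log_forecasts)]


# Made-up observed values and made-up forecasts from four placeholder models.
actual = [1050.0, 1900.0]
forecasts = {
    "model_1": [800.0, 1600.0],    # runs low
    "model_2": [1250.0, 2500.0],   # runs high
    "model_3": [500.0, 1000.0],    # runs very low
    "model_4": [2000.0, 4000.0],   # runs very high
}

# Stand-in for each model's log-scale output (the models were fit on log data).
log_forecasts = [[math.log(v) for v in f] for f in forecasts.values()]
ensemble = ensemble_mean(log_forecasts)

individual_errors = {name: rmse(actual, f) for name, f in forecasts.items()}
ensemble_error = rmse(actual, ensemble)

for name, err in individual_errors.items():
    print(f"{name}: RMSE = {err:.1f}")
print(f"ensemble: RMSE = {ensemble_error:.1f}")
```

In this toy data the models miss in opposite directions, so their errors largely cancel in the average. Note that averaging on the log scale and then exponentiating gives the geometric mean of the original-scale forecasts, not the arithmetic mean.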

If you’d like to take a deep dive into the code, the cran_dload_prediction.R file is available for download on the Business Science GitHub site.

Download Presentation and Code on GitHub

The slide deck and code from the EARL SF 2017 presentation can be downloaded from the Business Science GitHub site.

Download the EARL SF 2017 Presentation Slides!


  • We have completed the new package, sweep, which aims to “tidy” up the forecast workflow by applying broom concepts to the various model functions (auto.arima(), ets(), etc.) and to forecast() output. It’s not on CRAN quite yet, but we are encouraging testing. You can install it from GitHub: devtools::install_github("business-science/sweep"). Please provide feedback on the sweep GitHub site!

  • We are working on a name change for the timekit package. While we love the name, there is also another company with a product by the same name. To make it easier to differentiate the software products, we are considering changing the name to timekitr. This transition is expected to take place in July.
