Make Shiny Fast

…by doing as little work as possible

Alan Dipert (@alandipert)

February 2, 2018

Agenda

  1. Introduce methodology
  2. Learn to measure and analyze with Rprof & profvis
  3. CRAN Explorer optimization tour

Optimization Loop Method

Benchmark

What’s in a benchmark?

  1. Model: Representative user actions
  2. Metrics: Latencies experienced by model user
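
A metric can be as simple as the elapsed time of one modeled action. The sketch below times a stand-in action with base R's system.time(). simulate_search() is a hypothetical placeholder, not part of any app discussed here.

```r
# Hypothetical stand-in for one user action (e.g. a search that ranks results).
simulate_search <- function(n = 2e5) {
  scores <- runif(n)
  head(sort(scores, decreasing = TRUE), 10)
}

# Repeat the action a few times and record the wall-clock latency of each run.
latencies <- vapply(
  1:5,
  function(i) system.time(simulate_search())[["elapsed"]],
  numeric(1)
)

# The metric: latency as the model user would experience it, in seconds.
print(summary(latencies))
```

A handful of runs is usually enough to tell whether the action is Fast Enough.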

Model example

Reserving flights

😱

Results took > 20 seconds!

It’s OK.

  • Users expect to wait, UI confirms expectation
  • It’s Fast Enough™

Benchmarking in practice

Best done casually!

  • Fast Enough is easy to see
  • Only when it’s not Fast Enough must we Analyze

Analyze

Analysis

  1. Exercise model to produce metric data
  2. Identify the one slowest thing

Optimizing slowest thing gives highest payoff

Rprof and profvis

  • “Feels slow” usually means R is busy
  • Rprof: sample what R is doing
    • Computing (ggplot2, dplyr)
    • Waiting (database, network, disk)
  • profvis: visualize Rprof output
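
Rprof can also be used on its own, without profvis. A minimal sketch using only base R follows. busy_wait() is a hypothetical workload invented for illustration, and the sampling interval is an arbitrary choice.

```r
# Hypothetical workload: spin until the requested time has passed,
# so the sampler has CPU work to observe.
busy_wait <- function(seconds) {
  start <- proc.time()[["elapsed"]]
  while (proc.time()[["elapsed"]] - start < seconds) {}
  invisible(NULL)
}

profile_file <- tempfile(fileext = ".out")

Rprof(profile_file, interval = 0.01)  # start sampling the call stack
busy_wait(0.5)
Rprof(NULL)                           # stop sampling

# by.total lists functions by time spent in them and in their callees.
profile_summary <- summaryRprof(profile_file)
print(head(profile_summary$by.total))
```

The same profile_file could be inspected by hand, but the flat table gets hard to read for real apps, which is the gap profvis fills.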

The call stack

Code

inner <- function(x) {
  stop("oh no")  # the error is raised here, in the innermost frame
}
middle <- function(x) { x }
outer <- function(x) { x }

outer(middle(inner()))

Each call creates a frame on the call stack

Stack

Traceback
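
The stack can also be inspected without raising an error. In the sketch below, peek() is a hypothetical replacement for inner() that returns sys.calls() instead of stopping. Because R evaluates arguments lazily, peek() only runs once middle() touches its argument, so all three calls are on the stack at the same time.

```r
# peek() is a hypothetical helper: it reports the active calls instead of failing.
peek <- function() sys.calls()
middle <- function(x) x
outer <- function(x) x

call_stack <- outer(middle(peek()))

# Keep the three innermost frames as text, outermost first.
frame_labels <- vapply(
  tail(as.list(call_stack), 3),
  function(frame) paste(deparse(frame), collapse = ""),
  character(1)
)
print(frame_labels)
```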

Call stack over time

outer(middle(inner()))

🤔

delay <- function(expr) {
  profvis::pause(1)
  force(expr)
}

delay(delay(delay(1)))

What if width represented duration?

profvis in action

library(profvis)
delay <- function(expr) {
  profvis::pause(1)
  force(expr)
}

profvis({
  delay(delay(delay(1)))
})

Short profvis Demo

example_apps/profvis_demo

In Practice

CRAN explorer

Optimizing CRAN explorer

Organization

cran_explorer/
├── app.R
├── deps.csv
├── packages.csv
├── plot_cache.R
└── utils.R

  • app.R: Shiny app
  • deps.csv, packages.csv: data
  • plot_cache.R: Disk-based plot cache
  • utils.R: Download, prepare .csv files

Architecture

  • utils.R for downloading .csv files
  • Data loaded as global reactiveVals on app.R startup
  • dplyr used to search, filter
  • ggplot2 used for plots

Optimization #1: Pre-process

  • Don’t download from METACRAN on every run; prepare the .csv files ahead of time
  • Winston’s experience saved time
  • Rule of thumb: if the data is big, pre-process

Optimization #2: Beware dplyr::group_by()

group_by() takes an existing tbl and converts it into a grouped tbl where operations are performed “by group”.

group_by() example

> mtcars %>% summarise(disp = mean(disp), hp = mean(hp))
      disp       hp
1 230.7219 146.6875
> mtcars %>% 
    group_by(cyl) %>% 
    summarise(disp = mean(disp), hp = mean(hp))
    cyl     disp        hp
  <dbl>    <dbl>     <dbl>
1     4 105.1364  82.63636
2     6 183.3143 122.28571
3     8 353.1000 209.21429

filter() after group_by() Slowdown

mtcars %>% filter(disp > 200) # 2.99 sec
mtcars %>% group_by(cyl) %>% filter(disp > 200) # 3.93 sec

  • First filter is applied once, to all of mtcars
  • Second filter is applied separately to each group

Offending reactive

packages_released_on_date <- reactive({
  req(input$date)
  all_data %>%
    filter(date <= input$date) %>%
    group_by(Package) %>%               # <--
    filter(any(date == input$date)) %>% # <--
    summarise(
      Version = first(Version),
      total_releases = n()
    ) %>%
    ungroup()
})

app.R at 0f7560
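
One way to avoid the grouped filter is to find the matching packages with an ungrouped filter first and group only for the summary. This is a sketch of one possible rewrite, not necessarily the change made in app.R. The sample data and the target_date variable are invented for illustration.

```r
library(dplyr)

# Invented sample data with the same columns the reactive uses.
all_data <- tibble(
  Package = c("a", "a", "b", "b", "c"),
  Version = c("1.0", "1.1", "0.1", "0.2", "2.0"),
  date = as.Date(c("2018-01-01", "2018-01-10", "2018-01-05",
                   "2018-01-10", "2018-01-03"))
)
target_date <- as.Date("2018-01-10")  # stands in for input$date

# Original shape: filter() runs once per Package group.
grouped_filter <- all_data %>%
  filter(date <= target_date) %>%
  group_by(Package) %>%
  filter(any(date == target_date)) %>%
  summarise(Version = first(Version), total_releases = n()) %>%
  ungroup()

# Alternative: ungrouped filter finds the packages released on the date,
# semi_join() keeps their history, and grouping happens only for summarise().
released <- all_data %>%
  filter(date == target_date) %>%
  distinct(Package)

filter_first <- all_data %>%
  filter(date <= target_date) %>%
  semi_join(released, by = "Package") %>%
  group_by(Package) %>%
  summarise(Version = first(Version), total_releases = n())

print(all.equal(as.data.frame(grouped_filter), as.data.frame(filter_first)))
```

Whether this is faster on the real data is something to confirm with a profile, not to assume.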

Optimization #3: CSVs read faster than RDS

microbenchmark(
  read_csv("packages.csv"),
  readRDS("packages.rds")
)
expr                        mean
read_csv("packages.csv")    661.4826
readRDS("packages.rds")     851.1554
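
Results like this depend on the data and on how the RDS file was written: saveRDS() compresses with gzip by default, so reading includes decompression. The sketch below shows one way to repeat the comparison on other data. It uses base read.csv() in place of readr's read_csv() to stay dependency-free, and the sample data is invented.

```r
# Invented sample data; substitute the real data set when measuring.
sample_data <- data.frame(
  Package = sprintf("pkg%05d", 1:50000),
  total_releases = sample(1:40, 50000, replace = TRUE),
  stringsAsFactors = FALSE
)

csv_path <- tempfile(fileext = ".csv")
rds_gzip_path <- tempfile(fileext = ".rds")
rds_raw_path <- tempfile(fileext = ".rds")

write.csv(sample_data, csv_path, row.names = FALSE)
saveRDS(sample_data, rds_gzip_path)                   # default: gzip-compressed
saveRDS(sample_data, rds_raw_path, compress = FALSE)  # larger file, nothing to decompress

# Wall-clock seconds taken to evaluate an expression.
elapsed <- function(expr) system.time(expr)[["elapsed"]]

read_times <- c(
  csv = elapsed(read.csv(csv_path, stringsAsFactors = FALSE)),
  rds_gzip = elapsed(readRDS(rds_gzip_path)),
  rds_raw = elapsed(readRDS(rds_raw_path))
)
print(read_times)
```

No particular ordering of the three timings is implied here; measure before choosing a format.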

Sidenote: scopes

  • R process-global (top-level): evaluated once, shared by all sessions
  • Per-session (inside server function): evaluated again for each session

all_data <- reactiveVal(read_csv("packages.csv"))

app.R at 698b8fc
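
The two scopes can be sketched in a minimal app. Everything here (shared_packages, the input and output ids) is hypothetical and only illustrates where each kind of object lives.

```r
library(shiny)

# Process-global: evaluated once when app.R is sourced, shared by every session.
shared_packages <- data.frame(
  Package = c("a", "b"),
  total_releases = c(2L, 5L)
)

ui <- fluidPage(
  numericInput("min_releases", "Minimum releases", value = 1),
  tableOutput("packages")
)

server <- function(input, output, session) {
  # Per-session: this body runs again for each browser that connects,
  # so the reactive below is private to one user.
  selected <- reactive({
    shared_packages[shared_packages$total_releases >= input$min_releases, ]
  })
  output$packages <- renderTable(selected())
}

# shinyApp(ui, server)  # uncomment to launch the app
```

Expensive loading belongs at the top level, so it is paid once per process instead of once per visitor.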

Optimization #4: Plot caching

  • plotCache: read-through cache for plots
  • Coming soon to Shiny
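
The read-through idea can be sketched in a few lines of base R. This is not the plotCache implementation from plot_cache.R. cached_plot(), the key scheme, and the PNG size are assumptions made for illustration.

```r
cache_dir <- file.path(tempdir(), "plot_cache")
dir.create(cache_dir, showWarnings = FALSE)

# Return the path of the cached PNG for `key`, drawing it only on a cache miss.
cached_plot <- function(key, draw) {
  path <- file.path(cache_dir, paste0(key, ".png"))
  if (!file.exists(path)) {
    png(path, width = 600, height = 400)
    tryCatch(draw(), finally = dev.off())  # always close the device
  }
  path
}

# Usage: the second request for the same key is served from disk.
draw_count <- 0
draw_histogram <- function() {
  draw_count <<- draw_count + 1
  hist(mtcars$disp, main = "Displacement")
}

first_path <- cached_plot("disp_hist", draw_histogram)
second_path <- cached_plot("disp_hist", draw_histogram)
print(draw_count)
```

A real cache also needs a key derived from every input that affects the plot, plus some way to evict stale files; both are left out of this sketch.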

Thank you!

https://twitter.com/alandipert

https://github.com/alandipert/rstudio-conf-2017-shiny-perf