Build A R Shiny App (Tutorial) - Wedding Risk Model
Written by Bryan Clark on June 9, 2019
We continue to be amazed by the progress our students are making in applying data science in the real world. Bryan Clark (LinkedIn), Data Scientist with H&M and student in our Business Science University DS4B 201-R course, has successfully applied the BSPF Framework (our data science project-management methodology taught in the DS4B 201-R course) to model the cost and risk of his forthcoming wedding (congratulations on getting married!!). In this article, Bryan presents his amazing analysis that led to the development of a minimum viable product: A Wedding Invitation Risk Modeling Application using Shiny. Way to go Bryan!
Objective: Bryan has a big wedding coming up (yay!), and he and his soon-to-be wife are interested in using a statistical model to determine how many invitations to send and to quantify the risk of going over budget.
Process: Bryan used the BSPF Framework. The BSPF is a repeatable data science project management framework designed to connect data science with the business.
Data Product (Web Application): Bryan used the R code developed in this tutorial to build a Wedding Invitation Risk Modeling Application using Shiny.
Data Science for Business Course, DS4B 201-R: Learn how to solve churn problems (big financial impact to organizations) using the BSPF Framework, H2O Automatic Machine Learning, and LIME Black-Box Explainability. Bryan learned the BSPF Framework in the 201 course, and has since applied it to projects at H&M and his Wedding Modeling project. Go Bryan!
Predictive Web Applications for Business with R Shiny Course, DS4B 102-R: Learn to build predictive web applications (with integrated machine learning). Build 2 web applications with XGBoost machine learning to generate a demand forecast and product price modeling.
All code in this step-by-step tutorial can be found in Bryan’s GitHub repo here.
“We are all a little weird and life’s a little weird, and when we find someone whose weirdness is compatible with ours, we join up with them and fall in mutual weirdness and call it love.” - Dr. Seuss
I heard this quote recently while attending the wedding of my very good friend. In fact, I will be having my own wedding in less than a year and just went through the process of selecting a venue. Part of the process includes fixed and variable costs that depend on the number of guests that ultimately RSVP to attend the wedding.
My first question in this process was how many people ultimately respond to wedding invitations. In other words, of the people we invite, how many can we expect to attend?
As I processed this information, I wondered if there was a better way to quantify the uncertainty of how many people I can expect to attend, and then turn that into an estimate of what my budget will be. Additionally, that estimate can be extended into a risk of going over budget.
Objective and Key Result
Objective: Determine a better way to quantify the risk of going over budget
Key Result: Develop a model that consumes guest cost inputs and then outputs the risk profile and a recommendation for moving forward.
I will look to combine elements of statistical simulation, risk analytics, and design thinking to build an analytics product that will extend a single example use-case into a flexible product that others can use.
Business Science Problem Framework
We will leverage the Business Science Problem Framework to shape the structure of our analysis and product development. The goal here is to understand the problem, explore potential opportunities, and operationalize the outcomes.
Libraries & Theme Setup
These are the libraries used for the analysis.
1. View Business as Machine
1.1 Isolate the Business Unit
The “business unit” here is the couple and their guests. While the cost associated with each guest is set by the wedding vendors, the total cost of the wedding depends on the number of guests who are invited and who RSVP yes by the final guest-count cutoff. Inviting too many guests could cause costs to increase at the last minute if an unexpectedly large number say yes, which creates a risk of the couple going over budget.
Example Business Case
For the sake of the business case, let’s assume that 150 people are on the initial guest invitation list. When taking all the wedding vendors into account, there is a fixed cost that covers the first 50 guests, and then there is a variable cost of $125 per guest above 50. This will be charged at a cutoff point 45 days prior to the date of the wedding, so at this point, the total cost will be known. The couple’s budget is $30,000 and they have a risk tolerance of 20%. In other words, they can stomach a 20% chance of going over budget and ideally do not want to exceed $32,000 at the very most.
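The cost structure in this business case can be sketched as a small base R function. This is only an illustration: the function name `calculate_wedding_cost()` is hypothetical, and the fixed cost of $22,000 is an assumption back-solved from the baseline result reported in section 3.2 (113 guests coming in $125 under the $30,000 budget), not a figure stated in the business case itself.

```r
# Hypothetical helper: total cost for a given number of attending guests.
# ASSUMPTION: fixed_cost = 22000 is back-solved from the baseline result
# (113 guests, $125 under a $30,000 budget); it is not given in the text.
calculate_wedding_cost <- function(guests,
                                   fixed_cost    = 22000,
                                   base_guests   = 50,
                                   variable_cost = 125) {
  # Only guests above the first 50 add variable cost
  extra_guests <- pmax(guests - base_guests, 0)
  fixed_cost + variable_cost * extra_guests
}

budget <- 30000
guest_counts <- c(40, 50, 114, 115)

costs <- calculate_wedding_cost(guest_counts)
costs            # 22000 22000 30000 30125
costs > budget   # FALSE FALSE FALSE TRUE
```

Under these assumptions, the couple stays within budget up to 114 guests, and the 115th guest is the one that pushes them over.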
1.2 Define Objectives
The main objective is to quantify the risk of going over budget that a couple faces based on how many people they send invitations to.
1.3 Collect Outcomes
Initial research suggests that couples mainly go over budget because they underestimate how quickly the costs can climb.
2. Understand the Drivers
2.1 Investigate Objectives
The objective at the time invites are sent out is to make sure that all the right people are invited, while also ensuring that unexpected costs do not pop up because too many people RSVP yes.
There are a few drivers that lead to the uncertainty of guest attendance. These factors are the distance to the wedding, cost of attendance (e.g. hotel & travel), availability of the guest based on the time of year, and the strength of the relationship between the guest and the couple.
2.2 Synthesize Outcomes
However, there is no historical data to analyze in this instance as the wedding has never happened before. Therefore, it is unknown if the outcomes align with the objective. Other weddings see widely varying response rates (the share of guests who RSVP yes). This leads to increased uncertainty in the estimates.
2.3 Hypothesize Drivers
The variability of estimates is likely due to the variability in the factors mentioned above. Additionally, even with historical data to use as a guide, there is natural deviation from an expected response rate as the response rate is the expected long-run average. The result we see for the actual event is only a single experiment.
3. Measure the Drivers
3.1 Collect Data
Some initial research states 10-20% of invited guests will not attend while another source shows 60-75% of invited guests will share in the day. In other words, guest attendance rates could be anywhere from 60-90% based on a variety of factors.
We can use this data to help solve the problem analytically.
3.2 Develop KPIs
To develop a baseline, we have to figure out how to quantify our risk based on a hypothesis and statistics. The KPIs we focus on will be likelihood of risk, the expected total cost, and the expected value of risk.
We can use the data from our research to calculate baseline KPIs analytically.
To do so, we will assume that every guest has the same probability of RSVPing yes. This turns each guest’s yes or no into a trial of a Bernoulli process. With the law of large numbers, we can then estimate our baseline KPIs based on the expected total number of guests.
Using this approach with a 75% response rate, the couple should plan for 113 guests to say yes and will be under budget by $125. With this method, the recommendation is to Invite All.
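The baseline arithmetic can be reproduced in a few lines of base R. Treat this as a sketch: the 75% response rate is the figure quoted for the analytical method later in the article, the variable names are my own, and the $22,000 fixed cost is an assumption chosen so that the numbers agree with the stated result, which means the $125 figure matches by construction rather than as an independent check.

```r
# Baseline (analytical) KPIs for the example business case
invited       <- 150
p_yes         <- 0.75    # single response probability for every guest
budget        <- 30000
base_guests   <- 50
variable_cost <- 125
fixed_cost    <- 22000   # ASSUMPTION: back-solved, not stated in the text

# Expected number of guests under a Bernoulli process: n * p = 112.5.
# ceiling() is used because round(112.5) gives 112 in R (round half to even).
expected_guests <- ceiling(invited * p_yes)

expected_cost <- fixed_cost +
  variable_cost * max(expected_guests - base_guests, 0)

expected_guests          # 113
expected_cost            # 29875
budget - expected_cost   # 125 under budget
```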
4. Uncover Problems and Opportunities
4.1 Evaluate Baseline Performance
While the baseline method is favorable for the couple, it fails to account for the couple’s wedding being only a single experiment or trial of the Bernoulli process. The total cost may sometimes be lower, sometimes be higher, or sometimes the same as our analytical calculations.
4.2 Highlight Potential Problem Areas
The biggest opportunity will be to use simulation to replicate the experiment thousands of times and then analyze the frequency of outcomes. We also can add flexibility to the process by allowing a distribution of probabilities to be sampled for each replication. In other words, we can account for the 60-90% uncertainty range. Another option would be to have different categories of guests and assign each group a different probability.
4.3 Review Process
To summarize, we will attempt to model a Bernoulli process to generate data to simulate wedding guest invites. We will use a uniform distribution to sample guest probabilities provided from the research. This assumes that the expert guess for probability is accurate enough, so the biggest opportunity for improvement lies in using real data to improve the inputs.
This method is a simplified version of the real-world process, but should provide added value over the analytically calculated alternative.
5. Encode Algorithms
5.1 Develop Algorithms
We need a few additional functions to piece together our data generating process.
We need a function to sample a guest count based on n invitations with p probability to respond.
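A minimal base R sketch of such a function is shown here. The name `sample_guest_count()` is hypothetical. It relies on the fact that the number of yes responses from n independent Bernoulli trials with the same probability p follows a binomial distribution, so a single call to `rbinom()` is enough.

```r
# Hypothetical helper: draw one simulated guest count.
# n = number of invitations sent, p = probability each guest says yes.
sample_guest_count <- function(n, p) {
  # One draw from Binomial(n, p), i.e. the sum of n Bernoulli(p) trials
  rbinom(1, size = n, prob = p)
}

set.seed(42)  # for reproducibility of this example only
sample_guest_count(n = 150, p = 0.75)
```

Each call represents one possible wedding, so repeated calls will return different guest counts scattered around n * p.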
We also need a function to simulate k weddings and return outputs for total guests, total cost and risk results. These outputs are created using our functions from above and their respective inputs. This function will also accept an argument for n and a range for p.
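One way this simulation function could look in base R is sketched below. It is not the original code: `simulate_weddings()` and its column names are hypothetical, the fixed cost default of $22,000 is an assumption back-solved from the baseline result, and the response probability for each trial is drawn from a uniform distribution over the supplied range as described in section 4.3.

```r
# Hypothetical sketch: simulate k weddings with n invitations each.
# p_range = c(min, max) of the response probability, sampled uniformly.
simulate_weddings <- function(k, n, p_range,
                              fixed_cost    = 22000,  # ASSUMPTION
                              base_guests   = 50,
                              variable_cost = 125,
                              budget        = 30000) {
  # One response probability per simulated wedding
  p <- runif(k, min = p_range[1], max = p_range[2])
  # One guest count per simulated wedding (rbinom is vectorised over prob)
  total_guests <- rbinom(k, size = n, prob = p)
  total_cost <- fixed_cost +
    variable_cost * pmax(total_guests - base_guests, 0)
  risk <- total_cost - budget  # positive values mean over budget

  data.frame(
    trial          = seq_len(k),
    p              = p,
    total_guests   = total_guests,
    total_cost     = total_cost,
    risk           = risk,
    over_budget    = risk > 0,
    # Trials landing exactly on budget are treated as "Invite All" here
    recommendation = ifelse(risk > 0, "Invite Less", "Invite All")
  )
}

set.seed(2019)
sims <- simulate_weddings(k = 10000, n = 150, p_range = c(0.60, 0.90))
head(sims)
mean(sims$over_budget)  # share of simulated weddings over budget
```

Returning one row per trial keeps the raw outcomes available, so the same data frame can feed both the summary statistics and any plots.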
5.2 Quantify Financial Value Potential
Our simulation will return to us the results of k trials, which we can use to analyze the outcomes. Each of these trials will capture whether the guest count caused the wedding to go over budget as well as the specific amount over budget it went.
5.3 Improve Decision-Making via Recommendation Algorithm
Each trial also converts the outcome into a recommendation. If the outcome was under budget, the recommendation is to Invite All, otherwise it is Invite Less. For the final recommendation, we will summarize the outcomes and, if the proportion of outcomes over budget is less than the risk tolerance, we will recommend inviting all guests.
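The final recommendation rule can be sketched as a short base R function. The name `recommend_invites()` is hypothetical, and the default risk tolerance of 20% comes from the example business case.

```r
# Hypothetical helper: turn per-trial outcomes into one recommendation.
# over_budget    = logical vector, TRUE where a trial went over budget
# risk_tolerance = largest acceptable share of over-budget trials
recommend_invites <- function(over_budget, risk_tolerance = 0.20) {
  risk_probability <- mean(over_budget)  # proportion of TRUE values
  if (risk_probability < risk_tolerance) "Invite All" else "Invite Less"
}

# Toy outcomes: 1 of 10 trials over budget (10%)
recommend_invites(rep(c(TRUE, FALSE), times = c(1, 9)))  # "Invite All"

# Toy outcomes: 3 of 10 trials over budget (30%)
recommend_invites(rep(c(TRUE, FALSE), times = c(3, 7)))  # "Invite Less"
```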
6. Measure Results
6.1 Capture Outcomes
We then load up our simulator with the inputs from above. Only this time, we have 90 invites for in-town guests (with an estimated 90% chance of responding yes) and 60 invites for out-of-town guests (with an estimated 50% chance of responding yes).
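A simplified base R sketch of this two-group setup follows. It is an illustration rather than the original simulator: it uses the two point estimates directly instead of sampling probabilities from a range, the variable names are my own, and the $22,000 fixed cost is an assumption back-solved from the baseline result.

```r
set.seed(2019)
k <- 10000  # number of simulated weddings

# Each group gets its own binomial draw per simulated wedding
in_town     <- rbinom(k, size = 90, prob = 0.90)
out_of_town <- rbinom(k, size = 60, prob = 0.50)
total_guests <- in_town + out_of_town

# Cost model from the business case; 22000 is an ASSUMED fixed cost
total_cost  <- 22000 + 125 * pmax(total_guests - 50, 0)
over_budget <- total_cost > 30000

mean(total_guests)  # average simulated guest count
mean(over_budget)   # estimated probability of going over budget
```

The second number is the one to compare against the couple’s 20% risk tolerance.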
6.2 Synthesize Results
Ultimately, we are concerned with whether we have invited too many people. Applying our summary function, we see that we should invite fewer people.
It is worth noting that the weighted average of our response probability is a little lower (74%) than the analytical method (75%), but that is part of the flexibility of the solution design. Even with the slightly lower probability of response, the risk of going over budget is still too great to move forward with this many invites.
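The 74% figure can be checked with a quick weighted average of the two group probabilities, using the invitation counts as weights (variable names here are my own):

```r
invites <- c(in_town = 90, out_of_town = 60)
p_yes   <- c(in_town = 0.90, out_of_town = 0.50)

# (90 * 0.90 + 60 * 0.50) / 150 = 111 / 150
p_weighted <- weighted.mean(p_yes, w = invites)
p_weighted                  # 0.74
sum(invites) * p_weighted   # 111 expected guests
```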
6.3 Visualize Outcomes
To better understand the outcomes of the simulation, we will visualize four plots (these will go into the final web application):
- Total Guest Count
- Total Cost
- Total Risk
- Recommendation Outcomes
Here is the code to generate the 4 plots in the Wedding Invitation Application.
6.3.1 Plot Guest Count
6.3.2 Plot Cost
6.3.3 Plot Risk
6.3.4 Plot Recommendation
7. Report Financial Impact
7.1 Measure Actual Results
Based on the outcome of our analysis, we see the couple needs to find a way to trim the invite list. Had we moved forward with the initial analytical solution, the couple would run a greater risk of exceeding their budget than they indicated they would be comfortable with.
The model can be re-run with a smaller invitation list and new results reported. This aspect of the model makes it a great candidate to be turned into an analytics product. A tool like Shiny enables all the code to be embedded into a dashboard.
7.2 Quantify Financial Benefit
While the simulation shows the guest count coming in under budget more often than not (with savings potential over $3,000), we do see the couple going over budget more than 20% of the time (their threshold). Had they moved forward with the counts as is, they would have faced a 95% chance to lose potentially over $2,000.
Each of these findings points them toward trimming down the guest list and potentially saving additional money.
7.3 Report Financial Benefit to Stakeholders
Generating a PDF report for the couple would be beneficial, as they could reference it while deciding how many people to trim from the list. The couple could also re-check their financial calculations and decide to move forward with the counts as is.
Whether in RMarkdown or Shiny, this would be a nice value-add feature.
The working minimum viable product created from this analysis can be found here.
I’d like to acknowledge the following list of people/organizations for helping to influence this project:
- My soon-to-be wife for being my muse and sounding board
- Boston University MSc. Applied Business Analytics: AD 616 Enterprise Risk Analytics
- Business Science & Matt Dancho
- Analytics Lifecycle Toolkit
Now let’s chat in the comments: What did you think of Bryan’s analysis, Shiny App, and use of the BSPF Framework?