<h1>Make Microsoft Word Reports with R + officedown</h1>
<p class="date text-center">2024-02-24</p>
<p>What’s the one thing that will impress your company (that you can make in under 60 minutes)? <strong>A professional business report.</strong></p>
<p>And Microsoft Word is the de facto standard for business reports (NOT Jupyter Notebooks or HTML web reports). Even PDFs aren’t ideal, especially when stakeholders need to review and comment on them.</p>
<h3 id="table-of-contents">Table of Contents</h3>
<p>Today I’m going to show you how to make professional Microsoft Word reports using <code class="language-plaintext highlighter-rouge">officedown</code>. Here’s what you’re learning today:</p>
<ul>
<li>Tutorial: How to use <code class="language-plaintext highlighter-rouge">officedown</code> to effortlessly produce a Microsoft Word Report (that your company will read)</li>
<li><strong>Bonus: Get a Free Rmarkdown Template for making Word Reports</strong></li>
</ul>
<p><img src="/assets/officedown_word_report.jpg" alt="Microsoft Word Report Made with R" /></p>
<hr />
<h1 id="special-announcement-chatgpt-for-data-scientists-workshop-on-march-27th">SPECIAL ANNOUNCEMENT: ChatGPT for Data Scientists Workshop on March 27th</h1>
<p><a href="https://learn.business-science.io/registration-chatgpt-2?el=website">Inside the workshop</a> I’ll share how I built a Machine Learning Powered Production Shiny App with <code class="language-plaintext highlighter-rouge">ChatGPT</code> (extends this data analysis to an <em>insane</em> production app):</p>
<p><img src="/assets/lab_82_chatgpt_rcode.jpg" alt="ChatGPT for Data Scientists" /></p>
<p><strong>What:</strong> ChatGPT for Data Scientists</p>
<p><strong>When:</strong> Wednesday March 27th, 2pm EST</p>
<p><strong>How It Will Help You:</strong> Whether you are new to data science or are an expert, ChatGPT is changing the game. There’s a ton of hype. But how can ChatGPT actually help you become a better data scientist and help you stand out in your career? I’ll show you inside <a href="https://learn.business-science.io/registration-chatgpt-2?el=website">my free chatgpt for data scientists workshop</a>.</p>
<p><strong>Price:</strong> Does <strong>Free</strong> sound good?</p>
<p><strong>How To Join:</strong> <a href="https://learn.business-science.io/registration-chatgpt-2?el=website"><strong>👉 Register Here</strong></a></p>
<hr />
<h1 id="r-tips-weekly">R-Tips Weekly</h1>
<p>This article is part of R-Tips Weekly, a <a href="https://learn.business-science.io/r-tips-newsletter">weekly video tutorial</a> that shows you step-by-step how to do common R coding tasks. Pretty cool, right?</p>
<p>Here are the links to get set up. 👇</p>
<ul> <li><a href="https://learn.business-science.io/r-tips-newsletter">Get the Code</a></li> <li><a href="https://youtu.be/Sk_4CmouPwk">YouTube Tutorial</a></li> </ul>
<h1 id="this-tutorial-is-available-in-video">This Tutorial is Available in Video</h1>
<p>I have a companion video tutorial that gives you the bonus Rmarkdown MS Word Template shown in this video (plus walks you through how to use it). And, I’m finding that a lot of my students prefer the dialogue that goes along with coding. So check out this video to see me running the code in this tutorial. 👇</p>
<iframe width="100%" height="450" src="https://www.youtube.com/embed/Sk_4CmouPwk" title="YouTube video player" frameborder="1" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>
<h1 id="why-making-microsoft-word-reports-from-r-is-a-must">Why Making Microsoft Word Reports from R is a Must</h1>
<p>Listen, there’s one way to <strong>immediately turn off an executive…</strong></p>
<p>And that’s by giving them a Jupyter Notebook (I mean, look at this mess).</p>
<p><img src="/assets/officedown_dont_use_jupyter.png" alt="Don't Use Jupyter for Executives" /></p>
<p class="date text-center">Please don't send Executives reports that look like this.</p>
<p>Nothing against those who use Jupyter Notebooks for their analyses.</p>
<p>But, if you sent one of those to me (and I’m an executive that’s used to reading reports in Microsoft Office formats like Excel and Word)…</p>
<h3 id="then-im-going-to-immediately-hit-my-email-trash-icon-and-probably-not-tell-you">…Then I’m going to immediately hit my Email Trash Icon (and probably not tell you.)</h3>
<p>How does that make you feel?</p>
<p>You just spent days on a report. And guess what, it’s not getting read.</p>
<h3 id="well-lets-fix-that-by-learning-how-to-making-microsoft-word-reports-today">Well, let’s fix that by learning how to make Microsoft Word reports today.</h3>
<h1 id="thank-you-to-the-developer-and-community">Thank You to the Developer (and Community).</h1>
<p>Before we do our deep-dive into <code class="language-plaintext highlighter-rouge">officedown</code>, I want to take a brief moment to thank the developer, <a href="https://www.linkedin.com/in/davidgohel/">David Gohel</a>. David runs a consulting company <a href="https://www.ardata.fr/">Ardata</a>. Please connect and follow David. <a href="https://github.com/davidgohel">His work is on GitHub here</a>.</p>
<p>Also I’d like to thank <a href="https://www.linkedin.com/in/adrianolszewski/">Adrian Olszewski, Principal Biostatistician at 2KMM</a> for sharing the Office-verse R ecosystem with me. Without community and sharing knowledge, this R-tip wouldn’t be possible.</p>
<h1 id="free-gift-cheat-sheet-for-my-top-100-r-packages-special-data-analysis-topics-included">Free Gift: Cheat Sheet for my Top 100 R Packages (Special Data Analysis Topics Included)</h1>
<p>Before we dive in…</p>
<p><strong>You’re going to need R packages to complete the analysis that goes in your MS Word reports.</strong> So why not speed up the process?</p>
<p>To help, I’m going to share my secret weapon…</p>
<p><strong>Even I forget which R packages to use from time to time.</strong> And this cheat sheet saves me so much time. Instead of googling through 20,000 R packages to find a needle in a haystack, I keep my cheat sheet handy so I know which packages to use and when to use them. Seriously. <a href="https://www.business-science.io/r-cheatsheet.html">This cheat sheet is my bible.</a></p>
<p><img src="https://www.business-science.io/assets/free_cheatsheet.jpg" alt="Ultimate R Cheat Sheet" /></p>
<p>Once you <a href="https://www.business-science.io/r-cheatsheet.html">download it</a>, head over to page 3 and you’ll see several R packages I use frequently just for Data Analysis.</p>
<p><img src="/assets/cheatsheet_page_3_special_topics.jpg" alt="Cheat Sheet Page 3 Special Topics" /></p>
<p>Which is important when you want to work in these fields:</p>
<ul>
<li>Machine Learning</li>
<li>Time Series</li>
<li>Financial Analysis</li>
<li>Geospatial Analysis</li>
<li>Text Analysis and NLP</li>
<li>Shiny Web App Development</li>
</ul>
<p><a href="https://www.business-science.io/r-cheatsheet.html">So steal my cheat sheet.</a> It will save you a ton of time.</p>
<h1 id="tutorial-make-microsoft-word-reports-with-officedown">Tutorial: Make Microsoft Word Reports with <code class="language-plaintext highlighter-rouge">officedown</code></h1>
<p>Here’s how to use <code class="language-plaintext highlighter-rouge">officedown</code> to make a professional Word report.</p>
<h2 id="step-1-make-an-rmarkdown-document">Step 1: Make an Rmarkdown document</h2>
<p>Start by making a normal Rmarkdown document. Go to File > New File > R Markdown.</p>
<p><img src="/assets/officedown_01_make_rmd.jpg" alt="Make RMD File" /></p>
<h2 id="step-2-enable-officedown">Step 2: Enable Officedown</h2>
<p>Enable officedown as the Rmarkdown Output.</p>
<p><img src="/assets/officedown_02_add_officedown.jpg" alt="Enable Officedown" /></p>
<p class="text-center date"> <a href="https://learn.business-science.io/r-tips-newsletter" target="_blank"><strong>Get the code.</strong></a> </p>
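<p>A minimal sketch of what the Rmarkdown YAML header looks like once officedown is enabled (the title is a hypothetical placeholder; <code class="language-plaintext highlighter-rouge">officedown::rdocx_document</code> is the output format the package provides):</p>

```yaml
---
title: "My Word Report"
output: officedown::rdocx_document
---
```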
<h2 id="step-3-setup-the-documents-global-knitr-options">Step 3: Setup the document’s global <code class="language-plaintext highlighter-rouge">knitr</code> options</h2>
<p>Use these <code class="language-plaintext highlighter-rouge">knitr</code> options to let <code class="language-plaintext highlighter-rouge">officedown</code> format the table and figure captions.</p>
<p><img src="/assets/officedown_03_global_knitr_opts.jpg" alt="Global Knitr Options Officedown" /></p>
<p class="text-center date"> <a href="https://learn.business-science.io/r-tips-newsletter" target="_blank"><strong>Get the code.</strong></a> </p>
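<p>Here’s a hedged sketch of the kind of global setup chunk this step uses. The caption-related chunk options (<code class="language-plaintext highlighter-rouge">fig.cap.pre</code>, <code class="language-plaintext highlighter-rouge">tab.cap.pre</code>, and friends) come from officedown’s documentation; your exact settings may differ from the ones in the downloadable template:</p>

```r
# Setup chunk: load officedown and set global knitr options
library(officedown)
library(officer)

knitr::opts_chunk$set(
  echo        = FALSE,     # hide R code in the Word output
  fig.cap     = TRUE,      # enable figure captioning
  fig.cap.pre = "Figure ", # caption prefix for figures
  fig.cap.sep = ": ",      # separator between number and caption text
  tab.cap.pre = "Table ",  # caption prefix for tables
  tab.cap.sep = ": "
)
```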
<h2 id="step-4-add-table-of-contents">Step 4: Add Table of Contents</h2>
<p>The <code class="language-plaintext highlighter-rouge">block_toc()</code> function from the <code class="language-plaintext highlighter-rouge">officer</code> package generates the Word Table of Contents.</p>
<p><img src="/assets/officedown_04_toc.jpg" alt="Table of Contents Officedown" /></p>
<p class="text-center date"> <a href="https://learn.business-science.io/r-tips-newsletter" target="_blank"><strong>Get the code.</strong></a> </p>
<p>Here’s what the Table of Contents looks like.</p>
<p><img src="/assets/officedown_04_toc_word_doc.jpg" alt="Table of Contents Officedown Word" /></p>
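<p>A minimal sketch, assuming the setup chunk has loaded <code class="language-plaintext highlighter-rouge">officer</code>: place this in its own Rmd chunk where you want the TOC to appear.</p>

```r
# Emit a Word Table of Contents field (headings down to level 2)
officer::block_toc(level = 2)
```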
<h2 id="step-5-add-figures">Step 5: Add Figures</h2>
<p>This is where you start building the core of your report. Officedown integrates:</p>
<ul>
<li>Hyperlinked Figure Captioning using <code class="language-plaintext highlighter-rouge">\@ref(fig:fig_id)</code></li>
<li>Knitr Options like <code class="language-plaintext highlighter-rouge">fig.id</code> to connect the linked references to the figures</li>
<li>Normal R code inside the Rmarkdown chunks, just like you’d usually write</li>
</ul>
<p><img src="/assets/officedown_05_figures.jpg" alt="Add Figures Officedown" /></p>
<p class="text-center date"> <a href="https://learn.business-science.io/r-tips-newsletter" target="_blank"><strong>Get the code.</strong></a> </p>
<p>Here’s what it looks like in the report:</p>
<p><img src="/assets/officedown_05_figures_word_doc.jpg" alt="Figures Word Doc Officedown" /></p>
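<p>A sketch of what a captioned, cross-referenceable figure chunk can look like (the chunk label <code class="language-plaintext highlighter-rouge">sales-plot</code> and the plot itself are hypothetical examples, not the template’s exact code):</p>

```r
# The chunk header is shown as a comment since Rmd chunk options
# live in the header, not the body:
#
# ```{r sales-plot, fig.cap = "Highway MPG by Engine Size", fig.id = "sales-plot"}
library(ggplot2)

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  labs(x = "Engine Displacement (L)", y = "Highway MPG")
# ```
#
# In the report text, reference it with: Figure \@ref(fig:sales-plot)
```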
<h2 id="step-5-add-tables">Step 6: Add Tables</h2>
<p>The next step is adding tables to your document.</p>
<p><img src="/assets/officedown_06_tables.jpg" alt="Tables Word Doc Officedown" /></p>
<p class="text-center date"> <a href="https://learn.business-science.io/r-tips-newsletter" target="_blank"><strong>Get the code.</strong></a> </p>
<p>And here’s what it looks like in the Word Report.</p>
<p><img src="/assets/officedown_06_tables_word_doc.jpg" alt="Tables Word Doc Officedown" /></p>
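<p>A minimal sketch of a Word-ready table using <code class="language-plaintext highlighter-rouge">flextable</code> (another package from the same officeverse; the data used here is just the built-in <code class="language-plaintext highlighter-rouge">mtcars</code> as an example). With officedown, chunk options like <code class="language-plaintext highlighter-rouge">tab.id</code> and <code class="language-plaintext highlighter-rouge">tab.cap</code> give the table a numbered caption you can reference with <code class="language-plaintext highlighter-rouge">\@ref(tab:...)</code>:</p>

```r
library(flextable)

# Turn a data frame into a formatted Word table
ft <- flextable(head(mtcars[, 1:4]))
ft <- autofit(ft)  # size columns to fit their contents
ft
```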
<h2 id="step-6-knit-the-report">Step 7: Knit the Report</h2>
<p>The final step is to click the “Knit” button.</p>
<p><img src="/assets/officedown_08_officedown_knit.jpg" alt="Knit Word Doc Officedown" /></p>
<p class="text-center date"> <a href="https://learn.business-science.io/r-tips-newsletter" target="_blank"><strong>Get the code.</strong></a> </p>
<p>Voilà! You get a professional report:</p>
<p><img src="/assets/officedown_word_report.jpg" alt="Microsoft Word Report Made with R" /></p>
<h1 id="bonus-steal-my-officedown-template">Bonus: Steal My Officedown Template</h1>
<p>Want to speed up the process? You can steal my Officedown Template. All you need to do is <a href="https://learn.business-science.io/r-tips-newsletter">subscribe to my R-Tips Newsletter</a>.</p>
<p><img src="/assets/officedown_07_officedown_template.jpg" alt="Officedown Template" /></p>
<p class="text-center date"> <a href="https://learn.business-science.io/r-tips-newsletter" target="_blank">Steal My Officedown MS Word Template.</a> </p>
<p>Once you register, you’ll get instructions to download all of the R-Tips.</p>
<p>The Officedown Word Template is located in the folder <code class="language-plaintext highlighter-rouge">058_ms_word_reports</code>.</p>
<h1 id="-conclusions">💡 Conclusions</h1>
<p>You learned how to use the <code class="language-plaintext highlighter-rouge">officedown</code> library to create a professional-looking Microsoft Word Report. Great work! <strong>But, there’s a lot more to becoming a data scientist.</strong></p>
<p>If you’d like to become a Business Data Scientist (and have an awesome career, improve your quality of life, enjoy your job, and all the fun that comes along), then I can help with that.</p>
<h1 id="struggling-to-become-a-data-scientist">Struggling to become a data scientist?</h1>
<p>You know the feeling. Being unhappy with your current job.</p>
<p>Promotions aren’t happening. You’re stuck. Feeling Hopeless. Confused…</p>
<p>And you’re praying that the next job interview will go better than the last 12…</p>
<p>… But you know it won’t. Not unless you take control of your career.</p>
<p>The good news is…</p>
<h1 id="i-can-help-you-speed-it-up">I Can Help You Speed It Up.</h1>
<p>I’ve helped 6,107+ students learn data science for business from an elite business consultant’s perspective.</p>
<p>I’ve worked with Fortune 500 companies like S&P Global, Apple, MRM McCann, and more.</p>
<p>And I built a training program that gets my students life-changing data science careers (don’t believe me? <a href="https://university.business-science.io/p/5-course-bundle-machine-learning-web-apps-time-series/">see my testimonials here</a>):</p>
<h4 class="text-center">
6-Figure Data Science Job at CVS Health ($125K)<br /><div style="height:10px;"></div>
Senior VP Of Analytics At JP Morgan ($200K)<br /><div style="height:10px;"></div>
50%+ Raises & Promotions ($150K)<br /><div style="height:10px;"></div>
Lead Data Scientist at Northwestern Mutual ($175K)<br /><div style="height:10px;"></div>
2X-ed Salary (From $60K to $120K)<br /><div style="height:10px;"></div>
2 Competing ML Job Offers ($150K)<br /><div style="height:10px;"></div>
Promotion to Lead Data Scientist ($175K)<br /><div style="height:10px;"></div>
Data Scientist Job at Verizon ($125K+)<br /><div style="height:10px;"></div>
Data Scientist Job at CitiBank ($100K + Bonus)<br /><div style="height:10px;"></div>
</h4>
<h1 id="whenever-you-are-ready-heres-the-system-they-are-taking">Whenever you are ready, here’s the system they are taking:</h1>
<p><a href="https://university.business-science.io/p/5-course-bundle-machine-learning-web-apps-time-series">Here’s the system</a> that has gotten aspiring data scientists, career transitioners, and lifelong learners data science jobs and promotions…</p>
<p><img src="/assets/rtrack_what_theyre_doing_2.jpg" alt="What They're Doing - 5 Course R-Track" /></p>
<p style="font-size: 36px;text-align: center;">
<a href="https://university.business-science.io/p/5-course-bundle-machine-learning-web-apps-time-series">
<strong>Join My 5-Course R-Track Program Now!</strong><br /><small style="font-size:24px;">(And Become The Data Scientist You Were Meant To Be...)</small>
</a>
</p>
<p>P.S. - Samantha landed her NEW Data Science R Developer job at CVS Health (Fortune 500). <a href="https://university.business-science.io/p/5-course-bundle-machine-learning-web-apps-time-series">This could be you.</a></p>
<p><img src="/img/success_samantha_got_job.jpg" alt="Success Samantha Got The Job" /></p>
<h1>XGBoost: Tuning the Hyperparameters (My Secret 2 Step Process in R)</h1>
<p class="date text-center">2024-01-12</p>
<p>Hey guys, welcome back to my <a href="https://learn.business-science.io/r-tips-newsletter">R-tips newsletter</a>. For years, I was hyperparameter tuning XGBoost models wrong. In 3 minutes, I’ll share one secret that took me 3 years to figure out. When I did, it cut my training time 10X. Let’s dive in.</p>
<h3 id="table-of-contents">Table of Contents</h3>
<p>Here’s what you’re learning today:</p>
<ul>
<li><strong>My big mistake</strong> I’ll explain what I was doing wrong for 3 years. And how I fixed it.</li>
<li><strong>How I Hyperparameter Tune XGBoost Models Now in R</strong>. This will blow your mind.</li>
</ul>
<p><img src="/assets/076_get_the_r_code.jpg" alt="XGBoost R Code" /></p>
<p class="text-center date"><a href="https://learn.business-science.io/r-tips-newsletter?el=website" target="_blank">Get the Code (In the R-Tip 076 Folder)</a></p>
<hr />
<h1 id="r-tips-weekly">R-Tips Weekly</h1>
<p>This article is part of R-Tips Weekly, a <a href="https://learn.business-science.io/r-tips-newsletter?el=website" target="_blank">weekly video tutorial</a> that shows you step-by-step how to do common R coding tasks. Pretty cool, right?</p>
<p>Here are the links to get set up. 👇</p>
<ul>
<li><a href="https://learn.business-science.io/r-tips-newsletter?el=website" target="_blank">Sign up for our R-Tips Newsletter and get the code.</a></li>
</ul>
<h1 id="for-years-i-was-hyperparameter-tuning-xgboost-wrong-heres-how-i-do-it-now">For years I was hyperparameter tuning XGBoost wrong. Here’s how I do it now.</h1>
<p>First, here’s a quick review of XGBoost and the algorithm’s hyperparameters.</p>
<p><img src="/assets/076_xgboost_hyperparameters.jpg" alt="XGBoost Hyperparameter Tuning" /></p>
<h3 id="what-is-xgboost">What is XGBoost?</h3>
<p>XGBoost (eXtreme Gradient Boosting) is a popular machine learning algorithm, especially for structured (tabular) data. Its claim to fame is winning tons of Kaggle competitions. More importantly, it’s fast, accurate, and easy to use. It’s also easy to screw up.</p>
<h3 id="hyperparameter-tuning">Hyperparameter Tuning</h3>
<p>To stabilize your XGBoost models, you need to perform hyperparameter tuning. Otherwise, XGBoost can overfit your data, causing predictions to be horribly wrong on out-of-sample data.</p>
<h3 id="my-3-year-beginner-mistake">My 3-Year “Beginner” Mistake:</h3>
<p><strong>XGBoost has tons of parameters.</strong> The mistake I was making was treating all of the parameters equally. This cost me hours of tuning my models. And my results weren’t half as good as they were once I started doing this.</p>
<h3 id="how-i-improved-my-hyperparameter-tuning">How I improved my hyperparameter tuning:</h3>
<p>XGBoost has one parameter that rules them all. And after 3 years, I noticed that model stability was 80% driven by this parameter. What was it?</p>
<p><strong>Learning rate.</strong> When I figured this out that’s when things started to change. My models got better. My training times were reduced. Win win.</p>
<h3 id="my-simple-2-step-hyperparameter-tuning-method-for-xgboost">My Simple 2 Step Hyperparameter Tuning Method for XGBoost:</h3>
<p>What I had been doing wrong was running a random grid search over all of the parameters at once. This took hours. So I made a key change: I began isolating the learning rate and tuning it first. This was Step 1. The search space for a single parameter is super fast to tune.</p>
<p><strong>What about the other parameters?</strong> Once learning rate was tuned, I then opened the search space to more parameters. This is Step 2. The rest of the parameters have maybe 20% contribution to performance, so that means I can reduce the search space dramatically.</p>
<h3 id="the-big-benefit">The BIG benefit:</h3>
<p>Separating tuning into 2 steps cut my training times by a factor of 10X. And my models actually became better. Faster training, better models. Win win.</p>
<h1 id="xgboost-hyperparameter-tuning-how-to-do-my-2-step-process-in-r">XGBoost Hyperparameter Tuning (how to do my 2 step process in <code class="language-plaintext highlighter-rouge">R</code>)</h1>
<p>Now that you know the secret, let’s see how to do it in <code class="language-plaintext highlighter-rouge">R</code>.</p>
<h3 id="r-code">R Code</h3>
<p><strong>Get The Code:</strong> You can follow along with the R code in the <a href="https://learn.business-science.io/r-tips-newsletter?el=website">R-Tips Newsletter</a>. <strong>All code is available in R-Tip 076.</strong></p>
<p><img src="/assets/076_get_the_r_code.jpg" alt="R Code" /></p>
<p class="text-center date"><a href="https://learn.business-science.io/r-tips-newsletter?el=website" target="_blank">Get the Code (In the R-Tip 076 Folder)</a></p>
<h2 id="load-the-libraries-and-data">Load the Libraries and Data</h2>
<p>First, we load the libraries and data. Run this code from the <a href="https://learn.business-science.io/r-tips-newsletter?el=website">R-Tips Newsletter 076 Folder</a>.</p>
<p><img src="/assets/076_1_libraries_data.jpg" alt="Libraries and Data" /></p>
<p>This loads in the customer churn dataset. We’ll use this to demonstrate the 2 step process.</p>
<p><img src="/assets/076_2_churn_data.jpg" alt="Customer Churn Data" /></p>
<h2 id="set-up-a-model-and-preprocessor-specification">Set up a Model and Preprocessor Specification</h2>
<p>This is from <code class="language-plaintext highlighter-rouge">tidymodels</code>. We’ll use this to set up our model and preprocessing specification. Run this code from the <a href="https://learn.business-science.io/r-tips-newsletter?el=website">R-Tips Newsletter 076 Folder</a>.</p>
<p><strong>Important:</strong> We only specify the <code class="language-plaintext highlighter-rouge">learn_rate = tune()</code> as the only tuning parameter right now. This is Step 1. We’ll add more parameters in Step 2.</p>
<p><img src="/assets/076_3_model_and_preprocessor.jpg" alt="Model and Preprocessor" /></p>
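<p>Here’s a hedged sketch of what such a <code class="language-plaintext highlighter-rouge">tidymodels</code> specification can look like. The <code class="language-plaintext highlighter-rouge">churn_data</code> data frame and its <code class="language-plaintext highlighter-rouge">churn</code> outcome column are hypothetical names standing in for the dataset from the R-Tip folder:</p>

```r
library(tidymodels)

# Step 1 spec: learn_rate is the ONLY parameter marked for tuning
model_spec <- boost_tree(
  mode       = "classification",
  learn_rate = tune()
) %>%
  set_engine("xgboost")

# Preprocessing: xgboost needs numeric predictors, so dummy-encode factors
recipe_spec <- recipe(churn ~ ., data = churn_data) %>%
  step_dummy(all_nominal_predictors())
```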
<h2 id="step-1-tuning-the-learn-rate">Step 1: Tuning the Learn Rate</h2>
<p>For the first stage, we tune the learn rate. This is the most important parameter. Run this code from the <a href="https://learn.business-science.io/r-tips-newsletter?el=website">R-Tips Newsletter 076 Folder</a>.</p>
<p><img src="/assets/076_4_tune_learn_rate.jpg" alt="Tune Learn Rate" /></p>
<p>In the code above:</p>
<ol>
<li>You make a Tuning Grid specifying 10 values for the learn rate.</li>
<li>You set up the Workflow using the model and preprocessing specification.</li>
<li>You set up the Resampling Specification using 5-fold cross validation. Then tune the learn rate using the <code class="language-plaintext highlighter-rouge">tune_grid()</code> function and optimizing for the maximum ROC AUC value.</li>
</ol>
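<p>The three steps above can be sketched like this, assuming the <code class="language-plaintext highlighter-rouge">model_spec</code>, <code class="language-plaintext highlighter-rouge">recipe_spec</code>, and <code class="language-plaintext highlighter-rouge">churn_data</code> names from the previous step (all hypothetical; the newsletter folder has the exact code):</p>

```r
set.seed(123)

# 1. Tuning grid: 10 candidate values for the learning rate only
grid_step_1 <- grid_regular(learn_rate(), levels = 10)

# 2. Workflow: model + preprocessing
wflw <- workflow() %>%
  add_model(model_spec) %>%
  add_recipe(recipe_spec)

# 3. 5-fold cross validation, optimizing ROC AUC
resamples <- vfold_cv(churn_data, v = 5)

tuned_step_1 <- tune_grid(
  wflw,
  resamples = resamples,
  grid      = grid_step_1,
  metrics   = metric_set(roc_auc)
)

# Ranked results
show_best(tuned_step_1, metric = "roc_auc")
```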
<p>The last line of code returns the ranked results. You can see that the best learn rate is 2.91e-2.</p>
<p><img src="/assets/076_5_rankings.jpg" alt="Tune Learn Rate Results" /></p>
<h2 id="step-2-tuning-the-rest-of-the-parameters">Step 2: Tuning the Rest of the Parameters</h2>
<p>Now that we have the learn rate, we can tune the rest of the parameters. Run this code from the <a href="https://learn.business-science.io/r-tips-newsletter?el=website">R-Tips Newsletter 076 Folder</a>.</p>
<p><img src="/assets/076_6_tune_other_params.jpg" alt="Tune Rest of Parameters" /></p>
<p>In the code above:</p>
<ol>
<li>Get the best learn rate from step 1</li>
<li>Update the model specification with the best learn rate and the other parameters to tune.</li>
<li>Make a new grid with 10 combinations of the new tuning parameters</li>
<li>Tune the model using the new grid and the same resampling specification as before.</li>
</ol>
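<p>A sketch of those four steps, reusing the hypothetical object names from Step 1 (<code class="language-plaintext highlighter-rouge">tuned_step_1</code>, <code class="language-plaintext highlighter-rouge">wflw</code>, <code class="language-plaintext highlighter-rouge">resamples</code>); the parameter set chosen here is illustrative:</p>

```r
# 1. Best learn rate from Step 1
best_learn_rate <- select_best(tuned_step_1, metric = "roc_auc")$learn_rate

# 2. Updated spec: learn rate is now FIXED, other parameters open for tuning
model_spec_2 <- boost_tree(
  mode           = "classification",
  learn_rate     = best_learn_rate,
  trees          = tune(),
  tree_depth     = tune(),
  min_n          = tune(),
  loss_reduction = tune()
) %>%
  set_engine("xgboost")

wflw_2 <- wflw %>% update_model(model_spec_2)

# 3. Small space-filling grid: 10 combinations over the new parameters
grid_step_2 <- grid_latin_hypercube(
  trees(), tree_depth(), min_n(), loss_reduction(),
  size = 10
)

# 4. Same resamples as before
tuned_step_2 <- tune_grid(
  wflw_2,
  resamples = resamples,
  grid      = grid_step_2,
  metrics   = metric_set(roc_auc)
)

show_best(tuned_step_2, metric = "roc_auc")
```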
<p>The last line of code returns the ranked results. You can see that the best AUC is still 0.839, which is what we obtained before.</p>
<p><img src="/assets/076_7_rankings.jpg" alt="Tune Rest of Parameters Results" /></p>
<h2 id="bonus-code-finalize-the-model">Bonus Code: Finalize the Model</h2>
<p>Now that we have the best parameters, we can finalize the model. Run this code from the <a href="https://learn.business-science.io/r-tips-newsletter?el=website">R-Tips Newsletter 076 Folder</a>.</p>
<p><img src="/assets/076_8_bonus_code.jpg" alt="Bonus Code" /></p>
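<p>The finalization step can be sketched as follows, again assuming the hypothetical names from the earlier sketches; in practice you’d fit on training data and predict on a held-out set:</p>

```r
# Lock the best Step 2 parameters into the workflow, then fit
final_wflw <- wflw_2 %>%
  finalize_workflow(select_best(tuned_step_2, metric = "roc_auc"))

final_fit <- fit(final_wflw, data = churn_data)

# Class probabilities for new observations
predict(final_fit, new_data = churn_data, type = "prob")
```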
<h1 id="conclusions">Conclusions:</h1>
<p>You’ve learned my secret 2 step process for tuning XGBoost models in R. But there’s a lot more to becoming an elite data scientist.</p>
<h1>The Top 5 Time Series Analysis Concepts (that helped me the most in my career)</h1>
<p class="date text-center">2023-12-23</p>
<p>Hey guys, welcome back to my <a href="https://learn.business-science.io/r-tips-newsletter">R-tips newsletter</a>. Time series analysis has been critical in my career, but it took me 3 years to get comfortable with it. In today’s R-Tip, I’ll share 3 years of time series experience in 3 minutes. Let’s go!</p>
<h3 id="table-of-contents">Table of Contents</h3>
<p>Here’s what you’re learning today:</p>
<ul>
<li><strong>What is Time Series Analysis?</strong> I’ll explain what time series analysis is and why it was important to me to learn it.</li>
<li><strong>The 5 Concepts that Helped Me the Most in My Career</strong>. The time series concepts that made the biggest difference for me.</li>
<li><strong>How to Make the 5 Top Time Series Visualizations in 5 lines of R code</strong>. A quick demo of each one.</li>
</ul>
<p><img src="/assets/075_time_series_analysis.jpg" alt="Statistical Test Selection" /></p>
<p class="date text-center">Time Series Analysis (Top 5 Visualizations)</p>
<hr />
<h1 id="r-tips-weekly">R-Tips Weekly</h1>
<p>This article is part of R-Tips Weekly, a <a href="https://learn.business-science.io/r-tips-newsletter?el=website" target="_blank">weekly video tutorial</a> that shows you step-by-step how to do common R coding tasks. Pretty cool, right?</p>
<p>Here are the links to get set up. 👇</p>
<ul>
<li><a href="https://learn.business-science.io/r-tips-newsletter?el=website" target="_blank">Sign up for our R-Tips Newsletter and get the code.</a></li>
<!-- <li><a href="https://youtu.be/fkwKQi7skAw">YouTube Tutorial</a></li>-->
</ul>
<h1 id="my-dirty-little-secret-how-i-2x-ed-my-salary-in-3-years-using-time-series-analysis">My dirty little secret… How I 2x-ed my salary in 3 years using Time Series Analysis</h1>
<p>It was 2015. I was working for a manufacturer that supplied products to Oil and Gas. The company was struggling: the price of oil had dropped from $100 to $30, leading to the worst performance in over two decades.</p>
<p><img src="/assets/075_price_of_oil.jpg" alt="Oil Price and Product Sales" /></p>
<p>I was working on a project to forecast product sales when I stumbled upon something. I found that the sales for certain products were highly correlated with the 3-month lag of the price of oil.</p>
<p>With this information, I was able to forecast sales for the next 3 months with 95% accuracy. This was a game-changer for the company. We were able to forecast sales and adjust production to meet demand.</p>
<p><strong>Impact on my career:</strong> This led to 3 promotions in 3 years. I went from Manager of Sales to Director of Sales and Engineering, leading a 60+ person sales team when I left in 2018. But I had a secret…</p>
<p><strong>My dirty little secret:</strong> Behind the scenes I was using R and time series analysis to get ahead in my career. Specifically, I used autocorrelation and partial autocorrelation to find the signal. The same techniques that you’re learning today.</p>
<h1 id="what-is-time-series-analysis">What is Time Series Analysis?</h1>
<p>Time series analysis is a statistical technique that deals with time-ordered data points. It’s commonly used to analyze and interpret trends, patterns, and relationships within data that is recorded over time (e.g. with timestamps).</p>
<h2 id="uses-in-business">Uses in Business</h2>
<p>Understanding and applying time series analysis concepts is critical for <strong>forecasting, detecting anomalies, and drawing insights on data that varies over time.</strong></p>
<p><strong>Time series data is everywhere.</strong> Anything with a timestamp is a time series. Product sales, website traffic, stock prices, and weather data are all examples of time series data. It is used in many industries including finance, retail, marketing, and manufacturing.</p>
<p><strong>Time series analysis is important because it allows us to understand the past and predict the future.</strong> That is why it shows up again and again across the industries listed above.</p>
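<p>As a minimal illustration of time-ordered data in R, base R’s <code class="language-plaintext highlighter-rouge">ts</code> class attaches a start period and frequency to a plain numeric vector. The sales figures below are made up purely for illustration:</p>

```r
# Hypothetical monthly sales figures turned into a time series object.
# frequency = 12 marks the data as monthly; start = c(2022, 1) is Jan 2022.
sales <- c(200, 210, 215, 230, 228, 240, 250, 245, 260, 255, 270, 290)
sales_ts <- ts(sales, start = c(2022, 1), frequency = 12)

frequency(sales_ts)  # monthly frequency
start(sales_ts)      # first observation's period
```

<p>Once data carries its time index like this, the visualization and decomposition tools below all work on it directly.</p>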
<h1 id="the-5-concepts-that-helped-me-the-most-in-my-career-and-how-to-do-them-in-r">The 5 Concepts that Helped Me the Most in My Career (and how to do them in <code class="language-plaintext highlighter-rouge">R</code>)</h1>
<p><img src="/assets/075_time_series_analysis.jpg" alt="Time Series Visualizations" /></p>
<p class="date text-center">The 5 Concepts that helped me the most</p>
<h2 id="r-code">R Code</h2>
<p><strong>Get The Code:</strong> You can follow along with the R code in the <a href="https://learn.business-science.io/r-tips-newsletter?el=website">R-Tips Newsletter</a>. <strong>All code is available in the R-Tip 075 folder.</strong></p>
<h2 id="1-visualizing-time-series-data">1. Visualizing Time Series Data</h2>
<p>Visualizing the series is where all of my time series analysis starts: it’s the first step in understanding the data.</p>
<p><img src="/assets/075_time_series_visualization.jpg" alt="Time Series Visualizations" /></p>
<h4 id="r-code-to-make-this-plot"><code class="language-plaintext highlighter-rouge">R</code> code to make this plot:</h4>
<p>The main functions come from <code class="language-plaintext highlighter-rouge">timetk</code>. Full disclosure- I’m the author of <code class="language-plaintext highlighter-rouge">timetk</code>. I created <code class="language-plaintext highlighter-rouge">timetk</code> to make time series analysis easier.</p>
<p><img src="/assets/075_01_time_series_code.jpg" alt="Time Series Plot Code" /></p>
<p class="text-center date"><a href="https://learn.business-science.io/r-tips-newsletter?el=website" target="_blank">Get the Code (In the R-Tip 075 Folder)</a></p>
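<p>The screenshot above uses <code class="language-plaintext highlighter-rouge">timetk::plot_time_series()</code>; as a rough base-R stand-in (with simulated, made-up data rather than the newsletter dataset), the same first step looks like this:</p>

```r
# Simulated daily series with trend + weekly seasonality (hypothetical data),
# plotted as a simple line chart -- the first step of any time series analysis.
set.seed(1)
n     <- 365
dates <- seq(as.Date("2023-01-01"), by = "day", length.out = n)
value <- 100 + 0.1 * seq_len(n) +              # upward trend
         10 * sin(2 * pi * seq_len(n) / 7) +   # weekly cycle
         rnorm(n, sd = 5)                      # noise

plot(dates, value, type = "l",
     main = "Daily Value Over Time", xlab = "Date", ylab = "Value")
```

<p>The trend and weekly cycle are visible immediately in the line chart, which is exactly why visualization comes first.</p>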
<h2 id="time-series-is-noisy-finding-the-signal">2. Time Series is Noisy (Finding the Signal)</h2>
<p>Often, time series data is noisy. We can use smoothing to find the signal. LOESS smoothing is a technique that uses local regression to smooth out the noise.</p>
<p><img src="/assets/075_time_series_plot_smoother.jpg" alt="Time Series Smoothing" /></p>
<h4 id="r-code-to-make-visualization-2"><code class="language-plaintext highlighter-rouge">R</code> code to make Visualization 2:</h4>
<p>It’s the same function, but now we set <code class="language-plaintext highlighter-rouge">.smooth = TRUE</code>. You can adjust the smoother span to get different amounts of smoothing.</p>
<p><img src="/assets/075_time_series_smoother_code.jpg" alt="Time Series Smoothing Code" /></p>
<p class="text-center date"><a href="https://learn.business-science.io/r-tips-newsletter?el=website" target="_blank">Get the Code (In the R-Tip 075 Folder)</a></p>
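<p>The smoothing idea itself can be reproduced with base R’s <code class="language-plaintext highlighter-rouge">lowess()</code>, where the <code class="language-plaintext highlighter-rouge">f</code> argument plays the role of the smoother span mentioned above (the data here is simulated for illustration):</p>

```r
# Noisy sine wave: LOESS-style local regression recovers the underlying signal.
set.seed(2)
t      <- seq(0, 4 * pi, length.out = 200)
signal <- sin(t)
noisy  <- signal + rnorm(length(t), sd = 0.5)

sm <- lowess(t, noisy, f = 0.15)  # f = smoother span; larger f = smoother line

# Smoothing should land closer to the true signal than the raw noise does
rmse_raw    <- sqrt(mean((noisy - signal)^2))
rmse_smooth <- sqrt(mean((sm$y  - signal)^2))
```

<p>Shrinking or growing <code class="language-plaintext highlighter-rouge">f</code> trades responsiveness for smoothness, which is the same trade-off the smoother span controls in the plot above.</p>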
<h2 id="3-autocorrelation-and-partial-autocorrelation">3. Autocorrelation and Partial Autocorrelation</h2>
<p><img src="/assets/075_autocorrelation.jpg" alt="Autocorrelation and Partial Autocorrelation" /></p>
<p><strong>Autocorrelation:</strong> This refers to the correlation of a time series with its own past and future values. It measures the relationship (correlation) between a variable’s current value and its past values.</p>
<p><strong>Partial Autocorrelation:</strong> Autocorrelation has a problem: some of the correlation is confounded by earlier lags. Enter partial autocorrelation, which removes the correlation effect of the earlier lags. For this time series, we can see that Lags 1 and 6 are the most important.</p>
<h4 id="r-code-to-make-this-plot-1"><code class="language-plaintext highlighter-rouge">R</code> Code to make this plot:</h4>
<p><img src="/assets/075_autocorrelation_code.jpg" alt="Autocorrelation and Partial Autocorrelation Code" /></p>
<p class="text-center date"><a href="https://learn.business-science.io/r-tips-newsletter?el=website" target="_blank">Get the Code (In the R-Tip 075 Folder)</a></p>
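<p>Base R’s <code class="language-plaintext highlighter-rouge">acf()</code> and <code class="language-plaintext highlighter-rouge">pacf()</code> (from the stats package) compute the same diagnostics shown in the plot above. On a simulated AR(1) series, the autocorrelation decays gradually while the partial autocorrelation is large only at lag 1:</p>

```r
# AR(1) process with coefficient 0.7: autocorrelation decays geometrically,
# partial autocorrelation cuts off after lag 1.
set.seed(123)
x <- as.numeric(arima.sim(model = list(ar = 0.7), n = 500))

acf_vals  <- drop(acf(x,  lag.max = 10, plot = FALSE)$acf)  # index 1 = lag 0
pacf_vals <- drop(pacf(x, lag.max = 10, plot = FALSE)$acf)  # index 1 = lag 1
```

<p>Dropping <code class="language-plaintext highlighter-rouge">plot = FALSE</code> draws the familiar stick plots; the numeric values are what you inspect to find which lags carry signal.</p>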
<h2 id="4-seasonal-decomposition">4. Seasonal Decomposition</h2>
<p><img src="/assets/075_seasonal_decomposition.jpg" alt="Seasonal Decomposition" /></p>
<p>Seasonal decomposition splits a time series into three components: <strong>trend, seasonal, and residual (irregular)</strong>. STL stands for Seasonal-Trend decomposition using LOESS.</p>
<p><strong>It uses a LOESS smoother</strong> to estimate the trend and seasonal effects. STL is flexible and can handle changing seasonality, not just fixed seasonal effects.</p>
<p><strong>The residuals</strong> can be analyzed for outliers since they have been de-trended and de-seasonalized.</p>
<h4 id="r-code-to-make-this-plot-2"><code class="language-plaintext highlighter-rouge">R</code> Code to make this plot:</h4>
<p><img src="/assets/075_seasonal_decomposition_code.jpg" alt="Seasonal Decomposition Code" /></p>
<p class="text-center date"><a href="https://learn.business-science.io/r-tips-newsletter?el=website" target="_blank">Get the Code (In the R-Tip 075 Folder)</a></p>
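<p>Base R’s <code class="language-plaintext highlighter-rouge">stl()</code> (stats package) performs this same decomposition. On the built-in monthly <code class="language-plaintext highlighter-rouge">co2</code> series, the three components add back up to the original data exactly, which is what makes the residuals safe to analyze for outliers:</p>

```r
# STL decomposition of the built-in monthly CO2 concentration series
dec        <- stl(co2, s.window = "periodic")
components <- dec$time.series      # columns: seasonal, trend, remainder

# trend + seasonal + remainder reconstructs the original series exactly
recon <- rowSums(components)
max(abs(recon - as.numeric(co2)))  # numerically zero
```

<p><code class="language-plaintext highlighter-rouge">s.window = "periodic"</code> assumes a fixed seasonal pattern; setting it to an odd number of periods instead lets the seasonal component evolve over time.</p>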
<h2 id="5-calendar-effects">5. Calendar Effects</h2>
<p><img src="/assets/075_calendar_effects.jpg" alt="Calendar Effects" /></p>
<p>Calendar effects refer to variations in a time series that can be attributed to the calendar itself. This can include effects due to day of the week, month of the year, or holidays tied to the calendar.</p>
<h4 id="r-code-to-make-this-plot-3"><code class="language-plaintext highlighter-rouge">R</code> Code to make this plot:</h4>
<p><img src="/assets/075_calendar_effects_code.jpg" alt="Calendar Effects Code" /></p>
<p class="text-center date"><a href="https://learn.business-science.io/r-tips-newsletter?el=website" target="_blank">Get the Code (In the R-Tip 075 Folder)</a></p>
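<p>A minimal base-R version of a calendar-effects check, using made-up data: tag each date with its day of week and compare average values across the week.</p>

```r
# Simulated daily bookings with a weekend bump (hypothetical data)
set.seed(42)
dates <- seq(as.Date("2023-01-02"), by = "day", length.out = 7 * 52)  # Monday start
dow   <- as.integer(format(dates, "%u"))     # 1 = Monday ... 7 = Sunday
bookings <- rnorm(length(dates), mean = 100, sd = 10) +
            ifelse(dow >= 6, 30, 0)          # +30 on Sat/Sun

# Average bookings by day of week reveals the calendar effect
by_dow <- tapply(bookings, dow, mean)
```

<p>The Saturday/Sunday averages stand clearly above the weekday averages, which is exactly the kind of pattern a calendar-effects plot surfaces.</p>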
<h1 id="conclusions">Conclusions:</h1>
<p>You’ve learned the 5 concepts that helped me the most in my career. And the best part is that you can do all of this in 5 lines of R code.</p>
<p>Here’s another little secret, I teach these concepts plus others in just Module 1 of 18 in my <a href="https://university.business-science.io/p/ds4b-203-r-high-performance-time-series-forecasting?el=website">High-Performance Time Series Course</a>.</p>
<p><strong>However, there is A LOT more to becoming an expert in time series for your company.</strong></p>
<p>If you want to become a Time Series Expert for your company, then please read on…</p>
<h2 id="take-the-high-performance-forecasting-course">Take the High-Performance Forecasting Course</h2>
<blockquote>
<p>Become the forecasting expert for your organization</p>
</blockquote>
<p><a href="https://university.business-science.io/p/ds4b-203-r-high-performance-time-series-forecasting?el=website" target="_blank"><img src="https://www.filepicker.io/api/file/bKyqVAi5Qi64sS05QYLk" alt="High-Performance Time Series Forecasting Course" width="100%" style="box-shadow: 0 0 5px 2px rgba(0, 0, 0, .5);" /></a></p>
<p><a href="https://university.business-science.io/p/ds4b-203-r-high-performance-time-series-forecasting?el=website"><em>High-Performance Time Series
Course</em></a></p>
<h3 id="time-series-is-changing">Time Series is Changing</h3>
<p>Time series is changing. <strong>Businesses now need 10,000+ time series
forecasts every day.</strong> This is what I call a <em>High-Performance Time
Series Forecasting System (HPTSF)</em> - Accurate, Robust, and Scalable
Forecasting.</p>
<p><strong>High-Performance Forecasting Systems will save companies by improving
accuracy and scalability.</strong> Imagine what will happen to your career if
you can provide your organization a “High-Performance Time Series
Forecasting System” (HPTSF System).</p>
<h3 id="how-to-learn-high-performance-time-series-forecasting">How to Learn High-Performance Time Series Forecasting</h3>
<p>I teach how to build a HPTFS System in my <a href="https://university.business-science.io/p/ds4b-203-r-high-performance-time-series-forecasting?el=website"><strong>High-Performance Time
Series Forecasting
Course</strong></a>.
You will learn:</p>
<ul>
<li><strong>Time Series Machine Learning</strong> (cutting-edge) with <code class="language-plaintext highlighter-rouge">Modeltime</code> - 30+
Models (Prophet, ARIMA, XGBoost, Random Forest, & many more)</li>
<li><strong>Deep Learning</strong> with <code class="language-plaintext highlighter-rouge">GluonTS</code> (Competition Winners)</li>
<li><strong>Time Series Preprocessing</strong>, Noise Reduction, & Anomaly Detection</li>
<li><strong>Feature engineering</strong> using lagged variables & external regressors</li>
<li><strong>Hyperparameter Tuning</strong></li>
<li><strong>Time series cross-validation</strong></li>
<li><strong>Ensembling</strong> Multiple Machine Learning & Univariate Modeling
Techniques (Competition Winner)</li>
<li><strong>Scalable Forecasting</strong> - Forecast 1000+ time series in parallel</li>
<li>and more.</li>
</ul>
<p class="text-center" style="font-size:24px;">
Become the Time Series Expert for your organization.
</p>
<p><br /></p>
<p class="text-center" style="font-size:30px;">
<a href="https://university.business-science.io/p/ds4b-203-r-high-performance-time-series-forecasting?el=website">Take
the High-Performance Time Series Forecasting Course</a>
</p>
<h1 id="introduction-to-ab-testing-in-r-for-marketing-analytics">Introduction to A/B Testing in R (For Marketing Analytics)</h1>
<p class="date text-center">2023-12-16</p>
<p>Hey guys, welcome back to my <a href="https://learn.business-science.io/r-tips-newsletter">R-tips newsletter</a>. In today’s R-Tip, I’m sharing how to do A/B Testing in R. Let’s go!</p>
<h3 id="table-of-contents">Table of Contents</h3>
<p>Here’s what you’re learning today:</p>
<ul>
<li><strong>What is A/B Testing (and how to pick the right Statistical Test)?</strong> A/B Testing is a statistical method for comparing two groups to determine whether there is a statistically significant difference between them.</li>
<li><strong>Business Case</strong>: We’ll use a business case to demonstrate how to do A/B Testing in R by measuring the effect of Adspend on Hotel Bookings.</li>
<li><strong>R Code</strong>: We’ll walk step-by-step through how to perform A/B Testing in R.</li>
</ul>
<p><img src="/assets/073_statistical_test_selection.jpg" alt="Statistical Test Selection" /></p>
<p class="date text-center">Statistical Test Selection for A/B Testing!</p>
<hr />
<h1 id="this-tutorial-is-part-of-a-1-hour-live-workshop-on-causal-inference-and-ab-testing">This Tutorial is Part of a 1-Hour Live Workshop on Causal Inference and A/B Testing</h1>
<p>If you want to understand A/B Testing, Geo Experimentation, Uplift Modeling, and Causal Inference at a deeper level, <strong>check out this free video</strong>. 👇</p>
<iframe width="100%" height="450" src="https://www.youtube.com/embed/Otb340lyiAQ?si=OcoJ1vI0P-nK1yUe" title="YouTube video player" frameborder="1" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>
<h1 id="what-is-ab-testing">What is A/B Testing?</h1>
<p><strong>A/B Testing</strong> is a statistical method for comparing two groups to determine if there is a statistically significant difference between the two groups.</p>
<h2 id="how-is-ab-testing-used-in-marketing-analytics">How is A/B Testing used in Marketing Analytics?</h2>
<p><strong>A/B Testing is used commonly in Marketing Analytics</strong> to determine if a marketing campaign is effective:</p>
<ul>
<li>For example, a company may want to know if a marketing campaign is effective at <strong>driving sales</strong>.</li>
<li>To do this, they will <strong>run an A/B Test</strong> where they compare the sales of a group that was exposed to the marketing campaign (the treatment group) to the sales of a group that was not exposed to the marketing campaign (the control group).</li>
<li>If there is a <strong>statistically significant difference</strong> between the two groups and a <strong>positive average treatment effect (ATE)</strong>, then the company can conclude that the marketing campaign is effective at driving sales.</li>
<li>And we can estimate the <strong>Lift (the increase in sales)</strong> that the marketing campaign drove.</li>
</ul>
<h2 id="how-to-pick-the-right-statistical-test">How to pick the right Statistical Test?</h2>
<p>There are many different types of statistical tests that can be used for A/B Testing. The type of statistical test that you use depends on the type of data that you have.</p>
<p>The following diagram shows the <strong>different types of statistical tests</strong> that can be used for A/B Testing and the selection process.</p>
<p><img src="/assets/073_statistical_test_selection.jpg" alt="Statistical Test Selection" /></p>
<p class="date text-center">Statistical Test Selection for A/B Testing!</p>
<p>For our business case, we’ll rely on a very common test: <strong>The 2 sample T-Test</strong>, which is used to compare the means of two groups.</p>
<p>For other types of A/B Testing, you may need to use a different type of statistical test depending on the metric you are interested in (e.g. conversion metrics, counts of page views, etc). The table above can be used as a guide to help you select the right statistical test for your A/B Testing needs.</p>
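<p>On simulated data, the 2-sample t-test looks like this in base R (<code class="language-plaintext highlighter-rouge">t.test()</code> from the stats package runs a Welch two-sample test by default; all of the numbers below are made up for illustration):</p>

```r
# Hypothetical bookings for 100 control and 100 treatment observations
set.seed(123)
control   <- rnorm(100, mean = 100, sd = 10)
treatment <- rnorm(100, mean = 110, sd = 10)

res <- t.test(treatment, control)       # Welch two-sample t-test
ate <- mean(treatment) - mean(control)  # average treatment effect

res$p.value   # small p-value: the difference is unlikely to be noise
res$conf.int  # 95% confidence interval on the difference in means
```

<p>Reading the result follows the same pattern as the business case below: the p-value tells you whether the difference is likely real, and the confidence interval bounds how large the effect plausibly is.</p>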
<h2 id="how-to-create-an-experiment">How to create an experiment?</h2>
<p>To create an experiment, you need to have two groups of data: a treatment group and a control group.</p>
<ul>
<li><strong>The treatment group</strong> is the group that is exposed to the marketing campaign.</li>
<li><strong>The control group</strong> is the group that is not exposed to the marketing campaign.</li>
</ul>
<p>Now that we know what A/B Testing is and how it is used in Marketing Analytics, let’s look at an example of how to do A/B Testing in R.</p>
<h1 id="business-case-hotel-bookings-and-return-on-adspend">Business Case: Hotel Bookings and Return on Adspend</h1>
<p>In this example, you are part of the Data Science team working for an upscale hotel chain.</p>
<p><img src="/assets/073_hotel.jpg" alt="Hotel Bookings Business Case" /></p>
<p><strong>Your Mission:</strong>
Your team has been tasked with developing an online experiment to use Google Ads to drive hotel bookings (the action of reserving a room at the hotel). We will use A/B Testing to determine if a marketing campaign is effective at driving hotel bookings.</p>
<h1 id="r-tutorial-ab-testing-in-r">R Tutorial: A/B Testing in R</h1>
<p><strong>Super Important:</strong> We’ll start by trying to answer these business questions that are relevant to our Hotel Bookings business case:</p>
<ol>
<li><strong>Does Adspend increase bookings?</strong></li>
<li><strong>By how much? Was there a Return on Adspend (ROAS)?</strong></li>
</ol>
<p>These questions drive our experiment setup and analysis (more on this in a minute).</p>
<p><strong>Get The Code:</strong> You can follow along with the R code in the <a href="https://learn.business-science.io/r-tips-newsletter?el=website">R-Tips Newsletter</a>. All code is available in the R-Tip 073 folder.</p>
<h2 id="step-1-load-the-libraries-and-data">Step 1: Load the Libraries and Data</h2>
<p><img src="/assets/073_01_libraries.jpg" alt="A/B Testing: Load the Libraries and Data" /></p>
<p class="text-center date"><a href="https://learn.business-science.io/r-tips-newsletter?el=website" target="_blank">Get the Code (In the R-Tip 073 Folder)</a></p>
<h3 id="experiment-setup-data-description">Experiment Setup (Data Description):</h3>
<p>When you load the data, it looks like this:</p>
<p><img src="/assets/073_01_data.jpg" alt="A/B Testing: Data" /></p>
<p class="text-center date"><a href="https://learn.business-science.io/r-tips-newsletter?el=website" target="_blank">Get the Data (In the R-Tip 073 Folder)</a></p>
<p>The data contains the following columns:</p>
<ul>
<li>period: 0 = outside the experiment window (pre-period), 1 = during the experiment</li>
<li>assignment: “control” if part of the control group, “treatment” if part of the treatment group</li>
<li>treatment: 0 = no Adspend, 1 = Adspend</li>
<li>geo: segmentation was performed by geography (this is common in marketing experiments to track pre- and post-experiment performance)</li>
<li>bookings: the target feature that we want to measure the effect of Adspend on</li>
<li>cost: Adspend (the amount of money spent on the marketing campaign during the experiment period, period = 1)</li>
</ul>
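<p>A toy version of that structure (hypothetical values, not the newsletter dataset) can be built in base R to make the columns concrete:</p>

```r
# Two geos x two periods; the treatment geo receives adspend only in period 1
experiment <- data.frame(
  period     = c(0, 0, 1, 1),
  assignment = c("control", "treatment", "control", "treatment"),
  treatment  = c(0, 0, 0, 1),
  geo        = c("geo_A", "geo_B", "geo_A", "geo_B"),
  bookings   = c(1000, 980, 1010, 1150),
  cost       = c(0, 0, 0, 50)
)

# Only treated rows in the experiment period carry adspend
subset(experiment, period == 1 & treatment == 1)
```

<p>Filtering on <code class="language-plaintext highlighter-rouge">period</code> and <code class="language-plaintext highlighter-rouge">treatment</code> like this is the same split performed in Step 3 below.</p>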
<h2 id="step-2-visualize-the-data">Step 2: Visualize the Data</h2>
<p>Next, we will visualize the aggregate bookings by period for the control and treatment group to see if we can spot any visual effect of the adspend.</p>
<ul>
<li><strong>The Pre-Intervention Period (Period = 0)</strong> is from 2015-01-05 to 2015-02-15</li>
<li><strong>The Post Intervention Period (Period = 1)</strong> is from 2015-02-16 to 2015-03-15 (This is when the experiment was run)</li>
</ul>
<h3 id="data-visualization-code">Data Visualization Code</h3>
<p>Run this code to visualize the experiment:</p>
<p><img src="/assets/073_02_exploratory_code.jpg" alt="A/B Testing: Visualize the Data" /></p>
<p class="text-center date"><a href="https://learn.business-science.io/r-tips-newsletter?el=website" target="_blank">Get the Code (In the R-Tip 073 Folder)</a></p>
<h3 id="ab-testing-analyzing-the-experiment-visually">A/B Testing: Analyzing the Experiment Visually</h3>
<p>The output is the following plot:</p>
<p><img src="/assets/073_02_time_series_ab_test.jpg" alt="A/B Testing: Visualize the Data" /></p>
<p>We can see that it looks like there’s a slight bump in bookings during the experiment period for the treatment group (the group that was exposed to the marketing campaign). But:</p>
<ol>
<li>It’s hard to tell if this is a <strong>statistically significant effect or just random noise.</strong></li>
<li>It’s hard to tell if there was a <strong>return on adspend.</strong></li>
</ol>
<p>To answer these questions, we’ll need to run a statistical test.</p>
<h2 id="step-3-run-the-statistical-test">Step 3: Run the Statistical Test</h2>
<p>Next, we’ll run the statistical test to determine if there is a statistically significant difference between the control and treatment group.</p>
<h3 id="split-the-data-into-pre-and-experiment-periods">Split the data into pre and experiment periods</h3>
<p>We’ll just need the experiment period (period = 1) for the statistical test. So, we’ll split the data into pre and experiment periods. Run this code:</p>
<p><img src="/assets/073_03_split_the_data.jpg" alt="A/B Testing: Split the Data" /></p>
<p class="text-center date"><a href="https://learn.business-science.io/r-tips-newsletter?el=website" target="_blank">Get the Code (In the R-Tip 073 Folder)</a></p>
<h3 id="ab-testing-run-the-statistical-test">A/B Testing: Run the Statistical Test</h3>
<p>Run this code to run the statistical test:</p>
<p><img src="/assets/073_03_2_sample_t_test_code.jpg" alt="A/B Testing: 2 Sample T-Test" /></p>
<p class="text-center date"><a href="https://learn.business-science.io/r-tips-newsletter?el=website" target="_blank">Get the Code (In the R-Tip 073 Folder)</a></p>
<h3 id="ab-testing-2-sample-t-test-results">A/B Testing: 2 Sample T-Test Results</h3>
<p>The output is the following:</p>
<p><img src="/assets/073_03_results.jpg" alt="A/B Testing: 2 Sample T-Test Results" /></p>
<p>We can see that the:</p>
<ul>
<li>
<p><strong>estimated average treatment effect (ATE) is 96.2:</strong> This means that on average each of the geo-segments saw an increase of $96.20 per booking-day during the experiment period (the period when the marketing campaign was run). This is good news.</p>
</li>
<li>
<p><strong>p-value is 0.0545:</strong> Generally, 0.05 is used as the cutoff, but the threshold is ultimately a business decision. In this case, the lower CI (confidence interval) around the ATE is -$1.87 and the upper CI is $194.00, which gives me confidence that the ATE is likely positive.</p>
</li>
</ul>
<h3 id="what-could-we-be-missing">What could we be missing?</h3>
<p>Sometimes there are other factors that can affect the results of an experiment. In this case, we may be missing the effect of seasonality.</p>
<p><strong>For a more advanced tutorial on Uplift Modeling,</strong> <a href="https://www.youtube.com/watch?v=Otb340lyiAQ&t=1942s">See Part 2 of this video</a> where I discuss how to use Meta (Facebook) <code class="language-plaintext highlighter-rouge">GeoLift</code> package on this problem.</p>
<h3 id="return-on-adspend-roas">Return on Adspend (ROAS)</h3>
<p>We have answered the first question: is there an effect? Yes. At the 0.10 significance level, there is a statistically significant effect. <strong>The Average Treatment Effect is $96.20.</strong></p>
<p>But, we still need to answer the second question: <strong>Was there a return on adspend (ROAS)?</strong></p>
<p>To answer this question, we need to calculate the ROAS.</p>
<h3 id="ab-testing-calculate-the-roas">A/B Testing: Calculate the ROAS</h3>
<p>Run this code to calculate the ROAS:</p>
<p><img src="/assets/073_03_roas.jpg" alt="A/B Testing: Calculate the ROAS" /></p>
<p class="text-center date"><a href="https://learn.business-science.io/r-tips-newsletter?el=website" target="_blank">Get the Code (In the R-Tip 073 Folder)</a></p>
<h3 id="ab-testing-roas-results">A/B Testing: ROAS Results</h3>
<p>The output is the following:</p>
<p><img src="/assets/073_03_roas_results.jpg" alt="A/B Testing: ROAS Results" /></p>
<p class="text-center date"><a href="https://learn.business-science.io/r-tips-newsletter?el=website" target="_blank">Get the Code (In the R-Tip 073 Folder)</a></p>
<p>We can see that the <strong>Estimated ROAS is 2.67</strong>. This means that for every dollar spent on the marketing campaign, we get $2.67 back in bookings.</p>
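<p>The arithmetic behind a ROAS estimate is simple: scale the per-geo, per-day treatment effect up to total incremental bookings, then divide by total adspend. The numbers below are hypothetical, chosen only to show the calculation (they are not the tutorial’s actual data):</p>

```r
# Hypothetical inputs for illustration only
ate_per_geo_day <- 100    # estimated lift in bookings per geo per day
n_geos          <- 10     # geos in the treatment group
n_days          <- 30     # length of the experiment period
total_adspend   <- 12000  # campaign cost over the experiment

incremental_bookings <- ate_per_geo_day * n_geos * n_days  # 30000
roas <- incremental_bookings / total_adspend               # 2.5
```

<p>A ROAS above 1 means each dollar of adspend returned more than a dollar of bookings, which is the threshold that matters for the business decision.</p>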
<h1 id="conclusions">Conclusions:</h1>
<p>We have answered the two questions that we set out to answer:</p>
<ol>
<li><strong>Does Adspend increase bookings?</strong> Yes. At the 0.10 significance level, there is a statistically significant effect. <strong>The Average Treatment Effect is $96.20.</strong></li>
<li><strong>By how much? Was there a Return on Adspend (ROAS)?</strong> Yes, there was a return on adspend. <strong>The Estimated ROAS is 2.67.</strong> This means that for every dollar spent on the marketing campaign, we get $2.67 back in bookings.</li>
</ol>
<p><strong>However, there is A LOT more to becoming a Data Scientist for Business than just A/B Testing.</strong></p>
<p>If you are struggling to become a Data Scientist for Business, then please read on…</p>
<h1 id="struggling-to-become-a-data-scientist">Struggling to become a data scientist?</h1>
<p>You know the feeling. Being unhappy with your current job.</p>
<p>Promotions aren’t happening. You’re stuck. Feeling Hopeless. Confused…</p>
<p>And you’re praying that the next job interview will go better than the last 12…</p>
<p>… But you know it won’t. Not unless you take control of your career.</p>
<p>The good news is…</p>
<h1 id="i-can-help-you-speed-it-up">I Can Help You Speed It Up.</h1>
<p>I’ve helped 6,107+ students learn data science for business from an elite business consultant’s perspective.</p>
<p>I’ve worked with Fortune 500 companies like S&P Global, Apple, MRM McCann, and more.</p>
<p>And I built a training program that gets my students life-changing data science careers (don’t believe me? <a href="https://university.business-science.io/p/5-course-bundle-machine-learning-web-apps-time-series/">see my testimonials here</a>):</p>
<h4 class="text-center">
6-Figure Data Science Job at CVS Health ($125K)<br /><div style="height:10px;"></div>
Senior VP Of Analytics At JP Morgan ($200K)<br /><div style="height:10px;"></div>
50%+ Raises & Promotions ($150K)<br /><div style="height:10px;"></div>
Lead Data Scientist at Northwestern Mutual ($175K)<br /><div style="height:10px;"></div>
2X-ed Salary (From $60K to $120K)<br /><div style="height:10px;"></div>
2 Competing ML Job Offers ($150K)<br /><div style="height:10px;"></div>
Promotion to Lead Data Scientist ($175K)<br /><div style="height:10px;"></div>
Data Scientist Job at Verizon ($125K+)<br /><div style="height:10px;"></div>
Data Scientist Job at CitiBank ($100K + Bonus)<br /><div style="height:10px;"></div>
</h4>
<h1 id="whenever-you-are-ready-heres-the-system-they-are-taking">Whenever you are ready, here’s the system they are taking:</h1>
<p><a href="https://university.business-science.io/p/5-course-bundle-machine-learning-web-apps-time-series">Here’s the system</a> that has gotten aspiring data scientists, career transitioners, and lifelong learners data science jobs and promotions…</p>
<p><img src="/assets/rtrack_what_theyre_doing_2.jpg" alt="What They're Doing - 5 Course R-Track" /></p>
<p style="font-size: 36px;text-align: center;">
<a href="https://university.business-science.io/p/5-course-bundle-machine-learning-web-apps-time-series">
<strong>Join My 5-Course R-Track Program Now!</strong><br /><small style="font-size:24px;">(And Become The Data Scientist You Were Meant To Be...)</small>
</a>
</p>
<p>P.S. - Samantha landed her NEW Data Science R Developer job at CVS Health (Fortune 500). <a href="https://university.business-science.io/p/5-course-bundle-machine-learning-web-apps-time-series">This could be you.</a></p>
<p><img src="/img/success_samantha_got_job.jpg" alt="Success Samantha Got The Job" /></p>
<h1 id="data-engineering-in-r">Data Engineering in R: How to Build Your First Data Pipeline with R, Mage, and Google Cloud Platform (in under 45 Minutes)</h1>
<p class="date text-center">2023-12-02</p>
<p>Hey guys, welcome back to my <a href="https://learn.business-science.io/r-tips-newsletter">R-tips newsletter</a>. In today’s R-Tip, <a href="https://www.linkedin.com/in/arben-kqiku-301457117/">Arben Kqiku</a> is sharing his <strong>exact 8-step framework</strong> for taking R into production for Digital Analytics projects. You’ll learn how to use R, Mage.ai, and Google Cloud Platform (GCP) to build your first data engineering pipeline <strong>in under 45 minutes</strong>.</p>
<h3 id="about-the-author">About the Author</h3>
<p><a href="https://www.linkedin.com/in/arben-kqiku-301457117/">Arben</a> is a digital analytics and Google Cloud Platform (GCP) expert. He’s also a Business Science University student. In this post, Arben shares how to use R in production, with Mage.ai and Google Cloud.</p>
<p>This article was originally published on <a href="https://www.simoahava.com/analytics/join-ga4-google-ads-data-in-google-bigquery/">Simo Ahava’s website</a>, which is focused on aspiring Digital Analytics Professionals. We’ve republished it here with permission to help spread the word of R in production with new tools including Mage.ai and Google Cloud Platform.</p>
<p>Let’s dive in!</p>
<h3 id="table-of-contents">Table of Contents</h3>
<p>Here’s what you’re learning today:</p>
<ul>
<li>
<p><strong><em>The Problem:</em></strong> We’ll cover a case study from a recent problem Arben had in Multi-Touch Campaign Attribution.</p>
</li>
<li>
<p><strong><em>The Solution: Arben’s 8-Step Framework:</em></strong> Arben’s sharing his exact process for how he sets up production R data engineering pipelines on GCP with R and Mage.ai (perfect if it’s your first time).</p>
</li>
<li>
<p><strong><em>Full Code Demo:</em> EXACTLY HOW TO BUILD YOUR FIRST DATA SCIENCE PIPELINE (IN UNDER 45 minutes).</strong></p>
</li>
</ul>
<h3 id="what-you-make-today">What You Make Today:</h3>
<p>Below you can see an architectural overview of what we’ll build today.</p>
<p><img src="/assets/r_mage_gcp_workflow.jpg" alt="Data Engineering Workflow" /></p>
<p class="date text-center">What You Make Today!</p>
<h3 id="the-8-step-framework-to-accomplish-this">The 8-Step Framework to Accomplish This:</h3>
<p>Here’s the 8-step framework that Arben will walk you through today:</p>
<p><img src="/assets/r_mage_gcp_8_step_framework.jpg" alt="8-Step Framework" /></p>
<p class="date text-center">The 8 steps you follow</p>
<h3 id="the-8-things-youll-learn-in-this-tutorial">The 8 Things You’ll Learn in This Tutorial:</h3>
<ol>
<li>
<p><strong>How to create a Google Cloud project.</strong></p>
</li>
<li>
<p>How to set up a virtual machine.</p>
</li>
<li>
<p><strong>How to access your virtual machine remotely.</strong></p>
</li>
<li>
<p>How to install Mage.ai on the virtual machine to handle the automation.</p>
</li>
<li>
<p><strong>How to retrieve data from the GA4 API in a production environment.</strong></p>
</li>
<li>
<p>How to retrieve data from the Google Ads API in a production environment.</p>
</li>
<li>
<p><strong>How to export data to Google BigQuery in a production environment.</strong></p>
</li>
<li>
<p>How to schedule a data pipeline that automatically updates every 5 minutes.</p>
</li>
</ol>
<hr />
<h1 id="special-announcement-chatgpt-for-data-scientists-workshop-on-march-27th">SPECIAL ANNOUNCEMENT: ChatGPT for Data Scientists Workshop on March 27th</h1>
<p><a href="https://learn.business-science.io/registration-chatgpt-2?el=website">Inside the workshop</a> I’ll share how I built a Machine Learning Powered Production Shiny App with <code class="language-plaintext highlighter-rouge">ChatGPT</code> (extends this data analysis to an <em>insane</em> production app):</p>
<p><img src="/assets/lab_82_chatgpt_rcode.jpg" alt="ChatGPT for Data Scientists" /></p>
<p><strong>What:</strong> ChatGPT for Data Scientists</p>
<p><strong>When:</strong> Wednesday March 27th, 2pm EST</p>
<p><strong>How It Will Help You:</strong> Whether you are new to data science or are an expert, ChatGPT is changing the game. There’s a ton of hype. But how can ChatGPT actually help you become a better data scientist and help you stand out in your career? I’ll show you inside <a href="https://learn.business-science.io/registration-chatgpt-2?el=website">my free chatgpt for data scientists workshop</a>.</p>
<p><strong>Price:</strong> Does <strong>Free</strong> sound good?</p>
<p><strong>How To Join:</strong> <a href="https://learn.business-science.io/registration-chatgpt-2?el=website"><strong>👉 Register Here</strong></a></p>
<hr />
<h1 id="r-tips-weekly">R-Tips Weekly</h1>
<p>This article is part of R-Tips Weekly, a <a href="https://learn.business-science.io/r-tips-newsletter?el=website" target="_blank">weekly tutorial</a> that shows you step-by-step how to do common R coding tasks. Pretty cool, right?</p>
<p>Here are the links to get set up. 👇</p>
<ul>
<li><a href="https://learn.business-science.io/r-tips-newsletter?el=website" target="_blank">Get the Code</a></li>
<!-- <li><a href="https://youtu.be/fkwKQi7skAw">YouTube Tutorial</a></li>-->
</ul>
<h1 id="the-problem-multi-touch-campaign-attribution-in-digital-analytics">The Problem: Multi-Touch Campaign Attribution in Digital Analytics</h1>
<p>As a digital analyst, I often need to combine data from different sources and display it in a dashboard. This is especially true when I’m working with Google Analytics 4 (GA4) and Google Ads for Campaign Attribution.</p>
<h2 id="case-study-digital-analytics-and-multi-touch-campaign-attribution">Case Study: Digital Analytics and Multi-Touch Campaign Attribution</h2>
<p>For instance, clients run campaigns on platforms like Google Ads and Meta Ads, and <strong>they want to understand the impact of each channel or even individual campaigns.</strong></p>
<p>To address this, we usually:</p>
<ol>
<li>Use <strong>conversion data</strong> from a third-party source, like Google Analytics, and</li>
<li>Combine it with other data such as impressions, clicks, and cost from the advertising channels.</li>
</ol>
<p>This helps us <strong>calculate the cost per conversion</strong> for each channel more accurately.</p>
<h2 id="building-the-multi-touch-attribution-data-engineering-pipeline">Building the Multi-Touch Attribution Data Engineering Pipeline</h2>
<p>To build a data engineering pipeline, we need to factor in:</p>
<ol>
<li>
<p><strong>Accessibility:</strong> Make sure we can easily get data from different sources, such as Google Ads, Meta Ads, and GA4.</p>
</li>
<li>
<p><strong>Data integration:</strong> Combine data from different sources accurately.</p>
</li>
<li>
<p><strong>Storage:</strong> Create a data warehouse in Google BigQuery for the joined data and make it accessible to data visualization tools.</p>
</li>
<li>
<p><strong>Maintenance:</strong> Find a way to automate these steps without needing manual intervention. That way stakeholders will have access to almost real-time data.</p>
</li>
</ol>
<h2 id="our-tech-stack-r-mageai-google-cloud-platform-and-vscode-ide">Our Tech Stack: R, Mage.ai, Google Cloud Platform, and VSCode IDE</h2>
<p><img src="/assets/r_mage_gcp_tool_integration.jpg" alt="How the Tools Integrate" /></p>
<p class="text-center date"><a href="https://learn.business-science.io/r-tips-newsletter?el=website" target="_blank">Register for R-Tips Newsletter Here</a></p>
<p>To build this pipeline, we’ll use:</p>
<ol>
<li>R: To retrieve data from the APIs and combine it.</li>
<li>Mage.ai: To automate the Extract Transform Load (ETL) process.</li>
<li>Google Cloud Platform (GCP): To store the data and make it accessible to data visualization tools.</li>
<li>VSCode IDE: To access the virtual machine remotely.</li>
</ol>
<h3 id="1-r-to-retrieve-data-from-the-apis-and-combine-it">1. R: To retrieve data from the APIs and combine it</h3>
<p><img src="/assets/069_r_logo_board.png" alt="Rstudio" /></p>
<p>If you are new to R:</p>
<ul>
<li>Install R here: <a href="https://www.r-project.org/">https://www.r-project.org/</a></li>
<li>Access 20,000+ open-source R packages here: <a href="https://cran.r-project.org/">https://cran.r-project.org/</a></li>
</ul>
<p>Packages we’ll use today:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">tidyverse</code>: To work with data and make the data pipeline.</li>
<li><code class="language-plaintext highlighter-rouge">googleAnalyticsR</code>: To retrieve data from the GA4 API.</li>
<li><code class="language-plaintext highlighter-rouge">rgoogleads</code>: To retrieve data from the Google Ads API.</li>
<li><code class="language-plaintext highlighter-rouge">bigrquery</code>: To export data to Google BigQuery.</li>
<li><code class="language-plaintext highlighter-rouge">gargle</code>: For Google authentication.</li>
</ul>
<h3 id="2-mageai-to-automate-the-extract-transform-load-etl-process">2. Mage.ai: To automate the Extract Transform Load (ETL) process</h3>
<p>I love R and I am so thankful that <a href="https://www.linkedin.com/in/dangtommy/">Tommy Dang</a> and his team included it in Mage.</p>
<p><img src="/assets/r_mage_gcp_mage_ai.jpg" alt="Mage AI" /></p>
<p class="text-center date">Mage AI</p>
<p>If you are new to Mage:</p>
<ul>
<li>Mage.ai is a tool that helps you automate the ETL process. It’s a great tool for data scientists who want to automate their data engineering pipelines.</li>
<li>Mage.ai: <a href="https://mage.ai/">https://mage.ai/</a></li>
</ul>
<p>The screenshot below comes from Mage. Mage is a data engineering tool that lets you build ETL (extract, transform, and load) pipelines. What I love about Mage is that it’s easy to use, lets you visualize your data pipelines, and supports multiple programming languages: SQL, Python, and R!</p>
<p><img src="/assets/r_mage_gcp_mage-ai-example.jpg" alt="Mage.ai" /></p>
<p>In addition to building our pipeline, we’ll use Mage to <strong>schedule pipelines</strong>, as you can see in the example below.</p>
<p><img src="/assets/r_mage_gcp_mage-schedule.jpg" alt="Mage Schedule" /></p>
<p class="text-center date"><a href="https://learn.business-science.io/r-tips-newsletter?el=website" target="_blank">Register for R-Tips Newsletter Here</a></p>
<h3 id="3-google-cloud-platform-gcp-to-store-the-data-and-make-it-accessible-to-data-visualization-tools">3. Google Cloud Platform (GCP): To store the data and make it accessible to data visualization tools.</h3>
<p>You can run Mage on your local machine or in the cloud.</p>
<p>Obviously, if you run it locally, your computer needs to be on all the time, which is not ideal. Therefore, we’ll create a virtual machine (VM) on the Google Cloud Platform and run Mage from there.</p>
<p>A virtual machine (VM) on GCP is like a computer in the cloud. It’s not a physical machine you can touch; instead, it’s a powerful, remote computer that you can use to run your software and store your data.</p>
<p><img src="/assets/r_mage_gcp_google_cloud_platform.jpg" alt="Google Cloud Platform" /></p>
<p class="text-center date">Google Cloud Platform (GCP)</p>
<p>If you are new to Google Cloud Platform (GCP):</p>
<ul>
<li>Google Cloud Platform (GCP) is a cloud computing platform that allows you to store data and make it accessible to data visualization tools.</li>
<li>You’ll need to create a Google Cloud account to use GCP.</li>
<li>Google Cloud Platform: <a href="https://cloud.google.com/">https://cloud.google.com/</a></li>
</ul>
<p>To use GCP, you need a payment method. But worry not: as of this writing, if you have never used GCP, <strong>you get $300 in free credits</strong>. So, go to the Google Cloud Console and create an account: <a href="https://console.cloud.google.com/welcome">https://console.cloud.google.com/welcome</a>.</p>
<p><img src="/assets/r_mage_gcp_google-cloud-credits.jpg" alt="$300 Credits" /></p>
<p>Once you have used up your free credits, you’ll need to add a credit card to your account under “BILLING”:</p>
<p><img src="/assets/r_mage_gcp_billing.jpg" alt="Billing" /></p>
<h3 id="4-vscode-ide-to-access-the-virtual-machine-remotely">4. VSCode IDE: To access the virtual machine remotely</h3>
<p>To access the virtual machine from our computer, we’ll use Visual Studio Code, which is a lovely, free code editor that supports many programming languages.</p>
<p><img src="/assets/r_mage_gcp_vscode_ide.jpg" alt="VSCode" /></p>
<p class="text-center date">VSCode IDE</p>
<p>If you are new to VSCode IDE:</p>
<ul>
<li>VSCode IDE is a free code editor that supports many programming languages, including R, Python, and C++, and has extensions for tools like Remote SSH (covered in this tutorial).</li>
<li>Install the VSCode IDE here: <a href="https://code.visualstudio.com/">https://code.visualstudio.com/</a></li>
</ul>
<h1 id="the-solution-arbens-8-step-framework-for-data-engineering-in-r-with-mage-and-gcp-in-under-45-minutes">The Solution: Arben’s 8-Step Framework for Data Engineering in R with Mage and GCP (in under 45 minutes)</h1>
<p><img src="/assets/r_mage_gcp_8_step_framework.jpg" alt="8-Step Framework" /></p>
<p class="date text-center">The 8 steps you follow</p>
<p>Now for my <strong>8-step framework</strong> for building a data engineering pipeline in R with Mage.ai and GCP.</p>
<ul>
<li>These are the steps I follow when I’m building a data engineering pipeline for a client.</li>
<li>Once you are familiar with my framework, you can build your own data engineering pipelines <strong>in under 45 minutes.</strong></li>
</ul>
<p><strong>Heads up, this is a comprehensive tutorial.</strong> This is because I wanted to build the training I wish I had when I solved this problem for the first time. I hope you enjoy it!</p>
<h2 id="step-1-how-to-create-a-google-cloud-project">Step 1: How to create a Google Cloud project</h2>
<p>In order to use GCP, we need a project. Later, everything that we’ll do will be within this project.</p>
<p>So, go back to <a href="https://console.cloud.google.com/welcome">https://console.cloud.google.com/welcome</a> and create a new project by first clicking on the project selector in the top left.</p>
<p><img src="/assets/r_mage_gcp_project-selector.jpg" alt="Project Selector" /></p>
<p>Then click on “NEW PROJECT”:</p>
<p><img src="/assets/r_mage_gcp_new-project.jpg" alt="New Project" /></p>
<p>Next, name your project. I called my project <code class="language-plaintext highlighter-rouge">mage-ai-test</code>.</p>
<p><img src="/assets/r_mage_gcp_new-cloud-project.jpg" alt="Project Name" /></p>
<p>Finally, click on “CREATE”. Then simply wait until your project is created. Once you have selected your project, type “vm instances” in the search bar, and select “VM instances”.</p>
<p><img src="/assets/r_mage_gcp_vm-instances.jpg" alt="VM Instances" /></p>
<p>This will lead to the following screen:</p>
<p><img src="/assets/r_gcp_mage_compute-engine-api.jpg" alt="Compute Engine" /></p>
<h2 id="step-2-how-to-set-up-a-virtual-machine">Step 2: How to set up a virtual machine</h2>
<p>There are 4 sub-steps:</p>
<ol>
<li>Activate the Compute Engine API’s features</li>
<li>Set up SSH keys</li>
<li>Create a virtual machine</li>
<li>Connect to the virtual machine via SSH</li>
</ol>
<h3 id="step-21-activate-the-compute-engine-apis-features">Step 2.1: Activate the Compute Engine API’s features</h3>
<p>On GCP, to use specific features, you must activate the corresponding APIs:</p>
<ul>
<li>For example, we’ll enable the Google Analytics API later to get data from GA4.</li>
<li>To make a virtual machine, we need to enable the Compute Engine API.</li>
<li>Afterward, you’ll see this screen, but we won’t create a VM instance just yet…</li>
</ul>
<p><img src="/assets/r_mage_gcp_create-vm-instance.jpg" alt="VM Instance" /></p>
<h3 id="step-22-set-up-ssh-keys">Step 2.2: Set up SSH keys</h3>
<p>Next, we need to create SSH keys that will allow us to access our virtual machine from our computer.</p>
<p>SSH keys are like special keys that help your computer talk securely to another computer, such as a virtual machine.</p>
<p>It’s a way for your computer to prove it’s really you when connecting to the virtual machine. It’s like having a secret handshake between your computer and the virtual machine, making sure they can trust each other without needing to type in a password every time.</p>
<h4 id="create-ssh-and-public-keys">Create SSH and Public Keys</h4>
<p>We need to create two SSH keys, a private and a public key. Think of SSH keys like a pair of magic keys for your online accounts. You have one key that you keep secret (the private key) and another key that you share with others (the public key).</p>
<ol>
<li><strong>Private Key (Secret Key):</strong> This is like the key to your front door that only you have. You keep it safe on your computer, and it’s a secret. It’s used to unlock and access your accounts securely.</li>
<li><strong>Public Key (Shared Key):</strong> This is like a lock that matches your private key.</li>
</ol>
<p>When you connect to a server or service, you use your private key to prove you are who you say you are. The server then checks this with your public key to make sure it’s really you. This way, even if someone gets your public key, they can’t do anything without the private key, which stays safe on your computer. It’s a bit like having a special lock and key where only your key can open it.</p>
<p>To create your keys, hop to the terminal in your local machine and type the following code:</p>
<pre><code class="language-{bash}">ssh-keygen -t rsa -f ~/.ssh/mage-ai-test -C arbenkqiku
</code></pre>
<p>The last argument (after <code class="language-plaintext highlighter-rouge">-C</code>) should be your username, in my case <code class="language-plaintext highlighter-rouge">arbenkqiku</code>. If you don’t know your username, type <code class="language-plaintext highlighter-rouge">whoami</code> in the terminal and press Enter to print it.</p>
<p>Once you run the command above, you’ll be prompted to enter an optional passphrase. After you confirm it, your SSH keys will be created.</p>
<p><img src="/assets/r_mage_gcp_create-ssh-key.jpg" alt="SSH Keys" /></p>
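<p>If you prefer not to type your username by hand, the key-generation command can be assembled from <code class="language-plaintext highlighter-rouge">whoami</code>. A minimal sketch (the key name <code class="language-plaintext highlighter-rouge">mage-ai-test</code> is the tutorial’s example):</p>

```shell
# Hedged sketch: build the ssh-keygen command with the current username filled in.
# "mage-ai-test" is the key name used throughout this tutorial.
USER_NAME="$(whoami)"
KEY_NAME="mage-ai-test"
KEYGEN_CMD="ssh-keygen -t rsa -f ~/.ssh/${KEY_NAME} -C ${USER_NAME}"
echo "$KEYGEN_CMD"
```

<p>Running the echoed command produces the same key pair as typing it manually, as shown above.</p>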
<p>Now, go to the directory where your SSH keys can be found. <code class="language-plaintext highlighter-rouge">cd</code> stands for “change directory”:</p>
<pre><code class="language-{bash}">cd ~/.ssh
</code></pre>
<p>This is where your private and public SSH keys are located.</p>
<p>Now, type the following code to display the content of your public SSH key in the terminal.</p>
<pre><code class="language-{bash}">cat mage-ai-test.pub
</code></pre>
<p>This will show the content of your public SSH key that we will later paste into our VM.</p>
<p><img src="/assets/r_mage_gcp_public-key.jpg" alt="Public SSH Key" /></p>
<h3 id="step-23-create-a-virtual-machine">Step 2.3: Create a virtual machine</h3>
<p>Now, let’s go back to Google Cloud Platform and click on “CREATE INSTANCE” in the VM instances overview.</p>
<p><img src="/assets/r_mage_gcp_create-new-vm-instance.jpg" alt="Create Instance" /></p>
<p>Give a name to the VM instance and select the region closest to you:</p>
<p><img src="/assets/r_mage_gcp_name-and-region-of-vm-instance.jpg" alt="VM Instance Name" /></p>
<p>Go to the “Boot disk” section and click on “CHANGE”:</p>
<p><img src="/assets/r_mage_gcp_change-boot-disk.jpg" alt="Boot Disk" /></p>
<p>Select the following options:</p>
<p><img src="/assets/r_gcp_mage_advanced-boot-disk-options.jpg" alt="Boot Disk Options" /></p>
<p>Under Firewall, select the following options:</p>
<p><img src="/assets/r_gcp_mage_firewall-options.jpg" alt="Firewall Options" /></p>
<p>This is important: without it, we won’t be able to access Mage using the IP address of our VM. You’ll see later what I mean by this.</p>
<p>Under Advanced Options > Security, click on “ADD ITEM”. Here is where we’ll add our <strong>public SSH key</strong>.</p>
<p><img src="/assets/r_gcp_mage_add-public-key-to-vm.jpg" alt="Add SSH Key" /></p>
<p>Copy the entire SSH public key and paste it.</p>
<p><img src="/assets/r_gcp_mage_paste-ssh-key.jpg" alt="Paste SSH Key" /></p>
<p>Finally, click on “CREATE”. It may take some time to create the VM.</p>
<p>Once done, your new VM will appear here. Also, you’ll see that your VM will have an “External IP”.</p>
<p><img src="/assets/r_gcp_mage_vm-external-ip.jpg" alt="External IP" /></p>
<p>You can use this “External IP” and your SSH private key to connect to this VM. Let’s do this!</p>
<h2 id="step-3-how-to-access-your-virtual-machine-remotely">Step 3: How to access your virtual machine remotely</h2>
<p>Step 3 has 2 sub-steps:</p>
<ol>
<li>How to connect to your VM via SSH</li>
<li>How to connect via VSCode IDE (using Remote - SSH extension)</li>
</ol>
<h3 id="step-31-how-to-connect-to-your-vm-via-ssh">Step 3.1: How to connect to your VM via SSH</h3>
<p>Go back to the terminal in your local machine and go to the directory where the SSH keys are located:</p>
<pre><code class="language-{bash}">cd ~/.ssh
</code></pre>
<p>Next, type this command:</p>
<pre><code class="language-{bash}">ssh -i mage-ai-test arbenkqiku@34.65.231.180
</code></pre>
<p>I’ll break it down for you so you know what to replace:</p>
<pre><code class="language-{bash}">ssh -i name_of_private_key user_name@gcp_vm_instance_external_ip
</code></pre>
<p>You’ll likely be prompted to enter your passphrase again, and also to confirm the “External IP” as a known host. Just follow the instructions and you should be able to connect to your VM.</p>
<p>As you can see from the image below, we connected to the VM named <code class="language-plaintext highlighter-rouge">mage-demo-test</code>. And if you recall, in “Boot disk” options, we selected Ubuntu as our operating system.</p>
<p><img src="/assets/r_mage_gcp_ubuntu-vm-remote.jpg" alt="SSH Connection" /></p>
<h3 id="step-32-how-to-connect-via-vscode-ide-using-remote---ssh-extension">Step 3.2: How to connect via VSCode IDE (using Remote - SSH extension)</h3>
<p>We could do this whole process through the terminal, but it is much more user-friendly to do it through Visual Studio Code.</p>
<p>Visual Studio Code is a very powerful code editor. Go to this link: <a href="https://code.visualstudio.com/download">https://code.visualstudio.com/download</a>, and download Visual Studio Code.</p>
<p>Once you have installed it, go to “Extensions” and install “Remote - SSH”.</p>
<p><img src="/assets/r_mage_gcp_remote-ssh-code-extension.jpg" alt="Remote SSH" /></p>
<p>In Visual Studio Code, go to the search bar and type &gt;, and then select the following option:</p>
<p><img src="/assets/r_mage_gcp_open-ssh-config.jpg" alt="Remote SSH Config" /></p>
<p>In the configuration file that will open, you need to enter your details. Essentially, we’re providing the details to connect to our VM.</p>
<pre><code class="language-{bash}"># Give a name to your host
Host mage-demo-test
    # Replace with the External IP address shown in GCP
    HostName 34.65.231.180
    # Replace with your username
    User arbenkqiku
    # Path to your private SSH key
    IdentityFile /Users/arbenkqiku/.ssh/mage-ai-test
</code></pre>
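<p>The same configuration entry can also be generated from variables instead of edited by hand. A hedged sketch, using the tutorial’s example values (replace them with your own):</p>

```shell
# All values below are the tutorial's examples -- substitute your own.
HOST_ALIAS="mage-demo-test"
EXTERNAL_IP="34.65.231.180"
SSH_USER="arbenkqiku"
KEY_PATH="$HOME/.ssh/mage-ai-test"

# Assemble the ssh_config entry as a multi-line string.
CONFIG_ENTRY="Host ${HOST_ALIAS}
    HostName ${EXTERNAL_IP}
    User ${SSH_USER}
    IdentityFile ${KEY_PATH}"

echo "$CONFIG_ENTRY"
# To append it to your SSH config, uncomment this on your machine:
# echo "$CONFIG_ENTRY" >> ~/.ssh/config
```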
<p>Now, we still have to go back to the terminal one last time and type this:</p>
<pre><code class="language-{bash}">eval $(ssh-agent)
ssh-add /Users/arbenkqiku/.ssh/mage-ai-test # Path to private SSH key
</code></pre>
<p>Then, type your passphrase when prompted. This adds your private key to the SSH agent, so you won’t need to retype the passphrase every time you access the VM through Visual Studio Code.</p>
<p><img src="/assets/r_mage_gcp_ssh-add-command.jpg" alt="SSH Agent" /></p>
<p>Now, go back to the search bar of Visual Studio Code, type > and select the following option:</p>
<p><img src="/assets/r_mage_gcp_code-connect-to-ssh-host.jpg" alt="Remote SSH Connect" /></p>
<p>It should suggest the host that you just created, click on that host:</p>
<p><img src="/assets/r_gcp_mage_choose-ssh-host.jpg" alt="Choose Host" /></p>
<p>Then, you’ll be prompted to enter your passphrase. Once you enter it, you’ll be connected to your VM.</p>
<p><img src="/assets/r_mage_gcp_ssh-passphrase.jpg" alt="Password" /></p>
<p>Now, click on the “Remote Explorer” icon, and it should show that you connected to your VM:</p>
<p><img src="/assets/r_mage_gcp_remote-explorer-vm.jpg" alt="Remote Explorer" /></p>
<p>On the top right, click this icon to display the terminal below:</p>
<p><img src="/assets/r_mage_gcp_display-terminal-icon.jpg" alt="Terminal Below" /></p>
<p>Now click on “TERMINAL”. Congratulations, you have accessed your VM through Visual Studio Code!</p>
<p><img src="/assets/r_mage_gcp_access-terminal-success.jpg" alt="Terminal" /></p>
<h2 id="step-4-how-to-install-mageai-on-the-virtual-machine-to-handle-the-automation">Step 4: How to install Mage.ai on the virtual machine to handle the automation</h2>
<p>To install mage on GCP, I largely followed <a href="https://www.youtube.com/watch?v=C0fNc8ZOpSI&t=696s&ab_channel=DataSlinger">this tutorial</a>, but I will also explain every step here.</p>
<p>There are 4 sub-steps:</p>
<ol>
<li>Create the folder for Mage</li>
<li>Install <code class="language-plaintext highlighter-rouge">Docker</code></li>
<li>Install <code class="language-plaintext highlighter-rouge">Mage</code></li>
<li>Access <code class="language-plaintext highlighter-rouge">Mage</code> through the External IP from GCP</li>
</ol>
<h3 id="step-41-create-the-folder-for-mage">Step 4.1: Create the folder for Mage</h3>
<p>First of all, let’s create a directory in our VM for mage:</p>
<pre><code class="language-{bash}">mkdir mage-demo
</code></pre>
<p>Now, if you type the following code, you should be able to see the newly created folder:</p>
<pre><code class="language-{bash}">ls
</code></pre>
<p>Then, let’s access the folder:</p>
<pre><code class="language-{bash}">cd mage-demo
</code></pre>
<h3 id="step-42-install-docker">Step 4.2: Install <code class="language-plaintext highlighter-rouge">Docker</code></h3>
<p>Now, to install mage, we need to first install <code class="language-plaintext highlighter-rouge">Docker</code>.</p>
<p>Docker is a platform for developing, shipping, and running applications. It uses containerization technology to package an application and its dependencies together into a single unit called a “container”.</p>
<p>In the <code class="language-plaintext highlighter-rouge">mage-demo</code> folder, let’s download a GitHub repo that contains the installation for Docker:</p>
<pre><code class="language-{bash}">git clone https://github.com/MichaelShoemaker/DockerComposeInstall.git
</code></pre>
<p>Let’s access the folder that contains the Docker installation:</p>
<pre><code class="language-{bash}">cd DockerComposeInstall
</code></pre>
<p>Let’s make the file executable:</p>
<pre><code class="language-{bash}">chmod +x InstallDocker
</code></pre>
<p>Then, let’s run it:</p>
<pre><code class="language-{bash}">./InstallDocker
</code></pre>
<p>Type this to verify that Docker has been installed correctly:</p>
<pre><code class="language-{bash}">docker run hello-world
</code></pre>
<p>This should show the following message:</p>
<p><img src="/assets/r_mage_gcp_hello-docker.jpg" alt="Docker Hello World" /></p>
<h3 id="step-43-install-mage">Step 4.3: Install <code class="language-plaintext highlighter-rouge">Mage</code></h3>
<p>Now, let’s go back to the initial directory:</p>
<pre><code class="language-{bash}">cd mage-demo
</code></pre>
<p>Now, we can finally install mage with this command:</p>
<pre><code class="language-{bash}">docker run -it -p 6789:6789 -v $(pwd):/home/src --restart always mageai/mageai /app/run_app.sh mage start mage-ai-test
</code></pre>
<p>The <code class="language-plaintext highlighter-rouge">--restart always</code> flag tells Docker to restart the Mage container automatically, for example whenever the VM is shut down and later restarted.</p>
<p>At the end, <code class="language-plaintext highlighter-rouge">mage-ai-test</code> represents the name of our project.</p>
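<p>To recap what each part of that long command does, here’s an annotated breakdown (the flags are taken from the command above; the comments are my own summary):</p>

```shell
# Annotated breakdown of the docker run command used above:
#   -it                  : attach an interactive terminal
#   -p 6789:6789         : publish Mage's web UI port to the host
#   -v $(pwd):/home/src  : mount the current directory into the container
#   --restart always     : restart the container automatically (e.g., after a VM reboot)
#   mageai/mageai        : the Mage image from Docker Hub
#   /app/run_app.sh mage start mage-ai-test : launch Mage with the project name
MAGE_CMD='docker run -it -p 6789:6789 -v $(pwd):/home/src --restart always mageai/mageai /app/run_app.sh mage start mage-ai-test'
echo "$MAGE_CMD"
```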
<h3 id="step-44-access-mage-through-the-external-ip-from-gcp">Step 4.4: Access <code class="language-plaintext highlighter-rouge">Mage</code> through the External IP from GCP</h3>
<p>Now, to access mage through our External IP from GCP, we need to hop back on GCP first, as we need to create a <strong>firewall rule</strong>.</p>
<p>This is necessary to control and regulate incoming and outgoing traffic to and from your VM on Google Cloud Platform. When you want to access mage through your External IP from GCP, a firewall rule is needed to explicitly allow the traffic to reach your VM.</p>
<p>Browse to Firewall in the Google Cloud Platform.</p>
<p>Click on “CREATE FIREWALL RULE”:</p>
<p><img src="/assets/r_mage_gcp_create-firewall-rule.jpg" alt="Create Firewall Rule" /></p>
<p>Select the following options and click on “CREATE”:</p>
<p><img src="/assets/r_mage_gcp_firewall-options.jpg" alt="Firewall Rule Options" /></p>
<p>With this firewall rule in place, we can access Mage via the external IP address on port 6789.</p>
<p>Now, if you type <strong>your VM external IP</strong> followed by <code class="language-plaintext highlighter-rouge">:6789</code> in your web browser you should be able to access mage.</p>
<p>For example, this is the URL I would use with my configuration: <code class="language-plaintext highlighter-rouge">http://34.65.231.180:6789</code>.</p>
<p><img src="/assets/r_mage_gcp_mage-test.jpg" alt="Mage IP Test" /></p>
<p>As you can see, <code class="language-plaintext highlighter-rouge">mage-ai-test</code> was the name of our project in a previous command.</p>
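<p>You can also build the URL and check it from the terminal. A minimal sketch, using the tutorial’s example IP (replace it with your own External IP):</p>

```shell
# Build the Mage URL from the VM's External IP (example IP from the tutorial).
EXTERNAL_IP="34.65.231.180"
MAGE_PORT=6789
MAGE_URL="http://${EXTERNAL_IP}:${MAGE_PORT}"
echo "$MAGE_URL"
# On your machine, verify that Mage responds (uncomment to run):
# curl -sSf "$MAGE_URL" > /dev/null && echo "Mage is reachable"
```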
<p>Congrats, now you can create data pipelines that will run in the cloud!</p>
<h2 id="step-5-how-to-retrieve-data-from-the-ga4-api-in-a-production-environment">Step 5: How to retrieve data from the GA4 API in a production environment</h2>
<p><strong>Now, we can finally create the pipeline.</strong> We’ll first focus on retrieving data from the Google Analytics 4 (GA4) API. We will accomplish this inside of <code class="language-plaintext highlighter-rouge">Mage</code>.</p>
<p>We have the following sub-steps:</p>
<ol>
<li>Create a new pipeline</li>
<li>Select a Mage block type (Data Loader)</li>
<li>Use R packages and code to retrieve data from the GA4 API</li>
<li>GA4 API: How to get an access token</li>
<li>How to run GA Authentication in a production environment</li>
<li>Create a Google Analytics token</li>
<li>Test <code class="language-plaintext highlighter-rouge">R</code> Code on Your Local Machine</li>
<li>Create the full <code class="language-plaintext highlighter-rouge">R</code> Script</li>
<li>Make JSON service account key accessible to Mage</li>
<li>Add the <code class="language-plaintext highlighter-rouge">R</code> Script to Mage</li>
</ol>
<h3 id="step-51-create-a-new-pipeline">Step 5.1: Create a new pipeline</h3>
<p>To start, click on <strong>New pipeline > Standard (batch)</strong>:</p>
<p><img src="/assets/r_mage_gcp_new-batch-pipeline.jpg" alt="New Pipeline" /></p>
<p>On the left side, you can see all your files inside of <code class="language-plaintext highlighter-rouge">Mage</code>, even the pipeline that we have just created.</p>
<p><img src="/assets/r_mage_gcp_files-in-pipeline.jpg" alt="Mage Files" /></p>
<p>In the middle, you can see the blocks that you can use to build your pipelines. In this guide, we’ll use <strong>Data loader</strong>, <strong>Transformer</strong>, and <strong>Data exporter</strong> blocks:</p>
<p><img src="/assets/r_mage_gcp_mage-blocks.jpg" alt="Mage Blocks" /></p>
<h3 id="step-52-select-a-mage-block-type-data-loader">Step 5.2: Select a Mage block type (Data Loader)</h3>
<p><strong>The Data loader block:</strong> As mentioned previously, you can use Python, SQL, and R in each block. In our case, we’ll use <code class="language-plaintext highlighter-rouge">R</code>. So, click on Data Loader and select R:</p>
<p><img src="/assets/r_mage_gcp_use-r.jpg" alt="Data Loader" /></p>
<p>Name the block <code class="language-plaintext highlighter-rouge">ga4</code>, then click Save and add block. You should now see the block on the right, together with a sample R code.</p>
<p><img src="/assets/r_mage_gcp_sample-r-code.jpg" alt="Data Loader R" /></p>
<h3 id="step-53-use-r-packages-and-code-to-retrieve-data-from-the-ga4-api">Step 5.3: Use R packages and code to retrieve data from the GA4 API</h3>
<p>To install and load packages, Mage uses the <code class="language-plaintext highlighter-rouge">pacman</code> package. Once you load <code class="language-plaintext highlighter-rouge">pacman</code>, you can install packages by using:</p>
<pre><code class="language-{r}">pacman::p_load(package1, package2, package3)
</code></pre>
<p>The first time you run the <code class="language-plaintext highlighter-rouge">p_load()</code> function, it installs the package and then loads it; on subsequent runs, it simply loads it. For this block, we’ll install three packages:</p>
<pre><code class="language-{r}">library("pacman")
pacman::p_load(dplyr, purrr, googleAnalyticsR)
load_data <- function() {
}
</code></pre>
<h3 id="step-54-how-to-get-an-access-token">Step 5.4: How to get an access token</h3>
<p>In order to access GA4 data by using the <code class="language-plaintext highlighter-rouge">googleAnalyticsR</code> package, developed by Mark Edmondson, you need an access token.</p>
<p>An access token is like your digital ID card; it confirms your identity and verifies that you truly have permission to access the GA4 properties you’re attempting to retrieve data from.</p>
<p>To get an access token, you can run the following function in the RStudio console in your local machine: <code class="language-plaintext highlighter-rouge">ga_auth()</code>.</p>
<p>Once you run this function, you’ll be redirected to a browser window where you’ll select your account:</p>
<p><img src="/assets/r_mage_gcp_select-google-account.jpg" alt="GA Auth" /></p>
<p>With this, you are basically giving permission to the googleAnalyticsR package to access your GA4 properties.</p>
<p><strong>However, the problem is that we’ll run our data pipeline in a production environment where you cannot interact with the browser.</strong></p>
<p>So, we need to find another way to solve this problem.</p>
<p>In fact, if I try to run the function <code class="language-plaintext highlighter-rouge">ga_auth()</code> on Mage, <strong>it throws an error</strong>:</p>
<p><img src="/assets/r_mage_gcp_ga-auth-error.jpg" alt="GA Auth Error" /></p>
<p>So, we need to generate a Google Analytics token that we can use in a production environment.</p>
<h3 id="step-55-how-to-run-ga-authentication-in-a-production-environment-without-a-browser">Step 5.5: How to run GA Authentication in a production environment (without a browser)</h3>
<h4 id="enable-google-analytics-reporting-api">Enable Google Analytics Reporting API</h4>
<p>First, let’s go back to GCP and browse to Enabled APIs & services.</p>
<p>Click on “ENABLE APIS AND SERVICES”.</p>
<p><img src="/assets/r_mage_gcp_enable-apis-services.jpg" alt="Enable APIs" /></p>
<p>Search for <code class="language-plaintext highlighter-rouge">Google Analytics</code>, click the <strong>Google Analytics Reporting API</strong> result, and then choose <strong>ENABLE</strong>.</p>
<p><img src="/assets/r_mage_gcp_enable-ga-api.jpg" alt="Enable GA API" /></p>
<p>This means that our project is now eligible to use the Google Analytics Reporting API.</p>
<h4 id="repeat-steps-to-enable-google-analytics-data-api">Repeat steps to Enable Google Analytics Data API</h4>
<p>Next, repeat these API-enabling steps for the <strong>Google Analytics Data API.</strong></p>
<p>Once done, we have the APIs enabled but we still haven’t created the required token.</p>
<h3 id="step-56-how-to-create-a-google-analytics-token">Step 5.6: How to create a Google Analytics token</h3>
<p>Browse to <a href="https://console.cloud.google.com/apis/credentials">Credentials</a> in the Google Cloud console.</p>
<p>Hover over “CREATE CREDENTIALS” and click on Service account.</p>
<p><img src="/assets/r_mage_gcp_create-service-account.jpg" alt="Service Account" /></p>
<p>Give the service account a name and then click CREATE AND CONTINUE.</p>
<p><img src="/assets/r_mage_gcp_create-and-continue-service-account.jpg" alt="Service Account Name" /></p>
<p>Give the service account the Editor role and then click on Continue.</p>
<p><img src="/assets/r_mage_gcp_set-editor-role.jpg" alt="Set Editor Role" /></p>
<p>Finally, click on <strong>DONE</strong>.</p>
<p>Now that the service account has been created, go back to the Credentials view and you’ll see the account that you just created. Click on it.</p>
<p><img src="/assets/r_mage_gcp_click-service-account-edit.jpg" alt="Service Account Credentials" /></p>
<p>Then, click the <strong>KEYS</strong> tab and choose to <strong>Create new key</strong>.</p>
<p><img src="/assets/r_mage_gcp_create-new-key.jpg" alt="Create New Key" /></p>
<p>Select <strong>JSON</strong> as the key type and click <strong>Create</strong>.</p>
<p>This should download your key as a JSON file.</p>
<p><strong>Important: Store it in a safe place.</strong> Basically, the service account is like an account that has permission to act on your behalf. When you want your application or service to communicate with the GA4 API, it needs to prove its identity. Instead of using a user’s personal Google account, which may not be appropriate for server-to-server communication, you can use a service account.</p>
<p>Now, as if it were a real user, we need to go to the GA4 property and add our service account email. So, go back to <a href="https://console.cloud.google.com/apis/credentials">Credentials</a> and copy your service account’s <strong>email address</strong>:</p>
<p><img src="/assets/r_mage_gcp_copy-service-account-email.jpg" alt="Service Account Email" /></p>
<p>Next, open Google Analytics 4, go to your property, and click on <strong>Property access management</strong> in Admin:</p>
<p><img src="/assets/r_mage_gcp_property-access-management.jpg" alt="Property Access Management" /></p>
<p>Add your service account email address to the list of users, give it Viewer permissions, and click on Add to add the service account as a user to the GA4 property.</p>
<h3 id="step-57-test-r-code-on-your-local-machine">Step 5.7: Test R Code on Your Local Machine</h3>
<p>Now, before adding code to Mage, I like to test it on my local machine to make sure that everything works properly.</p>
<p>So, on your local machine, open a new R script and try the following code:</p>
<pre><code class="language-{r}"># Packages ----
library(purrr)
library(dplyr)
library(googleAnalyticsR)
# Authenticate ----
# path to your JSON service account that we saved earlier
ga_auth(json_file = "/Users/arbenkqiku/Desktop/mage-ai/mage-ai-test-405614-2e1e1c865c18.json")
</code></pre>
<p>If everything works correctly, you should see the following message:</p>
<p><img src="/assets/r_mage_gcp_ga-auth-test-worked.jpg" alt="GA Auth Success" /></p>
<p>That means that your pipeline can now communicate with the GA4 API without any interactive authentication flows.</p>
<h3 id="step-58-create-the-r-script">Step 5.8: Create the R Script</h3>
<p>Now, what I want to retrieve from GA4 are the sessions where a lead generation conversion event happened.</p>
<p>In the case of this client of mine, either someone submitted a form, clicked on the WhatsApp icon to talk to them privately, or clicked on the phone icon to call them.</p>
<p>So, in the next piece of code I want to create a filter with all the event names I am interested in, namely the event names equal to <code class="language-plaintext highlighter-rouge">form_submit_lead</code>, <code class="language-plaintext highlighter-rouge">whatsapp_click</code>, or <code class="language-plaintext highlighter-rouge">phone_click</code>.</p>
<pre><code class="language-{r}"># GA4 property ID
property_id = "1234567"
# Create filter
goals_filter = ga_data_filter(eventName == "form_submit_lead" | eventName == "whatsapp_click" | eventName == "phone_click")
</code></pre>
<p>In the next piece of code, we have the actual query to GA4:</p>
<pre><code class="language-{r}"># Get conversions from GA4
goals_data = ga_data(propertyId = property_id,
date_range = c("2023-10-01", "2023-11-08"),
dimensions = c("date"),
metrics = c("sessions"),
dim_filter = goals_filter) %>%
# rename sessions to goals
set_names(c("date", "goals"))
</code></pre>
<p>Basically, we’re getting the sessions from 1st October 2023 until 8th November 2023, segmented by date, and only when one of the events mentioned earlier occurred.</p>
<p>This is what the final table looks like in my case:</p>
<p><img src="/assets/r_mage_gcp_ga-table-results.jpg" alt="GA4 Data" /></p>
<p>It is not always easy to know what certain fields are called in the GA4 API. You can go to <a href="https://ga-dev-tools.google/ga4/dimensions-metrics-explorer/">this website</a> and look for a specific field. For example, if you search for “channel”, you can see all the different fields that contain “channel” and what they are called in the GA4 API.</p>
<p><img src="/assets/r_mage_gcp_ga-explorer-dims.jpg" alt="GA4 API Fields" /></p>
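<p>You can also look these fields up without leaving R: <code class="language-plaintext highlighter-rouge">googleAnalyticsR</code> provides a <code class="language-plaintext highlighter-rouge">ga_meta()</code> helper that returns the available dimensions and metrics as a data frame. A minimal sketch — the metadata column names shown here reflect the current schema and the “channel” search is just an illustration:</p>
<pre><code class="language-{r}"># Look up GA4 Data API field names directly from R
library(dplyr)
library(googleAnalyticsR)
# "data" requests the metadata for the GA4 Data API
ga4_fields = ga_meta("data")
# For example, find every field whose API name mentions "channel"
ga4_fields %>%
  filter(grepl("channel", apiName, ignore.case = TRUE)) %>%
  select(apiName, uiName, class)
</code></pre>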
<p>Now, in addition to retrieving the sessions where a conversion event occurred, I also want to retrieve the sessions segmented by day, so I’ll use this query:</p>
<pre><code class="language-{r}"># Get sessions from GA4
sessions_data = ga_data(
propertyId = property_id,
date_range = c("2023-10-01", "2023-11-08"),
dimensions = c("date"),
metrics = c("sessions")
)
</code></pre>
<p>This returns a table of sessions segmented by date.</p>
<p>Now, to join the sessions with the conversions:</p>
<pre><code class="language-{r}"># Merge GA4 goals and sessions
sessions_goals_ga4 = sessions_data %>%
# join sessions with goals
full_join(goals_data) %>%
# replace all NAs with 0
replace(is.na(.), 0)
</code></pre>
<p>This is the final result:</p>
<p><img src="/assets/r_mage_gcp_sessions-by-goals.jpg" alt="GA4 Goals and Sessions" /></p>
<p><strong>Here is the complete code.</strong> At the end of the script, I added the <code class="language-plaintext highlighter-rouge">sessions_goals_ga4</code> dataframe. This is because in Mage, we’re using this code within a Data Loader block. We need to return a dataframe for the next block, otherwise the next block doesn’t have any data to play with.</p>
<pre><code class="language-{r}"># Packages ----
library(purrr)
library(dplyr)
library(googleAnalyticsR)
# Authenticate ----
# path to your JSON service account key that we saved earlier
ga_auth(json_file = "/Users/arbenkqiku/Desktop/mage-ai/mage-ai-test-405614-2e1e1c865c18.json")
# GA4 property ID
property_id = "1234567"
# Create filter
goals_filter = ga_data_filter(eventName == "form_submit_lead" | eventName == "whatsapp_click" | eventName == "phone_click")
# Get conversions from GA4
goals_data = ga_data(propertyId = property_id,
date_range = c("2023-10-01", "2023-11-08"),
dimensions = c("date"),
metrics = c("sessions"),
dim_filter = goals_filter) %>%
# rename sessions to goals
set_names(c("date", "goals"))
# Get sessions from GA4
sessions_data = ga_data(propertyId = property_id,
date_range = c("2023-10-01", "2023-11-08"),
dimensions = c("date"),
metrics = c("sessions"))
# Merge GA4 goals and sessions
sessions_goals_ga4 = sessions_data %>%
# join sessions with goals
full_join(goals_data) %>%
# replace all NAs with 0
replace(is.na(.), 0)
# Final data frame for next block in mage.ai
sessions_goals_ga4
</code></pre>
<h3 id="step-59-make-json-service-account-key-accessible-to-mage">Step 5.9: Make JSON service account key accessible to Mage</h3>
<p>Now, before we copy this code to Mage, we need to make our JSON service account key accessible to Mage, because right now it only exists on our local machine.</p>
<p>Remember, Mage is installed on our virtual machine. We need to paste the JSON service account key there.</p>
<p>Open Visual Studio Code and click on “Open”.</p>
<p><img src="/assets/r_mage_gcp_visual-studio-code-open.jpg" alt="Open VSCode" /></p>
<p>Go to the path where your JSON service account key is located in your local machine. You should be able to see your service account key in the left panel.</p>
<p><img src="/assets/r_mage_gcp_vs-code-json-path.jpg" alt="Copy JSON Path" /></p>
<p>Right-click and copy it.</p>
<p>Next, go to the search bar at the top of VSCode, type <code class="language-plaintext highlighter-rouge">></code> to open the command palette, and connect to your virtual machine.</p>
<p><img src="/assets/r_mage_gcp_connect-to-vm.jpg" alt="Connect to VM" /></p>
<p>Once you are in the VM, click on “Open…” and access the folder where Mage is installed. Click on “OK”.</p>
<p><img src="/assets/r_mage_gcp_open-file-folder.jpg" alt="Open Mage Folder" /></p>
<p>On the left side you should now see the files contained in that folder.</p>
<p>Right-click in that area and choose <strong>Paste</strong> to paste your service account JSON file into the project.</p>
<p>You should see your service account file now successfully added to the files in your VM.</p>
<p><img src="/assets/r_mage_gcp_service-account-in-vm.jpg" alt="Paste JSON" /></p>
<p>In Mage, you can use the function <code class="language-plaintext highlighter-rouge">list.files()</code> to see that the service account key is available.</p>
<p><img src="/assets/r_mage_gcp_service-account-key-available.jpg" alt="List Files" /></p>
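<p>Inside the R block, a quick sanity check like the following confirms the key is visible before you call <code class="language-plaintext highlighter-rouge">ga_auth()</code>. The filename is the one from my example; yours will differ:</p>
<pre><code class="language-{r}"># List the JSON files in Mage's working directory
list.files(pattern = "\\.json$")
# Or check for the specific service account key file (example filename)
# this should return TRUE if the key was pasted correctly
file.exists("mage-ai-test-405614-2e1e1c865c18.json")
</code></pre>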
<h3 id="step-510-add-the-r-script-to-mage">Step 5.10: Add the R Script to Mage</h3>
<p>Now, take the code that we previously played with in RStudio and paste it into Mage. You need to make some adjustments, though.</p>
<p>The main change is that the bulk of the code now lives inside the <code class="language-plaintext highlighter-rouge">load_data()</code> function. The only code that runs outside that function is the package loading.</p>
<p>Another thing that changes is the path to the service account key, which now needs to reference the file’s location on your VM. Since the key should be in the root of your project, the filename alone is enough.</p>
<pre><code class="language-{r}">library("pacman")
pacman::p_load(dplyr, purrr, googleAnalyticsR)
load_data <- function() {
# Specify your data loading logic here
# Return value: loaded dataframe
# Retrieve data ----
# path to your JSON service account
ga_auth(json_file = "mage-ai-test-405614-2e1e1c865c18.json")
# GA4 property ID
property_id = "1234567"
# Create filter
goals_filter = ga_data_filter(eventName == "form_submit_lead" | eventName == "whatsapp_click" | eventName == "phone_click")
# Get conversions from GA4
goals_data = ga_data(propertyId = property_id,
date_range = c("2023-10-01", "2023-11-08"),
dimensions = c("date"),
metrics = c("sessions"),
dim_filter = goals_filter) %>%
set_names(c("date", "goals"))
# Get sessions from GA4
sessions_data = ga_data(propertyId = property_id,
date_range = c("2023-10-01", "2023-11-08"),
dimensions = c("date"),
metrics = c("sessions"))
# Merge GA4 goals and sessions
sessions_goals_ga4 = sessions_data %>%
# join sessions with goals
full_join(goals_data) %>%
# replace all NAs with 0
replace(is.na(.), 0)
# Final data frame
sessions_goals_ga4
}
</code></pre>
<p>If everything worked properly, Mage will provide a preview of the data retrieved:</p>
<p><img src="/assets/r_mage_gcp_image-preview.jpg" alt="Preview Data" /></p>
<p>As you can see, our Data loader block has a green tick next to it, which means that it was able to run successfully.</p>
<p><img src="/assets/r_mage_gcp_data-loader-worked.jpg" alt="Data Loader Success" /></p>
<p>Later, we can use this data that we retrieved from GA4 for whatever purpose we want. However, before playing around with it, let’s download some data from Google Ads!</p>
<h2 id="step-6-how-to-retrieve-data-from-the-google-ads-api-in-a-production-environment">Step 6: How to retrieve data from the Google Ads API in a production environment</h2>
<p>To retrieve data from the Google Ads API, we’ll use the R package <code class="language-plaintext highlighter-rouge">rgoogleads</code>, developed by Alexey Seleznev. Unfortunately, with this package it is not possible to use a service account key.</p>
<p>Instead, we’ll have to generate an access token by using the <code class="language-plaintext highlighter-rouge">gargle</code> package. The goal of <code class="language-plaintext highlighter-rouge">gargle</code>, as explained on their website, is to “take some of the agonizing pain out of working with Google APIs”.</p>
<p>This step has 3 sub-steps:</p>
<ol>
<li>Get an access token</li>
<li>Test the access token locally</li>
<li>Retrieve Google Ads API data into our production environment</li>
</ol>
<h3 id="step-61-how-to-get-an-access-token">Step 6.1: How to get an access token</h3>
<p>First of all, you need to browse to the <a href="https://console.cloud.google.com/marketplace/product/google/googleads.googleapis.com">Google Ads API</a> in Google Cloud Platform and click to Enable it.</p>
<p>Next, we need to set up OAuth: when we attempt to fetch our Google Ads data, Google asks for our permission to let this app access our ads data. If we say yes, Google gives us an access token. This token then lets our computer talk to the Google Ads API without having to interact each time.</p>
<p>Before doing anything, GCP will ask you to set up a “consent screen”. This screen is like a friendly message to users, letting them know that our app wants to look at their Google Ads data.</p>
<p>It’s a way to make sure users are aware and agree to let our app access their information. To get started, browse to the <a href="https://console.cloud.google.com/apis/credentials/consent">OAuth consent screen</a> section of your GCP project.</p>
<p>Here, click on “CONFIGURE CONSENT SCREEN”.</p>
<p><img src="/assets/r_mage_gcp_configure-consent-screen.jpg" alt="Configure Consent Screen" /></p>
<p>Select <strong>External</strong> as the User Type and then click “CREATE”.</p>
<p>Give your app a name and add your email address.</p>
<p><img src="/assets/r_mage_gcp_name-and-email-consent.jpg" alt="App Name" /></p>
<p>Add your email to the <strong>Developer email address</strong>, too, and then click “SAVE AND CONTINUE”.</p>
<p>In the next screen, click on “ADD OR REMOVE SCOPES”. Scopes govern what your app is allowed to do with the APIs.</p>
<p>Search for google ads and select the <strong>Google Ads API</strong>. Click UPDATE when done.</p>
<p><img src="/assets/r_mage_gcp_add-google-ads-scope.jpg" alt="Google Ads API" /></p>
<p>Then, click “SAVE AND CONTINUE” to proceed to the “Test users” step.</p>
<p>Here, click “ADD USERS”. Add your email address and click “ADD”.</p>
<p><img src="/assets/r_mage_gcp_add-test-user.jpg" alt="Test Users" /></p>
<p>Make sure to include your email because our app is currently in the “Testing” phase. During this phase, only the emails that are added can be used by the app. So, adding your email is crucial to get your Google Ads data.</p>
<p>Click on “SAVE AND CONTINUE” to proceed to the Summary step, and then “BACK TO DASHBOARD” when done with configuring the consent screen.</p>
<p>Now that the consent screen has been configured, you can browse to <a href="https://console.cloud.google.com/apis/credentials">Credentials</a> again.</p>
<p>Here, click on “CREATE CREDENTIALS” and this time choose OAuth client ID.</p>
<p><img src="/assets/r_mage_gcp_oauth-client-id.jpg" alt="OAuth Client ID" /></p>
<p>Under <strong>Application type</strong>, select <strong>Desktop app</strong>, give a name to your OAuth client ID, and click on “CREATE”:</p>
<p><img src="/assets/r_mage_gcp_create-oauth-id.jpg" alt="OAuth Client ID Name" /></p>
<p>Download your client ID as a JSON file and click on OK.</p>
<p><img src="/assets/r_mage_gcp_download-oauth-json.jpg" alt="Download JSON" /></p>
<p>Save it in a secure location.</p>
<h3 id="step-62-how-to-test-the-access-token-locally">Step 6.2: How to test the access token locally</h3>
<p>Now, let’s go back to <code class="language-plaintext highlighter-rouge">RStudio</code> or <code class="language-plaintext highlighter-rouge">VSCode</code> on our local machine. Open a new script and load these packages:</p>
<pre><code class="language-{r}"># Packages
library(gargle)
library(rgoogleads)
</code></pre>
<p>Then, we’ll import the OAuth Client ID credentials that we just created by using the function <code class="language-plaintext highlighter-rouge">gargle_oauth_client_from_json()</code>. The name of your client can be whatever you prefer:</p>
<pre><code class="language-{r}"># Create gargle client
my_client = gargle_oauth_client_from_json(
path = "/Users/arbenkqiku/Desktop/mage-ai/mage-demo-client-id.json",
name = "Google Ads App"
)
</code></pre>
<p>Then, we can add the following scope and email to our token request:</p>
<pre><code class="language-{r}">scopes = "https://www.googleapis.com/auth/adwords"
email = "arben.kqiku@gmail.com"
</code></pre>
<p>Finally, we can go through the process of acquiring a token by running this function:</p>
<pre><code class="language-{r}"># Create a token by using Gargle
my_token = gargle2.0_token(
email = email,
package = "rgoogleads",
scope = scopes,
client = my_client
)
</code></pre>
<p>This will open a browser window.</p>
<p>Do you recognize the name of the App? That’s the name of our application! We’re now going through the process of authorizing our app to access our Google Ads data. Now, select your email.</p>
<p><img src="/assets/r_mage_gcp_authorize-app.jpg" alt="Select Email" /></p>
<p>Google will tell you that this app isn’t verified, as its status is still “testing”.</p>
<p>However, it is our own app, so we can safely click on “Continue”.</p>
<p>Authorize the app to “See, edit, create and delete your Google Ads accounts and data…” and click on “Continue”.</p>
<p><img src="/assets/r_mage_gcp_authorize-google-ads-access.jpg" alt="Authorize App" /></p>
<p>If everything worked correctly, you should see a message saying, “Authentication complete. Please close this page and return to R.”</p>
<p>Now, if we inspect the variable <code class="language-plaintext highlighter-rouge">my_token</code>, which contains our access token, we can review the information associated with it, for example the email, the scopes, and so forth.</p>
<p><img src="/assets/r_mage_gcp_review-token.jpg" alt="Token Info" /></p>
<p>We can now test whether the token works properly by running the <code class="language-plaintext highlighter-rouge">gads_auth()</code> function. No browser window should open this time, because the token allows us to authenticate non-interactively.</p>
<pre><code class="language-{r}"># Authenticate by using the previously created token
gads_auth(token = my_token)
</code></pre>
<p>Let’s run a simple function of the <code class="language-plaintext highlighter-rouge">rgoogleads</code> package to see if we can access our data:</p>
<pre><code class="language-{r}"># get list of accessible accounts
gads_get_accessible_customers()
</code></pre>
<p>Yes, I am able to retrieve the accounts that I have access to!</p>
<p><img src="/assets/r_mage_gcp_ads-accounts-listed.jpg" alt="Accessible Accounts" /></p>
<p>However, we are not ready for production yet. In fact, if we type this code:</p>
<pre><code class="language-{r}"># where is the cache of the token located
my_token$cache_path
</code></pre>
<p>We’ll get the result that the token is cached in a local directory, such as <code class="language-plaintext highlighter-rouge">~/Library/Caches/gargle</code>.</p>
<p>This means that when we try to load <code class="language-plaintext highlighter-rouge">my_token</code> in production, it will look for the local path instead of a path on the VM.</p>
<p>So, we need to change the cache path to our Mage directory on the VM. This is how you’d do it:</p>
<pre><code class="language-{r}"># change path of cache to mage main directory
my_token$cache_path = "/home/src"
# save token again with changed directory
saveRDS(my_token, file = "google_ads_token_mage_demo.RDS")
</code></pre>
<p>Here is the full code to generate, test, and save the token:</p>
<pre><code class="language-{r}"># Packages
library(gargle)
library(rgoogleads)
# Create gargle client
my_client = gargle_oauth_client_from_json(path = "/Users/arbenkqiku/Desktop/mage-ai/mage-demo-client-id.json",
name = "Google Ads App")
# Define scope and email
scopes = "https://www.googleapis.com/auth/adwords"
email = "arben.kqiku@gmail.com"
# Create a token by using Gargle
my_token = gargle2.0_token(email = email,
package = "rgoogleads",
scope = scopes,
client = my_client)
# Authenticate by using the previously created token
gads_auth(token = my_token)
# Test token by getting the list of accessible accounts
gads_get_accessible_customers()
# Change path of cache to mage main directory, so you can use the token in production
my_token$cache_path = "/home/src"
# Save token with changed directory
saveRDS(my_token, file = "google_ads_token_mage_demo.RDS")
</code></pre>
<h3 id="step-63-how-to-retrieve-data-from-the-google-ads-api-in-a-production-environment">Step 6.3: How to retrieve data from the Google Ads API in a production environment</h3>
<p>Now that we have generated the access token, copy the saved token file (<code class="language-plaintext highlighter-rouge">google_ads_token_mage_demo.RDS</code>) from your local machine to the VM directory by using Visual Studio Code. Follow the exact steps you took to copy the service account JSON file before.</p>
<p>Next, we can go back to Mage, add a Data loader block, and select R as the programming language.</p>
<p><img src="/assets/r_mage_gcp_new-data-loader-with-r.jpg" alt="Data Loader" /></p>
<p>Name the block <code class="language-plaintext highlighter-rouge">google_ads</code> and click on “Save and add block”.</p>
<p>In the block code, we need to first load the necessary packages.</p>
<pre><code class="language-{r}">library("pacman")
p_load(rgoogleads)
p_load(dplyr)
p_load(purrr)
load_data <- function() {
}
</code></pre>
<p>Then, we need to load our access token, authenticate with it, and set the Google Ads account ID we want to get the data from.</p>
<pre><code class="language-{r}"># load Google Ads access token
my_token = readRDS(file = "google_ads_token_mage_demo.RDS")
# Authenticate with the token
gads_auth(token = my_token)
# Set the Google Ads account id you want to get data from
gads_set_customer_id('123-123-1234')
</code></pre>
<p>Here is the query that we’re using to retrieve our data. We’ll retrieve impressions, clicks, and cost segmented by date, from “2023-10-19” until “2023-11-01”.</p>
<pre><code class="language-{r}"># run query
google_ads_account_data = gads_get_report(
resource = "customer",
fields = c("segments.date",
"metrics.impressions",
"metrics.clicks",
"metrics.cost_micros"),
date_from = "2023-10-19",
date_to = "2023-11-01"
)
</code></pre>
<p>The first argument you need to define is the resource you are getting the data from, in our case <code class="language-plaintext highlighter-rouge">customer</code>.</p>
<p>You can find the list of all available resources <a href="https://developers.google.com/google-ads/api/fields/v13/overview#list-of-all-resources">here</a>.</p>
<p>For example, if you would like to retrieve data at the ad group level, you should define the resource as <code class="language-plaintext highlighter-rouge">ad_group</code>.</p>
<p>To build our query, we can use the <a href="https://developers.google.com/google-ads/api/fields/v13/customer_query_builder">Google Ads query builder</a>, which can be used for any resource, in our case <code class="language-plaintext highlighter-rouge">customer</code>.</p>
<p><img src="/assets/r_mage_gcp_build-customer-query.jpg" alt="Google Ads Query Builder" /></p>
<p>Below you can select attributes, segments, or metrics:</p>
<p><img src="/assets/r_mage_gcp_select-attributes-segments-metrics.jpg" alt="Google Ads Query Builder" /></p>
<p>When you select fields, it will start populating the query in the user interface of the builder.</p>
<p>This is very useful to know what the metrics and dimensions are called in the Google Ads API.</p>
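<p>One detail to keep in mind: the Google Ads API reports cost in micros, where 1,000,000 micros equal one unit of the account currency. Depending on your <code class="language-plaintext highlighter-rouge">rgoogleads</code> version, the returned column may already be converted; if you see raw micros in the result, converting them is a one-line <code class="language-plaintext highlighter-rouge">mutate()</code>. A sketch with illustrative data and column names:</p>
<pre><code class="language-{r}">library(dplyr)
# Illustrative rows in the shape of the report above
report = tibble::tibble(
  date        = as.Date(c("2023-10-19", "2023-10-20")),
  clicks      = c(120, 95),
  cost_micros = c(35400000, 28750000)
)
# Convert micros to currency units: 35400000 micros -> 35.4
report %>%
  mutate(cost = cost_micros / 1e6) %>%
  select(-cost_micros)
</code></pre>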
<p>Here is the final part of our Data loader block. The last expression should always be a variable containing the data, as we have to pass something to the next block.</p>
<pre><code class="language-{r}"># return data
google_ads_account_data
</code></pre>
<p>Here is the complete code block we’re working with:</p>
<pre><code class="language-{r}">library("pacman")
p_load(rgoogleads)
p_load(dplyr)
p_load(purrr)
load_data <- function() {
# Specify your data loading logic here
# Return value: loaded dataframe
# load Google Ads access token
my_token = readRDS(file = "google_ads_token_mage_demo.RDS")
# Authenticate with the token
gads_auth(token = my_token)
# Set the Google Ads account id you want to get data from
gads_set_customer_id('123-123-1234')
# run query
google_ads_account_data = gads_get_report(
resource = "customer",
fields = c("segments.date",
"metrics.impressions",
"metrics.clicks",
"metrics.cost_micros"),
date_from = "2023-10-19",
date_to = "2023-11-01"
)
# return data for next block
google_ads_account_data
}
</code></pre>
<p>If you run this code, you should be able to see clicks, cost, and impressions segmented by date.</p>
<p><img src="/assets/r_mage_gcp_ads-data-table.jpg" alt="Google Ads Data" /></p>
<p>We’re now done with this Data Loader block. Next, let’s move on to <strong>transformers</strong>.</p>
<h2 id="step-7-how-to-join-and-export-data-to-google-bigquery-in-a-production-environment">Step 7: How to join and export data to Google BigQuery in a production environment</h2>
<p>This step has 2 sub-steps:</p>
<ol>
<li>Join the data from GA4 and Google Ads with a Transformer block</li>
<li>Export the data to Google BigQuery with a Data Exporter block</li>
</ol>
<h3 id="step-71-join-the-data-from-ga4-and-google-ads-with-a-transformer-block">Step 7.1: Join the data from GA4 and Google Ads with a Transformer block</h3>
<p>In Mage, add a new <strong>Transformer</strong> block and select <strong>R</strong> as the programming language.</p>
<p>Give the block a name like <code class="language-plaintext highlighter-rouge">join_ga4_google_ads</code> and click on “Save and add block”.</p>
<p>In the Tree, we can now see that the Transformer block named <code class="language-plaintext highlighter-rouge">join_ga4_google_ads</code> only receives data from the Data Loader block <code class="language-plaintext highlighter-rouge">google_ads</code>. We need to also link the Data Loader <code class="language-plaintext highlighter-rouge">ga4</code> with the Transformer.</p>
<p>To do this, you simply need to drag and drop the arrow from the <code class="language-plaintext highlighter-rouge">ga4</code> block to the <code class="language-plaintext highlighter-rouge">join_ga4_google_ads</code> Transformer.</p>
<p><img src="/assets/r_mage_gcp_join-ga4-transformer.jpg" alt="Join Data" /></p>
<p>The first thing that we’ll do in the Transformer block is to add the final variables from the previous Data loader blocks to the <code class="language-plaintext highlighter-rouge">transform()</code> function.</p>
<p><img src="/assets/r_mage_gcp_transform-data-loader-functions.jpg" alt="Add Variables" /></p>
<p>Next, we can add the following packages on top of the <code class="language-plaintext highlighter-rouge">transform()</code> function:</p>
<pre><code class="language-{r}">library("pacman")
p_load(tibble, dplyr, purrr, stringr, lubridate)
</code></pre>
<p>The first piece of code that we’re adding is this:</p>
<pre><code class="language-{r}"># Build a row with the exact time
check_time = tibble(
Date = Sys.time(),
Impressions = 0,
Sessions = 0,
Clicks = 0,
Cost = 0,
Goals = 0
)
</code></pre>
<p>I am creating this tibble called <code class="language-plaintext highlighter-rouge">check_time</code> only so that later in BigQuery we can verify whether our schedule from Mage is working correctly.</p>
<p>Then, we can finally join the Google Ads data with the GA4 data, and also return the <code class="language-plaintext highlighter-rouge">merged_data</code> variable for the next block:</p>
<pre><code class="language-{r}"># Merge Google Ads with GA4 data
merged_data = google_ads_account_data %>%
  left_join(sessions_goals_ga4, by = c("date" = "date")) %>%
  # Reorder and capitalise columns
  select(date, impressions, sessions, clicks, cost, goals) %>%
  set_names(names(.) %>% str_to_title()) %>%
  # Add the check_time row to verify the schedule
  mutate(Date = Date %>% as_datetime()) %>%
  bind_rows(check_time) %>%
  # Replace NAs with 0
  replace(is.na(.), 0) %>%
  arrange(desc(Date))

# Return merged_data for the next block
merged_data
</code></pre>
<p>If everything worked properly, you should get something similar to this:</p>
<p><img src="/assets/r_mage_gcp_joined-data-after-transformation.jpg" alt="Merged Data" /></p>
<p>I am aware that we’re joining Google Ads data with GA4 data from all sources, and we should actually only join GA4 data coming from Google Ads. However, the goal of this guide is simply to show how to perform data engineering tasks with digital data.</p>
<h3 id="step-72-export-the-data-to-google-bigquery-with-a-data-exporter-block">Step 7.2: Export the data to Google BigQuery with a Data Exporter block</h3>
<p>Now that we joined data successfully from Google Ads and GA4, we’re ready to export the data to BigQuery.</p>
<p>Browse to the <a href="https://console.cloud.google.com/bigquery/">BigQuery console</a> in your Google Cloud Platform project.</p>
<p>BigQuery has the following data hierarchy: <strong>project -> dataset -> table.</strong></p>
<p>We already have a project, so now we need to create a dataset where our tables will reside. Click on the three dots on the right of your project, and then on “Create data set”:</p>
<p><img src="/assets/r_mage_gcp_create-bq-dataset.jpg" alt="Create Dataset" /></p>
<p>Give a name to your data set, select a region, and click on “CREATE DATA SET”:</p>
<p><img src="/assets/r_mage_gcp_configure-bq-dataset.jpg" alt="Dataset Name" /></p>
<p>Back in Mage, add a <strong>Data Exporter</strong> block and choose <strong>R</strong> as the programming language again.</p>
<p>Name the block <code class="language-plaintext highlighter-rouge">big_query_export</code> and click on “Save and add block”.</p>
<p>This is what your data tree should look like.</p>
<p><img src="/assets/r_mage_gcp_mage-data-tree.jpg" alt="Data Tree" /></p>
<p>Go to the <code class="language-plaintext highlighter-rouge">big_query_export</code> block, and add <code class="language-plaintext highlighter-rouge">merged_data</code> as the argument of the function <code class="language-plaintext highlighter-rouge">export_data()</code>. Also, let’s load the <code class="language-plaintext highlighter-rouge">bigrquery</code> package.</p>
<pre><code class="language-{r}">library("pacman")
p_load(bigrquery)

export_data <- function(merged_data) {
  # Specify your data exporting logic here
  # Return value: exported dataframe
}
</code></pre>
<p>To authenticate with BigQuery, we can actually use the service account key that we previously created for GA4.</p>
<p>The only thing that changes is the function <code class="language-plaintext highlighter-rouge">bq_auth()</code> instead of <code class="language-plaintext highlighter-rouge">ga_auth()</code>.</p>
<p>This is great news as it means we don’t have to go through yet another cumbersome authentication process:</p>
<pre><code class="language-{r}"># Authenticate
bq_auth(path = "mage-ai-test-405614-2e1e1c865c18.json")
</code></pre>
<p>In fact, you can use the same service account key to authenticate with multiple Google services such as Google Drive or Google Sheets.</p>
<p>There are different R packages for these services, such as <code class="language-plaintext highlighter-rouge">googledrive</code> and <code class="language-plaintext highlighter-rouge">googlesheets4</code>.</p>
<p>Granted, you need to authorize the respective APIs in the Google Cloud Platform as shown previously, but this is a great time saver!</p>
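<p>As a hedged sketch of that time saver (assuming the same JSON key file and that the Drive and Sheets APIs are enabled in your Google Cloud project), both packages accept a service account key via a <code class="language-plaintext highlighter-rouge">path</code> argument, just like <code class="language-plaintext highlighter-rouge">bq_auth()</code>:</p>
<pre><code class="language-{r}">library("pacman")
p_load(googledrive, googlesheets4)

# Authenticate Google Drive with the same service account key
drive_auth(path = "mage-ai-test-405614-2e1e1c865c18.json")

# Authenticate Google Sheets with the same service account key
gs4_auth(path = "mage-ai-test-405614-2e1e1c865c18.json")
</code></pre>
<p>One key file, one authorization pattern, several Google services.</p>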
<p>The next thing to do is to create a table reference for BigQuery.</p>
<p>As you may remember, we previously created only a data set, so we now need to create a placeholder for our table.</p>
<p>To do so, we need to define our project name, data set, and table. The project name and data set are already defined and we can retrieve these from BigQuery. The table name is up to you.</p>
<pre><code class="language-{r}"># Define BigQuery project data
project  = "mage-ai-test-405614"
data_set = "mage_demo"
table    = "merged_data"

# Create the table reference
table = bq_table(project = project, dataset = data_set, table = table)
</code></pre>
<p>To find the right project and data set name, go to BigQuery and click on the data set you created.</p>
<p>To the right, you should see the <strong>Data set ID</strong>, which comprises <code class="language-plaintext highlighter-rouge">project_name.data_set_name</code>. You can separate and copy those values to insert them into the code above.</p>
<p><img src="/assets/r_mage_gcp_dataset-info.jpg" alt="Data Set ID" /></p>
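<p>If it’s easier, you can copy the whole <strong>Data set ID</strong> as one string and separate it in R (a small illustrative example, with the ID hard-coded here):</p>
<pre><code class="language-{r}"># Split a copied "project_name.data_set_name" ID into its two parts
data_set_id = "mage-ai-test-405614.mage_demo"

parts    = strsplit(data_set_id, ".", fixed = TRUE)[[1]]
project  = parts[1]  # "mage-ai-test-405614"
data_set = parts[2]  # "mage_demo"
</code></pre>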
<p>In the following code, if the table exists, we delete and recreate it before uploading data.</p>
<p>I’m doing this every 5 minutes for demonstration, but in real production, I’d likely run it less frequently, adding only the new data instead of recreating the whole table.</p>
<pre><code class="language-{r}">if (bq_table_exists(table)) {
  # If the table already exists, delete it
  bq_table_delete(table)
  # Recreate the table so that we can fill it
  bq_table_create(table, merged_data)
  # Fill the table
  bq_table_upload(table, merged_data)
} else {
  bq_table_create(table, merged_data)
  bq_table_upload(table, merged_data)
}
</code></pre>
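<p>As an aside, the append-only approach I mentioned for production could look something like this. This is only a sketch: it assumes you have a variable (here called <code class="language-plaintext highlighter-rouge">new_rows</code>, a name I’m inventing for illustration) containing just the rows not yet uploaded, and it relies on <code class="language-plaintext highlighter-rouge">bq_table_upload()</code> forwarding its extra arguments to the underlying BigQuery load job:</p>
<pre><code class="language-{r}"># Sketch: append new rows instead of recreating the table.
# Assumption: `new_rows` holds only rows not yet present in BigQuery.
if (!bq_table_exists(table)) {
  # First run: create the table and do a full load
  bq_table_create(table, merged_data)
  bq_table_upload(table, merged_data)
} else {
  # Subsequent runs: append only the new rows
  bq_table_upload(table, new_rows, write_disposition = "WRITE_APPEND")
}
</code></pre>
<p>For a 5-minute demo, though, the delete-and-recreate approach is simpler and perfectly fine.</p>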
<p>Here is the final code:</p>
<pre><code class="language-{r}">library("pacman")
p_load(bigrquery)

export_data <- function(merged_data) {
  # Authenticate
  bq_auth(path = "mage-ai-test-405614-2e1e1c865c18.json")

  # Define BigQuery project data
  project  = "mage-ai-test-405614"
  data_set = "mage_demo"
  table    = "merged_data"

  # Create the table reference
  table = bq_table(project = project, dataset = data_set, table = table)

  if (bq_table_exists(table)) {
    # If the table already exists, delete it
    bq_table_delete(table)
    # Recreate the table so that we can fill it
    bq_table_create(table, merged_data)
    # Fill the table
    bq_table_upload(table, merged_data)
  } else {
    bq_table_create(table, merged_data)
    bq_table_upload(table, merged_data)
  }
}
</code></pre>
<p>If you run the code, you should have a new table called <code class="language-plaintext highlighter-rouge">merged_data</code> in BigQuery. If you click PREVIEW, you should be able to see data within.</p>
<p><img src="/assets/r_mage_gcp_bigquery-preview.jpg" alt="BigQuery Table" /></p>
<p>Our pipeline is complete. As you can see, all the blocks have a green tick:</p>
<p><img src="/assets/r_mage_gcp_mage-pipeline-complete.jpg" alt="Pipeline Complete" /></p>
<h2 id="step-8-how-to-schedule-a-data-pipeline-that-automatically-updates-every-5-minutes">Step 8: How to schedule a data pipeline that automatically updates every 5 minutes</h2>
<p>There are 2 sub-steps:</p>
<ol>
<li>Test the entire pipeline (verify it runs)</li>
<li>Create a schedule</li>
</ol>
<h3 id="step-81-test-the-entire-pipeline-verify-it-runs">Step 8.1: Test the entire pipeline (verify it runs)</h3>
<p>Just because each block ran successfully on its own, there is no guarantee that the entire pipeline will run smoothly. So we have to run the entire pipeline before creating a schedule.</p>
<p>In Mage, click on “Triggers”:</p>
<p><img src="/assets/r_mage_gcp_mage-triggers.jpg" alt="Triggers" /></p>
<p>At the top, click on <strong>Run @once</strong>.</p>
<p>This will produce a trigger, and you’ll see that its status will change to <code class="language-plaintext highlighter-rouge">running</code>:</p>
<p><img src="/assets/r_mage_gcp_trigger-running.jpg" alt="Trigger Running" /></p>
<p>When done, it should say <code class="language-plaintext highlighter-rouge">completed</code> and switch to an inactive state.</p>
<p>If we now refresh the BigQuery table, we can see that it has an updated date/time for the rows. This means that our pipeline ran successfully!</p>
<p><img src="/assets/r_mage_gcp_date-time-bigquery-updated.jpg" alt="BigQuery Table" /></p>
<h3 id="step-82-create-a-schedule">Step 8.2: Create a schedule</h3>
<p>Now that we know that our pipeline works properly, let’s create a trigger that runs every 5 minutes.</p>
<p>In Mage’s Triggers view, click on <strong>New trigger.</strong></p>
<p>Select <strong>Schedule</strong> as the trigger type.</p>
<p><img src="/assets/r_mage_gcp_mage-new-schedule-trigger.jpg" alt="Schedule Trigger" /></p>
<p>Given that the trigger will run every 5 minutes, let’s name it <code class="language-plaintext highlighter-rouge">every_5_minutes</code>.</p>
<p>Select <strong>Custom</strong> as the frequency and enter the following cron expression: <code class="language-plaintext highlighter-rouge">*/5 * * * *</code>.</p>
<p><img src="/assets/r_mage_gcp_every-five-minutes.jpg" alt="Cron Expression" /></p>
<p>A <strong><em>cron</em></strong> expression is like a schedule for your computer tasks.</p>
<p>It’s a simple set of instructions that tells your system when to run a specific job.</p>
<p>The expression consists of five parts, representing minutes, hours, days, months, and days of the week. For example, <code class="language-plaintext highlighter-rouge">*/15 * * * *</code> means “every 15 minutes, every hour, every day, every month, every day of the week”.</p>
<p>When ready with the trigger, click on <strong>Save changes</strong>.</p>
<p><img src="/assets/r_mage_gcp_save-schedule-trigger.jpg" alt="Save Changes" /></p>
<p>Now you have created your trigger, but as you can see its status is inactive. To start it, click on <strong>Start trigger</strong>.</p>
<p><img src="/assets/r_mage_gcp_start-schedule-trigger.jpg" alt="Start Trigger" /></p>
<p>The status switches to <code class="language-plaintext highlighter-rouge">active</code>. If you browse back to the Triggers view, it will show you when it’s set to trigger next.</p>
<p><img src="/assets/r_mage_gcp_trigger-next-run-date.jpg" alt="Next Trigger" /></p>
<p>Be mindful of the fact that the time zone in Mage is in UTC.</p>
<p>When the scheduled time arrives, the trigger’s status should change to <code class="language-plaintext highlighter-rouge">running</code>.</p>
<p>After it has run, refresh the BigQuery table and you will see that the data has been updated again.</p>
<p><img src="/assets/r_mage_gcp_bigquery-schedule-updated.jpg" alt="BigQuery Table" /></p>
<p>Congratulations! Our journey is complete. I hope you had fun and learned something useful.</p>
<p>If you have any comments, please post them below. If you want to connect with me, Arben, <a href="https://www.linkedin.com/in/arben-kqiku-301457117/">here is my LinkedIn</a>!</p>
<h1 id="conclusion">Conclusion:</h1>
<p>Using R in production is possible with tools like Mage and Google Cloud Platform. If you are an aspiring Digital Analytics professional, you now have a clear pathway forward for using R, Mage, and Google Cloud Platform to build your own data pipelines.</p>
<p>However, if you are a Digital Analytics professional who is new to R, you may be wondering how to learn it for Digital Analytics in a way that is practical and useful.</p>
<p><strong>If you need to learn R for data analytics and data science, then I can help. Read on.</strong></p>
<h1 id="struggling-to-become-a-data-scientist">Struggling to become a data scientist?</h1>
<p>You know the feeling. Being unhappy with your current job.</p>
<p>Promotions aren’t happening. You’re stuck. Feeling Hopeless. Confused…</p>
<p>And you’re praying that the next job interview will go better than the last 12…</p>
<p>… But you know it won’t. Not unless you take control of your career.</p>
<p>The good news is…</p>
<h1 id="i-can-help-you-speed-it-up">I Can Help You Speed It Up.</h1>
<p>I’ve helped 6,107+ students learn data science for business from an elite business consultant’s perspective.</p>
<p>I’ve worked with Fortune 500 companies like S&P Global, Apple, MRM McCann, and more.</p>
<p>And I built a training program that gets my students life-changing data science careers (don’t believe me? <a href="https://university.business-science.io/p/5-course-bundle-machine-learning-web-apps-time-series/">see my testimonials here</a>):</p>
<h4 class="text-center">
6-Figure Data Science Job at CVS Health ($125K)<br /><div style="height:10px;"></div>
Senior VP Of Analytics At JP Morgan ($200K)<br /><div style="height:10px;"></div>
50%+ Raises & Promotions ($150K)<br /><div style="height:10px;"></div>
Lead Data Scientist at Northwestern Mutual ($175K)<br /><div style="height:10px;"></div>
2X-ed Salary (From $60K to $120K)<br /><div style="height:10px;"></div>
2 Competing ML Job Offers ($150K)<br /><div style="height:10px;"></div>
Promotion to Lead Data Scientist ($175K)<br /><div style="height:10px;"></div>
Data Scientist Job at Verizon ($125K+)<br /><div style="height:10px;"></div>
Data Scientist Job at CitiBank ($100K + Bonus)<br /><div style="height:10px;"></div>
</h4>
<h1 id="whenever-you-are-ready-heres-the-system-they-are-taking">Whenever you are ready, here’s the system they are taking:</h1>
<p><a href="https://university.business-science.io/p/5-course-bundle-machine-learning-web-apps-time-series">Here’s the system</a> that has gotten aspiring data scientists, career transitioners, and lifelong learners data science jobs and promotions…</p>
<p><img src="/assets/rtrack_what_theyre_doing_2.jpg" alt="What They're Doing - 5 Course R-Track" /></p>
<p style="font-size: 36px;text-align: center;">
<a href="https://university.business-science.io/p/5-course-bundle-machine-learning-web-apps-time-series">
<strong>Join My 5-Course R-Track Program Now!</strong><br /><small style="font-size:24px;">(And Become The Data Scientist You Were Meant To Be...)</small>
</a>
</p>
<p>P.S. - Samantha landed her NEW Data Science R Developer job at CVS Health (Fortune 500). <a href="https://university.business-science.io/p/5-course-bundle-machine-learning-web-apps-time-series">This could be you.</a></p>
<p><img src="/img/success_samantha_got_job.jpg" alt="Success Samantha Got The Job" /></p>