Setting Up Your Analytics Environment
Learning Objectives
Why RStudio and Positron? A Comparison of Modern Analytics Environments
When you sit down to write a financial analysis or build a machine learning model, your choice of tools shapes not just how fast you work, but how clearly you think. The environment you choose—the text editor, the console, the file browser—becomes an extension of your mind.
For decades, statisticians and data analysts have relied on R, a language first released in 1993 and made open source in 1995 by a community that values statistical rigor, reproducibility, and transparency. R is free and has evolved into one of the world’s most powerful ecosystems for data science. Today, you have two main choices for your R environment: RStudio and Positron.
RStudio is the industry standard. (Its parent company rebranded from RStudio to Posit in 2022; the IDE itself kept the RStudio name.) It was built specifically for R and has matured into a complete integrated development environment (IDE). If you’ve worked with RStudio before, it feels like home: by default, a script editor at the top left, a console at the bottom left, the environment and history at the top right, and files, plots, packages, and help at the bottom right. Data scientists and statisticians around the world use it daily. It’s stable, well-documented, and production-ready.
Positron is newer: Posit released it as a public beta in 2024. Positron is built on Code OSS, the open-source project behind Microsoft’s Visual Studio Code, the editor that dominates software development. Positron brings a modern feel: faster startup times, a cleaner interface, native support for both R and Python side by side, and rich extensions. It’s built for polyglot teams that mix R and Python in the same project, which is increasingly common in business analytics.
So which should you choose?
Choose RStudio if: You’re primarily working in R, you value stability and established workflows, your organization already uses RStudio, or you prefer a purpose-built environment designed specifically for statistics and data science.
Choose Positron if: You want to work fluidly between R and Python in the same session, you’re comfortable with modern development environments, you appreciate a cleaner UI, or you’re building teams that need to collaborate across languages.
This book will teach you both environments and demonstrate how to use them together. The good news: the underlying languages and workflows are identical. What differs is the look, feel, and some convenience features.
Why both R and Python? In the real world, you’ll encounter both. Many financial institutions and large enterprises use Python for production systems and R for statistical modeling. African tech startups increasingly favor Python for web backends but reach for R for analytical reports. By learning both, you’re not just picking up two syntax patterns—you’re learning two different philosophical approaches to solving problems. R thinks in terms of vectors and data frames; Python thinks in terms of sequences and objects. Together, they make you a more flexible, more thoughtful analyst.
Installing R and RStudio
R and RStudio are separate pieces of software. R is the engine—the actual language interpreter and runtime. RStudio is the dashboard—the environment where you write code, see results, and manage your work. You must install R first, then RStudio.
Step 1: Install R
Navigate to https://cran.r-project.org/, the Comprehensive R Archive Network. CRAN is the official repository of R packages and the source for the R language itself.
On Windows:
1. Click “Download R for Windows”
2. Click “base”
3. Download the latest version (look for a link like “Download R 4.3.x for Windows”)
4. Run the installer (.exe file)
5. Accept the default settings (location: C:\Program Files\R\R-x.x.x)
6. Complete the installation
On macOS:
1. Click “Download R for macOS”
2. Choose the correct architecture:
   - Apple Silicon (M1/M2/M3 chips): download the “ARM64” file
   - Intel (older Macs): download the “x86_64” file
3. Open the .pkg file and follow the installer
4. R installs to /Library/Frameworks/R.framework/
On Linux (Ubuntu/Debian): run `sudo apt update`, then `sudo apt install r-base`.
For Red Hat/Fedora: run `sudo dnf install R`.
Step 2: Verify Your R Installation
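Open a terminal (Command Prompt on Windows, Terminal on macOS/Linux) and type `R --version`. On Windows the installer does not add R to your PATH by default, so you may need to run the command from C:\Program Files\R\R-x.x.x\bin.
You should see output like: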
R version 4.3.2 (2023-10-31) -- "Eye Holes"
...
If you see a version number, R is installed correctly.
Step 3: Install RStudio
Navigate to https://posit.co/download/rstudio-desktop/. The website will detect your operating system automatically.
On Windows:
1. Download the .exe installer
2. Run it and follow the default options
3. RStudio installs to C:\Program Files\RStudio by default
On macOS:
1. Download the .dmg file
2. Open it and drag RStudio.app to the Applications folder
3. Launch RStudio from Applications
On Linux:
1. Download the appropriate .deb (Debian/Ubuntu) or .rpm (Red Hat/Fedora) file
2. Install it:
```bash
# Ubuntu/Debian
sudo dpkg -i rstudio-x.x.x-amd64.deb
# Red Hat/Fedora
sudo dnf install rstudio-x.x.x-x86_64.rpm
```
Step 4: Launch RStudio and Test It
Double-click RStudio (or type rstudio in your terminal). You should see a window with four panes:
- Top Left: Script editor (empty for now)
- Top Right: Environment and History
- Bottom Left: Console (where R code runs)
- Bottom Right: Files, Plots, Packages, Help
In the console, type `2 + 2`.
Press Enter. If you see [1] 4, everything works.
Now type `print("Hello from R!")`.
You should see [1] "Hello from R!". If both commands work, RStudio and R are correctly installed and talking to each other.
Installing Positron
Positron is the new kid on the block. It’s free, built on Visual Studio Code, and designed for modern polyglot data science.
Step 1: Install Positron
Navigate to https://positron.posit.co/ and download the installer for your operating system.
On Windows:
1. Download the .exe installer
2. Run it; accept the default installation path
3. Positron installs to C:\Users\[YourUsername]\AppData\Local\Programs\Positron by default
On macOS:
1. Download the .dmg or .zip
2. Open the .dmg and drag Positron.app to Applications
3. Or unzip and move to Applications
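On Linux: Download the .deb or .rpm and install it with your package manager, for example `sudo apt install ./<downloaded-file>.deb` on Debian/Ubuntu or `sudo dnf install ./<downloaded-file>.rpm` on Red Hat/Fedora, where `<downloaded-file>` stands for the name of the file you downloaded.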
Step 2: Configure Positron to Use Your R Installation
When you launch Positron for the first time, it will ask you to select the R installation. If Positron doesn’t detect it automatically:
- Press Ctrl+, (Windows/Linux) or Cmd+, (macOS) to open Settings
- Search for “R executable”
- Point it to your R installation:
  - Windows: C:\Program Files\R\R-x.x.x\bin\R.exe
  - macOS: /Library/Frameworks/R.framework/Resources/bin/R
  - Linux: /usr/bin/R
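If you are unsure which of these paths applies to your machine, a short script can look for you. The following is a minimal sketch, not part of Positron: the function name `find_r_executable` is hypothetical, and the script only checks the PATH and the default install locations listed above, so a custom install location would not be found.

```python
import glob
import os
import shutil


def find_r_executable():
    """Return a path to an R executable, or None if none is found."""
    # 1. Prefer whatever `R` resolves to on the PATH.
    on_path = shutil.which("R")
    if on_path:
        return on_path

    # 2. Fall back to the default install locations (assumed, see lead-in).
    candidates = [
        "/Library/Frameworks/R.framework/Resources/bin/R",  # macOS
        "/usr/bin/R",                                        # Linux
    ]
    # Windows keeps one folder per version (R-x.x.x); a plain reverse string
    # sort tries later-looking version names first.
    candidates += sorted(
        glob.glob(r"C:\Program Files\R\R-*\bin\R.exe"), reverse=True
    )
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    print(find_r_executable() or "No R installation found in the usual places")
```

Whatever path the script prints is the value to paste into the setting.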
Step 3: Test Positron
- Open the console at the bottom of the window
- Type `2 + 2` and press Enter
- You should see `[1] 4`
If it works, Positron is ready. The experience is similar to RStudio but with a more modern, code-editor feel.
An Introduction to Quarto: Reproducible, Elegant Reporting
Before we dive into Python, let’s pause and talk about Quarto, because it’s fundamental to your workflow as an analyst.
For most of the 20th century, analysts worked in disconnected steps: they ran analyses in statistical software, scribbled notes in journals, created charts in graphics programs, and then wrote reports in Word or PowerPoint, copying numbers by hand and recreating plots. This process was error-prone, unreproducible, and fragile. If a data value changed, the whole process had to start again.
Quarto changes this. Quarto is a system for literate programming—it lets you weave together narrative text (explanation, justification, storytelling) with code and its results in a single document. You write once, and when you render the document, all code runs, all plots regenerate, all tables update. If data changes next month, you re-render once and everything is fresh.
Quarto documents are plain-text .qmd files that contain:
- YAML front matter (metadata)
- Markdown text (formatted prose)
- Code chunks (R, Python, Julia, etc.)
- Inline code (small expressions in text)
When you render a Quarto document, it:
1. Reads your .qmd file
2. Executes all code chunks
3. Inserts results (numbers, tables, plots) into the document
4. Renders to HTML, PDF, Word, or other formats
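Rendering can also be triggered from a script, which is handy when a report must be regenerated on a schedule. The sketch below wraps the `quarto render` command-line call; the function names `build_render_command` and `render` and the file name `report.qmd` are hypothetical, and the sketch assumes the `quarto` executable is on your PATH.

```python
import shutil
import subprocess


def build_render_command(qmd_path, output_format="html"):
    """Build the argument list for `quarto render <file> --to <format>`."""
    return ["quarto", "render", qmd_path, "--to", output_format]


def render(qmd_path, output_format="html"):
    """Render a .qmd file; return True on success, False if Quarto is absent."""
    if shutil.which("quarto") is None:
        print("The quarto command was not found on PATH")
        return False
    # check=True raises CalledProcessError when quarto exits with an error,
    # for example because a code chunk failed to run.
    subprocess.run(build_render_command(qmd_path, output_format), check=True)
    return True
```

Calling `render("report.qmd")` would then perform the four steps above in one go and leave the rendered HTML file next to the source.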
This is why this textbook uses Quarto: every chapter you read was written once, contains working code that is re-run regularly, and every number, table, and plot is guaranteed to be correct because it’s generated from actual code, not copied by hand.
Installing Python: Standalone and with R
You have three ways to install Python (Options A to C below), and any of them can then be connected to R through the reticulate package.
Option A: Python via Anaconda (Recommended for Beginners)
Anaconda is a Python distribution that includes Python, a package manager called conda, and a large ecosystem of pre-installed data science packages.
- Go to https://www.anaconda.com/download/
- Download the Anaconda installer for your OS (look for the graphical installer)
- Run the installer and follow the defaults
- Anaconda installs to ~/anaconda3 or C:\Users\[You]\anaconda3
To verify, open a new terminal and run `python --version` and `conda --version`.
You should see version numbers for both.
Option B: Python via Miniconda (Lightweight Alternative)
Miniconda is Anaconda’s lightweight cousin—it includes Python and conda but fewer pre-installed packages. Use this if you want a minimal installation.
- Go to https://docs.conda.io/projects/miniconda/en/latest/
- Download the Miniconda installer for your OS
- Run it; install to the default location
Option C: Python via venv (Built-in, No Installation)
Python 3.3+ includes venv, a lightweight virtual environment tool built into the language. If you already have Python installed:
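On Windows: run `python -m venv my_analytics_env`, then `my_analytics_env\Scripts\activate`.
On macOS/Linux: run `python3 -m venv my_analytics_env`, then `source my_analytics_env/bin/activate`.
Your terminal prompt will change to show (my_analytics_env) at the start. Now you can install packages into the environment with pip, for example `pip install pandas`.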
Making Python Available to R: reticulate
The R package reticulate lets you call Python code from within R. This is powerful: you can use R for some tasks and Python for others, all in the same document.
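In RStudio or Positron, install reticulate with `install.packages("reticulate")`.
Now load it with `library(reticulate)` and tell it where Python is, for example with `use_python()` and the path to your Python executable, `use_virtualenv()` for a venv, or `use_condaenv()` for a conda environment.
Or, let reticulate find Python automatically: `py_config()` reports the interpreter it selected.
Test it by running a line of Python from R, for example `py_run_string("print('Hello from Python!')")`.
If you see the greeting, reticulate is working.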
Installing Key Packages
Now that your languages are installed, you need the essential libraries for data science and analytics.
Essential R Packages
In RStudio or Positron, run:
# Install core data science packages
install.packages(c(
"tidyverse", # Data manipulation (dplyr, ggplot2, tidyr, etc.)
"tidymodels", # Machine learning framework
"plotly", # Interactive visualization
"reticulate", # R-Python integration
"here", # Project-relative file paths
"rmarkdown", # Document rendering (Quarto uses this)
"quarto", # Quarto R interface
"knitr" # Dynamic document generation
))
What each does:
tidyverse: A collection of packages for data manipulation and visualization. Includes dplyr (transforming data), ggplot2 (beautiful graphics), readr (reading files), tidyr (reshaping data), and more.
tidymodels: A framework for building, evaluating, and tuning machine learning models. It’s to machine learning what tidyverse is to data wrangling.
plotly: Create interactive, web-based visualizations. Perfect for dashboards and reports where users might want to hover over data points.
reticulate: Call Python from R, and pass data between languages seamlessly.
here: Solves file path problems. Instead of writing ../../../data/file.csv, you write here("data", "file.csv"), and it works no matter where your project lives.
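The idea behind here is simple enough to sketch in Python, which is useful when the Python half of a project needs the same project-relative paths. This is an illustration of the technique, not the R package's implementation: the marker names in `ROOT_MARKERS` and the helper names `project_root` and `here` are assumptions made for this example.

```python
from pathlib import Path

# Marker names are an assumption for this sketch; adjust them to your projects.
ROOT_MARKERS = (".git", ".here", "_quarto.yml")


def project_root(start=None):
    """Walk upward from `start` until a folder containing a marker is found."""
    current = Path(start or Path.cwd()).resolve()
    for folder in (current, *current.parents):
        if any((folder / marker).exists() for marker in ROOT_MARKERS):
            return folder
    raise FileNotFoundError("No project marker found above " + str(current))


def here(*parts, start=None):
    """Build a path relative to the project root, e.g. here("data", "file.csv")."""
    return project_root(start).joinpath(*parts)
```

Because the root is found by searching upward for a marker, the same call works from a script in the project root and from a notebook several folders down.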
Essential Python Packages
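In your terminal (with your virtual environment activated), run `pip install pandas numpy scikit-learn matplotlib seaborn plotly statsmodels jupyter`.
Or, if using conda, run `conda install pandas numpy scikit-learn matplotlib seaborn plotly statsmodels jupyter`.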
What each does:
pandas: Data manipulation and analysis. Python’s answer to R’s data frames.
numpy: Numerical computing. Arrays, linear algebra, and mathematical functions.
scikit-learn: Machine learning. Classification, regression, clustering, and more.
matplotlib & seaborn: Visualization. Matplotlib is low-level and flexible; seaborn is higher-level and prettier.
plotly: Interactive visualizations (same as R’s plotly).
statsmodels: Statistical modeling and hypothesis testing.
jupyter: Interactive notebooks for exploratory analysis.
To verify that all Python packages installed correctly, open Python and import each of them (for example, `import pandas`).
If every import finishes without a `ModuleNotFoundError`, you’re ready.
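A minimal sketch of such a check follows. It assumes the package list from this section; the function name `check_packages` is hypothetical. Two import names differ from the install names: scikit-learn imports as `sklearn`, and the jupyter install is checked through its `jupyter_core` component.

```python
import importlib
import importlib.util

# Import names for the packages installed above (see the lead-in for the
# two names that differ from the pip/conda install names).
REQUIRED = ["pandas", "numpy", "sklearn", "matplotlib", "seaborn",
            "plotly", "statsmodels", "jupyter_core"]


def check_packages(names):
    """Return (found, missing); found maps import name to a version string."""
    found, missing = {}, []
    for name in names:
        if importlib.util.find_spec(name) is None:
            missing.append(name)
            continue
        module = importlib.import_module(name)
        found[name] = getattr(module, "__version__", "unknown")
    return found, missing


if __name__ == "__main__":
    found, missing = check_packages(REQUIRED)
    for name, version in found.items():
        print(f"{name:14} {version}")
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All packages imported successfully!")
```

Reporting the missing names, rather than stopping at the first failed import, tells you in one run everything that still needs installing.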
Your First Quarto Document
Let’s create and render your first Quarto document. This is your hello world for reproducible analytics.
Creating a New Quarto Document in RStudio
- Go to File → New File → Quarto Document
- You’ll see a dialog asking about output format; keep the default (HTML)
- Click Create
- A new .qmd file appears with template content
The file should look like this:
---
title: "My First Quarto Document"
format: html
---
## Quarto
Quarto enables you to weave together content and executable code into finished documents. To learn more about Quarto see <https://quarto.org>.
## Running Code
When you click the **Render** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
```{r}
1 + 1
```
## End
Understanding Quarto Structure
YAML Front Matter (the part between the --- lines): In the template, these are the title: and format: html lines.
This metadata tells Quarto the document title and that you want HTML output.
Markdown Text: Everything between the YAML and code chunks is formatted text. Use # for headings, **bold** for bold, *italic* for italics, and so on.
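Code Chunks: A chunk starts with a line of three backticks followed by {r} and ends with a line of three backticks, like the 1 + 1 chunk in the template.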
When you render, this code runs and the result appears in the document.
Rendering Your First Document
Click the blue “Render” button (in RStudio, top-right of the editor). Quarto will:
1. Run all R code chunks
2. Capture the results
3. Convert Markdown to formatted text
4. Combine everything into a single HTML file
5. Open the HTML file in a viewer
You should see a nicely formatted document with your calculation result.
Creating a Quarto Document with Python
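To use Python in Quarto, add code chunks whose opening fence reads {python} instead of {r}.
For this to work, Quarto needs to know where Python is. A document that contains only Python chunks runs through Jupyter, and you can name the kernel in the YAML front matter with `jupyter: python3`. A document that also contains R chunks runs through knitr, which executes the Python chunks via reticulate, so reticulate must be able to find your Python installation.
And install the quarto R package (if you haven’t) with `install.packages("quarto")`.
Now your Python chunks will execute when you render.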
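When a Python chunk fails with a missing-package error, the usual cause is that the document is running a different Python from the one you installed packages into. The snippet below, a small sketch with the hypothetical helper name `interpreter_info`, can be pasted into a Python chunk to show which interpreter is actually in use.

```python
import platform
import sys


def interpreter_info():
    """Describe the Python interpreter that is executing this code."""
    return {
        "executable": sys.executable,
        "version": platform.python_version(),
        # sys.prefix differs from sys.base_prefix inside a venv-style
        # environment; conda environments are not detected by this test.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


for key, value in interpreter_info().items():
    print(f"{key}: {value}")
```

If the reported executable is not the one inside your environment, point reticulate (or the Jupyter kernel) at the right interpreter and render again.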
Case Study: Loading and Exploring Nigerian CPI Data
Let’s put everything together. We’ll create a real analytics workflow: loading Nigerian economic data, exploring it, and visualizing it in both R and Python.
The Data: Consumer Price Index
The Consumer Price Index (CPI) measures the price level of a basket of goods and services; its change over time is inflation. Nigeria’s National Bureau of Statistics (NBS) publishes monthly CPI data. We’ll create a realistic synthetic dataset representing monthly CPI values for Nigeria over four years (January 2021 through December 2024).
Create a new Quarto document called nigerian_cpi_analysis.qmd and add this content:
---
title: "Nigerian CPI Analysis"
format: html
---
# Nigerian Consumer Price Index Analysis
This analysis explores Consumer Price Index trends in Nigeria from 2021 to 2024, demonstrating data loading, exploration, and visualization in both R and Python.
## Loading Data

::: {.panel-tabset}

## R

```{r}
library(tidyverse)  # tidyverse 2.0+ also attaches lubridate for year() and month()
library(plotly)

# Create synthetic but realistic Nigerian CPI data
set.seed(42)
cpi_data <- tibble(
  date = seq(from = as.Date("2021-01-01"), by = "month", length.out = 48),
  year = year(date),
  month = month(date),
  month_name = month.abb[month],
  cpi = 100 + cumsum(rnorm(48, mean = 2, sd = 1.5)) + seq(0, 20, length.out = 48)
) |>
  mutate(
    # Add some seasonality, then round the final value
    cpi = round(cpi + 3 * sin(2 * pi * month / 12), 2)
  )

head(cpi_data, 10)
```

## Python

```{python}
import pandas as pd
import numpy as np

# Create synthetic but realistic Nigerian CPI data
np.random.seed(42)
# "MS" = month start, matching the first-of-month dates used in the R tab
dates = pd.date_range(start="2021-01-01", periods=48, freq="MS")

cpi_values = 100 + np.cumsum(np.random.normal(loc=2, scale=1.5, size=48)) + np.linspace(0, 20, 48)

# Add seasonality
months = dates.month.to_numpy()
cpi_values = cpi_values + 3 * np.sin(2 * np.pi * months / 12)

cpi_data = pd.DataFrame({
    "date": dates,
    "year": dates.year,
    "month": dates.month,
    "month_name": dates.strftime("%b"),
    "cpi": np.round(cpi_values, 2)
})

print(cpi_data.head(10))
```

:::
## Exploratory Data Analysis
### Summary Statistics
::: {.panel-tabset}

## R

```{r}
# Summary statistics
summary(cpi_data$cpi)

# Grouped statistics by year
cpi_data |>
  group_by(year) |>
  summarise(
    mean_cpi = mean(cpi),
    sd_cpi = sd(cpi),
    min_cpi = min(cpi),
    max_cpi = max(cpi),
    .groups = "drop"
  )
```

## Python

```{python}
# Summary statistics
print(cpi_data["cpi"].describe())
print("")

# Grouped statistics by year
print(cpi_data.groupby("year")["cpi"].agg(["mean", "std", "min", "max"]))
```

:::
### Visualization: CPI Trends Over Time
::: {.panel-tabset}

## R

```{r}
# Create an interactive plot
p <- cpi_data |>
  ggplot(aes(x = date, y = cpi, color = as.factor(year))) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  labs(
    title = "Nigerian Consumer Price Index (2021-2024)",
    x = "Date",
    y = "CPI (Index)",
    color = "Year"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    legend.position = "bottom"
  )

ggplotly(p)
```

## Python

```{python}
import plotly.express as px

# Create interactive plot using plotly
fig = px.line(
    cpi_data,
    x="date",
    y="cpi",
    color="year",
    title="Nigerian Consumer Price Index (2021-2024)",
    labels={"date": "Date", "cpi": "CPI (Index)", "year": "Year"},
    markers=True
)

fig.update_layout(
    hovermode="x unified",
    template="plotly_white"
)

fig.show()
```

:::
## Year-over-Year Change
It's often useful to look at how CPI changed from one year to the next, especially when comparing inflation rates.
::: {.panel-tabset}

## R

```{r}
# Calculate year-over-year change
cpi_yoy <- cpi_data |>
  arrange(month, year) |>
  group_by(month) |>
  mutate(
    yoy_change = cpi - lag(cpi),
    yoy_pct_change = ((cpi - lag(cpi)) / lag(cpi)) * 100
  ) |>
  ungroup() |>
  filter(!is.na(yoy_change))

head(cpi_yoy, 10)

# Visualization of YoY change
cpi_yoy |>
  filter(year > 2021) |>
  ggplot(aes(x = as.factor(month), y = yoy_pct_change, fill = as.factor(year))) +
  geom_col(position = "dodge") +
  labs(
    title = "Year-over-Year CPI Change by Month",
    x = "Month",
    y = "Percent Change (%)",
    fill = "Year"
  ) +
  theme_minimal()
```

## Python

```{python}
# Calculate year-over-year change
cpi_data_sorted = cpi_data.sort_values(["month", "year"]).reset_index(drop=True)

# Group by month and shift to get previous year's value
cpi_data_sorted["yoy_change"] = cpi_data_sorted.groupby("month")["cpi"].diff()
cpi_data_sorted["yoy_pct_change"] = (
    cpi_data_sorted["yoy_change"] / cpi_data_sorted.groupby("month")["cpi"].shift(1)
) * 100

print(cpi_data_sorted.head(10))

# Visualization
yoy_filtered = cpi_data_sorted[cpi_data_sorted["year"] > 2021].dropna(subset=["yoy_pct_change"])
# Treat year as a category so each year gets its own group of bars
yoy_filtered = yoy_filtered.assign(year=yoy_filtered["year"].astype(str))

fig = px.bar(
    yoy_filtered,
    x="month",
    y="yoy_pct_change",
    color="year",
    barmode="group",
    title="Year-over-Year CPI Change by Month",
    labels={"month": "Month", "yoy_pct_change": "Percent Change (%)"}
)

fig.show()
```

:::
## Key Insights
From this analysis of the synthetic CPI data, we observe:
1. **Overall Inflation Trend:** CPI increases steadily from 2021 to 2024. The upward drift is built into the simulated series, so it illustrates an inflationary trend rather than measuring Nigeria's actual inflation.
2. **Seasonal Patterns:** There are visible monthly variations in CPI, with peaks and troughs repeating across years. Here they come from the sine term added to the series; real price indices often show similar seasonal movement.
3. **Year-over-Year Growth:** The year-over-year percentage changes show variation across months, with some months experiencing higher inflation than others.
This is the foundation of time-series analysis, a critical skill in business analytics for understanding trends, forecasting, and decision-making.
Rendering the Document
- Save this as nigerian_cpi_analysis.qmd
- Click Render
- Quarto will execute all R and Python code, generate the plots, and produce a beautiful HTML report
Congratulations! You’ve just created a reproducible analytics document that could be shared with colleagues, stakeholders, or supervisors. If the underlying data changes next month, you simply re-render and everything updates automatically.
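The year-over-year calculation in the case study is easy to check by hand. The sketch below uses plain Python and four made-up index values (not NBS data) to apply the same formula: the percent change is the value minus the value for the same month one year earlier, divided by that earlier value, times 100. The function name `year_over_year` is hypothetical.

```python
def year_over_year(cpi_by_month):
    """Map (year, month) to the percent change versus the same month a year earlier."""
    changes = {}
    for (year, month), value in cpi_by_month.items():
        previous = cpi_by_month.get((year - 1, month))
        if previous is not None:  # the first year has nothing to compare against
            changes[(year, month)] = (value - previous) / previous * 100
    return changes


# Made-up index values for two Januaries and two Februaries.
cpi = {(2021, 1): 100.0, (2021, 2): 102.0, (2022, 1): 110.0, (2022, 2): 107.1}
for key, pct in sorted(year_over_year(cpi).items()):
    print(key, round(pct, 2))
```

January rises from 100.0 to 110.0, a change of about 10 percent, and February rises from 102.0 to 107.1, about 5 percent. The 2021 months produce no result because there is no earlier year to compare with, which is why the case study filters out the missing values.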
Section Review Questions
Further Reading
- RStudio (Posit) Documentation: https://posit.co/resources/
- Positron Documentation & Guide: https://positron.posit.co/docs/
- Quarto Documentation: https://quarto.org/docs/
- Wickham, H., & Grolemund, G. (2017). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media. (Classic reference for tidyverse)
- McKinney, W. (2022). Python for Data Analysis (3rd ed.). O’Reilly Media. (Covers pandas, numpy, and data manipulation in Python)
Chapter Appendix: Detailed Installation Troubleshooting
Appendix 1.A: Windows-Specific Issues
Problem: R installation fails with “Administrator rights required”
Solution: Right-click the R installer, select “Run as Administrator,” and proceed.
Problem: RStudio cannot find R
Solution: In RStudio, go to Tools → Global Options → General. Under “R Sessions,” click “Change” and manually navigate to C:\Program Files\R\R-x.x.x\bin\x64\R.exe.
Problem: Packages won’t install (“cannot remove prior installation”)
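Solution: Run `.libPaths()` in R to find your library folder (for R 4.2 and later the default user library is C:\Users\[You]\AppData\Local\R\win-library\[version]; older versions used C:\Users\[You]\Documents\R\win-library\[version]). Then close all R instances, delete the problematic package folder, and try installing again.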
Appendix 1.B: macOS-Specific Issues
Problem: “R” command not found in Terminal
Solution: If the R command is not on your PATH, add this line to your ~/.zprofile or ~/.bash_profile: `export PATH="/Library/Frameworks/R.framework/Resources/bin:$PATH"`
Then open a new terminal window.
Problem: “gfortran” error when installing packages
Solution: Some R packages need the Fortran compiler. Install the R tools from https://cran.r-project.org/bin/macosx/tools/ or, if you use Homebrew, run `brew install gcc` (the gcc formula includes gfortran).
Problem: RStudio won’t launch on Apple Silicon
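Solution: Make sure you downloaded the “ARM64” version of R, not the Intel version, and that you are running a recent RStudio release (recent macOS builds of RStudio are universal and support both architectures).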
Appendix 1.C: Linux-Specific Issues
Problem: Package manager doesn’t have the latest R
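Solution: Add the CRAN repository to your package manager. For Ubuntu/Debian, follow the instructions at https://cran.r-project.org/bin/linux/ubuntu/ or https://cran.r-project.org/bin/linux/debian/.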
Problem: Missing dependencies (“libssl-dev not found”)
Solution: Install development tools before R, for example with `sudo apt install build-essential libssl-dev libcurl4-openssl-dev libxml2-dev`.
Appendix 1.D: Verifying Python-R Integration
To confirm reticulate is working properly, run this R code:
library(reticulate)
# Show the Python installation reticulate is using
py_config()
# Run a Python command from R
py_run_string("x = [1, 2, 3, 4, 5]; print(f'Sum: {sum(x)}')")
# Pass data from R to Python by assigning into the `py` object
r_vector <- c(10, 20, 30, 40, 50)
py$py_vector <- r_vector
py_run_string("print(f'Python received: {py_vector}')")
# Pass data from Python to R
py_run_string("result = sum([1, 2, 3, 4, 5])")
py$result # Access Python variable 'result' in R
If all commands execute without error, your R-Python integration is correctly configured.
Appendix 1.E: Virtual Environments and Reproducibility
Virtual environments isolate project dependencies. This is a best practice in professional analytics.
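Creating a Python virtual environment: `python -m venv my_analytics_env`
Activating it: `source my_analytics_env/bin/activate` on macOS/Linux, or `my_analytics_env\Scripts\activate` on Windows
Installing packages only to this environment: `pip install pandas numpy` (run while the environment is active)
Saving dependencies for reproducibility: `pip freeze > requirements.txt`
Later, on another machine, recreate the environment: create and activate a fresh environment as above, then run `pip install -r requirements.txt`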
Now everyone has identical package versions—critical for reproducible research.
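To confirm that a machine really matches the saved dependencies, you can compare installed versions against the pinned ones. The sketch below assumes the dependencies were saved as `name==version` lines, the format `pip freeze` writes (commonly to a file named requirements.txt). The function names `parse_pins` and `find_mismatches` are hypothetical, and lines using other requirement syntax, such as version ranges or environment markers, are skipped or not handled by this simplified parser.

```python
from importlib import metadata


def parse_pins(lines):
    """Return {name: version} for lines of the form `name==version`."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (wanted, installed)}; installed is None if absent."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

A typical use is `find_mismatches(parse_pins(open("requirements.txt")))`: an empty result means every pinned package is installed at exactly the recorded version.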