Setting Up Your Analytics Environment

Learning Objectives

📘 What You’ll Learn in This Chapter

By the end of this chapter, you will be able to:

Install and configure RStudio (open-source) and Positron (modern IDE) on Windows, macOS, and Linux
Set up Python both as a standalone environment and integrated with R via reticulate
Understand the strengths and trade-offs between R and Python for business analytics
Install and manage key packages for data science in both languages
Create, edit, and render your first Quarto document to both HTML and PDF
Load, explore, and visualize realistic Nigerian business data in both R and Python
Recognize the importance of reproducible research and literate programming

Why RStudio and Positron? A Comparison of Modern Analytics Environments

When you sit down to write a financial analysis or build a machine learning model, your choice of tools shapes not just how fast you work, but how clearly you think. The environment you choose—the text editor, the console, the file browser—becomes an extension of your mind.

For decades, statisticians and data analysts have relied on R, a language that first appeared in 1993 and grew out of a community that values statistical rigor, reproducibility, and transparency. R is free, open-source, and has evolved into one of the world’s most powerful ecosystems for data science. Today, you have two main choices for your R environment: RStudio and Positron.

RStudio is the industry standard. (The company behind it rebranded from RStudio to Posit in 2022, but the IDE itself is still called RStudio.) It was built specifically for R and has matured into a complete integrated development environment (IDE). If you’ve worked with RStudio before, it feels like home: a script editor and console on the left, with the environment, files, plots, and packages panes on the right. Data scientists and statisticians around the world use it daily. It’s stable, well-documented, and production-ready.

Positron is newer: Posit released it as a public beta in 2024 as its next-generation data science IDE. Positron is built on Code OSS, the open-source core of Microsoft’s Visual Studio Code, the editor that dominates software development. Positron brings a modern feel: a cleaner interface, native support for both R and Python side by side, and rich extensions. It’s built for polyglot teams that mix R and Python in the same project, which is increasingly common in business analytics.

So which should you choose?

Choose RStudio if: You’re primarily working in R, you value stability and established workflows, your organization already uses RStudio, or you prefer a purpose-built environment designed specifically for statistics and data science.
Choose Positron if: You want to work fluidly between R and Python in the same window, you’re comfortable with modern development environments, you appreciate a cleaner UI, or you’re building teams that need to collaborate across languages.

This book will teach you both environments and demonstrate how to use them together. The good news: the underlying languages and workflows are identical. What differs is the look, feel, and some convenience features.

Why both R and Python? In the real world, you’ll encounter both. Many financial institutions and large enterprises use Python for production systems and R for statistical modeling. African tech startups increasingly favor Python for web backends but reach for R for analytical reports. By learning both, you’re not just picking up two syntax patterns—you’re learning two different philosophical approaches to solving problems. R thinks in terms of vectors and data frames; Python thinks in terms of sequences and objects. Together, they make you a more flexible, more thoughtful analyst.

Installing R and RStudio

R and RStudio are separate pieces of software. R is the engine—the actual language interpreter and runtime. RStudio is the dashboard—the environment where you write code, see results, and manage your work. You must install R first, then RStudio.

Step 1: Install R

Navigate to https://cran.r-project.org/, the Comprehensive R Archive Network. CRAN is the official repository of R packages and the source for the R language itself.

On Windows:
1. Click “Download R for Windows”
2. Click “base”
3. Download the latest version (look for a link like “Download R 4.3.x for Windows”)
4. Run the installer (.exe file)
5. Accept the default settings (location: C:\Program Files\R\R-x.x.x)
6. Complete the installation

On macOS:
1. Click “Download R for macOS”
2. Choose the correct architecture:
- Apple Silicon (M1/M2/M3 chips): download the “ARM64” file
- Intel (older Macs): download the “x86_64” file
3. Open the .pkg file and follow the installer
4. R installs to /Library/Frameworks/R.framework/

On Linux (Ubuntu/Debian):

sudo apt update
sudo apt install r-base r-base-dev

For Red Hat/Fedora:

sudo dnf install R

Step 2: Verify Your R Installation

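Open a terminal (Command Prompt on Windows, Terminal on macOS/Linux) and type the command below. On Windows, the installer does not add R to your PATH by default, so if the command is not recognized, either run R.exe by its full path inside the bin folder of your installation or open R from the Start menu and read the version from its startup banner.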

R --version

You should see output like:

R version 4.3.2 (2023-10-31) -- "Eye Holes"
...

If you see a version number, R is installed correctly.

Step 3: Install RStudio

Navigate to https://posit.co/download/rstudio-desktop/. The website will detect your operating system automatically.

On Windows:
1. Download the .exe installer
2. Run it and follow the default options
3. RStudio installs to C:\Program Files\RStudio by default

On macOS:
1. Download the .dmg file
2. Open it and drag RStudio.app to the Applications folder
3. Launch RStudio from Applications

On Linux:
1. Download the appropriate .deb (Debian/Ubuntu) or .rpm (Red Hat/Fedora) file
2. Install it:

# Ubuntu/Debian
sudo dpkg -i rstudio-x.x.x-amd64.deb

# Red Hat/Fedora
sudo dnf install rstudio-x.x.x-x86_64.rpm

Step 4: Launch RStudio and Test It

Double-click RStudio (or type rstudio in your terminal). You should see a window with up to four panes (the script editor appears once you create or open a script, for example via File → New File → R Script):
- Top Left: Script editor (empty for now)
- Top Right: Environment and History
- Bottom Left: Console (where R code runs)
- Bottom Right: Files, Plots, Packages, Help

In the console, type:

2 + 2

Press Enter. If you see [1] 4, everything works.

Now type:

print("Hello from R!")

You should see [1] "Hello from R!". If both commands work, RStudio and R are correctly installed and talking to each other.

Installing Positron

Positron is the new kid on the block. It’s free, built on Visual Studio Code, and designed for modern polyglot data science.

Step 1: Install Positron

Navigate to https://positron.posit.co/ and download the installer for your operating system.

On Windows:
1. Download the .exe installer
2. Run it; accept the default installation path
3. Positron installs to C:\Users\[YourUsername]\AppData\Local\Programs\Positron by default

On macOS:
1. Download the .dmg or .zip
2. If you downloaded the .dmg, open it and drag Positron.app to Applications
3. If you downloaded the .zip, unzip it and move Positron.app to Applications

On Linux: Download the .deb or .rpm and install:

sudo dpkg -i positron-x.x.x-amd64.deb
# or
sudo dnf install positron-x.x.x-x86_64.rpm

Step 2: Configure Positron to Use Your R Installation

When you launch Positron for the first time, it will ask you to select the R installation. If Positron doesn’t detect it automatically:

Press Ctrl+, (Windows/Linux) or Cmd+, (macOS) to open Settings
Search for “R executable”
Point it to your R installation:
- Windows: C:\Program Files\R\R-x.x.x\bin\R.exe
- macOS: /Library/Frameworks/R.framework/Resources/bin/R
- Linux: /usr/bin/R

Step 3: Test Positron

Open the console at the bottom of the window
Type 2 + 2 and press Enter
You should see [1] 4

If it works, Positron is ready. The experience is similar to RStudio but with a more modern, code-editor feel.

An Introduction to Quarto: Reproducible, Elegant Reporting

Before we dive into Python, let’s pause and talk about Quarto, because it’s fundamental to your workflow as an analyst.

For decades, analysts worked in disconnected steps: they ran analyses in one program, scribbled notes in journals, created charts in graphics programs, and then wrote reports in Word or PowerPoint, copying numbers by hand and recreating plots. This process was error-prone, unreproducible, and fragile. If a data value changed, the whole process had to start again.

Quarto changes this. Quarto is a system for literate programming—it lets you weave together narrative text (explanation, justification, storytelling) with code and its results in a single document. You write once, and when you render the document, all code runs, all plots regenerate, all tables update. If data changes next month, you re-render once and everything is fresh.

Quarto documents are plain-text .qmd files that contain:
- YAML front matter (metadata)
- Markdown text (formatted prose)
- Code chunks (R, Python, Julia, etc.)
- Inline code (small expressions in text)

When you render a Quarto document, it:
1. Reads your .qmd file
2. Executes all code chunks
3. Inserts results (numbers, tables, plots) into the document
4. Renders to HTML, PDF, Word, or other formats

This is why this textbook uses Quarto: every chapter you read was written once and contains working code that is re-run regularly. Every number, table, and plot comes straight from code that actually ran, so it cannot drift out of sync with the analysis through copy-and-paste errors.

Installing Python: Standalone and with R

You have several ways to install Python (Options A to C below). Whichever you choose, you can then connect it to R with the reticulate package.

Option A: Python via Anaconda (Recommended for Beginners)

Anaconda is a Python distribution that includes Python, a package manager called conda, and a large ecosystem of pre-installed data science packages.

Go to https://www.anaconda.com/download/
Download the Anaconda installer for your OS (look for the graphical installer)
Run the installer and follow the defaults
Anaconda installs to ~/anaconda3 or C:\Users\[You]\anaconda3

To verify:

python --version
conda --version

You should see version numbers for both.
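If one of these commands is not recognized, the usual cause is that the tool is installed but not on your PATH. The sketch below is one way to check this from Python itself. The list of command names is an assumption based on what this chapter has installed so far, so adjust it to your setup.

```python
import shutil

# Command names installed so far in this chapter (an assumption; edit as needed).
TOOLS = ["R", "python", "conda"]


def locate_tools(names):
    """Return a dict mapping each command name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in locate_tools(TOOLS).items():
    status = path if path else "NOT FOUND on PATH"
    print(f"{name:8s} {status}")
```

A tool reported as not found may still be installed. It only means your terminal cannot locate it by name.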

Option B: Python via Miniconda (Lightweight Alternative)

Miniconda is Anaconda’s lightweight cousin—it includes Python and conda but fewer pre-installed packages. Use this if you want a minimal installation.

Go to https://docs.conda.io/projects/miniconda/en/latest/
Download the Miniconda installer for your OS
Run it; install to the default location

Option C: Python via venv (Built-in, No Installation)

Python 3.3+ includes venv, a lightweight virtual environment tool in the standard library. If you already have Python installed, create an environment in your current folder:

python -m venv my_analytics_env

On Windows:

my_analytics_env\Scripts\activate

On macOS/Linux:

source my_analytics_env/bin/activate

Your terminal prompt will change to show (my_analytics_env) at the start. Now you can install packages:

pip install pandas numpy scikit-learn matplotlib seaborn plotly statsmodels
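Besides the changed prompt, you can confirm from inside Python that the interpreter belongs to a virtual environment. This minimal sketch relies on the standard-library attributes sys.prefix and sys.base_prefix. The helper name in_virtualenv is made up for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment folder,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Inside a virtual environment:", in_virtualenv())
```

This check targets environments created with venv. A conda environment is a separate Python installation, so the two prefixes normally match there and the function returns False.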

Making Python Available to R: reticulate

The R package reticulate lets you call Python code from within R. This is powerful: you can use R for some tasks and Python for others, all in the same document.

In RStudio or Positron, install reticulate:

install.packages("reticulate")

Now tell reticulate where Python is:

library(reticulate)
# Call use_python() once, before any Python code runs, with ONE of these paths:
use_python("/path/to/python")  # macOS/Linux
# use_python("C:\\Users\\[You]\\anaconda3\\python.exe")  # Windows
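If you are unsure which path to pass to use_python(), you can ask Python directly. The sketch below prints the running interpreter's path with forward slashes, which R accepts on every operating system, so no backslashes need escaping. The helper name path_for_r is hypothetical.

```python
import sys
from pathlib import PurePath, PureWindowsPath


def path_for_r(path: PurePath) -> str:
    """Render a path with forward slashes, which R accepts on every OS."""
    return path.as_posix()


# Print a ready-to-paste line for the interpreter that is running this script.
print(f'use_python("{path_for_r(PurePath(sys.executable))}")')

# Illustration with a made-up Windows path:
example = path_for_r(PureWindowsPath(r"C:\Users\me\anaconda3\python.exe"))
print(example)
```

Run this from the Python environment you want R to use, for example with your virtual environment activated.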

Or, let reticulate find Python automatically:

library(reticulate)
py_config()  # Shows the Python installation reticulate has selected

Test it:

library(reticulate)
py_run_string("print('Hello from Python inside R!')")

If you see the greeting, reticulate is working.

Installing Key Packages

Now that your languages are installed, you need the essential libraries for data science and analytics.

Essential R Packages

📘 The Core R Data Science Stack

The packages listed below form the foundation of modern R analytics. Together, they provide data manipulation, visualization, statistical modeling, and interoperability with Python.

In RStudio or Positron, run:

# Install core data science packages
install.packages(c(
  "tidyverse",      # Data manipulation (dplyr, ggplot2, tidyr, etc.)
  "tidymodels",     # Machine learning framework
  "plotly",         # Interactive visualization
  "reticulate",     # R-Python integration
  "here",           # Project-relative file paths
  "rmarkdown",      # Document rendering (Quarto uses this)
  "quarto",         # Quarto R interface
  "knitr"           # Dynamic document generation
))

What each does:

tidyverse: A collection of packages for data manipulation and visualization. Includes dplyr (transforming data), ggplot2 (beautiful graphics), readr (reading files), tidyr (reshaping data), and more.
tidymodels: A framework for building, evaluating, and tuning machine learning models. It’s to machine learning what tidyverse is to data wrangling.
plotly: Create interactive, web-based visualizations. Perfect for dashboards and reports where users might want to hover over data points.
reticulate: Call Python from R, and pass data between languages seamlessly.
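here: Solves file path problems. Instead of writing ../../../data/file.csv, you write here("data", "file.csv"), and it works no matter where your project lives.
rmarkdown and knitr: The rendering machinery that executes R code chunks and turns the results into documents. Quarto relies on them for R.
quarto: An R interface to Quarto, so you can render documents from R code.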
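Python has no built-in equivalent of here, but the same idea can be sketched with pathlib: walk upward from a starting folder until a project marker is found, then build paths from that root. The marker names and the helpers find_project_root and project_path are assumptions made for illustration, not part of any library.

```python
import tempfile
from pathlib import Path

# Hypothetical marker files/folders that identify the top of a project.
MARKERS = (".here", ".git", "_quarto.yml")


def find_project_root(start, markers=MARKERS):
    """Walk upward from `start` until a folder containing a marker is found."""
    start = Path(start).resolve()
    for folder in (start, *start.parents):
        if any((folder / marker).exists() for marker in markers):
            return folder
    raise FileNotFoundError(f"No project marker {markers} found above {start}")


def project_path(*parts, start="."):
    """Build a path relative to the project root, wherever the caller sits."""
    return find_project_root(start).joinpath(*parts)


# Demonstration in a throwaway folder so nothing on disk is changed.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / ".here").touch()                # mark the project root
    nested = root / "analysis" / "drafts"   # a deeply nested working folder
    nested.mkdir(parents=True)
    found_root = find_project_root(nested)
    csv_path = project_path("data", "file.csv", start=nested)
    print(csv_path.relative_to(found_root))
```

The design choice mirrors here: code never hard-codes how many folders to climb, so scripts keep working when they are moved within the project.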

Essential Python Packages

In your terminal (with your virtual environment activated), run:

pip install pandas numpy scikit-learn matplotlib seaborn plotly statsmodels jupyter

Or, if using conda:

conda install pandas numpy scikit-learn matplotlib seaborn plotly statsmodels jupyter

What each does:

pandas: Data manipulation and analysis. Python’s answer to R’s data frames.
numpy: Numerical computing. Arrays, linear algebra, and mathematical functions.
scikit-learn: Machine learning. Classification, regression, clustering, and more.
matplotlib & seaborn: Visualization. Matplotlib is low-level and flexible; seaborn is higher-level and prettier.
plotly: Interactive visualizations (same as R’s plotly).
statsmodels: Statistical modeling and hypothesis testing.
jupyter: Interactive notebooks for exploratory analysis.

To verify all Python packages installed correctly, open Python and run:

import pandas as pd
import numpy as np
import sklearn
import matplotlib.pyplot as plt
import seaborn as sns
import plotly
import statsmodels.api as sm
print("All packages imported successfully!")

If you see the success message, you’re ready.
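If an import fails, it helps to see exactly which packages are missing and which versions are present. The following sketch uses only the standard library (importlib) and does not import the packages themselves. The PACKAGES mapping and the helper package_report are illustrative names, not an official tool.

```python
from importlib import metadata
from importlib.util import find_spec

# Import name -> distribution name used with pip or conda.
PACKAGES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "sklearn": "scikit-learn",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "plotly": "plotly",
    "statsmodels": "statsmodels",
}


def package_report(packages):
    """Return {import_name: version string, or None when the package is missing}."""
    report = {}
    for import_name, dist_name in packages.items():
        if find_spec(import_name) is None:
            report[import_name] = None
            continue
        try:
            report[import_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            report[import_name] = "unknown version"
    return report


for name, version in package_report(PACKAGES).items():
    print(f"{name:12s} {version or 'MISSING'}")
```

Note the sklearn entry: the name you import can differ from the name you install, which is a common source of confusion.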

Your First Quarto Document

Let’s create and render your first Quarto document. This is your hello world for reproducible analytics.

Creating a New Quarto Document in RStudio

Go to File → New File → Quarto Document
You’ll see a dialog asking about output format; keep the default (HTML)
Click Create
A new .qmd file appears with template content

The file should look like this:

---
title: "My First Quarto Document"
format: html
---

## Quarto

Quarto enables you to weave together content and executable code into finished documents. To learn more about Quarto see <https://quarto.org>.

## Running Code

When you click the **Render** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

```{r}
1 + 1
```

## End

Understanding Quarto Structure

YAML Front Matter (the part between --- lines):

---
title: "My First Quarto Document"
format: html
---

This metadata tells Quarto the document title and that you want HTML output.

Markdown Text: Everything between the YAML and code chunks is formatted text. Use # for headings, **bold** for bold, *italic* for italics, and so on.

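Code Chunks: R code placed between an opening ```{r} fence and a closing ``` fence. For example, a chunk containing: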

# This is R code
x <- c(1, 2, 3, 4, 5)
mean(x)
#> [1] 3

When you render, this code runs and the result appears in the document.
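To make the three-part structure concrete, here is a rough Python sketch that separates the YAML front matter from the rest of a .qmd file by looking for the two --- lines. It is only an illustration of the layout described above. Quarto itself uses a real YAML parser, and the function name split_front_matter is made up.

```python
def split_front_matter(text):
    """Split .qmd text into (front_matter_lines, body), excluding the --- lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return [], text                      # no front matter at all
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":        # closing delimiter found
            return lines[1:i], "\n".join(lines[i + 1:])
    return [], text                          # unclosed block: treat all as body


doc = """---
title: "My First Quarto Document"
format: html
---

## Quarto

Some markdown text.
"""

front, body = split_front_matter(doc)
print(front)
print(body.strip())
```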

Rendering Your First Document

Click the blue “Render” button (in RStudio, top-right of the editor). Quarto will:
1. Run all R code chunks
2. Capture the results
3. Convert Markdown to formatted text
4. Combine everything into a single HTML file
5. Open the HTML file in a viewer

You should see a nicely formatted document with your calculation result.
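Rendering does not require the button. Quarto also provides a command-line interface, and quarto render accepts a file name and a --to format. The sketch below wraps that command in Python. It assumes the quarto executable is on your PATH (it may not be, even when your IDE bundles Quarto), and the file name my_first_document.qmd is a placeholder.

```python
import shutil
import subprocess


def render_command(qmd_file, output_format="html"):
    """Build the argument list for `quarto render <file> --to <format>`."""
    return ["quarto", "render", qmd_file, "--to", output_format]


def render(qmd_file, output_format="html"):
    """Run Quarto if it is on PATH; return True on success, False otherwise."""
    if shutil.which("quarto") is None:
        print("quarto not found on PATH; render from the IDE instead")
        return False
    completed = subprocess.run(render_command(qmd_file, output_format))
    return completed.returncode == 0


# Show the command that would be run for a placeholder file name.
print(" ".join(render_command("my_first_document.qmd")))
```

Scripted rendering is useful when a report has to be regenerated on a schedule rather than by hand.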

Creating a Quarto Document with Python

To use Python in Quarto, add code chunks whose opening fence is ```{python} instead of ```{r}, like this:

# Python code
x = [1, 2, 3, 4, 5]
print(sum(x) / len(x))
#> 3.0

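For this to work, Quarto needs an engine that can run Python. In a document that mixes R and Python chunks, that engine is knitr, which hands Python chunks to the reticulate package. You can make this explicit in your YAML front matter:

---
title: "My First Quarto Document"
format: html
engine: knitr
---

Make sure the reticulate R package is installed (if you haven’t already):

install.packages("reticulate")

Now your Python chunks will execute when you render. A document that contains only Python chunks is rendered with the Jupyter engine by default, which is one reason jupyter is in the Python package list.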

Case Study: Loading and Exploring Nigerian CPI Data

Let’s put everything together. We’ll create a real analytics workflow: loading Nigerian economic data, exploring it, and visualizing it in both R and Python.

The Data: Consumer Price Index

The Consumer Price Index (CPI) tracks the price of a basket of goods and services over time; inflation is the rate at which it changes. Nigeria’s National Bureau of Statistics (NBS) publishes monthly CPI data. We’ll create a realistic synthetic dataset representing monthly CPI values for Nigeria over four years (January 2021 to December 2024).

Create a new Quarto document called nigerian_cpi_analysis.qmd and add this content:

---
title: "Nigerian CPI Analysis"
format: html
---

# Nigerian Consumer Price Index Analysis

This analysis explores Consumer Price Index trends in Nigeria from 2021 to 2024, demonstrating data loading, exploration, and visualization in both R and Python.

## Loading Data in R

::: {.panel-tabset}
## R

```{r}
library(tidyverse)  # tidyverse 2.0+ also attaches lubridate (year(), month())
library(plotly)

# Create synthetic but realistic Nigerian CPI data
set.seed(42)
cpi_data <- tibble(
  date = seq(from = as.Date("2021-01-01"), by = "month", length.out = 48),
  year = year(date),
  month = month(date),
  month_name = month.abb[month],
  cpi = 100 + cumsum(rnorm(48, mean = 2, sd = 1.5)) + seq(0, 20, length.out = 48)
) |>
  mutate(
    # Add some seasonality, then round to two decimals
    cpi = round(cpi + 3 * sin(2 * pi * month / 12), 2)
  )

head(cpi_data, 10)
```

## Python

```{python}
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Create synthetic but realistic Nigerian CPI data
np.random.seed(42)
# "MS" = month start, so the dates match the R version
dates = pd.date_range(start="2021-01-01", periods=48, freq="MS")

cpi_values = 100 + np.cumsum(np.random.normal(loc=2, scale=1.5, size=48)) + np.linspace(0, 20, 48)

# Add seasonality
months = dates.month.to_numpy()
cpi_values = cpi_values + 3 * np.sin(2 * np.pi * months / 12)

cpi_data = pd.DataFrame({
    "date": dates,
    "year": dates.year,
    "month": dates.month,
    "month_name": dates.strftime("%b"),
    "cpi": np.round(cpi_values, 2),
})

print(cpi_data.head(10))
```

:::

## Exploratory Data Analysis

### Summary Statistics

::: {.panel-tabset}
## R

```{r}
# Summary statistics
summary(cpi_data$cpi)

# Grouped statistics by year
cpi_data |>
  group_by(year) |>
  summarise(
    mean_cpi = mean(cpi),
    sd_cpi = sd(cpi),
    min_cpi = min(cpi),
    max_cpi = max(cpi),
    .groups = "drop"
  )
```

## Python

```{python}
# Summary statistics
print(cpi_data["cpi"].describe())
print("")

# Grouped statistics by year
print(cpi_data.groupby("year")["cpi"].agg(["mean", "std", "min", "max"]))
```

:::

### Visualization: CPI Trends Over Time

::: {.panel-tabset}
## R

```{r}
# Create an interactive plot
p <- cpi_data |>
  ggplot(aes(x = date, y = cpi, color = as.factor(year))) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  labs(
    title = "Nigerian Consumer Price Index (2021-2024)",
    x = "Date",
    y = "CPI (Index)",
    color = "Year"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    legend.position = "bottom"
  )

ggplotly(p)
```

## Python

```{python}
import plotly.express as px

# Create interactive plot using plotly
fig = px.line(
    cpi_data,
    x="date",
    y="cpi",
    color="year",
    title="Nigerian Consumer Price Index (2021-2024)",
    labels={"date": "Date", "cpi": "CPI (Index)", "year": "Year"},
    markers=True,
)

fig.update_layout(hovermode="x unified", template="plotly_white")

fig.show()
```

:::

## Year-over-Year Change

It's often useful to look at how CPI changed from one year to the next, especially when comparing inflation rates.

::: {.panel-tabset}
## R

```{r}
# Calculate year-over-year change
cpi_yoy <- cpi_data |>
  arrange(month, year) |>
  group_by(month) |>
  mutate(
    yoy_change = cpi - lag(cpi),
    yoy_pct_change = ((cpi - lag(cpi)) / lag(cpi)) * 100
  ) |>
  ungroup() |>
  filter(!is.na(yoy_change))

head(cpi_yoy, 10)

# Visualization of YoY change
cpi_yoy |>
  filter(year > 2021) |>
  ggplot(aes(x = as.factor(month), y = yoy_pct_change, fill = as.factor(year))) +
  geom_col(position = "dodge") +
  labs(
    title = "Year-over-Year CPI Change by Month",
    x = "Month",
    y = "Percent Change (%)",
    fill = "Year"
  ) +
  theme_minimal()
```

## Python

```{python}
# Calculate year-over-year change
cpi_data_sorted = cpi_data.sort_values(["month", "year"]).reset_index(drop=True)

# Group by month and shift to get previous year's value
cpi_data_sorted["yoy_change"] = cpi_data_sorted.groupby("month")["cpi"].diff()
cpi_data_sorted["yoy_pct_change"] = (
    cpi_data_sorted["yoy_change"] / cpi_data_sorted.groupby("month")["cpi"].shift(1)
) * 100

print(cpi_data_sorted.head(10))

# Visualization (year as text so plotly draws grouped bars, not a color scale)
yoy_filtered = (
    cpi_data_sorted[cpi_data_sorted["year"] > 2021]
    .dropna(subset=["yoy_pct_change"])
    .assign(year=lambda d: d["year"].astype(str))
)

fig = px.bar(
    yoy_filtered,
    x="month",
    y="yoy_pct_change",
    color="year",
    barmode="group",
    title="Year-over-Year CPI Change by Month",
    labels={"month": "Month", "yoy_pct_change": "Percent Change (%)", "year": "Year"},
)

fig.show()
```

:::

## Key Insights

From this analysis of Nigerian CPI data, we observe:

1. **Overall Inflation Trend:** CPI rises steadily from 2021 to 2024. Because the series is synthetic, this trend is built in by construction; it is meant to resemble a sustained inflationary period.

2. **Seasonal Patterns:** There are visible monthly variations in CPI, with peaks and troughs repeating across years. Here they come from the sine term added to the series, which stands in for the seasonality often seen in price indices.

3. **Year-over-Year Growth:** The year-over-year percentage changes show variation across months, with some months experiencing higher inflation than others.

This is the foundation of time-series analysis, a critical skill in business analytics for understanding trends, forecasting, and decision-making.

Rendering the Document

Save this as nigerian_cpi_analysis.qmd
Click Render
Quarto will execute all R and Python code, generate the plots, and produce a beautiful HTML report

Congratulations! You’ve just created a reproducible analytics document that could be shared with colleagues, stakeholders, or supervisors. If the underlying data changes next month, you simply re-render and everything updates automatically.
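The year-over-year step in the case study boils down to one formula: percent change = (CPI this month − CPI same month last year) / CPI same month last year × 100. As a worked illustration without any libraries, the sketch below applies it to a plain list. The function name yoy_pct_change and the index values are invented for the example, and periods=1 is used only to keep the list short (monthly data would use periods=12).

```python
def yoy_pct_change(values, periods=12):
    """Percent change versus the value `periods` steps earlier.

    Positions without an earlier value to compare against get None.
    """
    changes = []
    for i, current in enumerate(values):
        if i < periods:
            changes.append(None)
        else:
            previous = values[i - periods]
            changes.append((current - previous) / previous * 100)
    return changes


# Made-up index values, purely for illustration.
cpi = [100.0, 110.0, 121.0]
result = yoy_pct_change(cpi, periods=1)
print(result)
```

Both steps here are increases of about 10 percent even though the absolute jumps differ (10 points, then 11), which is why analysts usually compare inflation in percentage terms.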

Section Review Questions

📝 Review Questions: Setting Up Your Environment

What is the difference between R and RStudio, and why do you need both?
When would you choose Positron over RStudio, and what are the trade-offs?
Explain the concept of “literate programming” and why Quarto embodies this principle. What are two advantages of writing analyses in Quarto versus in a traditional Word document?
You want to use Python for machine learning but prefer R’s data manipulation tools. How would you set this up, and what R package makes this seamless?
What do the packages tidyverse, tidymodels, and reticulate do, and in what scenario would you use all three together?

Chapter Appendix: Detailed Installation Troubleshooting

Appendix 1.A: Windows-Specific Issues

Problem: R installation fails with “Administrator rights required”

Solution: Right-click the R installer, select “Run as Administrator,” and proceed.

Problem: RStudio cannot find R

Solution: In RStudio, go to Tools → Global Options → General. Under “R Sessions,” click “Change” and manually navigate to C:\Program Files\R\R-x.x.x\bin\x64\R.exe.

Problem: Packages won’t install (“cannot remove prior installation”)

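Solution: Run .libPaths() in R to find your library folder (typically C:\Users\[You]\AppData\Local\R\win-library\[version] for R 4.2 and later, or C:\Users\[You]\Documents\R\win-library\[version] for older versions). Then close all R instances, delete the problematic package folder along with any folder whose name starts with 00LOCK, and try installing again.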

Appendix 1.B: macOS-Specific Issues

Problem: “R” command not found in Terminal

Solution: R doesn’t automatically add itself to your PATH. Add this line to your ~/.zprofile or ~/.bash_profile:

export PATH="/Library/Frameworks/R.framework/Resources/bin:$PATH"

Then open a new terminal window.

Problem: “gfortran” error when installing packages

Solution: Some R packages need the Fortran compiler. Install the R tools from https://cran.r-project.org/bin/macosx/tools/ or use:

brew install gcc

Problem: RStudio won’t launch on Apple Silicon

Solution: Make sure you installed the “ARM64” build of R rather than the Intel build, and that you are running a recent RStudio release.

Appendix 1.C: Linux-Specific Issues

Problem: Package manager doesn’t have the latest R

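Solution: Add the CRAN repository to your package manager. For Ubuntu:

# apt-key is deprecated on recent releases, so store the CRAN signing key as a file instead
wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | sudo tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc
# $(lsb_release -cs) fills in your release codename (for example, jammy)
sudo add-apt-repository "deb https://cloud.r-project.org/bin/linux/ubuntu $(lsb_release -cs)-cran40/"
sudo apt update
sudo apt install r-base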

Problem: Missing dependencies (“libssl-dev not found”)

Solution: Install development tools before R:

sudo apt install build-essential libcurl4-openssl-dev libssl-dev libxml2-dev

Appendix 1.D: Verifying Python-R Integration

To confirm reticulate is working properly, run this R code:

library(reticulate)

# Show the Python installation reticulate is using
py_config()

# Run a Python command from R
py_run_string("x = [1, 2, 3, 4, 5]; print(f'Sum: {sum(x)}')")

# Pass data from R to Python
r_vector <- c(10, 20, 30, 40, 50)
py$py_vector <- r_vector  # assigning through `py` converts the R vector to a Python list
py_run_string("print(f'Python received: {py_vector}')")

# Pass data from Python to R
py_run_string("result = sum([1, 2, 3, 4, 5])")
py$result  # Access Python variable 'result' in R

If all commands execute without error, your R-Python integration is correctly configured.

Appendix 1.E: Virtual Environments and Reproducibility

Virtual environments isolate project dependencies. This is a best practice in professional analytics.

Creating a Python virtual environment:

python -m venv /path/to/my_project/venv

Activating it (run from inside the project folder):

# macOS/Linux
source venv/bin/activate

# Windows
venv\Scripts\activate

Installing packages only to this environment:

pip install pandas numpy scikit-learn

Saving dependencies for reproducibility:

pip freeze > requirements.txt

Later, on another machine, recreate the environment:

python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -r requirements.txt

Now everyone has identical package versions—critical for reproducible research.
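To see roughly what pip freeze records, you can list installed distributions and their versions from inside Python with the standard-library importlib.metadata module. This is a simplified sketch, not a replacement for pip freeze, which also handles cases such as editable installs. The function name pinned_requirements is made up.

```python
from importlib import metadata


def pinned_requirements():
    """Return sorted 'name==version' strings for the installed distributions."""
    pins = set()
    for dist in metadata.distributions():
        try:
            name = dist.metadata["Name"]
            version = dist.version
        except Exception:
            continue  # skip distributions whose metadata cannot be read
        if name and version:
            pins.add(f"{name}=={version}")
    return sorted(pins, key=str.lower)


# Print the first few entries of the current environment.
for line in pinned_requirements()[:5]:
    print(line)
```

Run it with the project's virtual environment activated, otherwise it reports whatever the base installation contains.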

--- title: "Setting Up Your Analytics Environment" number-sections: false --- ```{python} #| label: python-setup-01-setup #| include: false import numpy as np import pandas as pd import matplotlib.pyplot as plt import seaborn as sns ``` ## Learning Objectives ::: {.callout-note icon="false"} ## 📘 What You'll Learn in This Chapter By the end of this chapter, you will be able to: - Install and configure RStudio (open-source) and Positron (modern IDE) on Windows, macOS, and Linux - Set up Python both as a standalone environment and integrated with R via reticulate - Understand the strengths and trade-offs between R and Python for business analytics - Install and manage key packages for data science in both languages - Create, edit, and render your first Quarto document to both HTML and PDF - Load, explore, and visualize real Nigerian business data in both R and Python - Recognize the importance of reproducible research and literate programming ::: --- ## Why RStudio and Positron? A Comparison of Modern Analytics Environments When you sit down to write a financial analysis or build a machine learning model, your choice of tools shapes not just how fast you work, but how clearly you think. The environment you choose—the text editor, the console, the file browser—becomes an extension of your mind. For decades, statisticians and data analysts have relied on R: a language born in 1995 from a community that values statistical rigor, reproducibility, and transparency. R is free, open-source, and has evolved into one of the world's most powerful ecosystems for data science. Today, you have two main choices for your R environment: **RStudio** and **Positron**. **RStudio** (now called Posit after its parent company rebranded) is the industry standard. It was built specifically for R and has matured into a complete integrated development environment (IDE). 
If you've worked with RStudio before, it feels like home: a script editor on the left, a console on the right, packages and environment panes below. Countless thousands of data scientists and statisticians around the world use it daily. It's stable, well-documented, and production-ready. **Positron** is newer—Posit released it in 2024 to challenge the IDE space itself. Positron is built on Visual Studio Code, the lightweight, fast, open-source editor from Microsoft that dominates software development. Positron brings a modern feel: faster startup times, a cleaner interface, native support for both R and Python side by side, and rich extensions. It's built for teams that work polyglot—mixing R and Python in the same project—which is increasingly common in business analytics. So which should you choose? - **Choose RStudio if:** You're primarily working in R, you value stability and established workflows, your organization already uses RStudio, or you prefer a purpose-built environment designed specifically for statistics and data science. - **Choose Positron if:** You want to work fluidly between R and Python in the same session, you're comfortable with modern development environments, you appreciate a cleaner UI, or you're building teams that need to collaborate across languages. This book will teach you both environments and demonstrate how to use them together. The good news: the underlying languages and workflows are identical. What differs is the look, feel, and some convenience features. **Why both R and Python?** In the real world, you'll encounter both. Many financial institutions and large enterprises use Python for production systems and R for statistical modeling. African tech startups increasingly favor Python for web backends but reach for R for analytical reports. By learning both, you're not just picking up two syntax patterns—you're learning two different philosophical approaches to solving problems. 
R thinks in terms of vectors and data frames; Python thinks in terms of sequences and objects. Together, they make you a more flexible, more thoughtful analyst. --- ## Installing R and RStudio R and RStudio are separate pieces of software. R is the engine—the actual language interpreter and runtime. RStudio is the dashboard—the environment where you write code, see results, and manage your work. You must install R first, then RStudio. ### Step 1: Install R Navigate to [https://cran.r-project.org/](https://cran.r-project.org/), the Comprehensive R Archive Network. CRAN is the official repository of R packages and the source for the R language itself. **On Windows:** 1. Click "Download R for Windows" 2. Click "base" 3. Download the latest version (look for a link like "Download R 4.3.x for Windows") 4. Run the installer (.exe file) 5. Accept the default settings (location: `C:\Program Files\R\R-x.x.x`) 6. Complete the installation **On macOS:** 1. Click "Download R for macOS" 2. Choose the correct architecture: - **Apple Silicon** (M1/M2/M3 chips): download the "ARM64" file - **Intel** (older Macs): download the "x86_64" file 3. Open the .pkg file and follow the installer 4. R installs to `/Library/Frameworks/R.framework/` **On Linux (Ubuntu/Debian):** ```bash sudo apt update sudo apt install r-base r-base-dev ``` For Red Hat/Fedora: ```bash sudo dnf install R ``` ### Step 2: Verify Your R Installation Open a terminal (Command Prompt on Windows, Terminal on macOS/Linux) and type: ```bash R --version ``` You should see output like: ``` R version 4.3.2 (2023-10-31) -- "Eye Holes" ... ``` If you see a version number, R is installed correctly. ### Step 3: Install RStudio Navigate to [https://posit.co/download/rstudio-desktop/](https://posit.co/download/rstudio-desktop/). The website will detect your operating system automatically. **On Windows:** 1. Download the `.exe` installer 2. Run it and follow the default options 3. 
   RStudio installs to `C:\Program Files\RStudio` by default

**On macOS:**

1. Download the `.dmg` file
2. Open it and drag RStudio.app to the Applications folder
3. Launch RStudio from Applications

**On Linux:**

1. Download the appropriate `.deb` (Debian/Ubuntu) or `.rpm` (Red Hat/Fedora) file
2. Install it:

```bash
# Ubuntu/Debian (apt resolves dependencies that a bare dpkg -i would leave missing)
sudo apt install ./rstudio-x.x.x-amd64.deb

# Red Hat/Fedora
sudo dnf install rstudio-x.x.x-x86_64.rpm
```

### Step 4: Launch RStudio and Test It

Double-click RStudio (or type `rstudio` in your terminal). You should see a window with four panes (if you see only three, the script editor appears once you create or open a file):

- **Top Left:** Script editor (empty for now)
- **Top Right:** Environment and History
- **Bottom Left:** Console (where R code runs)
- **Bottom Right:** Files, Plots, Packages, Help

In the console, type:

```r
2 + 2
```

Press Enter. If you see `[1] 4`, everything works. Now type:

```r
print("Hello from R!")
```

You should see `[1] "Hello from R!"`. If both commands work, RStudio and R are correctly installed and talking to each other.

---

## Installing Positron

Positron is the new kid on the block. It's free, built on Visual Studio Code, and designed for modern polyglot data science.

### Step 1: Install Positron

Navigate to [https://positron.posit.co/](https://positron.posit.co/) and download the installer for your operating system.

**On Windows:**

1. Download the `.exe` installer
2. Run it; accept the default installation path
3. Positron installs to `C:\Users\[YourUsername]\AppData\Local\Programs\Positron` by default

**On macOS:**

1. Download the `.dmg` or `.zip`
2. If you downloaded the `.dmg`, open it and drag Positron.app to Applications
3. If you downloaded the `.zip`, unzip it and move Positron.app to Applications

**On Linux:** Download the `.deb` or `.rpm` and install:

```bash
# Ubuntu/Debian
sudo apt install ./positron-x.x.x-amd64.deb
# or, on Red Hat/Fedora
sudo dnf install positron-x.x.x-x86_64.rpm
```

### Step 2: Configure Positron to Use Your R Installation

When you launch Positron for the first time, it will ask you to select the R installation. If Positron doesn't detect it automatically:

1.
   Press `Ctrl+,` (Windows/Linux) or `Cmd+,` (macOS) to open Settings
2. Search for "R executable"
3. Point it to your R installation:
   - **Windows:** `C:\Program Files\R\R-x.x.x\bin\R.exe`
   - **macOS:** `/Library/Frameworks/R.framework/Resources/bin/R`
   - **Linux:** `/usr/bin/R`

### Step 3: Test Positron

1. Open the console at the bottom of the window
2. Type `2 + 2` and press Enter
3. You should see `[1] 4`

If it works, Positron is ready. The experience is similar to RStudio but with a more modern, code-editor feel.

---

## An Introduction to Quarto: Reproducible, Elegant Reporting

Before we dive into Python, let's pause and talk about **Quarto**, because it's fundamental to your workflow as an analyst.

For decades, analysts worked in disconnected steps: they ran analyses in software, scribbled notes in journals, created charts in graphics programs, and then wrote reports in Word or PowerPoint, copying numbers by hand and recreating plots. This process was error-prone, unreproducible, and fragile. If a data value changed, the whole process had to start again.

**Quarto** changes this. Quarto is a system for literate programming: it lets you weave together narrative text (explanation, justification, storytelling) with code and its results in a single document. You write once, and when you render the document, all code runs, all plots regenerate, all tables update. If data changes next month, you re-render once and everything is fresh.

Quarto documents are plain-text `.qmd` files that contain:

- **YAML front matter** (metadata)
- **Markdown text** (formatted prose)
- **Code chunks** (R, Python, Julia, etc.)
- **Inline code** (small expressions in text)

When you render a Quarto document, it:

1. Reads your `.qmd` file
2. Executes all code chunks
3. Inserts results (numbers, tables, plots) into the document
4.
   Renders to HTML, PDF, Word, or other formats

This is why this textbook uses Quarto: every chapter you read was written once and contains working code that is re-run regularly, so every number, table, and plot comes from actual code rather than being copied by hand.

---

## Installing Python: Standalone and with R

Python can be installed on its own (Options A to C below) and then made available to R through the reticulate package.

### Option A: Python via Anaconda (Recommended for Beginners)

**Anaconda** is a Python distribution that includes Python, a package manager called `conda`, and a large ecosystem of pre-installed data science packages.

1. Go to [https://www.anaconda.com/download/](https://www.anaconda.com/download/)
2. Download the Anaconda installer for your OS (look for the graphical installer)
3. Run the installer and follow the defaults
4. Anaconda installs to `~/anaconda3` or `C:\Users\[You]\anaconda3`

To verify (on Windows, run these in the Anaconda Prompt):

```bash
python --version
conda --version
```

You should see version numbers for both.

### Option B: Python via Miniconda (Lightweight Alternative)

Miniconda is Anaconda's lightweight cousin: it includes Python and conda but fewer pre-installed packages. Use this if you want a minimal installation.

1. Go to [https://docs.conda.io/projects/miniconda/en/latest/](https://docs.conda.io/projects/miniconda/en/latest/)
2. Download the Miniconda installer for your OS
3. Run it; install to the default location

### Option C: Python via venv (Built-in, No Installation)

Python 3.3+ includes `venv`, a lightweight virtual environment tool built into the language. If you already have Python installed, open a terminal in the folder where the environment should live and run (use `python3` if `python` is not found):

```bash
python -m venv my_analytics_env
```

Then activate it from the same folder. On Windows:

```bash
my_analytics_env\Scripts\activate
```

On macOS/Linux:

```bash
source my_analytics_env/bin/activate
```

Your terminal prompt will change to show `(my_analytics_env)` at the start.
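
If the prompt does not change, or you simply want a second check, you can ask Python itself whether it is running inside the environment. This is a minimal sketch, assuming a `venv`-style environment on Python 3.3 or later; the helper name `in_virtual_env` is ours, not part of any library, and conda environments are not detected by this test.

```python
import sys


def in_virtual_env() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


# Show which interpreter is active and whether it belongs to an environment
print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtual_env())
```

Save it as a script or paste it into a `python` session after activating the environment. The interpreter path should point inside `my_analytics_env`.
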

Now you can install packages:

```bash
pip install pandas numpy scikit-learn matplotlib seaborn plotly statsmodels
```

### Making Python Available to R: reticulate

The R package **reticulate** lets you call Python code from within R. This is powerful: you can use R for some tasks and Python for others, all in the same document.

In RStudio or Positron, install reticulate:

```r
install.packages("reticulate")
```

Now tell reticulate where Python is:

```r
library(reticulate)

# Use ONE of the following lines, matching your operating system
use_python("/path/to/python")                            # macOS/Linux
# use_python("C:\\Users\\[You]\\anaconda3\\python.exe")  # Windows
```

Or, let reticulate find Python automatically:

```r
library(reticulate)
reticulate::py_config()  # Shows the Python installation reticulate has selected
```

Test it:

```r
library(reticulate)
py_run_string("print('Hello from Python inside R!')")
```

If you see the greeting, reticulate is working.

---

## Installing Key Packages

Now that your languages are installed, you need the essential libraries for data science and analytics.

### Essential R Packages

::: {.callout-note icon="false"}
## 📘 The Core R Data Science Stack

The packages listed below form the foundation of modern R analytics. Together, they provide data manipulation, visualization, statistical modeling, and integration with Python.
:::

In RStudio or Positron, run:

```r
# Install core data science packages
install.packages(c(
  "tidyverse",   # Data manipulation (dplyr, ggplot2, tidyr, etc.)
  "tidymodels",  # Machine learning framework
  "plotly",      # Interactive visualization
  "reticulate",  # R-Python integration
  "here",        # Project-relative file paths
  "rmarkdown",   # Document rendering (Quarto uses this)
  "quarto",      # Quarto R interface
  "knitr"        # Dynamic document generation
))
```

What each does:

- **tidyverse:** A collection of packages for data manipulation and visualization. Includes `dplyr` (transforming data), `ggplot2` (beautiful graphics), `readr` (reading files), `tidyr` (reshaping data), and more.
- **tidymodels:** A framework for building, evaluating, and tuning machine learning models. It's to machine learning what tidyverse is to data wrangling.
- **plotly:** Create interactive, web-based visualizations. Perfect for dashboards and reports where users might want to hover over data points.
- **reticulate:** Call Python from R, and pass data between languages seamlessly.
- **here:** Solves file path problems. Instead of writing `../../../data/file.csv`, you write `here("data", "file.csv")`, and it works no matter where your project lives.

### Essential Python Packages

In your terminal (with your virtual environment activated), run:

```bash
pip install pandas numpy scikit-learn matplotlib seaborn plotly statsmodels jupyter
```

Or, if using conda:

```bash
conda install pandas numpy scikit-learn matplotlib seaborn plotly statsmodels jupyter
```

What each does:

- **pandas:** Data manipulation and analysis. Python's answer to R's data frames.
- **numpy:** Numerical computing. Arrays, linear algebra, and mathematical functions.
- **scikit-learn:** Machine learning. Classification, regression, clustering, and more.
- **matplotlib & seaborn:** Visualization. Matplotlib is low-level and flexible; seaborn is higher-level and prettier.
- **plotly:** Interactive visualizations (the Python counterpart of R's plotly).
- **statsmodels:** Statistical modeling and hypothesis testing.
- **jupyter:** Interactive notebooks for exploratory analysis.

To verify all Python packages installed correctly, open Python and run:

```python
# Each import fails with ModuleNotFoundError if the package is missing
import pandas as pd
import numpy as np
import sklearn
import matplotlib.pyplot as plt
import seaborn as sns
import plotly
import statsmodels

print("All packages imported successfully!")
```

If you see the success message, you're ready.

---

## Your First Quarto Document

Let's create and render your first Quarto document. This is your hello world for reproducible analytics.

### Creating a New Quarto Document in RStudio

1. Go to File → New File → Quarto Document
2.
   You'll see a dialog asking about output format; keep the default (HTML)
3. Click Create
4. A new `.qmd` file appears with template content

The file should look like this:

```qmd
---
title: "My First Quarto Document"
format: html
---

## Quarto

Quarto enables you to weave together content and executable code into finished documents. To learn more about Quarto see <https://quarto.org>.

## Running Code

When you click the **Render** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

\`\`\`{r}
1 + 1
\`\`\`

## End
```

### Understanding Quarto Structure

**YAML Front Matter** (the part between `---` lines):

```yaml
---
title: "My First Quarto Document"
format: html
---
```

This metadata tells Quarto the document title and that you want HTML output.

**Markdown Text:** Everything between the YAML and code chunks is formatted text. Use `#` for headings, `**bold**` for bold, `*italic*` for italics, and so on.

**Code Chunks:**

```{r}
# This is R code
x <- c(1, 2, 3, 4, 5)
mean(x)
```

When you render, this code runs and the result appears in the document.

### Rendering Your First Document

Click the blue "Render" button (in RStudio, in the toolbar at the top of the editor). Quarto will:

1. Run all R code chunks
2. Capture the results
3. Convert Markdown to formatted text
4. Combine everything into a single HTML file
5. Open the HTML file in a viewer

You should see a nicely formatted document with your calculation result.

### Creating a Quarto Document with Python

To use Python in Quarto, add code chunks like this:

```{python}
# Python code
x = [1, 2, 3, 4, 5]
print(sum(x) / len(x))
```

For this to work in a document that also contains R chunks, Quarto needs a way to run Python.
In your YAML front matter, select the knitr engine, which runs Python chunks through the reticulate package:

```yaml
---
title: "My First Quarto Document"
format: html
engine: knitr
---
```

And install the `reticulate` R package (if you haven't):

```r
install.packages("reticulate")
```

Now your Python chunks will execute when you render. (A document that contains only Python chunks uses the Jupyter engine by default, which relies on the `jupyter` package installed earlier.)

---

## Case Study: Loading and Exploring Nigerian CPI Data

Let's put everything together. We'll create a real analytics workflow: loading Nigerian economic data, exploring it, and visualizing it in both R and Python.

### The Data: Consumer Price Index

The **Consumer Price Index (CPI)** tracks the price level of a basket of goods and services over time; inflation is the rate at which it changes. Nigeria's National Bureau of Statistics (NBS) publishes monthly CPI data. We'll create a realistic synthetic dataset representing monthly CPI values for Nigeria over four years (48 months, January 2021 to December 2024).

Create a new Quarto document called `nigerian_cpi_analysis.qmd` and add this content:

```qmd
---
title: "Nigerian CPI Analysis"
format: html
---

# Nigerian Consumer Price Index Analysis

This analysis explores a synthetic Consumer Price Index series for Nigeria covering 2021 to 2024, demonstrating data loading, exploration, and visualization in both R and Python.

## Loading Data

::: {.panel-tabset}

## R

\`\`\`{r}
library(tidyverse)
library(lubridate)  # year() and month(); attached automatically by tidyverse >= 2.0
library(plotly)

# Create synthetic but realistic Nigerian CPI data
set.seed(42)
cpi_data <- tibble(
  date = seq(from = as.Date("2021-01-01"), by = "month", length.out = 48),
  year = year(date),
  month = month(date),
  month_name = month.abb[month],
  cpi = 100 + cumsum(rnorm(48, mean = 2, sd = 1.5)) + seq(0, 20, length.out = 48)
) |>
  mutate(
    # Add some seasonality, then round to two decimals
    cpi = round(cpi + 3 * sin(2 * pi * month / 12), 2)
  )

head(cpi_data, 10)
\`\`\`

## Python

\`\`\`{python}
import pandas as pd
import numpy as np

# Create synthetic but realistic Nigerian CPI data
# (R and NumPy use different random generators, so the values differ from the R tab)
np.random.seed(42)
# freq="MS" gives month-start dates (2021-01-01, 2021-02-01, ...), matching the R version
dates = pd.date_range(start="2021-01-01", periods=48, freq="MS")
cpi_values = 100 + np.cumsum(np.random.normal(loc=2, scale=1.5, size=48)) + np.linspace(0, 20, 48)

# Add seasonality
months = dates.month.to_numpy()
cpi_values = cpi_values + 3 * np.sin(2 * np.pi * months / 12)

cpi_data = pd.DataFrame({
    "date": dates,
    "year": dates.year,
    "month": dates.month,
    "month_name": dates.strftime("%b"),
    "cpi": np.round(cpi_values, 2)
})

print(cpi_data.head(10))
\`\`\`

:::

## Exploratory Data Analysis

### Summary Statistics

::: {.panel-tabset}

## R

\`\`\`{r}
# Summary statistics
summary(cpi_data$cpi)

# Grouped statistics by year
cpi_data |>
  group_by(year) |>
  summarise(
    mean_cpi = mean(cpi),
    sd_cpi = sd(cpi),
    min_cpi = min(cpi),
    max_cpi = max(cpi),
    .groups = "drop"
  )
\`\`\`

## Python

\`\`\`{python}
# Summary statistics
print(cpi_data["cpi"].describe())
print("\n")

# Grouped statistics by year
print(cpi_data.groupby("year")["cpi"].agg(["mean", "std", "min", "max"]))
\`\`\`

:::

### Visualization: CPI Trends Over Time

::: {.panel-tabset}

## R

\`\`\`{r}
# Create an interactive plot
p <- cpi_data |>
  ggplot(aes(x = date, y = cpi, color = as.factor(year))) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  labs(
    title = "Nigerian Consumer Price Index (2021-2024)",
    x = "Date",
    y = "CPI (Index)",
    color = "Year"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    legend.position = "bottom"
  )

ggplotly(p)
\`\`\`

## Python

\`\`\`{python}
import plotly.express as px

# Create interactive plot using plotly
fig = px.line(
    cpi_data,
    x="date",
    y="cpi",
    color="year",
    title="Nigerian Consumer Price Index (2021-2024)",
    labels={"date": "Date", "cpi": "CPI (Index)", "year": "Year"},
    markers=True
)
fig.update_layout(
    hovermode="x unified",
    template="plotly_white"
)
fig.show()
\`\`\`

:::

## Year-over-Year Change

It's often useful to compare each month's CPI with the same month one year earlier, especially when comparing inflation rates.

::: {.panel-tabset}

## R

\`\`\`{r}
# Calculate year-over-year change: within each calendar month,
# lag() returns the value from the same month one year earlier
cpi_yoy <- cpi_data |>
  arrange(month, year) |>
  group_by(month) |>
  mutate(
    yoy_change = cpi - lag(cpi),
    yoy_pct_change = ((cpi - lag(cpi)) / lag(cpi)) * 100
  ) |>
  ungroup() |>
  filter(!is.na(yoy_change))

head(cpi_yoy, 10)

# Visualization of YoY change
cpi_yoy |>
  filter(year > 2021) |>
  ggplot(aes(x = as.factor(month), y = yoy_pct_change, fill = as.factor(year))) +
  geom_col(position = "dodge") +
  labs(
    title = "Year-over-Year CPI Change by Month",
    x = "Month",
    y = "Percent Change (%)",
    fill = "Year"
  ) +
  theme_minimal()
\`\`\`

## Python

\`\`\`{python}
# Calculate year-over-year change
cpi_data_sorted = cpi_data.sort_values(["month", "year"]).reset_index(drop=True)

# Group by month and shift to get previous year's value
cpi_data_sorted["yoy_change"] = cpi_data_sorted.groupby("month")["cpi"].diff()
cpi_data_sorted["yoy_pct_change"] = (
    cpi_data_sorted["yoy_change"] / cpi_data_sorted.groupby("month")["cpi"].shift(1)
) * 100

print(cpi_data_sorted.head(10))

# Visualization: year is cast to string so plotly treats it as a category
# and draws one group of bars per year instead of a continuous color scale
yoy_filtered = (
    cpi_data_sorted[cpi_data_sorted["year"] > 2021]
    .dropna(subset=["yoy_pct_change"])
    .assign(year=lambda d: d["year"].astype(str))
)
fig = px.bar(
    yoy_filtered,
    x="month",
    y="yoy_pct_change",
    color="year",
    barmode="group",
    title="Year-over-Year CPI Change by Month",
    labels={"month": "Month", "yoy_pct_change": "Percent Change (%)"}
)
fig.show()
\`\`\`

:::

## Key Insights

From this analysis of the synthetic CPI series, we observe:

1. **Overall Inflation Trend:** CPI rises steadily from 2021 to 2024. The upward drift is built into the simulated series to mimic a sustained inflationary period.
2. **Seasonal Patterns:** There are visible monthly variations in CPI, with peaks and troughs repeating across years, as is typical of price indices.
3. **Year-over-Year Growth:** The year-over-year percentage changes show variation across months, with some months experiencing higher inflation than others.

This is the foundation of time-series analysis, a critical skill in business analytics for understanding trends, forecasting, and decision-making.
```

### Rendering the Document

1. Save this as `nigerian_cpi_analysis.qmd`
2. Click Render
3. Quarto will execute all R and Python code, generate the plots, and produce a beautiful HTML report

Congratulations! You've just created a reproducible analytics document that could be shared with colleagues, stakeholders, or supervisors. If the underlying data changes next month, you simply re-render and everything updates automatically.

---

## Section Review Questions

::: {.callout-caution icon="false"}
## 📝 Review Questions: Setting Up Your Environment

1. What is the difference between R and RStudio, and why do you need both?
2. When would you choose Positron over RStudio, and what are the trade-offs?
3. Explain the concept of "literate programming" and why Quarto embodies this principle. What are two advantages of writing analyses in Quarto versus in a traditional Word document?
4. You want to use Python for machine learning but prefer R's data manipulation tools. How would you set this up, and what R package makes this seamless?
5. What do the packages `tidyverse`, `tidymodels`, and `reticulate` do, and in what scenario would you use all three together?
:::

---

## Further Reading

- **RStudio (Posit) Documentation:** https://posit.co/resources/
- **Positron Documentation & Guide:** https://positron.posit.co/docs/
- **Quarto Documentation:** https://quarto.org/docs/
- **Wickham, H., & Grolemund, G. (2017).** *R for Data Science: Import, Tidy, Transform, Visualize, and Model Data.* O'Reilly Media. (Classic reference for tidyverse)
- **McKinney, W. (2022).** *Python for Data Analysis* (3rd ed.). O'Reilly Media. (Covers pandas, numpy, and data manipulation in Python)

---

## Chapter Appendix: Detailed Installation Troubleshooting

### Appendix 1.A: Windows-Specific Issues

**Problem: R installation fails with "Administrator rights required"**

Solution: Right-click the R installer, select "Run as Administrator," and proceed.

**Problem: RStudio cannot find R**

Solution: In RStudio, go to Tools → Global Options → General. Under "R Sessions," click "Change" and manually navigate to `C:\Program Files\R\R-x.x.x\bin\x64\R.exe`.

**Problem: Packages won't install ("cannot remove prior installation")**

Solution: Close all R instances, navigate to your R library folder (run `.libPaths()` in R to find it; typically `C:\Users\[You]\AppData\Local\R\win-library\[version]` on R 4.2 and later, or `C:\Users\[You]\Documents\R\win-library\[version]` on older versions), delete the problematic package folder, and try installing again.

### Appendix 1.B: macOS-Specific Issues

**Problem: "R" command not found in Terminal**

Solution: The R command is not on your PATH. Add this line to your `~/.zprofile` or `~/.bash_profile`:

```bash
export PATH="/Library/Frameworks/R.framework/Resources/bin:$PATH"
```

Then open a new terminal window.

**Problem: "gfortran" error when installing packages**

Solution: Some R packages need the Fortran compiler. Install the R tools from https://cran.r-project.org/bin/macosx/tools/ or use:

```bash
brew install gcc
```

**Problem: RStudio won't launch on Apple Silicon**

Solution: Make sure you installed the "arm64" build of R, not the Intel build. Recent RStudio releases for macOS ship as a single installer that covers both Apple Silicon and Intel, so also check that your RStudio version is up to date.
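
If you are unsure which build your Mac needs, Python can report the processor architecture. The sketch below is illustrative only: the helper `installer_architecture` is a hypothetical name of ours, and the mapping covers just the two labels used in this chapter. One caveat: a Python interpreter running under Rosetta translation reports `x86_64` even on Apple Silicon hardware.

```python
import platform


def installer_architecture(machine: str) -> str:
    """Map a platform.machine() string to the installer label used in this chapter."""
    name = machine.lower()
    if name in ("arm64", "aarch64"):
        return "arm64"    # Apple Silicon (M1/M2/M3)
    if name in ("x86_64", "amd64"):
        return "x86_64"   # Intel
    return "unknown"


# Report what this machine looks like to the running interpreter
machine = platform.machine()
print("Operating system:", platform.system())
print("Reported architecture:", machine)
print("Installer to download:", installer_architecture(machine))
```
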

### Appendix 1.C: Linux-Specific Issues

**Problem: Package manager doesn't have the latest R**

Solution: Add the CRAN repository to your package manager. For Ubuntu:

```bash
# Add the CRAN signing key (fingerprint E298A3A825C0D65DFD57CBB651716619E084DAB9);
# apt-key is deprecated on current Ubuntu releases, so the key is stored directly
wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | sudo tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc

# Add the repository for your Ubuntu release (lsb_release -cs prints its codename, e.g. jammy)
sudo add-apt-repository "deb https://cloud.r-project.org/bin/linux/ubuntu $(lsb_release -cs)-cran40/"
sudo apt update
sudo apt install r-base
```

**Problem: Missing dependencies ("libssl-dev not found")**

Solution: Install development tools before R:

```bash
sudo apt install build-essential libcurl4-openssl-dev libssl-dev libxml2-dev
```

### Appendix 1.D: Verifying Python-R Integration

To confirm reticulate is working properly, run this R code:

```r
library(reticulate)

# Show the Python installation reticulate is using
reticulate::py_config()

# Run a Python command from R
py_run_string("x = [1, 2, 3, 4, 5]; print(f'Sum: {sum(x)}')")

# Pass data from R to Python by assigning into the `py` object
r_vector <- c(10, 20, 30, 40, 50)
py$py_vector <- r_vector
py_run_string("print(f'Python received: {py_vector}')")

# Pass data from Python to R
py_run_string("result = sum([1, 2, 3, 4, 5])")
py$result  # Access Python variable 'result' in R
```

If all commands execute without error, your R-Python integration is correctly configured.

### Appendix 1.E: Virtual Environments and Reproducibility

Virtual environments isolate project dependencies. This is a best practice in professional analytics.

**Creating a Python virtual environment:**

```bash
python -m venv /path/to/my_project/venv
```

**Activating it** (from inside the project folder):

```bash
# macOS/Linux
source venv/bin/activate

# Windows
venv\Scripts\activate
```

**Installing packages only to this environment:**

```bash
pip install pandas numpy scikit-learn
```

**Saving dependencies for reproducibility:**

```bash
pip freeze > requirements.txt
```

Later, on another machine, recreate the environment:

```bash
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -r requirements.txt
```

Now everyone has identical package versions, which is critical for reproducible research.
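
To check that a machine really matches the pinned versions, you can compare `requirements.txt` against what is installed. The following is a rough sketch rather than a full requirements parser: it assumes plain `name==version` lines as written by `pip freeze`, skips anything else (comments, URLs, environment markers), and the function name `check_pins` is our own. It relies on `importlib.metadata` from the standard library (Python 3.8+).

```python
from importlib import metadata
from pathlib import Path


def check_pins(requirement_lines):
    """Return {package: (pinned, installed_or_None)} for every pin that is not satisfied."""
    problems = {}
    for raw in requirement_lines:
        line = raw.strip()
        # Skip blanks, comments, and anything that is not a simple name==version pin
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            problems[name] = (pinned, installed)
    return problems


# Only runs the check if a requirements.txt exists in the current folder
req_file = Path("requirements.txt")
if req_file.exists():
    mismatches = check_pins(req_file.read_text().splitlines())
    print("All pins satisfied" if not mismatches else mismatches)
```

An empty result means every pinned package is installed at exactly the recorded version. Any entry in the returned dictionary shows the pinned version next to the installed one, or `None` if the package is missing.
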