About the Computing Environment

Two Languages, One Workflow

This book uses both R and Python. Rather than asking you to choose one, it treats them as complementary tools that coexist comfortably in the same project — which is precisely how modern data scientists work.

R excels at statistical modelling, publication-quality visualisation (ggplot2), and working with tidy tabular data (tidyverse, tidymodels). The R ecosystem for statistics is mature and extraordinarily rich.
Python excels at machine learning pipelines (scikit-learn), deep learning (PyTorch), text processing (spaCy, transformers), and production-grade data engineering (pandas, polars).

You do not need to master both languages before starting. Start with whichever you know, and treat the other language's tab in each code example as a learning resource.

RStudio and Positron

RStudio (from Posit) is the dominant IDE for R users and now supports Python via the reticulate package and built-in Python interpreter support. It is free, open-source, and available on Windows, macOS, and Linux.

Positron is Posit’s next-generation IDE, designed from the ground up for multi-language data science. It treats R and Python as equally first-class citizens. If you are starting fresh, Positron is recommended — it is what this book is optimised for.

Both IDEs render Quarto documents (.qmd) natively, which is the file format used throughout this book.

Quarto

Quarto is the document format and publishing system used for this book. A .qmd file is a plain-text document that mixes narrative prose, mathematics (LaTeX), and executable code in R, Python, or both. When you render a .qmd file, Quarto executes the code, captures the output (tables, charts, numbers), and weaves everything into a polished HTML page or PDF document.
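To make that structure concrete, the sketch below assembles a minimal .qmd source as a Python string and writes it to a temporary file. The file name example.qmd, the title, and the cell contents are hypothetical and chosen only for illustration. The quarto render command is the standard Quarto CLI entry point, but the script calls it only when explicitly asked to and only if a quarto executable is found on the PATH. Rendering a file with both R and Python cells also assumes that R and the reticulate package are installed.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Build the three-backtick fence programmatically so this listing stays readable.
FENCE = "`" * 3


def build_qmd(title: str) -> str:
    """Return the text of a minimal Quarto document: YAML header, prose, math, two code cells."""
    lines = [
        "---",
        f'title: "{title}"',
        "format: html",
        "---",
        "",
        "Narrative prose with inline mathematics: $\\bar{x} = \\frac{1}{n}\\sum_i x_i$.",
        "",
        FENCE + "{r}",
        "mean(c(1, 2, 3))",
        FENCE,
        "",
        FENCE + "{python}",
        "sum([1, 2, 3]) / 3",
        FENCE,
        "",
    ]
    return "\n".join(lines)


def write_qmd(text: str, directory: Path, render: bool = False) -> Path:
    """Write the document to example.qmd (hypothetical name) and optionally render it."""
    path = directory / "example.qmd"
    path.write_text(text, encoding="utf-8")
    # Rendering is opt-in and skipped when the Quarto CLI is not available.
    if render and shutil.which("quarto"):
        subprocess.run(["quarto", "render", str(path)], check=False)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        qmd_path = write_qmd(build_qmd("Hypothetical example"), Path(tmp))
        print(qmd_path.read_text(encoding="utf-8"))
```

Passing render=True to write_qmd would ask Quarto to execute both cells and produce an HTML page next to the source file.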

This means every chart and table in this book was generated by code that you can run yourself, modify, and learn from. Nothing is static.

Package Philosophy

R and Python packages used in this book are listed in full in Appendix A, with version numbers and installation commands. The general approach is:

Use well-maintained, widely adopted packages with active communities
Prefer the tidyverse / tidymodels ecosystem in R for consistency
Prefer scikit-learn pipelines in Python for reproducibility
Use plotly for interactive visualisations (HTML) and ggplot2/matplotlib for static ones (PDF)
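Before working through the chapters, it can help to check which of these packages are already installed and at what version. The sketch below uses only the Python standard library (importlib.metadata). The list of names is an assumption drawn from the Python packages mentioned in this chapter, written as their PyPI distribution names (for example, PyTorch is distributed as torch). Appendix A remains the authoritative list.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed list: PyPI distribution names for the Python packages named in this chapter.
PYTHON_PACKAGES = [
    "scikit-learn",
    "torch",
    "spacy",
    "transformers",
    "pandas",
    "polars",
    "plotly",
    "matplotlib",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is not installed."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PYTHON_PACKAGES).items():
        print(f"{name:<14} {ver or 'not installed'}")
```

Comparing this output with the version numbers in Appendix A is a quick way to spot a mismatch before it surfaces as a confusing error mid-chapter.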

A Note on Computational Resources

All code in this book runs on a standard laptop, with no GPU or cloud computing required, unless explicitly noted. The deep learning chapters (Ch 33–34) include notes on GPU acceleration as an optional enhancement. Runtimes for the heavier computations are given in the relevant chapters, and caching is enabled by default so you do not have to re-run expensive models on every render.
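The idea behind caching can be illustrated with a small, self-contained sketch. This is not how Quarto's own cache is implemented; it only shows the general pattern of storing a result on disk the first time it is computed and reading it back afterwards. The cache directory, the cached helper, and slow_model_fit are all hypothetical names introduced for this example.

```python
import hashlib
import pickle
import tempfile
import time
from pathlib import Path

# Hypothetical cache location inside the system temporary directory.
CACHE_DIR = Path(tempfile.gettempdir()) / "book_cache_demo"


def cached(key, compute):
    """Return the stored result for `key`, computing and saving it on the first call."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    path = CACHE_DIR / f"{digest}.pkl"
    if path.exists():
        # Cache hit: load the earlier result instead of recomputing it.
        with path.open("rb") as fh:
            return pickle.load(fh)
    result = compute()
    with path.open("wb") as fh:
        pickle.dump(result, fh)
    return result


def slow_model_fit():
    """Stand-in for an expensive computation such as fitting a model."""
    time.sleep(0.2)
    return {"coef": 1.5}


if __name__ == "__main__":
    for attempt in ("first", "second"):
        start = time.perf_counter()
        fit = cached("demo-model-v1", slow_model_fit)
        print(f"{attempt} call: {fit} in {time.perf_counter() - start:.3f}s")
```

Changing the key (for example, after editing the model code) forces a fresh computation, which mirrors why a cached result has to be invalidated when its inputs change.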
