About the Computing Environment
Two Languages, One Workflow
This book uses both R and Python. Rather than asking you to choose one, it treats them as complementary tools that coexist comfortably in the same project, which is how many data scientists work in practice.
- R excels at statistical modelling, publication-quality visualisation (ggplot2), and working with tidy tabular data (tidyverse, tidymodels). The R ecosystem for statistics is mature and extraordinarily rich.
- Python excels at machine learning pipelines (scikit-learn), deep learning (PyTorch), text processing (spaCy, transformers), and production-grade data engineering (pandas, polars).
You do not need to master both languages before starting. Start with whichever you know, and use the other tab as a learning resource.
RStudio and Positron
RStudio (from Posit) is the most widely used IDE among R users and also supports Python, both through the reticulate package and through built-in Python interpreter support. It is free, open-source, and available on Windows, macOS, and Linux.
Positron is Posit’s next-generation IDE, designed from the ground up for multi-language data science. It treats R and Python as first-class citizens. If you are starting fresh, Positron is recommended; it is the IDE this book is optimised for.
Both IDEs render Quarto documents (.qmd) natively, which is the file format used throughout this book.
Quarto
Quarto is the document format and publishing system used for this book. A .qmd file is a plain-text document that mixes narrative prose, mathematics (LaTeX), and executable code in R, Python, or both. When you render a .qmd file, Quarto executes the code, captures the output (tables, charts, numbers), and weaves everything into a polished HTML page or PDF document.
This means every chart and table in this book was generated by code that you can run yourself, modify, and learn from. Nothing is static.
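To make the structure of a .qmd file concrete, here is a minimal sketch in Python that writes one to a temporary directory. The title, the prose line, and the file name example.qmd are hypothetical and chosen only for illustration; the layout (a YAML header between --- lines, narrative text, then an executable cell marked {python}) follows the description above. The fence marker is built programmatically only so that this example nests cleanly inside its own code block.

```python
from pathlib import Path
import tempfile

# Three backticks, built programmatically so the example nests cleanly.
FENCE = "`" * 3


def minimal_qmd(title: str) -> str:
    """Return the text of a small .qmd document with one Python cell."""
    lines = [
        "---",                      # YAML header: document metadata
        f"title: \"{title}\"",
        "format: html",
        "---",
        "",
        "The mean of the sample is computed below.",  # narrative prose
        "",
        f"{FENCE}{{python}}",       # opens an executable Python cell
        "values = [2, 4, 6]",
        "print(sum(values) / len(values))",
        FENCE,                      # closes the cell
        "",
    ]
    return "\n".join(lines)


text = minimal_qmd("Hypothetical example")
out_path = Path(tempfile.mkdtemp()) / "example.qmd"
out_path.write_text(text, encoding="utf-8")
print(text)
```

With Quarto installed, running quarto render example.qmd in that directory should execute the cell and place its output beneath the prose in the resulting HTML page.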
Package Philosophy
R and Python packages used in this book are listed in full in Appendix A, with version numbers and installation commands. The general approach is:
- Use well-maintained, widely adopted packages with active communities
- Prefer the tidyverse/tidymodels ecosystem in R for consistency
- Prefer scikit-learn pipelines in Python for reproducibility
- Use plotly for interactive visualisations (HTML) and ggplot2/matplotlib for static ones (PDF)
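As a rough illustration of why pipelines help reproducibility, the sketch below chains a scaler and a classifier into a single scikit-learn Pipeline. The synthetic dataset, the step names scale and model, and the choice of logistic regression are assumptions made for this example, not the book's own setup. Because the pipeline is fitted as one object, the scaler learns its parameters from the training split only, and the fixed random_state values make the run repeatable.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only so the sketch is self-contained.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Preprocessing and model travel together as one estimator.
pipe = Pipeline(steps=[
    ("scale", StandardScaler()),                   # fitted on training data only
    ("model", LogisticRegression(max_iter=1000)),  # illustrative model choice
])

pipe.fit(X_train, y_train)
preds = pipe.predict(X_test)
accuracy = pipe.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The same pipe object can be passed to cross-validation or saved to disk, so the preprocessing steps cannot drift apart from the model they were fitted with.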
A Note on Computational Resources
All code in this book runs on a standard laptop, with no GPU or cloud computing required, unless explicitly stated otherwise (the deep learning chapters, Ch 33–34, describe GPU acceleration as an optional enhancement). Runtimes for the heavier computations are given in the relevant chapters, and caching is enabled by default so you do not have to re-run expensive models on every render.