Research School for Socio-Economic and
Natural Sciences of the Environment

Fundamentals of Probability and Statistics Using Python

Date: 09 August 2017 - 11 August 2017
Location: UFZ Leipzig, Germany

Target audience

Doctoral and postdoctoral researchers from all disciplines who work with data. No previous knowledge of statistics or Python is required.


This course aims to build an understanding of fundamental concepts in probability and statistical inference, as well as their application using Python.

The course uses a combination of lectures and participatory activities to convey the principles of statistical inference. It will answer questions such as “Why do I need statistics?”, “What’s in a p-value?”, and “How many replicates do I need?”, and will hopefully help avoid questions such as “Why are my 5 years of data useless?”

The course will cover all steps of statistical inference: design of experiments, data acquisition, basic statistical analyses, and basic graphs. A large portion of the course will be dedicated to practical exercises using the language Python. The course will also include instruction on the use of Python along every step of the process. At the end of the course, participants will have the opportunity to do a project with their own data (or freely available data).

Didactic aim

Understand probability and probability distributions

  • Random sampling
  • Discrete distributions
  • Continuous distributions
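
As a rough illustration of the three topics above, the following sketch draws random samples from one discrete and one continuous distribution using only the Python standard library. The seed, the distribution parameters, and the sample size are arbitrary choices made for this example and are not taken from the course material.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the draws are reproducible


def binomial_draw(n, p):
    """Discrete example: count successes in n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))


# Discrete distribution: Binomial(n=10, p=0.3), theoretical mean n*p = 3.0
discrete_sample = [binomial_draw(10, 0.3) for _ in range(5000)]

# Continuous distribution: Normal(mean=5, sd=2), theoretical mean 5.0
continuous_sample = [rng.gauss(5.0, 2.0) for _ in range(5000)]

# With many draws, sample means should land close to the theoretical means
discrete_mean = statistics.fmean(discrete_sample)
continuous_mean = statistics.fmean(continuous_sample)
```

Comparing `discrete_mean` and `continuous_mean` with 3.0 and 5.0 shows how random sampling approximates the underlying distribution.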

Perform and interpret descriptive statistics

  • Mean, median, mode, standard deviation, covariance
  • Graphical display of distributions
  • Graphs for grouped data
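
A minimal sketch of the summary statistics listed above, using the standard-library `statistics` module. The two small data sets are made up for illustration, and the covariance is computed by hand so that the formula stays visible.

```python
import statistics

# Hypothetical paired measurements, invented for this example
x = [2, 4, 4, 5, 7, 8]
y = [1, 3, 2, 5, 6, 7]

mean_x = statistics.mean(x)
median_x = statistics.median(x)
mode_x = statistics.mode(x)
sd_x = statistics.stdev(x)  # sample standard deviation (divides by n - 1)


def sample_covariance(a, b):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cross = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    return cross / (len(a) - 1)


cov_xy = sample_covariance(x, y)
```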

Use and interpret basic statistical tests

  • One- and two-sample tests, t-test
  • Null hypothesis significance testing
  • Pearson correlation
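
To make the test statistics above concrete, here is a sketch that computes a two-sample t statistic and a Pearson correlation coefficient by hand. It assumes the unequal-variance (Welch) form of the t statistic, which is one of several variants, and all data values are invented. Turning the t statistic into a p-value requires the t distribution, which the standard library does not provide; a statistics package such as SciPy would normally be used for that step.

```python
import math
import statistics

# Hypothetical measurements from two independent groups
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.2, 4.8, 4.5, 5.0, 4.4]


def welch_t(a, b):
    """Two-sample t statistic without assuming equal variances."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.fmean(a) - statistics.fmean(b)) / standard_error


def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


t_stat = welch_t(group_a, group_b)
r = pearson_r([2, 4, 4, 5, 7, 8], [1, 3, 2, 5, 6, 7])
```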

Use and interpret (Ordinary Least Squares) linear regression

  • Residuals and diagnostic plots
  • Prediction and confidence bands
  • Graphical presentation
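
The following sketch fits a simple ordinary least squares line and computes its residuals with plain Python. The data points are invented, and only the single-predictor case is shown; diagnostic plots and confidence bands would need a plotting and statistics library on top of this.

```python
import statistics

# Hypothetical predictor and response values
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

mean_x = statistics.fmean(x)
mean_y = statistics.fmean(y)

# OLS estimates for y = intercept + slope * x
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residuals: observed minus fitted values; with an intercept they sum to ~0
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
```

Plotting `residuals` against `fitted` is the usual first diagnostic: a visible pattern suggests that a straight line does not describe the data well.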

Experimental design (with relevance to most people in hydrology/hydrogeology)

  • Sampling design
  • Confounding variables
  • Collinearity
  • Pseudo-replication
  • Estimating statistical power, effect size, and required sample size
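
The link between power, effect size, and sample size can be sketched with a common normal approximation for comparing two group means. This is an assumption chosen for illustration, not necessarily the method taught in the course: the function name `n_per_group` is hypothetical, the effect size is a standardized mean difference (Cohen's d), and an exact calculation based on the t distribution gives slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample test.

    Normal approximation: n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Smaller effects need many more replicates than large ones
n_medium = n_per_group(0.5)
n_large = n_per_group(0.8)
```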

> More information, Module 2017-35