*************
Probabilities
*************

Why are we talking about probabilities?
=======================================

.. rst-class:: bignums-tip

1. Working with mobile robots means working with uncertainty.

   * Uncertainty can come from errors in the motion control.
   * Uncertainty can come from measurement errors of the sensors.

2. To reduce the uncertainty, we need an explicit representation of it.

   * The uncertainty is often represented with probability theory.

Probabilistic inference is the process of calculating the probability laws of random variables that are derived from other random variables and from the observed data.

Discrete Random Variables
=========================

.. important::

   This is something that you should already know!

* We denote by :math:`X` a random variable.
* And by :math:`x` a value that :math:`X` could take.
* :math:`p(X=x)` or :math:`p(x)` represents the probability that :math:`X` takes the value :math:`x`.

.. admonition:: Example

   If you flip a coin, you can obtain either *heads* or *tails*:
   :math:`p(X=\text{head})=p(X=\text{tail})=\frac{1}{2}`

Discrete probabilities always sum to 1:

.. math::

   \sum_x p(X=x)=1

.. note::

   Probabilities are always non-negative: :math:`p(X=x)\ge 0`.

Continuous Random Variables
===========================

You will see that in robotics, we usually address estimation and decision-making in continuous spaces.

* We denote by :math:`X` a continuous random variable.
* We assume that all continuous random variables possess *probability density functions* (PDFs).

.. admonition:: Activity
   :class: activity

   Can you give one common density function?

A very common one:

* The one-dimensional normal distribution with mean :math:`\mu` and variance :math:`\sigma^2`.
* The PDF of a normal distribution is a Gaussian function.

Gaussian function
   **PDF**: :math:`p(x)=(2\pi\sigma^2)^{-\frac{1}{2}}\exp\{-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}\}`

   **Abbreviation**: :math:`\mathcal{N}(x;\mu,\sigma^2)`

   :math:`x` is a scalar value.

.. admonition:: Example
   :class: example

   .. code:: python

      import math

      import matplotlib.pyplot as plt
      import numpy as np
      import scipy.stats as stats

      mu = 0
      variance = 1
      sigma = math.sqrt(variance)

      # Evaluate the Gaussian PDF on [mu - 3*sigma, mu + 3*sigma]
      x = np.linspace(mu - 3*sigma, mu + 3*sigma, 100)
      plt.plot(x, stats.norm.pdf(x, mu, sigma))
      plt.show()

   .. figure:: ./pyplots/gaussian.png
      :width: 80 %
      :align: center

If we have more than one dimension:

* :math:`x` becomes a multi-dimensional vector.
* Normal distributions over vectors are called *multivariate*.

Multivariate normal distribution
   **PDF**: :math:`p(x)=\det(2\pi\Sigma)^{-\frac{1}{2}}\exp\{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\}`

   **Covariance matrix**: :math:`\Sigma` is a *positive semidefinite* and *symmetric* matrix.

As with discrete probabilities, continuous probabilities always integrate to 1:

.. math::

   \int p(x)dx = 1

.. admonition:: Example
   :class: example

   .. code:: python

      import matplotlib.pyplot as plt
      import numpy as np
      from scipy.stats import multivariate_normal

      # Parameters to set
      mu_x = 0
      variance_x = 3
      mu_y = 0
      variance_y = 15

      # Create grid and multivariate normal
      x = np.linspace(-10, 10, 500)
      y = np.linspace(-10, 10, 500)
      X, Y = np.meshgrid(x, y)
      pos = np.empty(X.shape + (2,))
      pos[:, :, 0] = X
      pos[:, :, 1] = Y
      rv = multivariate_normal([mu_x, mu_y],
                               [[variance_x, 0], [0, variance_y]])

      # Make a 3D plot
      # (fig.gca(projection='3d') is removed in recent matplotlib)
      fig = plt.figure()
      ax = fig.add_subplot(projection='3d')
      ax.plot_surface(X, Y, rv.pdf(pos), cmap='viridis', linewidth=0)
      ax.set_xlabel('X axis')
      ax.set_ylabel('Y axis')
      ax.set_zlabel('Z axis')
      plt.show()

   .. figure:: ./pyplots/multivariate.png
      :align: center
      :width: 80 %

Joint and Conditional probability
=================================

Joint distribution
   **Formula**: :math:`p(X=x \text{ and } Y=y) = p(x,y)`

   **Definition**: Describes the probability of the event that the random variable :math:`X` takes on the value :math:`x` and that :math:`Y` takes on the value :math:`y`.

Independence
   **Formula**: :math:`p(x,y) = p(x)p(y)`
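The independence formula above can be checked numerically. A minimal sketch, assuming two independent fair coin flips (the dictionaries and names are only for illustration):

```python
from itertools import product

# Marginal distributions of two fair coins.
p_x = {"head": 0.5, "tail": 0.5}
p_y = {"head": 0.5, "tail": 0.5}

# Joint distribution of two independent flips:
# each of the four outcomes (x, y) has probability 1/4.
p_xy = {(x, y): 0.25 for x, y in product(p_x, p_y)}

# Independence holds iff p(x, y) == p(x) * p(y) for every pair.
independent = all(
    abs(p_xy[(x, y)] - p_x[x] * p_y[y]) < 1e-12
    for x, y in p_xy
)
print(independent)  # True
```

If the two flips were correlated (say, the second coin copied the first), `p_xy` would concentrate on matching pairs and the check would fail.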
Conditional probability
   **Formula**: :math:`p(X=x|Y=y) = p(x|y)`

   **Definition**: The probability that :math:`X=x` knowing that :math:`Y=y`; the probability of :math:`x` is conditioned on :math:`y`.

If :math:`p(y)>0`, then the conditional probability is defined as:

.. math::

   p(x|y) = \frac{p(x,y)}{p(y)}

If :math:`X` and :math:`Y` are independent:

.. math::

   p(x|y) = \frac{p(x)p(y)}{p(y)} = p(x)

Law of Total Probability
========================

The following result, derived from the definition of conditional probability and the axioms of probability measures, is referred to as the *theorem of total probability*.

In the discrete case: :math:`p(x)=\sum_y p(x|y)p(y)`.

In the continuous case: :math:`p(x) = \int p(x|y)p(y)dy`.

Bayes formulas
==============

Bayes rule
   **Discrete case**:

   .. math::

      p(x|y) = \frac{p(y|x)p(x)}{p(y)} = \frac{p(y|x)p(x)}{\sum_{x'}p(y|x')p(x')}

   **Continuous case**:

   .. math::

      p(x|y) = \frac{p(y|x)p(x)}{p(y)} = \frac{p(y|x)p(x)}{\int p(y|x')p(x')dx'}

Why is this important?
======================

.. rst-class:: bignums-tip

1. :math:`x` is a quantity that we would like to infer from :math:`y` (like a position).

2. The probability :math:`p(x)` is referred to as the *prior probability distribution*, and :math:`y` is called the *data* (e.g., a sensor measurement).

3. :math:`p(x)` summarizes the knowledge we have about :math:`X` prior to incorporating the data :math:`y`.

4. :math:`p(x|y)` is called the *posterior probability distribution* over :math:`X`.

The Bayes formula can be formulated as:

.. math::

   p(x|y) = \frac{p(y|x)p(x)}{p(y)} = \frac{likelihood \times prior}{evidence}

.. note::

   :math:`p(y)` does not depend on :math:`x`. Thus :math:`p(y)^{-1}` is called the normalizer and is denoted :math:`\eta`:

   :math:`p(x|y) = \eta\, p(y|x)p(x)`

.. admonition:: Activity
   :class: activity

   You are planning a picnic today, but the morning is cloudy.

   * Oh no! 50% of all rainy days start off cloudy!
   * But cloudy mornings are common (about 40% of days start cloudy).
   * And this is usually a dry month (only 3 of 30 days tend to be rainy, or 10%).

   **What is the chance of rain during the day?**

   * We will use `Rain` to mean rain during the day, and `Cloud` to mean a cloudy morning.
   * The chance of `Rain` given `Cloud` is written `P(Rain|Cloud)`.

Conditioning
============

We can condition the Bayes rule on more than one variable. For example, we can condition on :math:`Z = z`:

.. math::

   p(x|y,z) = \frac{p(y|x,z)p(x|z)}{p(y|z)}

as long as :math:`p(y|z)>0`.

It also means that :math:`p(x|y)=\int p(x|y,z)p(z|y)dz`.

Similarly, we can condition the rule for combining probabilities of independent random variables on another variable :math:`z`:

.. math::

   p(x, y | z) = p(x | z) p(y | z)

Such a relation is known as *conditional independence*. It is equivalent to

* :math:`p(x|z) = p(x|z,y)`
* :math:`p(y|z) = p(y|z,x)`

Expectations of random variables
================================

The expected value of a random variable :math:`X` is denoted :math:`E[X]`.

* You can think of it as the "average" value attained by the random variable.
* In fact, it is also called its **mean**.

Expected value
   **Discrete case**: :math:`E[X] = \sum_x xp(X=x)`

   **Continuous case**: :math:`E[X]=\int xp(x)dx`

.. admonition:: Activity
   :class: activity

   Calculate the expected value of a die roll.

Covariance
   The covariance measures the expected squared deviation from the mean:

   :math:`\text{Cov}[X] = E[(X-E[X])^2] = E[X^2]-E[X]^2`
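As a worked sketch of the activity above, the discrete formulas can be applied directly to a fair six-sided die (exact fractions are used only to keep the arithmetic readable):

```python
from fractions import Fraction

# Fair six-sided die: each face 1..6 has probability 1/6.
faces = range(1, 7)
p = Fraction(1, 6)

# E[X] = sum_x x * p(X = x)
mean = sum(x * p for x in faces)              # 7/2 = 3.5

# Cov[X] = E[X^2] - E[X]^2
second_moment = sum(x**2 * p for x in faces)  # 91/6
variance = second_moment - mean**2            # 35/12

print(mean, variance)
```

Note that the mean 3.5 is a value the die can never actually show: the expected value is an average over outcomes, not necessarily an attainable outcome.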