3. Bayes Decision Theory (2)¶

3.1. Loss Function¶

State space \(\Omega\) has \(c\) classes, \(\Omega = \{\omega_1,\dots,\omega_c\}\)
We have \(a\) possible actions \(\{\alpha_1,\dots,\alpha_a\}\)
The loss function \(\lambda(\alpha_i|\omega_j)\) is the loss incurred for taking action \(\alpha_i\) when the class is \(\omega_j\).
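Zero-one loss function: when action \(\alpha_i\) means deciding class \(\omega_i\), a correct decision costs nothing and every error costs one unit.

\[\lambda(\alpha_i|\omega_j) = \begin{cases} 0 & i = j \\ 1 & i \neq j \end{cases}\]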

http://oa5omjl18.bkt.clouddn.com/2016_09_26_395257ca65dc711c4ca5cef31ada51c.png

Expected loss¶

The expected loss or conditional risk is by definition

\[R(\alpha_i|\mathbf{x}) = \sum_{j=1}^c\lambda(\alpha_i|\omega_j)P(\omega_j|\mathbf{x})\]

Zero-one conditional risk

\[R(\alpha_i|\mathbf{x}) = \sum_{j\not=i}P(\omega_j|\mathbf{x}) = 1 - P(\omega_i|\mathbf{x})\]
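The two formulas above can be checked numerically. The following is a minimal sketch in plain Python; the posterior values and the asymmetric loss matrix are made-up numbers chosen only for illustration, and the function names are hypothetical.

```python
def conditional_risk(loss, posteriors):
    """R(alpha_i | x) = sum_j loss[i][j] * P(omega_j | x), for every action i."""
    return [sum(l_ij * p_j for l_ij, p_j in zip(row, posteriors)) for row in loss]


def zero_one_loss(c):
    """c x c loss matrix: 0 on the diagonal (correct decision), 1 elsewhere."""
    return [[0.0 if i == j else 1.0 for j in range(c)] for i in range(c)]


def min_risk_action(risks):
    """Index of the action with the smallest conditional risk."""
    return min(range(len(risks)), key=risks.__getitem__)


# Hypothetical posteriors P(omega_j | x) for three classes.
posteriors = [0.2, 0.5, 0.3]
risks = conditional_risk(zero_one_loss(3), posteriors)
best_action = min_risk_action(risks)

# Hypothetical asymmetric loss: deciding class 0 when the truth is class 1
# costs 10, the opposite mistake costs 1.
asym_loss = [[0.0, 10.0],
             [1.0, 0.0]]
asym_posteriors = [0.7, 0.3]
asym_risks = conditional_risk(asym_loss, asym_posteriors)
asym_best = min_risk_action(asym_risks)
```

Under the zero-one loss each risk equals \(1 - P(\omega_i|\mathbf{x})\), so the minimum-risk action is the most probable class. With the asymmetric loss the minimum-risk action is the second one, even though the first class has the larger posterior.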

Overall risk¶

Define a decision rule \(\alpha(\mathbf{x})\), a mapping from the input feature space to an action, \(\mathbb{R}^d\to\{\alpha_1,\dots,\alpha_a\}\).
The overall risk is the expected loss associated with a given decision rule.

\[R=\int R(\alpha(\mathbf{x})|\mathbf{x})\,p(\mathbf{x})\,d\mathbf{x}\]

The Bayes decision rule minimizes the overall risk by choosing, for every \(\mathbf{x}\), the action with the smallest conditional risk \(R(\alpha_i|\mathbf{x})\).
The resulting minimum overall risk is called the Bayes risk; it is the best performance that can be achieved.
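To make the overall risk concrete, here is a small numerical sketch. It assumes a one-dimensional, two-class problem with class-conditional densities \(N(0,1)\) and \(N(2,1)\), equal priors, and the zero-one loss. All of these values, the integration range, and the function names are assumptions made for this illustration. The integral is approximated with a simple midpoint rule.

```python
import math

PRIORS = [0.5, 0.5]   # assumed P(omega_1), P(omega_2)
MEANS = [0.0, 2.0]    # assumed class means
SIGMA = 1.0           # assumed common standard deviation


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def joint(x):
    """p(x | omega_j) P(omega_j) for each class j."""
    return [normal_pdf(x, m, SIGMA) * p for m, p in zip(MEANS, PRIORS)]


def overall_risk(rule, lo=-10.0, hi=12.0, n=20000):
    """Midpoint-rule approximation of R = int R(alpha(x)|x) p(x) dx, zero-one loss."""
    dx = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * dx
        j = joint(x)
        chosen = rule(x)
        # R(alpha_i|x) p(x) = (1 - P(omega_i|x)) p(x) = p(x) - p(x|omega_i) P(omega_i)
        total += (sum(j) - j[chosen]) * dx
    return total


def bayes_rule(x):
    """Pick the class with the largest posterior (equivalently, largest joint)."""
    j = joint(x)
    return max(range(len(j)), key=j.__getitem__)


def shifted_rule(x):
    """A deliberately suboptimal threshold rule, for comparison."""
    return 0 if x < 0.5 else 1


bayes_risk = overall_risk(bayes_rule)
shifted_risk = overall_risk(shifted_rule)
```

For this particular setup the Bayes risk is \(\Phi(-1) \approx 0.1587\), where \(\Phi\) is the standard normal CDF, and any other rule, such as the shifted threshold, gives a larger overall risk.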

Bayes discriminant¶

\(g_i(\mathbf{x})\) is a discriminant function for the i-th class.
This classifier will assign a class \(\omega_i\) to the feature vector \(\mathbf{x}\) if

\[g_i(\mathbf{x}) > g_j(\mathbf{x}) \forall j \not= i\]

The minimum conditional risk corresponds to the maximum discriminant.

\[g_i(\mathbf{x}) = -R(\alpha_i|\mathbf{x})\]

\(g_i(\mathbf{x})\) can be replaced by \(f(g_i(\mathbf{x}))\), where \(f(\cdot)\) is a monotonically increasing function, because such a transformation does not change which discriminant is the largest.
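
As a worked illustration, assume the zero-one loss and that action \(\alpha_i\) means deciding \(\omega_i\). Each step below applies a transformation that is increasing in \(g_i\) and identical for every class at a given \(\mathbf{x}\), so the class with the largest discriminant stays the same:

\[\begin{aligned} g_i(\mathbf{x}) &= -R(\alpha_i|\mathbf{x}) = P(\omega_i|\mathbf{x}) - 1 \\ &\to P(\omega_i|\mathbf{x}) = \frac{p(\mathbf{x}|\omega_i)P(\omega_i)}{p(\mathbf{x})} \\ &\to p(\mathbf{x}|\omega_i)P(\omega_i) \\ &\to \ln p(\mathbf{x}|\omega_i) + \ln P(\omega_i) \end{aligned}\]

The steps are, in order: add 1, multiply by \(p(\mathbf{x}) > 0\), and take the natural logarithm. The last form is convenient whenever the class-conditional density has an exponential form.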

Bayes decision under normal density¶

Multivariate Gaussian in d dimensions

\[p(\mathbf{x})=\frac{1}{(2\pi)^{d/2}|\mathbf{\Sigma}|^{1/2}}\exp[-\frac{1}{2}(\mathbf{x}-\mathbf{\mu})^T\mathbf{\Sigma}^{-1}(\mathbf{x}-\mathbf{\mu})]\]

If we assume normal densities, i.e., if \(p(\mathbf{x}|\omega_i) \sim N(\mu_i, \Sigma_i)\), then the general discriminant is of the form

\[g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x} - \mu_i)^T\Sigma^{-1}_i(\mathbf{x} - \mu_i) -\frac{d}{2}\ln 2\pi -\frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)\]
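
The general discriminant can be evaluated term by term. The sketch below uses only the Python standard library and a small hand-written Cholesky factorization to obtain both the quadratic form and \(\ln|\Sigma_i|\). The function names and the example parameters are assumptions made for illustration, and \(\Sigma_i\) is assumed to be symmetric positive definite.

```python
import math


def cholesky(S):
    """Lower-triangular L with L L^T = S, for symmetric positive-definite S."""
    d = len(S)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def gaussian_discriminant(x, mu, sigma, prior):
    """g_i(x) for a class with mean mu, covariance sigma and prior P(omega_i)."""
    d = len(x)
    L = cholesky(sigma)
    diff = [xi - mi for xi, mi in zip(x, mu)]
    # Forward substitution solves L y = diff, so diff^T Sigma^{-1} diff = y^T y.
    y = []
    for i in range(d):
        y.append((diff[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    mahalanobis_sq = sum(v * v for v in y)
    # ln|Sigma| = 2 * sum_i ln L_ii
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(d))
    return (-0.5 * mahalanobis_sq
            - 0.5 * d * math.log(2 * math.pi)
            - 0.5 * log_det
            + math.log(prior))


def classify(x, classes):
    """Index of the class with the largest discriminant; classes = [(mu, sigma, prior), ...]."""
    scores = [gaussian_discriminant(x, mu, sigma, prior) for mu, sigma, prior in classes]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical two-class problem in two dimensions.
classes = [
    ([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 0.5),
    ([3.0, 3.0], [[2.0, 1.0], [1.0, 2.0]], 0.5),
]
label_near_origin = classify([0.5, 0.2], classes)
label_near_second = classify([2.8, 3.1], classes)
```

Going through a Cholesky factor avoids forming \(\Sigma_i^{-1}\) explicitly; in practice a numerical library would normally be used for this step.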

Special cases
- \(\Sigma_i=\sigma^2\mathbf{I}\): the covariance matrices of all classes are equal, the features are independent, and every feature has the same variance \(\sigma^2\). After dropping the terms that are identical for every class, the discriminant function takes the form:
\[g_i(\mathbf{x}) = -\frac{\|\mathbf{x} - \mu_i\|^2}{2\sigma^2} + \ln P(\omega_i)\]

This means that *