3. Bayes Decision Theory (2)

3.1. Loss Function

  • State space \(\Omega\) has \(c\) classes, \(\Omega = \{\omega_1,\dots,\omega_c\}\)
  • We have \(a\) possible actions \(\{\alpha_1,\dots,\alpha_a\}\)
  • The loss function \(\lambda(\alpha_i|\omega_j)\) is the loss incurred for taking action \(\alpha_i\) when the class is \(\omega_j\).
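  • Zero-one loss function: a correct decision costs nothing and every error costs one unit,
\[\lambda(\alpha_i|\omega_j) = \begin{cases}0, & i = j\\ 1, & i \neq j\end{cases} \qquad i, j = 1,\dots,c\]
http://oa5omjl18.bkt.clouddn.com/2016_09_26_395257ca65dc711c4ca5cef31ada51c.png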
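With finitely many actions and classes, a loss function can be stored as an \(a \times c\) table whose entry in row \(i\), column \(j\) is \(\lambda(\alpha_i|\omega_j)\). The Python sketch below tabulates any loss given as a function of the indices. It assumes 0-based indices (index 0 stands for \(\alpha_1\) or \(\omega_1\)) and that action \(\alpha_i\) means "decide \(\omega_i\)". The helper names `loss_table` and `zero_one_loss` are hypothetical.

```python
def loss_table(loss_fn, a, c):
    """Tabulate lambda(alpha_i | omega_j) as an a x c list of rows.

    Row i is action alpha_{i+1}, column j is class omega_{j+1} (0-based indices).
    """
    return [[loss_fn(i, j) for j in range(c)] for i in range(a)]


def zero_one_loss(i, j):
    """Zero-one loss, assuming action i means "decide class i": 0 if correct, 1 otherwise."""
    return 0.0 if i == j else 1.0


print(loss_table(zero_one_loss, 3, 3))  # [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
```

The table form is convenient because the conditional risk is then a matrix-vector product of the loss table with the vector of posteriors.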

Expected loss

  • The expected loss or conditional risk is by definition
\[R(\alpha_i|\mathbf{x}) = \sum_{j=1}^c\lambda(\alpha_i|\omega_j)P(\omega_j|\mathbf{x})\]
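  • With the zero-one loss (taking \(a = c\), where action \(\alpha_i\) means deciding \(\omega_i\)), the conditional risk reduces to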
\[R(\alpha_i|\mathbf{x}) = \sum_{j\not=i}P(\omega_j|\mathbf{x}) = 1 - P(\omega_i|\mathbf{x})\]
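The conditional risk is a weighted sum of posteriors, one value per action. The Python sketch below computes it for a fixed \(\mathbf{x}\) and checks the zero-one identity \(R(\alpha_i|\mathbf{x}) = 1 - P(\omega_i|\mathbf{x})\). The posterior values and the asymmetric loss table are made up for illustration, and the function names are hypothetical.

```python
def conditional_risk(loss, posterior):
    """R(alpha_i | x) = sum_j loss[i][j] * P(omega_j | x), one entry per action alpha_i."""
    return [sum(l * p for l, p in zip(row, posterior)) for row in loss]


def min_risk_action(risks):
    """0-based index of the action with the smallest conditional risk."""
    return min(range(len(risks)), key=lambda i: risks[i])


# Zero-one loss for c = 3 classes: row = action, column = true class.
zero_one = [[0.0, 1.0, 1.0],
            [1.0, 0.0, 1.0],
            [1.0, 1.0, 0.0]]

# Hypothetical posteriors P(omega_j | x) at one fixed x; they sum to 1.
posterior = [0.2, 0.5, 0.3]
risks = conditional_risk(zero_one, posterior)
print(risks)  # each entry is 1 - P(omega_i | x)

# Hypothetical asymmetric loss for two classes: deciding omega_1 when the
# truth is omega_2 costs 10, the opposite error costs 1.
asymmetric = [[0.0, 10.0],
              [1.0, 0.0]]
risks2 = conditional_risk(asymmetric, [0.7, 0.3])
print(risks2, min_risk_action(risks2))  # index 1, i.e. alpha_2, has the lower risk
```

Under the asymmetric table the minimum-risk action is \(\alpha_2\) even though \(\omega_1\) has the larger posterior. Only under zero-one loss does minimizing risk coincide with picking the most probable class.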

Overall risk

  • Define a decision rule \(\alpha(\mathbf{x})\) as a mapping from the input feature space to the set of actions, \(\alpha: \mathbb{R}^d\mapsto\{\alpha_1,\dots,\alpha_a\}\).
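  • The overall risk is the expected loss associated with a given decision rule:
\[R = \int R(\alpha(\mathbf{x})|\mathbf{x})\,p(\mathbf{x})\,d\mathbf{x}\]
  • The Bayes decision rule minimizes the overall risk: for every \(\mathbf{x}\), choose the action \(\alpha_i\) that minimizes the conditional risk \(R(\alpha_i|\mathbf{x})\). Because \(p(\mathbf{x}) \ge 0\), minimizing the integrand at every \(\mathbf{x}\) minimizes the integral.
  • The resulting minimum overall risk is called the Bayes risk; it is the best performance any decision rule can achieve.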
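To make the overall risk concrete, the Python sketch below approximates the integral numerically for a hypothetical one-dimensional, two-class problem: equal priors, unit-variance Gaussian class-conditional densities with means 0 and 2, and zero-one loss. It compares the Bayes rule, which picks the class with the largest \(p(x|\omega_j)P(\omega_j)\) at every \(x\), against an arbitrary threshold rule. All names and parameter values are assumptions for illustration. Under these assumptions the Bayes risk has the closed form \(\Phi(-1) \approx 0.1587\), where \(\Phi\) is the standard normal CDF.

```python
import math


def normal_pdf(x, mu, sigma):
    """Univariate Gaussian density N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical problem: index 0 is omega_1, index 1 is omega_2.
PRIORS = [0.5, 0.5]
MEANS = [0.0, 2.0]
SIGMA = 1.0


def joint(j, x):
    """p(x | omega_j) * P(omega_j)."""
    return normal_pdf(x, MEANS[j], SIGMA) * PRIORS[j]


def bayes_rule(x):
    """Zero-one loss: choose the class index with the largest p(x | omega_j) P(omega_j)."""
    return max(range(len(PRIORS)), key=lambda j: joint(j, x))


def threshold_rule(t):
    """Hypothetical alternative rule: decide omega_1 (index 0) iff x < t."""
    return lambda x: 0 if x < t else 1


def overall_risk(rule, lo=-10.0, hi=12.0, n=20000):
    """Approximate R = integral of R(alpha(x) | x) p(x) dx with the trapezoid rule.

    Under zero-one loss, R(alpha(x) | x) p(x) = sum over j != alpha(x) of p(x | omega_j) P(omega_j).
    """
    h = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        x = lo + k * h
        chosen = rule(x)
        integrand = sum(joint(j, x) for j in range(len(PRIORS)) if j != chosen)
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * integrand
    return total * h


print(overall_risk(bayes_rule))           # close to Phi(-1), about 0.1587
print(overall_risk(threshold_rule(1.5)))  # larger than the Bayes risk
```

Moving the threshold away from the Bayes boundary (here \(x = 1\)) raises the overall risk, which is what the minimality of the Bayes risk predicts.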

Bayes discriminant

  • \(g_i(\mathbf{x})\) is a discriminant function for the \(i\)-th class.
  • The classifier assigns the feature vector \(\mathbf{x}\) to class \(\omega_i\) if
\[g_i(\mathbf{x}) > g_j(\mathbf{x}) \quad \forall j \neq i\]
  • Choosing the discriminant as the negative conditional risk makes the maximum discriminant correspond to the minimum conditional risk:
\[g_i(\mathbf{x}) = -R(\alpha_i|\mathbf{x})\]
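  • \(g_i(\mathbf{x})\) can be replaced by \(f(g_i(\mathbf{x}))\), where \(f(\cdot)\) is any monotonically increasing function, without changing the decision. For the zero-one loss this gives the equivalent choices \(g_i(\mathbf{x}) = P(\omega_i|\mathbf{x})\) and, since the evidence \(p(\mathbf{x})\) is the same for every class, \(g_i(\mathbf{x}) = \ln p(\mathbf{x}|\omega_i) + \ln P(\omega_i)\).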
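A discriminant classifier is an argmax over the \(g_i(\mathbf{x})\). The Python sketch below uses made-up likelihood and prior values at a single \(\mathbf{x}\) (hypothetical, for illustration only). It checks that three discriminants related by monotonically increasing transforms select the same class: \(-R(\alpha_i|\mathbf{x})\) under zero-one loss, the posterior \(P(\omega_i|\mathbf{x})\), and \(\ln p(\mathbf{x}|\omega_i) + \ln P(\omega_i)\). The function name `classify` is hypothetical.

```python
import math


def classify(discriminants):
    """Return the 0-based index i with the largest g_i(x) (ties go to the lowest index)."""
    return max(range(len(discriminants)), key=lambda i: discriminants[i])


# Hypothetical values of p(x | omega_i) and P(omega_i) at one feature vector x.
likelihoods = [0.05, 0.20, 0.10]
priors = [0.5, 0.2, 0.3]

# Posteriors via Bayes' formula; the evidence p(x) is common to all classes.
evidence = sum(l * p for l, p in zip(likelihoods, priors))
posteriors = [l * p / evidence for l, p in zip(likelihoods, priors)]

# Zero-one loss: g_i(x) = -R(alpha_i | x) = -(1 - P(omega_i | x)).
g_risk = [-(1.0 - post) for post in posteriors]

# Dropping the common factor 1 / p(x) and taking logs are both monotone changes.
g_log = [math.log(l) + math.log(p) for l, p in zip(likelihoods, priors)]

print(classify(g_risk), classify(posteriors), classify(g_log))  # 1 1 1
```

The log form is usually preferred in practice because products of small densities become sums, which avoids numerical underflow.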

Bayes decision under normal density

  • Multivariate Gaussian in d dimensions
\[p(\mathbf{x})=\frac{1}{(2\pi)^{d/2}|\mathbf{\Sigma}|^{1/2}}\exp[-\frac{1}{2}(\mathbf{x}-\mathbf{\mu})^T\mathbf{\Sigma}^{-1}(\mathbf{x}-\mathbf{\mu})]\]
  • If we assume normal densities, i.e., \(p(\mathbf{x}|\omega_i) \sim \mathcal{N}(\mu_i, \Sigma_i)\), then the discriminant \(g_i(\mathbf{x}) = \ln p(\mathbf{x}|\omega_i) + \ln P(\omega_i)\) takes the general form
\[g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x} - \mu_i)^T\Sigma_i^{-1}(\mathbf{x} - \mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)\]
  • Special cases
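    • Case \(\Sigma_i=\sigma^2\mathbf{I}\): the covariance matrices of all classes are equal, the features are statistically independent, and all features have the same variance \(\sigma^2\). Since \(|\Sigma_i| = \sigma^{2d}\) and \(\Sigma_i^{-1} = \mathbf{I}/\sigma^2\), the terms \(-\frac{d}{2}\ln 2\pi\) and \(-\frac{1}{2}\ln|\Sigma_i|\) are the same for every class and can be dropped, so the discriminant function takes the form:
    \[g_i(\mathbf{x}) = -\frac{\|\mathbf{x} - \mu_i\|^2}{2\sigma^2} + \ln P(\omega_i)\]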

    This means that *