Original

Definition of the Rayleigh Quotient and the Generalized Rayleigh Quotient

Column: Suzhou Google Developer Community
Published 2020-11-20, 1585 views
  • Machine Learning

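I have been studying LDA recently, and the Rayleigh quotient and the generalized Rayleigh quotient play an important part in its derivation.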

Definition of the Rayleigh quotient

The Rayleigh quotient is the function $R(A,x)$ defined by

$$R(A,x) = \frac{x^{H}Ax}{x^{H}x}$$

where $x$ is a nonzero vector and $A$ is an $n \times n$ Hermitian matrix. A Hermitian matrix is one that equals its own conjugate transpose, $A^{H}=A$, and $x^{H}$ denotes the conjugate transpose of $x$.

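Conjugate transpose
A matrix can be real or complex.
The transpose simply swaps the rows and columns of a matrix.
The conjugate transpose swaps the rows and columns and then also conjugates every entry.
Conjugation turns a number of the form $a+bi$ into $a-bi$; a real number is its own conjugate.
So the conjugate transpose of a real matrix is just its transpose, while the conjugate transpose of a complex matrix is the transpose with every entry conjugated, as described above.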
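
The rules above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the matrices `M`, `R` and `A` are made-up examples, not taken from the article.

```python
import numpy as np

# A small complex matrix chosen only for illustration.
M = np.array([[1 + 2j, 3 - 1j],
              [0 + 1j, 4 + 0j]])

# Conjugate transpose: swap rows and columns, then conjugate each entry.
M_H = M.conj().T

# For a real matrix the conjugate transpose is just the transpose.
R = np.array([[1.0, 2.0],
              [3.0, 4.0]])
R_H = R.conj().T

# A Hermitian matrix equals its own conjugate transpose.
A = np.array([[2.0, 1 - 1j],
              [1 + 1j, 3.0]])
is_hermitian = np.allclose(A, A.conj().T)
```

Here `M_H[0, 1]` is the conjugate of `M[1, 0]`, and `is_hermitian` comes out true because the off-diagonal entries of `A` are conjugates of each other and its diagonal is real.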

Properties of the Rayleigh quotient

The Rayleigh quotient $R(A,x)$ has a very important property: its maximum equals the largest eigenvalue of $A$, and its minimum equals the smallest eigenvalue of $A$. That is, for every nonzero $x$,

$$\lambda_{\min} \leq \frac{x^{H}Ax}{x^{H}x} \leq \lambda_{\max}$$

When $x$ is a unit vector, that is, when $x^{H}x=1$, the Rayleigh quotient reduces to $R(A,x)=x^{H}Ax$. This form appears in both spectral clustering and PCA.
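
The eigenvalue bounds can be illustrated with a short numerical experiment. This is only a sketch, assuming NumPy; the helper `rayleigh_quotient` and the 2×2 matrix are hypothetical choices made for the example (its eigenvalues are 1 and 3).

```python
import numpy as np

def rayleigh_quotient(A, x):
    """Return R(A, x) = x^H A x / x^H x for a nonzero vector x."""
    x = np.asarray(x)
    # np.vdot conjugates its first argument, so vdot(x, y) computes x^H y.
    return (np.vdot(x, A @ x) / np.vdot(x, x)).real

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is intended for Hermitian matrices; eigenvalues are returned in ascending order.
eigvals, eigvecs = np.linalg.eigh(A)
lam_min, lam_max = eigvals[0], eigvals[-1]

# Evaluate the quotient at many random nonzero vectors.
rng = np.random.default_rng(0)
samples = [rayleigh_quotient(A, rng.standard_normal(2)) for _ in range(1000)]

# The bounds are attained at the corresponding eigenvectors.
r_at_min = rayleigh_quotient(A, eigvecs[:, 0])
r_at_max = rayleigh_quotient(A, eigvecs[:, -1])
```

Every value in `samples` should lie between `lam_min` and `lam_max`, while `r_at_min` and `r_at_max` hit the two ends of the interval.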

Generalized Rayleigh quotient

The generalized Rayleigh quotient is the function $R(A,B,x)$ defined by

$$R(A,B,x) = \frac{x^{H}Ax}{x^{H}Bx}$$

where $x$ is a nonzero vector, $A$ and $B$ are $n \times n$ Hermitian matrices, and $B$ is positive definite.

Positive definite matrices
The Chinese terms 正定 and 半正定 correspond to the English "positive definite" and "positive semi-definite"; here "definite" is an adjective meaning "clear" or "certain".

[Definition] (narrow sense) Given an $n \times n$ real symmetric matrix $A$, if $x^{T}Ax > 0$ holds for every nonzero vector $x$ of length $n$, then $A$ is a positive definite matrix.
The identity matrix is positive definite.

Positive semidefinite matrices
[Definition 2] (narrow sense) Given an $n \times n$ real symmetric matrix $A$, if $x^{T}Ax \geq 0$ holds for every vector $x$ of length $n$, then $A$ is a positive semidefinite matrix.
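
One common way to test these definitions in practice is to look at the eigenvalues: a real symmetric (or Hermitian) matrix is positive definite exactly when all its eigenvalues are positive, and positive semidefinite when none are negative. The sketch below assumes NumPy; the function names and the tolerance `tol` are illustrative choices, not part of the article.

```python
import numpy as np

def is_positive_definite(A, tol=1e-12):
    """True if every eigenvalue of the Hermitian matrix A exceeds tol."""
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

def is_positive_semidefinite(A, tol=1e-12):
    """True if no eigenvalue of the Hermitian matrix A is below -tol."""
    return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

identity = np.eye(3)                 # eigenvalues 1, 1, 1
S = np.array([[1.0, 1.0],
              [1.0, 1.0]])           # eigenvalues 0 and 2
N = np.array([[1.0, 2.0],
              [2.0, 1.0]])           # eigenvalues -1 and 3
```

With these inputs, `identity` passes both checks, `S` is positive semidefinite but not positive definite, and `N` fails both. The tolerance absorbs rounding error around a zero eigenvalue.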

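What are the maximum and minimum of the generalized Rayleigh quotient? We only need to normalize it to bring it into the form of an ordinary Rayleigh quotient. Let $x=B^{-1/2}x'$, where $x'$ is a newly introduced vector (to be determined). The denominator then becomes: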

$$x^{H}Bx = x'^{H}(B^{-1/2})^{H}BB^{-1/2}x' = x'^{H}B^{-1/2}BB^{-1/2}x' = x'^{H}x'$$

Here $(B^{-1/2})^{H}=(B^{H})^{-1/2}$, and since $B$ is Hermitian, $(B^{-1/2})^{H}=B^{-1/2}$. Moreover, $B^{-1/2}$ commutes with $B$, so $B^{-1/2}BB^{-1/2}=B^{-1/2}B^{-1/2}B=B^{-1}B=I$.

The numerator becomes:

$$x^{H}Ax = x'^{H}B^{-1/2}AB^{-1/2}x'$$

So $R(A,B,x)$ becomes $R(A,B,x')$:

$$R(A,B,x') = \frac{x'^{H}B^{-1/2}AB^{-1/2}x'}{x'^{H}x'}$$
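
The change of variables can be verified numerically. The sketch below assumes NumPy and builds $B^{-1/2}$ from the eigendecomposition of $B$; the helper `inv_sqrt`, the matrices `A` and `B`, and the vector `x_prime` are all illustrative choices.

```python
import numpy as np

def inv_sqrt(B):
    """B^{-1/2} for a Hermitian positive definite B, via B = V diag(w) V^H."""
    w, V = np.linalg.eigh(B)
    # Dividing V by sqrt(w) scales column j by 1/sqrt(w[j]).
    return (V / np.sqrt(w)) @ V.conj().T

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[4.0, 1.0],
              [1.0, 2.0]])              # positive definite: trace 6 > 0, det 7 > 0

B_inv_sqrt = inv_sqrt(B)
C = B_inv_sqrt @ A @ B_inv_sqrt         # the matrix B^{-1/2} A B^{-1/2}

x_prime = np.array([0.3, -1.2])         # arbitrary nonzero x'
x = B_inv_sqrt @ x_prime                # x = B^{-1/2} x'

# Generalized quotient at x versus ordinary Rayleigh quotient of C at x'.
generalized = (x @ A @ x) / (x @ B @ x)
ordinary = (x_prime @ C @ x_prime) / (x_prime @ x_prime)
```

Up to rounding, `generalized` and `ordinary` agree, and `B_inv_sqrt @ B @ B_inv_sqrt` is the identity matrix, which is exactly what the denominator computation relies on.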

By the property of the Rayleigh quotient above, we immediately see that the maximum of $R(A,B,x')$ is the largest eigenvalue of the matrix $B^{-1/2}AB^{-1/2}$ (this matrix is Hermitian, so the property applies).
Similar matrices have the same eigenvalues, and $B^{-1/2}AB^{-1/2}$ is similar to $B^{-1}A$:

$$B^{-1/2}\left(B^{-1/2}AB^{-1/2}\right)B^{1/2} = B^{-1}A$$

so the eigenvalues of $B^{-1/2}AB^{-1/2}$ are the same as the eigenvalues of $B^{-1}A$.

Therefore the maximum of $R(A,B,x')$ is the largest eigenvalue of $B^{-1}A$, and its minimum is the smallest eigenvalue of $B^{-1}A$.
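
As a rough numerical check of this conclusion, the sketch below compares sampled values of the generalized Rayleigh quotient with the extreme eigenvalues of $B^{-1}A$. It assumes NumPy; the helper `generalized_rayleigh` and the example matrices are hypothetical.

```python
import numpy as np

def generalized_rayleigh(A, B, x):
    """Return R(A, B, x) = x^H A x / x^H B x for a nonzero vector x."""
    return (np.vdot(x, A @ x) / np.vdot(x, B @ x)).real

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[4.0, 1.0],
              [1.0, 2.0]])   # positive definite (illustrative choice)

# Eigenvalues of B^{-1} A, obtained by solving B Z = A instead of inverting B.
# B^{-1} A is not symmetric in general, so the general eigenvalue routine is used;
# the eigenvalues are real in theory, and .real discards any rounding noise.
lams = np.sort(np.linalg.eigvals(np.linalg.solve(B, A)).real)
lam_min, lam_max = lams[0], lams[-1]

# Sample the quotient at random nonzero vectors.
rng = np.random.default_rng(1)
values = np.array([generalized_rayleigh(A, B, rng.standard_normal(2))
                   for _ in range(2000)])

# Every sampled value should fall inside [lam_min, lam_max].
inside = bool(np.all((values >= lam_min - 1e-9) & (values <= lam_max + 1e-9)))
```

If SciPy is available, `scipy.linalg.eigh(A, B)` solves the same generalized eigenvalue problem directly and is usually the more robust choice, since it exploits the symmetry of `A` and `B`.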
