Schur complementary portfolio - difference between theoretical formulation and code #36

Open
Marzio-USI opened this issue May 7, 2024 · 3 comments

@Marzio-USI

When inspecting the Schur complementary portfolio, I encountered some issues.

In your presentation, the following formulation can be found:

[screenshot of the relevant slide from the presentation]

with
$$A^{c}(\gamma)= A - \gamma B D^{-1}C$$
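For reference, a minimal NumPy sketch of this formula (hypothetical helper name, not the repo's implementation):

```python
import numpy as np

def generalized_schur_complement(A, B, C, D, gamma):
    # A^c(gamma) = A - gamma * B @ D^{-1} @ C; solve() avoids forming D^{-1} explicitly
    return A - gamma * B @ np.linalg.solve(D, C)
```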
By following the third step of the algorithm ("Augment and allocate"), starting from

precise/precise/skaters/portfoliostatic/schurportfactory.py line 122

  # 3. Augment and allocate
  Ag, Dg, info = schur_augmentation(A=A, B=B, C=C, D=D, gamma=gamma) <--------HERE

precise/precise/skaters/portfoliostatic/schurportutil.py , line 13

def schur_augmentation(A, B, C, D, gamma):
    """
       Mess with A, D to try to incorporate some off-diag info
    """
    if gamma>0.0:
        max_gamma = _maximal_gamma(A=A, B=B, C=C, D=D)
        augA, bA = pseudo_schur_complement(A=A, B=B, C=C, D=D, gamma=gamma * max_gamma) # <-------------HERE
        augD, bD = pseudo_schur_complement(A=D, B=C, C=B, D=A, gamma=gamma * max_gamma) # <---------HERE TOO

        augmentation_fail = False
        if not is_positive_def(augA):
            try:
                Ag = nearest_pos_def(augA)
            except np.linalg.LinAlgError:
                augmentation_fail=True
        else:
            Ag = augA
        if not is_positive_def(augD):
            try:
                Dg = nearest_pos_def(augD)
            except np.linalg.LinAlgError:
                augmentation_fail=True
        else:
            Dg = augD

        if augmentation_fail:
            print('Warning: augmentation failed')
            reductionA = 1.0
            reductionD = 1.0
            reductionRatioA = 1.0
            Ag = A
            Dg = D
        else:
            reductionD = np.linalg.norm(Dg)/np.linalg.norm(D)
            reductionA = np.linalg.norm(Ag)/np.linalg.norm(A)
            reductionRatioA = reductionA/reductionD
    else:
        reductionRatioA = 1.0
        reductionA = 1.0
        reductionD = 1.0
        Ag = A
        Dg = D

    info = {'reductionA': reductionA,
                'reductionD': reductionD,
                'reductionRatioA': reductionRatioA}
    return Ag, Dg, info

We arrive at the pseudo_schur_complement function, where $A^{c}(\gamma)= A - \gamma B D^{-1}C$ is computed:

precise/precise/skaters/portfoliostatic/schurportutil.py , line 57

def pseudo_schur_complement(A, B, C, D, gamma, lmbda=None, warn=False):
    """
       Augmented cov matrix for "A" inspired by the Schur complement
    """
    if lmbda is None:
        lmbda=gamma
    try:
        Ac_raw = schur_complement(A=A, B=B, C=C, D=D, gamma=gamma)  
        nA = np.shape(A)[0]
        nD = np.shape(D)[0]
        Ac = to_symmetric(Ac_raw)
        M = symmetric_step_up_matrix(n1=nA, n2=nD)
        Mt = np.transpose(M)
        BDinv = multiply_by_inverse(B, D, throw=False)
        BDinvMt = np.dot(BDinv, Mt)
        Ra = np.eye(nA) - lmbda * BDinvMt
        Ag = inverse_multiply(Ra, Ac, throw=False, warn=False)
    except np.linalg.LinAlgError:
        if warn:
            print('Pseudo-schur failed, falling back to A')
        Ag = A
    n = np.shape(A)[0]
    b = np.ones(shape=(n,1))
    return Ag, b

However, after that, the following operations are performed:

$$ \begin{aligned} Ag &= Ra^{-1} \cdot Ac\\ &= (I - \lambda B D^{-1}M^T)^{-1} \cdot Ac \\ &= (I - \lambda B D^{-1}M^T)^{-1} \cdot \text{tosym}(Ac_{\text{raw}}) \\ &= (I - \lambda B D^{-1}M^T)^{-1} \cdot \text{tosym}(A^{c}(\gamma)) \\ &= (I - \lambda B D^{-1}M^T)^{-1} \cdot \text{tosym}(A - \gamma B D^{-1}C) \\ \end{aligned} $$

So if we assume that $Ac$ is already symmetric (to simplify the process):

$$ \begin{aligned} Ag &= (I - \lambda B D^{-1}M^T)^{-1} \cdot (A - \gamma B D^{-1}C) \\ \end{aligned} $$
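To make the traced chain concrete, here is a self-contained numeric re-trace. A simple averaging matrix stands in for `symmetric_step_up_matrix`, and the `gamma * max_gamma` scaling is ignored (both are assumptions, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, gamma = 3, 4, 0.5
X = rng.normal(size=(60, p + q))
Sigma = np.cov(X, rowvar=False)
A, B = Sigma[:p, :p], Sigma[:p, p:]
C, D = Sigma[p:, :p], Sigma[p:, p:]

Ac = A - gamma * B @ np.linalg.solve(D, C)           # A^c(gamma); symmetric up to round-off
M = np.ones((p, q)) / p                              # stand-in step-up matrix: M.T @ 1_p = 1_q
Ra = np.eye(p) - gamma * B @ np.linalg.inv(D) @ M.T  # lmbda defaults to gamma in the code
Ag = np.linalg.solve(Ra, Ac)                         # Ag = Ra^{-1} @ Ac
```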

This $Ag$ does not match what I expect from the presentation:

  1. "Before performing inter-group allocation we make a different modification. We multiply the precision of $A^c$ by $b_Ab_A^T$ element-wise (and similarly, multiply the precision of $D^c$ by $b_Db_D^T$)".

Meaning:

$$ A' = (A - \gamma B D^{-1}C)^{*b_A} $$

Specifically, I don't understand where $M$, the symmetric step-up matrix, comes from, and why $b_A$ is never computed.

And again, when we perform the sub-allocation:
precise/precise/skaters/portfoliostatic/schurportfactory.py lines 132 and 138

# Sub-allocate
wA = hierarchical_schur_complementary_portfolio(cov=Ag, port=port, port_kwargs=port_kwargs,
                                               alloc=alloc, alloc_kwargs=alloc_kwargs,
                                               splitter=splitter, splitter_kwargs=splitter_kwargs,
                                               seriator=seriator, seriator_kwargs=seriator_kwargs,
                                               seriation_depth = seriation_depth-1,
                                               delta=delta, gamma=gamma)
wD = hierarchical_schur_complementary_portfolio(cov=Dg, port=port, port_kwargs=port_kwargs,
                                                alloc=alloc, alloc_kwargs=alloc_kwargs,
                                                splitter=splitter, splitter_kwargs=splitter_kwargs,
                                                seriator=seriator, seriator_kwargs=seriator_kwargs,
                                                seriation_depth=seriation_depth - 1,
                                                delta=delta, gamma=gamma)

the same augmented matrix $Ag$ used to allocate in:

precise/precise/skaters/portfoliostatic/schurportfactory.py line 122

# 3. Augment and allocate
Ag, Dg, info = schur_augmentation(A=A, B=B, C=C, D=D, gamma=gamma)
aA, aD = alloc(covs=[Ag, Dg]) #<----------HERE

is also passed to the next "iteration", while on the slides I found:

  1. The intra-group allocation pertaining to block $A$ is determined by covariance matrix $A_{/ b_A(\lambda)}^c$. In this notation the vector $b_A(\lambda)=\overrightarrow{1}-\lambda B D^{-1} \overrightarrow{1}$. The generalized Schur complement is $A^c(\gamma)=A-\gamma B D^{-1} C$. The notation $A_{/ b}^c$ denotes $A^c /\left(b b^T\right)$ with division performed element-wise.

$$ A'' = A^c(\gamma)/b_Ab_A^T $$
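Read literally, my understanding of the slides' prescription would be something like the following sketch (hypothetical helper name; `np.outer(b, b)` builds $b_A b_A^T$ for the element-wise division):

```python
import numpy as np

def slides_complement(A, B, C, D, gamma, lmbda):
    # A^c(gamma) = A - gamma * B D^{-1} C, then element-wise division by b_A b_A^T
    Ac = A - gamma * B @ np.linalg.solve(D, C)
    b = np.ones(A.shape[0]) - lmbda * B @ np.linalg.solve(D, np.ones(D.shape[0]))  # b_A(lmbda)
    return Ac / np.outer(b, b)
```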

@microprediction (Owner) commented May 9, 2024 via email

@Marzio-USI (Author) commented May 18, 2024

My mistake; in case someone is having the same doubts, this might be helpful:

$$ \Sigma = \begin{bmatrix} A & B\\ C & D \end{bmatrix} $$

Where $\Sigma \in \mathbb{R}^{n \times n}$ and:

  1. $A \in \mathbb{R}^{p \times p}$
  2. $B \in \mathbb{R}^{p \times q}$
  3. $C \in \mathbb{R}^{q \times p}$
  4. $D \in \mathbb{R}^{q \times q}$
  5. $p + q = n$

The minimum variance portfolio:

$$ w \propto \Sigma^{-1} \vec{1} $$

We can express any term of the form $\Sigma^{-1}\vec{1}$ in terms of the minimum variance portfolio $w(\Sigma)$ and its portfolio variance $\nu(\Sigma)$, viz:

$$ \Sigma^{-1}\vec{1} = \vec{1}^T \Sigma^{-1} \vec{1} \cdot w(\Sigma) = \dfrac{1}{\nu(\Sigma)}w(\Sigma) $$
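A quick numeric check of this identity (a throwaway sketch, not library code):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
Sigma = np.cov(X, rowvar=False)
ones = np.ones(5)

z = np.linalg.solve(Sigma, ones)   # Sigma^{-1} 1
w = z / z.sum()                    # minimum variance weights w(Sigma)
nu = w @ Sigma @ w                 # portfolio variance nu(Sigma)
assert np.allclose(z, w / nu)      # Sigma^{-1} 1 = (1 / nu(Sigma)) * w(Sigma)
```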

In particular, if $B = 0$ (and hence $C = B^T = 0$), the global minimum variance allocation is proportional to:

$$ w \propto \Sigma^{-1}\vec{1} = \begin{bmatrix} A & 0\\ 0 & D\\ \end{bmatrix}^{-1} \vec{1} = \begin{bmatrix} A^{-1} \vec{1}\\ D^{-1} \vec{1} \end{bmatrix} = \begin{bmatrix} \dfrac{1}{\nu(A)}w(A)\\ \dfrac{1}{\nu(D)}w(D)\\ \end{bmatrix} $$
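Numerically, with the off-diagonal blocks zeroed out, the global solve indeed splits into the two per-block solves (again a throwaway sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 3, 4
A = np.cov(rng.normal(size=(80, p)), rowvar=False)
D = np.cov(rng.normal(size=(80, q)), rowvar=False)
Sigma = np.zeros((p + q, p + q))
Sigma[:p, :p], Sigma[p:, p:] = A, D   # B = C = 0

lhs = np.linalg.solve(Sigma, np.ones(p + q))
rhs = np.concatenate([np.linalg.solve(A, np.ones(p)),
                      np.linalg.solve(D, np.ones(q))])
assert np.allclose(lhs, rhs)
```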

Schur Allocation

With Schur allocation, instead, we can use the block inversion identity:

$$ \Sigma^{-1} = \begin{bmatrix} A & B\\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} (A - BD^{-1}C)^{-1} & 0 \\ 0 & (D - CA^{-1}B)^{-1} \end{bmatrix} \begin{bmatrix} I_p & -BD^{-1} \\ -CA^{-1} & I_q \end{bmatrix} $$

Where $I_p$, $I_q$ are identity matrices $\in \mathbb{R}^{p \times p}$ and $\in \mathbb{R}^{q \times q}$ respectively.

Thus, the global minimum variance allocation is proportional to:

$$ \begin{aligned} w \propto \Sigma^{-1}\vec{1} &= \begin{bmatrix} A & B\\ C & D \end{bmatrix}^{-1} \vec{1}\\ &= \begin{bmatrix} (A - BD^{-1}C)^{-1} & 0 \\ 0 & (D - CA^{-1}B)^{-1} \end{bmatrix} \begin{bmatrix} I_p & -BD^{-1} \\ -CA^{-1} & I_q \end{bmatrix} \cdot \vec{1} \\ \end{aligned} $$

With $A^c = (A- BD^{-1}C)$ and $D^c = (D - CA^{-1}B)$ then:

$$ \begin{aligned} w \propto \Sigma^{-1}\vec{1} &= \begin{bmatrix} (A^c)^{-1} & 0 \\ 0 & (D^c)^{-1} \end{bmatrix} \begin{bmatrix} I_p & -BD^{-1} \\ -CA^{-1} & I_q \end{bmatrix} \cdot \vec{1} \\ &\\ &= \begin{bmatrix} (A^c)^{-1} & -(A^c)^{-1}BD^{-1} \\ -(D^{c})^{-1}CA^{-1} & (D^c)^{-1} \end{bmatrix} \cdot \vec{1} \end{aligned} $$

Which can be transformed as follows:

$$ \begin{aligned} \begin{bmatrix} (A^c)^{-1} & -(A^c)^{-1}BD^{-1} \\ -(D^{c})^{-1}CA^{-1} & (D^c)^{-1} \end{bmatrix} \cdot \vec{1} &= \begin{bmatrix} (A^c)^{-1} & -(A^c)^{-1}BD^{-1} \\ -(D^{c})^{-1}CA^{-1} & (D^c)^{-1} \end{bmatrix} \begin{bmatrix} \vec{1}_p\\ \vec{1}_q\\ \end{bmatrix} \\ &\\ &= \begin{bmatrix} (A^c)^{-1} \vec{1}_p - \big((A^c)^{-1}BD^{-1}\big) \vec{1}_q\\ (D^c)^{-1} \vec{1}_q - \big((D^c)^{-1}CA^{-1}\big) \vec{1}_p\\ \end{bmatrix} \end{aligned} $$

We can then define step-up matrices $M^{(1)}$ and $M^{(2)}$ such that:

$$ M^{(1)} \vec{1}_p = \vec{1}_q, \qquad M^{(2)} \vec{1}_q = \vec{1}_p $$

To simplify it to:

$$ \begin{aligned} \begin{bmatrix} (A^c)^{-1} & -(A^c)^{-1}BD^{-1} \\ -(D^{c})^{-1}CA^{-1} & (D^c)^{-1} \end{bmatrix} \cdot \vec{1} &= \begin{bmatrix} (A^c)^{-1} \vec{1}_p - \big((A^c)^{-1}BD^{-1}\big) \vec{1}_q\\ (D^c)^{-1} \vec{1}_q - \big((D^c)^{-1}CA^{-1}\big) \vec{1}_p\\ \end{bmatrix} \\ &\\ &= \begin{bmatrix} (A^c)^{-1} \vec{1}_p - \big((A^c)^{-1}BD^{-1}\big) M^{(1)}\vec{1}_p \\ (D^c)^{-1} \vec{1}_q - \big((D^c)^{-1}CA^{-1}\big) M^{(2)}\vec{1}_q\\ \end{bmatrix} \\ \end{aligned} $$

so that we can group the terms together:

$$ \begin{aligned} \begin{bmatrix} (A^c)^{-1} & -(A^c)^{-1}BD^{-1} \\ -(D^{c})^{-1}CA^{-1} & (D^c)^{-1} \end{bmatrix} \cdot \vec{1} &= \begin{bmatrix} (A^c)^{-1} \vec{1}_p - \big((A^c)^{-1}BD^{-1}\big) M^{(1)}\vec{1}_p \\ (D^c)^{-1} \vec{1}_q - \big((D^c)^{-1}CA^{-1}\big) M^{(2)}\vec{1}_q\\ \end{bmatrix} \\ &\\ &= \begin{bmatrix} \bigg((A^c)^{-1} - (A^c)^{-1}BD^{-1}M^{(1)}\bigg)\vec{1}_p \\ \bigg((D^c)^{-1} - (D^c)^{-1}CA^{-1}M^{(2)}\bigg)\vec{1}_q\\ \end{bmatrix} \\ &\\ &= \begin{bmatrix} \bigg((A^c)^{-1}\Big( I_p - BD^{-1}M^{(1)} \Big)\bigg)\vec{1}_p \\ \bigg((D^c)^{-1}\Big( I_q - CA^{-1}M^{(2)} \Big)\bigg)\vec{1}_q\\ \end{bmatrix} \end{aligned} $$

Thus, using the matrix inverse property:

$$ (A B)^{-1} = B^{-1}A^{-1} $$

we obtain:

$$ \begin{aligned} w \propto \Sigma^{-1}\vec{1} &= \begin{bmatrix} A & B\\ C & D \end{bmatrix}^{-1} \vec{1}\\ &\\ &= \begin{bmatrix} \bigg((A^c)^{-1}\Big( I_p - BD^{-1}M^{(1)} \Big)\bigg)\mathbf{\vec{1}_p} \\ \bigg((D^c)^{-1}\Big( I_q - CA^{-1}M^{(2)} \Big)\bigg)\mathbf{\vec{1}_q}\\ \end{bmatrix} \\ &\\ &= \begin{bmatrix} \dfrac{1}{\nu(Ag)}w(Ag)\\ \dfrac{1}{\nu(Dg)}w(Dg)\\ \end{bmatrix} \end{aligned} $$

where

$$ Ag = \bigg( (A^c)^{-1}\Big( I_p - BD^{-1}M^{(1)} \Big) \bigg)^{-1} = \Big( I_p - BD^{-1}M^{(1)} \Big)^{-1} \Big((A^c)^{-1}\Big)^{-1} = \Big( I_p - BD^{-1}M^{(1)} \Big)^{-1} A^c $$

and

$$ Dg = \bigg( (D^c)^{-1}\Big( I_q - CA^{-1}M^{(2)} \Big) \bigg)^{-1} = \Big( I_q - CA^{-1}M^{(2)} \Big)^{-1} \Big((D^c)^{-1}\Big)^{-1} = \Big( I_q - CA^{-1}M^{(2)} \Big)^{-1} D^c $$

In conclusion:

$$ \begin{aligned} w \propto \Sigma^{-1}\vec{1} &= \begin{bmatrix} A & B\\ C & D \end{bmatrix}^{-1} \vec{1}\\ &\\ &= \begin{bmatrix} \dfrac{1}{\nu(Ag)}w(Ag)\\ \dfrac{1}{\nu(Dg)}w(Dg)\\ \end{bmatrix} \\ &\\ &= \begin{bmatrix} \dfrac{1}{\nu\Bigg(\Big( I_p - BD^{-1}M^{(1)} \Big)^{-1} A^c\Bigg)}w\Bigg(\Big( I_p - BD^{-1}M^{(1)} \Big)^{-1} A^c\Bigg)\\ \dfrac{1}{\nu\Bigg(\Big( I_q - CA^{-1}M^{(2)} \Big)^{-1} D^c\Bigg)}w\Bigg(\Big( I_q - CA^{-1}M^{(2)} \Big)^{-1} D^c\Bigg)\\ \end{bmatrix} \end{aligned} $$
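As a sanity check, with $\gamma = \lambda = 1$ and the averaging choice of $M^{(1)}$, $M^{(2)}$ given just below, the two stacked sub-solves reproduce $\Sigma^{-1}\vec{1}$ exactly (a sketch under those assumptions, not library code):

```python
import numpy as np

rng = np.random.default_rng(3)
p, q = 3, 4
X = rng.normal(size=(200, p + q))
Sigma = np.cov(X, rowvar=False)
A, B = Sigma[:p, :p], Sigma[:p, p:]
C, D = Sigma[p:, :p], Sigma[p:, p:]

M1 = np.ones((q, p)) / p            # M1 @ 1_p = 1_q
M2 = np.ones((p, q)) / q            # M2 @ 1_q = 1_p
Ac = A - B @ np.linalg.solve(D, C)  # Schur complements, gamma = 1
Dc = D - C @ np.linalg.solve(A, B)
Ag = np.linalg.solve(np.eye(p) - B @ np.linalg.inv(D) @ M1, Ac)
Dg = np.linalg.solve(np.eye(q) - C @ np.linalg.inv(A) @ M2, Dc)

lhs = np.linalg.solve(Sigma, np.ones(p + q))            # Sigma^{-1} 1
rhs = np.concatenate([np.linalg.solve(Ag, np.ones(p)),  # stacked Ag^{-1} 1_p, Dg^{-1} 1_q
                      np.linalg.solve(Dg, np.ones(q))])
assert np.allclose(lhs, rhs)
```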

One question that still remains is how to choose $M$.

From my perspective, $M$ can be defined as follows:

$$ M^{(1)} = \begin{cases} I_p & \text{if } p = q \\ \dfrac{1}{p} \vec{1}_q\vec{1}_p^T & \text{otherwise} \end{cases} $$

$$ M^{(2)} = \begin{cases} I_q & \text{if } p = q \\ \dfrac{1}{q} \vec{1}_p\vec{1}_q^T & \text{otherwise} \end{cases} $$
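In code, the defining property $M\vec{1} = \vec{1}$ is easy to verify (a hypothetical helper matching the cases above):

```python
import numpy as np

def step_up(n_from, n_to):
    # maps the ones vector in R^{n_from} onto the ones vector in R^{n_to}
    if n_from == n_to:
        return np.eye(n_from)
    return np.ones((n_to, n_from)) / n_from

assert np.allclose(step_up(3, 5) @ np.ones(3), np.ones(5))   # e.g. M^(1) with p=3, q=5
```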

Let me know if something is wrong or missing, and whether there is a particular motivation for defining $M$ as the 'step-up matrix'.

@microprediction (Owner) commented:

That's exactly right; the choice of step-up matrix is a bit arbitrary here.
