$||Ax||^2 \ge ||Bx||^2 \ge ||Ax||^2 - \epsilon ||A||_f^2$

where $||A||_f$ is the Frobenius norm of $A$. Here you can see the parallels with the frequency approximation algorithm; the error is a function of how "big" the stream is, which in this case is the Frobenius norm of the input matrix.

The algorithm works as follows: start with $B$ as a $l \times m$ matrix of all zeroes, and for each input row $A_i$ do the following update:

- Set $B_l = A_i$ (the last row of $B$).
- Compute the singular value decomposition (SVD) of $B$, so we obtain $B = U \Sigma V$ with the standard assumption that the diagonal values of $\Sigma$ are $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_l$.
- "Remove" the smallest singular value from $\Sigma$ by letting

$\bar{\Sigma} = \sqrt{\max(\Sigma - I_l \sigma_l^2, 0)}$

where $I_l$ is the $l \times l$ identity matrix. - Then set $B = \bar{\Sigma}V$ (note that the last row of $B$ is all zeroes after this step because the last row of $\bar{\Sigma}$ is all zeroes by construction).

At the end, just output the current value of $B$. I won't go into any of the proof details (they can be found in the paper), but it's interesting to try to understand what the algorithm is doing intuitively. The SVD can be thought of (very loosely) as breaking down a matrix into three transformations applied to a multidimensional space: a rotation ($V$), followed by a scaling along the axes ($\Sigma$), and lastly another rotation ($U$). So the singular values are the scaling factors of the matrix in orthogonal directions, and we are removing the smallest one from each of these directions equally. As a result, we only lose a fraction of the accuracy in any particular direction (i.e. $Bx$ versus $Ax$ for a specific $x$) compared to the Frobenius norm of $A$ as a whole.

This would be pretty cool even if it was a purely theoretical result since it ties two problems together in a very elegant way, but it gets better. The paper goes on to explore the algorithm's performance with some experiments and observes that the accuracy is quite an improvement over existing techniques for matrix sketching. Moreover, computing the SVD is somewhat expensive, so the author describes how the algorithm can be parallelized as well as a way to reduce the computational cost of the SVD step and only slightly relaxing the guarantees. It's a very nice paper that spans both the theoretical and practical domains for the matrix sketching problem.

This would be pretty cool even if it was a purely theoretical result since it ties two problems together in a very elegant way, but it gets better. The paper goes on to explore the algorithm's performance with some experiments and observes that the accuracy is quite an improvement over existing techniques for matrix sketching. Moreover, computing the SVD is somewhat expensive, so the author describes how the algorithm can be parallelized as well as a way to reduce the computational cost of the SVD step and only slightly relaxing the guarantees. It's a very nice paper that spans both the theoretical and practical domains for the matrix sketching problem.

## No comments:

## Post a Comment