Abstract: The goal of this thesis is to contribute toward a computational complexity theory of statistical inference problems. In recent years, researchers have built evidence in favor of an emerging hypothesis that the class of semi-definite programming (SDP) algorithms is optimal among computationally efficient algorithms for a certain family of estimation problems. In this thesis, we present four main research efforts that refine this hypothesis and initiate preliminary efforts to go beyond it.

Optimal algorithms for private and robust estimation: We give the first polynomial-time algorithms for privately and robustly estimating a Gaussian distribution with optimal dependence on the dimension in the sample complexity. This adds the fundamental problem of private statistical estimation to a growing list of problems for which SDPs are optimal among polynomial-time algorithms.

Limitations of SDPs: Given $n$ independent standard Gaussian points in dimension $d$, for what values of $(n, d)$ does there exist, with high probability, an origin-symmetric ellipsoid that simultaneously passes through all of the points (equivalently, a positive semidefinite matrix $A$ with $x_i^\top A x_i = 1$ for every point $x_i$)? Based on strong numerical evidence, it was conjectured that this ellipsoid fitting problem transitions from feasible to infeasible as the number of points $n$ increases, with a sharp threshold at $n \sim d^2/4$; we resolve this conjecture up to logarithmic factors. A corollary of this result is that a canonical SDP-based algorithm fails to solve inference problems involving low-rank matrix decompositions, independent component analysis, and principal component analysis.

New algorithms for discrepancy certification: We initiate the study of the algorithmic problem of certifying lower bounds on the discrepancy of random matrices, which has connections to conjecturally hard average-case problems such as negatively-spiked PCA, the number-balancing problem, and refuting random constraint satisfaction problems.
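For intuition about the quantity being certified, the discrepancy of a matrix $A \in \mathbb{R}^{m \times n}$ is $\min_{x \in \{\pm 1\}^n} \|Ax\|_\infty$. The following brute-force sketch (exponential in $n$, for illustration only; the example matrix is our own arbitrary choice, not taken from the thesis) computes it directly from the definition:

```python
# Brute-force matrix discrepancy: minimize the largest |entry| of A @ x
# over all sign vectors x in {-1, +1}^n. Exponential time -- intuition only.
from itertools import product

def discrepancy(A):
    n = len(A[0])
    best = float("inf")
    for signs in product((-1, 1), repeat=n):
        # ||A x||_inf for this choice of signs
        val = max(abs(sum(a * x for a, x in zip(row, signs))) for row in A)
        best = min(best, val)
    return best

A = [[1, 2, 3],
     [2, -1, 1]]
print(discrepancy(A))  # → 0, attained by x = (1, 1, -1)
```

Certifying a lower bound means producing an efficiently checkable proof that no sign vector achieves a small value, which is far harder than exhibiting one good vector.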
We give the first polynomial-time algorithms with non-trivial guarantees, strictly outperforming a canonical SDP-based algorithm. Our algorithms are among the first to harness the power of lattice basis reduction techniques to solve statistical estimation problems.

Fast spectral algorithms: We study the algorithmic problem of estimating the mean of a heavy-tailed random vector in high dimensions given i.i.d.\ samples. The goal is to design an efficient estimator that attains the optimal sub-gaussian error bound, assuming only that the random vector has bounded mean and covariance. Polynomial-time solutions to this problem were known, but they have high runtimes due to the use of SDPs. We give a fast spectral algorithm for this problem that also has optimal statistical performance. Our work establishes yet another fundamental statistical estimation problem for which the power of SDPs is matched by simpler, more practical algorithms.
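To make the mean estimation problem concrete, a classical one-dimensional baseline is the median-of-means estimator, which achieves sub-gaussian-style concentration assuming only finite variance. The sketch below shows it on heavy-tailed data; this is a standard textbook baseline, not the thesis's high-dimensional spectral algorithm, and the bucket count and data distribution are our own illustrative choices:

```python
# Median-of-means: split the samples into k buckets, average each bucket,
# and return the median of the bucket means. The median step makes the
# estimator robust to the heavy tail that would wreck the plain average.
import random
import statistics

def median_of_means(samples, k):
    m = len(samples) // k  # samples per bucket (any remainder is dropped)
    bucket_means = [sum(samples[i * m:(i + 1) * m]) / m for i in range(k)]
    return statistics.median(bucket_means)

random.seed(0)
# Heavy-tailed data: Pareto(2.5) has finite variance but a polynomial tail;
# its true mean is 2.5 / 1.5 ≈ 1.667.
data = [random.paretovariate(2.5) for _ in range(10_000)]
est = median_of_means(data, k=31)
```

The high-dimensional version of this problem is much subtler: applying median-of-means coordinatewise loses a dimension-dependent factor, which is what the SDP-based and spectral algorithms discussed above are designed to avoid.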