Learning Ridge Functions With Randomized Sampling In High Dimensions
We study the problem of learning ridge functions of the form f(x) = g(a^T x), x ∈ ℝ^d, from random samples. Assuming g to be a twice continuously differentiable function, we leverage techniques from the low-rank matrix recovery literature to derive a uniform approximation guarantee for the estimation of the ridge function f. Our new analysis removes the de facto compressibility assumption on the parameter a that is made for learning in the existing literature. Interestingly, the price to pay in high-dimensional settings is not major. For example, when g is thrice continuously differentiable in an open neighbourhood of the origin, the sampling complexity changes from O(log d) to O(d), or from O(d^((2+q)/(2-q))) to O(d^4), depending on the behaviour of g' and g'' at the origin, with 0 < q < 1 characterizing the sparsity of a.
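To make the ridge structure concrete, the following minimal sketch builds f(x) = g(a^T x) and recovers the direction a from finite differences at the origin, using the identity ∇f(0) = g'(0) a. It is only an illustration of why the behaviour of g' at the origin matters, not the randomized low-rank recovery scheme studied here: it uses 2d deterministic point queries, and the names make_ridge and estimate_direction, the choice g = tanh, the dimension d = 20, and the step size eps are all assumptions made for the example.

```python
import math
import random

def make_ridge(a, g):
    """Return f(x) = g(<a, x>) for a fixed direction a and profile g."""
    def f(x):
        return g(sum(ai * xi for ai, xi in zip(a, x)))
    return f

def estimate_direction(f, d, eps=1e-4):
    """Estimate a (up to sign) from central differences of f at the origin.

    Relies on grad f(0) = g'(0) * a, so it only works when g'(0) != 0.
    """
    grad = []
    for i in range(d):
        e = [0.0] * d
        e[i] = eps                      # eps times the i-th basis vector
        minus = [-v for v in e]
        grad.append((f(e) - f(minus)) / (2 * eps))
    norm = math.sqrt(sum(v * v for v in grad))
    return [v / norm for v in grad]     # normalise to unit length

random.seed(0)
d = 20                                  # hypothetical ambient dimension
a = [random.gauss(0, 1) for _ in range(d)]
a_norm = math.sqrt(sum(v * v for v in a))
a = [v / a_norm for v in a]             # unit-norm direction, not assumed sparse

f = make_ridge(a, math.tanh)            # assumed profile g = tanh, so g'(0) = 1
a_hat = estimate_direction(f, d)

# |<a, a_hat>| close to 1 means the direction was recovered up to sign.
alignment = abs(sum(u * v for u, v in zip(a, a_hat)))
print(alignment)
```

If g'(0) = 0, the estimated gradient in this sketch vanishes and the normalisation fails, which is the situation where second-order information about g at the origin has to be used instead.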