The ever-growing number of edge devices (e.g., smartphones) and the exploding volume of sensitive data they produce, call for distributed machine learning techniques that are privacy-preserving. Given the increasing computing capabilities of modern edge devices, these techniques can be realized by pushing the sensitive-data-dependent tasks of machine learning to the edge devices and thus avoid disclosing sensitive data. In spite of the privacy benefits, existing techniques following this new computing paradigm are limited in addressing three important challenges. First, for many applications, such as news recommenders, data needs to be processed fast, before it becomes obsolete. Second, given the large amount of uncontrolled edge devices, some of them may undergo arbitrary (Byzantine) failures and deviate from the distributed learning protocol with potentially negative consequences such as learning divergence or even biased predictions. Third, privacy-preserving learning protocols call for formal privacy guarantees that imply additional tuning cost for the learning protocols. To address the fast data challenge, we introduce FLeet, the first system for online learning at the edge. FLeet employs two core components, namely I-Prof and AdaSGD. I-Prof is a lightweight regression-based profiler that adjusts the size of the sensitive-data-dependent tasks on highly heterogeneous mobile devices. AdaSGD is a staleness-aware learning algorithm that is robust to asynchronous updates. To make learning secure against Byzantine failures, we present AggregaThor and Kardam. AggregaThor is a scalable framework that facilitates synchronous learning. AggregaThor tolerates Byzantine failures mainly based on a filtering component that compares the updates sent from every device at each synchronous round. Given the tolerance to failures, we boost the network layer of AggregaThor by introducing a communication protocol based on unreliable links. Kardam operates in an asynchronous setup and employs two components: (a) a filtering component that, based on statistical properties of the learning procedure, tolerates Byzantine failures and (b) a dampening component that adjusts to stale information and enables fast convergence. To address the privacy guarantees challenge, we present DP-SCD, a differentially private version of an algorithm with relatively low tuning cost, namely stochastic coordinate descent. DP-SCD is based on the insight that under independent noise addition (necessary for the privacy guarantees), the consistency of the auxiliary information that stochastic coordinate descent employs, holds in expectation. We give convergence guarantees for DP-SCD and demonstrate its superiority in terms of tuning without any significant impact on the privacy-utility trade-off.