The present invention concerns computer-implemented methods for training a machine learning model using Stochastic Gradient Descent (SGD). In one embodiment, the method is performed by a first computer in a distributed computing environment and comprises performing a learning round. The learning round comprises broadcasting a parameter vector to a plurality of worker computers in the distributed computing environment and, upon receipt of one or more respective estimate vectors from a subset of the worker computers, determining an updated parameter vector for use in a next learning round based on the one or more received estimate vectors. The determining comprises ignoring an estimate vector received from a given worker computer when a sending frequency of the given worker computer is above a threshold value. The method aggregates the received gradient estimates in an asynchronous communication model with unbounded communication delays.
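The learning round described above can be illustrated with a minimal Python sketch of the first computer. The abstract does not specify how the sending frequency is measured or how accepted estimates are combined, so the following are assumptions made only for illustration: the class `ParameterServer` and its methods are hypothetical, the sending frequency is taken as a worker's share of the last `window` received estimate vectors, each estimate vector is treated as a gradient estimate, and accepted estimates are averaged and applied as a plain SGD step.

```python
from collections import deque

import numpy as np


class ParameterServer:
    """Hypothetical first computer coordinating asynchronous SGD.

    Assumptions (not from the source): sending frequency is the share of the
    last `window` received estimate vectors that came from a worker, and the
    accepted estimates are averaged and applied as a plain SGD step.
    """

    def __init__(self, params, learning_rate=0.1, threshold=0.5, window=10):
        self.params = np.asarray(params, dtype=float)
        self.learning_rate = learning_rate
        self.threshold = threshold
        # Sliding window of the identifiers of recent senders.
        self.recent_senders = deque(maxlen=window)

    def broadcast(self):
        # Each worker receives its own copy of the current parameter vector.
        return self.params.copy()

    def sending_frequency(self, worker_id):
        # Fraction of the window occupied by messages from this worker.
        return self.recent_senders.count(worker_id) / self.recent_senders.maxlen

    def learning_round(self, received):
        """Update the parameters from (worker_id, estimate_vector) pairs.

        `received` holds whatever arrived from a subset of the workers; the
        server never waits for every worker, so slow workers do not block it.
        """
        accepted = []
        for worker_id, estimate in received:
            # Ignore the estimate if this worker has been sending too often.
            if self.sending_frequency(worker_id) <= self.threshold:
                accepted.append(np.asarray(estimate, dtype=float))
            self.recent_senders.append(worker_id)
        if accepted:
            # Average the accepted estimates and take one SGD step.
            self.params = self.params - self.learning_rate * np.mean(accepted, axis=0)
        return self.params
```

For example, with `window=4` and `threshold=0.5`, a worker `"a"` that sends four estimates in a row has its fourth estimate ignored (its share of the window has reached 0.75), while a single estimate from worker `"b"` is accepted. Using a frequency threshold rather than a per-round quota is only one possible reading of the abstract; a timestamp-based rate would fit the same structure.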