Privacy and Communication Efficiency in Distributed Computing and Learning
Publisher
The University of Arizona.
Rights
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract
With the unprecedented rate at which data are generated daily, it has become difficult and inefficient to process and utilize such large-scale data on a single machine. One popular and practical way of speeding up the processing is to outsource computationally intensive tasks to powerful distributed servers. However, distributing tasks to servers incurs significant communication overhead. In addition, the heterogeneity in computing power among servers causes the straggler effect, where a few slower machines severely degrade the overall performance of the distributed system. Furthermore, the security and privacy of the data and the user can be at risk if the data contain sensitive information and the servers are untrustworthy.

In this dissertation, we study two distributed system models. The first model consists of a user and several distributed servers: the user uploads its local data to the servers and downloads the result once the servers finish the requested computation. Since data are often represented as matrices, we focus on how to distribute matrix operations such as matrix multiplication and the fast Fourier transform. We address the above challenges by devising novel coding schemes that simultaneously secure sensitive data, keep the user's intention private, and improve communication efficiency for performing these fundamental matrix operations in a distributed manner. We also study the synergy between coding and randomization to further speed up distributed matrix multiplication.

The second model appears more commonly in distributed machine learning. We focus on a popular distributed learning framework called federated learning (FL), where a group of users wish to jointly train a learning model with the help of a parameter server; the training is typically done in an iterative manner using gradient-based optimization algorithms. We study a variation called wireless federated learning, where the exchange of gradients and updated models between the users and the parameter server occurs over wireless channels. Since training involves gradient aggregation from multiple users, the superposition property of wireless channels can naturally support this operation; however, it also incurs significant communication overhead and privacy concerns. Hence, in our work, we focus on (a) devising a scheme that takes the informativeness of the gradients and the underlying channel condition into account when quantizing the gradient vectors; (b) investigating how much coordination is needed between users, in terms of transmit power alignment, to ensure convergence of the model; (c) adding a privacy constraint and studying the joint benefit of wireless aggregation and user sampling on differential privacy (DP) guarantees for wireless FL; and (d) studying the impact of non-i.i.d. data and adaptive artificial noise on convergence and privacy leakage.
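To make the straggler-mitigation idea above concrete, the following is a minimal sketch of coded distributed matrix multiplication using a simple MDS-style split-plus-parity code; the two-block split, the NumPy implementation, and the worker/decoder function names are illustrative assumptions and not the schemes developed in the dissertation.

```python
import numpy as np

def encode_tasks(A):
    """Split A row-wise into two blocks and add one parity block (A1 + A2).

    Any 2 of the 3 encoded blocks suffice to recover A @ B, so one
    straggling worker can be ignored (illustrative MDS-style code).
    """
    A1, A2 = np.split(A, 2, axis=0)
    return {"w1": A1, "w2": A2, "w3": A1 + A2}

def worker_compute(block, B):
    """Each worker multiplies its encoded block by B."""
    return block @ B

def decode(results):
    """Recover A @ B from the results of any two workers."""
    if "w1" in results and "w2" in results:
        top, bottom = results["w1"], results["w2"]
    elif "w1" in results:                      # w2 straggled
        top, bottom = results["w1"], results["w3"] - results["w1"]
    else:                                      # w1 straggled
        top, bottom = results["w3"] - results["w2"], results["w2"]
    return np.vstack([top, bottom])

# Usage: pretend worker w2 is a straggler and never returns.
A, B = np.random.randn(4, 3), np.random.randn(3, 5)
tasks = encode_tasks(A)
results = {w: worker_compute(blk, B) for w, blk in tasks.items() if w != "w2"}
assert np.allclose(decode(results), A @ B)
```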
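Likewise, for the federated learning part, here is a small hedged sketch of the generic building blocks mentioned above (gradient clipping, stochastic quantization, and artificial Gaussian noise before aggregation); the uniform quantizer, the parameter values, and the function names are assumptions for illustration rather than the dissertation's actual schemes.

```python
import numpy as np

def clip(g, c):
    """Clip the gradient to L2 norm at most c (standard DP preprocessing)."""
    return g * min(1.0, c / (np.linalg.norm(g) + 1e-12))

def stochastic_quantize(g, bits):
    """Uniform stochastic quantizer: unbiased rounding onto 2**bits levels."""
    lo, hi = g.min(), g.max()
    levels = 2 ** bits - 1
    scaled = (g - lo) / (hi - lo + 1e-12) * levels
    rounded = np.floor(scaled) + (np.random.rand(*g.shape) < scaled - np.floor(scaled))
    return rounded / levels * (hi - lo) + lo

def noisy_aggregate(grads, clip_norm=1.0, bits=4, noise_std=0.1):
    """Clip, quantize, and add Gaussian noise per user, then average.

    The summation mimics the superposition property of the wireless channel;
    the Gaussian perturbation plays the role of artificial noise for
    differential privacy (illustrative parameters only).
    """
    processed = [
        stochastic_quantize(clip(g, clip_norm), bits)
        + np.random.normal(0, noise_std, g.shape)
        for g in grads
    ]
    return np.mean(processed, axis=0)

# Usage: aggregate gradients from 5 users for a 10-dimensional model.
user_grads = [np.random.randn(10) for _ in range(5)]
global_update = noisy_aggregate(user_grads)
```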
Type
text; Electronic Dissertation
Degree Name
Ph.D.
Degree Level
doctoral
Degree Program
Graduate College
Electrical & Computer Engineering