Error rate: the proportion of misclassified samples among all samples
Accuracy: the proportion of correctly classified samples; equal to 1 − error rate
Error: the difference between the model's predicted output and the ground-truth output; on the training set this is the empirical (training) error, and on new samples it is the generalization error
Overfitting: the model fits the training data too closely (including its noise), so that generalization performance degrades
The purpose of federated learning is to obtain a shared global model for all nodes to use. However, because the data distribution across nodes is non-IID, some local models trained only on local data outperform the global model, which makes those nodes reluctant to participate in the federation process. This article introduces several techniques currently used to personalize the global model so that it performs better on individual nodes.
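The shared global model mentioned above is typically produced by FedAvg-style aggregation: the server averages client weights, weighting each client by its local sample count. A minimal sketch (function and variable names are illustrative, not from any specific library):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """FedAvg-style aggregation: average client model weights,
    weighted by each client's number of local training samples.

    client_weights: list of 1-D numpy arrays, one per client.
    client_sizes: number of local training samples per client.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()                # per-client mixing weights
    stacked = np.stack(client_weights)          # shape: (n_clients, n_params)
    return (coeffs[:, None] * stacked).sum(axis=0)

# Two clients with unequal data: the larger client dominates the average.
w_global = fedavg([np.array([1.0, 0.0]), np.array([0.0, 1.0])], [30, 10])
```

Under non-IID data, this single averaged model is exactly what may underperform a purely local model on some clients, which motivates the personalization techniques surveyed here.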
Decoupled Neural Interfaces using Synthetic Gradients
Abstract
Training a neural network usually requires a forward pass through the computation graph followed by backpropagation of the error to update the weights. In a sense, every layer is therefore locked: it must wait for the rest of the network to finish its forward inference and backward pass before it can update. In this work, we break this constraint by decoupling modules with learned models that predict, from local information only, what the rest of the computation graph will produce. In particular, we focus on modeling the error gradient: by replacing the true backpropagated error gradient with a learned synthetic gradient, we decouple the subgraphs so that they can be updated independently and asynchronously; that is, we realize decoupled neural interfaces. We show results for feed-forward models in which every layer is trained asynchronously; for RNNs, where predicting a layer's future gradient extends the time horizon the RNN can effectively model; and for a hierarchical RNN system whose modules tick at different timescales. Finally, we demonstrate that besides predicting gradients, the same framework can be used to predict inputs, which decouples the model in both the forward and backward pass, yielding two independent networks that co-learn and can be composed into a single functioning network.
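A toy sketch of the synthetic-gradient idea: a small learned module predicts the layer's error gradient so the layer can update without waiting for backpropagation, while the module itself regresses onto the true gradient whenever it arrives. For simplicity the predictor here conditions on the layer input rather than its activation, so a linear module suffices; this is a sketch of the mechanism, not the paper's exact method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: linear layer h = x @ W, squared loss against targets y = x @ A.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X = rng.normal(size=(64, 3))
Y = X @ A

W = rng.normal(size=(3, 2))   # the layer being trained
M = np.zeros((3, 2))          # synthetic-gradient module: predicts dL/dh from x

loss0 = 0.5 * np.mean((X @ W - Y) ** 2)
for _ in range(500):
    h = X @ W
    g_hat = X @ M                                # predicted gradient dL/dh
    W -= 0.2 * X.T @ g_hat / len(X)              # layer updates immediately, decoupled
    g_true = h - Y                               # true gradient, assumed to arrive later
    M -= 0.2 * X.T @ (g_hat - g_true) / len(X)   # regress M onto the true gradient
loss1 = 0.5 * np.mean((X @ W - Y) ** 2)
```

The layer never waits for the backward pass: it consumes `g_hat` at once, and the gradient model is refined asynchronously, which is the decoupling the abstract describes.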
Most modern deep learning models require high computing power, which embedded devices lack. For such devices, models that reduce computation while maintaining performance are therefore very important, and knowledge distillation is one way to obtain them. Traditional knowledge distillation transfers knowledge from teacher to student directly, in a single stage. We propose a stage-wise training method to improve this knowledge transfer. The method can even use only a portion of the data used to train the teacher model without hurting the results. This compression approach can be regarded as complementary to other compression techniques.
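Stage-wise schemes like this build on the standard distillation loss of Hinton et al., which matches temperature-softened teacher and student output distributions. A minimal sketch of that base loss (not this paper's exact objective):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on temperature-T softened distributions,
    scaled by T^2 as in the standard formulation, averaged over the batch."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean()
```

In training, this term is usually mixed with the ordinary cross-entropy on the true labels; a stage-wise method changes *when* (and through which intermediate targets) this transfer happens, not the basic loss.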
Federated Meta-Learning with Fast Convergence and Efficient Communication
Abstract
In this paper, we propose a federated meta-learning framework, FedMeta, which shares a meta-learner instead of the global model used in previous work. We evaluate on the LEAF datasets and a real-world dataset, and show that, compared with FedAvg, FedMeta reduces the communication cost by a factor of 2.82-4.33 with faster convergence, and increases accuracy by 3.23-14.84 percentage points. In addition, FedMeta preserves user privacy because only the parameterized algorithm is shared, never the data.
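The shared-meta-learner idea can be sketched with a first-order MAML-style update on synthetic linear-regression clients: each client adapts the shared initialization on its support set and reports a meta-gradient computed on its query set. FedMeta itself supports MAML and Meta-SGD with neural models; all names and the first-order simplification here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def fedmeta_round(theta, clients, inner_lr=0.05, outer_lr=0.05):
    """One communication round: each client takes an inner gradient step on
    its support set, then reports a first-order (FOMAML-style) meta-gradient
    computed on its query set; the server averages and updates theta."""
    meta_grad = np.zeros_like(theta)
    for Xs, ys, Xq, yq in clients:
        g = Xs.T @ (Xs @ theta - ys) / len(ys)    # inner (adaptation) step
        phi = theta - inner_lr * g                # client-adapted weights
        meta_grad += Xq.T @ (Xq @ phi - yq) / len(yq)
    return theta - outer_lr * meta_grad / len(clients)

# Synthetic non-IID clients: each has its own weights near a common w0.
w0 = np.array([1.0, -1.0])
clients = []
for _ in range(5):
    w = w0 + 0.1 * rng.normal(size=2)
    Xs, Xq = rng.normal(size=(10, 2)), rng.normal(size=(10, 2))
    clients.append((Xs, Xs @ w, Xq, Xq @ w))

theta = np.zeros(2)
loss_before = np.mean([np.mean((Xq @ theta - yq) ** 2) for _, _, Xq, yq in clients])
for _ in range(200):
    theta = fedmeta_round(theta, clients)
loss_after = np.mean([np.mean((Xq @ theta - yq) ** 2) for _, _, Xq, yq in clients])
```

Note that only `theta` and gradients travel between clients and server, never the raw `(X, y)` data, which is the privacy point the abstract makes.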
A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms
Abstract
In this paper, we propose to use a meta-learning objective that maximizes the speed of transfer to a changed distribution, in order to learn how to factorize knowledge into modular pieces. In particular, we are interested in factorizing the joint distribution into conditionals that are consistent with the causal structure, for example when the change in distribution is due to a localized intervention on one of the variables. In that case, only a few parameters of the correct causal graph need to change, and we show that this leads to faster adaptation. We use this property to define a meta-learning surrogate score that not only admits a continuous graph parameterization but also favors the correct causal graph. Finally, motivated by AI agents (e.g., a robot autonomously exploring its environment), we consider how the same objective can discover the causal variables themselves, since the observed low-level variables have no direct causal meaning. Experiments in a bivariate setting validate the proposed ideas and theoretical results.
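The adaptation-speed asymmetry can be illustrated in the bivariate discrete case: after an intervention that changes only the cause's marginal, the causal factorization p(X)p(Y|X) merely needs to re-fit p(X), while the anti-causal factorization p(Y)p(X|Y) keeps a stale conditional. A sketch under these assumptions (all names illustrative, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample(px1, py1gx, n):
    """Sample from the true causal model X -> Y."""
    x = (rng.random(n) < px1).astype(int)
    y = (rng.random(n) < py1gx[x]).astype(int)
    return x, y

py1gx = np.array([0.2, 0.8])   # p(Y=1|X=0), p(Y=1|X=1): the invariant mechanism

# Training regime: p(X=1) = 0.5. Fit a smoothed joint table.
x, y = sample(0.5, py1gx, 20000)
joint = np.ones((2, 2))                                  # Laplace smoothing
np.add.at(joint, (x, y), 1.0)
joint /= joint.sum()
p_y_given_x = joint / joint.sum(axis=1, keepdims=True)   # causal conditional
p_x_given_y = joint / joint.sum(axis=0, keepdims=True)   # anti-causal conditional

# Intervention shifts only the cause's marginal: p(X=1) becomes 0.9.
xa, ya = sample(0.9, py1gx, 20)            # few adaptation samples
px1_hat = (xa.sum() + 1) / (len(xa) + 2)   # re-fit marginal p(X) (smoothed)
py1_hat = (ya.sum() + 1) / (len(ya) + 2)   # re-fit marginal p(Y) (smoothed)

xt, yt = sample(0.9, py1gx, 5000)          # held-out data from the new regime
px = np.where(xt == 1, px1_hat, 1 - px1_hat)
py = np.where(yt == 1, py1_hat, 1 - py1_hat)
ll_causal = np.mean(np.log(px) + np.log(p_y_given_x[xt, yt]))
ll_anti = np.mean(np.log(py) + np.log(p_x_given_y[xt, yt]))
```

After only a handful of post-intervention samples, the causal factorization's held-out log-likelihood recovers while the anti-causal one is stuck with a wrong p(X|Y); this faster-adaptation signal is what the paper turns into a meta-learning score over graphs.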