Abstract
Distributed machine learning is becoming a popular model-training method because of its benefits for privacy, computational scalability, and communication bandwidth. In this work, we explore scalable distributed-training versions of two algorithms commonly used in object detection. A novel distributed training algorithm using Mean Weight Matrix Aggregation (MWMA) is proposed for Linear Support Vector Machine (L-SVM) object detection based on Histogram of Oriented Gradients (HOG). In addition, a novel Weighted Bin Aggregation (WBA) algorithm is proposed for distributed training of Ensemble of Regression Trees (ERT) landmark localization. Neither algorithm restricts the location of model aggregation, and both allow custom architectures for model distribution. For this work, a Pool-Based Local Training and Aggregation (PBLTA) architecture is explored for both algorithms. The application of both algorithms in the medical field is examined using a paradigm from the fields of psychology and neuroscience – eyeblink conditioning with infants – where models need to be trained on facial images while protecting participant privacy. Using distributed learning, models can be trained without sending image data to other nodes. The custom software has been made available for public use on GitHub: https://github.com/SLWZwaard/DMT. Results show that aggregating HOG models with MWMA not only preserves model accuracy but also enables distributed learning with a 0.9% increase in accuracy compared with traditional learning. Furthermore, WBA enables ERT model aggregation with an 8% increase in accuracy compared with single-node models.
Introduction
Distributed machine learning allows models to be partially trained locally and then aggregated into a new model, for example on a central server [1]. For research involving images or videos of human participants, this serves the important purpose of protecting an individual's privacy by keeping the data local to the collection point [2]. The approach has other advantages as well: it distributes the processing power needed for machine learning, which eliminates the need for heavy central servers, and it reduces the communication bandwidth otherwise required to transfer large amounts of data between nodes. To enable distributed training, algorithms need to be developed for effective model aggregation that result in little (or no) loss in accuracy. In this paper, we focus on aggregation of models for Linear Support Vector Machine (L-SVM) classifier-based object detection using a Histogram of Oriented Gradients (HOG) feature extractor, the HOG-algorithm [3], [4], as well as for a landmark localization algorithm called Ensemble of Regression Trees (ERT) [5], [6]. Both are common algorithms in face- and feature-detection applications [7], [8]. The HOG-algorithm is used to detect objects such as human faces or road signs; a specific model needs to be trained for each object and its orientation. For localizing landmarks on an object, such as points around a person's eye, ERT places a predefined shape of landmarks on the object and then shifts and warps this shape into place over multiple iterations. An ERT model needs to be trained for each specific situation or context within a dataset. Training new models requires annotated datasets, and these datasets should be specific to the conditions of the test setup to achieve the most accurate results. For face detection, for example, specific datasets might be needed for multiple facial orientations or ages.
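The iterative refinement that ERT performs can be written as the cascade of [5], in which each stage adds a predicted update to the current shape estimate, S(t+1) = S(t) + r_t(I, S(t)). The minimal sketch below shows only this cascade structure. The function name `ert_cascade` and the toy stage are hypothetical: the toy stage simply pulls every landmark halfway toward a fixed target, whereas a trained ERT stage is a sum of regression trees that split on pixel-intensity differences indexed relative to the current shape.

```python
from typing import Callable, Sequence

import numpy as np

# A shape is an (n_landmarks, 2) array of (x, y) landmark coordinates.
# A stage regressor maps (image, current shape) to an additive shape update.
Regressor = Callable[[np.ndarray, np.ndarray], np.ndarray]


def ert_cascade(image: np.ndarray, initial_shape: np.ndarray,
                stages: Sequence[Regressor]) -> np.ndarray:
    """Refine a landmark shape with a cascade: S_{t+1} = S_t + r_t(I, S_t)."""
    shape = initial_shape.astype(float).copy()  # never modify the predefined shape
    for r_t in stages:
        # Each stage shifts/warps the current estimate by its predicted update.
        shape = shape + r_t(image, shape)
    return shape


# Hypothetical toy target and stage, standing in for a trained regression-tree ensemble.
target = np.array([[10.0, 20.0], [30.0, 20.0], [20.0, 10.0], [20.0, 30.0]])


def toy_stage(image: np.ndarray, shape: np.ndarray) -> np.ndarray:
    # Move every landmark halfway toward the (hypothetical) true positions.
    return 0.5 * (target - shape)


image = np.zeros((64, 64))          # placeholder image; unused by the toy stage
mean_shape = np.zeros_like(target)  # predefined starting shape
refined = ert_cascade(image, mean_shape, [toy_stage] * 5)
```

After five toy stages the remaining error is halved five times, so `refined` equals `target * (1 - 0.5 ** 5)`; with trained stages the per-iteration improvement depends on the learned trees.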
For general face and object detection, public datasets are often available. However, if the target images contain noise or other variability, such as uncommon objects or equipment visible on the face, public datasets are of limited use. Privacy is also a major factor limiting image availability, for example due to GDPR-compliance issues [9]. In this work, new distributed-training algorithms for the HOG-algorithm and ERT are explored that allow new models to be trained across multiple nodes without sharing the original data. For the HOG-algorithm, we introduce a Mean Weight Matrix Aggregation (MWMA) training algorithm, while for ERT models, a Weighted Bin Aggregation (WBA) training algorithm is proposed. For both algorithms, a Pool-Based Local Training and Aggregation (PBLTA) distribution architecture is examined, in which models are shared through a pool accessible to all participating parties. Without loss of generality, we prototype and test the two algorithms on the cutting-edge use case that originally motivated this work, borrowed from the fields of psychology and neuroscience: eyeblink conditioning (EBC). EBC is a behavioral, Pavlovian-conditioning [10] experiment in which the participant learns a response to a repeated stimulus, and it provides a general biomarker for neurodevelopmental disorders [11]. Privacy concerns currently limit the training of new models for EBC, where we use computer vision to automate the eye-tracking experiments. These experiments are being conducted as a joint project between the Princeton Neuroscience Institute (PNI), the Department of Psychology at Princeton University, and the Department of Neuroscience at the Erasmus Medical Center (EMC). EBC measurements of both adult and infant populations are taken with camera equipment, rather than physically attached sensors, to increase participant comfort.
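Because an L-SVM detector scores a HOG window with a linear function, averaging weight matrices has a simple interpretation: the averaged model's score on any window equals the average of the local models' scores. The sketch below assumes that MWMA takes the element-wise mean of the local weight matrices (as its name suggests), additionally averages the bias terms, and that all nodes use the same detection-window size and HOG parameters. The names `mwma` and `window_score` are hypothetical, and the 8×8-cell, 9-orientation-bin shape and random values are illustrative only.

```python
import numpy as np


def mwma(weights, biases):
    """Element-wise mean of per-node L-SVM weight matrices and biases (assumed MWMA)."""
    shapes = {w.shape for w in weights}
    if len(shapes) != 1:
        # Averaging is only meaningful if every node uses the same HOG filter layout.
        raise ValueError("all local models must share the same HOG filter shape")
    return np.mean(np.stack(weights), axis=0), float(np.mean(biases))


def window_score(w, b, hog_window):
    """Linear SVM decision value for one detection window: <w, x> + b."""
    return float(np.sum(w * hog_window) + b)


rng = np.random.default_rng(0)
# Three hypothetical nodes, each with a locally trained 8x8-cell, 9-bin weight matrix.
local_w = [rng.normal(size=(8, 8, 9)) for _ in range(3)]
local_b = [0.1, -0.2, 0.4]
w_agg, b_agg = mwma(local_w, local_b)

x = rng.normal(size=(8, 8, 9))  # random stand-in for the HOG descriptor of one window
agg_score = window_score(w_agg, b_agg, x)
mean_local_score = np.mean([window_score(w, b, x) for w, b in zip(local_w, local_b)])
```

Here `agg_score` and `mean_local_score` agree up to floating-point rounding, which is a property of the linear decision function rather than of any particular training data.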
Conclusions
This work investigated a distributed machine learning approach to training models for both face detection and landmark localization from distributed training data, without sharing data between the collaborating research sites. The new algorithms were explored using an eyeblink conditioning (EBC) case study, for which new models needed to be trained. Because the training data for the EBC case study were distributed across the Princeton Neuroscience Institute and the Erasmus Medical Center, and sharing data between sites was not possible due to privacy concerns, traditional machine learning could not make full use of the training data. Distributed machine learning, on the other hand, does not require raw data to be sent between nodes. This work proposed two new algorithms for distributing the training of models for the HOG-algorithm and an ERT landmark localizer, and showed how new models can be aggregated from models trained on local datasets. Both proposed algorithms are independent of where model aggregation takes place and have linearly scalable performance, allowing custom model distribution architectures. The WBA and MWMA algorithms, along with the local model training software, have been made available for public use in the following GitHub repository: https://github.com/SLWZwaard/DMT. Results showed that for the HOG-algorithm, trained models could be combined using the proposed Mean Weight Matrix Aggregation (MWMA) algorithm, leading to a 0.9% increase in recall accuracy compared with traditional training. Distributing ERT training with the Weighted Bin Aggregation (WBA) algorithm was also shown to be possible: it reduced the error rate by at least 8% relative to the best individual locally trained model without data sharing, although the error rate was 17% higher than with traditional training.
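For a mean-based aggregation, one way to make the result independent of where (and in which grouping) models are combined is to let every shared model carry the number of local models it summarizes, and to merge with a count-weighted mean; an unweighted mean of partial means would otherwise depend on the grouping. The sketch below illustrates that idea under this assumption only; the `Aggregate` type, its `count` field, and `merge` are hypothetical and are not necessarily how the published software achieves location independence. Each merge is a single pass over its inputs, so its cost grows linearly with the number of models.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class Aggregate:
    weights: np.ndarray  # mean weight matrix over the summarized local models
    count: int           # number of local models summarized (hypothetical field)


def merge(parts):
    """Count-weighted mean of aggregates; one pass, linear in the number of parts."""
    parts = list(parts)
    total = sum(p.count for p in parts)
    if total == 0:
        raise ValueError("nothing to merge")
    mean = sum(p.weights * p.count for p in parts) / total
    return Aggregate(mean, total)


rng = np.random.default_rng(1)
local = [Aggregate(rng.normal(size=(4, 4)), 1) for _ in range(4)]  # four local models

flat = merge(local)                                    # aggregate everything in one place
by_site = merge([merge(local[:2]), merge(local[2:])])  # aggregate per site, then globally
uneven = merge([merge(local[:3]), local[3]])           # a different, uneven grouping
```

All three groupings produce the same mean weight matrix and the same total count of four local models.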
For model distribution, a Pool-Based Local Training and Aggregation (PBLTA) architecture was proposed. Using pool-based storage, models for both algorithms can be shared between collaborators at different research sites. Models trained or aggregated locally can be uploaded to the pool, and downloaded models can be aggregated into new models. This allows collaborators to choose which models they want to use and which models they want to share with others. With the new distributed algorithms for training and aggregating models, together with the PBLTA architecture, new models can be created for the eyeblink conditioning paradigm without the privacy concerns of sharing raw image data. Although these algorithms were explored for this case study, they are not limited to facial landmark localization and can generalize to other forms of object detection and landmark localization.
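The pool workflow described above (upload local models, select and download models, aggregate, and share the result back) can be sketched with an in-memory stand-in for the shared storage. Everything here is a hypothetical illustration: `ModelPool`, `publish`, `download`, `aggregate`, and the model identifiers are not taken from the published software, and the element-wise mean stands in for whichever aggregation (MWMA or WBA) applies to the model type. Only model weights enter the pool; no images are stored.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class ModelPool:
    """In-memory stand-in for a shared model pool (hypothetical)."""
    models: dict = field(default_factory=dict)  # model id -> (contributing site, weights)

    def publish(self, model_id: str, site: str, weights: np.ndarray) -> None:
        if model_id in self.models:
            raise KeyError(f"model id already in pool: {model_id}")
        self.models[model_id] = (site, weights.copy())  # store a copy, never raw data

    def download(self, model_ids):
        # Each collaborator chooses which shared models to use.
        return [self.models[m][1] for m in model_ids]


def aggregate(weights_list):
    # Element-wise mean, standing in for the model-specific aggregation step.
    return np.mean(np.stack(weights_list), axis=0)


pool = ModelPool()
pool.publish("pni-local-1", "PNI", np.full((2, 2), 1.0))  # model trained at PNI
pool.publish("emc-local-1", "EMC", np.full((2, 2), 3.0))  # model trained at EMC
chosen = pool.download(["pni-local-1", "emc-local-1"])    # EMC selects both models
merged = aggregate(chosen)
pool.publish("emc-aggregate-1", "EMC", merged)            # share the new model back
```

Because aggregated models are published under their own identifiers, later participants can aggregate them again together with other local or aggregated models.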
References
[1] J. Konečný, B. McMahan, and D. Ramage, “Federated optimization: Distributed optimization beyond the datacenter,” 2015.
[2] D. Enthoven and Z. Al-Ars, “An overview of federated deep learning privacy attacks and defensive strategies,” arXiv:2004.04676, 2020.
[3] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, 2005, pp. 886–893.
[4] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part-based models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 9, pp. 1627–1645, 2010.
[5] V. Kazemi and J. Sullivan, “One millisecond face alignment with an ensemble of regression trees,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1867–1874.
[6] W.-Y. Loh, “Classification and regression trees,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 1, pp. 14–23, 2011.
[7] S. Zafeiriou, C. Zhang, and Z. Zhang, “A survey on face detection in the wild: past, present and future,” Computer Vision and Image Understanding, vol. 138, pp. 1–24, 2015.
[8] P. Bakker, H.-J. Boele, Z. Al-Ars, and C. Strydis, “Real-time face and landmark localization for eyeblink detection,” arXiv:2006.00816, 2020.
[9] G. Chassang, “The impact of the EU General Data Protection Regulation on scientific research,” ecancermedicalscience, vol. 11, 2017.
[10] I. P. Pavlov and G. V. Anrep, Conditioned Reflexes: An Investigation of the Physiological Activity of the Cerebral Cortex. Oxford University Press: Humphrey Milford, 1927.
[11] B. C. Reeb-Sutherland and N. A. Fox, “Eyeblink conditioning: A noninvasive biomarker for neurodevelopmental disorders,” Journal of Autism and Developmental Disorders, vol. 45, pp. 376–394, 2015.