ML and K-anonymity

This page collects papers on k-anonymity and its applications in machine learning (ML).

Anonymizing Machine Learning Models: very recent (October 2021)

A Comparison of the Effects of K-Anonymity on Machine Learning Algorithms

Hayden Wimmer and Loreen Powell (College of Business, Bloomsburg University, Bloomsburg, PA, USA). A Comparison of the Effects of K-Anonymity on Machine Learning Algorithms. (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 5, No. 11, 2014.

Abstract: While research has been conducted in machine learning algorithms and in privacy preserving in data mining (PPDM), a gap in the literature exists which combines the aforementioned areas to determine how PPDM affects common machine learning algorithms. The aim of this research is to narrow this literature gap by investigating how a common PPDM algorithm, K-Anonymity, affects common machine learning and data mining algorithms, namely neural networks, logistic regression, decision trees, and Bayesian classifiers. This applied research reveals practical implications for applying PPDM to data mining and machine learning and serves as a critical first step learning how to apply PPDM to machine learning algorithms and the effects of PPDM on machine learning. Results indicate that certain machine learning algorithms are more suited for use with PPDM techniques.

Keywords: Privacy Preserving; Data Mining; Machine Learning; Decision Tree; Neural Network; Logistic Regression; Bayesian Classifier

I. INTRODUCTION

Knowledge discovery in databases (KDD), or Data Mining (DM), seeks to uncover patterns and relationships contained in data. Privacy of information has come under increasing scrutiny with the advent of regulations such as HIPAA [1, 2]. Simply removing fields or obscuring the records would distort the knowledge contained within the data. This necessity led to the inception of privacy preserving in data mining, or PPDM. PPDM algorithms attempt to de-identify data while maintaining the knowledge contained within. The goal of PPDM research is minimal knowledge distortion; however, some knowledge may be lost when applying PPDM.

Machine learning techniques are frequently employed in KDD, or data mining. This research aims to understand the effects of PPDM on common machine learning algorithms and serves as a first step toward mapping the effects of PPDM algorithms on machine learning algorithms. Specifically, this research compares artificial neural networks (ANN), Bayesian Classifier, Decision Stump, C4.5 Decision Tree Induction, Logistic Regression, and Classification and Regression Trees (CART). This work has practical implications for data science and analytics as applied by academics and practitioners alike.

The remainder of this paper is structured as follows: section 2 provides a background of machine learning algorithms and privacy preserving in data mining, section 3 presents the methodology and results, and section 4 discusses conclusions and future directions.
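Note to self: the excerpt above names K-Anonymity but does not show what the property looks like on data. A minimal sketch, assuming the usual definition (every combination of quasi-identifier values must appear in at least k records) and generalization of ages into ranges. The records, the column names, and the helpers `generalize_age` and `is_k_anonymous` are hypothetical and not taken from the paper.

```python
from collections import Counter


def generalize_age(age, width=10):
    """Replace an exact age with a range label such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


# Hypothetical toy records; "age" and "zip" play the role of quasi-identifiers.
records = [
    {"age": 31, "zip": "17815", "diagnosis": "flu"},
    {"age": 36, "zip": "17815", "diagnosis": "asthma"},
    {"age": 42, "zip": "17815", "diagnosis": "flu"},
    {"age": 47, "zip": "17815", "diagnosis": "diabetes"},
]

# Generalize the age column, leaving the other fields untouched.
generalized = [{**r, "age": generalize_age(r["age"])} for r in records]

raw_ok = is_k_anonymous(records, ["age", "zip"], k=2)
generalized_ok = is_k_anonymous(generalized, ["age", "zip"], k=2)
print(raw_ok, generalized_ok)
```

With exact ages every record is unique, so the raw table fails the check for k=2. After generalization each age range holds two records, so the check passes, at the cost of coarser features for any model trained on the data.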

IBM article about anonymizing machine learning models

Anonymizing Machine Learning Models | IBM Research Publications

On the Role of Data Anonymization in Machine Learning Privacy

Modules 1 and 2: apply Naive Bayes

https://towardsdatascience.com/laplace-smoothing-in-naïve-bayes-algorithm-9c237a8bdece

https://towardsdatascience.com/understanding-naïve-bayes-algorithm-f9816f6f74c0
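
For reference when applying Naive Bayes in the modules: a minimal sketch of a multinomial Naive Bayes text classifier with Laplace (add-one) smoothing, in the positive/negative review setting the first link discusses. The toy reviews and the function names `train_naive_bayes`, `word_likelihood`, and `predict` are hypothetical, and this is one common formulation, not necessarily the exact one used in the linked articles.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, alpha=1.0):
    """Count classes and per-class words from a list of (tokens, label) pairs."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"class_counts": class_counts, "word_counts": word_counts,
            "vocab": vocab, "alpha": alpha}


def word_likelihood(model, word, label):
    """P(word | label) with additive smoothing: (count + alpha) / (total + alpha * |V|)."""
    alpha = model["alpha"]
    count = model["word_counts"][label][word]
    total = sum(model["word_counts"][label].values())
    return (count + alpha) / (total + alpha * len(model["vocab"]))


def predict(model, tokens):
    """Return the label with the highest log-posterior; unknown words are skipped."""
    total_docs = sum(model["class_counts"].values())
    scores = {}
    for label, n_docs in model["class_counts"].items():
        score = math.log(n_docs / total_docs)  # log prior
        for word in tokens:
            if word in model["vocab"]:
                score += math.log(word_likelihood(model, word, label))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy reviews.
docs = [
    (["great", "movie"], "pos"),
    (["good", "plot"], "pos"),
    (["bad", "movie"], "neg"),
    (["boring", "plot"], "neg"),
]
model = train_naive_bayes(docs)
print(word_likelihood(model, "great", "neg"))
print(predict(model, ["great", "plot"]))
```

Without smoothing, "great" would have probability zero under the negative class and wipe out the whole product. With alpha = 1 it gets a small non-zero probability instead, so a single unseen word cannot veto a class on its own.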