Is it an important factor to consider whether the sensitive attribute is binary or can take many values?
E.g., in the Adult dataset, if income is chosen as the sensitive attribute it takes only two values (>50K or <=50K), but if occupation is chosen it can take many values. In the latter case, is it better to have a taxonomy tree for the sensitive attribute? The risk of disclosure is much higher when the attribute has only a binary value.
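A minimal sketch of what such a taxonomy tree for a many-valued sensitive attribute might look like, using occupation from the Adult dataset; the grouping into broad categories and the function name generalise are assumptions for illustration, not an official hierarchy.

```python
# Hypothetical taxonomy tree for the multi-valued sensitive attribute
# "occupation" (Adult dataset). The grouping into broad categories is an
# assumed illustration, not part of the dataset itself.
OCCUPATION_TAXONOMY = {
    "Any": {
        "White-collar": ["Exec-managerial", "Prof-specialty", "Adm-clerical", "Sales"],
        "Blue-collar":  ["Craft-repair", "Machine-op-inspct", "Transport-moving",
                         "Handlers-cleaners", "Farming-fishing"],
        "Service":      ["Other-service", "Tech-support", "Protective-serv",
                         "Priv-house-serv", "Armed-Forces"],
    }
}

def generalise(value, taxonomy=OCCUPATION_TAXONOMY):
    """Return the parent category of a leaf value (one level up the tree)."""
    for root, groups in taxonomy.items():
        for group, leaves in groups.items():
            if value in leaves:
                return group
    return "Any"  # fall back to the root if the value is unknown

print(generalise("Sales"))  # -> White-collar
```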
In each release of the table, the generalisation level of each attribute is noted, and subsequent releases are not allowed to go beyond this level, but is this calculation feasible?
One way this can be achieved is to ensure that the iterations never reach the highest level of anonymisation: generalisation should only be allowed up to a threshold value, i.e. there should be a cap on the generalisation level. To ensure there are still enough tuples to satisfy the k-anonymity criterion, duplicate tuples must be added where and when needed. The highest and lowest levels of generalisation in each iteration can be regulated; for elements not satisfying this, duplicate tuples or noise should be introduced into the table to maintain the k-anonymity criterion.
The amount of redundant data is limited to between 1 and k-1 tuples per equivalence class, as that is all that is required to bring the class up to size k; this is the maximum amount of noise/duplicate data that can be added to the table (a sketch of this padding step follows).
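A minimal sketch of this padding step, assuming records are plain Python dicts and equivalence classes are formed over the quasi-identifier columns; the names pad_to_k and _synthetic and the value of K are illustrative assumptions, not from the source.

```python
from collections import Counter

K = 5  # assumed k-anonymity parameter

def pad_to_k(records, quasi_identifiers, k=K):
    """Pad equivalence classes smaller than k with flagged duplicate tuples."""
    def key(r):
        return tuple(r[q] for q in quasi_identifiers)

    class_sizes = Counter(key(r) for r in records)
    padded = list(records)
    for r in records:
        deficit = k - class_sizes[key(r)]
        if deficit > 0:
            for _ in range(deficit):      # at most k-1 duplicates per class
                dup = dict(r)
                dup["_synthetic"] = True  # keep track of the added tuples
                padded.append(dup)
            class_sizes[key(r)] = k       # class is now satisfied
    return padded
```

Flagging the duplicates with a _synthetic marker also addresses the later question of keeping track of the tuples that were added.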
Utility with Shannon's Attribute Entropy Function (see the sketch after this list)
Priority Function of Attributes
Probability Distribution of Attributes
METRICS ALGO
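One possible reading of these items, sketched below under the assumption that utility is measured per attribute: the empirical probability distribution of an attribute's values gives its Shannon entropy H(A) = -sum_i p_i log2 p_i, and a priority function could rank attributes by that entropy. The function names are illustrative, not from the source.

```python
import math
from collections import Counter

def attribute_entropy(values):
    """Shannon entropy of the empirical distribution of an attribute's values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def attribute_priority(table, attributes):
    """Hypothetical priority function: rank attributes by entropy, so that
    more informative (higher-entropy) attributes come first."""
    return sorted(attributes,
                  key=lambda a: attribute_entropy([row[a] for row in table]),
                  reverse=True)

table = [{"age": "30-40", "occupation": "Sales"},
         {"age": "30-40", "occupation": "Tech-support"},
         {"age": "30-40", "occupation": "Exec-managerial"}]
print(attribute_priority(table, ["age", "occupation"]))  # -> ['occupation', 'age']
```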
Adding noise? A threshold on the number of additional tuples added; should we keep track of these tuples?
Generalisation Thresholds
Should we have a dynamic k as well?
To tell ma'am on 10th
Census data, collected and published for 15 years - 3 iterations