SoK: Efficient Privacy-preserving Clustering

This repository contains the source code for the PETS'21 paper HMS+21 by Aditya Hegde, Helen Möllering, Thomas Schneider, and Hossein Yalame.

Components

A brief description of the subdirectories in the codebase is given below. The README in each subdirectory provides more information on compilation and usage.

he_meanshift: An implementation of the HE-Meanshift protocol presented by Cheon et al. in CKP19.
hc_protocols: An implementation of the hierarchical clustering protocols of Meng et al. in MPO19.
utils: Scripts to automate simple tasks and aid in analysis.
data: A sample dataset to use as input. See the Datasets section for more details.

Building the Project

Docker

All dependencies required to compile and run the project are available in the Docker image. To use the prebuilt image, run the following:

docker pull adishegde/sok-ppcluster:latest
docker run -it adishegde/sok-ppcluster:latest

To build the Docker image locally, run the following:

docker build -t sokppcluster .
docker run -it sokppcluster

We observed that the build process requires at least 4 GB of RAM, which must be configured explicitly on Windows and macOS.

Manual

The code is written in C++17 and uses CMake. The he_meanshift and hc_protocols implementations have different external dependencies and can be built separately by following the instructions in their respective READMEs.

Datasets

The datasets we use for evaluating clustering quality are available at the public GitHub repository gagolews/clustering_benchmarks_v1. That repository provides the datasets in text format, saved as .gz files, whereas the C++ benchmark programs require the input dataset to be in NumPy's .npy format. The utils/transform_data.py program converts a .gz file into .npy format. Please refer to the README in the utils directory for usage information.
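
For readers who want to see what such a conversion involves, here is a minimal sketch in Python. It is not the repository's utils/transform_data.py, whose options and behavior are documented in the utils README. The function name gz_text_to_npy and the file names are hypothetical, and the sketch assumes the .gz file holds a whitespace-separated numeric matrix with one record per line.

```python
import gzip
import os
import tempfile

import numpy as np


def gz_text_to_npy(gz_path, npy_path):
    """Hypothetical helper: convert a gzip-compressed text matrix to .npy.

    Assumes one record per line with whitespace-separated numeric values.
    Returns the shape of the converted array.
    """
    # np.loadtxt decompresses paths ending in .gz transparently;
    # ndmin=2 keeps single-attribute datasets two-dimensional.
    data = np.loadtxt(gz_path, ndmin=2)
    np.save(npy_path, data)
    return data.shape


if __name__ == "__main__":
    # Round-trip a tiny made-up dataset through a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "toy.data.gz")
        dst = os.path.join(tmp, "toy.npy")
        with gzip.open(src, "wt") as f:
            f.write("1.0 2.0\n3.0 4.0\n5.0 6.0\n")
        print(gz_text_to_npy(src, dst))
        print(np.load(dst))
```

The sketch keeps the array two-dimensional even for single-attribute data, so that a consumer can always read it as records by attributes; whether the C++ programs expect this exact layout or data type is not stated here and should be checked against the utils README.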

A sample dataset in both of the above formats, along with the corresponding ground truth, is available in the data directory. It was created using scikit-learn's make_blobs function and consists of 128 data records, each with 1 attribute, grouped into 2 clusters.
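
A dataset with the same shape can be generated along the following lines. This is only a sketch: the random seed and all make_blobs parameters other than the record, attribute, and cluster counts are assumptions, so it will not reproduce the exact values shipped in the data directory. The function name make_sample is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_blobs


def make_sample(seed=0):
    """Hypothetical helper: 128 records, 1 attribute, 2 clusters.

    The seed is an assumption; the one used for the shipped
    sample dataset is not stated.
    """
    # X has shape (128, 1); y holds the ground-truth cluster label per record.
    X, y = make_blobs(n_samples=128, n_features=1, centers=2, random_state=seed)
    return X, y


if __name__ == "__main__":
    X, y = make_sample()
    print(X.shape, np.unique(y))
```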
