Index of /data-release/data
Some files are compressed with XZ. To decompress them on Unix-like systems, use XZ Utils (xz -d or unxz); on Windows, use 7-Zip.
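If neither tool is available, Python's standard lzma module can also read XZ files. The following is a minimal sketch, not part of the release: the helper name decompress_xz and the example filename in the note below are assumptions for illustration.

```python
import lzma
import shutil


def decompress_xz(src, dst=None):
    """Decompress the XZ file at `src` and return the output path.

    If `dst` is not given, the output name is `src` with the trailing
    ".xz" removed.
    """
    if dst is None:
        if not src.endswith(".xz"):
            raise ValueError("cannot infer output name: %r has no .xz suffix" % src)
        dst = src[: -len(".xz")]
    # Stream in chunks so large files do not need to fit in memory.
    with lzma.open(src, "rb") as compressed, open(dst, "wb") as out:
        shutil.copyfileobj(compressed, out)
    return dst
```

For example, decompress_xz("release_db.sqlite.xz") would write release_db.sqlite next to the archive, assuming a compressed file of that name is what you downloaded.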
Release-DB-Read-Demo.ipynb:
An interactive notebook demonstrating the use of the data.
release_db.sqlite:
Our main database.
See database-fields.pdf for the field meanings.
non_policy_release_db.sqlite:
Policies that were removed by our classifier.
See database-fields.pdf for the field meanings.
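Both databases are ordinary SQLite files, so they can be inspected with Python's built-in sqlite3 module before consulting database-fields.pdf. The sketch below makes no assumption about table or column names; it simply reports whatever the file contains. The helper name list_tables is hypothetical.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path):
    """Return {table_name: [column_name, ...]} for a SQLite file."""
    # Open read-only so the downloaded database cannot be modified by accident.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        tables = {}
        for name in names:
            quoted = name.replace('"', '""')
            # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
            info = conn.execute('PRAGMA table_info("%s")' % quoted)
            tables[name] = [col[1] for col in info]
    return tables
```

Calling list_tables("release_db.sqlite") or list_tables("non_policy_release_db.sqlite") after decompression gives a quick overview of the schema; the field meanings themselves are documented in database-fields.pdf.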
pdfs.tar:
Two folders, one for policies and one for non-policies, containing the PDFs for those policies that were served as PDFs. Filenames follow this format:
<integer>_<year>_<phase>_<timestamp>_<site_url>_<filename>_privacy.pdf
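The filename format can be split into its leading fields with a few lines of Python. This is a hedged sketch: it assumes the first four fields (integer, year, phase, timestamp) contain no underscores, which the format line does not guarantee. Because site URLs and filenames may themselves contain underscores, the sketch deliberately leaves <site_url>_<filename> as one unsplit string. The names PdfName and parse_pdf_name are hypothetical.

```python
from typing import NamedTuple

SUFFIX = "_privacy.pdf"


class PdfName(NamedTuple):
    index: str
    year: str
    phase: str
    timestamp: str
    site_and_file: str  # "<site_url>_<filename>", left unsplit (ambiguous)


def parse_pdf_name(name):
    """Split a PDF filename from pdfs.tar into its leading fields.

    Assumes the first four fields contain no underscores.
    """
    if not name.endswith(SUFFIX):
        raise ValueError("expected a name ending in %r: %r" % (SUFFIX, name))
    stem = name[: -len(SUFFIX)]
    # Split only on the first four underscores; the rest stays together.
    parts = stem.split("_", 4)
    if len(parts) != 5:
        raise ValueError("too few fields in %r" % name)
    return PdfName(*parts)
```

With a made-up name such as "12_2019_A_20190101_example.com_policy_privacy.pdf", the parsed fields are index "12", year "2019", phase "A", timestamp "20190101", and site_and_file "example.com_policy". Real filenames in the archive may differ in detail from this invented example.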
For more information on how this data was collected, and for ideas on how to use it, see our paper at https://privacypolicies.cs.princeton.edu/paper