Princeton-Leuven Longitudinal Corpus of Privacy Policies

Our dataset of historical privacy policies is available as a public GitHub repository.

The links in the table point to the text, HTML, and metadata extracted from the privacy policies archived by the Wayback Machine. Check our project website or read our paper for more detailed information about our dataset.

The table below links to a sample of historical privacy policies from 1,000 sites. To load the full dataset (130,620 websites, more than 1 million privacy policies) into the table, click the button below.