Personally identifiable information has been found in DataComp CommonPool, one of the largest open-source data sets used to train image generation models. Millions of images of passports, credit cards ...
In 2016, a group of researchers, publishers and research funders published the first guidelines to make data “findable, accessible, interoperable and re-usable ...