Input dataset files for the provenance index database

This directory contains the input files used to generate the provenance DB.

Each file is a CSV file with 3 columns: revision hash, author date, and (optionally) root directory ID.

The third column (root directory IDs) may not be populated.

Revisions should be sorted by commit date, which is needed to build the provenance database efficiently.

Provided datasets are:

| name | N | first date | last date | description |
|------|---|------------|-----------|-------------|
| ese2005-begin2m | 2M | 1970-01-01 00:00:01 | 1988-06-30 08:13:36 | first 2 million revisions of the ESE2005 dataset |
| ese2005-end2m | 2M | 2004-09-27 20:56:03 | 2005-01-01 00:00:00 | last 2 million revisions of the ESE2005 dataset |
| ese2005 | 38M | 1970-01-01 00:00:01 | 2005-01-01 00:00:00 | ESE2005 dataset: same as ese2018, but limited to revisions whose date is between EPOCH and 2005-01-01 |
| ese2018 | 1B | 1787-08-06 17:00:00 | 6912-01-03 15:38:42+00:00 | ESE2018 dataset: all the revisions of the Software Heritage Archive as of 2018-02-13 |
| sample_10k | 10k | 2008-10-06 16:02:56+00:00 | 2016-06-22 06:13:17+00:00 | a small sample of 10k revisions, useful for testing |
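
As an illustration of the file format, here is a minimal Python sketch that reads a dataset file and recomputes the `N`, `first date` and `last date` values listed for each dataset, and also counts rows whose date is earlier than the previous row's. It is not part of the provenance tooling. It rests on several assumptions not stated in this README: files have no header row, fields are comma-separated, dates are ISO-like strings as in the listing, dates without a timezone are UTC, and the date column can stand in for the sort key. The names `Row`, `read_rows` and `summarize` and the sample hashes are hypothetical.

```python
import csv
import io
from datetime import datetime, timezone
from typing import Iterable, Iterator, NamedTuple, Optional


class Row(NamedTuple):
    revision: str              # revision hash (first column)
    date: datetime             # date (second column)
    directory: Optional[str]   # root directory ID, or None if not populated


def parse_date(text: str) -> datetime:
    """Parse an ISO-like timestamp; values without a timezone are ASSUMED to be UTC."""
    parsed = datetime.fromisoformat(text.strip())
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def read_rows(lines: Iterable[str]) -> Iterator[Row]:
    """Yield one Row per CSV line; the third column may be missing or empty."""
    for fields in csv.reader(lines):
        if not fields:
            continue  # skip blank lines
        directory = fields[2].strip() if len(fields) > 2 else ""
        yield Row(fields[0].strip(), parse_date(fields[1]), directory or None)


def summarize(rows: Iterable[Row]) -> dict:
    """Return the row count, earliest and latest date, and number of out-of-order rows."""
    count = 0
    out_of_order = 0
    first = last = previous = None
    for row in rows:
        count += 1
        if first is None or row.date < first:
            first = row.date
        if last is None or row.date > last:
            last = row.date
        # A row is out of order if its date precedes the previous row's date.
        if previous is not None and row.date < previous:
            out_of_order += 1
        previous = row.date
    return {"count": count, "first": first, "last": last, "out_of_order": out_of_order}


if __name__ == "__main__":
    # Placeholder hashes, for illustration only.
    sample = io.StringIO(
        "0000000000000000000000000000000000000001,2008-10-06 16:02:56+00:00,\n"
        "0000000000000000000000000000000000000002,2016-06-22 06:13:17+00:00,\n"
    )
    print(summarize(read_rows(sample)))
```

For a real file, pass an open file object instead of the `io.StringIO` sample, for example `summarize(read_rows(open(path, newline="")))`, where `path` points to one of the datasets. Rows are processed one at a time, so the largest datasets do not have to fit in memory.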