...
Combine the gateway log files for a given date (assuming at this point that the datestamp in the filename is reliable, and that the log entries cover roughly equal spans of time).
Write Awk scripts to extract file sizes into one file and deletion times into another.
File sizes look like:
path,date,size,writetime
File deletion times look like:
path,date,deletetime
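As a rough sketch of what the two Awk extraction scripts might look like: the gateway log format is not shown in these notes, so the line layout below (timestamp, operation, path, then numeric fields) is purely an assumption for illustration, as are the function names. Adjust the field numbers to match the real logs.

```shell
# ASSUMED (hypothetical) log format, one whitespace-separated record per line:
#   <ISO timestamp> PUT    <path> <size in bytes> <write time in seconds>
#   <ISO timestamp> DELETE <path> <delete time in seconds>

# Emit path,date,size,writetime for every PUT record.
extract_sizes() {
  awk 'BEGIN { OFS = ","; print "path", "date", "size", "writetime" }
       $2 == "PUT" { print $3, substr($1, 1, 10), $4, $5 }' "$@"
}

# Emit path,date,deletetime for every DELETE record.
extract_delete_times() {
  awk 'BEGIN { OFS = ","; print "path", "date", "deletetime" }
       $2 == "DELETE" { print $3, substr($1, 1, 10), $4 }' "$@"
}

# Example use (gateway-combined.log is a hypothetical file name):
#   extract_sizes gateway-combined.log > atlas-file-sizes.csv
#   extract_delete_times gateway-combined.log > atlas-delete-times.csv
```

Both functions write a header row so that the sqlite .import step can pick up the column names. Paths containing spaces or commas would need extra handling that this sketch does not attempt.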
Correlate these entries with a simple sqlite join query:
.mode csv
.headers on
.import atlas-file-sizes.csv sizes
.import atlas-delete-times.csv times
select distinct sizes.path, size, deletetime, round(size*1.0/deletetime, 2) as "deleterate"
from sizes inner join times using (path);
Running that query produces a correlation between file sizes and their deletion times that looks like:
path,size,deletetime,deleterate
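To make the join repeatable, the same commands can be fed to the sqlite3 command-line shell from a small wrapper. This is only a sketch: the function name correlate is made up here, and it assumes both CSV files start with a header row so that .import creates the tables with the right column names.

```shell
# Join a sizes CSV ($1) with a deletion-times CSV ($2) and print the
# correlated rows as CSV on stdout, using a throwaway in-memory database.
correlate() {
  sqlite3 :memory: <<EOF
.mode csv
.headers on
.import "$1" sizes
.import "$2" times
select distinct sizes.path, size, deletetime,
       round(size*1.0/deletetime, 2) as "deleterate"
from sizes inner join times using (path);
EOF
}

# Example use (correlated.csv is a hypothetical output name):
#   correlate atlas-file-sizes.csv atlas-delete-times.csv > correlated.csv
```

Note that sqlite3 terminates rows with CRLF in csv output mode, so downstream tools may need the carriage returns stripped (for example with tr -d '\r').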
This can be plotted with e.g. gnuplot, or statistics on the query results can be produced on the command line with datamash.
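A minimal sketch of the datamash step, assuming the query output was saved to a CSV file with the header shown above, so that deleterate is column 4. The function name and the choice of statistics are illustrative only.

```shell
# Print min,max,mean,median of the deleterate column (field 4) of a
# correlated CSV file given as $1. Carriage returns are stripped first,
# because sqlite3 csv output ends rows with CRLF and the stray \r would
# otherwise make the last field non-numeric.
summarize_rates() {
  tr -d '\r' < "$1" |
    datamash -t, --header-in min 4 max 4 mean 4 median 4
}

# Example use (correlated.csv is a hypothetical file name):
#   summarize_rates correlated.csv
```

Rows where deletetime is zero yield an empty deleterate (sqlite returns NULL for division by zero); such rows would need to be filtered out before this step.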