Intermediate scripts for file size vs deletion duration
The task - determine file deletion rate.
Focus - During DC24, file deletion rates were not adequate.
Approach -
Capture gateway log files, for the DC24 period in the first instance
Extract file size log entries and deletion time log entries, either for all VOs, or a selected VO
Correlate the file sizes and deletion times by the pathname of the file
Display (plot) or further analyse the correlated data.
Work breakdown -
Combine gateway log files for a given date (assuming at this point that the datestamp in the filename is reliable, and that the log entries cover a roughly equal span of time).
Write Awk scripts to extract file sizes into one file and deletion times into another.
File sizes look like:
path,date,size,writetime
File deletion times look like:
path,date,deletetime
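The extraction step can be sketched in Python rather than Awk, purely as an illustration. The gateway log format is not shown in these notes, so the two line shapes matched below (WRITE and DELETE lines with size= and took= fields) are hypothetical placeholders, as are the function names; only the output column order follows the two layouts above.

```python
import csv
import re

# HYPOTHETICAL log line shapes; the real gateway format is not given here.
#   write:  "<date> <time> WRITE <path> size=<bytes> took=<duration>"
#   delete: "<date> <time> DELETE <path> took=<duration>"
WRITE_RE = re.compile(r"^(\S+) \S+ WRITE (\S+) size=(\d+) took=([\d.]+)$")
DELETE_RE = re.compile(r"^(\S+) \S+ DELETE (\S+) took=([\d.]+)$")


def extract(lines):
    """Split log lines into size rows and deletion-time rows."""
    sizes, deletes = [], []
    for line in lines:
        line = line.strip()
        m = WRITE_RE.match(line)
        if m:
            date, path, size, writetime = m.groups()
            sizes.append((path, date, size, writetime))
            continue
        m = DELETE_RE.match(line)
        if m:
            date, path, deletetime = m.groups()
            deletes.append((path, date, deletetime))
    return sizes, deletes


def write_csv(filename, header, rows):
    """Write rows to a CSV file with a header line."""
    with open(filename, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)
```

Lines that match neither pattern are ignored. Paths containing spaces would not match these placeholder patterns and would need a different expression.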
Correlate these entries by a simple sqlite join query:
.mode csv
.headers on
.import atlas-file-sizes.csv sizes
.import atlas-delete-times.csv times
select distinct sizes.path, size, deletetime, round(size*1.0/deletetime, 2) as "deleterate"
from sizes inner join times using (path);
Running that query produces a correlation between file sizes and their deletion times, looking like:
path,size,deletetime,deleterate
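For scripted use, the same join can be run from Python's standard sqlite3 module instead of the sqlite shell. This is a minimal sketch: the helper names load_csv and correlate are made up for illustration, and the filter on non-positive deletion times is an addition that is not part of the query above (it avoids NULL rates from division by zero).

```python
import csv
import sqlite3


def load_csv(conn, table, filename):
    """Create `table` from a CSV file, using its header row as column names."""
    with open(filename, newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader)
        cols = ", ".join(f'"{c}"' for c in header)
        marks = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table}" ({cols})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)


def correlate(sizes_csv, times_csv):
    """Join sizes and deletion times on path; return (path, size, deletetime, deleterate) rows."""
    conn = sqlite3.connect(":memory:")
    load_csv(conn, "sizes", sizes_csv)
    load_csv(conn, "times", times_csv)
    rows = conn.execute(
        """
        SELECT DISTINCT sizes.path,
               CAST(size AS INTEGER),
               CAST(deletetime AS REAL),
               round(size * 1.0 / deletetime, 2) AS deleterate
        FROM sizes INNER JOIN times USING (path)
        WHERE CAST(deletetime AS REAL) > 0  -- added filter, not in the original query
        """
    ).fetchall()
    conn.close()
    return rows
```

Calling correlate("atlas-file-sizes.csv", "atlas-delete-times.csv") returns a list of tuples that can be written out with csv.writer or passed on for further analysis.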
The correlated output can be plotted with e.g. gnuplot, or summary statistics can be computed on the command line with datamash.
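As an alternative to datamash, basic statistics on the deletion rate can be computed with Python's standard library. This sketch assumes the correlated output has been saved as a CSV file with the header shown above; the function name summarise_rates is made up for illustration.

```python
import csv
import statistics


def summarise_rates(correlated_csv):
    """Return count, mean, median, min and max of the deleterate column."""
    with open(correlated_csv, newline="") as fh:
        rates = [
            float(row["deleterate"])
            for row in csv.DictReader(fh)
            if row["deleterate"]  # skip empty (NULL) rates
        ]
    if not rates:
        return {"count": 0}
    return {
        "count": len(rates),
        "mean": statistics.mean(rates),
        "median": statistics.median(rates),
        "min": min(rates),
        "max": max(rates),
    }
```

The rate is in whatever units the size and deletetime columns carry; the function does no unit conversion.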