The task - determine file

...

deletion rate.

Focus - During DC24, file deletion rates were not adequate.

Approach -

  • Capture gateway log files for the DC24 period, in the first instance.

  • Extract file size log entries and deletion time log entries, either for all VOs or for a selected VO.

  • Correlate the file sizes and deletion times by the pathname of the file

  • Display (plot) or further analyse the correlated data.
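As a minimal sketch of the correlation step, assuming the size and deletion entries have already been extracted into rows keyed by pathname. The sample paths and values are invented for illustration, and Python is used here only for the sketch rather than the Awk and sqlite tooling described in this page:

```python
# Hypothetical extracted entries; real values would come from the gateway logs.
sizes = [
    ("/atlas/a.root", 1000),
    ("/atlas/b.root", 4000),
    ("/atlas/c.root", 2500),  # no deletion entry in this sample
]
deletes = [
    ("/atlas/a.root", 2.0),
    ("/atlas/b.root", 5.0),
]


def correlate(sizes, deletes):
    """Inner-join size and deletion-time entries on pathname."""
    # If a path appears more than once, the last deletion time wins.
    delete_by_path = dict(deletes)
    rows = []
    for path, size in sizes:
        deletetime = delete_by_path.get(path)
        # Assumption of this sketch: skip unmatched paths and zero times,
        # for which a rate cannot be computed.
        if deletetime is None or deletetime == 0:
            continue
        rows.append((path, size, deletetime, round(size / deletetime, 2)))
    return rows


correlated = correlate(sizes, deletes)
for row in correlated:
    print(row)
```

Only paths present in both inputs survive, which is the behaviour wanted when some files were written but never deleted during the period.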

Work breakdown -

  • Combine gateway log files for a given date (assuming at this point that the datestamp in the filename is reliable, and that the log entries cover a roughly equal span of time).

  • Write Awk scripts to extract file sizes into one file and deletion times into another.

    • File sizes look like:

      • path,date,size,writetime

    • File deletion times look like:

      • path,date,deletetime

  • Correlate these entries with a simple sqlite join query on the path:

    • .mode csv
      .headers on
      .import atlas-file-sizes.csv sizes
      .import atlas-delete-times.csv times
      select distinct sizes.path, size, deletetime, round(size*1.0/deletetime, 2) as "deleterate"
      from sizes inner join times using (path);

  • Running that query produces a table correlating file sizes with their deletion times, in the form:

    • path,size,deletetime,deleterate

  • This can be plotted with, e.g., gnuplot, or summarised on the command line with datamash.
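As a rough alternative for the summary step, the sketch below computes basic statistics over the deleterate column of the query output. It assumes the output has the header path,size,deletetime,deleterate shown above; the sample rows are invented, and the units of the rate follow whatever units the logs record for size and deletion time.

```python
import csv
import io
import statistics

# Hypothetical sample of the query output; real data would be read from a file.
SAMPLE = """path,size,deletetime,deleterate
/atlas/a.root,1000,2,500.0
/atlas/b.root,4000,5,800.0
/atlas/c.root,900,3,300.0
"""


def summarise(csv_text):
    """Return count, min, median, mean and max of the deleterate column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Skip rows with an empty rate (a zero deletetime yields NULL in SQLite).
    rates = [float(r["deleterate"]) for r in reader if r["deleterate"]]
    return {
        "count": len(rates),
        "min": min(rates),
        "median": statistics.median(rates),
        "mean": statistics.mean(rates),
        "max": max(rates),
    }


summary = summarise(SAMPLE)
print(summary)
```

The median is included alongside the mean because a few very large or very small files could skew the average rate.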