Profiler Callbacks

Throughput and Simple Profilers for fastai. Inspired by PyTorch Lightning’s SimpleProfiler.

Since fastxtend profilers change the fastai data loading loop, they are not included in any of the fastxtend all imports and must be imported separately:

from fastxtend.callback import profiler
Warning

The Throughput and Simple profilers are untested with distributed training.

Events

fastai callbacks do not have an event which is called directly before drawing a batch. fastxtend profilers add a new callback event called before_draw.

With a fastxtend profiler imported, a callback can implement actions on the following events:

  • after_create: called after the Learner is created.
  • before_fit: called before starting training or inference, ideal for initial setup.
  • before_epoch: called at the beginning of each epoch, useful for any behavior you need to reset at each epoch.
  • before_train: called at the beginning of the training part of an epoch.
  • before_draw: called at the beginning of each batch, just before drawing said batch.
  • before_batch: called at the beginning of each batch, just after drawing said batch. It can be used to do any setup necessary for the batch (like hyper-parameter scheduling) or to change the input/target before it goes in the model (change of the input with techniques like mixup for instance).
  • after_pred: called after computing the output of the model on the batch. It can be used to change that output before it’s fed to the loss.
  • after_loss: called after the loss has been computed, but before the backward pass. It can be used to add any penalty to the loss (AR or TAR in RNN training for instance).
  • before_backward: called after the loss has been computed, but only in training mode (i.e. when the backward pass will be used).
  • before_step: called after the backward pass, but before the update of the parameters. It can be used to do any change to the gradients before said update (gradient clipping for instance).
  • after_step: called after the step and before the gradients are zeroed.
  • after_batch: called at the end of a batch, for any clean-up before the next one.
  • after_train: called at the end of the training phase of an epoch.
  • before_validate: called at the beginning of the validation phase of an epoch, useful for any setup needed specifically for validation.
  • after_validate: called at the end of the validation part of an epoch.
  • after_epoch: called at the end of an epoch, for any clean-up before the next one.
  • after_fit: called at the end of training, for final clean-up.
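The sketch below illustrates why before_draw is useful: timing the gap between before_draw and before_batch isolates how long the loop waits on the data loader. It is a minimal pure-Python simulation, not fastai's or fastxtend's training loop; DrawTimer, run_batches, and slow_loader are hypothetical names, and only the relative order of before_draw, before_batch, and after_batch is taken from the list above.

```python
import time
from typing import Iterable, List


class DrawTimer:
    "Hypothetical callback: records event order and per-batch draw time."

    def __init__(self) -> None:
        self.events: List[str] = []
        self.draw_times: List[float] = []
        self._start = 0.0

    def before_draw(self) -> None:
        self.events.append("before_draw")
        self._start = time.perf_counter()

    def before_batch(self) -> None:
        self.events.append("before_batch")
        # Time spent waiting on the data loader for this batch
        self.draw_times.append(time.perf_counter() - self._start)

    def after_batch(self) -> None:
        self.events.append("after_batch")


def run_batches(loader: Iterable, cb: DrawTimer) -> None:
    "Simplified loop: before_draw fires before next() pulls a batch."
    it = iter(loader)
    while True:
        cb.before_draw()
        try:
            batch = next(it)
        except StopIteration:
            # In this sketch a final before_draw fires before exhaustion is detected
            break
        cb.before_batch()
        _ = sum(batch)  # stand-in for forward, loss, backward, and step
        cb.after_batch()


def slow_loader(n: int):
    "Generator that sleeps before yielding, simulating slow data loading."
    for i in range(n):
        time.sleep(0.01)
        yield [i, i + 1]


cb = DrawTimer()
run_batches(slow_loader(3), cb)
print(cb.events[:3])
print(cb.draw_times)
```

Each recorded draw time includes the simulated 10 ms loading delay, which is exactly the cost that would be invisible if timing started at before_batch.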

Throughput

The Throughput profiler measures only the step, draw, and batch actions. To use it, add both ThroughputCallback and ThroughputPostCallback to the Learner. The recommended way to do this is via Learner.profile.


ThroughputCallback

 ThroughputCallback (show_report:bool=True, plain:bool=False,
                     markdown:bool=False, save_csv:bool=False,
                     csv_name:str='throughput.csv',
                     rolling_average:int=10, drop_first_batch:bool=True)

Adds a throughput profiler to the fastai Learner. Optionally shows a formatted report or saves unformatted results as a CSV.

Pair with ThroughputPostCallback to profile training performance.

After fitting, access the report and results via Learner.profile_report and Learner.profile_results.

Parameter         Type  Default         Details
show_report       bool  True            Display a formatted report after profiling
plain             bool  False           For Jupyter Notebooks, display a plain report
markdown          bool  False           Display a Markdown-formatted report
save_csv          bool  False           Save raw results to CSV
csv_name          str   throughput.csv  CSV save location
rolling_average   int   10              Number of batches to average throughput over
drop_first_batch  bool  True            Drop the first batch from profiling
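As a rough illustration of rolling_average and drop_first_batch, the pure-Python sketch below drops the first (warm-up) batch and then reports samples/second as a moving average over the last rolling_average batches. The helper rolling_throughput is hypothetical, and both the simple-moving-average formula and the samples-divided-by-seconds definition are assumptions rather than fastxtend's exact implementation.

```python
from collections import deque
from typing import List


def rolling_throughput(durations: List[float], batch_size: int,
                       rolling_average: int = 10,
                       drop_first_batch: bool = True) -> List[float]:
    "Hypothetical helper: samples/second averaged over a sliding window."
    if drop_first_batch:
        durations = durations[1:]  # discard the warm-up batch
    window: deque = deque(maxlen=rolling_average)
    out: List[float] = []
    for d in durations:
        window.append(batch_size / d)  # per-batch samples/second
        out.append(sum(window) / len(window))
    return out


# The first batch is slow (warm-up); later batches take 0.1 s or 0.05 s
durs = [2.0, 0.1, 0.1, 0.1, 0.05]
print(rolling_throughput(durs, batch_size=64, rolling_average=2))
```

Dropping the first batch matters because it can include one-off costs, such as DataLoader worker start-up, that would otherwise skew both the mean and the rolling values.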

ThroughputPostCallback

 ThroughputPostCallback ()

Required companion to ThroughputCallback for profiling training performance. Removes itself once training finishes.
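The reason for a separate post callback is not spelled out here. One plausible reason, offered as an assumption, is callback ordering: fastai calls callbacks sorted by their order attribute, so a timer stopped in a high-order callback also captures the work done by every lower-order callback during the same event. The pure-Python sketch below, with hypothetical TimeLog, PreTimer, SlowCallback, and PostTimer classes and made-up order values, shows that bracketing effect.

```python
import time
from typing import List, Tuple


class TimeLog:
    "Shared log of (label, timestamp) pairs."
    def __init__(self) -> None:
        self.log: List[Tuple[str, float]] = []


class PreTimer:
    order = -100  # hypothetical: runs before other callbacks
    def __init__(self, tl: TimeLog) -> None: self.tl = tl
    def after_batch(self) -> None: self.tl.log.append(("pre", time.perf_counter()))


class SlowCallback:
    order = 0  # some other callback doing extra work in after_batch
    def after_batch(self) -> None: time.sleep(0.01)


class PostTimer:
    order = 100  # hypothetical: runs after other callbacks
    def __init__(self, tl: TimeLog) -> None: self.tl = tl
    def after_batch(self) -> None: self.tl.log.append(("post", time.perf_counter()))


def run_event(cbs, event: str) -> None:
    "Call `event` on each callback in ascending `order`."
    for cb in sorted(cbs, key=lambda c: c.order):
        getattr(cb, event)()


tl = TimeLog()
cbs = [PostTimer(tl), SlowCallback(), PreTimer(tl)]  # insertion order is irrelevant
run_event(cbs, "after_batch")
(pre_label, t_pre), (post_label, t_post) = tl.log
print(pre_label, post_label, t_post - t_pre)
```

Only the post-ordered timestamp includes SlowCallback's 10 ms, so a timer split across a low-order and a high-order callback measures the full cost of each event.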

Simple Profiler

To use the Simple Profiler, add both SimpleProfilerCallback and SimpleProfilerPostCallback to the Learner. The recommended way to do this is via Learner.profile.


SimpleProfilerCallback

 SimpleProfilerCallback (show_report:bool=True, plain:bool=False,
                         markdown:bool=False, save_csv:bool=False,
                         csv_name:str='simpleprofiler.csv',
                         rolling_average:int=10,
                         drop_first_batch:bool=True)

Adds a simple profiler to the fastai Learner. Optionally shows a formatted report or saves unformatted results as a CSV.

Pair with SimpleProfilerPostCallback to profile training performance.

After fitting, access the report and results via Learner.profile_report and Learner.profile_results.

Parameter         Type  Default             Details
show_report       bool  True                Display a formatted report after profiling
plain             bool  False               For Jupyter Notebooks, display a plain report
markdown          bool  False               Display a Markdown-formatted report
save_csv          bool  False               Save raw results to CSV
csv_name          str   simpleprofiler.csv  CSV save location
rolling_average   int   10                  Number of batches to average throughput over
drop_first_batch  bool  True                Drop the first batch from profiling
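With save_csv=True the profiler writes its raw, unformatted results to csv_name. The column layout of that file is not documented here, so the standard-library sketch below uses a hypothetical one (phase, action, duration, one row per recorded call) purely to show the difference between raw per-call timings and the aggregated report. results_to_csv and the sample data are illustrative only.

```python
import csv
import io
from typing import Dict, List


def results_to_csv(results: Dict[str, Dict[str, List[float]]], f) -> None:
    "Hypothetical layout: one row per recorded call, durations in seconds."
    writer = csv.writer(f)
    writer.writerow(["phase", "action", "duration"])
    for phase, actions in results.items():
        for action, durations in actions.items():
            for d in durations:
                writer.writerow([phase, action, d])


raw = {"train": {"draw": [0.004, 0.005], "batch": [0.082, 0.083]},
       "valid": {"draw": [0.015], "batch": [0.028]}}
buf = io.StringIO()  # use open(csv_name, "w", newline="") for a real file
results_to_csv(raw, buf)
rows = list(csv.reader(io.StringIO(buf.getvalue())))
print(rows[0], len(rows) - 1)
```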

SimpleProfilerPostCallback

 SimpleProfilerPostCallback ()

Required companion to SimpleProfilerCallback for profiling training performance. Removes itself once training finishes.

Convenience Method

Learner.profile is the easy and recommended way to use a fastxtend profiler.


ProfileMode

 ProfileMode (value, names=None, module=None, qualname=None, type=None,
              start=1)

Profile enum for Learner.profile
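ProfileMode's signature is the standard Python Enum functional-API signature, and the Learner.profile signature shows a Throughput member with the value 'throughput'. A minimal equivalent is sketched below as ProfileModeSketch, a hypothetical name; the 'simple' value for the Simple member is an assumption.

```python
from enum import Enum


class ProfileModeSketch(Enum):
    "Hypothetical stand-in for fastxtend's ProfileMode."
    Throughput = "throughput"
    Simple = "simple"  # assumed value


# Members can be looked up by value, e.g. from a config string
mode = ProfileModeSketch("throughput")
print(mode is ProfileModeSketch.Throughput, ProfileModeSketch.Simple.value)
```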


Learner.profile

 Learner.profile (mode:ProfileMode=ProfileMode.Throughput,
                  show_report:bool=True, plain:bool=False,
                  markdown:bool=False, save_csv:bool=False,
                  csv_name:str='profiler.csv', rolling_average:int=10,
                  drop_first_batch:bool=True)

Run a fastxtend profiler which removes itself when finished training.

Parameter         Type         Default                 Details
mode              ProfileMode  ProfileMode.Throughput  Which profiler to use: Throughput or Simple
show_report       bool         True                    Display a formatted report after profiling
plain             bool         False                   For Jupyter Notebooks, display a plain report
markdown          bool         False                   Display a Markdown-formatted report
save_csv          bool         False                   Save raw results to CSV
csv_name          str          profiler.csv            CSV save location
rolling_average   int          10                      Number of batches to average throughput over
drop_first_batch  bool         True                    Drop the first batch from profiling
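The usage examples chain .profile() onto the Learner constructor, so the method returns the Learner itself. The toy sketch below shows that pattern together with the remove-after-training behavior described above, assuming profile attaches a pre/post callback pair and the post callback detaches both in after_fit. MiniLearner, PreCB, PostCB, and Mode are hypothetical stand-ins, not fastai or fastxtend classes.

```python
from enum import Enum
from typing import List


class Mode(Enum):
    Throughput = "throughput"
    Simple = "simple"


class PreCB:
    def __init__(self, name: str) -> None:
        self.name = name


class PostCB:
    "Removes its paired callback and itself once fit is over."
    def __init__(self, pre: PreCB) -> None:
        self.pre = pre

    def after_fit(self, learn: "MiniLearner") -> None:
        learn.cbs.remove(self.pre)
        learn.cbs.remove(self)


class MiniLearner:
    "Toy Learner with just enough to show attach, fit, detach."
    def __init__(self) -> None:
        self.cbs: List[object] = []

    def profile(self, mode: Mode = Mode.Throughput) -> "MiniLearner":
        pre = PreCB(mode.value)
        self.cbs += [pre, PostCB(pre)]
        return self  # returning self enables Learner(...).profile() chaining

    def fit(self) -> None:
        # ... training would happen here ...
        for cb in list(self.cbs):  # iterate a copy: callbacks may remove themselves
            if hasattr(cb, "after_fit"):
                cb.after_fit(self)


learn = MiniLearner().profile(Mode.Simple)
print([type(cb).__name__ for cb in learn.cbs])
learn.fit()
print(learn.cbs)
```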

Output

The Simple Profiler report contains the following items, divided into three phases (Fit, Train, and Valid):

Fit:

  • fit: total time fitting the model takes.
  • epoch: duration of each epoch, covering both training and validation. Total epoch time is often nearly identical to the elapsed fit time.
  • train: duration of each training epoch.
  • valid: duration of each validation epoch.

Train:

  • step: total duration of all batch steps including drawing the batch. Measured from before_draw to after_batch.
  • draw: time spent waiting for a batch to be drawn. Measured from before_draw to before_batch. Ideally this value should be as close to zero as possible.
  • batch: total duration of all batch steps except drawing the batch. Measured from before_batch to after_batch.
  • forward: duration of the forward pass and any additional batch modifications. Measured from before_batch to after_pred.
  • loss: duration of calculating loss. Measured from after_pred to after_loss.
  • backward: duration of the backward pass. Measured from before_backward to before_step.
  • opt_step: duration of the optimizer step. Measured from before_step to after_step.
  • zero_grad: duration of the zero_grad step. Measured from after_step to after_batch.

Valid:

  • step: total duration of all batch steps including drawing the batch. Measured from before_draw to after_batch.
  • draw: time spent waiting for a batch to be drawn. Measured from before_draw to before_batch. Ideally this value should be as close to zero as possible.
  • batch: total duration of all batch steps except drawing the batch. Measured from before_batch to after_batch.
  • predict: duration of the prediction pass and any additional batch modifications. Measured from before_batch to after_pred.
  • loss: duration of calculating loss. Measured from after_pred to after_loss.

The Throughput profiler report contains only step, draw, and batch.
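Every action above is the difference between two event timestamps, so one batch's timestamps are enough to derive all of them; the Throughput profiler needs only before_draw, before_batch, and after_batch. The sketch below encodes the Train-phase definitions as (start, end) event pairs and applies them to hypothetical perf_counter readings. action_durations and the timestamp values are illustrative, and step equals draw plus batch by construction.

```python
from typing import Dict

# (start event, end event) for each Train-phase action, per the list above
TRAIN_ACTIONS = {
    "step": ("before_draw", "after_batch"),
    "draw": ("before_draw", "before_batch"),
    "batch": ("before_batch", "after_batch"),
    "forward": ("before_batch", "after_pred"),
    "loss": ("after_pred", "after_loss"),
    "backward": ("before_backward", "before_step"),
    "opt_step": ("before_step", "after_step"),
    "zero_grad": ("after_step", "after_batch"),
}


def action_durations(stamps: Dict[str, float]) -> Dict[str, float]:
    "Convert one batch's event timestamps (seconds) into action durations."
    return {action: stamps[end] - stamps[start]
            for action, (start, end) in TRAIN_ACTIONS.items()}


# Hypothetical timestamps for a single training batch, in seconds
stamps = {"before_draw": 0.000, "before_batch": 0.004, "after_pred": 0.021,
          "after_loss": 0.022, "before_backward": 0.022, "before_step": 0.041,
          "after_step": 0.086, "after_batch": 0.087}
d = action_durations(stamps)
print({k: round(v * 1000, 1) for k, v in d.items()})  # milliseconds
```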

Examples

These examples train on Imagenette with an image size of 224 and a batch size of 64 on an RTX 3080 Ti.

learn = Learner(dls, xresnext50(n_out=dls.c), opt_func=adam(foreach=True),
                metrics=Accuracy()).to_channelslast().profile()
learn.fit_one_cycle(2, 3e-3)
epoch  train_loss  valid_loss  accuracy  time
0      1.501953    1.734705    0.472357  00:18
1      1.040516    0.913281    0.712866  00:16
Profiling Results
Phase  Action     Mean Duration  Duration Std Dev  Number of Calls  Samples/Second  Total Time  Percent of Total
fit    fit        -              -                 1                -               35.63 s     100%
       epoch      17.81 s        838.2ms           2                -               35.63 s     100%
       train      14.24 s        797.1ms           2                678             28.49 s     80%
       valid      3.565 s        39.48ms           2                1,311           7.130 s     20%
train  step       86.62ms        41.67ms           293              739             25.38 s     71%
       draw       4.269ms        37.39ms           293              -38             1.251 s     4%
       batch      82.35ms        4.472ms           293              777             24.13 s     68%
valid  step       43.05ms        63.38ms           123              1,470           5.295 s     15%
       draw       14.46ms        60.89ms           123              -744            1.779 s     5%
       batch      28.59ms        11.42ms           123              2,214           3.516 s     10%
First batch dropped: the train and valid phases report one fewer batch than fit actually ran.
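The derived columns can be checked by hand against the train rows above. Assuming all 293 timed training batches are full batches of 64, Mean Duration is Total Time divided by Number of Calls, Samples/Second is total samples divided by Total Time, and Percent of Total is Total Time divided by fit time. The negative draw value matches step throughput minus batch throughput, i.e. the throughput lost to waiting on data. These relationships are inferred from the reported numbers, not read from fastxtend's source.

```python
# Figures copied from the train rows of the Throughput report above
batch_size, calls = 64, 293
fit_total = 35.63                       # seconds
step_total, batch_total = 25.38, 24.13  # seconds

samples = batch_size * calls            # assumes every batch is full
step_sps = samples / step_total
batch_sps = samples / batch_total
draw_sps = step_sps - batch_sps         # inferred: negative by construction

print(f"step mean {step_total / calls * 1000:.2f} ms")
print(f"step  {step_sps:.0f} samples/s, {step_total / fit_total:.0%} of fit")
print(f"batch {batch_sps:.0f} samples/s, {batch_total / fit_total:.0%} of fit")
print(f"draw  {draw_sps:.0f} samples/s")
```

These reproduce the table's 86.62ms, 739, 777, -38, 71%, and 68%.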
learn = Learner(dls, xresnext50(n_out=dls.c), opt_func=adam(foreach=True),
                metrics=Accuracy()).to_channelslast().profile(ProfileMode.Simple)
learn.fit_one_cycle(2, 3e-3)
epoch  train_loss  valid_loss  accuracy  time
0      1.497550    2.453694    0.428535  00:17
1      0.997146    0.888791    0.723057  00:17
Profiling Results
Phase  Action     Mean Duration  Duration Std Dev  Number of Calls  Samples/Second  Total Time  Percent of Total
fit    fit        -              -                 1                -               34.55 s     100%
       epoch      17.27 s        44.73ms           2                -               34.54 s     100%
       train      13.64 s        4.756ms           2                709             27.28 s     79%
       valid      3.629 s        48.68ms           2                1,291           7.259 s     21%
train  step       87.64ms        44.58ms           293              730             25.68 s     74%
       draw       4.428ms        39.70ms           293              -39             1.297 s     4%
       batch      83.22ms        6.353ms           293              769             24.38 s     71%
       forward    16.65ms        5.732ms           293              3,843           4.880 s     14%
       loss       771.3µs        196.1µs           293              82,977          226.0ms     1%
       backward   19.10ms        5.501ms           293              3,351           5.597 s     16%
       opt_step   45.46ms        5.934ms           293              1,408           13.32 s     39%
       zero_grad  1.106ms        298.9µs           293              -               324.1ms     1%
valid  step       43.94ms        67.12ms           123              1,441           5.404 s     16%
       draw       15.77ms        63.35ms           123              -807            1.940 s     6%
       batch      28.16ms        11.90ms           123              2,248           3.464 s     10%
       predict    26.60ms        11.17ms           123              2,379           3.272 s     9%
       loss       1.353ms        1.795ms           123              46,800          166.4ms     0%
First batch dropped: the train and valid phases report one fewer batch than fit actually ran.
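The Simple Profiler's finer-grained train rows can be checked the same way. Since forward, loss, backward, opt_step, and zero_grad are consecutive intervals inside batch, their mean durations should sum to slightly less than the batch mean, the remainder being time that no action covers (for example, between after_loss and before_backward). The arithmetic below uses the report's own figures; the dictionary layout is just for illustration.

```python
# Mean durations (ms) from the train rows of the Simple Profiler report above
train = {"draw": 4.428, "batch": 83.22, "step": 87.64, "forward": 16.65,
         "loss": 0.7713, "backward": 19.10, "opt_step": 45.46,
         "zero_grad": 1.106}

parts = ("forward", "loss", "backward", "opt_step", "zero_grad")
covered = sum(train[p] for p in parts)
print(f"sum of sub-actions: {covered:.2f} ms vs batch {train['batch']} ms")
print(f"uncovered gap: {train['batch'] - covered:.2f} ms")
print(f"draw + batch: {train['draw'] + train['batch']:.2f} ms vs step {train['step']} ms")

# Largest single share of each training batch
largest = max(parts, key=train.get)
print(largest, f"{train[largest] / train['batch']:.0%} of batch")
```

In this run opt_step accounts for over half of each training batch, while draw adds only a few milliseconds on top of batch.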

New Training Loop

The show_training_loop output below shows where the new before_draw event fits into the training loop.

learn = synth_learner()
learn.show_training_loop()
Start Fit
   - before_fit     : [TrainEvalCallback, Recorder, ProgressCallback]
  Start Epoch Loop
     - before_epoch   : [Recorder, ProgressCallback]
    Start Train
       - before_train   : [TrainEvalCallback, Recorder, ProgressCallback]
      Start Batch Loop
         - before_draw    : []
         - before_batch   : [CastToTensor]
         - after_pred     : []
         - after_loss     : []
         - before_backward: []
         - before_step    : []
         - after_step     : []
         - after_cancel_batch: []
         - after_batch    : [TrainEvalCallback, Recorder, ProgressCallback]
      End Batch Loop
    End Train
     - after_cancel_train: [Recorder]
     - after_train    : [Recorder, ProgressCallback]
    Start Valid
       - before_validate: [TrainEvalCallback, Recorder, ProgressCallback]
      Start Batch Loop
         - **CBs same as train batch**: []
      End Batch Loop
    End Valid
     - after_cancel_validate: [Recorder]
     - after_validate : [Recorder, ProgressCallback]
  End Epoch Loop
   - after_cancel_epoch: []
   - after_epoch    : [Recorder]
End Fit
 - after_cancel_fit: []
 - after_fit      : [ProgressCallback]

Logging

Profiler callbacks support logging to Weights & Biases and TensorBoard via the LogDispatch callback. If either fastai.callback.wandb.WandbCallback or fastai.callback.tensorboard.TensorBoardCallback is added to the Learner, the profiler automatically logs samples/second for draw, batch, forward, loss, backward, and opt_step.

If Weights & Biases is installed, the Simple Profiler also logs two tables to the active wandb run:

  • profile_report: formatted report from Simple Profiler
  • profile_results: raw results from Simple Profiler