Population Based Training

Population Based Training (PBT) allows you to train your models in a smarter way. It not only finds the best set of hyperparameters, it is also able to find the best hyperparameter schedule during training. For instance, a fixed learning rate is often suboptimal, so PBT helps you find out when and how you should change your learning rate.

If you want to use Population Based Training with Schedy, you only need to know the following:

PBT is an improvement over random search: it is able to focus on the most promising jobs using two strategies:

  • An exploit strategy, in which the least promising jobs are thrown away, and replaced by copies of the most promising ones. This allows you not to waste resources on the wrong jobs.
  • An explore strategy, that tries new values for the hyperparameters of the most promising jobs during training. For instance, this is what allows you to find the optimal learning rate schedule of a neural network.
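
The interplay between the two strategies can be sketched in plain Python (a toy illustration, not the Schedy API; the member layout, the 20% cutoff, and the function names are made up for this sketch):

```python
import random

def pbt_step(population, explore, truncate_frac=0.2):
    """One exploit/explore pass over a list of (score, hyperparams) members."""
    ranked = sorted(population, key=lambda member: member[0], reverse=True)
    cutoff = max(1, int(len(ranked) * truncate_frac))
    survivors = ranked[:len(ranked) - cutoff]  # drop the bottom slice
    top = ranked[:cutoff]
    for _ in range(cutoff):
        # Exploit: replace a discarded member with a copy of a promising one.
        score, params = random.choice(top)
        # Explore: tweak the copied hyperparameters before resuming training.
        survivors.append((score, explore(params)))
    return survivors

population = [(random.random(), {'learning_rate': 1e-3}) for _ in range(20)]
population = pbt_step(population, explore=lambda params: dict(params))
```

In a real run, each member's score would come from partially training its model, and the loop would repeat until convergence.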

An example using PBT to fine-tune an image recognition neural network can be found in our GitHub repository.

Creating an experiment

An experiment using Population Based Training can be created this way:

import schedy
db = schedy.SchedyDB()
experiment = schedy.PopulationBasedTraining(
    'MNIST with PBT',
    schedy.pbt.MAXIMIZE,
    'max_accuracy',
    exploit=schedy.pbt.Truncate(),
    explore={
        'learning_rate': schedy.pbt.Perturb(),
        'dropout_rate': schedy.pbt.Perturb(),
    },
    initial_distributions={
        'num_layers': schedy.random.Choice(range(1, 10)),
        'activations': schedy.random.Choice(['relu', 'tanh']),
        'kernel_size': schedy.random.Choice([3, 5, 7]),
        'num_filters': schedy.random.Choice([2, 4, 8, 16, 32, 64, 128, 256, 512]),
        'learning_rate': schedy.random.LogUniform(1e-6, 1e-1),
        'dropout_rate': schedy.random.Uniform(0.0, 0.8),
    },
    population_size=20,
)
db.add_experiment(experiment)

The first argument (MNIST with PBT) is the name of the experiment.

The second argument tells Schedy that we are trying to maximize (schedy.pbt.MAXIMIZE) the result specified in the third argument, the max_accuracy obtained by the network.

The argument called exploit tells Schedy that we are using the Truncate strategy to exploit results (i.e. if a job scored in the bottom 20%, replace it with a copy of a job from the top 20%; see schedy.pbt.Truncate).

The argument called explore tells Schedy that we are using the Perturb strategy to explore the learning rate and the dropout rate. This strategy multiplies the values of these hyperparameters by a random number (see schedy.pbt.Perturb).
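
Conceptually, Perturb does something like the following (a toy sketch; the 0.8–1.2 factor range is an assumption for illustration, not necessarily what schedy.pbt.Perturb uses):

```python
import random

def perturb(value, min_factor=0.8, max_factor=1.2):
    # Multiply the hyperparameter by a random factor in [min_factor, max_factor].
    return value * random.uniform(min_factor, max_factor)

new_lr = perturb(1e-3)
```

Multiplicative perturbation (rather than additive) keeps the value on the same order of magnitude, which suits hyperparameters like the learning rate that are usually tuned on a log scale.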

Remember that exploration modifies the values of your hyperparameters during training, so you should only use it when that makes sense. For instance, it is possible to change the learning rate while training (it does not change the model itself), but it is not possible to change the number of layers (it usually does not make sense to create/remove weights while training).

Using schedy.pbt.Truncate as your exploit strategy, and schedy.pbt.Perturb as your explore strategy is usually a sensible default.

The argument called initial_distributions tells Schedy how to pick values for the initial jobs, as those are basically created using random search. The available distributions are the same as the ones used for random search. The next argument, population_size, specifies the number of initial jobs that should be created before starting to exploit/explore.
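
The initial jobs are therefore sampled much as in random search. Mimicking the distributions above in plain Python (a sketch, not the schedy.random implementation):

```python
import math
import random

def log_uniform(low, high):
    # Sample so that the logarithm of the value is uniformly distributed,
    # which is what a LogUniform distribution provides.
    return math.exp(random.uniform(math.log(low), math.log(high)))

# One randomly sampled initial job, mirroring the experiment definition above.
initial_job = {
    'num_layers': random.choice(range(1, 10)),
    'activations': random.choice(['relu', 'tanh']),
    'learning_rate': log_uniform(1e-6, 1e-1),
    'dropout_rate': random.uniform(0.0, 0.8),
}
```

With population_size=20, twenty such jobs would be drawn before Schedy starts exploiting and exploring.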

Note: Specifying the population size and the initial distributions is optional. You can also create the initial jobs by hand, using schedy push in the command line or schedy.Experiment.add_job() . This allows you to choose the initial value of your hyperparameters by hand, instead of using random search.

Creating the worker

Creating a worker that will work efficiently with PBT requires a few more steps than other experiment types (e.g. Random Search).

As a reminder, when using random search, the basic worker followed these steps:

import schedy

db = schedy.SchedyDB()
experiment = db.get_experiment('MyExperiment')
with experiment.next_job() as job:
    model = create_model(job)
    train(model) # Full training until convergence
    job.results = evaluate(model)

When using PBT, you should be doing something along those lines instead:

import schedy

db = schedy.SchedyDB()
experiment = db.get_experiment('MyExperiment')
with experiment.next_job() as job:
    model = create_model(job)
    if 'model_path' in job.results:
        model.load(job.results['model_path'])
    partial_train(model) # Partial training for a limited amount of time
    job.results = evaluate(model)
    model_save_path = 'dump_dir/' + job.id + '.mdl'
    model.save(model_save_path)
    job.results['model_path'] = model_save_path

For every job it receives, the worker follows these three simple steps:

  • Try to reload the model if it exists
  • Train the model a bit more
  • Save the model

As you can see, instead of training the model until convergence, you should only train it for a limited amount of time (e.g. five epochs, 30 minutes…). You should then save your model to a location that can be accessed by all workers (here we suppose that all workers have access to the dump_dir directory, and we save the model as dump_dir/<job_id>.mdl). You should also record the location of your model in the job’s results.

The reason for this is that Schedy might later ask another worker to resume the work on your job, by copying the job’s hyperparameters and results to a new job and sending it to a new worker. This is why the worker starts by checking whether there is a result called model_path; if there is, it reloads the weights from that location.

Everything else is handled by Schedy. All you need to do is to reload the model if it exists, to train it a bit more, then to save it.

We provide examples here, and a more detailed description of the PBT experiments in the API reference, here and here.