jdit.trainer

SupTrainer

class jdit.trainer.SupTrainer(nepochs: int, logdir: str, gpu_ids_abs: Union[list, tuple] = ())[source]

This is the base class of all trainers.

It defines:

  • The basic tools: Performance(), Watcher() and Loger().
  • The basic loop over epochs.
  • Learning rate decay and model checkpointing.

debug()[source]

Debug the trainer.

It checks the following methods:

  • self._record_configs() saves the configuration of every module.
  • self.train_epoch() trains one epoch on only a few samples, so it is very fast.
  • self.valid_epoch() validates one epoch using dataset_valid.
  • self._change_lr() changes the learning rate.
  • self._check_point() saves a model checkpoint.
  • self.test() runs the test using dataset_test.

Before debugging, it resets the datasets and picks only a few samples for a fast test. For this test, it builds a log_debug directory to save the log.

Returns: bool. True if all the checks pass.
dist_train(process_bar_header: str = None, process_bar_position: int = None, subbar_disable=False, record_configs=True, show_network=False, **kwargs)[source]

The main training loop of epochs.

Parameters:
  • process_bar_header – The tag name of the progress bar header, which is used in tqdm(desc=process_bar_header).
  • process_bar_position – The position of the progress bar, which is used in tqdm(position=process_bar_position). It is useful when running multiple tasks.
  • subbar_disable – Whether to disable the sub-bar that shows the info of every training step.
  • record_configs – Whether to record the training process data.
  • show_network – Whether to show the structure of the network. This costs extra memory.
  • kwargs – Any other parameters passed to tqdm() to control the behavior of the progress bar.
get_data_from_batch(batch_data: list, device: torch.device)[source]

Split one batch of data into separate variables. If your dataset returns something like

return input_data, label

it means that two values need to be unpacked. So, you need to split the batch data into two parts, like this

input, ground_truth = batch_data[0], batch_data[1]

Caution

Don’t forget to move the data to the device by using input.to(device).

Parameters:
  • batch_data – One batch of data from the dataloader.
  • device – The device on which the data will be located.
Returns:

The unpacked variables, located on the correct device.

Example:

# load and unzip the data from one batch tuple (input, ground_truth)
input, ground_truth = batch_data[0], batch_data[1]
# move these data to device
return input.to(device), ground_truth.to(device)
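
The same contract can be tried out without PyTorch. The following is only a minimal sketch: FakeTensor is a hypothetical stand-in for torch.Tensor that records the device it was moved to, and it is not part of jdit or PyTorch.

```python
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor; it only tracks its device."""

    def __init__(self, values, device="cpu"):
        self.values = values
        self.device = device

    def to(self, device):
        # Return a new object located on `device`, leaving the original as is.
        return FakeTensor(self.values, device)


def get_data_from_batch(batch_data, device):
    # The dataset yields (input_data, label), so two values are unpacked.
    input_data, ground_truth = batch_data[0], batch_data[1]
    # Move both parts to the target device before returning them.
    return input_data.to(device), ground_truth.to(device)


batch = [FakeTensor([0.1, 0.2]), FakeTensor([1])]
inputs, labels = get_data_from_batch(batch, "cuda:0")
print(inputs.device, labels.device)
```

With real tensors the method body stays the same; only the batch contents and the device object differ.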
plot_graphs_lazy()[source]

Plot the model graphs on TensorBoard. It plots the graphs of all models in the trainer, using each variable name as the model name.

Returns:
train(process_bar_header: str = None, process_bar_position: int = None, subbar_disable=False, record_configs=True, show_network=False, **kwargs)[source]

The main training loop of epochs.

Parameters:
  • process_bar_header – The tag name of the progress bar header, which is used in tqdm(desc=process_bar_header).
  • process_bar_position – The position of the progress bar, which is used in tqdm(position=process_bar_position). It is useful when running multiple tasks.
  • subbar_disable – Whether to disable the sub-bar that shows the info of every training step.
  • record_configs – Whether to record the training process data.
  • show_network – Whether to show the structure of the network. This costs extra memory.
  • kwargs – Any other parameters passed to tqdm() to control the behavior of the progress bar.
train_epoch(subbar_disable=False)[source]

Get the training loader and loop over it to process the data.

Caution

You must record your training step in self.step inside your loop, for example with self.step += 1.

Example:

for iteration, batch in tqdm(enumerate(self.datasets.loader_train, 1)):
    self.step += 1
    self.input_cpu, self.ground_truth_cpu = self.get_data_from_batch(batch, self.device)
    self._train_iteration(self.opt, self.compute_loss, tag="Train")
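
The effect of the counter can be seen in a small sketch. StepCounter is a hypothetical skeleton, not a jdit class; it assumes, as the logging calls elsewhere in this reference suggest, that self.step is a global counter that keeps growing across epochs.

```python
class StepCounter:
    """Hypothetical trainer skeleton that only tracks the global step."""

    def __init__(self, batches_per_epoch):
        self.step = 0
        self.loader_train = range(batches_per_epoch)

    def train_epoch(self):
        for _batch in self.loader_train:
            # The counter is global: it is never reset between epochs.
            self.step += 1


trainer = StepCounter(batches_per_epoch=5)
for _epoch in range(3):
    trainer.train_epoch()
print(trainer.step)  # 15
```

If the increment is forgotten, every logged value would be written at step 0.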
Returns:

Single Model Trainer

SupSingleModelTrainer

class jdit.trainer.SupSingleModelTrainer(logdir, nepochs, gpu_ids_abs, net: jdit.model.Model, opt: jdit.optimizer.Optimizer, datasets: jdit.dataset.DataLoadersFactory)[source]

This is a Single Model Trainer. It means you only have one model.

input, ground_truth
output = model(input)
loss(output, ground_truth)
compute_loss() -> (torch.Tensor, dict)[source]

Rewrite this method to compute your own loss. Use self.input, self.output and self.ground_truth to compute the loss. Return the loss in the first position. In the second position, you can return a dict of the losses that you want to visualize.

Example:

var_dic = {}
var_dic["LOSS"] = loss = (self.output ** 2 - self.ground_truth ** 2) ** 0.5
return loss, var_dic
compute_valid() → dict[source]

Rewrite this method to compute your validation values. Use self.input, self.output and self.ground_truth to compute valid loss. You can return a dict of validation values that you want to visualize.

Example:

# It does the same thing as ``compute_loss()``, but keeps only the dict.
_, var_dic = self.compute_loss()
return var_dic
get_data_from_batch(batch_data: list, device: torch.device)[source]

Load and wrap data from the data loader.

Split one batch of data into separate variables.

Example:

# batch_data like this [input_Data, ground_truth_Data]
input_cpu, ground_truth_cpu = batch_data[0], batch_data[1]
# then move them to device and return them
return input_cpu.to(self.device), ground_truth_cpu.to(self.device)
Parameters:
  • batch_data – one batch data load from DataLoader
  • device – A device variable. torch.device
Returns:

input Tensor, ground_truth Tensor

train_epoch(subbar_disable=False)[source]

Get the training loader and loop over it to process the data.

Caution

You must record your training step in self.step inside your loop, for example with self.step += 1.

Example:

for iteration, batch in tqdm(enumerate(self.datasets.loader_train, 1)):
    self.step += 1
    self.input_cpu, self.ground_truth_cpu = self.get_data_from_batch(batch, self.device)
    self._train_iteration(self.opt, self.compute_loss, tag="Train")
Returns:
valid_epoch()[source]

Validate model each epoch.

It is called once per epoch, after training finishes. So, do some verification here.

Example:

avg_dic: dict = {}
self.net.eval()
for iteration, batch in enumerate(self.datasets.loader_valid, 1):
    self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
    with torch.no_grad():
        self.output = self.net(self.input)
        dic: dict = self.compute_valid()
    if avg_dic == {}:
        avg_dic: dict = dic
    else:
        for key in dic.keys():
            avg_dic[key] += dic[key]

for key in avg_dic.keys():
    avg_dic[key] = avg_dic[key] / self.datasets.nsteps_valid

self.watcher.scalars(avg_dic, self.step, tag="Valid")
self.loger.write(self.step, self.current_epoch, avg_dic, "Valid", header=self.step <= 1)
self._watch_images(tag="Valid")
self.net.train()
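
The averaging part of this example can be isolated as a small sketch. It uses plain floats instead of tensors, and average_over_batches is a hypothetical helper name, not a jdit function.

```python
def average_over_batches(batch_dicts, nsteps_valid):
    """Sum each key over all batches, then divide by the number of steps."""
    avg_dic = {}
    for dic in batch_dicts:
        if not avg_dic:
            # Copy the first batch so the caller's dict is not mutated.
            avg_dic = dict(dic)
        else:
            for key in dic:
                avg_dic[key] += dic[key]
    for key in avg_dic:
        avg_dic[key] = avg_dic[key] / nsteps_valid
    return avg_dic


batches = [
    {"LOSS": 0.5, "ACC": 0.5},
    {"LOSS": 0.25, "ACC": 0.75},
    {"LOSS": 0.75, "ACC": 1.0},
]
result = average_over_batches(batches, nsteps_valid=len(batches))
print(result)
```

Every batch has to return the same keys from compute_valid(), otherwise the accumulation raises a KeyError.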

ClassificationTrainer

class jdit.trainer.ClassificationTrainer(logdir, nepochs, gpu_ids, net, opt, datasets, num_class)[source]

This is a classification trainer.

compute_loss()[source]

Compute the main loss and observed values.

Compute the loss and other values shown in tensorboard scalars visualization. You should return a main loss for doing backward propagation.

If you want some values visualized, make a dict() whose keys are the variables’ names. The training logic is:

self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
self.output = self.net(self.input)
self._train_iteration(self.opt, self.compute_loss, csv_filename="Train")

So, you have self.net, self.input, self.output and self.ground_truth to compute your own loss here.

Note

Only the main loss, which is the first returned value, is back-propagated. If you have a joint loss, add its parts up and return one main loss.

Note

The variables in the returned dict() are never back-propagated, even though the model is in model.train() mode. However, gradients are still computed for them, because torch.autograd.no_grad() is not used. So, you can compute any gradient-based variables for visualization.

Example:

var_dic = {}
labels = self.ground_truth.squeeze().long()
var_dic["CEP"] = loss = nn.CrossEntropyLoss()(self.output, labels)
return loss, var_dic
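
The note about joint losses can be illustrated with a small sketch. Plain floats stand in for tensors here, and the term names TASK and REG as well as the weight parameter are hypothetical, chosen only for illustration.

```python
def compute_loss(task_loss, regularization, weight=0.1):
    """Combine two hypothetical loss terms into one main loss plus a dict."""
    var_dic = {}
    # The individual terms go into the dict for visualization only.
    var_dic["TASK"] = task_loss
    var_dic["REG"] = regularization
    # The summed value is returned first, so it is the one back-propagated.
    var_dic["TOTAL"] = loss = task_loss + weight * regularization
    return loss, var_dic


loss, var_dic = compute_loss(0.5, 2.0, weight=0.25)
print(loss, var_dic)
```

Returning the sum first keeps a single backward pass per iteration, while the dict still exposes each term separately.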
compute_valid()[source]

Compute the valid_epoch variables for visualization.

Compute the validation values. They are only used in the TensorBoard scalars visualization. If you want some variables visualized, make a dict() whose keys are the variables’ names. You have self.net, self.input, self.output and self.ground_truth to compute your own validation values here.

Note

The variables in the returned dict() are never back-propagated, and the model is in model.eval() mode. However, gradients are still computed for them, because torch.autograd.no_grad() is not used. So, you can compute some gradient-based variables for visualization.

Example:

var_dic = {}
labels = self.ground_truth.squeeze().long()
var_dic["CEP"] = nn.CrossEntropyLoss()(self.output, labels)
return var_dic
get_data_from_batch(batch_data, device)[source]

If you need different behavior, rewrite this method and the method self.train_epoch().

Parameters:
  • batch_data – A batch of tensors loaded from the dataset.
  • device – The compute device.
Returns:

Tensors.

valid_epoch()[source]

Validate model each epoch.

It is called once per epoch, after training finishes. So, do some verification here.

Example:

avg_dic: dict = {}
self.net.eval()
for iteration, batch in enumerate(self.datasets.loader_valid, 1):
    self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
    with torch.no_grad():
        self.output = self.net(self.input)
        dic: dict = self.compute_valid()
    if avg_dic == {}:
        avg_dic: dict = dic
    else:
        for key in dic.keys():
            avg_dic[key] += dic[key]

for key in avg_dic.keys():
    avg_dic[key] = avg_dic[key] / self.datasets.nsteps_valid

self.watcher.scalars(avg_dic, self.step, tag="Valid")
self.loger.write(self.step, self.current_epoch, avg_dic, "Valid", header=self.step <= 1)
self._watch_images(tag="Valid")
self.net.train()

AutoEncoderTrainer

class jdit.trainer.AutoEncoderTrainer(logdir, nepochs, gpu_ids, net, opt, datasets)[source]

This is an autoencoder trainer for image-to-image tasks.

compute_loss()[source]

Compute the main loss and observed values.

Compute the loss and other values shown in tensorboard scalars visualization. You should return a main loss for doing backward propagation.

If you want some values visualized, make a dict() whose keys are the variables’ names. The training logic is:

self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
self.output = self.net(self.input)
self._train_iteration(self.opt, self.compute_loss, csv_filename="Train")

So, you have self.net, self.input, self.output and self.ground_truth to compute your own loss here.

Note

Only the main loss, which is the first returned value, is back-propagated. If you have a joint loss, add its parts up and return one main loss.

Note

The variables in the returned dict() are never back-propagated, even though the model is in model.train() mode. However, gradients are still computed for them, because torch.autograd.no_grad() is not used. So, you can compute any gradient-based variables for visualization.

Example:

var_dic = {}
var_dic["CEP"] = loss = nn.MSELoss(reduction="mean")(self.output, self.ground_truth)
return loss, var_dic
compute_valid()[source]

Compute the valid_epoch variables for visualization.

Compute the variables you care about. They are only used in the TensorBoard scalars visualization. If you want some variables visualized, make a dict() whose keys are the variables’ names.

Note

The variables in the returned dict() are never back-propagated, and the model is in model.eval() mode. However, gradients are still computed for them, because torch.autograd.no_grad() is not used. So, you can compute some gradient-based variables for visualization.

Example:

var_dic = {}
var_dic["CEP"] = loss = nn.MSELoss(reduction="mean")(self.output, self.ground_truth)
return var_dic
get_data_from_batch(batch_data, device)[source]

If you need different behavior, rewrite this method and the method self.train_epoch().

Parameters:
  • batch_data – A batch of tensors loaded from the dataset.
  • device – The compute device.
Returns:

Tensors.

valid_epoch()[source]

Validate model each epoch.

It is called once per epoch, after training finishes. So, do some verification here.

Example:

avg_dic: dict = {}
self.net.eval()
for iteration, batch in enumerate(self.datasets.loader_valid, 1):
    self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
    with torch.no_grad():
        self.output = self.net(self.input)
        dic: dict = self.compute_valid()
    if avg_dic == {}:
        avg_dic: dict = dic
    else:
        for key in dic.keys():
            avg_dic[key] += dic[key]

for key in avg_dic.keys():
    avg_dic[key] = avg_dic[key] / self.datasets.nsteps_valid

self.watcher.scalars(avg_dic, self.step, tag="Valid")
self.loger.write(self.step, self.current_epoch, avg_dic, "Valid", header=self.step <= 1)
self._watch_images(tag="Valid")
self.net.train()

Generative Adversarial Networks Trainer

SupGanTrainer

class jdit.trainer.SupGanTrainer(logdir, nepochs, gpu_ids_abs, netG: jdit.model.Model, netD: jdit.model.Model, optG: jdit.optimizer.Optimizer, optD: jdit.optimizer.Optimizer, datasets: jdit.dataset.DataLoadersFactory)[source]
compute_d_loss() -> (torch.Tensor, dict)[source]

Rewrite this method to compute your own Discriminator loss.

Return the loss in the first position. In the second position, you can return a dict of the losses that you want to visualize.

Example:

d_fake = self.netD(self.fake.detach())
d_real = self.netD(self.ground_truth)
var_dic = {}
var_dic["GP"] = gp = gradPenalty(self.netD, self.ground_truth, self.fake, input=self.input,
                                 use_gpu=self.use_gpu)
var_dic["WD"] = w_distance = (d_real.mean() - d_fake.mean()).detach()
var_dic["LOSS_D"] = loss_d = d_fake.mean() - d_real.mean() + gp
return loss_d, var_dic
compute_g_loss() -> (torch.Tensor, dict)[source]

Rewrite this method to compute your own Generator loss.

Return the loss in the first position. In the second position, you can return a dict of the losses that you want to visualize.

Example:

d_fake = self.netD(self.fake)
var_dic = {}
var_dic["JC"] = jc = jcbClamp(self.netG, self.input, use_gpu=self.use_gpu)
var_dic["LOSS_G"] = loss_g = -d_fake.mean() + jc
return loss_g, var_dic
compute_valid() → dict[source]

Rewrite this method to compute your validation values.

You can return a dict of validation values that you want to visualize.

Example:

# It will do the same thing as ``compute_g_loss()`` and ``self.compute_d_loss()``
g_loss, _ = self.compute_g_loss()
d_loss, _ = self.compute_d_loss()
var_dic = {"LOSS_D": d_loss, "LOSS_G": g_loss}
return var_dic
d_turn = 1

The number of Discriminator training steps per Generator training step.
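
A minimal sketch of the resulting schedule, assuming the condition (self.step % self.d_turn) == 0 from the training logic documented for the GAN trainers in this reference. training_schedule is a hypothetical helper, not a jdit function.

```python
def training_schedule(nsteps, d_turn=1):
    """List which networks are updated at each step (1-based, like self.step)."""
    schedule = []
    for step in range(1, nsteps + 1):
        updated = ["D"]  # the Discriminator is updated on every step
        if step % d_turn == 0:
            updated.append("G")  # the Generator only on every d_turn-th step
        schedule.append((step, updated))
    return schedule


for step, updated in training_schedule(nsteps=6, d_turn=3):
    print(step, updated)
```

With the default d_turn = 1, both networks are updated on every step.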

get_data_from_batch(batch_data: list, device: torch.device)[source]

Load and wrap data from the data loader.

Split one batch of data into separate variables.

Example:

# batch_data like this [input_Data, ground_truth_Data]
input_cpu, ground_truth_cpu = batch_data[0], batch_data[1]
# then move them to device and return them
return input_cpu.to(self.device), ground_truth_cpu.to(self.device)
Parameters:
  • batch_data – one batch data load from DataLoader
  • device – A device variable. torch.device
Returns:

input Tensor, ground_truth Tensor

train_epoch(subbar_disable=False)[source]

Get the training loader and loop over it to process the data.

Caution

You must record your training step in self.step inside your loop, for example with self.step += 1.

Example:

for iteration, batch in tqdm(enumerate(self.datasets.loader_train, 1)):
    self.step += 1
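    self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
    self.fake = self.netG(self.input)
    self._train_iteration(self.optD, self.compute_d_loss, csv_filename="Train_D")
    if (self.step % self.d_turn) == 0:
        self._train_iteration(self.optG, self.compute_g_loss, csv_filename="Train_G")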
Returns:
valid_epoch()[source]

Validate model each epoch.

It is called once per epoch, after training finishes. So, do some verification here.

Example:

avg_dic: dict = {}
self.netG.eval()
self.netD.eval()
# Load data from loader_valid.
for iteration, batch in enumerate(self.datasets.loader_valid, 1):
    self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
    with torch.no_grad():
        self.fake = self.netG(self.input)
        # You can write this function to apply your computation.
        dic: dict = self.compute_valid()
    if avg_dic == {}:
        avg_dic: dict = dic
    else:
        for key in dic.keys():
            avg_dic[key] += dic[key]

for key in avg_dic.keys():
    avg_dic[key] = avg_dic[key] / self.datasets.nsteps_valid

self.watcher.scalars(avg_dic, self.step, tag="Valid")
self._watch_images(tag="Valid")
self.netG.train()
self.netD.train()

Pix2pixGanTrainer

class jdit.trainer.Pix2pixGanTrainer(logdir, nepochs, gpu_ids_abs, netG, netD, optG, optD, datasets)[source]
compute_d_loss()[source]

Rewrite this method to compute your own Discriminator loss.

Return the loss in the first position. In the second position, you can return a dict of the losses that you want to visualize. The training logic is:

self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
self.fake = self.netG(self.input)
self._train_iteration(self.optD, self.compute_d_loss, csv_filename="Train_D")
if (self.step % self.d_turn) == 0:
    self._train_iteration(self.optG, self.compute_g_loss, csv_filename="Train_G")

So, you can use self.input, self.ground_truth, self.fake, self.netG and self.optD to compute the loss.

Example:

d_fake = self.netD(self.fake.detach())
d_real = self.netD(self.ground_truth)
var_dic = {}
var_dic["LS_LOSSD"] = loss_d = 0.5 * (torch.mean((d_real - 1) ** 2) + torch.mean(d_fake ** 2))
return loss_d, var_dic
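
To make the least-squares formula concrete, here is a small sketch on plain Python lists instead of tensors. It mirrors the Discriminator expression above and the matching Generator expression 0.5 * mean((d_fake - 1) ** 2); the helper names mean, ls_d_loss and ls_g_loss are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def ls_d_loss(d_real, d_fake):
    # Push scores on real samples toward 1 and scores on fakes toward 0.
    real_term = mean([(r - 1) ** 2 for r in d_real])
    fake_term = mean([f ** 2 for f in d_fake])
    return 0.5 * (real_term + fake_term)


def ls_g_loss(d_fake):
    # The Generator loss is lowest when fakes are scored close to 1.
    return 0.5 * mean([(f - 1) ** 2 for f in d_fake])


print(ls_d_loss(d_real=[1.0, 1.0], d_fake=[0.0, 0.0]))  # 0.0
print(ls_g_loss(d_fake=[0.0, 0.0]))  # 0.5
print(ls_g_loss(d_fake=[1.0, 1.0]))  # 0.0
```

The two losses pull the Discriminator score of a fake sample in opposite directions, which is the adversarial part of the setup.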
compute_g_loss()[source]

Rewrite this method to compute your own Generator loss.

Return the loss in the first position. In the second position, you can return a dict of the losses that you want to visualize. The training logic is:

self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
self.fake = self.netG(self.input)
self._train_iteration(self.optD, self.compute_d_loss, csv_filename="Train_D")
if (self.step % self.d_turn) == 0:
    self._train_iteration(self.optG, self.compute_g_loss, csv_filename="Train_G")

So, you can use self.input, self.ground_truth, self.fake, self.netG and self.optD to compute the loss.

Example:

d_fake = self.netD(self.fake, self.input)
var_dic = {}
var_dic["LS_LOSSG"] = loss_g = 0.5 * torch.mean((d_fake - 1) ** 2)
return loss_g, var_dic
compute_valid()[source]

Rewrite this method to compute valid_epoch values.

You can return a dict of values that you want to visualize.

Note

This method runs under torch.no_grad(), so it never computes gradients. If you need gradients, wrap your operations in torch.enable_grad().

Example:

d_fake = self.netD(self.fake.detach())
d_real = self.netD(self.ground_truth)
var_dic = {}
var_dic["WD"] = w_distance = (d_real.mean() - d_fake.mean()).detach()
return var_dic
get_data_from_batch(batch_data: list, device: torch.device)[source]

Load and wrap data from the data loader.

Split one batch of data into separate variables.

Example:

# batch_data like this [input_Data, ground_truth_Data]
input_cpu, ground_truth_cpu = batch_data[0], batch_data[1]
# then move them to device and return them
return input_cpu.to(self.device), ground_truth_cpu.to(self.device)
Parameters:
  • batch_data – one batch data load from DataLoader
  • device – A device variable. torch.device
Returns:

input Tensor, ground_truth Tensor

test()[source]

Test your model when you finish all epochs.

This method is called when all epochs have finished.

Example:

for index, batch in enumerate(self.datasets.loader_test, 1):
    # For test only have input without groundtruth
    input = batch.to(self.device)
    self.netG.eval()
    with torch.no_grad():
        fake = self.netG(input)
    self.watcher.image(fake, self.current_epoch, tag="Test/fake", grid_size=(4, 4), shuffle=False)
self.netG.train()
valid_epoch()[source]

Validate model each epoch.

It is called once per epoch, after training finishes. So, do some verification here.

Example:

avg_dic: dict = {}
self.netG.eval()
self.netD.eval()
# Load data from loader_valid.
for iteration, batch in enumerate(self.datasets.loader_valid, 1):
    self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
    with torch.no_grad():
        self.fake = self.netG(self.input)
        # You can write this function to apply your computation.
        dic: dict = self.compute_valid()
    if avg_dic == {}:
        avg_dic: dict = dic
    else:
        for key in dic.keys():
            avg_dic[key] += dic[key]

for key in avg_dic.keys():
    avg_dic[key] = avg_dic[key] / self.datasets.nsteps_valid

self.watcher.scalars(avg_dic, self.step, tag="Valid")
self._watch_images(tag="Valid")
self.netG.train()
self.netD.train()

GenerateGanTrainer

class jdit.trainer.GenerateGanTrainer(logdir, nepochs, gpu_ids_abs, netG, netD, optG, optD, datasets, latent_shape)[source]
compute_d_loss()[source]

Rewrite this method to compute your own Discriminator loss.

Return the loss in the first position. In the second position, you can return a dict of the losses that you want to visualize. The training logic is:

self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
self.fake = self.netG(self.input)
self._train_iteration(self.optD, self.compute_d_loss, csv_filename="Train_D")
if (self.step % self.d_turn) == 0:
    self._train_iteration(self.optG, self.compute_g_loss, csv_filename="Train_G")

So, you can use self.input, self.ground_truth, self.fake, self.netG and self.optD to compute the loss.

Example:

d_fake = self.netD(self.fake.detach())
d_real = self.netD(self.ground_truth)
var_dic = {}
var_dic["LS_LOSSD"] = loss_d = 0.5 * (torch.mean((d_real - 1) ** 2) + torch.mean(d_fake ** 2))
return loss_d, var_dic
compute_g_loss()[source]

Rewrite this method to compute your own Generator loss.

Return the loss in the first position. In the second position, you can return a dict of the losses that you want to visualize. The training logic is:

self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
self.fake = self.netG(self.input)
self._train_iteration(self.optD, self.compute_d_loss, csv_filename="Train_D")
if (self.step % self.d_turn) == 0:
    self._train_iteration(self.optG, self.compute_g_loss, csv_filename="Train_G")

So, you can use self.input, self.ground_truth, self.fake, self.netG and self.optD to compute the loss.

Example:

d_fake = self.netD(self.fake, self.input)
var_dic = {}
var_dic["LS_LOSSG"] = loss_g = 0.5 * torch.mean((d_fake - 1) ** 2)
return loss_g, var_dic
compute_valid()[source]

The training logic is:

self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
self.fake = self.netG(self.input)
self._train_iteration(self.optD, self.compute_d_loss, csv_filename="Train_D")
if (self.step % self.d_turn) == 0:
    self._train_iteration(self.optG, self.compute_g_loss, csv_filename="Train_G")

So, you can use self.input, self.ground_truth, self.fake, self.netG and self.optD to compute the validation values.

Returns:
d_turn = 1

The number of Discriminator training steps per Generator training step.

get_data_from_batch(batch_data: list, device: torch.device)[source]

Load and wrap data from the data loader.

Split one batch of data into separate variables.

Example:

# batch_data like this [input_Data, ground_truth_Data]
input_cpu, ground_truth_cpu = batch_data[0], batch_data[1]
# then move them to device and return them
return input_cpu.to(self.device), ground_truth_cpu.to(self.device)
Parameters:
  • batch_data – one batch data load from DataLoader
  • device – A device variable. torch.device
Returns:

input Tensor, ground_truth Tensor

valid_epoch()[source]

Validate model each epoch.

It is called once per epoch, after training finishes. So, do some verification here.

Example:

avg_dic: dict = {}
self.netG.eval()
self.netD.eval()
# Load data from loader_valid.
for iteration, batch in enumerate(self.datasets.loader_valid, 1):
    self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
    with torch.no_grad():
        self.fake = self.netG(self.input)
        # You can write this function to apply your computation.
        dic: dict = self.compute_valid()
    if avg_dic == {}:
        avg_dic: dict = dic
    else:
        for key in dic.keys():
            avg_dic[key] += dic[key]

for key in avg_dic.keys():
    avg_dic[key] = avg_dic[key] / self.datasets.nsteps_valid

self.watcher.scalars(avg_dic, self.step, tag="Valid")
self._watch_images(tag="Valid")
self.netG.train()
self.netD.train()

instances

instances.FashionClassTrainer

jdit.trainer.instances.start_fashionClassTrainer(gpus=(), nepochs=10, run_type='train')[source]

An example of fashion-mnist classification.