jdit.trainer
SupTrainer
-
class jdit.trainer.SupTrainer(nepochs: int, logdir: str, gpu_ids_abs: Union[list, tuple] = ())
This is the superclass of all trainers. It defines:
* The basic tools: Performance(), Watcher() and Loger().
* The basic loop of epochs.
* Learning rate decay and model checkpointing.
-
debug()
Debug the trainer.
It checks the following functions:
* self._record_configs(): saves the configuration of every module.
* self.train_epoch(): trains one epoch on only a few samples, so it is very fast.
* self.valid_epoch(): validates one epoch using dataset_valid.
* self._change_lr(): changes the learning rate.
* self._check_point(): saves a model checkpoint.
* self.test(): runs the test using dataset_test.
Before debugging, it resets the datasets and picks only a few samples for a fast test. For the test, it builds a log_debug directory to save the log.
Returns: bool. It returns True if all the tests pass.
-
dist_train(process_bar_header: str = None, process_bar_position: int = None, subbar_disable=False, record_configs=True, show_network=False, **kwargs)
The main training loop of epochs.
Parameters:
- process_bar_header – The tag name of the progress bar header, passed as tqdm(desc=process_bar_header).
- process_bar_position – The progress bar's position, passed as tqdm(position=process_bar_position). It is useful when running multiple tasks.
- subbar_disable – Whether to disable the sub progress bar that shows the info of every training step.
- record_configs – Whether to record the training process data.
- show_network – Whether to show the structure of the network. It costs extra memory.
- kwargs – Any other parameters passed to tqdm() to control the behavior of the progress bar.
-
get_data_from_batch(batch_data: list, device: torch.device)
Split one batch of data into the specific variables you need. If your dataset returns something like
    return input_data, label
then two values need to be unpacked. So, you need to split the batch data into two parts, like this:
    input, ground_truth = batch_data[0], batch_data[1]
Caution
Don't forget to move the data to the device by using input.to(device).
Parameters:
- batch_data – One batch of data from the dataloader.
- device – The device on which the data will be located.
Returns: The variables, placed on the correct device.
Example:
    # load and unpack the data from one batch tuple (input, ground_truth)
    input, ground_truth = batch_data[0], batch_data[1]
    # move the data to the device
    return input.to(device), ground_truth.to(device)
-
plot_graphs_lazy()
Plot the model graphs on TensorBoard. It plots the graphs of all models in the trainer, using each variable name as the model name.
-
train(process_bar_header: str = None, process_bar_position: int = None, subbar_disable=False, record_configs=True, show_network=False, **kwargs)
The main training loop of epochs.
Parameters:
- process_bar_header – The tag name of the progress bar header, passed as tqdm(desc=process_bar_header).
- process_bar_position – The progress bar's position, passed as tqdm(position=process_bar_position). It is useful when running multiple tasks.
- subbar_disable – Whether to disable the sub progress bar that shows the info of every training step.
- record_configs – Whether to record the training process data.
- show_network – Whether to show the structure of the network. It costs extra memory.
- kwargs – Any other parameters passed to tqdm() to control the behavior of the progress bar.
-
train_epoch(subbar_disable=False)
Get the train loader and loop over it to process the data.
Caution
You must record your training step in self.step inside your loop, for example with self.step += 1.
Example:
    for iteration, batch in tqdm(enumerate(self.datasets.loader_train, 1)):
        self.step += 1
        self.input_cpu, self.ground_truth_cpu = self.get_data_from_batch(batch, self.device)
        self._train_iteration(self.opt, self.compute_loss, tag="Train")
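The caution about self.step can be illustrated with a framework-free sketch. ToyTrainer, its plain-list loader_train and the seen_steps record are hypothetical stand-ins, not part of jdit. The only point is that self.step keeps counting across epochs, so it can serve as a global x-axis for logged values.

```python
class ToyTrainer:
    """Hypothetical stand-in for a trainer; not jdit's SupTrainer."""

    def __init__(self, loader_train, nepochs):
        self.loader_train = loader_train  # any iterable of batches
        self.nepochs = nepochs
        self.step = 0         # global training step, never reset between epochs
        self.seen_steps = []  # (epoch, iteration, step) tuples, kept for inspection

    def train_epoch(self, epoch):
        for iteration, batch in enumerate(self.loader_train, 1):
            self.step += 1  # record the training step inside the loop
            self.seen_steps.append((epoch, iteration, self.step))

    def train(self):
        for epoch in range(1, self.nepochs + 1):
            self.train_epoch(epoch)


trainer = ToyTrainer(loader_train=["b1", "b2", "b3"], nepochs=2)
trainer.train()
```

With three batches and two epochs, iteration restarts at 1 in the second epoch while step continues from 4 to 6.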
-
Single Model Trainer
SupSingleModelTrainer
-
class jdit.trainer.SupSingleModelTrainer(logdir, nepochs, gpu_ids_abs, net: jdit.model.Model, opt: jdit.optimizer.Optimizer, datasets: jdit.dataset.DataLoadersFactory)
This is a single-model trainer, which means you only have one model.
    input, ground_truth
    output = model(input)
    loss(output, ground_truth)
-
compute_loss() -> (torch.Tensor, dict)
Rewrite this method to compute your own loss. Use self.input, self.output and self.ground_truth to compute the loss. You should return the loss in the first position. You can return a dict of losses that you want to visualize in the second position.
Example:
    var_dic = {}
    var_dic["LOSS"] = loss = (self.output ** 2 - self.ground_truth ** 2) ** 0.5
    return loss, var_dic
-
compute_valid() -> dict
Rewrite this method to compute your validation values. Use self.input, self.output and self.ground_truth to compute the validation loss. You can return a dict of validation values that you want to visualize.
Example:
    # It does the same thing as compute_loss(), which returns (loss, var_dic)
    _, var_dic = self.compute_loss()
    return var_dic
-
get_data_from_batch(batch_data: list, device: torch.device)
Load and wrap data from the data loader.
Split one batch of data into the specific variables.
Example:
    # batch_data looks like this: [input_Data, ground_truth_Data]
    input_cpu, ground_truth_cpu = batch_data[0], batch_data[1]
    # then move them to the device and return them
    return input_cpu.to(self.device), ground_truth_cpu.to(self.device)
Parameters:
- batch_data – One batch of data loaded from the DataLoader.
- device – A device variable of type torch.device.
Returns: input Tensor, ground_truth Tensor
-
train_epoch(subbar_disable=False)
Get the train loader and loop over it to process the data.
Caution
You must record your training step in self.step inside your loop, for example with self.step += 1.
Example:
    for iteration, batch in tqdm(enumerate(self.datasets.loader_train, 1)):
        self.step += 1
        self.input_cpu, self.ground_truth_cpu = self.get_data_from_batch(batch, self.device)
        self._train_iteration(self.opt, self.compute_loss, tag="Train")
-
valid_epoch()
Validate the model each epoch.
It is called each epoch when training finishes. So, do some verification here.
Example:
    avg_dic: dict = {}
    self.net.eval()
    for iteration, batch in enumerate(self.datasets.loader_valid, 1):
        self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
        with torch.no_grad():
            self.output = self.net(self.input)
            dic: dict = self.compute_valid()
        if avg_dic == {}:
            avg_dic = dic
        else:
            for key in dic.keys():
                avg_dic[key] += dic[key]
    for key in avg_dic.keys():
        avg_dic[key] = avg_dic[key] / self.datasets.nsteps_valid
    self.watcher.scalars(avg_dic, self.step, tag="Valid")
    self.loger.write(self.step, self.current_epoch, avg_dic, "Valid", header=self.step <= 1)
    self._watch_images(tag="Valid")
    self.net.train()
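The averaging step in this example can be isolated as a small pure-Python sketch. average_metrics is a hypothetical helper, not a jdit function, and plain floats stand in for tensors. It sums each key over all validation batches and divides by the number of validation steps, as the loop above does.

```python
def average_metrics(per_batch_metrics, nsteps_valid):
    """Sum each key over all validation batches, then divide by the step count."""
    avg_dic = {}
    for dic in per_batch_metrics:
        if not avg_dic:
            # Copy the first dict so the caller's data is not modified in place.
            avg_dic = dict(dic)
        else:
            for key, value in dic.items():
                avg_dic[key] += value
    return {key: total / nsteps_valid for key, total in avg_dic.items()}


batches = [
    {"LOSS": 0.5, "ACC": 0.5},
    {"LOSS": 0.25, "ACC": 0.75},
    {"LOSS": 0.75, "ACC": 1.0},
]
averaged = average_metrics(batches, nsteps_valid=len(batches))
```

Every batch is assumed to return the same keys. A key that is missing from the first batch would raise a KeyError in this sketch.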
-
ClassificationTrainer
-
class jdit.trainer.ClassificationTrainer(logdir, nepochs, gpu_ids, net, opt, datasets, num_class)
This is a classification trainer.
-
compute_loss()
Compute the main loss and observed values.
Compute the loss and other values shown in the TensorBoard scalars visualization. You should return a main loss for backward propagation.
So, if you want some values visualized, make a dict() whose keys are the variables' names. The training logic is:
    self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
    self.output = self.net(self.input)
    self._train_iteration(self.opt, self.compute_loss, csv_filename="Train")
So, you have self.net, self.input, self.output and self.ground_truth to compute your own loss here.
Note
Only the main loss, which is the first returned variable, is used for backward propagation. If you have a joint loss, please add its parts up and return one main loss.
Note
The variables in the returned dict() are never used for backward propagation in model.train() mode. However, their gradients are still computed, because torch.autograd.no_grad() is not used. So, you can compute any gradient-based variables for visualization.
Example:
    var_dic = {}
    labels = self.ground_truth.squeeze().long()
    # class-index labels need a classification loss such as cross entropy
    var_dic["CEP"] = loss = nn.CrossEntropyLoss()(self.output, labels)
    return loss, var_dic
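The note on joint losses can be sketched without any framework. join_losses and its weights argument are hypothetical, and plain floats stand in for loss tensors. The pattern is to add the terms into one main loss for the first return position, while keeping every term in the dict for visualization.

```python
def join_losses(parts, weights=None):
    """Return (main_loss, var_dic) built from named loss terms.

    `parts` maps a name to a loss value. `weights` optionally maps a name
    to a scale factor (default 1.0). Both are illustrative assumptions.
    """
    weights = weights or {}
    var_dic = dict(parts)  # every term stays visible for visualization
    main_loss = sum(weights.get(name, 1.0) * value for name, value in parts.items())
    var_dic["LOSS"] = main_loss
    return main_loss, var_dic


loss, var_dic = join_losses({"CE": 0.5, "L2": 2.0}, weights={"L2": 0.25})
```

Only the first returned value would be used for backward propagation. The dict entries are for display.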
-
compute_valid()
Compute the valid_epoch variables for visualization.
Compute the validation values. They are only used in the TensorBoard scalars visualization. So, if you want some variables visualized, make a dict() whose keys are the variables' names. You have self.net, self.input, self.output and self.ground_truth to compute your own validation values here.
Note
The variables in the returned dict() are never used for backward propagation in model.eval() mode. However, their gradients are still computed, because torch.autograd.no_grad() is not used. So, you can compute some gradient-based variables for visualization.
Example:
    var_dic = {}
    labels = self.ground_truth.squeeze().long()
    var_dic["CEP"] = nn.CrossEntropyLoss()(self.output, labels)
    return var_dic
-
get_data_from_batch(batch_data, device)
If you need different behavior, rewrite this method and the method self.train_epoch().
Parameters:
- batch_data – A Tensor loaded from the dataset.
- device – The compute device.
Returns: Tensors
-
valid_epoch()
Validate the model each epoch.
It is called each epoch when training finishes. So, do some verification here.
Example:
    avg_dic: dict = {}
    self.net.eval()
    for iteration, batch in enumerate(self.datasets.loader_valid, 1):
        self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
        with torch.no_grad():
            self.output = self.net(self.input)
            dic: dict = self.compute_valid()
        if avg_dic == {}:
            avg_dic = dic
        else:
            for key in dic.keys():
                avg_dic[key] += dic[key]
    for key in avg_dic.keys():
        avg_dic[key] = avg_dic[key] / self.datasets.nsteps_valid
    self.watcher.scalars(avg_dic, self.step, tag="Valid")
    self.loger.write(self.step, self.current_epoch, avg_dic, "Valid", header=self.step <= 1)
    self._watch_images(tag="Valid")
    self.net.train()
-
AutoEncoderTrainer
-
class jdit.trainer.AutoEncoderTrainer(logdir, nepochs, gpu_ids, net, opt, datasets)
This is an autoencoder trainer (image to image).
-
compute_loss()
Compute the main loss and observed values.
Compute the loss and other values shown in the TensorBoard scalars visualization. You should return a main loss for backward propagation.
So, if you want some values visualized, make a dict() whose keys are the variables' names. The training logic is:
    self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
    self.output = self.net(self.input)
    self._train_iteration(self.opt, self.compute_loss, csv_filename="Train")
So, you have self.net, self.input, self.output and self.ground_truth to compute your own loss here.
Note
Only the main loss, which is the first returned variable, is used for backward propagation. If you have a joint loss, please add its parts up and return one main loss.
Note
The variables in the returned dict() are never used for backward propagation in model.train() mode. However, their gradients are still computed, because torch.autograd.no_grad() is not used. So, you can compute any gradient-based variables for visualization.
Example:
    var_dic = {}
    var_dic["CEP"] = loss = nn.MSELoss(reduction="mean")(self.output, self.ground_truth)
    return loss, var_dic
-
compute_valid()
Compute the valid_epoch variables for visualization.
Compute the variables you care about. They are only used in the TensorBoard scalars visualization. So, if you want some variables visualized, make a dict() whose keys are the variables' names.
Note
The variables in the returned dict() are never used for backward propagation in model.eval() mode. However, their gradients are still computed, because torch.autograd.no_grad() is not used. So, you can compute some gradient-based variables for visualization.
Example:
    var_dic = {}
    var_dic["CEP"] = loss = nn.MSELoss(reduction="mean")(self.output, self.ground_truth)
    return var_dic
-
get_data_from_batch(batch_data, device)
If you need different behavior, rewrite this method and the method self.train_epoch().
Parameters:
- batch_data – A Tensor loaded from the dataset.
- device – The compute device.
Returns: Tensors
-
valid_epoch()
Validate the model each epoch.
It is called each epoch when training finishes. So, do some verification here.
Example:
    avg_dic: dict = {}
    self.net.eval()
    for iteration, batch in enumerate(self.datasets.loader_valid, 1):
        self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
        with torch.no_grad():
            self.output = self.net(self.input)
            dic: dict = self.compute_valid()
        if avg_dic == {}:
            avg_dic = dic
        else:
            for key in dic.keys():
                avg_dic[key] += dic[key]
    for key in avg_dic.keys():
        avg_dic[key] = avg_dic[key] / self.datasets.nsteps_valid
    self.watcher.scalars(avg_dic, self.step, tag="Valid")
    self.loger.write(self.step, self.current_epoch, avg_dic, "Valid", header=self.step <= 1)
    self._watch_images(tag="Valid")
    self.net.train()
-
Generative Adversarial Networks Trainer
SupGanTrainer
-
class jdit.trainer.SupGanTrainer(logdir, nepochs, gpu_ids_abs, netG: jdit.model.Model, netD: jdit.model.Model, optG: jdit.optimizer.Optimizer, optD: jdit.optimizer.Optimizer, datasets: jdit.dataset.DataLoadersFactory)
-
compute_d_loss() -> (torch.Tensor, dict)
Rewrite this method to compute your own Discriminator loss.
You should return the loss in the first position. You can return a dict of losses that you want to visualize in the second position.
Example:
    d_fake = self.netD(self.fake.detach())
    d_real = self.netD(self.ground_truth)
    var_dic = {}
    var_dic["GP"] = gp = gradPenalty(self.netD, self.ground_truth, self.fake, input=self.input, use_gpu=self.use_gpu)
    var_dic["WD"] = w_distance = (d_real.mean() - d_fake.mean()).detach()
    var_dic["LOSS_D"] = loss_d = d_fake.mean() - d_real.mean() + gp
    return loss_d, var_dic
-
compute_g_loss() -> (torch.Tensor, dict)
Rewrite this method to compute your own Generator loss.
You should return the loss in the first position. You can return a dict of losses that you want to visualize in the second position.
Example:
    d_fake = self.netD(self.fake)
    var_dic = {}
    var_dic["JC"] = jc = jcbClamp(self.netG, self.input, use_gpu=self.use_gpu)
    var_dic["LOSS_G"] = loss_g = -d_fake.mean() + jc
    return loss_g, var_dic
-
compute_valid() -> dict
Rewrite this method to compute your validation values.
You can return a dict of validation values that you want to visualize.
Example:
    # It does the same thing as compute_g_loss() and compute_d_loss()
    g_loss, _ = self.compute_g_loss()
    d_loss, _ = self.compute_d_loss()
    var_dic = {"LOSS_D": d_loss, "LOSS_G": g_loss}
    return var_dic
-
d_turn = 1
The number of Discriminator training steps for every one Generator training step.
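A small sketch of how d_turn shapes the update schedule, based on the training logic quoted in the compute_d_loss() entries of this page (the Generator is trained only when self.step % self.d_turn == 0). The schedule function is a hypothetical illustration, not a jdit API.

```python
def schedule(nsteps, d_turn=1):
    """List which networks are updated at each step under the quoted logic."""
    updates = []
    for step in range(1, nsteps + 1):
        trained = ["D"]         # the Discriminator is updated on every step
        if step % d_turn == 0:  # the Generator only on every d_turn-th step
            trained.append("G")
        updates.append((step, trained))
    return updates


plan = schedule(nsteps=4, d_turn=2)
```

With d_turn=2 the Discriminator is updated four times and the Generator twice over four steps. With the default d_turn=1 both are updated on every step.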
-
get_data_from_batch(batch_data: list, device: torch.device)
Load and wrap data from the data loader.
Split one batch of data into the specific variables.
Example:
    # batch_data looks like this: [input_Data, ground_truth_Data]
    input_cpu, ground_truth_cpu = batch_data[0], batch_data[1]
    # then move them to the device and return them
    return input_cpu.to(self.device), ground_truth_cpu.to(self.device)
Parameters:
- batch_data – One batch of data loaded from the DataLoader.
- device – A device variable of type torch.device.
Returns: input Tensor, ground_truth Tensor
-
train_epoch(subbar_disable=False)
Get the train loader and loop over it to process the data.
Caution
You must record your training step in self.step inside your loop, for example with self.step += 1.
Example:
    for iteration, batch in tqdm(enumerate(self.datasets.loader_train, 1)):
        self.step += 1
        self.input_cpu, self.ground_truth_cpu = self.get_data_from_batch(batch, self.device)
        self._train_iteration(self.opt, self.compute_loss, tag="Train")
-
valid_epoch()
Validate the model each epoch.
It is called each epoch when training finishes. So, do some verification here.
Example:
    avg_dic: dict = {}
    self.netG.eval()
    self.netD.eval()
    # Load data from loader_valid.
    for iteration, batch in enumerate(self.datasets.loader_valid, 1):
        self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
        with torch.no_grad():
            self.fake = self.netG(self.input)
            # You can write this function to apply your computation.
            dic: dict = self.compute_valid()
        if avg_dic == {}:
            avg_dic = dic
        else:
            for key in dic.keys():
                avg_dic[key] += dic[key]
    for key in avg_dic.keys():
        avg_dic[key] = avg_dic[key] / self.datasets.nsteps_valid
    self.watcher.scalars(avg_dic, self.step, tag="Valid")
    self._watch_images(tag="Valid")
    self.netG.train()
    self.netD.train()
-
Pix2pixGanTrainer
-
class jdit.trainer.Pix2pixGanTrainer(logdir, nepochs, gpu_ids_abs, netG, netD, optG, optD, datasets)
-
compute_d_loss()
Rewrite this method to compute your own Discriminator loss.
You should return the loss in the first position. You can return a dict of losses that you want to visualize in the second position. The training logic is:
    self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
    self.fake = self.netG(self.input)
    self._train_iteration(self.optD, self.compute_d_loss, csv_filename="Train_D")
    if (self.step % self.d_turn) == 0:
        self._train_iteration(self.optG, self.compute_g_loss, csv_filename="Train_G")
So, you use self.input, self.ground_truth, self.fake, self.netG and self.optD to compute the loss.
Example:
    d_fake = self.netD(self.fake.detach())
    d_real = self.netD(self.ground_truth)
    var_dic = {}
    var_dic["LS_LOSSD"] = loss_d = 0.5 * (torch.mean((d_real - 1) ** 2) + torch.mean(d_fake ** 2))
    return loss_d, var_dic
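The least-squares losses used in the compute_d_loss() and compute_g_loss() examples of this class can be checked by hand with plain numbers. The sketch below is a pure-Python restatement of those two formulas, with lists of floats standing in for Discriminator outputs. The helper names mean, ls_loss_d and ls_loss_g are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def ls_loss_d(d_real, d_fake):
    """Least-squares Discriminator loss: push real scores to 1 and fake scores to 0."""
    return 0.5 * (mean([(r - 1) ** 2 for r in d_real]) + mean([f ** 2 for f in d_fake]))


def ls_loss_g(d_fake):
    """Least-squares Generator loss: push the scores of fake samples to 1."""
    return 0.5 * mean([(f - 1) ** 2 for f in d_fake])


# A Discriminator that scores real as 1 and fake as 0 has zero loss.
perfect_d = ls_loss_d(d_real=[1.0, 1.0], d_fake=[0.0, 0.0])
# A Discriminator that gets everything backwards has the largest loss for scores in [0, 1].
fooled_d = ls_loss_d(d_real=[0.0, 0.0], d_fake=[1.0, 1.0])
# A Generator whose samples are all scored 0 still has a non-zero loss to minimize.
g_when_caught = ls_loss_g(d_fake=[0.0, 0.0])
```

The two losses pull the fake scores in opposite directions: the Discriminator toward 0, the Generator toward 1.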
-
compute_g_loss()
Rewrite this method to compute your own Generator loss.
You should return the loss in the first position. You can return a dict of losses that you want to visualize in the second position. The training logic is:
    self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
    self.fake = self.netG(self.input)
    self._train_iteration(self.optD, self.compute_d_loss, csv_filename="Train_D")
    if (self.step % self.d_turn) == 0:
        self._train_iteration(self.optG, self.compute_g_loss, csv_filename="Train_G")
So, you use self.input, self.ground_truth, self.fake, self.netG and self.optD to compute the loss.
Example:
    d_fake = self.netD(self.fake, self.input)
    var_dic = {}
    var_dic["LS_LOSSG"] = loss_g = 0.5 * torch.mean((d_fake - 1) ** 2)
    return loss_g, var_dic
-
compute_valid()
Rewrite this method to compute the valid_epoch values.
You can return a dict of values that you want to visualize.
Note
This method runs under torch.no_grad(), so it never computes gradients. If you want to compute gradients, please wrap your operations in torch.enable_grad().
Example:
    d_fake = self.netD(self.fake.detach())
    d_real = self.netD(self.ground_truth)
    var_dic = {}
    var_dic["WD"] = w_distance = (d_real.mean() - d_fake.mean()).detach()
    return var_dic
-
get_data_from_batch(batch_data: list, device: torch.device)
Load and wrap data from the data loader.
Split one batch of data into the specific variables.
Example:
    # batch_data looks like this: [input_Data, ground_truth_Data]
    input_cpu, ground_truth_cpu = batch_data[0], batch_data[1]
    # then move them to the device and return them
    return input_cpu.to(self.device), ground_truth_cpu.to(self.device)
Parameters:
- batch_data – One batch of data loaded from the DataLoader.
- device – A device variable of type torch.device.
Returns: input Tensor, ground_truth Tensor
-
test()
Test your model after all epochs have finished.
This method is called when all epochs finish.
Example:
    for index, batch in enumerate(self.datasets.loader_test, 1):
        # For the test, there is only input without ground truth
        input = batch.to(self.device)
        self.netG.eval()
        with torch.no_grad():
            fake = self.netG(input)
        self.watcher.image(fake, self.current_epoch, tag="Test/fake", grid_size=(4, 4), shuffle=False)
    self.netG.train()
-
valid_epoch()
Validate the model each epoch.
It is called each epoch when training finishes. So, do some verification here.
Example:
    avg_dic: dict = {}
    self.netG.eval()
    self.netD.eval()
    # Load data from loader_valid.
    for iteration, batch in enumerate(self.datasets.loader_valid, 1):
        self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
        with torch.no_grad():
            self.fake = self.netG(self.input)
            # You can write this function to apply your computation.
            dic: dict = self.compute_valid()
        if avg_dic == {}:
            avg_dic = dic
        else:
            for key in dic.keys():
                avg_dic[key] += dic[key]
    for key in avg_dic.keys():
        avg_dic[key] = avg_dic[key] / self.datasets.nsteps_valid
    self.watcher.scalars(avg_dic, self.step, tag="Valid")
    self._watch_images(tag="Valid")
    self.netG.train()
    self.netD.train()
-
GenerateGanTrainer
-
class jdit.trainer.GenerateGanTrainer(logdir, nepochs, gpu_ids_abs, netG, netD, optG, optD, datasets, latent_shape)
-
compute_d_loss()
Rewrite this method to compute your own Discriminator loss.
You should return the loss in the first position. You can return a dict of losses that you want to visualize in the second position. The training logic is:
    self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
    self.fake = self.netG(self.input)
    self._train_iteration(self.optD, self.compute_d_loss, csv_filename="Train_D")
    if (self.step % self.d_turn) == 0:
        self._train_iteration(self.optG, self.compute_g_loss, csv_filename="Train_G")
So, you use self.input, self.ground_truth, self.fake, self.netG and self.optD to compute the loss.
Example:
    d_fake = self.netD(self.fake.detach())
    d_real = self.netD(self.ground_truth)
    var_dic = {}
    var_dic["LS_LOSSD"] = loss_d = 0.5 * (torch.mean((d_real - 1) ** 2) + torch.mean(d_fake ** 2))
    return loss_d, var_dic
-
compute_g_loss()
Rewrite this method to compute your own Generator loss.
You should return the loss in the first position. You can return a dict of losses that you want to visualize in the second position. The training logic is:
    self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
    self.fake = self.netG(self.input)
    self._train_iteration(self.optD, self.compute_d_loss, csv_filename="Train_D")
    if (self.step % self.d_turn) == 0:
        self._train_iteration(self.optG, self.compute_g_loss, csv_filename="Train_G")
So, you use self.input, self.ground_truth, self.fake, self.netG and self.optD to compute the loss.
Example:
    d_fake = self.netD(self.fake, self.input)
    var_dic = {}
    var_dic["LS_LOSSG"] = loss_g = 0.5 * torch.mean((d_fake - 1) ** 2)
    return loss_g, var_dic
-
compute_valid()
The training logic is:
    self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
    self.fake = self.netG(self.input)
    self._train_iteration(self.optD, self.compute_d_loss, csv_filename="Train_D")
    if (self.step % self.d_turn) == 0:
        self._train_iteration(self.optG, self.compute_g_loss, csv_filename="Train_G")
So, you use self.input, self.ground_truth, self.fake, self.netG and self.optD to compute the validation values.
-
d_turn = 1
The number of Discriminator training steps for every one Generator training step.
-
get_data_from_batch(batch_data: list, device: torch.device)
Load and wrap data from the data loader.
Split one batch of data into the specific variables.
Example:
    # batch_data looks like this: [input_Data, ground_truth_Data]
    input_cpu, ground_truth_cpu = batch_data[0], batch_data[1]
    # then move them to the device and return them
    return input_cpu.to(self.device), ground_truth_cpu.to(self.device)
Parameters:
- batch_data – One batch of data loaded from the DataLoader.
- device – A device variable of type torch.device.
Returns: input Tensor, ground_truth Tensor
-
valid_epoch()
Validate the model each epoch.
It is called each epoch when training finishes. So, do some verification here.
Example:
    avg_dic: dict = {}
    self.netG.eval()
    self.netD.eval()
    # Load data from loader_valid.
    for iteration, batch in enumerate(self.datasets.loader_valid, 1):
        self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
        with torch.no_grad():
            self.fake = self.netG(self.input)
            # You can write this function to apply your computation.
            dic: dict = self.compute_valid()
        if avg_dic == {}:
            avg_dic = dic
        else:
            for key in dic.keys():
                avg_dic[key] += dic[key]
    for key in avg_dic.keys():
        avg_dic[key] = avg_dic[key] / self.datasets.nsteps_valid
    self.watcher.scalars(avg_dic, self.step, tag="Valid")
    self._watch_images(tag="Valid")
    self.netG.train()
    self.netD.train()
-