jdit.dataset

Dataloaders_factory

class jdit.dataset.DataLoadersFactory(root: str, batch_size: int, num_workers=-1, shuffle=True, subdata_size=1)[source]

This is the base class for dataloaders.

It defines the basic attributes and methods shared by its subclasses.

  • For training data: train_dataset, loader_train and nsteps_train. Others, such as the valid_epoch and test data, follow the same naming format.
  • For transforms, you can define your own.
  • If you don’t have a test set, the valid_epoch dataset is used in its place.

It builds the datasets in the following steps:

  1. build_transforms() builds the transforms for the training and valid_epoch datasets. You can override this method to supply your own transforms; they will be used in build_datasets().

  2. build_datasets() must be overridden to load your own datasets by assigning them to self.dataset_train and self.dataset_valid . self.dataset_test is optional; if you don’t assign a test dataset, self.dataset_valid is used in its place.

    Example:

    def build_transforms(self, resize=32):
        self.train_transform_list = self.valid_transform_list = [
            transforms.Resize(resize),
            transforms.ToTensor(),
            transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]
    # Inherit this class and write this method.
    def build_datasets(self):
        self.dataset_train = datasets.CIFAR10(root, train=True, download=True,
            transform=transforms.Compose(self.train_transform_list))
        self.dataset_valid = datasets.CIFAR10(root, train=False, download=True,
            transform=transforms.Compose(self.valid_transform_list))
    
  3. build_loaders() uses these datasets and the passed parameters to build the dataloaders self.loader_train, self.loader_valid and self.loader_test.

  • root is the root path of the datasets.
  • batch_size is the number of samples per batch. A batch has shape (Batchsize, Channel, Height, Width).
  • num_workers is the number of worker threads used to load data. If you pass -1, the maximum number of threads for your CPU is used. Default: -1
  • shuffle decides whether the data is shuffled. Default: True
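The three build steps above follow a template-method pattern. Here is a minimal, framework-free sketch of that flow; the class and the toy datasets are illustrative stand-ins, not part of jdit:

```python
import math
import multiprocessing


class TinyLoaderFactory:
    """Illustrative sketch of the build order used by DataLoadersFactory."""

    def __init__(self, batch_size=4, num_workers=-1, shuffle=True):
        self.batch_size = batch_size
        # num_workers=-1 means "use as many workers as the CPU allows".
        self.num_workers = multiprocessing.cpu_count() if num_workers == -1 else num_workers
        self.shuffle = shuffle
        self.build_transforms()
        self.build_datasets()
        self.build_loaders()

    def build_transforms(self):
        # Step 1: register transforms; subclasses may override this.
        self.train_transform_list = self.valid_transform_list = []

    def build_datasets(self):
        # Step 2: subclasses assign real datasets here.
        self.dataset_train = list(range(10))
        self.dataset_valid = list(range(4))
        # No test set assigned, so fall back to the valid dataset.
        self.dataset_test = getattr(self, "dataset_test", self.dataset_valid)

    def build_loaders(self):
        # Step 3: derive loaders and step counts from the datasets.
        self.nsteps_train = math.ceil(len(self.dataset_train) / self.batch_size)
        self.loader_train = [
            self.dataset_train[i:i + self.batch_size]
            for i in range(0, len(self.dataset_train), self.batch_size)
        ]


factory = TinyLoaderFactory(batch_size=4)
print(factory.nsteps_train)  # ceil(10 / 4) = 3
print(factory.dataset_test is factory.dataset_valid)  # True
```

In the real class, step 3 builds torch DataLoader objects rather than plain lists, but the ordering and the valid-as-test fallback work the same way.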
build_datasets()[source]

You must override this method to load your own datasets.

  • self.dataset_train : assign a training dataset to this.
  • self.dataset_valid : assign a valid_epoch dataset to this.
  • self.dataset_test is optional: assign a test dataset to this. If you don’t, self.dataset_valid is used in its place.

Example:

self.dataset_train = datasets.CIFAR10(root, train=True, download=True,
                                      transform=transforms.Compose(self.train_transform_list))
self.dataset_valid = datasets.CIFAR10(root, train=False, download=True,
                                      transform=transforms.Compose(self.valid_transform_list))
build_loaders()[source]

The previous step, self.build_datasets(), has created the datasets. This method uses them to build the corresponding dataloaders.
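What a loader does for each dataset can be pictured as plain batching with optional shuffling. This is a stdlib sketch of the idea, not the torch.utils.data.DataLoader that jdit actually builds:

```python
import random


def make_loader(dataset, batch_size, shuffle=True, seed=0):
    """Yield the dataset in batches, optionally in shuffled order."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


dataset_valid = list(range(10))
batches = list(make_loader(dataset_valid, batch_size=4, shuffle=False))
print(len(batches))  # 3 batches of sizes 4, 4, 2
print(batches[-1])   # [8, 9]
```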

build_transforms(resize: int = 32)[source]

This will build the transforms for training and valid_epoch.

You can override this method to build your own transforms. Don’t forget to register your transforms in self.train_transform_list and self.valid_transform_list.

The following is the default setting.

self.train_transform_list = self.valid_transform_list = [
    transforms.Resize(resize),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]
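These lists are later chained with transforms.Compose, which simply applies each callable in order. A stdlib stand-in for that pattern, using toy numeric "transforms" instead of image operations:

```python
def compose(transform_list):
    """Apply each transform in order, like torchvision's transforms.Compose."""
    def apply(x):
        for transform in transform_list:
            x = transform(x)
        return x
    return apply


# Toy "transforms" on numbers rather than images.
train_transform_list = [
    lambda x: x * 2,        # stands in for Resize
    lambda x: x + 1,        # stands in for ToTensor
    lambda x: (x - 5) / 5,  # stands in for Normalize(mean, std)
]

transform = compose(train_transform_list)
print(transform(7))  # ((7 * 2 + 1) - 5) / 5 = 2.0
```

Because composition is ordered, put resizing before tensor conversion and normalization last, as in the default list above.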

HandMNIST

class jdit.dataset.HandMNIST(root='datasets/hand_data', batch_size=64, num_workers=-1)[source]

Handwritten-digit MNIST dataset.

Example:

>>> data = HandMNIST(r"../datasets/mnist")
use 8 thread!
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Processing...
Done!
>>> data.dataset_train
Dataset MNIST
Number of datapoints: 60000
Split: train
Root Location: data
Transforms (if any): Compose(
                         Resize(size=32, interpolation=PIL.Image.BILINEAR)
                         ToTensor()
                         Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
                     )
Target Transforms (if any): None
>>> # We don't set test dataset, so they are the same.
>>> data.dataset_valid is data.dataset_test
True
>>> # Number of steps at batch size 128.
>>> data.nsteps_train
469
>>> # Total samples of the training dataset.
>>> len(data.dataset_train)
60000
>>> # The sample loader uses a batch size of 1, so the loader length equals the number of samples.
>>> len(data.samples_train)
6000
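The nsteps_train value shown in the session above is just the sample count divided by the batch size, rounded up (assuming the batch size of 128 that the transcript's comment refers to):

```python
import math

n_samples = 60000  # len(data.dataset_train) from the session above
batch_size = 128   # the batch size the transcript's comment refers to

print(math.ceil(n_samples / batch_size))  # 60000 / 128 = 468.75, rounded up to 469
```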
build_datasets()[source]

Build the datasets using torchvision’s datasets.MNIST.

build_transforms(resize: int = 32)[source]

This will build the transforms for training and valid_epoch.

You can override this method to build your own transforms. Don’t forget to register your transforms in self.train_transform_list and self.valid_transform_list.

The following is the default setting.

self.train_transform_list = self.valid_transform_list = [
    transforms.Resize(resize),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]

FashionMNIST

class jdit.dataset.FashionMNIST(root='datasets/fashion_data', batch_size=64, num_workers=-1)[source]
build_datasets()[source]

You must override this method to load your own datasets.

  • self.dataset_train : assign a training dataset to this.
  • self.dataset_valid : assign a valid_epoch dataset to this.
  • self.dataset_test is optional: assign a test dataset to this. If you don’t, self.dataset_valid is used in its place.

Example:

self.dataset_train = datasets.CIFAR10(root, train=True, download=True,
                                      transform=transforms.Compose(self.train_transform_list))
self.dataset_valid = datasets.CIFAR10(root, train=False, download=True,
                                      transform=transforms.Compose(self.valid_transform_list))
build_transforms(resize: int = 32)[source]

This will build the transforms for training and valid_epoch.

You can override this method to build your own transforms. Don’t forget to register your transforms in self.train_transform_list and self.valid_transform_list.

The following is the default setting.

self.train_transform_list = self.valid_transform_list = [
    transforms.Resize(resize),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]

Cifar10

class jdit.dataset.Cifar10(root='datasets/cifar10', batch_size=32, num_workers=-1)[source]
build_datasets()[source]

You must override this method to load your own datasets.

  • self.dataset_train : assign a training dataset to this.
  • self.dataset_valid : assign a valid_epoch dataset to this.
  • self.dataset_test is optional: assign a test dataset to this. If you don’t, self.dataset_valid is used in its place.

Example:

self.dataset_train = datasets.CIFAR10(root, train=True, download=True,
                                      transform=transforms.Compose(self.train_transform_list))
self.dataset_valid = datasets.CIFAR10(root, train=False, download=True,
                                      transform=transforms.Compose(self.valid_transform_list))

Lsun

class jdit.dataset.Lsun(root, batch_size=32, num_workers=-1)[source]
build_datasets()[source]

You must override this method to load your own datasets.

  • self.dataset_train : assign a training dataset to this.
  • self.dataset_valid : assign a valid_epoch dataset to this.
  • self.dataset_test is optional: assign a test dataset to this. If you don’t, self.dataset_valid is used in its place.

Example:

self.dataset_train = datasets.CIFAR10(root, train=True, download=True,
                                      transform=transforms.Compose(self.train_transform_list))
self.dataset_valid = datasets.CIFAR10(root, train=False, download=True,
                                      transform=transforms.Compose(self.valid_transform_list))
build_transforms(resize: int = 32)[source]

This will build the transforms for training and valid_epoch.

You can override this method to build your own transforms. Don’t forget to register your transforms in self.train_transform_list and self.valid_transform_list.

The following is the default setting.

self.train_transform_list = self.valid_transform_list = [
    transforms.Resize(resize),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]

get_mnist_dataloaders

jdit.dataset.get_mnist_dataloaders(root='..\\data', batch_size=128)[source]

MNIST dataloader with (32, 32) sized images.

get_fashion_mnist_dataloaders

jdit.dataset.get_fashion_mnist_dataloaders(root='.\\dataset\\fashion_data', batch_size=128, resize=32, transform_list=None, num_workers=-1)[source]

Fashion MNIST dataloader with (32, 32) sized images.

get_lsun_dataloader

jdit.dataset.get_lsun_dataloader(path_to_data='/data/dgl/LSUN', dataset='bedroom_train', batch_size=64)[source]

LSUN dataloader with (128, 128) sized images.

path_to_data : str
Path to the LSUN data folder.
dataset : str
One of ‘bedroom_val’ or ‘bedroom_train’.