jdit.dataset¶
Dataloaders_factory¶
class jdit.dataset.DataLoadersFactory(root: str, batch_size: int, num_workers=-1, shuffle=True, subdata_size=1)[source]¶
This is a super class of dataloaders. It defines some basic attributes and methods.
- For training data: dataset_train, loader_train, nsteps_train. Others, such as valid and test, follow the same naming format.
- For transforms, you can define your own transforms.
- If you don't have a test set, it will be replaced by the valid dataset.
It builds the datasets following these steps:
build_transforms()
To build transforms for the training and validation datasets. You can rewrite this method for your own transforms. It will be used in build_datasets().
build_datasets()
You must rewrite this method to load your own datasets by passing them to self.dataset_train and self.dataset_valid. self.dataset_test is optional; if you don't pass a test dataset, it will be replaced by self.dataset_valid.
Example:
def build_transforms(self, resize=32):
    self.train_transform_list = self.valid_transform_list = [
        transforms.Resize(resize),
        transforms.ToTensor(),
        transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]

# Inherit this class and write this method.
def build_datasets(self):
    self.dataset_train = datasets.CIFAR10(root, train=True, download=True,
                                          transform=transforms.Compose(self.train_transform_list))
    self.dataset_valid = datasets.CIFAR10(root, train=False, download=True,
                                          transform=transforms.Compose(self.valid_transform_list))
build_loaders()
It uses the datasets and the passed parameters to build dataloaders for self.loader_train, self.loader_valid and self.loader_test.
- root is the root path of the datasets.
- batch_size is the batch size of the data loaders. A loaded batch has shape (Batchsize, Channel, Height, Width).
- num_workers is the number of threads used to load data. If you pass -1, it will use the maximum number of threads, according to your CPU. Default: -1
- shuffle determines whether to shuffle the data. Default: True
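The steps above can be put together as follows. This is a minimal, hedged sketch (not part of jdit): it subclasses DataLoadersFactory, overrides build_datasets() with torchvision's FakeData as a stand-in dataset, and reads one batch from the resulting loader. The class name FakeImageData and the root path are illustrative assumptions.

from torchvision import datasets, transforms
from jdit.dataset import DataLoadersFactory

class FakeImageData(DataLoadersFactory):
    # Illustrative subclass: only build_datasets() is overridden,
    # so the default build_transforms() and build_loaders() are used.
    def build_datasets(self):
        # FakeData yields random (image, label) pairs; it is only a stand-in here.
        self.dataset_train = datasets.FakeData(size=256, image_size=(3, 32, 32),
                                               transform=transforms.Compose(self.train_transform_list))
        self.dataset_valid = datasets.FakeData(size=64, image_size=(3, 32, 32),
                                               transform=transforms.Compose(self.valid_transform_list))
        # dataset_test is omitted on purpose; it should fall back to dataset_valid.

data = FakeImageData(root="datasets/fake", batch_size=32)
images, labels = next(iter(data.loader_train))
print(images.shape)  # expected: torch.Size([32, 3, 32, 32]) with the default transforms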
build_datasets()[source]¶
You must rewrite this method to load your own datasets.
- self.dataset_train: assign a training dataset to this.
- self.dataset_valid: assign a validation dataset to this.
- self.dataset_test (optional): assign a test dataset to this. If not, it will be replaced by self.dataset_valid.
Example:
self.dataset_train = datasets.CIFAR10(root, train=True, download=True,
                                      transform=transforms.Compose(self.train_transform_list))
self.dataset_valid = datasets.CIFAR10(root, train=False, download=True,
                                      transform=transforms.Compose(self.valid_transform_list))
build_loaders()[source]¶
Build dataloaders. The previous method, self.build_datasets(), has created the datasets. Use these datasets to build their dataloaders.
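The loaders themselves are built by the library; the following is only a hypothetical sketch of what an overridden build_loaders() could look like, assuming plain torch.utils.data.DataLoader objects and assuming attribute names such as self.batch_size, self.shuffle and self.num_workers (none of these internals are confirmed by this page).

from torch.utils.data import DataLoader

def build_loaders(self):
    # Hypothetical override: wrap each dataset in a standard DataLoader.
    # self.batch_size, self.shuffle and self.num_workers are assumed attribute names.
    self.loader_train = DataLoader(self.dataset_train, batch_size=self.batch_size,
                                   shuffle=self.shuffle, num_workers=self.num_workers)
    self.loader_valid = DataLoader(self.dataset_valid, batch_size=self.batch_size,
                                   shuffle=False, num_workers=self.num_workers)
    self.loader_test = DataLoader(self.dataset_test, batch_size=self.batch_size,
                                  shuffle=False, num_workers=self.num_workers)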
build_transforms(resize: int = 32)[source]¶
This will build transforms for training and validation. You can rewrite this method to build your own transforms. Don't forget to register your transforms to self.train_transform_list and self.valid_transform_list. The following is the default setting.
self.train_transform_list = self.valid_transform_list = [
    transforms.Resize(resize),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]
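As an illustration of such an override (not library code), the two lists do not have to be identical; for example, augmentation can be registered on the training side only:

from torchvision import transforms

def build_transforms(self, resize=32):
    # Training pipeline with an extra augmentation step.
    self.train_transform_list = [
        transforms.Resize(resize),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]
    # Validation pipeline stays deterministic.
    self.valid_transform_list = [
        transforms.Resize(resize),
        transforms.ToTensor(),
        transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]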
HandMNIST¶
class jdit.dataset.HandMNIST(root='datasets/hand_data', batch_size=64, num_workers=-1)[source]¶
Handwritten MNIST digits dataset.
Example:
>>> data = HandMNIST(r"../datasets/mnist")
use 8 thread!
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Processing...
Done!
>>> data.dataset_train
Dataset MNIST
    Number of datapoints: 60000
    Split: train
    Root Location: data
    Transforms (if any): Compose(
                             Resize(size=32, interpolation=PIL.Image.BILINEAR)
                             ToTensor()
                             Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
                         )
    Target Transforms (if any): None
>>> # We don't set a test dataset, so they are the same.
>>> data.dataset_valid is data.dataset_test
True
>>> # Number of steps at batch size 128.
>>> data.nsteps_train
469
>>> # Total samples of the training dataset.
>>> len(data.dataset_train)
60000
>>> # The sample loader has batch size 1, so its length equals the number of samples.
>>> len(data.samples_train)
6000
build_transforms(resize: int = 32)[source]¶
This will build transforms for training and validation. You can rewrite this method to build your own transforms. Don't forget to register your transforms to self.train_transform_list and self.valid_transform_list. The following is the default setting.
self.train_transform_list = self.valid_transform_list = [
    transforms.Resize(resize),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]
FashionMNIST¶
class jdit.dataset.FashionMNIST(root='datasets/fashion_data', batch_size=64, num_workers=-1)[source]¶
build_datasets()[source]¶
You must rewrite this method to load your own datasets.
- self.dataset_train: assign a training dataset to this.
- self.dataset_valid: assign a validation dataset to this.
- self.dataset_test (optional): assign a test dataset to this. If not, it will be replaced by self.dataset_valid.
Example:
self.dataset_train = datasets.CIFAR10(root, train=True, download=True,
                                      transform=transforms.Compose(self.train_transform_list))
self.dataset_valid = datasets.CIFAR10(root, train=False, download=True,
                                      transform=transforms.Compose(self.valid_transform_list))
build_transforms(resize: int = 32)[source]¶
This will build transforms for training and validation. You can rewrite this method to build your own transforms. Don't forget to register your transforms to self.train_transform_list and self.valid_transform_list. The following is the default setting.
self.train_transform_list = self.valid_transform_list = [
    transforms.Resize(resize),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]
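A brief usage sketch, analogous to the HandMNIST example above (the root path is an arbitrary choice, and the printed values depend on the downloaded data):

from jdit.dataset import FashionMNIST

# Download/prepare FashionMNIST under `root` and build the loaders.
data = FashionMNIST(root="datasets/fashion_data", batch_size=64)
print(len(data.dataset_train))   # number of training samples
print(data.nsteps_train)         # number of training steps per epoch
images, labels = next(iter(data.loader_train))  # one training batch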
Cifar10¶
class jdit.dataset.Cifar10(root='datasets/cifar10', batch_size=32, num_workers=-1)[source]¶
build_datasets()[source]¶
You must rewrite this method to load your own datasets.
- self.dataset_train: assign a training dataset to this.
- self.dataset_valid: assign a validation dataset to this.
- self.dataset_test (optional): assign a test dataset to this. If not, it will be replaced by self.dataset_valid.
Example:
self.dataset_train = datasets.CIFAR10(root, train=True, download=True,
                                      transform=transforms.Compose(self.train_transform_list))
self.dataset_valid = datasets.CIFAR10(root, train=False, download=True,
                                      transform=transforms.Compose(self.valid_transform_list))
Lsun¶
class jdit.dataset.Lsun(root, batch_size=32, num_workers=-1)[source]¶
build_datasets()[source]¶
You must rewrite this method to load your own datasets.
- self.dataset_train: assign a training dataset to this.
- self.dataset_valid: assign a validation dataset to this.
- self.dataset_test (optional): assign a test dataset to this. If not, it will be replaced by self.dataset_valid.
Example:
self.dataset_train = datasets.CIFAR10(root, train=True, download=True,
                                      transform=transforms.Compose(self.train_transform_list))
self.dataset_valid = datasets.CIFAR10(root, train=False, download=True,
                                      transform=transforms.Compose(self.valid_transform_list))
build_transforms(resize: int = 32)[source]¶
This will build transforms for training and validation. You can rewrite this method to build your own transforms. Don't forget to register your transforms to self.train_transform_list and self.valid_transform_list. The following is the default setting.
self.train_transform_list = self.valid_transform_list = [
    transforms.Resize(resize),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]
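Unlike the other factories, Lsun has no default root, so a path must be supplied. A hedged usage sketch, assuming the LSUN data has already been prepared under that path:

from jdit.dataset import Lsun

# `root` must point to an existing LSUN data directory (the path here is illustrative).
data = Lsun(root="datasets/lsun", batch_size=32)
print(data.nsteps_train)  # steps per epoch at batch size 32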