DataModule

class flash.core.data.data_module.DataModule(train_dataset=None, val_dataset=None, test_dataset=None, predict_dataset=None, data_source=None, preprocess=None, postprocess=None, data_fetcher=None, val_split=None, batch_size=4, num_workers=0, sampler=None)[source]

A basic DataModule class for all Flash tasks. This class includes references to a DataSource, Preprocess, Postprocess, and a BaseDataFetcher.

available_data_sources()[source]

Get the list of available data source names for use with this DataModule.

Return type

Sequence[str]

Returns

The list of data source names.

static configure_data_fetcher(*args, **kwargs)[source]

Configures the BaseDataFetcher attached to this DataModule.

Override this method to provide a custom data fetcher.

Return type

BaseDataFetcher

classmethod from_csv(input_fields, target_fields=None, train_file=None, val_file=None, test_file=None, predict_file=None, train_transform=None, val_transform=None, test_transform=None, predict_transform=None, data_fetcher=None, preprocess=None, val_split=None, batch_size=4, num_workers=0, sampler=None, **preprocess_kwargs)[source]

Creates a DataModule object from the given CSV files using the DataSource of name CSV from the passed or constructed Preprocess.

Parameters
Return type

DataModule

Returns

The constructed data module.

Examples:

import torch
from flash.core.data.data_module import DataModule

data_module = DataModule.from_csv(
    "input",
    "target",
    train_file="train_data.csv",
    train_transform={
        "to_tensor_transform": torch.as_tensor,
    },
)
classmethod from_data_source(data_source, train_data=None, val_data=None, test_data=None, predict_data=None, train_transform=None, val_transform=None, test_transform=None, predict_transform=None, data_fetcher=None, preprocess=None, val_split=None, batch_size=4, num_workers=0, sampler=None, **preprocess_kwargs)[source]

Creates a DataModule object from the given inputs to load_data() (train_data, val_data, test_data, predict_data). The data source will be resolved from the instantiated Preprocess using data_source_of_name().

Parameters
Return type

DataModule

Returns

The constructed data module.

Examples:

import torch
from flash.core.data.data_module import DataModule
from flash.core.data.data_source import DefaultDataSources

data_module = DataModule.from_data_source(
    DefaultDataSources.FOLDERS,
    train_data="train_folder",
    train_transform={
        "to_tensor_transform": torch.as_tensor,
    },
)
classmethod from_datasets(train_dataset=None, val_dataset=None, test_dataset=None, predict_dataset=None, train_transform=None, val_transform=None, test_transform=None, predict_transform=None, data_fetcher=None, preprocess=None, val_split=None, batch_size=4, num_workers=0, sampler=None, **preprocess_kwargs)[source]

Creates a DataModule object from the given datasets using the DataSource of name DATASETS from the passed or constructed Preprocess.

Parameters
Return type

DataModule

Returns

The constructed data module.

Examples:

import torch
from flash.core.data.data_module import DataModule

# `train_dataset` is any torch.utils.data.Dataset instance.
data_module = DataModule.from_datasets(
    train_dataset=train_dataset,
    train_transform={
        "to_tensor_transform": torch.as_tensor,
    },
)
classmethod from_fiftyone(train_dataset=None, val_dataset=None, test_dataset=None, predict_dataset=None, train_transform=None, val_transform=None, test_transform=None, predict_transform=None, data_fetcher=None, preprocess=None, val_split=None, batch_size=4, num_workers=0, **preprocess_kwargs)[source]

Creates a DataModule object from the given FiftyOne Datasets using the DataSource of name FIFTYONE from the passed or constructed Preprocess.

Parameters
  • train_dataset (Optional[SampleCollection]) – The fiftyone.core.collections.SampleCollection containing the train data.

  • val_dataset (Optional[SampleCollection]) – The fiftyone.core.collections.SampleCollection containing the validation data.

  • test_dataset (Optional[SampleCollection]) – The fiftyone.core.collections.SampleCollection containing the test data.

  • predict_dataset (Optional[SampleCollection]) – The fiftyone.core.collections.SampleCollection containing the predict data.

  • train_transform (Optional[Dict[str, Callable]]) – The dictionary of transforms to use during training which maps Preprocess hook names to callable transforms.

  • val_transform (Optional[Dict[str, Callable]]) – The dictionary of transforms to use during validation which maps Preprocess hook names to callable transforms.

  • test_transform (Optional[Dict[str, Callable]]) – The dictionary of transforms to use during testing which maps Preprocess hook names to callable transforms.

  • predict_transform (Optional[Dict[str, Callable]]) – The dictionary of transforms to use during predicting which maps Preprocess hook names to callable transforms.

  • data_fetcher (Optional[BaseDataFetcher]) – The BaseDataFetcher to pass to the DataModule.

  • preprocess (Optional[Preprocess]) – The Preprocess to pass to the DataModule. If None, cls.preprocess_cls will be constructed and used.

  • val_split (Optional[float]) – The val_split argument to pass to the DataModule.

  • batch_size (int) – The batch_size argument to pass to the DataModule.

  • num_workers (int) – The num_workers argument to pass to the DataModule.

  • preprocess_kwargs (Any) – Additional keyword arguments to use when constructing the preprocess. Will only be used if preprocess = None.

Return type

DataModule

Returns

The constructed data module.

Examples:

import fiftyone as fo
import torch
from flash.core.data.data_module import DataModule

train_dataset = fo.Dataset.from_dir(
    "/path/to/dataset",
    dataset_type=fo.types.ImageClassificationDirectoryTree,
)
data_module = DataModule.from_fiftyone(
    train_dataset=train_dataset,
    train_transform={
        "to_tensor_transform": torch.as_tensor,
    },
)
classmethod from_files(train_files=None, train_targets=None, val_files=None, val_targets=None, test_files=None, test_targets=None, predict_files=None, train_transform=None, val_transform=None, test_transform=None, predict_transform=None, data_fetcher=None, preprocess=None, val_split=None, batch_size=4, num_workers=0, sampler=None, **preprocess_kwargs)[source]

Creates a DataModule object from the given sequences of files using the DataSource of name FILES from the passed or constructed Preprocess.

Parameters
Return type

DataModule

Returns

The constructed data module.

classmethod from_folders(train_folder=None, val_folder=None, test_folder=None, predict_folder=None, train_transform=None, val_transform=None, test_transform=None, predict_transform=None, data_fetcher=None, preprocess=None, val_split=None, batch_size=4, num_workers=0, sampler=None, **preprocess_kwargs)[source]

Creates a DataModule object from the given folders using the DataSource of name FOLDERS from the passed or constructed Preprocess.

Parameters
Return type

DataModule

Returns

The constructed data module.

classmethod from_json(input_fields, target_fields=None, train_file=None, val_file=None, test_file=None, predict_file=None, train_transform=None, val_transform=None, test_transform=None, predict_transform=None, data_fetcher=None, preprocess=None, val_split=None, batch_size=4, num_workers=0, sampler=None, field=None, **preprocess_kwargs)[source]

Creates a DataModule object from the given JSON files using the DataSource of name JSON from the passed or constructed Preprocess.

Parameters
Return type

DataModule

Returns

The constructed data module.

Examples:

import torch
from flash.core.data.data_module import DataModule

data_module = DataModule.from_json(
    "input",
    "target",
    train_file="train_data.json",
    train_transform={
        "to_tensor_transform": torch.as_tensor,
    },
)

# In the case where the data is of the form:
# {
#     "version": 0.0.x,
#     "data": [
#         {
#             "input_field" : "input_data",
#             "target_field" : "target_output"
#         },
#         ...
#     ]
# }

data_module = DataModule.from_json(
    "input",
    "target",
    train_file="train_data.json",
    train_transform={
        "to_tensor_transform": torch.as_tensor,
    },
    field="data",
)
classmethod from_numpy(train_data=None, train_targets=None, val_data=None, val_targets=None, test_data=None, test_targets=None, predict_data=None, train_transform=None, val_transform=None, test_transform=None, predict_transform=None, data_fetcher=None, preprocess=None, val_split=None, batch_size=4, num_workers=0, sampler=None, **preprocess_kwargs)[source]

Creates a DataModule object from the given numpy arrays using the DataSource of name NUMPY from the passed or constructed Preprocess.

Parameters
Return type

DataModule

Returns

The constructed data module.

Examples:

import numpy as np
import torch
from flash.core.data.data_module import DataModule

data_module = DataModule.from_numpy(
    train_data=np.random.rand(3, 128),
    train_targets=[1, 0, 1],
    train_transform={
        "to_tensor_transform": torch.as_tensor,
    },
)
classmethod from_tensors(train_data=None, train_targets=None, val_data=None, val_targets=None, test_data=None, test_targets=None, predict_data=None, train_transform=None, val_transform=None, test_transform=None, predict_transform=None, data_fetcher=None, preprocess=None, val_split=None, batch_size=4, num_workers=0, sampler=None, **preprocess_kwargs)[source]

Creates a DataModule object from the given tensors using the DataSource of name TENSOR from the passed or constructed Preprocess.

Parameters
Return type

DataModule

Returns

The constructed data module.

Examples:

import torch
from flash.core.data.data_module import DataModule

data_module = DataModule.from_tensors(
    train_data=torch.rand(3, 128),
    train_targets=[1, 0, 1],
    train_transform={
        "to_tensor_transform": torch.as_tensor,
    },
)
postprocess_cls

alias of flash.core.data.process.Postprocess

property predict_dataset: Optional[torch.utils.data.Dataset]

This property returns the predict dataset.

Return type

Optional[Dataset]

show_predict_batch(hooks_names='load_sample', reset=True)[source]

Visualizes a batch from the predict dataloader.

Return type

None

show_test_batch(hooks_names='load_sample', reset=True)[source]

Visualizes a batch from the test dataloader.

Return type

None

show_train_batch(hooks_names='load_sample', reset=True)[source]

Visualizes a batch from the train dataloader.

Return type

None

show_val_batch(hooks_names='load_sample', reset=True)[source]

Visualizes a batch from the validation dataloader.

Return type

None
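Taken together, the show_*_batch helpers display what the Preprocess hooks produce for one batch. A non-runnable sketch, assuming `data_module` was built with one of the from_* constructors above:

```
# Illustrative only: `data_module` stands for any constructed DataModule;
# hooks_names selects which Preprocess hook output to visualize.
data_module.show_train_batch()                 # defaults to "load_sample"
data_module.show_val_batch("per_batch_transform")
```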

property test_dataset: Optional[torch.utils.data.Dataset]

This property returns the test dataset.

Return type

Optional[Dataset]

property train_dataset: Optional[torch.utils.data.Dataset]

This property returns the train dataset.

Return type

Optional[Dataset]

property val_dataset: Optional[torch.utils.data.Dataset]

This property returns the validation dataset.

Return type

Optional[Dataset]
