TabularRegressionData

class flash.tabular.regression.data.TabularRegressionData(train_input=None, val_input=None, test_input=None, predict_input=None, data_fetcher=None, transform=<class 'flash.core.data.io.input_transform.InputTransform'>, transform_kwargs=None, val_split=None, batch_size=None, num_workers=0, sampler=None, pin_memory=True, persistent_workers=False)

The TabularRegressionData class is a DataModule with a set of classmethods for loading data for tabular regression.

classmethod from_csv(categorical_fields=None, numerical_fields=None, target_field=None, parameters=None, train_file=None, val_file=None, test_file=None, predict_file=None, input_cls=<class 'flash.tabular.regression.input.TabularRegressionCSVInput'>, transform=<class 'flash.core.data.io.input_transform.InputTransform'>, transform_kwargs=None, **data_module_kwargs)

Creates a TabularRegressionData object from the given CSV files.

Note

The categorical_fields, numerical_fields, and target_field do not need to be provided if parameters are passed instead. These can be obtained from the parameters attribute of the TabularData object that contains your training data.

The targets will be extracted from the target_field in the CSV files. To learn how to customize the transforms applied for each stage, read our customizing transforms guide.

Parameters

categorical_fields (Union[str, List[str], None]) – The fields (column names) in the CSV files containing categorical data.
numerical_fields (Union[str, List[str], None]) – The fields (column names) in the CSV files containing numerical data.
target_field (Optional[str]) – The field (column name) in the CSV files containing the targets.
parameters (Optional[Dict[str, Any]]) – Parameters to use if categorical_fields, numerical_fields, and target_field are not provided (e.g. when loading data for inference or validation).
train_file (Optional[str]) – The path to the CSV file to use when training.
val_file (Optional[str]) – The path to the CSV file to use when validating.
test_file (Optional[str]) – The path to the CSV file to use when testing.
predict_file (Optional[str]) – The path to the CSV file to use when predicting.
input_cls (Type[Input]) – The Input type to use for loading the data.
transform (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[StrEnum, str], Dict[str, Any]], Union[StrEnum, str], None)) – The InputTransform type to use.
transform_kwargs (Optional[Dict]) – Dict of keyword arguments to be provided when instantiating the transforms.
data_module_kwargs (Any) – Additional keyword arguments to provide to the DataModule constructor.

Return type

TabularRegressionData

Returns

The constructed TabularRegressionData.

Examples

The files can be in Comma Separated Values (CSV) format with either a .csv or .txt extension.

We have a train_data.csv with the following contents:

age,animal,weight
2,cat,6
4,dog,10
1,cat,5

and a predict_data.csv with the following contents:

animal,weight
dog,7
dog,12
cat,5
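
To try the example that follows, both files need to exist on disk. A minimal sketch that writes them with Python's standard csv module; the file names and rows are those listed above, while the helper name write_rows is hypothetical and not part of Flash:

```python
import csv

# Rows copied from the listings above; the first row of each file is the header.
train_rows = [
    ["age", "animal", "weight"],
    [2, "cat", 6],
    [4, "dog", 10],
    [1, "cat", 5],
]
predict_rows = [
    ["animal", "weight"],
    ["dog", 7],
    ["dog", 12],
    ["cat", 5],
]


def write_rows(path, rows):
    """Write rows to path as comma-separated values (hypothetical helper)."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


write_rows("train_data.csv", train_rows)
write_rows("predict_data.csv", predict_rows)
```

The prediction file leaves out the age column because age is the target being predicted.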

>>> from flash import Trainer
>>> from flash.tabular import TabularRegressor, TabularRegressionData
>>> datamodule = TabularRegressionData.from_csv(
...     "animal",
...     "weight",
...     "age",
...     train_file="train_data.csv",
...     predict_file="predict_data.csv",
...     batch_size=4,
... )
>>> model = TabularRegressor.from_data(datamodule, backbone="tabnet")
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)  
Training...
>>> trainer.predict(model, datamodule=datamodule)  
Predicting...

Alternatively, the files can be in Tab Separated Values (TSV) format with a .tsv extension.

We have a train_data.tsv with the following contents:

age     animal  weight
2       cat     6
4       dog     10
1       cat     5

and a predict_data.tsv with the following contents:

animal  weight
dog     7
dog     12
cat     5
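
The only difference from the CSV case is the field separator. As a rough sketch, the TSV files can be produced with the standard csv module by passing delimiter="\t"; the rows are the same as in the CSV example, and the loop is illustrative, not part of Flash:

```python
import csv

train_rows = [
    ["age", "animal", "weight"],
    [2, "cat", 6],
    [4, "dog", 10],
    [1, "cat", 5],
]
predict_rows = [
    ["animal", "weight"],
    ["dog", 7],
    ["dog", 12],
    ["cat", 5],
]

# Write each table with a tab between fields instead of a comma.
for path, rows in [("train_data.tsv", train_rows), ("predict_data.tsv", predict_rows)]:
    with open(path, "w", newline="") as f:
        csv.writer(f, delimiter="\t").writerows(rows)
```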

>>> from flash import Trainer
>>> from flash.tabular import TabularRegressor, TabularRegressionData
>>> datamodule = TabularRegressionData.from_csv(
...     "animal",
...     "weight",
...     "age",
...     train_file="train_data.tsv",
...     predict_file="predict_data.tsv",
...     batch_size=4,
... )
>>> model = TabularRegressor.from_data(datamodule, backbone="tabnet")
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)  
Training...
>>> trainer.predict(model, datamodule=datamodule)  
Predicting...

classmethod from_data_frame(categorical_fields=None, numerical_fields=None, target_field=None, parameters=None, train_data_frame=None, val_data_frame=None, test_data_frame=None, predict_data_frame=None, input_cls=<class 'flash.tabular.regression.input.TabularRegressionDataFrameInput'>, transform=<class 'flash.core.data.io.input_transform.InputTransform'>, transform_kwargs=None, **data_module_kwargs)

Creates a TabularRegressionData object from the given data frames.

Note

The categorical_fields, numerical_fields, and target_field do not need to be provided if parameters are passed instead. These can be obtained from the parameters attribute of the TabularData object that contains your training data.

The targets will be extracted from the target_field in the data frames. To learn how to customize the transforms applied for each stage, read our customizing transforms guide.

Parameters

categorical_fields (Union[str, List[str], None]) – The fields (column names) in the data frames containing categorical data.
numerical_fields (Union[str, List[str], None]) – The fields (column names) in the data frames containing numerical data.
target_field (Optional[str]) – The field (column name) in the data frames containing the targets.
parameters (Optional[Dict[str, Any]]) – Parameters to use if categorical_fields, numerical_fields, and target_field are not provided (e.g. when loading data for inference or validation).
train_data_frame (Optional[DataFrame]) – The DataFrame to use when training.
val_data_frame (Optional[DataFrame]) – The DataFrame to use when validating.
test_data_frame (Optional[DataFrame]) – The DataFrame to use when testing.
predict_data_frame (Optional[DataFrame]) – The DataFrame to use when predicting.
input_cls (Type[Input]) – The Input type to use for loading the data.
transform (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[StrEnum, str], Dict[str, Any]], Union[StrEnum, str], None)) – The InputTransform type to use.
transform_kwargs (Optional[Dict]) – Dict of keyword arguments to be provided when instantiating the transforms.
data_module_kwargs (Any) – Additional keyword arguments to provide to the DataModule constructor.

Return type

TabularRegressionData

Returns

The constructed TabularRegressionData.

Examples

We have a DataFrame train_data with the following contents:

>>> train_data.head(3)
   age animal  weight
0    2    cat       6
1    4    dog      10
2    1    cat       5

and a DataFrame predict_data with the following contents:

>>> predict_data.head(3)
  animal  weight
0    dog       7
1    dog      12
2    cat       5
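
The example assumes train_data and predict_data already exist. One way to build them, sketched here with the pandas DataFrame constructor and the values shown above (how the frames are created is an assumption; any DataFrame with these columns should serve):

```python
import pandas as pd

# Column-oriented construction: each key becomes a column.
train_data = pd.DataFrame(
    {
        "age": [2, 4, 1],
        "animal": ["cat", "dog", "cat"],
        "weight": [6, 10, 5],
    }
)

# The prediction frame has no "age" column, since age is the target.
predict_data = pd.DataFrame(
    {
        "animal": ["dog", "dog", "cat"],
        "weight": [7, 12, 5],
    }
)
```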

>>> from flash import Trainer
>>> from flash.tabular import TabularRegressor, TabularRegressionData
>>> datamodule = TabularRegressionData.from_data_frame(
...     "animal",
...     "weight",
...     "age",
...     train_data_frame=train_data,
...     predict_data_frame=predict_data,
...     batch_size=4,
... )
>>> model = TabularRegressor.from_data(datamodule, backbone="tabnet")
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)  
Training...
>>> trainer.predict(model, datamodule=datamodule)  
Predicting...

classmethod from_dicts(categorical_fields=None, numerical_fields=None, target_field=None, parameters=None, train_dict=None, val_dict=None, test_dict=None, predict_dict=None, input_cls=<class 'flash.tabular.regression.input.TabularRegressionDictInput'>, transform=<class 'flash.core.data.io.input_transform.InputTransform'>, transform_kwargs=None, **data_module_kwargs)

Creates a TabularRegressionData object from the given dictionaries.

Note

The categorical_fields, numerical_fields, and target_field do not need to be provided if parameters are passed instead. These can be obtained from the parameters attribute of the TabularData object that contains your training data.

The targets will be extracted from the target_field in the dictionaries. To learn how to customize the transforms applied for each stage, read our customizing transforms guide.

Parameters

categorical_fields (Union[str, List[str], None]) – The fields (column names) in the dictionary containing categorical data.
numerical_fields (Union[str, List[str], None]) – The fields (column names) in the dictionary containing numerical data.
target_field (Optional[str]) – The field (column name) in the dictionary containing the targets.
parameters (Optional[Dict[str, Any]]) – Parameters to use if categorical_fields, numerical_fields, and target_field are not provided (e.g. when loading data for inference or validation).
train_dict (Optional[Dict[str, List[Any]]]) – The dictionary to use when training.
val_dict (Optional[Dict[str, List[Any]]]) – The dictionary to use when validating.
test_dict (Optional[Dict[str, List[Any]]]) – The dictionary to use when testing.
predict_dict (Optional[Dict[str, List[Any]]]) – The dictionary to use when predicting.
input_cls (Type[Input]) – The Input type to use for loading the data.
transform (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[StrEnum, str], Dict[str, Any]], Union[StrEnum, str], None)) – The InputTransform type to use.
transform_kwargs (Optional[Dict]) – Dict of keyword arguments to be provided when instantiating the transforms.
data_module_kwargs (Any) – Additional keyword arguments to provide to the DataModule constructor.

Return type

TabularRegressionData

Returns

The constructed TabularRegressionData.

Examples

We have a dictionary train_data with the following contents:

{
    "age": [2, 4, 1],
    "animal": ["cat", "dog", "cat"],
    "weight": [6, 10, 5]
}

and a dictionary predict_data with the following contents:

{
    "animal": ["dog", "dog", "cat"],
    "weight": [7, 12, 5]
}
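
The dictionaries above are column-oriented: each key is a column name and each list holds that column's values. As a rough sketch (the helper columns_to_rows is hypothetical and not part of Flash), the same data can be rearranged into one dictionary per row, which is the layout shown for from_lists:

```python
train_data = {
    "age": [2, 4, 1],
    "animal": ["cat", "dog", "cat"],
    "weight": [6, 10, 5],
}


def columns_to_rows(columns):
    """Turn {"name": [values...]} into [{"name": value, ...}, ...] (hypothetical helper)."""
    names = list(columns)
    # zip(*...) walks the column lists in parallel, yielding one row at a time.
    return [dict(zip(names, values)) for values in zip(*columns.values())]


train_rows = columns_to_rows(train_data)
print(train_rows[0])  # {'age': 2, 'animal': 'cat', 'weight': 6}
```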

>>> from flash import Trainer
>>> from flash.tabular import TabularRegressor, TabularRegressionData
>>> datamodule = TabularRegressionData.from_dicts(
...     "animal",
...     "weight",
...     "age",
...     train_dict=train_data,
...     predict_dict=predict_data,
...     batch_size=4,
... )
>>> model = TabularRegressor.from_data(datamodule, backbone="tabnet")
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)  
Training...
>>> trainer.predict(model, datamodule=datamodule)  
Predicting...

classmethod from_lists(categorical_fields=None, numerical_fields=None, target_field=None, parameters=None, train_list=None, val_list=None, test_list=None, predict_list=None, input_cls=<class 'flash.tabular.regression.input.TabularRegressionListInput'>, transform=<class 'flash.core.data.io.input_transform.InputTransform'>, transform_kwargs=None, **data_module_kwargs)

Creates a TabularRegressionData object from the given data (in the form of a list of tuples or dictionaries).

Note

The categorical_fields, numerical_fields, and target_field do not need to be provided if parameters are passed instead. These can be obtained from the parameters attribute of the TabularData object that contains your training data.

The targets will be extracted from the target_field in the lists. To learn how to customize the transforms applied for each stage, read our customizing transforms guide.

Parameters

categorical_fields (Union[str, List[str], None]) – The fields (column names) in the lists containing categorical data.
numerical_fields (Union[str, List[str], None]) – The fields (column names) in the lists containing numerical data.
target_field (Optional[str]) – The field (column name) in the lists containing the targets.
parameters (Optional[Dict[str, Any]]) – Parameters to use if categorical_fields, numerical_fields, and target_field are not provided (e.g. when loading data for inference or validation).
train_list (Optional[List[Union[tuple, dict]]]) – The data to use when training.
val_list (Optional[List[Union[tuple, dict]]]) – The data to use when validating.
test_list (Optional[List[Union[tuple, dict]]]) – The data to use when testing.
predict_list (Optional[List[Union[tuple, dict]]]) – The data to use when predicting.
input_cls (Type[Input]) – The Input type to use for loading the data.
transform (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[StrEnum, str], Dict[str, Any]], Union[StrEnum, str], None)) – The InputTransform type to use.
transform_kwargs (Optional[Dict]) – Dict of keyword arguments to be provided when instantiating the transforms.
data_module_kwargs (Any) – Additional keyword arguments to provide to the DataModule constructor.

Return type

TabularRegressionData

Returns

The constructed TabularRegressionData.

Examples

We have a list of dictionaries train_data with the following contents:

[
    {"age": 2, "animal": "cat", "weight": 6},
    {"age": 4, "animal": "dog", "weight": 10},
    {"age": 1, "animal": "cat", "weight": 5},
]

and a list of dictionaries predict_data with the following contents:

[
    {"animal": "dog", "weight": 7},
    {"animal": "dog", "weight": 12},
    {"animal": "cat", "weight": 5},
]

>>> from flash import Trainer
>>> from flash.tabular import TabularRegressor, TabularRegressionData
>>> datamodule = TabularRegressionData.from_lists(
...     "animal",
...     "weight",
...     "age",
...     train_list=train_data,
...     predict_list=predict_data,
...     batch_size=4,
... )
>>> model = TabularRegressor.from_data(datamodule, backbone="tabnet")
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)  
Training...
>>> trainer.predict(model, datamodule=datamodule)  
Predicting...