TabularRegressionData
- class flash.tabular.regression.data.TabularRegressionData(train_input=None, val_input=None, test_input=None, predict_input=None, data_fetcher=None, transform=<class 'flash.core.data.io.input_transform.InputTransform'>, transform_kwargs=None, val_split=None, batch_size=None, num_workers=0, sampler=None, pin_memory=True, persistent_workers=False)
The TabularRegressionData class is a DataModule with a set of classmethods for loading data for tabular regression.
- classmethod from_csv(categorical_fields=None, numerical_fields=None, target_field=None, parameters=None, train_file=None, val_file=None, test_file=None, predict_file=None, input_cls=<class 'flash.tabular.regression.input.TabularRegressionCSVInput'>, transform=<class 'flash.core.data.io.input_transform.InputTransform'>, transform_kwargs=None, **data_module_kwargs)
Creates a TabularRegressionData object from the given CSV files.
Note
The categorical_fields, numerical_fields, and target_field do not need to be provided if parameters are passed instead. These can be obtained from the parameters attribute of the TabularData object that contains your training data.
The targets will be extracted from the target_field in the CSV files. To learn how to customize the transforms applied for each stage, read our customizing transforms guide.
- Parameters
  - categorical_fields (Union[str, List[str], None]) – The fields (column names) in the CSV files containing categorical data.
  - numerical_fields (Union[str, List[str], None]) – The fields (column names) in the CSV files containing numerical data.
  - target_field (Optional[str]) – The field (column name) in the CSV files containing the targets.
  - parameters (Optional[Dict[str, Any]]) – Parameters to use if categorical_fields, numerical_fields, and target_field are not provided (e.g. when loading data for inference or validation).
  - train_file (Optional[str]) – The path to the CSV file to use when training.
  - val_file (Optional[str]) – The path to the CSV file to use when validating.
  - test_file (Optional[str]) – The path to the CSV file to use when testing.
  - predict_file (Optional[str]) – The path to the CSV file to use when predicting.
  - input_cls (Type[Input]) – The Input type to use for loading the data.
  - transform (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[StrEnum, str], Dict[str, Any]], Union[StrEnum, str], None)) – The InputTransform type to use.
  - transform_kwargs (Optional[Dict]) – Dict of keyword arguments to be provided when instantiating the transforms.
  - data_module_kwargs (Any) – Additional keyword arguments to provide to the DataModule constructor.
- Return type
TabularRegressionData
- Returns
The constructed TabularRegressionData.
Examples
The files can be in Comma Separated Values (CSV) format with either a .csv or .txt extension.
We have a train_data.csv with the following contents:
    age,animal,weight
    2,cat,6
    4,dog,10
    1,cat,5
and a predict_data.csv with the following contents:
    animal,weight
    dog,7
    dog,12
    cat,5
    >>> from flash import Trainer
    >>> from flash.tabular import TabularRegressor, TabularRegressionData
    >>> datamodule = TabularRegressionData.from_csv(
    ...     "animal",
    ...     "weight",
    ...     "age",
    ...     train_file="train_data.csv",
    ...     predict_file="predict_data.csv",
    ...     batch_size=4,
    ... )
    >>> model = TabularRegressor.from_data(datamodule, backbone="tabnet")
    >>> trainer = Trainer(fast_dev_run=True)
    >>> trainer.fit(model, datamodule=datamodule)
    Training...
    >>> trainer.predict(model, datamodule=datamodule)
    Predicting...
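To try the CSV example, the two input files have to exist on disk first. The following is a minimal sketch that writes them with the standard library only. The helper name write_csv and the use of a temporary directory are assumptions made for illustration; they are not part of Flash.

```python
import csv
import os
import tempfile

# Rows copied from the example files described in this section.
TRAIN_ROWS = [
    {"age": 2, "animal": "cat", "weight": 6},
    {"age": 4, "animal": "dog", "weight": 10},
    {"age": 1, "animal": "cat", "weight": 5},
]
PREDICT_ROWS = [
    {"animal": "dog", "weight": 7},
    {"animal": "dog", "weight": 12},
    {"animal": "cat", "weight": 5},
]


def write_csv(path, rows):
    """Hypothetical helper: write row dictionaries to `path` with a header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


# A temporary directory keeps the sketch from touching the working directory.
data_dir = tempfile.mkdtemp()
train_file = os.path.join(data_dir, "train_data.csv")
predict_file = os.path.join(data_dir, "predict_data.csv")
write_csv(train_file, TRAIN_ROWS)
write_csv(predict_file, PREDICT_ROWS)
```

The resulting train_file and predict_file paths can then be passed as the train_file and predict_file arguments of from_csv.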
Alternatively, the files can be in Tab Separated Values (TSV) format with a .tsv extension.
We have a train_data.tsv with the following contents:
    age	animal	weight
    2	cat	6
    4	dog	10
    1	cat	5
and a predict_data.tsv with the following contents:
    animal	weight
    dog	7
    dog	12
    cat	5
    >>> from flash import Trainer
    >>> from flash.tabular import TabularRegressor, TabularRegressionData
    >>> datamodule = TabularRegressionData.from_csv(
    ...     "animal",
    ...     "weight",
    ...     "age",
    ...     train_file="train_data.tsv",
    ...     predict_file="predict_data.tsv",
    ...     batch_size=4,
    ... )
    >>> model = TabularRegressor.from_data(datamodule, backbone="tabnet")
    >>> trainer = Trainer(fast_dev_run=True)
    >>> trainer.fit(model, datamodule=datamodule)
    Training...
    >>> trainer.predict(model, datamodule=datamodule)
    Predicting...
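The TSV files differ from the CSV files only in their delimiter. As a rough sketch, comma-separated text can be re-encoded as tab-separated text with the standard csv module. The function name csv_to_tsv is hypothetical and not part of Flash.

```python
import csv
import io

# Contents of train_data.csv from the CSV example.
CSV_TEXT = "age,animal,weight\n2,cat,6\n4,dog,10\n1,cat,5\n"


def csv_to_tsv(csv_text):
    """Hypothetical helper: re-encode comma-separated text as tab-separated text."""
    rows = csv.reader(io.StringIO(csv_text))
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


tsv_text = csv_to_tsv(CSV_TEXT)
```

Writing tsv_text to a file named train_data.tsv would produce the training file shown in the TSV example.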
- classmethod from_data_frame(categorical_fields=None, numerical_fields=None, target_field=None, parameters=None, train_data_frame=None, val_data_frame=None, test_data_frame=None, predict_data_frame=None, input_cls=<class 'flash.tabular.regression.input.TabularRegressionDataFrameInput'>, transform=<class 'flash.core.data.io.input_transform.InputTransform'>, transform_kwargs=None, **data_module_kwargs)
Creates a TabularRegressionData object from the given data frames.
Note
The categorical_fields, numerical_fields, and target_field do not need to be provided if parameters are passed instead. These can be obtained from the parameters attribute of the TabularData object that contains your training data.
The targets will be extracted from the target_field in the data frames. To learn how to customize the transforms applied for each stage, read our customizing transforms guide.
- Parameters
  - categorical_fields (Union[str, List[str], None]) – The fields (column names) in the data frames containing categorical data.
  - numerical_fields (Union[str, List[str], None]) – The fields (column names) in the data frames containing numerical data.
  - target_field (Optional[str]) – The field (column name) in the data frames containing the targets.
  - parameters (Optional[Dict[str, Any]]) – Parameters to use if categorical_fields, numerical_fields, and target_field are not provided (e.g. when loading data for inference or validation).
  - train_data_frame (Optional[DataFrame]) – The DataFrame to use when training.
  - val_data_frame (Optional[DataFrame]) – The DataFrame to use when validating.
  - test_data_frame (Optional[DataFrame]) – The DataFrame to use when testing.
  - predict_data_frame (Optional[DataFrame]) – The DataFrame to use when predicting.
  - input_cls (Type[Input]) – The Input type to use for loading the data.
  - transform (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[StrEnum, str], Dict[str, Any]], Union[StrEnum, str], None)) – The InputTransform type to use.
  - transform_kwargs (Optional[Dict]) – Dict of keyword arguments to be provided when instantiating the transforms.
  - data_module_kwargs (Any) – Additional keyword arguments to provide to the DataModule constructor.
- Return type
TabularRegressionData
- Returns
The constructed TabularRegressionData.
Examples
We have a DataFrame train_data with the following contents:
    >>> train_data.head(3)
       age animal  weight
    0    2    cat       6
    1    4    dog      10
    2    1    cat       5
and a DataFrame predict_data with the following contents:
    >>> predict_data.head(3)
      animal  weight
    0    dog       7
    1    dog      12
    2    cat       5
    >>> from flash import Trainer
    >>> from flash.tabular import TabularRegressor, TabularRegressionData
    >>> datamodule = TabularRegressionData.from_data_frame(
    ...     "animal",
    ...     "weight",
    ...     "age",
    ...     train_data_frame=train_data,
    ...     predict_data_frame=predict_data,
    ...     batch_size=4,
    ... )
    >>> model = TabularRegressor.from_data(datamodule, backbone="tabnet")
    >>> trainer = Trainer(fast_dev_run=True)
    >>> trainer.fit(model, datamodule=datamodule)
    Training...
    >>> trainer.predict(model, datamodule=datamodule)
    Predicting...
- classmethod from_dicts(categorical_fields=None, numerical_fields=None, target_field=None, parameters=None, train_dict=None, val_dict=None, test_dict=None, predict_dict=None, input_cls=<class 'flash.tabular.regression.input.TabularRegressionDictInput'>, transform=<class 'flash.core.data.io.input_transform.InputTransform'>, transform_kwargs=None, **data_module_kwargs)
Creates a TabularRegressionData object from the given dictionaries.
Note
The categorical_fields, numerical_fields, and target_field do not need to be provided if parameters are passed instead. These can be obtained from the parameters attribute of the TabularData object that contains your training data.
The targets will be extracted from the target_field in the dictionaries. To learn how to customize the transforms applied for each stage, read our customizing transforms guide.
- Parameters
  - categorical_fields (Union[str, List[str], None]) – The fields (column names) in the dictionary containing categorical data.
  - numerical_fields (Union[str, List[str], None]) – The fields (column names) in the dictionary containing numerical data.
  - target_field (Optional[str]) – The field (column name) in the dictionary containing the targets.
  - parameters (Optional[Dict[str, Any]]) – Parameters to use if categorical_fields, numerical_fields, and target_field are not provided (e.g. when loading data for inference or validation).
  - train_dict (Optional[Dict[str, List[Any]]]) – The dictionary to use when training.
  - val_dict (Optional[Dict[str, List[Any]]]) – The dictionary to use when validating.
  - test_dict (Optional[Dict[str, List[Any]]]) – The dictionary to use when testing.
  - predict_dict (Optional[Dict[str, List[Any]]]) – The dictionary to use when predicting.
  - input_cls (Type[Input]) – The Input type to use for loading the data.
  - transform (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[StrEnum, str], Dict[str, Any]], Union[StrEnum, str], None)) – The InputTransform type to use.
  - transform_kwargs (Optional[Dict]) – Dict of keyword arguments to be provided when instantiating the transforms.
  - data_module_kwargs (Any) – Additional keyword arguments to provide to the DataModule constructor.
- Return type
TabularRegressionData
- Returns
The constructed TabularRegressionData.
Examples
We have a dictionary train_data with the following contents:
    {
        "age": [2, 4, 1],
        "animal": ["cat", "dog", "cat"],
        "weight": [6, 10, 5]
    }
and a dictionary predict_data with the following contents:
    {
        "animal": ["dog", "dog", "cat"],
        "weight": [7, 12, 5]
    }
    >>> from flash import Trainer
    >>> from flash.tabular import TabularRegressor, TabularRegressionData
    >>> datamodule = TabularRegressionData.from_dicts(
    ...     "animal",
    ...     "weight",
    ...     "age",
    ...     train_dict=train_data,
    ...     predict_dict=predict_data,
    ...     batch_size=4,
    ... )
    >>> model = TabularRegressor.from_data(datamodule, backbone="tabnet")
    >>> trainer = Trainer(fast_dev_run=True)
    >>> trainer.fit(model, datamodule=datamodule)
    Training...
    >>> trainer.predict(model, datamodule=datamodule)
    Predicting...
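The dictionaries passed to from_dicts map each field name to a list of values, one per row, so every list should have the same length. The sketch below is a small sanity check one might run before building the data module. The function check_columns is a hypothetical helper, not part of Flash, and Flash may perform its own validation.

```python
def check_columns(data, required_fields):
    """Hypothetical helper: validate a column dictionary and return its row count.

    Raises ValueError if a required field is missing or if the columns
    do not all have the same number of values.
    """
    missing = [field for field in required_fields if field not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    lengths = {len(values) for values in data.values()}
    if len(lengths) > 1:
        raise ValueError(f"columns have different lengths: {sorted(lengths)}")
    return lengths.pop() if lengths else 0


# The dictionaries from the example in this section.
train_data = {
    "age": [2, 4, 1],
    "animal": ["cat", "dog", "cat"],
    "weight": [6, 10, 5],
}
predict_data = {
    "animal": ["dog", "dog", "cat"],
    "weight": [7, 12, 5],
}

# Training data needs the target field ("age"); prediction data does not.
num_train_rows = check_columns(train_data, ["animal", "weight", "age"])
num_predict_rows = check_columns(predict_data, ["animal", "weight"])
```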
- classmethod from_lists(categorical_fields=None, numerical_fields=None, target_field=None, parameters=None, train_list=None, val_list=None, test_list=None, predict_list=None, input_cls=<class 'flash.tabular.regression.input.TabularRegressionListInput'>, transform=<class 'flash.core.data.io.input_transform.InputTransform'>, transform_kwargs=None, **data_module_kwargs)
Creates a TabularRegressionData object from the given data (a list of tuples or dictionaries).
Note
The categorical_fields, numerical_fields, and target_field do not need to be provided if parameters are passed instead. These can be obtained from the parameters attribute of the TabularData object that contains your training data.
The targets will be extracted from the target_field in the data. To learn how to customize the transforms applied for each stage, read our customizing transforms guide.
- Parameters
  - categorical_fields (Union[str, List[str], None]) – The fields (column names) in the data containing categorical data.
  - numerical_fields (Union[str, List[str], None]) – The fields (column names) in the data containing numerical data.
  - target_field (Optional[str]) – The field (column name) in the data containing the targets.
  - parameters (Optional[Dict[str, Any]]) – Parameters to use if categorical_fields, numerical_fields, and target_field are not provided (e.g. when loading data for inference or validation).
  - train_list (Optional[List[Union[tuple, dict]]]) – The data to use when training.
  - val_list (Optional[List[Union[tuple, dict]]]) – The data to use when validating.
  - test_list (Optional[List[Union[tuple, dict]]]) – The data to use when testing.
  - predict_list (Optional[List[Union[tuple, dict]]]) – The data to use when predicting.
  - input_cls (Type[Input]) – The Input type to use for loading the data.
  - transform (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[StrEnum, str], Dict[str, Any]], Union[StrEnum, str], None)) – The InputTransform type to use.
  - transform_kwargs (Optional[Dict]) – Dict of keyword arguments to be provided when instantiating the transforms.
  - data_module_kwargs (Any) – Additional keyword arguments to provide to the DataModule constructor.
- Return type
TabularRegressionData
- Returns
The constructed TabularRegressionData.
Examples
We have a list of dictionaries train_data with the following contents:
    [
        {"age": 2, "animal": "cat", "weight": 6},
        {"age": 4, "animal": "dog", "weight": 10},
        {"age": 1, "animal": "cat", "weight": 5},
    ]
and a list of dictionaries predict_data with the following contents:
    [
        {"animal": "dog", "weight": 7},
        {"animal": "dog", "weight": 12},
        {"animal": "cat", "weight": 5},
    ]
    >>> from flash import Trainer
    >>> from flash.tabular import TabularRegressor, TabularRegressionData
    >>> datamodule = TabularRegressionData.from_lists(
    ...     "animal",
    ...     "weight",
    ...     "age",
    ...     train_list=train_data,
    ...     predict_list=predict_data,
    ...     batch_size=4,
    ... )
    >>> model = TabularRegressor.from_data(datamodule, backbone="tabnet")
    >>> trainer = Trainer(fast_dev_run=True)
    >>> trainer.fit(model, datamodule=datamodule)
    Training...
    >>> trainer.predict(model, datamodule=datamodule)
    Predicting...
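The row-oriented list of dictionaries used by from_lists holds the same information as the column-oriented dictionary used by from_dicts. As an illustration, the two layouts can be converted into each other with plain Python. The function names columns_to_rows and rows_to_columns are hypothetical and not part of Flash.

```python
def columns_to_rows(columns):
    """Turn {"field": [v0, v1, ...]} into [{"field": v0}, {"field": v1}, ...]."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


def rows_to_columns(rows):
    """Inverse of columns_to_rows, assuming every row has the same keys."""
    names = list(rows[0]) if rows else []
    return {name: [row[name] for row in rows] for name in names}


# Column layout, as accepted by from_dicts.
train_columns = {
    "age": [2, 4, 1],
    "animal": ["cat", "dog", "cat"],
    "weight": [6, 10, 5],
}

# Row layout, as accepted by from_lists.
train_rows = columns_to_rows(train_columns)
```

Which layout to use is mostly a matter of how the data arrives: records read one at a time fit the list form, while data already grouped by field fits the dictionary form.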