TranslationData
- class flash.text.seq2seq.translation.data.TranslationData(train_input=None, val_input=None, test_input=None, predict_input=None, data_fetcher=None, transform=<class 'flash.core.data.io.input_transform.InputTransform'>, transform_kwargs=None, val_split=None, batch_size=None, num_workers=0, sampler=None, pin_memory=True, persistent_workers=False)
The TranslationData class is a DataModule with a set of classmethods for loading data for text translation.

- classmethod from_csv(input_field, target_field=None, train_file=None, val_file=None, test_file=None, predict_file=None, input_cls=<class 'flash.text.seq2seq.core.input.Seq2SeqCSVInput'>, transform=<class 'flash.core.data.io.input_transform.InputTransform'>, transform_kwargs=None, **data_module_kwargs)
Load the TranslationData from CSV files containing input text snippets and their corresponding target text snippets.

Input text snippets will be extracted from the input_field column in the CSV files. Target text snippets will be extracted from the target_field column in the CSV files. To learn how to customize the transforms applied for each stage, read our customizing transforms guide.

- Parameters
  - input_field (str) – The field (column name) in the CSV files containing the input text snippets.
  - target_field (Optional[str]) – The field (column name) in the CSV files containing the target text snippets.
  - train_file (Union[str, bytes, PathLike, None]) – The CSV file to use when training.
  - val_file (Union[str, bytes, PathLike, None]) – The CSV file to use when validating.
  - test_file (Union[str, bytes, PathLike, None]) – The CSV file to use when testing.
  - predict_file (Union[str, bytes, PathLike, None]) – The CSV file to use when predicting.
  - input_cls (Type[Input]) – The Input type to use for loading the data.
  - transform (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[StrEnum, str], Dict[str, Any]], Union[StrEnum, str], None)) – The InputTransform type to use.
  - transform_kwargs (Optional[Dict]) – Dict of keyword arguments to be provided when instantiating the transforms.
  - data_module_kwargs (Any) – Additional keyword arguments to provide to the DataModule constructor.
- Return type
  TranslationData
- Returns
  The constructed TranslationData.
Examples

The files can be in Comma Separated Values (CSV) format with either a .csv or .txt extension.

The file train_data.csv contains the following:

pig latin,english
ayay entencesay inyay igpay atinlay,a sentence in pig latin
ellohay orldway,hello world

The file predict_data.csv contains the following:

pig latin
ayay entencesay orfay edictionpray

>>> from flash import Trainer
>>> from flash.text import TranslationTask, TranslationData
>>> datamodule = TranslationData.from_csv(
...     "pig latin",
...     "english",
...     train_file="train_data.csv",
...     predict_file="predict_data.csv",
...     batch_size=2,
... )
>>> model = TranslationTask()
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)
Training...
>>> trainer.predict(model, datamodule=datamodule)
Predicting...
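To try the CSV example locally, the two input files need to exist first. The following is a minimal sketch that creates them with the standard library csv module; the scratch directory, the write_csv helper, and the variable names are illustrative assumptions, not part of Flash.

```python
import csv
import tempfile
from pathlib import Path

# Scratch directory for the example files (an assumption; any writable path works).
data_dir = Path(tempfile.mkdtemp())

train_rows = [
    ("ayay entencesay inyay igpay atinlay", "a sentence in pig latin"),
    ("ellohay orldway", "hello world"),
]
predict_rows = [("ayay entencesay orfay edictionpray",)]


def write_csv(path, header, rows):
    """Write a header row followed by the data rows as comma-separated values."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


train_file = data_dir / "train_data.csv"
predict_file = data_dir / "predict_data.csv"
write_csv(train_file, ["pig latin", "english"], train_rows)
write_csv(predict_file, ["pig latin"], predict_rows)

# Read the training file back to confirm that the header names match the
# input_field / target_field values used in the example ("pig latin", "english").
with open(train_file, newline="", encoding="utf-8") as handle:
    train_records = list(csv.DictReader(handle))
```

Because train_file and predict_file accept str, bytes, or PathLike values, the resulting paths can be passed to from_csv directly.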
Alternatively, the files can be in Tab Separated Values (TSV) format with a .tsv extension.

The file train_data.tsv contains the following (columns are separated by a tab character):

pig latin	english
ayay entencesay inyay igpay atinlay	a sentence in pig latin
ellohay orldway	hello world

The file predict_data.tsv contains the following:

pig latin
ayay entencesay orfay edictionpray

>>> from flash import Trainer
>>> from flash.text import TranslationTask, TranslationData
>>> datamodule = TranslationData.from_csv(
...     "pig latin",
...     "english",
...     train_file="train_data.tsv",
...     predict_file="predict_data.tsv",
...     batch_size=2,
... )
>>> model = TranslationTask()
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)
Training...
>>> trainer.predict(model, datamodule=datamodule)
Predicting...
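Tab characters are easy to lose when copying a TSV file by hand, so it can help to generate the file programmatically. This sketch writes the training file with csv.DictWriter and a tab delimiter; the scratch directory and variable names are assumptions made for illustration.

```python
import csv
import tempfile
from pathlib import Path

# Scratch directory for the example file (an assumption; any writable path works).
data_dir = Path(tempfile.mkdtemp())
tsv_file = data_dir / "train_data.tsv"

records = [
    {"pig latin": "ayay entencesay inyay igpay atinlay", "english": "a sentence in pig latin"},
    {"pig latin": "ellohay orldway", "english": "hello world"},
]

# delimiter="\t" switches the csv module from commas to tabs.
with open(tsv_file, "w", newline="", encoding="utf-8") as handle:
    writer = csv.DictWriter(handle, fieldnames=["pig latin", "english"], delimiter="\t")
    writer.writeheader()
    writer.writerows(records)

# The header line should contain exactly one tab between the two column names.
tsv_lines = tsv_file.read_text(encoding="utf-8").splitlines()
header_fields = tsv_lines[0].split("\t")
```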
- classmethod from_hf_datasets(input_field, target_field=None, train_hf_dataset=None, val_hf_dataset=None, test_hf_dataset=None, predict_hf_dataset=None, input_cls=<class 'flash.text.seq2seq.core.input.Seq2SeqInputBase'>, transform=<class 'flash.core.data.io.input_transform.InputTransform'>, transform_kwargs=None, **data_module_kwargs)

Load the TranslationData from Hugging Face Dataset objects containing input text snippets and their corresponding target text snippets.

Input text snippets will be extracted from the input_field column in the Dataset objects. Target text snippets will be extracted from the target_field column in the Dataset objects. To learn how to customize the transforms applied for each stage, read our customizing transforms guide.

- Parameters
  - input_field (str) – The field (column name) in the Dataset objects containing the input text snippets.
  - target_field (Optional[str]) – The field (column name) in the Dataset objects containing the target text snippets.
  - train_hf_dataset (Optional[object]) – The Dataset to use when training.
  - val_hf_dataset (Optional[object]) – The Dataset to use when validating.
  - test_hf_dataset (Optional[object]) – The Dataset to use when testing.
  - predict_hf_dataset (Optional[object]) – The Dataset to use when predicting.
  - input_cls (Type[Input]) – The Input type to use for loading the data.
  - transform (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[StrEnum, str], Dict[str, Any]], Union[StrEnum, str], None)) – The InputTransform type to use.
  - transform_kwargs (Optional[Dict]) – Dict of keyword arguments to be provided when instantiating the transforms.
  - data_module_kwargs (Any) – Additional keyword arguments to provide to the DataModule constructor.
- Return type
  TranslationData
- Returns
  The constructed TranslationData.
Examples

>>> from datasets import Dataset
>>> from flash import Trainer
>>> from flash.text import TranslationTask, TranslationData
>>> train_data = Dataset.from_dict(
...     {
...         "pig latin": ["ayay entencesay inyay igpay atinlay", "ellohay orldway"],
...         "english": ["a sentence in pig latin", "hello world"],
...     }
... )
>>> predict_data = Dataset.from_dict(
...     {
...         "pig latin": ["ayay entencesay orfay edictionpray"],
...     }
... )
>>> datamodule = TranslationData.from_hf_datasets(
...     "pig latin",
...     "english",
...     train_hf_dataset=train_data,
...     predict_hf_dataset=predict_data,
...     batch_size=2,
... )
>>> model = TranslationTask()
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)
Training...
>>> trainer.predict(model, datamodule=datamodule)
Predicting...
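The example passes Dataset.from_dict a dict that maps each column name to a list of values. Data often arrives row by row instead, as one dict per sentence pair. The sketch below, in plain Python, transposes such rows into the column-oriented shape; rows_to_columns is a hypothetical helper written for this illustration and is not part of Flash or the datasets library.

```python
def rows_to_columns(rows):
    """Turn a list of per-example dicts into a dict of equal-length column lists."""
    if not rows:
        return {}
    field_names = list(rows[0])
    columns = {name: [] for name in field_names}
    for index, row in enumerate(rows):
        # Every row must carry the same fields, otherwise the columns would
        # end up with different lengths.
        if set(row) != set(field_names):
            raise ValueError(
                f"row {index} has fields {sorted(row)}, expected {sorted(field_names)}"
            )
        for name in field_names:
            columns[name].append(row[name])
    return columns


rows = [
    {"pig latin": "ayay entencesay inyay igpay atinlay", "english": "a sentence in pig latin"},
    {"pig latin": "ellohay orldway", "english": "hello world"},
]
columns = rows_to_columns(rows)
```

The resulting columns dict has the same shape as the literal passed to Dataset.from_dict in the example, with keys matching the input_field and target_field names.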
- classmethod from_json(input_field, target_field=None, train_file=None, val_file=None, test_file=None, predict_file=None, input_cls=<class 'flash.text.seq2seq.core.input.Seq2SeqJSONInput'>, transform=<class 'flash.core.data.io.input_transform.InputTransform'>, transform_kwargs=None, field=None, **data_module_kwargs)

Load the TranslationData from JSON files containing input text snippets and their corresponding target text snippets.

Input text snippets will be extracted from the input_field column in the JSON files. Target text snippets will be extracted from the target_field column in the JSON files. To learn how to customize the transforms applied for each stage, read our customizing transforms guide.

- Parameters
  - input_field (str) – The field (column name) in the JSON objects containing the input text snippets.
  - target_field (Optional[str]) – The field (column name) in the JSON objects containing the target text snippets.
  - train_file (Union[str, bytes, PathLike, None]) – The JSON file to use when training.
  - val_file (Union[str, bytes, PathLike, None]) – The JSON file to use when validating.
  - test_file (Union[str, bytes, PathLike, None]) – The JSON file to use when testing.
  - predict_file (Union[str, bytes, PathLike, None]) – The JSON file to use when predicting.
  - input_cls (Type[Input]) – The Input type to use for loading the data.
  - transform (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[StrEnum, str], Dict[str, Any]], Union[StrEnum, str], None)) – The InputTransform type to use.
  - transform_kwargs (Optional[Dict]) – Dict of keyword arguments to be provided when instantiating the transforms.
  - field (Optional[str]) – The field that holds the data in the JSON file.
  - data_module_kwargs (Any) – Additional keyword arguments to provide to the DataModule constructor.
- Return type
  TranslationData
- Returns
  The constructed TranslationData.
Examples

The file train_data.json contains the following:

{"pig latin":"ayay entencesay inyay igpay atinlay","english":"a sentence in pig latin"}
{"pig latin":"ellohay orldway","english":"hello world"}

The file predict_data.json contains the following:

{"pig latin":"ayay entencesay orfay edictionpray"}

>>> from flash import Trainer
>>> from flash.text import TranslationTask, TranslationData
>>> datamodule = TranslationData.from_json(
...     "pig latin",
...     "english",
...     train_file="train_data.json",
...     predict_file="predict_data.json",
...     batch_size=2,
... )
Downloading...
>>> model = TranslationTask()
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)
Training...
>>> trainer.predict(model, datamodule=datamodule)
Predicting...
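The training file in this example holds one JSON object per line rather than a single JSON array. The sketch below writes that layout with the standard library json module. It also writes a second, nested layout in which the records sit under a top-level key; this is only a guess at the situation the field parameter is meant for, since the documentation says no more than that field is "the field that holds the data in the JSON file". The scratch directory, the "data" key, and the file name train_data_nested.json are assumptions.

```python
import json
import tempfile
from pathlib import Path

# Scratch directory for the example files (an assumption; any writable path works).
data_dir = Path(tempfile.mkdtemp())

records = [
    {"pig latin": "ayay entencesay inyay igpay atinlay", "english": "a sentence in pig latin"},
    {"pig latin": "ellohay orldway", "english": "hello world"},
]

# Layout 1: one JSON object per line, as shown for train_data.json.
# Compact separators reproduce the spacing used in the documentation.
jsonl_file = data_dir / "train_data.json"
with open(jsonl_file, "w", encoding="utf-8") as handle:
    for record in records:
        handle.write(json.dumps(record, separators=(",", ":")) + "\n")

# Layout 2 (hypothetical): records nested under a top-level "data" key.
nested_file = data_dir / "train_data_nested.json"
nested_file.write_text(json.dumps({"data": records}), encoding="utf-8")

# Read both files back to check their structure.
jsonl_lines = jsonl_file.read_text(encoding="utf-8").splitlines()
nested_records = json.loads(nested_file.read_text(encoding="utf-8"))["data"]
```

Under that reading, the nested file would be loaded with field="data", while the line-per-object file needs no field argument, as in the example.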
- classmethod from_lists(train_data=None, train_targets=None, val_data=None, val_targets=None, test_data=None, test_targets=None, predict_data=None, input_cls=<class 'flash.text.seq2seq.core.input.Seq2SeqListInput'>, transform=<class 'flash.core.data.io.input_transform.InputTransform'>, transform_kwargs=None, **data_module_kwargs)

Load the TranslationData from lists of input text snippets and corresponding lists of target text snippets.

To learn how to customize the transforms applied for each stage, read our customizing transforms guide.

- Parameters
  - train_data (Optional[List[str]]) – The list of input text snippets to use when training.
  - train_targets (Optional[List[str]]) – The list of target text snippets to use when training.
  - val_data (Optional[List[str]]) – The list of input text snippets to use when validating.
  - val_targets (Optional[List[str]]) – The list of target text snippets to use when validating.
  - test_data (Optional[List[str]]) – The list of input text snippets to use when testing.
  - test_targets (Optional[List[str]]) – The list of target text snippets to use when testing.
  - predict_data (Optional[List[str]]) – The list of input text snippets to use when predicting.
  - input_cls (Type[Input]) – The Input type to use for loading the data.
  - transform (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[StrEnum, str], Dict[str, Any]], Union[StrEnum, str], None)) – The InputTransform type to use.
  - transform_kwargs (Optional[Dict]) – Dict of keyword arguments to be provided when instantiating the transforms.
  - data_module_kwargs (Any) – Additional keyword arguments to provide to the DataModule constructor.
- Return type
  TranslationData
- Returns
  The constructed TranslationData.
Examples

>>> from flash import Trainer
>>> from flash.text import TranslationTask, TranslationData
>>> datamodule = TranslationData.from_lists(
...     train_data=["ayay entencesay inyay igpay atinlay", "ellohay orldway"],
...     train_targets=["a sentence in pig latin", "hello world"],
...     predict_data=["ayay entencesay orfay edictionpray"],
...     batch_size=2,
... )
>>> model = TranslationTask()
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)
Training...
>>> trainer.predict(model, datamodule=datamodule)
Predicting...
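from_lists takes inputs and targets as two separate lists, so each target must sit at the same index as its input. When the data is held as (input, target) pairs, a small helper can split it into the two parallel lists. The following is a plain Python sketch; split_pairs is a hypothetical name introduced here, not a Flash function.

```python
def split_pairs(pairs):
    """Split (input, target) pairs into two parallel lists, preserving order."""
    inputs = [source for source, _ in pairs]
    targets = [target for _, target in pairs]
    return inputs, targets


pairs = [
    ("ayay entencesay inyay igpay atinlay", "a sentence in pig latin"),
    ("ellohay orldway", "hello world"),
]
train_data, train_targets = split_pairs(pairs)

# The two lists stay aligned: index i of train_targets is the target for
# index i of train_data.
aligned = list(zip(train_data, train_targets)) == pairs
```

The resulting train_data and train_targets match the lists passed to from_lists in the example.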
- input_transform_cls