QuestionAnsweringData¶

class flash.text.question_answering.data.QuestionAnsweringData(train_input=None, val_input=None, test_input=None, predict_input=None, data_fetcher=None, val_split=None, batch_size=None, num_workers=0, sampler=None, pin_memory=True, persistent_workers=False)[source]¶

The QuestionAnsweringData class is a DataModule with a set of classmethods for loading data for extractive question answering.

classmethod from_csv(train_file=None, val_file=None, test_file=None, predict_file=None, train_transform=<class 'flash.core.data.io.input_transform.InputTransform'>, val_transform=<class 'flash.core.data.io.input_transform.InputTransform'>, test_transform=<class 'flash.core.data.io.input_transform.InputTransform'>, predict_transform=<class 'flash.core.data.io.input_transform.InputTransform'>, input_cls=<class 'flash.text.question_answering.input.QuestionAnsweringCSVInput'>, transform_kwargs=None, question_column_name='question', context_column_name='context', answer_column_name='answer', **data_module_kwargs)[source]¶

Load the QuestionAnsweringData from CSV files containing questions, contexts and their corresponding answers.

Question snippets will be extracted from the question_column_name column in the CSV files. Context snippets will be extracted from the context_column_name column in the CSV files. Answer snippets will be extracted from the answer_column_name column in the CSV files. To learn how to customize the transforms applied for each stage, read our customizing transforms guide.

Parameters

train_file¶ (Union[str, bytes, PathLike, None]) – The CSV file containing the training data.
val_file¶ (Union[str, bytes, PathLike, None]) – The CSV file containing the validation data.
test_file¶ (Union[str, bytes, PathLike, None]) – The CSV file containing the testing data.
predict_file¶ (Union[str, bytes, PathLike, None]) – The CSV file containing the data to use when predicting.
train_transform¶ (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[LightningEnum, str], Dict[str, Any]], Union[LightningEnum, str], None)) – The InputTransform type to use when training.
val_transform¶ (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[LightningEnum, str], Dict[str, Any]], Union[LightningEnum, str], None)) – The InputTransform type to use when validating.
test_transform¶ (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[LightningEnum, str], Dict[str, Any]], Union[LightningEnum, str], None)) – The InputTransform type to use when testing.
predict_transform¶ (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[LightningEnum, str], Dict[str, Any]], Union[LightningEnum, str], None)) – The InputTransform type to use when predicting.
input_cls¶ (Type[Input]) – The Input type to use for loading the data.
transform_kwargs¶ (Optional[Dict]) – Dict of keyword arguments to be provided when instantiating the transforms.
question_column_name¶ (str) – The key in the JSON file to recognize the question field.
context_column_name¶ (str) – The key in the JSON file to recognize the context field.
answer_column_name¶ (str) – The key in the JSON file to recognize the answer field.

Return type

QuestionAnsweringData

Returns

The constructed QuestionAnsweringData.

Examples

The file train_data.csv contains the following:

id,context,question,answer_text,answer_start
12345,this is an answer one. this is a context one,this is a question one,this is an answer one,0
12346,this is an answer two. this is a context two,this is a question two,this is an answer two,0
12347,this is an answer three. this is a context three,this is a question three,this is an answer three,0
12348,this is an answer four. this is a context four,this is a question four,this is an answer four,0

The file predict_data.csv contains the following:

id,context,question
12349,this is an answer five. this is a context five,this is a question five
12350,this is an answer six. this is a context six,this is a question six

>>> from flash import Trainer
>>> from flash.text import QuestionAnsweringData, QuestionAnsweringTask
>>> datamodule = QuestionAnsweringData.from_csv(
...     train_file="train_data.csv",
...     predict_file="predict_data.csv",
...     batch_size=2,
... )  
Downloading...
>>> model = QuestionAnsweringTask(max_source_length=32, max_target_length=32)
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)  
Training...
>>> trainer.predict(model, datamodule=datamodule)  
Predicting...

classmethod from_dicts(train_data=None, val_data=None, test_data=None, predict_data=None, train_transform=<class 'flash.core.data.io.input_transform.InputTransform'>, val_transform=<class 'flash.core.data.io.input_transform.InputTransform'>, test_transform=<class 'flash.core.data.io.input_transform.InputTransform'>, predict_transform=<class 'flash.core.data.io.input_transform.InputTransform'>, input_cls=<class 'flash.text.question_answering.input.QuestionAnsweringDictionaryInput'>, transform_kwargs=None, question_column_name='question', context_column_name='context', answer_column_name='answer', **data_module_kwargs)[source]¶

Load the QuestionAnsweringData from Python dictionary objects containing questions, contexts and their corresponding answers.

Question snippets will be extracted from the question_column_name field in the dictionaries. Context snippets will be extracted from the context_column_name field in the dictionaries. Answer snippets will be extracted from the answer_column_name field in the dictionaries. To learn how to customize the transforms applied for each stage, read our customizing transforms guide.

Parameters

train_data¶ (Optional[Dict[str, Any]]) – The dictionary containing the training data.
val_data¶ (Optional[Dict[str, Any]]) – The dictionary containing the validation data.
test_data¶ (Optional[Dict[str, Any]]) – The dictionary containing the testing data.
predict_data¶ (Optional[Dict[str, Any]]) – The dictionary containing the data to use when predicting.
train_transform¶ (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[LightningEnum, str], Dict[str, Any]], Union[LightningEnum, str], None)) – The InputTransform type to use when training.
val_transform¶ (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[LightningEnum, str], Dict[str, Any]], Union[LightningEnum, str], None)) – The InputTransform type to use when validating.
test_transform¶ (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[LightningEnum, str], Dict[str, Any]], Union[LightningEnum, str], None)) – The InputTransform type to use when testing.
predict_transform¶ (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[LightningEnum, str], Dict[str, Any]], Union[LightningEnum, str], None)) – The InputTransform type to use when predicting.
input_cls¶ (Type[Input]) – The Input type to use for loading the data.
transform_kwargs¶ (Optional[Dict]) – Dict of keyword arguments to be provided when instantiating the transforms.
question_column_name¶ (str) – The key in the JSON file to recognize the question field.
context_column_name¶ (str) – The key in the JSON file to recognize the context field.
answer_column_name¶ (str) – The key in the JSON file to recognize the answer field.

Return type

QuestionAnsweringData

Returns

The constructed QuestionAnsweringData.

Examples

>>> from flash import Trainer
>>> from flash.text import QuestionAnsweringData, QuestionAnsweringTask
>>> train_data = {
...     "id": ["12345", "12346", "12347", "12348"],
...     "context": [
...         "this is an answer one. this is a context one",
...         "this is an answer two. this is a context two",
...         "this is an answer three. this is a context three",
...         "this is an answer four. this is a context four",
...     ],
...     "question": [
...         "this is a question one",
...         "this is a question two",
...         "this is a question three",
...         "this is a question four",
...     ],
...     "answer_text": [
...         "this is an answer one",
...         "this is an answer two",
...         "this is an answer three",
...         "this is an answer four",
...     ],
...     "answer_start": [0, 0, 0, 0],
... }
>>> predict_data = {
...     "id": ["12349", "12350"],
...     "context": [
...         "this is an answer five. this is a context five",
...         "this is an answer six. this is a context six",
...     ],
...     "question": [
...         "this is a question five",
...         "this is a question six",
...     ],
... }
>>> datamodule = QuestionAnsweringData.from_dicts(
...     train_data=train_data,
...     predict_data=predict_data,
...     batch_size=2,
... )  
>>> model = QuestionAnsweringTask(max_source_length=32, max_target_length=32)
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)  
Training...
>>> trainer.predict(model, datamodule=datamodule)  
Predicting...

classmethod from_json(train_file=None, val_file=None, test_file=None, predict_file=None, train_transform=<class 'flash.core.data.io.input_transform.InputTransform'>, val_transform=<class 'flash.core.data.io.input_transform.InputTransform'>, test_transform=<class 'flash.core.data.io.input_transform.InputTransform'>, predict_transform=<class 'flash.core.data.io.input_transform.InputTransform'>, input_cls=<class 'flash.text.question_answering.input.QuestionAnsweringJSONInput'>, transform_kwargs=None, field=None, question_column_name='question', context_column_name='context', answer_column_name='answer', **data_module_kwargs)[source]¶

Load the QuestionAnsweringData from JSON files containing questions, contexts and their corresponding answers.

Question snippets will be extracted from the question_column_name column in the JSON files. Context snippets will be extracted from the context_column_name column in the JSON files. Answer snippets will be extracted from the answer_column_name column in the JSON files. To learn how to customize the transforms applied for each stage, read our customizing transforms guide.

Parameters

train_file¶ (Union[str, bytes, PathLike, None]) – The JSON file containing the training data.
val_file¶ (Union[str, bytes, PathLike, None]) – The JSON file containing the validation data.
test_file¶ (Union[str, bytes, PathLike, None]) – The JSON file containing the testing data.
predict_file¶ (Union[str, bytes, PathLike, None]) – The JSON file containing the data to use when predicting.
train_transform¶ (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[LightningEnum, str], Dict[str, Any]], Union[LightningEnum, str], None)) – The InputTransform type to use when training.
val_transform¶ (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[LightningEnum, str], Dict[str, Any]], Union[LightningEnum, str], None)) – The InputTransform type to use when validating.
test_transform¶ (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[LightningEnum, str], Dict[str, Any]], Union[LightningEnum, str], None)) – The InputTransform type to use when testing.
predict_transform¶ (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[LightningEnum, str], Dict[str, Any]], Union[LightningEnum, str], None)) – The InputTransform type to use when predicting.
input_cls¶ (Type[Input]) – The Input type to use for loading the data.
transform_kwargs¶ (Optional[Dict]) – Dict of keyword arguments to be provided when instantiating the transforms.
field¶ (Optional[str]) – The field that holds the data in the JSON file.
question_column_name¶ (str) – The key in the JSON file to recognize the question field.
context_column_name¶ (str) – The key in the JSON file to recognize the context field.
answer_column_name¶ (str) – The key in the JSON file to recognize the answer field.

Return type

QuestionAnsweringData

Returns

The constructed QuestionAnsweringData.

Examples

The file train_data.json contains the following:

{"id":"12345","context":"this is an answer one. this is a context one","question":"this is a question one",
"answer_text":"this is an answer one","answer_start":0}
{"id":"12346","context":"this is an answer two. this is a context two","question":"this is a question two",
"answer_text":"this is an answer two","answer_start":0}
{"id":"12347","context":"this is an answer three. this is a context three","question":"this is a question
 three","answer_text":"this is an answer three","answer_start":0}
{"id":"12348","context":"this is an answer four. this is a context four","question":"this is a question
 four","answer_text":"this is an answer four","answer_start":0}

The file predict_data.json contains the following:

{"id":"12349","context":"this is an answer five. this is a context five","question":"this is a question
 five"}
{"id":"12350","context":"this is an answer six. this is a context six","question":"this is a question
 six"}

>>> from flash import Trainer
>>> from flash.text import QuestionAnsweringData, QuestionAnsweringTask
>>> datamodule = QuestionAnsweringData.from_json(
...     train_file="train_data.json",
...     predict_file="predict_data.json",
...     batch_size=2,
... )  
Downloading...
>>> model = QuestionAnsweringTask(max_source_length=32, max_target_length=32)
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)  
Training...
>>> trainer.predict(model, datamodule=datamodule)  
Predicting...

classmethod from_squad_v2(train_file=None, val_file=None, test_file=None, predict_file=None, train_transform=<class 'flash.core.data.io.input_transform.InputTransform'>, val_transform=<class 'flash.core.data.io.input_transform.InputTransform'>, test_transform=<class 'flash.core.data.io.input_transform.InputTransform'>, predict_transform=<class 'flash.core.data.io.input_transform.InputTransform'>, input_cls=<class 'flash.text.question_answering.input.QuestionAnsweringSQuADInput'>, transform_kwargs=None, question_column_name='question', context_column_name='context', answer_column_name='answer', **data_module_kwargs)[source]¶

Load the QuestionAnsweringData from JSON files containing questions, contexts and their corresponding answers in the SQuAD2.0 format.

Question snippets will be extracted from the question_column_name column in the JSON files. Context snippets will be extracted from the context_column_name column in the JSON files. Answer snippets will be extracted from the answer_column_name column in the JSON files. To learn how to customize the transforms applied for each stage, read our customizing transforms guide.

Parameters

train_file¶ (Optional[str]) – The JSON file containing the training data.
val_file¶ (Optional[str]) – The JSON file containing the validation data.
test_file¶ (Optional[str]) – The JSON file containing the testing data.
predict_file¶ (Optional[str]) – The JSON file containing the predict data.
train_transform¶ (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[LightningEnum, str], Dict[str, Any]], Union[LightningEnum, str], None)) – The InputTransform type to use when training.
val_transform¶ (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[LightningEnum, str], Dict[str, Any]], Union[LightningEnum, str], None)) – The InputTransform type to use when validating.
test_transform¶ (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[LightningEnum, str], Dict[str, Any]], Union[LightningEnum, str], None)) – The InputTransform type to use when testing.
predict_transform¶ (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[LightningEnum, str], Dict[str, Any]], Union[LightningEnum, str], None)) – The InputTransform type to use when predicting.
input_cls¶ (Type[Input]) – The Input type to use for loading the data.
transform_kwargs¶ (Optional[Dict]) – Dict of keyword arguments to be provided when instantiating the transforms.
question_column_name¶ (str) – The key in the JSON file to recognize the question field.
context_column_name¶ (str) – The key in the JSON file to recognize the context field.
answer_column_name¶ (str) – The key in the JSON file to recognize the answer field.

Return type

QuestionAnsweringData

Returns

The constructed data module.

Examples

The file train_data.json contains the following:

{
    "version": "v2.0",
    "data": [
        {
            "title": "ExampleSet1",
            "paragraphs": [
                {
                    "qas": [
                        {
                            "question": "this is a question one",
                            "id": "12345",
                            "answers": [{"text": "this is an answer one", "answer_start": 0}],
                            "is_impossible": false
                        }
                    ],
                    "context": "this is an answer one. this is a context one"
                }, {
                    "qas": [
                        {
                            "question": "this is a question two",
                            "id": "12346",
                            "answers": [{"text": "this is an answer two", "answer_start": 0}],
                            "is_impossible": false
                        }
                    ],
                    "context": "this is an answer two. this is a context two"
                }
            ]
        }, {
            "title": "ExampleSet2",
            "paragraphs": [
                {
                    "qas": [
                        {
                            "question": "this is a question three",
                            "id": "12347",
                            "answers": [{"text": "this is an answer three", "answer_start": 0}],
                            "is_impossible": false
                        }
                    ],
                    "context": "this is an answer three. this is a context three"
                }, {
                    "qas": [
                        {
                            "question": "this is a question four",
                            "id": "12348",
                            "answers": [{"text": "this is an answer four", "answer_start": 0}],
                            "is_impossible": false
                        }
                    ],
                    "context": "this is an answer four. this is a context four"
                }
            ]
        }
    ]
}

The file predict_data.json contains the following:

{
    "version": "v2.0",
    "data": [
        {
            "title": "ExampleSet3",
            "paragraphs": [
                {
                    "qas": [
                        {
                            "question": "this is a question five",
                            "id": "12349",
                            "is_impossible": false
                        }
                    ],
                    "context": "this is an answer five. this is a context five"
                }, {
                    "qas": [
                        {
                            "question": "this is a question six",
                            "id": "12350",
                            "is_impossible": false
                        }
                    ],
                    "context": "this is an answer six. this is a context six"
                }
            ]
        }
    ]
}

>>> from flash import Trainer
>>> from flash.text import QuestionAnsweringData, QuestionAnsweringTask
>>> datamodule = QuestionAnsweringData.from_squad_v2(
...     train_file="train_data.json",
...     predict_file="predict_data.json",
...     batch_size=2,
... )  
>>> model = QuestionAnsweringTask(max_source_length=32, max_target_length=32)
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)  
Training...
>>> trainer.predict(model, datamodule=datamodule)  
Predicting...

input_transform_cls¶: alias of flash.core.data.io.input_transform.InputTransform