Question Answering
The Task
Question answering is the task of answering questions about a given context. For example, given a passage about a historical figure, the model should be able to answer any question whose answer can be found in that passage. In our case, the context (an article) and the question are the inputs, and the answer is the output sequence produced by the model.
Note
We currently support only extractive question answering, i.e. the task defined by SQuAD-style datasets, where the answer is a span of text taken directly from the context.
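Because the answer in extractive question answering is a span of the context, a model typically scores every token as a possible answer start and as a possible answer end, and the answer is the best-scoring valid span. The sketch below illustrates that decoding step in plain Python. It is a generic technique, not Flash's internal implementation: best_span is a hypothetical helper, the tokens and scores are made-up toy values, and max_answer_length is only assumed to play a role similar to the max_answer_length argument passed to QuestionAnsweringTask further down this page.

```python
from typing import List, Optional, Tuple


def best_span(
    start_scores: List[float],
    end_scores: List[float],
    max_answer_length: int = 30,
) -> Optional[Tuple[int, int]]:
    """Return inclusive (start, end) token indices of the highest-scoring span.

    Hypothetical helper: a span is valid when start <= end and it is at most
    max_answer_length tokens long; its score is start_scores[start] + end_scores[end].
    Returns None when there are no tokens.
    """
    best: Optional[Tuple[int, int]] = None
    best_score = float("-inf")
    for start, s_score in enumerate(start_scores):
        # Only consider end positions that keep the span within the length limit.
        for end in range(start, min(start + max_answer_length, len(end_scores))):
            score = s_score + end_scores[end]
            if score > best_score:
                best_score = score
                best = (start, end)
    return best


# Toy tokens and scores, invented purely for illustration.
tokens = ["the", "normans", "lived", "in", "the", "10th", "and", "11th", "centuries"]
start_scores = [0.1, 0.2, 0.0, 1.5, 0.3, 2.0, 0.1, 0.4, 0.0]
end_scores = [0.0, 0.1, 0.2, 0.0, 0.1, 0.5, 0.0, 1.0, 2.5]

start, end = best_span(start_scores, end_scores, max_answer_length=5)
print(" ".join(tokens[start : end + 1]))  # → 10th and 11th centuries
```

The exhaustive double loop keeps the sketch readable. Practical pipelines usually restrict the search to the top-k start and end positions and then map the chosen token indices back to character offsets in the context.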
Example
Let’s look at an example.
We’ll use the SQuAD 2.0 dataset, which contains train-v2.0.json and dev-v2.0.json.
Each question-answer example looks like this:
{
"answers": {
"answer_start": [94, 87, 94, 94],
"text": ["10th and 11th centuries", "in the 10th and 11th centuries", "10th and 11th centuries", "10th and 11th centuries"]
},
"context": "The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave thei...",
"id": "56ddde6b9a695914005b9629",
"question": "When were the Normans in Normandy?",
"title": "Normans"
}
...
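In the above, the context key holds the passage that the question refers to, the question key holds the question being asked about that context, and the answers key stores the answer(s) to the question: each entry in text is an acceptable answer, and the matching entry in answer_start is the character offset at which that answer begins in the context. The id key uniquely identifies each question, and title groups related questions together (here, questions about the Normans).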
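The raw SQuAD 2.0 JSON files nest questions under articles and paragraphs (data → paragraphs → qas), and unanswerable SQuAD 2.0 questions carry an empty answers list. The record above is one question flattened out of that structure. Below is a minimal sketch, assuming the standard SQuAD 2.0 layout, of how such flattening could be done and how to sanity-check that answer_start really is a character offset into context. flatten_squad and check_offsets are hypothetical helpers, not part of Flash (its from_squad_v2 loader does its own parsing); the context is shortened to its first sentence, and the unanswerable question and its id are invented for illustration.

```python
from typing import Dict, Iterator

# First sentence of the Normans context used throughout this page.
CONTEXT = (
    "The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people "
    "who in the 10th and 11th centuries gave their name to Normandy, a region in France."
)

squad_v2 = {
    "version": "v2.0",
    "data": [
        {
            "title": "Normans",
            "paragraphs": [
                {
                    "context": CONTEXT,
                    "qas": [
                        {
                            "id": "56ddde6b9a695914005b9629",
                            "question": "When were the Normans in Normandy?",
                            "answers": [
                                {"text": "10th and 11th centuries", "answer_start": 94},
                                {"text": "in the 10th and 11th centuries", "answer_start": 87},
                            ],
                            "is_impossible": False,
                        },
                        {
                            "id": "56ddde6b9a695914005b9628",
                            "question": "In what country is Normandy located?",
                            "answers": [{"text": "France", "answer_start": 159}],
                            "is_impossible": False,
                        },
                        {
                            # Hypothetical unanswerable question, for illustration only.
                            "id": "hypothetical-unanswerable-id",
                            "question": "Who founded the Normans' first university?",
                            "answers": [],
                            "is_impossible": True,
                        },
                    ],
                }
            ],
        }
    ],
}


def flatten_squad(squad: Dict) -> Iterator[Dict]:
    """Yield one flat record per question from a SQuAD 2.0-style nested dict."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                answers = qa.get("answers", [])  # empty for unanswerable questions
                yield {
                    "id": qa["id"],
                    "title": article["title"],
                    "context": paragraph["context"],
                    "question": qa["question"],
                    # Same "dict of parallel lists" layout as the example record above.
                    "answers": {
                        "text": [a["text"] for a in answers],
                        "answer_start": [a["answer_start"] for a in answers],
                    },
                }


def check_offsets(record: Dict) -> bool:
    """True if every answer_start points at its answer text inside the context."""
    context = record["context"]
    starts, texts = record["answers"]["answer_start"], record["answers"]["text"]
    return all(context[s : s + len(t)] == t for s, t in zip(starts, texts))


records = list(flatten_squad(squad_v2))
for record in records:
    print(record["id"], record["answers"]["text"], check_offsets(record))
```

Running an offset check like this over a dataset is a cheap way to catch contexts that were modified (for example by stripping or adding characters) after the offsets were computed.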
Once we’ve downloaded the data using download_data(), we create the QuestionAnsweringData.
We select a pre-trained backbone to use for our QuestionAnsweringTask and finetune it on the SQuAD 2.0 data.
The backbone can be any HuggingFace/transformers model that supports question answering.
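Finetuning an extractive model needs token-level start and end labels, while SQuAD stores character offsets (answer_start), so the data pipeline has to align the two. The data pipeline typically handles this for you; the sketch below only illustrates the idea. It uses a plain whitespace tokenizer as a stand-in for a backbone's subword tokenizer, and tokenize_with_offsets and char_span_to_token_span are hypothetical helpers, not Flash or transformers APIs.

```python
import re
from typing import List, Optional, Tuple


def tokenize_with_offsets(text: str) -> List[Tuple[str, int, int]]:
    """Whitespace tokenizer that records each token's [start, end) character span."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def char_span_to_token_span(
    offsets: List[Tuple[str, int, int]], answer_start: int, answer_text: str
) -> Tuple[int, int]:
    """Map a character-level answer span to inclusive (start_token, end_token) indices."""
    answer_end = answer_start + len(answer_text)  # exclusive character end
    start_token: Optional[int] = None
    end_token: Optional[int] = None
    for i, (_, tok_start, tok_end) in enumerate(offsets):
        # First token that overlaps the answer's first character.
        if start_token is None and tok_end > answer_start:
            start_token = i
        # Last token that starts before the answer ends.
        if tok_start < answer_end:
            end_token = i
    if start_token is None or end_token is None or start_token > end_token:
        raise ValueError("answer span does not overlap any token")
    return start_token, end_token


context = (
    "The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people "
    "who in the 10th and 11th centuries gave their name to Normandy, a region in France."
)
offsets = tokenize_with_offsets(context)
start, end = char_span_to_token_span(offsets, 94, "10th and 11th centuries")
print([tok for tok, _, _ in offsets[start : end + 1]])  # → ['10th', 'and', '11th', 'centuries']
```

With a whitespace tokenizer, punctuation stays attached to words (the answer "France" maps to the token "France."). Subword tokenizers split text more finely, which is one reason real pipelines rely on the tokenizer's own offset information rather than on splitting by hand.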
Note
When changing the backbone, make sure you pass in the same backbone to the QuestionAnsweringData and the QuestionAnsweringTask!
Next, we use the trained QuestionAnsweringTask for inference.
Finally, we save the model.
Here’s the full example:
from flash import Trainer
from flash.core.data.utils import download_data
from flash.core.utilities.imports import example_requires
from flash.text import QuestionAnsweringData, QuestionAnsweringTask

example_requires("text")

import nltk  # noqa: E402

nltk.download("punkt")

# 1. Create the DataModule
download_data("https://pl-flash-data.s3.amazonaws.com/squad_tiny.zip", "./data/")

datamodule = QuestionAnsweringData.from_squad_v2(
    train_file="./data/squad_tiny/train.json",
    val_file="./data/squad_tiny/val.json",
    batch_size=1,
)

# 2. Build the task
model = QuestionAnsweringTask(backbone="distilbert-base-uncased")

# 3. Create the trainer and finetune the model
trainer = Trainer(max_epochs=3)
trainer.finetune(model, datamodule=datamodule)

# 4. Answer some questions!
# Both questions are asked about the same context passage.
context = """
The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th
and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse
("Norman" comes from "Norseman") raiders and pirates from Denmark, Iceland and Norway who, under
their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations
of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their
descendants would gradually merge with the Carolingian-based cultures of West Francia. The distinct
cultural and ethnic identity of the Normans emerged initially in the first half of the 10th
century, and it continued to evolve over the succeeding centuries.
"""

datamodule = QuestionAnsweringData.from_dicts(
    predict_data={
        "id": ["56ddde6b9a695914005b9629", "56ddde6b9a695914005b9628"],
        "context": [context, context],
        "question": ["When were the Normans in Normandy?", "In what country is Normandy located?"],
    },
    batch_size=4,
)
predictions = trainer.predict(model, datamodule=datamodule)
print(predictions)

# 5. Save the model!
trainer.save_checkpoint("question_answering_on_squad_v2.pt")
To learn how to view the available backbones / heads for this task, see Backbones and Heads.
Accelerate Training & Inference with Torch ORT
Torch ORT converts your model into an optimized ONNX graph, speeding up training and inference on NVIDIA or AMD GPUs. Once Torch ORT is installed, enabling it only requires passing a single flag, enable_ort=True, to the QuestionAnsweringTask. See installation instructions here.
Note
Not all Transformer models are supported. See this table for the supported models and for branches containing fixes for certain models.
...
model = QuestionAnsweringTask(backbone="distilbert-base-uncased", max_answer_length=30, enable_ort=True)