Text Embedder

The Task

This task consists of generating sentence embeddings: fixed-length vector representations of sentences that can be used in downstream tasks such as semantic similarity search or clustering. The TextEmbedder implementation relies on components from sentence-transformers.
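To make "a vector representation of a sentence" more concrete: sentence-transformers models usually produce one vector per token from a transformer and then pool those into a single sentence vector. Many checkpoints, including all-MiniLM-L6-v2 used in the example below, average the non-padding token vectors (mean pooling) and then L2-normalize the result. The following plain-Python sketch illustrates only that pooling idea with made-up numbers. The helper names `mean_pool` and `l2_normalize` are hypothetical, and this is not the TextEmbedder's actual internal code.

```python
import math
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask value is 1; padding tokens (mask 0) are ignored."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                summed[i] += value
            count += 1
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [s / count for s in summed]


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)


# Toy example (hypothetical numbers): three 4-dimensional token vectors, the last one is padding.
tokens = [[1.0, 2.0, 0.0, 4.0], [3.0, 0.0, 2.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool(tokens, mask)
print(sentence_vector)  # [2.0, 1.0, 1.0, 2.0]
print(l2_normalize(sentence_vector))
```

Masking out padding matters because sentences in a batch are padded to the same length; averaging over padding positions would make a sentence's embedding depend on the other sentences in its batch.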

Example

Let’s look at an example of generating sentence embeddings.

We start by loading some sentences for prediction with the TextClassificationData class. Next, we create our TextEmbedder with a pretrained backbone from the Hugging Face Hub. Finally, we create a Trainer and generate the sentence embeddings. Here’s the full example:

import torch

import flash
from flash.text import TextClassificationData, TextEmbedder

# 1. Create the DataModule
datamodule = TextClassificationData.from_lists(
    predict_data=[
        "Turgid dialogue, feeble characterization - Harvey Keitel a judge?.",
        "The worst movie in the history of cinema.",
        "I come from Bulgaria where it 's almost impossible to have a tornado.",
    ],
    batch_size=4,
)

# 2. Create the TextEmbedder with a pretrained sentence-transformers backbone
model = TextEmbedder(backbone="sentence-transformers/all-MiniLM-L6-v2")

# 3. Generate embeddings for the 3 sentences
trainer = flash.Trainer(gpus=torch.cuda.device_count())
predictions = trainer.predict(model, datamodule=datamodule)
print(predictions)
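Once you have the embeddings, a common downstream step is to compare them with cosine similarity. The sketch below assumes the predictions have already been converted to plain Python lists of floats, one per sentence (for example with `tensor.tolist()`); the exact nesting of `predictions` (e.g. one list per batch) may differ between Flash versions, so inspect it before adapting this. The functions `cosine_similarity` and `rank_by_similarity` are hypothetical helpers, and the 3-dimensional vectors are toy values (all-MiniLM-L6-v2 actually produces 384-dimensional embeddings).

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: Sequence[float], candidates: List[Sequence[float]]) -> List[int]:
    """Return candidate indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Toy "embeddings" (made up for illustration, not real model outputs).
embeddings = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.1, 0.9],
]
# Rank the other sentences by similarity to the first one.
print(rank_by_similarity(embeddings[0], embeddings[1:]))  # [0, 1]
```

If the vectors are already unit-length, cosine similarity reduces to a plain dot product; the explicit norms keep the helper correct for unnormalized vectors as well.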

To learn how to view the available backbones / heads for this task, see Backbones and Heads.