VideoClassificationData

class flash.video.classification.data.VideoClassificationData(train_input=None, val_input=None, test_input=None, predict_input=None, data_fetcher=None, transform=<class 'flash.core.data.io.input_transform.InputTransform'>, transform_kwargs=None, val_split=None, batch_size=None, num_workers=0, sampler=None, pin_memory=True, persistent_workers=False)[source]

The VideoClassificationData class is a DataModule with a set of classmethods for loading data for video classification.

classmethod from_csv(input_field, target_fields=None, train_file=None, train_videos_root=None, train_resolver=None, val_file=None, val_videos_root=None, val_resolver=None, test_file=None, test_videos_root=None, test_resolver=None, predict_file=None, predict_videos_root=None, predict_resolver=None, target_formatter=None, clip_sampler='random', clip_duration=2, clip_sampler_kwargs=None, video_sampler=<class 'torch.utils.data.sampler.RandomSampler'>, decode_audio=False, decoder='pyav', input_cls=<class 'flash.video.classification.input.VideoClassificationCSVInput'>, predict_input_cls=<class 'flash.video.classification.input.VideoClassificationCSVPredictInput'>, transform=<function VideoClassificationInputTransform>, transform_kwargs=None, **data_module_kwargs)[source]

Load the VideoClassificationData from CSV files containing video file paths and their corresponding targets.

Input video file paths will be extracted from the input_field column in the CSV files. The supported file extensions are .mp4 and .avi. The targets will be extracted from the target_fields in the CSV files and can be in any of our supported classification target formats. To learn how to customize the transforms applied for each stage, read our customizing transforms guide.

Parameters
  • input_field (str) – The field (column name) in the CSV files containing the video file paths.

  • target_fields (Union[str, Sequence[str], None]) – The field (column name) or list of fields in the CSV files containing the targets.

  • train_file (Optional[str]) – The CSV file to use when training.

  • train_videos_root (Optional[str]) – The root directory containing train videos.

  • train_resolver (Optional[Callable[[str, str], str]]) – Optionally provide a function which converts an entry from the input_field into a video file path.

  • val_file (Optional[str]) – The CSV file to use when validating.

  • val_videos_root (Optional[str]) – The root directory containing validation videos.

  • val_resolver (Optional[Callable[[str, str], str]]) – Optionally provide a function which converts an entry from the input_field into a video file path.

  • test_file (Optional[str]) – The CSV file to use when testing.

  • test_videos_root (Optional[str]) – The root directory containing test videos.

  • test_resolver (Optional[Callable[[str, str], str]]) – Optionally provide a function which converts an entry from the input_field into a video file path.

  • predict_file (Optional[str]) – The CSV file to use when predicting.

  • predict_videos_root (Optional[str]) – The root directory containing predict videos.

  • predict_resolver (Optional[Callable[[str, str], str]]) – Optionally provide a function which converts an entry from the input_field into a video file path.

  • target_formatter (Optional[TargetFormatter]) – Optionally provide a TargetFormatter to control how targets are handled. See Formatting Classification Targets for more details.

  • clip_sampler (Optional[str]) – The clip sampler to use. One of: "uniform", "random", "constant_clips_per_video".

  • clip_duration (float) – The duration of clips to sample.

  • clip_sampler_kwargs (Optional[Dict[str, Any]]) – Additional keyword arguments to use when constructing the clip sampler.

  • video_sampler (Type[Sampler]) – Sampler for the internal video container. This defines the order videos are decoded and, if necessary, the distributed split.

  • decode_audio (bool) – If True, also decode audio from video.

  • decoder (str) – The decoder to use to decode videos. One of: "pyav", "torchvision". Not used for frame videos.

  • input_cls (Type[Input]) – The Input type to use for loading the data.

  • predict_input_cls (Type[Input]) – The Input type to use for loading the prediction data.

  • transform (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[StrEnum, str], Dict[str, Any]], Union[StrEnum, str], None)) – The InputTransform type to use.

  • transform_kwargs (Optional[Dict]) – Dict of keyword arguments to be provided when instantiating the transforms.

  • data_module_kwargs (Any) – Additional keyword arguments to provide to the DataModule constructor.

Return type

VideoClassificationData

Returns

The constructed VideoClassificationData.

Examples

The files can be in Comma Separated Values (CSV) format with either a .csv or .txt extension.

The file train_data.csv contains the following:

videos,targets
video_1.mp4,cat
video_2.mp4,dog
video_3.mp4,cat

The file predict_data.csv contains the following:

videos
predict_video_1.mp4
predict_video_2.mp4
predict_video_3.mp4
>>> from flash import Trainer
>>> from flash.video import VideoClassifier, VideoClassificationData
>>> datamodule = VideoClassificationData.from_csv(
...     "videos",
...     "targets",
...     train_file="train_data.csv",
...     train_videos_root="train_folder",
...     predict_file="predict_data.csv",
...     predict_videos_root="predict_folder",
...     transform_kwargs=dict(image_size=(244, 244)),
...     batch_size=2,
... )
>>> datamodule.num_classes
2
>>> datamodule.labels
['cat', 'dog']
>>> model = VideoClassifier(backbone="x3d_xs", num_classes=datamodule.num_classes)
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)  
Training...
>>> trainer.predict(model, datamodule=datamodule)  
Predicting...

Alternatively, the files can be in Tab Separated Values (TSV) format with a .tsv extension.

The file train_data.tsv contains the following:

videos      targets
video_1.mp4 cat
video_2.mp4 dog
video_3.mp4 cat

The file predict_data.tsv contains the following:

videos
predict_video_1.mp4
predict_video_2.mp4
predict_video_3.mp4
>>> from flash import Trainer
>>> from flash.video import VideoClassifier, VideoClassificationData
>>> datamodule = VideoClassificationData.from_csv(
...     "videos",
...     "targets",
...     train_file="train_data.tsv",
...     train_videos_root="train_folder",
...     predict_file="predict_data.tsv",
...     predict_videos_root="predict_folder",
...     transform_kwargs=dict(image_size=(244, 244)),
...     batch_size=2,
... )
>>> datamodule.num_classes
2
>>> datamodule.labels
['cat', 'dog']
>>> model = VideoClassifier(backbone="x3d_xs", num_classes=datamodule.num_classes)
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)  
Training...
>>> trainer.predict(model, datamodule=datamodule)  
Predicting...

classmethod from_data_frame(input_field, target_fields=None, train_data_frame=None, train_videos_root=None, train_resolver=None, val_data_frame=None, val_videos_root=None, val_resolver=None, test_data_frame=None, test_videos_root=None, test_resolver=None, predict_data_frame=None, predict_videos_root=None, predict_resolver=None, clip_sampler='random', clip_duration=2, clip_sampler_kwargs=None, video_sampler=<class 'torch.utils.data.sampler.RandomSampler'>, decode_audio=False, decoder='pyav', input_cls=<class 'flash.video.classification.input.VideoClassificationDataFrameInput'>, predict_input_cls=<class 'flash.video.classification.input.VideoClassificationDataFramePredictInput'>, target_formatter=None, transform=<function VideoClassificationInputTransform>, transform_kwargs=None, **data_module_kwargs)[source]

Load the VideoClassificationData from pandas DataFrame objects containing video file paths and their corresponding targets.

Input video file paths will be extracted from the input_field in the DataFrame. The supported file extensions are .mp4 and .avi. The targets will be extracted from the target_fields in the DataFrame and can be in any of our supported classification target formats. To learn how to customize the transforms applied for each stage, read our customizing transforms guide.

Parameters
  • input_field (str) – The field (column name) in the DataFrames containing the video file paths.

  • target_fields (Union[str, Sequence[str], None]) – The field (column name) or list of fields in the DataFrames containing the targets.

  • train_data_frame (Optional[DataFrame]) – The DataFrame to use when training.

  • train_videos_root (Optional[str]) – The root directory containing train videos.

  • train_resolver (Optional[Callable[[str, str], str]]) – Optionally provide a function which converts an entry from the input_field into a video file path (a resolver sketch follows the examples below).

  • val_data_frame (Optional[DataFrame]) – The DataFrame to use when validating.

  • val_videos_root (Optional[str]) – The root directory containing validation videos.

  • val_resolver (Optional[Callable[[str, str], str]]) – Optionally provide a function which converts an entry from the input_field into a video file path.

  • test_data_frame (Optional[DataFrame]) – The DataFrame to use when testing.

  • test_videos_root (Optional[str]) – The root directory containing test videos.

  • test_resolver (Optional[Callable[[str, str], str]]) – Optionally provide a function which converts an entry from the input_field into a video file path.

  • predict_data_frame (Optional[DataFrame]) – The DataFrame to use when predicting.

  • predict_videos_root (Optional[str]) – The root directory containing predict videos.

  • predict_resolver (Optional[Callable[[str, str], str]]) – Optionally provide a function which converts an entry from the input_field into a video file path.

  • target_formatter (Optional[TargetFormatter]) – Optionally provide a TargetFormatter to control how targets are handled. See Formatting Classification Targets for more details.

  • clip_sampler (Optional[str]) – The clip sampler to use. One of: "uniform", "random", "constant_clips_per_video".

  • clip_duration (float) – The duration of clips to sample.

  • clip_sampler_kwargs (Optional[Dict[str, Any]]) – Additional keyword arguments to use when constructing the clip sampler.

  • video_sampler (Type[Sampler]) – Sampler for the internal video container. This defines the order videos are decoded and, if necessary, the distributed split.

  • decode_audio (bool) – If True, also decode audio from video.

  • decoder (str) – The decoder to use to decode videos. One of: "pyav", "torchvision". Not used for frame videos.

  • input_cls (Type[Input]) – The Input type to use for loading the data.

  • predict_input_cls (Type[Input]) – The Input type to use for loading the prediction data.

  • transform (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[StrEnum, str], Dict[str, Any]], Union[StrEnum, str], None)) – The InputTransform type to use.

  • transform_kwargs (Optional[Dict]) – Dict of keyword arguments to be provided when instantiating the transforms.

  • data_module_kwargs (Any) – Additional keyword arguments to provide to the DataModule constructor.

Return type

VideoClassificationData

Returns

The constructed VideoClassificationData.

Examples

>>> from pandas import DataFrame
>>> from flash import Trainer
>>> from flash.video import VideoClassifier, VideoClassificationData
>>> train_data_frame = DataFrame.from_dict(
...     {
...         "videos": ["video_1.mp4", "video_2.mp4", "video_3.mp4"],
...         "targets": ["cat", "dog", "cat"],
...     }
... )
>>> predict_data_frame = DataFrame.from_dict(
...     {
...         "videos": ["predict_video_1.mp4", "predict_video_2.mp4", "predict_video_3.mp4"],
...     }
... )
>>> datamodule = VideoClassificationData.from_data_frame(
...     "videos",
...     "targets",
...     train_data_frame=train_data_frame,
...     train_videos_root="train_folder",
...     predict_data_frame=predict_data_frame,
...     predict_videos_root="predict_folder",
...     transform_kwargs=dict(image_size=(244, 244)),
...     batch_size=2,
... )
>>> datamodule.num_classes
2
>>> datamodule.labels
['cat', 'dog']
>>> model = VideoClassifier(backbone="x3d_xs", num_classes=datamodule.num_classes)
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)  
Training...
>>> trainer.predict(model, datamodule=datamodule)  
Predicting...
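
For illustration, the resolver parameters accept a plain function. Below is a minimal sketch, assuming the resolver is called with the videos root and the input_field entry (matching the Callable[[str, str], str] type above); the function name and the missing-extension scenario are hypothetical, not part of the API:

import os

def add_mp4_extension(root, file_id):
    # Illustrative resolver: assume the "videos" column stores bare stems,
    # so join the root directory and append the ".mp4" extension.
    return os.path.join(root, f"{file_id}.mp4")

datamodule = VideoClassificationData.from_data_frame(
    "videos",
    "targets",
    train_data_frame=train_data_frame,  # the DataFrame from the example above
    train_videos_root="train_folder",
    train_resolver=add_mp4_extension,
    batch_size=2,
)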

classmethod from_fiftyone(train_dataset=None, val_dataset=None, test_dataset=None, predict_dataset=None, target_formatter=None, clip_sampler='random', clip_duration=2, clip_sampler_kwargs=None, video_sampler=<class 'torch.utils.data.sampler.RandomSampler'>, decode_audio=False, decoder='pyav', label_field='ground_truth', input_cls=<class 'flash.video.classification.input.VideoClassificationFiftyOneInput'>, predict_input_cls=<class 'flash.video.classification.input.VideoClassificationFiftyOnePredictInput'>, transform=<function VideoClassificationInputTransform>, transform_kwargs=None, **data_module_kwargs)[source]

Load the VideoClassificationData from FiftyOne SampleCollection objects.

The supported file extensions are .mp4 and .avi. The targets will be extracted from the label_field in the SampleCollection objects and can be in any of our supported classification target formats. To learn how to customize the transforms applied for each stage, read our customizing transforms guide.

Parameters
  • train_dataset (Optional[SampleCollection]) – The SampleCollection to use when training.

  • val_dataset (Optional[SampleCollection]) – The SampleCollection to use when validating.

  • test_dataset (Optional[SampleCollection]) – The SampleCollection to use when testing.

  • predict_dataset (Optional[SampleCollection]) – The SampleCollection to use when predicting.

  • label_field (str) – The field in the SampleCollection objects containing the targets.

  • target_formatter (Optional[TargetFormatter]) – Optionally provide a TargetFormatter to control how targets are handled. See Formatting Classification Targets for more details, and the sketch after the examples below.

  • clip_sampler (Optional[str]) – The clip sampler to use. One of: "uniform", "random", "constant_clips_per_video".

  • clip_duration (float) – The duration of clips to sample.

  • clip_sampler_kwargs (Optional[Dict[str, Any]]) – Additional keyword arguments to use when constructing the clip sampler.

  • video_sampler (Type[Sampler]) – Sampler for the internal video container. This defines the order videos are decoded and, if necessary, the distributed split.

  • decode_audio (bool) – If True, also decode audio from video.

  • decoder (str) – The decoder to use to decode videos. One of: "pyav", "torchvision". Not used for frame videos.

  • input_cls (Type[Input]) – The Input type to use for loading the data.

  • predict_input_cls (Type[Input]) – The Input type to use for loading the prediction data.

  • transform (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[StrEnum, str], Dict[str, Any]], Union[StrEnum, str], None)) – The InputTransform type to use.

  • transform_kwargs (Optional[Dict]) – Dict of keyword arguments to be provided when instantiating the transforms.

  • data_module_kwargs – Additional keyword arguments to provide to the DataModule constructor.

Return type

VideoClassificationData

Returns

The constructed VideoClassificationData.

Examples

>>> import fiftyone as fo
>>> from flash import Trainer
>>> from flash.video import VideoClassifier, VideoClassificationData
>>> train_dataset = fo.Dataset.from_videos(
...     ["video_1.mp4", "video_2.mp4", "video_3.mp4"]
... )
>>> samples = [train_dataset[filepath] for filepath in train_dataset.values("filepath")]
>>> for sample, label in zip(samples, ["cat", "dog", "cat"]):
...     sample["ground_truth"] = fo.Classification(label=label)
...     sample.save()
...
>>> predict_dataset = fo.Dataset.from_videos(
...     ["predict_video_1.mp4", "predict_video_2.mp4", "predict_video_3.mp4"]
... )
>>> datamodule = VideoClassificationData.from_fiftyone(
...     train_dataset=train_dataset,
...     predict_dataset=predict_dataset,
...     transform_kwargs=dict(image_size=(244, 244)),
...     batch_size=2,
... )
>>> datamodule.num_classes
2
>>> datamodule.labels
['cat', 'dog']
>>> model = VideoClassifier(backbone="x3d_xs", num_classes=datamodule.num_classes)
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)  
Training...
>>> trainer.predict(model, datamodule=datamodule)  
Predicting...
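
To pin the label-to-index mapping instead of inferring it from the data, a TargetFormatter can be passed explicitly. A minimal sketch, assuming SingleLabelTargetFormatter from flash.core.data.utilities.classification (verify the import path against your Flash version):

from flash.core.data.utilities.classification import SingleLabelTargetFormatter

datamodule = VideoClassificationData.from_fiftyone(
    train_dataset=train_dataset,  # the FiftyOne dataset from the example above
    predict_dataset=predict_dataset,
    # Assumed behaviour: fixes "cat" -> 0 and "dog" -> 1 regardless of the
    # order in which labels are first encountered in the dataset.
    target_formatter=SingleLabelTargetFormatter(labels=["cat", "dog"]),
    batch_size=2,
)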

classmethod from_files(train_files=None, train_targets=None, val_files=None, val_targets=None, test_files=None, test_targets=None, predict_files=None, target_formatter=None, clip_sampler='random', clip_duration=2, clip_sampler_kwargs=None, video_sampler=<class 'torch.utils.data.sampler.RandomSampler'>, decode_audio=False, decoder='pyav', input_cls=<class 'flash.video.classification.input.VideoClassificationFilesInput'>, predict_input_cls=<class 'flash.video.classification.input.VideoClassificationPathsPredictInput'>, transform=<function VideoClassificationInputTransform>, transform_kwargs=None, **data_module_kwargs)[source]

Load the VideoClassificationData from lists of files and corresponding lists of targets.

The supported file extensions are .mp4 and .avi. The targets can be in any of our supported classification target formats. To learn how to customize the transforms applied for each stage, read our customizing transforms guide.

Parameters
  • train_files (Optional[Sequence[str]]) – The list of video files to use when training.

  • train_targets (Optional[Sequence[Any]]) – The list of targets to use when training.

  • val_files (Optional[Sequence[str]]) – The list of video files to use when validating.

  • val_targets (Optional[Sequence[Any]]) – The list of targets to use when validating.

  • test_files (Optional[Sequence[str]]) – The list of video files to use when testing.

  • test_targets (Optional[Sequence[Any]]) – The list of targets to use when testing.

  • predict_files (Optional[Sequence[str]]) – The list of video files to use when predicting.

  • target_formatter (Optional[TargetFormatter]) – Optionally provide a TargetFormatter to control how targets are handled. See Formatting Classification Targets for more details.

  • clip_sampler (Optional[str]) – The clip sampler to use. One of: "uniform", "random", "constant_clips_per_video".

  • clip_duration (float) – The duration of clips to sample.

  • clip_sampler_kwargs (Optional[Dict[str, Any]]) – Additional keyword arguments to use when constructing the clip sampler.

  • video_sampler (Type[Sampler]) – Sampler for the internal video container. This defines the order videos are decoded and, if necessary, the distributed split.

  • decode_audio (bool) – If True, also decode audio from video.

  • decoder (str) – The decoder to use to decode videos. One of: "pyav", "torchvision". Not used for frame videos.

  • input_cls (Type[Input]) – The Input type to use for loading the data.

  • predict_input_cls (Type[Input]) – The Input type to use for loading the prediction data.

  • transform (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[StrEnum, str], Dict[str, Any]], Union[StrEnum, str], None)) – The InputTransform type to use.

  • transform_kwargs (Optional[Dict]) – Dict of keyword arguments to be provided when instantiating the transforms.

  • data_module_kwargs (Any) – Additional keyword arguments to provide to the DataModule constructor.

Return type

VideoClassificationData

Returns

The constructed VideoClassificationData.

Examples

>>> from flash import Trainer
>>> from flash.video import VideoClassifier, VideoClassificationData
>>> datamodule = VideoClassificationData.from_files(
...     train_files=["video_1.mp4", "video_2.mp4", "video_3.mp4"],
...     train_targets=["cat", "dog", "cat"],
...     predict_files=["predict_video_1.mp4", "predict_video_2.mp4", "predict_video_3.mp4"],
...     transform_kwargs=dict(image_size=(244, 244)),
...     batch_size=2,
... )
>>> datamodule.num_classes
2
>>> datamodule.labels
['cat', 'dog']
>>> model = VideoClassifier(backbone="x3d_xs", num_classes=datamodule.num_classes)
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)  
Training...
>>> trainer.predict(model, datamodule=datamodule)  
Predicting...

classmethod from_folders(train_folder=None, val_folder=None, test_folder=None, predict_folder=None, target_formatter=None, clip_sampler='random', clip_duration=2, clip_sampler_kwargs=None, video_sampler=<class 'torch.utils.data.sampler.RandomSampler'>, decode_audio=False, decoder='pyav', input_cls=<class 'flash.video.classification.input.VideoClassificationFoldersInput'>, predict_input_cls=<class 'flash.video.classification.input.VideoClassificationPathsPredictInput'>, transform=<function VideoClassificationInputTransform>, transform_kwargs=None, **data_module_kwargs)[source]

Load the VideoClassificationData from folders containing videos.

The supported file extensions are .mp4 and .avi. For train, test, and validation data, the folders are expected to contain a sub-folder for each class. Here’s the required structure:

train_folder
├── cat
│   ├── video_1.mp4
│   ├── video_3.mp4
│   ...
└── dog
    ├── video_2.mp4
    ...

For prediction, the folder is expected to contain the files for inference, like this:

predict_folder
├── predict_video_1.mp4
├── predict_video_2.mp4
├── predict_video_3.mp4
...

To learn how to customize the transforms applied for each stage, read our customizing transforms guide.

Parameters
  • train_folder (Optional[str]) – The folder containing videos to use when training.

  • val_folder (Optional[str]) – The folder containing videos to use when validating.

  • test_folder (Optional[str]) – The folder containing videos to use when testing.

  • predict_folder (Optional[str]) – The folder containing videos to use when predicting.

  • target_formatter (Optional[TargetFormatter]) – Optionally provide a TargetFormatter to control how targets are handled. See Formatting Classification Targets for more details.

  • clip_sampler (Optional[str]) – The clip sampler to use. One of: "uniform", "random", "constant_clips_per_video". A clip sampling sketch follows the examples below.

  • clip_duration (float) – The duration of clips to sample.

  • clip_sampler_kwargs (Optional[Dict[str, Any]]) – Additional keyword arguments to use when constructing the clip sampler.

  • video_sampler (Type[Sampler]) – Sampler for the internal video container. This defines the order videos are decoded and, if necessary, the distributed split.

  • decode_audio (bool) – If True, also decode audio from video.

  • decoder (str) – The decoder to use to decode videos. One of: "pyav", "torchvision". Not used for frame videos.

  • input_cls (Type[Input]) – The Input type to use for loading the data.

  • predict_input_cls (Type[Input]) – The Input type to use for loading the prediction data.

  • transform (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[StrEnum, str], Dict[str, Any]], Union[StrEnum, str], None)) – The InputTransform type to use.

  • transform_kwargs (Optional[Dict]) – Dict of keyword arguments to be provided when instantiating the transforms.

  • data_module_kwargs – Additional keyword arguments to provide to the DataModule constructor.

Return type

VideoClassificationData

Returns

The constructed VideoClassificationData.

Examples

>>> from flash import Trainer
>>> from flash.video import VideoClassifier, VideoClassificationData
>>> datamodule = VideoClassificationData.from_folders(
...     train_folder="train_folder",
...     predict_folder="predict_folder",
...     transform_kwargs=dict(image_size=(244, 244)),
...     batch_size=2,
... )
>>> datamodule.num_classes
2
>>> datamodule.labels
['cat', 'dog']
>>> model = VideoClassifier(backbone="x3d_xs", num_classes=datamodule.num_classes)
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)  
Training...
>>> trainer.predict(model, datamodule=datamodule)  
Predicting...
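
The clip sampling arguments compose as follows: clip_sampler selects the strategy, clip_duration sets the clip length in seconds, and clip_sampler_kwargs is forwarded when constructing the sampler. A minimal sketch, assuming the "constant_clips_per_video" sampler accepts a clips_per_video keyword as in PyTorchVideo's sampler of the same name:

datamodule = VideoClassificationData.from_folders(
    train_folder="train_folder",
    clip_sampler="constant_clips_per_video",
    clip_duration=1,  # seconds per sampled clip
    clip_sampler_kwargs=dict(clips_per_video=2),  # assumed PyTorchVideo keyword
    batch_size=2,
)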

classmethod from_labelstudio(export_json=None, train_export_json=None, val_export_json=None, test_export_json=None, predict_export_json=None, data_folder=None, train_data_folder=None, val_data_folder=None, test_data_folder=None, predict_data_folder=None, val_split=None, multi_label=False, clip_sampler='random', clip_duration=2, clip_sampler_kwargs=None, video_sampler=<class 'torch.utils.data.sampler.RandomSampler'>, decode_audio=False, decoder='pyav', input_cls=<class 'flash.core.integrations.labelstudio.input.LabelStudioVideoClassificationInput'>, transform=<function VideoClassificationInputTransform>, transform_kwargs=None, **data_module_kwargs)[source]

Creates a DataModule object from the given Label Studio export file and data directory, using the FOLDERS Input with the passed or constructed InputTransform.

Parameters
  • export_json (Optional[str]) – Path to the Label Studio export file.

  • train_export_json (Optional[str]) – Path to the Label Studio export file for the train set (overrides export_json if specified).

  • val_export_json (Optional[str]) – Path to the Label Studio export file for the validation set.

  • test_export_json (Optional[str]) – Path to the Label Studio export file for the test set.

  • predict_export_json (Optional[str]) – Path to the Label Studio export file for the predict set.

  • data_folder (Optional[str]) – Path to the Label Studio data folder.

  • train_data_folder (Optional[str]) – Path to the Label Studio data folder for the train data set (overrides data_folder if specified).

  • val_data_folder (Optional[str]) – Path to the Label Studio data folder for the validation data.

  • test_data_folder (Optional[str]) – Path to the Label Studio data folder for the test data.

  • predict_data_folder (Optional[str]) – Path to the Label Studio data folder for the predict data.

  • val_split (Optional[float]) – The val_split argument to pass to the DataModule.

  • multi_label (Optional[bool]) – Whether the labels are multi-label encoded.

  • clip_sampler (Optional[str]) – Defines how clips should be sampled from each video.

  • clip_duration (float) – Defines how long the sampled clips should be for each video.

  • clip_sampler_kwargs (Optional[Dict[str, Any]]) – Additional keyword arguments to use when constructing the clip sampler.

  • video_sampler (Type[Sampler]) – Sampler for the internal video container. This defines the order videos are decoded and, if necessary, the distributed split.

  • decode_audio (bool) – If True, also decode audio from video.

  • decoder (str) – Defines the type of decoder used to decode a video.

  • input_cls (Type[Input]) – The Input type to use for loading the data.

  • transform (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[StrEnum, str], Dict[str, Any]], Union[StrEnum, str], None)) – The InputTransform type to use.

  • transform_kwargs (Optional[Dict]) – Dict of keyword arguments to be provided when instantiating the transforms.

  • data_module_kwargs – Additional keyword arguments to use when constructing the datamodule.

Return type

VideoClassificationData

Returns

The constructed data module.

Examples:

data_module = DataModule.from_labelstudio(
    export_json='project.json',
    data_folder='label-studio/media/upload',
    val_split=0.8,
)
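
Because this is the video variant of from_labelstudio, the clip sampling arguments documented above apply here as well. A minimal sketch (paths reuse the example above; argument values are illustrative):

datamodule = VideoClassificationData.from_labelstudio(
    export_json='project.json',
    data_folder='label-studio/media/upload',
    clip_sampler='uniform',  # sample clips evenly across each video
    clip_duration=1,         # seconds per clip
    batch_size=2,
)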

classmethod from_tensors(train_data=None, train_targets=None, val_data=None, val_targets=None, test_data=None, test_targets=None, predict_data=None, target_formatter=None, video_sampler=<class 'torch.utils.data.sampler.SequentialSampler'>, input_cls=<class 'flash.video.classification.input.VideoClassificationTensorsInput'>, predict_input_cls=<class 'flash.video.classification.input.VideoClassificationTensorsPredictInput'>, transform=<function VideoClassificationInputTransform>, transform_kwargs=None, **data_module_kwargs)[source]

Load the VideoClassificationData from PyTorch tensors (or lists of tensors) representing input video frames and their corresponding targets.

The targets can be in any of our supported classification target formats.

To learn how to customize the transforms applied for each stage, read our customizing transforms guide.

Parameters
  • train_data (Union[Tensor, List[Tensor], None]) – The tensor or list of tensors to use when training.

  • train_targets (Optional[Sequence[Any]]) – The list of targets to use when training.

  • val_data (Union[Tensor, List[Tensor], None]) – The tensor or list of tensors to use when validating.

  • val_targets (Optional[Sequence[Any]]) – The list of targets to use when validating.

  • test_data (Union[Tensor, List[Tensor], None]) – The tensor or list of tensors to use when testing.

  • test_targets (Optional[Sequence[Any]]) – The list of targets to use when testing.

  • predict_data (Union[Tensor, List[Tensor], None]) – The tensor or list of tensors to use when predicting.

  • target_formatter (Optional[TargetFormatter]) – Optionally provide a TargetFormatter to control how targets are handled. See Formatting Classification Targets for more details.

  • video_sampler (Type[Sampler]) – Sampler for the internal video container. This defines the order videos are decoded and, if necessary, the distributed split.

  • input_cls (Type[Input]) – The Input type to use for loading the data.

  • predict_input_cls (Type[Input]) – The Input type to use for loading the prediction data.

  • transform (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[StrEnum, str], Dict[str, Any]], Union[StrEnum, str], None)) – The InputTransform type to use.

  • transform_kwargs (Optional[Dict]) – Dict of keyword arguments to be provided when instantiating the transforms.

  • data_module_kwargs (Any) – Additional keyword arguments to provide to the DataModule constructor.

Return type

VideoClassificationData

Returns

The constructed VideoClassificationData.

Examples

>>> import torch
>>> from flash import Trainer
>>> from flash.video import VideoClassifier, VideoClassificationData
>>> frame = torch.randint(low=0, high=255, size=(3, 5, 10, 10), dtype=torch.uint8, device="cpu")  # (channels, time, height, width)
>>> datamodule = VideoClassificationData.from_tensors(
...     train_data=[frame, frame, frame],
...     train_targets=["fruit", "vegetable", "fruit"],
...     val_data=[frame, frame],
...     val_targets=["vegetable", "fruit"],
...     predict_data=[frame],
...     batch_size=1,
... )
>>> datamodule.num_classes
2
>>> datamodule.labels
['fruit', 'vegetable']
>>> model = VideoClassifier(backbone="x3d_xs", num_classes=datamodule.num_classes)
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)  
Training...
>>> trainer.predict(model, datamodule=datamodule)  
Predicting...

input_transform_cls(image_size=244, temporal_sub_sample=8, mean=tensor([0.4500, 0.4500, 0.4500]), std=tensor([0.2250, 0.2250, 0.2250]), data_format='BCTHW', same_on_frame=False)

The default input transform (VideoClassificationInputTransform) used by VideoClassificationData, shown with its default arguments.
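
Since the transform_kwargs argument of every classmethod above is forwarded when this transform is instantiated, the defaults can be overridden without subclassing. A minimal sketch (argument values are illustrative):

datamodule = VideoClassificationData.from_folders(
    train_folder="train_folder",
    transform_kwargs=dict(image_size=128, temporal_sub_sample=4),
    batch_size=2,
)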