ObjectDetectionData

class flash.image.detection.data.ObjectDetectionData(train_input=None, val_input=None, test_input=None, predict_input=None, data_fetcher=None, transform=<class 'flash.core.data.io.input_transform.InputTransform'>, transform_kwargs=None, val_split=None, batch_size=None, num_workers=0, sampler=None, pin_memory=True, persistent_workers=False)[source]

The ObjectDetectionData class is a DataModule with a set of classmethods for loading data for object detection.

classmethod from_coco(train_folder=None, train_ann_file=None, val_folder=None, val_ann_file=None, test_folder=None, test_ann_file=None, predict_folder=None, transform=<class 'flash.core.integrations.icevision.transforms.IceVisionInputTransform'>, input_cls=<class 'flash.core.integrations.icevision.data.IceVisionInput'>, transform_kwargs=None, **data_module_kwargs)[source]

Creates an ObjectDetectionData object from the given data folders and annotation files in the COCO JSON format.

For help understanding and using the COCO format, take a look at this tutorial: Create COCO annotations from scratch.

To learn how to customize the transforms applied for each stage, read our customizing transforms guide.

Parameters
  • train_folder (Optional[str]) – The folder containing images to use when training.

  • train_ann_file (Optional[str]) – The COCO format annotation file to use when training.

  • val_folder (Optional[str]) – The folder containing images to use when validating.

  • val_ann_file (Optional[str]) – The COCO format annotation file to use when validating.

  • test_folder (Optional[str]) – The folder containing images to use when testing.

  • test_ann_file (Optional[str]) – The COCO format annotation file to use when testing.

  • predict_folder (Optional[str]) – The folder containing images to use when predicting.

  • input_cls (Type[Input]) – The Input type to use for loading the data.

  • transform (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[StrEnum, str], Dict[str, Any]], Union[StrEnum, str], None)) – The InputTransform type to use.

  • transform_kwargs (Optional[Dict]) – Dict of keyword arguments to be provided when instantiating the transforms.

  • data_module_kwargs (Any) – Additional keyword arguments to provide to the DataModule constructor.

Return type

ObjectDetectionData

Returns

The constructed ObjectDetectionData.

Examples

The folder train_folder has the following contents:

train_folder
├── image_1.png
├── image_2.png
├── image_3.png
...

The file train_annotations.json contains the following:

{
    "annotations": [
        {"area": 50, "bbox": [10, 20, 5, 10], "category_id": 1, "id": 1, "image_id": 1, "iscrowd": 0},
        {"area": 100, "bbox": [20, 30, 10, 10], "category_id": 2, "id": 2, "image_id": 2, "iscrowd": 0},
        {"area": 125, "bbox": [10, 20, 5, 25], "category_id": 1, "id": 3, "image_id": 3, "iscrowd": 0}
    ], "categories": [
        {"id": 1, "name": "cat", "supercategory": "cat"},
        {"id": 2, "name": "dog", "supercategory": "dog"}
    ], "images": [
        {"file_name": "image_1.png", "height": 64, "width": 64, "id": 1},
        {"file_name": "image_2.png", "height": 64, "width": 64, "id": 2},
        {"file_name": "image_3.png", "height": 64, "width": 64, "id": 3}
    ]
}
>>> from flash import Trainer
>>> from flash.image import ObjectDetector, ObjectDetectionData
>>> datamodule = ObjectDetectionData.from_coco(
...     train_folder="train_folder",
...     train_ann_file="train_annotations.json",
...     predict_folder="predict_folder",
...     transform_kwargs=dict(image_size=(128, 128)),
...     batch_size=2,
... )
>>> datamodule.num_classes
3
>>> datamodule.labels
['background', 'cat', 'dog']
>>> model = ObjectDetector(num_classes=datamodule.num_classes)
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)  
Training...
>>> trainer.predict(model, datamodule=datamodule)  
Predicting...
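In the annotation file shown above, each bbox follows the COCO convention [x, y, width, height] in pixels, and each area equals width times height. As a minimal sketch, such a file could be generated with the standard library. The helper name make_coco_annotations and its input layout are hypothetical and not part of Flash.

```python
import json


def make_coco_annotations(images, categories, boxes):
    """Build a COCO-style dict (hypothetical helper, not part of Flash).

    images: list of (file_name, width, height) tuples; ids are assigned from 1
    categories: list of category names; ids are assigned from 1
    boxes: list of (image_index, category_name, [x, y, width, height]),
           where image_index is the 0-based position in `images`
    """
    category_ids = {name: i for i, name in enumerate(categories, start=1)}
    return {
        "annotations": [
            {
                "area": bbox[2] * bbox[3],  # width * height, as in the example file
                "bbox": list(bbox),
                "category_id": category_ids[name],
                "id": ann_id,
                "image_id": image_index + 1,
                "iscrowd": 0,
            }
            for ann_id, (image_index, name, bbox) in enumerate(boxes, start=1)
        ],
        "categories": [
            {"id": cat_id, "name": name, "supercategory": name}
            for name, cat_id in category_ids.items()
        ],
        "images": [
            {"file_name": file_name, "height": height, "width": width, "id": i}
            for i, (file_name, width, height) in enumerate(images, start=1)
        ],
    }


coco = make_coco_annotations(
    images=[("image_1.png", 64, 64), ("image_2.png", 64, 64), ("image_3.png", 64, 64)],
    categories=["cat", "dog"],
    boxes=[
        (0, "cat", [10, 20, 5, 10]),
        (1, "dog", [20, 30, 10, 10]),
        (2, "cat", [10, 20, 5, 25]),
    ],
)
# Serialize; this string could then be written to train_annotations.json.
coco_json = json.dumps(coco, indent=4)
```

The resulting dictionary has the same keys and values as the example file, so it should be usable as train_ann_file once written to disk.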
classmethod from_fiftyone(cls, train_dataset=None, val_dataset=None, test_dataset=None, predict_dataset=None, label_field='ground_truth', iscrowd='iscrowd', transform=<class 'flash.core.integrations.icevision.transforms.IceVisionInputTransform'>, input_cls=<class 'flash.image.detection.input.ObjectDetectionFiftyOneInput'>, transform_kwargs=None, **data_module_kwargs)[source]

Load the ObjectDetectionData from FiftyOne SampleCollection objects.

Targets will be extracted from the label_field in the SampleCollection objects. To learn how to customize the transforms applied for each stage, read our customizing transforms guide.

Parameters
  • train_dataset (None) – The SampleCollection to use when training.

  • val_dataset (None) – The SampleCollection to use when validating.

  • test_dataset (None) – The SampleCollection to use when testing.

  • predict_dataset (None) – The SampleCollection to use when predicting.

  • label_field (str) – The field in the SampleCollection objects containing the targets.

  • iscrowd (str) – The field in the SampleCollection objects containing the iscrowd annotation (if required).

  • input_cls (Type[Input]) – The Input type to use for loading the data.

  • transform (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[StrEnum, str], Dict[str, Any]], Union[StrEnum, str], None)) – The InputTransform type to use.

  • transform_kwargs (Optional[Dict]) – Dict of keyword arguments to be provided when instantiating the transforms.

  • data_module_kwargs (Any) – Additional keyword arguments to provide to the DataModule constructor.

Return type

ObjectDetectionData

Returns

The constructed ObjectDetectionData.

Examples

>>> import numpy as np
>>> import fiftyone as fo
>>> from flash import Trainer
>>> from flash.image import ObjectDetector, ObjectDetectionData
>>> train_dataset = fo.Dataset.from_images(
...     ["image_1.png", "image_2.png", "image_3.png"]
... )
>>> samples = [train_dataset[filepath] for filepath in train_dataset.values("filepath")]
>>> for sample, label, bounding_box in zip(
...     samples,
...     ["cat", "dog", "cat"],
...     [[0.1, 0.2, 0.15, 0.3], [0.2, 0.3, 0.3, 0.4], [0.1, 0.2, 0.15, 0.45]],
... ):
...     sample["ground_truth"] = fo.Detections(
...         detections=[fo.Detection(label=label, bounding_box=bounding_box)],
...     )
...     sample.save()
...
>>> predict_dataset = fo.Dataset.from_images(
...     ["predict_image_1.png", "predict_image_2.png", "predict_image_3.png"]
... )
>>> datamodule = ObjectDetectionData.from_fiftyone(
...     train_dataset=train_dataset,
...     predict_dataset=predict_dataset,
...     transform_kwargs=dict(image_size=(128, 128)),
...     batch_size=2,
... )
>>> datamodule.num_classes
3
>>> datamodule.labels
['background', 'cat', 'dog']
>>> model = ObjectDetector(num_classes=datamodule.num_classes)
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)  
Training...
>>> trainer.predict(model, datamodule=datamodule)  
Predicting...
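The bounding_box values in the example above are fractions of the image size: FiftyOne stores boxes as [top-left-x, top-left-y, width, height] relative to the image dimensions, in the range [0, 1]. The sketch below converts a pixel box, using the xmin, ymin, width, height keys from the other constructors on this page, into that form. The helper to_relative_bbox is hypothetical and is neither a Flash nor a FiftyOne function.

```python
def to_relative_bbox(box, image_width, image_height):
    """Convert a pixel box dict to [x, y, width, height] as fractions of the image."""
    return [
        box["xmin"] / image_width,
        box["ymin"] / image_height,
        box["width"] / image_width,
        box["height"] / image_height,
    ]


# A 8 x 16 pixel box at (16, 32) in a 64 x 64 image.
relative = to_relative_bbox({"xmin": 16, "ymin": 32, "width": 8, "height": 16}, 64, 64)
```

The returned list can be passed as the bounding_box argument of fo.Detection, as in the example.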
classmethod from_files(train_files=None, train_targets=None, train_bboxes=None, val_files=None, val_targets=None, val_bboxes=None, test_files=None, test_targets=None, test_bboxes=None, predict_files=None, target_formatter=None, input_cls=<class 'flash.image.detection.input.ObjectDetectionFilesInput'>, transform=<class 'flash.core.integrations.icevision.transforms.IceVisionInputTransform'>, transform_kwargs=None, **data_module_kwargs)[source]

Creates an ObjectDetectionData object from the given lists of image files, bounding boxes, and targets.

The supported file extensions are: .jpg, .jpeg, .png, .ppm, .bmp, .pgm, .tif, .tiff, .webp, and .npy. The targets can be in any of our supported classification target formats. The bounding boxes are expected to be dictionaries with integer values (representing pixels) and the following keys: xmin, ymin, width, height. To learn how to customize the transforms applied for each stage, read our customizing transforms guide.

Parameters
Return type

ObjectDetectionData

Returns

The constructed ObjectDetectionData.

Examples

>>> from flash import Trainer
>>> from flash.image import ObjectDetector, ObjectDetectionData
>>> datamodule = ObjectDetectionData.from_files(
...     train_files=["image_1.png", "image_2.png", "image_3.png"],
...     train_targets=[["cat"], ["dog"], ["cat"]],
...     train_bboxes=[
...         [{"xmin": 10, "ymin": 20, "width": 5, "height": 10}],
...         [{"xmin": 20, "ymin": 30, "width": 10, "height": 10}],
...         [{"xmin": 10, "ymin": 20, "width": 5, "height": 25}],
...     ],
...     predict_files=["predict_image_1.png", "predict_image_2.png", "predict_image_3.png"],
...     transform_kwargs=dict(image_size=(128, 128)),
...     batch_size=2,
... )
>>> datamodule.num_classes
3
>>> datamodule.labels
['background', 'cat', 'dog']
>>> model = ObjectDetector(labels=datamodule.labels)
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)  
Training...
>>> trainer.predict(model, datamodule=datamodule)  
Predicting...
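When the image paths are not known in advance, the lists passed as train_files or predict_files can be gathered from a folder. The sketch below filters a directory by the extensions listed in the from_files description. The helper list_image_files is hypothetical, and the case-insensitive matching is an assumption rather than documented Flash behavior.

```python
from pathlib import Path

# Extensions listed in the from_files description.
SUPPORTED_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", ".webp", ".npy",
}


def list_image_files(folder):
    """Return the sorted paths (as strings) of files with a supported extension."""
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

For instance, predict_files=list_image_files("predict_folder") would collect every supported image directly inside predict_folder; subfolders are not searched.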
classmethod from_folders(predict_folder=None, predict_transform=<class 'flash.core.integrations.icevision.transforms.IceVisionInputTransform'>, input_cls=<class 'flash.core.integrations.icevision.data.IceVisionInput'>, transform_kwargs=None, **data_module_kwargs)[source]

Creates an ObjectDetectionData object from the given data folders. This is currently supported only for the predicting stage.

Parameters
  • predict_folder (Optional[str]) – The folder containing the predict data.

  • predict_transform (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[StrEnum, str], Dict[str, Any]], Union[StrEnum, str], None)) – The dictionary of transforms to use during predicting which maps

  • data_module_kwargs (Any) – The keyword arguments for creating the datamodule.

Return type

DataModule

Returns

The constructed data module.

classmethod from_images(train_images=None, train_targets=None, train_bboxes=None, val_images=None, val_targets=None, val_bboxes=None, test_images=None, test_targets=None, test_bboxes=None, predict_images=None, target_formatter=None, input_cls=<class 'flash.image.detection.input.ObjectDetectionImageInput'>, transform=<class 'flash.core.integrations.icevision.transforms.IceVisionInputTransform'>, transform_kwargs=None, **data_module_kwargs)[source]

Creates an ObjectDetectionData object from the given lists of PIL images and corresponding lists of bounding boxes and targets.

The targets can be in any of our supported classification target formats. The bounding boxes are expected to be dictionaries with integer values (representing pixels) and the following keys: xmin, ymin, width, height. To learn how to customize the transforms applied for each stage, read our customizing transforms guide.

Parameters
Return type

ObjectDetectionData

Returns

The constructed ObjectDetectionData.

Examples

>>> from PIL import Image
>>> import numpy as np
>>> from flash import Trainer
>>> from flash.image import ObjectDetector, ObjectDetectionData
>>> datamodule = ObjectDetectionData.from_images(
...     train_images=[
...         Image.fromarray(np.random.randint(0, 255, (64, 64, 3), dtype="uint8")),
...         Image.fromarray(np.random.randint(0, 255, (64, 64, 3), dtype="uint8")),
...         Image.fromarray(np.random.randint(0, 255, (64, 64, 3), dtype="uint8")),
...     ],
...     train_targets=[["cat"], ["dog"], ["cat"]],
...     train_bboxes=[
...         [{"xmin": 10, "ymin": 20, "width": 5, "height": 10}],
...         [{"xmin": 20, "ymin": 30, "width": 10, "height": 10}],
...         [{"xmin": 10, "ymin": 20, "width": 5, "height": 25}],
...     ],
...     predict_images=[Image.fromarray(np.random.randint(0, 255, (64, 64, 3), dtype="uint8"))],
...     transform_kwargs=dict(image_size=(128, 128)),
...     batch_size=2,
... )
>>> datamodule.num_classes
3
>>> datamodule.labels
['background', 'cat', 'dog']
>>> model = ObjectDetector(labels=datamodule.labels)
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)  
Training...
>>> trainer.predict(model, datamodule=datamodule)  
Predicting...
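Box annotations are often available as plain [x, y, width, height] lists (as in COCO files) rather than as the dictionaries expected here. A minimal sketch of the conversion follows; bbox_from_xywh is a hypothetical helper name, and the int() casts reflect the statement above that the values are integer pixels.

```python
def bbox_from_xywh(xywh):
    """Convert an [x, y, width, height] list to an xmin/ymin/width/height dict."""
    x, y, width, height = xywh
    return {"xmin": int(x), "ymin": int(y), "width": int(width), "height": int(height)}


# One inner list of boxes per image, matching the layout of train_bboxes.
raw_boxes = [[[10, 20, 5, 10]], [[20, 30, 10, 10]], [[10, 20, 5, 25]]]
train_bboxes = [[bbox_from_xywh(box) for box in image_boxes] for image_boxes in raw_boxes]
```

The resulting train_bboxes has the same structure as the argument used in the example.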
classmethod from_numpy(train_data=None, train_targets=None, train_bboxes=None, val_data=None, val_targets=None, val_bboxes=None, test_data=None, test_targets=None, test_bboxes=None, predict_data=None, target_formatter=None, input_cls=<class 'flash.image.detection.input.ObjectDetectionNumpyInput'>, transform=<class 'flash.core.integrations.icevision.transforms.IceVisionInputTransform'>, transform_kwargs=None, **data_module_kwargs)[source]

Creates an ObjectDetectionData object from the given NumPy arrays (or lists of arrays) and corresponding lists of bounding boxes and targets.

The targets can be in any of our supported classification target formats. The bounding boxes are expected to be dictionaries with integer values (representing pixels) and the following keys: xmin, ymin, width, height. To learn how to customize the transforms applied for each stage, read our customizing transforms guide.

Parameters
Return type

ObjectDetectionData

Returns

The constructed ObjectDetectionData.

Examples

>>> import numpy as np
>>> from flash import Trainer
>>> from flash.image import ObjectDetector, ObjectDetectionData
>>> datamodule = ObjectDetectionData.from_numpy(
...     train_data=[np.random.rand(3, 64, 64), np.random.rand(3, 64, 64), np.random.rand(3, 64, 64)],
...     train_targets=[["cat"], ["dog"], ["cat"]],
...     train_bboxes=[
...         [{"xmin": 10, "ymin": 20, "width": 5, "height": 10}],
...         [{"xmin": 20, "ymin": 30, "width": 10, "height": 10}],
...         [{"xmin": 10, "ymin": 20, "width": 5, "height": 25}],
...     ],
...     predict_data=[np.random.rand(3, 64, 64)],
...     transform_kwargs=dict(image_size=(128, 128)),
...     batch_size=2,
... )
>>> datamodule.num_classes
3
>>> datamodule.labels
['background', 'cat', 'dog']
>>> model = ObjectDetector(labels=datamodule.labels)
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)  
Training...
>>> trainer.predict(model, datamodule=datamodule)  
Predicting...
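Because the boxes are given in pixels, a box that extends past the image edge usually indicates a data error. The following check is a sketch only: bbox_fits is a hypothetical helper, and whether Flash itself rejects or clips such boxes is not stated here.

```python
def bbox_fits(box, image_width, image_height):
    """Return True if the pixel box has positive size and lies fully inside the image."""
    return (
        box["xmin"] >= 0
        and box["ymin"] >= 0
        and box["width"] > 0
        and box["height"] > 0
        and box["xmin"] + box["width"] <= image_width
        and box["ymin"] + box["height"] <= image_height
    )


# The arrays in the example are 64 x 64 pixels.
inside = bbox_fits({"xmin": 10, "ymin": 20, "width": 5, "height": 25}, 64, 64)
outside = bbox_fits({"xmin": 60, "ymin": 20, "width": 10, "height": 10}, 64, 64)
```

Here the first box fits, while the second ends at x = 70 and so exceeds the 64-pixel width.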
classmethod from_tensors(train_data=None, train_targets=None, train_bboxes=None, val_data=None, val_targets=None, val_bboxes=None, test_data=None, test_targets=None, test_bboxes=None, predict_data=None, target_formatter=None, input_cls=<class 'flash.image.detection.input.ObjectDetectionTensorInput'>, transform=<class 'flash.core.integrations.icevision.transforms.IceVisionInputTransform'>, transform_kwargs=None, **data_module_kwargs)[source]

Creates an ObjectDetectionData object from the given torch tensors (or lists of tensors) and corresponding lists of bounding boxes and targets.

The targets can be in any of our supported classification target formats. The bounding boxes are expected to be dictionaries with integer values (representing pixels) and the following keys: xmin, ymin, width, height. To learn how to customize the transforms applied for each stage, read our customizing transforms guide.

Parameters
Return type

ObjectDetectionData

Returns

The constructed ObjectDetectionData.

Examples

>>> import torch
>>> from flash import Trainer
>>> from flash.image import ObjectDetector, ObjectDetectionData
>>> datamodule = ObjectDetectionData.from_tensors(
...     train_data=[torch.rand(3, 64, 64), torch.rand(3, 64, 64), torch.rand(3, 64, 64)],
...     train_targets=[["cat"], ["dog"], ["cat"]],
...     train_bboxes=[
...         [{"xmin": 10, "ymin": 20, "width": 5, "height": 10}],
...         [{"xmin": 20, "ymin": 30, "width": 10, "height": 10}],
...         [{"xmin": 10, "ymin": 20, "width": 5, "height": 25}],
...     ],
...     predict_data=[torch.rand(3, 64, 64)],
...     transform_kwargs=dict(image_size=(128, 128)),
...     batch_size=2,
... )
>>> datamodule.num_classes
3
>>> datamodule.labels
['background', 'cat', 'dog']
>>> model = ObjectDetector(labels=datamodule.labels)
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)  
Training...
>>> trainer.predict(model, datamodule=datamodule)  
Predicting...
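In the example, train_targets and train_bboxes are parallel nested lists: one inner list per image, and one target per bounding box. A sketch of a consistency check that could be run before constructing the data module follows. The helper check_alignment is hypothetical and not part of Flash.

```python
REQUIRED_KEYS = {"xmin", "ymin", "width", "height"}


def check_alignment(data, targets, bboxes):
    """Raise ValueError if the per-image targets and boxes do not line up."""
    if not (len(data) == len(targets) == len(bboxes)):
        raise ValueError("data, targets and bboxes must have one entry per image")
    for index, (image_targets, image_bboxes) in enumerate(zip(targets, bboxes)):
        if len(image_targets) != len(image_bboxes):
            raise ValueError(
                f"image {index}: {len(image_targets)} targets but {len(image_bboxes)} boxes"
            )
        for box in image_bboxes:
            missing = REQUIRED_KEYS - set(box)
            if missing:
                raise ValueError(f"image {index}: box is missing keys {sorted(missing)}")


# Placeholder strings stand in for the image tensors; only the lengths are checked.
check_alignment(
    ["image_a", "image_b"],
    [["cat"], ["dog"]],
    [
        [{"xmin": 10, "ymin": 20, "width": 5, "height": 10}],
        [{"xmin": 20, "ymin": 30, "width": 10, "height": 10}],
    ],
)
```

The call returns without error when the structure is consistent and raises ValueError otherwise.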
classmethod from_via(labels, label_field='label', train_folder=None, train_ann_file=None, val_folder=None, val_ann_file=None, test_folder=None, test_ann_file=None, predict_folder=None, transform=<class 'flash.core.integrations.icevision.transforms.IceVisionInputTransform'>, input_cls=<class 'flash.core.integrations.icevision.data.IceVisionInput'>, transform_kwargs=None, **data_module_kwargs)[source]

Creates an ObjectDetectionData object from the given data folders and annotation files in the VIA (VGG Image Annotator) JSON format.

To learn how to customize the transforms applied for each stage, read our customizing transforms guide.

Parameters
  • labels (List[str]) – A list of class labels. Note that the list should not include a label for the background class, which will be added automatically as class zero (additional labels will be sorted).

  • label_field (str) – The field within region_attributes which corresponds to the region label.

  • train_folder (Optional[str]) – The folder containing images to use when training.

  • train_ann_file (Optional[str]) – The VIA format annotation file to use when training.

  • val_folder (Optional[str]) – The folder containing images to use when validating.

  • val_ann_file (Optional[str]) – The VIA format annotation file to use when validating.

  • test_folder (Optional[str]) – The folder containing images to use when testing.

  • test_ann_file (Optional[str]) – The VIA format annotation file to use when testing.

  • predict_folder (Optional[str]) – The folder containing images to use when predicting.

  • input_cls (Type[Input]) – The Input type to use for loading the data.

  • transform (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[StrEnum, str], Dict[str, Any]], Union[StrEnum, str], None)) – The InputTransform type to use.

  • transform_kwargs (Optional[Dict]) – Dict of keyword arguments to be provided when instantiating the transforms.

  • data_module_kwargs (Any) – Additional keyword arguments to provide to the DataModule constructor.

Return type

ObjectDetectionData

Returns

The constructed ObjectDetectionData.

Examples

The folder train_folder has the following contents:

train_folder
├── image_1.png
├── image_2.png
├── image_3.png
...

The file train_annotations.json contains the following:

{
    "image_1.png": {
        "filename": "image_1.png",
        "regions": [{
            "shape_attributes": {"name": "rect", "x": 10, "y": 20, "width": 5, "height": 10},
            "region_attributes": {"label": "cat"}
        }]
    }, "image_2.png": {
        "filename": "image_2.png",
        "regions": [{
            "shape_attributes": {"name": "rect", "x": 20, "y": 30, "width": 10, "height": 10},
            "region_attributes": {"label": "dog"}}
    ]}, "image_3.png": {
        "filename": "image_3.png",
        "regions": [{
            "shape_attributes": {"name": "rect", "x": 10, "y": 20, "width": 5, "height": 25},
            "region_attributes": {"label": "cat"}
        }]
    }
}
>>> from flash import Trainer
>>> from flash.image import ObjectDetector, ObjectDetectionData
>>> datamodule = ObjectDetectionData.from_via(
...     ["cat", "dog"],
...     train_folder="train_folder",
...     train_ann_file="train_annotations.json",
...     predict_folder="predict_folder",
...     transform_kwargs=dict(image_size=(128, 128)),
...     batch_size=2,
... )
>>> datamodule.num_classes
3
>>> datamodule.labels
['background', 'cat', 'dog']
>>> model = ObjectDetector(num_classes=datamodule.num_classes)
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)  
Training...
>>> trainer.predict(model, datamodule=datamodule)  
Predicting...
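The description of the labels parameter implies a fixed class ordering: background first, then the remaining labels in sorted order. The sketch below reproduces that mapping. It is an assumption drawn from the description rather than from the Flash source, and build_class_map is a hypothetical name.

```python
def build_class_map(labels):
    """Return (ordered labels, label -> class index) with background as class 0."""
    ordered = ["background"] + sorted(labels)
    return ordered, {label: index for index, label in enumerate(ordered)}


# The input order does not matter because the labels are sorted.
ordered, class_map = build_class_map(["dog", "cat"])
```

With the two labels from the example this gives three classes, consistent with the num_classes and labels values shown above.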
classmethod from_voc(labels, train_folder=None, train_ann_folder=None, val_folder=None, val_ann_folder=None, test_folder=None, test_ann_folder=None, predict_folder=None, transform=<class 'flash.core.integrations.icevision.transforms.IceVisionInputTransform'>, input_cls=<class 'flash.core.integrations.icevision.data.IceVisionInput'>, transform_kwargs=None, **data_module_kwargs)[source]

Creates an ObjectDetectionData object from the given data folders and annotation files in the PASCAL VOC (Visual Object Classes) XML format.

To learn how to customize the transforms applied for each stage, read our customizing transforms guide.

Parameters
  • labels (List[str]) – A list of class labels. Note that the list should not include a label for the background class which will be added automatically as class zero (additional labels will be sorted).

  • train_folder (Optional[str]) – The folder containing images to use when training.

  • train_ann_folder (Optional[str]) – The folder containing VOC format annotation files to use when training.

  • val_folder (Optional[str]) – The folder containing images to use when validating.

  • val_ann_folder (Optional[str]) – The folder containing VOC format annotation files to use when validating.

  • test_folder (Optional[str]) – The folder containing images to use when testing.

  • test_ann_folder (Optional[str]) – The folder containing VOC format annotation files to use when testing.

  • predict_folder (Optional[str]) – The folder containing images to use when predicting.

  • input_cls (Type[Input]) – The Input type to use for loading the data.

  • transform (TypeVar(INPUT_TRANSFORM_TYPE, Type[flash.core.data.io.input_transform.InputTransform], Callable, Tuple[Union[StrEnum, str], Dict[str, Any]], Union[StrEnum, str], None)) – The InputTransform type to use.

  • transform_kwargs (Optional[Dict]) – Dict of keyword arguments to be provided when instantiating the transforms.

  • data_module_kwargs (Any) – Additional keyword arguments to provide to the DataModule constructor.

Return type

ObjectDetectionData

Returns

The constructed ObjectDetectionData.

Examples

The folder train_folder has the following contents:

train_folder
├── image_1.png
├── image_2.png
├── image_3.png
...

The folder train_annotations has the following contents:

train_annotations
├── image_1.xml
├── image_2.xml
├── image_3.xml
...

The file image_1.xml contains the following:

<annotation>
    <filename>image_1.png</filename>
    <path>image_1.png</path>
    <source><database>example</database></source>
    <size><width>64</width><height>64</height><depth>3</depth></size>
    <object>
        <name>cat</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <occluded>0</occluded>
        <bndbox><xmin>10</xmin><xmax>15</xmax><ymin>20</ymin><ymax>30</ymax></bndbox>
    </object>
</annotation>
>>> from flash import Trainer
>>> from flash.image import ObjectDetector, ObjectDetectionData
>>> datamodule = ObjectDetectionData.from_voc(
...     ["cat", "dog"],
...     train_folder="train_folder",
...     train_ann_folder="train_annotations",
...     predict_folder="predict_folder",
...     transform_kwargs=dict(image_size=(128, 128)),
...     batch_size=2,
... )
>>> datamodule.num_classes
3
>>> datamodule.labels
['background', 'cat', 'dog']
>>> model = ObjectDetector(num_classes=datamodule.num_classes)
>>> trainer = Trainer(fast_dev_run=True)
>>> trainer.fit(model, datamodule=datamodule)  
Training...
>>> trainer.predict(model, datamodule=datamodule)  
Predicting...
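VOC stores each box as corner coordinates inside the bndbox element. To inspect an annotation file, or to convert it to the xmin, ymin, width, height dictionaries used by from_files, the standard library is enough. The following is a sketch, with read_voc_boxes as a hypothetical helper that is not part of Flash.

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_text):
    """Parse VOC XML text into a list of (label, bbox dict) pairs."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        xmin = int(bndbox.findtext("xmin"))
        ymin = int(bndbox.findtext("ymin"))
        xmax = int(bndbox.findtext("xmax"))
        ymax = int(bndbox.findtext("ymax"))
        # Corner coordinates become a top-left point plus width and height.
        bbox = {"xmin": xmin, "ymin": ymin, "width": xmax - xmin, "height": ymax - ymin}
        boxes.append((obj.findtext("name"), bbox))
    return boxes


example = """<annotation>
    <object>
        <name>cat</name>
        <bndbox><xmin>10</xmin><xmax>15</xmax><ymin>20</ymin><ymax>30</ymax></bndbox>
    </object>
</annotation>"""
boxes = read_voc_boxes(example)
```

For the object in the example file, the corners (10, 20) and (15, 30) correspond to a box 5 pixels wide and 10 pixels high.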
input_transform_cls

alias of flash.core.integrations.icevision.transforms.IceVisionInputTransform
