ValidBboxesDataFrame#

class ethology.io.annotations.validate.ValidBboxesDataFrame(*args, **kwargs)[source]#

Bases: DataFrameModel

Class for valid bounding boxes intermediate dataframes.

We use this dataframe internally as an intermediate step when converting an input bounding box annotations file (VIA or COCO) to an ethology dataset. Validation checks that all required columns exist and that their types are correct.

image_filename#

Name of the image file.

Type:

str

image_id#

Unique identifier for each of the images.

Type:

int

image_width#

Width of each of the images, in the same units as the input file (usually pixels).

Type:

int

image_height#

Height of each of the images, in the same units as the input file (usually pixels).

Type:

int

x_min#

Minimum x-coordinate of the bounding box, in the same units as the input file.

Type:

float

y_min#

Minimum y-coordinate of the bounding box, in the same units as the input file.

Type:

float

width#

Width of the bounding box, in the same units as the input file.

Type:

float

height#

Height of the bounding box, in the same units as the input file.

Type:

float

category_id#

Unique identifier for the category, as specified in the input file. A value of 0 is usually reserved for the background class.

Type:

int

category#

Category of the annotation as a string.

Type:

str

supercategory#

Supercategory of the annotation as a string.

Type:

str

Raises:

pa.errors.SchemaError – If the input dataframe does not match the schema.
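As a rough illustration of the column layout described above, the following sketch builds two rows with made-up values and passes them to validate. The column names and types come from the attribute list. The file names, the values, and the helper names (SCHEMA_COLUMNS, EXAMPLE_ROWS, bbox_corners, validate_example_rows) are assumptions made for this example. Whether validation succeeds may also depend on schema details not listed on this page, such as index requirements or type coercion.

```python
# Column names as listed in the attributes of ValidBboxesDataFrame.
SCHEMA_COLUMNS = [
    "image_filename", "image_id", "image_width", "image_height",
    "x_min", "y_min", "width", "height",
    "category_id", "category", "supercategory",
]

# Made-up rows for illustration; one row per bounding box annotation.
EXAMPLE_ROWS = [
    {
        "image_filename": "frame_0001.png", "image_id": 0,
        "image_width": 1280, "image_height": 720,
        "x_min": 100.0, "y_min": 50.0, "width": 40.0, "height": 30.0,
        "category_id": 1, "category": "crab", "supercategory": "animal",
    },
    {
        "image_filename": "frame_0002.png", "image_id": 1,
        "image_width": 1280, "image_height": 720,
        "x_min": 10.0, "y_min": 20.0, "width": 15.0, "height": 25.0,
        "category_id": 1, "category": "crab", "supercategory": "animal",
    },
]


def bbox_corners(row):
    """Return (x_min, y_min, x_max, y_max) computed from one row (hypothetical helper)."""
    return (
        row["x_min"],
        row["y_min"],
        row["x_min"] + row["width"],
        row["y_min"] + row["height"],
    )


def validate_example_rows(rows=EXAMPLE_ROWS):
    """Build a pandas DataFrame from `rows` and validate it against the schema."""
    # Third-party imports are kept local so the helpers above run without them.
    import pandas as pd
    from ethology.io.annotations.validate import ValidBboxesDataFrame

    df = pd.DataFrame(rows, columns=SCHEMA_COLUMNS)
    # Raises a pandera SchemaError if the dataframe does not match the schema.
    return ValidBboxesDataFrame.validate(df)
```

Calling validate_example_rows() requires pandas and ethology to be installed; bbox_corners only depends on the x_min, y_min, width and height convention described above.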

Methods

build_schema_(**kwargs)

empty(*_args)

Create an empty DataFrame with the schema of this model.

example(cls, **kwargs)

Generate an example of a particular size.

get_empty_values()

Get the default empty values for selected dataframe columns.

get_metadata()

Provide metadata at the column and schema level.

pydantic_validate(schema_model)

Verify that the input is a compatible dataframe model.

strategy(cls, **kwargs)

Create a hypothesis strategy for generating a DataFrame.

to_json_schema()

Serialize schema metadata into json-schema format.

to_schema()

Create DataFrameSchema from the DataFrameModel.

to_yaml([stream])

Convert Schema to yaml using io.to_yaml.

validate(check_obj[, head, tail, sample, ...])

Validate a DataFrame based on the schema specification.

classmethod empty(*_args)#

Create an empty DataFrame with the schema of this model.

Return type:

DataFrame[Self]

classmethod example(cls, **kwargs)#

Generate an example of a particular size.

Parameters:

size – number of elements in the generated DataFrame.

Return type:

DataFrameBase[TypeVar(TDataFrameModel, bound= DataFrameModel)]

Returns:

DataFrame object.

static get_empty_values()[source]#

Get the default empty values for selected dataframe columns.

The columns are those that can be undefined in VIA and COCO files: category, supercategory, category_id, image_width and image_height.

Returns:

A dictionary with the default empty values for the specified columns.

Return type:

dict
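One possible use of these defaults is to fill in columns that an input file leaves undefined before validation. The sketch below is an assumption about how that could look, not a description of how ethology does it internally. The helper names and the values in HYPOTHETICAL_DEFAULTS are invented for the example; the real defaults are whatever get_empty_values() returns.

```python
def fill_missing_columns(row, empty_values):
    """Return a copy of `row` where absent or None entries take the default value."""
    filled = dict(row)  # copy, so the input row is not modified
    for column, default in empty_values.items():
        if filled.get(column) is None:
            filled[column] = default
    return filled


def fill_with_ethology_defaults(row):
    """Same as above, using the defaults provided by ValidBboxesDataFrame."""
    # Local import so the pure-Python helper above runs without ethology.
    from ethology.io.annotations.validate import ValidBboxesDataFrame

    return fill_missing_columns(row, ValidBboxesDataFrame.get_empty_values())


# Hypothetical defaults, for demonstration only. They cover the five columns
# named above, but the values are NOT taken from ethology.
HYPOTHETICAL_DEFAULTS = {
    "category": "",
    "supercategory": "",
    "category_id": -1,
    "image_width": 0,
    "image_height": 0,
}

partial_row = {"image_filename": "a.png", "category": None}
filled_row = fill_missing_columns(partial_row, HYPOTHETICAL_DEFAULTS)
```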

classmethod get_metadata()#

Provide metadata at the column and schema level.

Return type:

Optional[dict]

classmethod pydantic_validate(schema_model)#

Verify that the input is a compatible dataframe model.

Return type:

DataFrameModel

classmethod strategy(cls, **kwargs)#

Create a hypothesis strategy for generating a DataFrame.

Parameters:
  • size – number of elements to generate

  • n_regex_columns – number of regex columns to generate.

Returns:

a strategy that generates DataFrame objects.

classmethod to_json_schema()#

Serialize schema metadata into json-schema format.

Parameters:

dataframe_schema – schema to write to json-schema format.

Note

This function currently does not fully specify a pandera schema, and is primarily used internally to render OpenAPI docs via the FastAPI integration.

classmethod to_schema()#

Create DataFrameSchema from the DataFrameModel.

Return type:

TypeVar(TSchema, bound= BaseSchema)

classmethod to_yaml(stream=None)#

Convert Schema to yaml using io.to_yaml.

classmethod validate(check_obj, head=None, tail=None, sample=None, random_state=None, lazy=False, inplace=False)#

Validate a DataFrame based on the schema specification.

Parameters:
  • check_obj (pd.DataFrame) – the dataframe to be validated.

  • head (Optional[int]) – validate the first n rows. Rows overlapping with tail or sample are de-duplicated.

  • tail (Optional[int]) – validate the last n rows. Rows overlapping with head or sample are de-duplicated.

  • sample (Optional[int]) – validate a random sample of n rows. Rows overlapping with head or tail are de-duplicated.

  • random_state (Optional[int]) – random seed for the sample argument.

  • lazy (bool) – if True, lazily evaluates the dataframe against all validation checks and raises SchemaErrors. Otherwise, raises SchemaError as soon as one occurs.

  • inplace (bool) – if True, applies coercion to the object of validation, otherwise creates a copy of the data.

Return type:

DataFrame[Self]

Returns:

validated DataFrame

Raises:

SchemaError – when DataFrame violates built-in or custom checks.
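The difference between eager and lazy validation can be sketched as follows. With lazy=True, pandera collects every failure and raises a single SchemaErrors exception whose failure_cases attribute summarises them. The helper name collect_schema_failures is an assumption made for this example, and the function needs pandera and ethology to be installed when it is called.

```python
def collect_schema_failures(df):
    """Validate `df` lazily and return (validated_df, failure_cases).

    Exactly one element of the returned pair is None: `failure_cases` when
    validation succeeds, `validated_df` when it fails.
    """
    # Local imports so that defining the helper does not require the packages.
    from pandera.errors import SchemaErrors
    from ethology.io.annotations.validate import ValidBboxesDataFrame

    try:
        # lazy=True runs all checks before raising, instead of stopping
        # at the first failure.
        validated = ValidBboxesDataFrame.validate(df, lazy=True)
    except SchemaErrors as err:
        # `failure_cases` is a dataframe describing each failed check.
        return None, err.failure_cases
    return validated, None
```

With the default lazy=False, the first failing check raises SchemaError immediately, which is usually enough when the goal is only to reject invalid input.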