ValidBboxesDataFrame
- class ethology.io.annotations.validate.ValidBboxesDataFrame(*args, **kwargs)
Bases: DataFrameModel
Class for intermediate dataframes of valid bounding boxes.
We use this dataframe internally as an intermediate step when converting an input bounding box annotations file (VIA or COCO) to an ethology dataset. The validation checks that all required columns exist and that their types are correct.
- image_width
Width of each image, in the same units as the input file (usually pixels).
- Type:
- image_height
Height of each image, in the same units as the input file (usually pixels).
- Type:
- category_id
Unique identifier for the category, as specified in the input file. A value of 0 is usually reserved for the background class.
- Type:
- Raises:
pa.errors.SchemaError – If the input dataframe does not match the schema.
Methods
- build_schema_(**kwargs)
- empty(*_args) – Create an empty DataFrame with the schema of this model.
- example(cls, **kwargs) – Generate an example of a particular size.
- get_empty_values() – Get the default empty values for selected dataframe columns.
- get_metadata() – Provide metadata for columns and schema level.
- pydantic_validate(schema_model) – Verify that the input is a compatible dataframe model.
- strategy(cls, **kwargs) – Create a hypothesis strategy for generating a DataFrame.
- to_json_schema() – Serialize schema metadata into json-schema format.
- to_schema() – Create DataFrameSchema from the DataFrameModel.
- to_yaml([stream]) – Convert Schema to yaml using io.to_yaml.
- validate(check_obj[, head, tail, sample, ...]) – Validate a DataFrame based on the schema specification.
- classmethod empty(*_args)
Create an empty DataFrame with the schema of this model.
- Return type:
DataFrame[Self]
- classmethod example(cls, **kwargs)
Generate an example of a particular size.
- Parameters:
size – number of elements in the generated DataFrame.
- Return type:
DataFrameBase[TypeVar(TDataFrameModel, bound=DataFrameModel)]
- Returns:
DataFrame object.
- static get_empty_values()
Get the default empty values for selected dataframe columns.
The columns are those that can be undefined in VIA and COCO files: category, supercategory, category_id, image_width and image_height.
- Returns:
A dictionary with the default empty values for the specified columns.
- Return type:
dict
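One way a dictionary like this could be applied is to fill the undefined entries of those columns. The pandas-only sketch below assumes that use; the default values in `HYPOTHETICAL_EMPTY_VALUES` and the helper `fill_undefined` are hypothetical and are not the values `get_empty_values()` actually returns.

```python
import pandas as pd

# Hypothetical defaults for the columns listed above; the real values
# returned by ValidBboxesDataFrame.get_empty_values() may differ.
HYPOTHETICAL_EMPTY_VALUES = {
    "category": "",
    "supercategory": "",
    "category_id": -1,
    "image_width": 0,
    "image_height": 0,
}


def fill_undefined(df: pd.DataFrame, empty_values: dict) -> pd.DataFrame:
    """Replace missing entries in the listed columns with their default values."""
    # Only fill columns that are actually present in the dataframe.
    present = {col: val for col, val in empty_values.items() if col in df.columns}
    return df.fillna(value=present)


df = pd.DataFrame({"category": ["bird", None], "image_width": [640, None]})
filled = fill_undefined(df, HYPOTHETICAL_EMPTY_VALUES)
print(filled["category"].tolist())     # ['bird', '']
print(filled["image_width"].tolist())  # [640.0, 0.0]
```

Note that `image_width` stays `float64` after filling, because the missing value already forced the column to a float dtype; a real pipeline would need to cast it back before an integer dtype check.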
- classmethod get_metadata()
Provide metadata for columns and at the schema level.
- classmethod pydantic_validate(schema_model)
Verify that the input is a compatible dataframe model.
- Return type:
- classmethod strategy(cls, **kwargs)
Create a hypothesis strategy for generating a DataFrame.
- Parameters:
size – number of elements to generate.
n_regex_columns – number of regex columns to generate.
- Returns:
a strategy that generates DataFrame objects.
- classmethod to_json_schema()
Serialize schema metadata into json-schema format.
- Parameters:
dataframe_schema – schema to write to json-schema format.
Note
This function currently does not fully specify a pandera schema, and is primarily used internally to render OpenAPI docs via the FastAPI integration.
- classmethod to_schema()
Create DataFrameSchema from the DataFrameModel.
- Return type:
TypeVar(TSchema, bound=BaseSchema)
- classmethod to_yaml(stream=None)
Convert Schema to yaml using io.to_yaml.
- classmethod validate(check_obj, head=None, tail=None, sample=None, random_state=None, lazy=False, inplace=False)
Validate a DataFrame based on the schema specification.
- Parameters:
check_obj (pd.DataFrame) – the dataframe to be validated.
head (Optional[int]) – validate the first n rows. Rows overlapping with tail or sample are de-duplicated.
tail (Optional[int]) – validate the last n rows. Rows overlapping with head or sample are de-duplicated.
sample (Optional[int]) – validate a random sample of n rows. Rows overlapping with head or tail are de-duplicated.
random_state (Optional[int]) – random seed for the sample argument.
lazy (bool) – if True, lazily evaluates the dataframe against all validation checks and raises SchemaErrors. Otherwise, raises SchemaError as soon as one occurs.
inplace (bool) – if True, applies coercion to the object of validation, otherwise creates a copy of the data.
- Return type:
DataFrame[Self]
- Returns:
validated DataFrame
- Raises:
SchemaError – when DataFrame violates built-in or custom checks.
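The head, tail and sample arguments restrict validation to a subset of rows. The pandas-only sketch below shows one way that selection and the de-duplication described above could work; the `subsample` helper is hypothetical, and de-duplicating on index labels (which assumes a unique index) is an assumption rather than a documented detail of `validate`.

```python
import pandas as pd


def subsample(df, head=None, tail=None, sample=None, random_state=None):
    """Hypothetical sketch: select head/tail/sample rows, dropping overlaps."""
    parts = []
    if head is not None:
        parts.append(df.head(head))
    if tail is not None:
        parts.append(df.tail(tail))
    if sample is not None:
        parts.append(df.sample(sample, random_state=random_state))
    if not parts:
        return df  # no subsetting requested: every row would be validated
    combined = pd.concat(parts)
    # Keep only the first occurrence of each index label (de-duplication).
    return combined[~combined.index.duplicated()]


df = pd.DataFrame({"category_id": range(5)})
subset = subsample(df, head=3, tail=3)
print(subset.index.tolist())  # [0, 1, 2, 3, 4]
```

With `head=3` and `tail=3` on five rows, row 2 is selected by both and is kept once, so five rows are checked rather than six.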