I would like to create a typed DataFrame from a Pydantic BaseModel class, call it MyModel, that has Optional fields. As I create multiple instances of MyModel, some will have None in their Optional fields, and a DataFrame initialized from such rows may end up with inconsistent column dtypes. I'd therefore like to cast Optional[TypeX] to TypeX, e.g.:

import pydantic
import pandas as pd
import numpy as np
from typing import Optional

class MyModel(pydantic.BaseModel):
   thisfield: int
   thatfield: Optional[str]
   ...

col_types = {kk: ff.annotation for kk, ff in MyModel.model_fields.items()}


pd.DataFrame(np.empty(0, dtype=[tuple(tt) for tt in col_types.items()]))

This fails with TypeError: Cannot interpret 'typing.Optional[str]' as a data type.

I need a function or method of Optional[X] -> X. Any suggestions other than using repr with regex?

1 Answer

Since Optional[X] is equivalent to Union[X, None], you can unwrap it with get_origin and get_args:

from typing import Union, get_args, get_origin

def get_optional_arg(typ: type) -> type | None:
    # Return X if typ is Optional[X] (i.e. Union[X, None]); otherwise return None
    if get_origin(typ) is Union:
        args = get_args(typ)
        # For a non-union X, Optional[X] yields the args (X, NoneType)
        if len(args) == 2 and args[1] is type(None):
            return args[0]
    return None

col_types = {
    k: get_optional_arg(f.annotation) or f.annotation
    for k, f in MyModel.model_fields.items()
}

2 Comments

Optional[X] has always been equivalent to Union[X, None]. Python 3.10 simply introduced type-level | to allow either to be written as X | None.
@chepner agreed, fixed
