Columns converted to str if there is one Category column and coerce = True #1806

Open
2 of 3 tasks
antonioalegria opened this issue Sep 9, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@antonioalegria

Describe the bug
With coerce = True, a schema that contains a Category column causes other columns to be converted to str during validation.

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandera.
  • (optional) I have confirmed this bug exists on the main branch of pandera.

Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

import datetime
from pandera.polars import Field # type: ignore
from pandera.polars import DataFrameModel # type: ignore
from pandera.engines.polars_engine import Category
from typing import Optional

import polars as pl


class MyModel(DataFrameModel):
    a: datetime.datetime = Field(description="some description")
    b: Category = Field(description="some description", dtype_kwargs={"categories": ["a", "b", "c"]})

df = pl.DataFrame({"a": [datetime.datetime.now(), datetime.datetime.now()], "b": ["a", "b"]})
schema = MyModel.to_schema()
schema.strict = True
schema.coerce = True # -> this is what causes the conversion of `a` to string
print(schema.validate(df)) # BOOM!

Exception:
pandera.errors.SchemaError: expected column 'a' to have type Datetime(time_unit='us', time_zone=None), got String

Stepping through validation in a debugger shows that column a is converted to str right after column b is coerced.

Expected behavior

The dataframe should validate successfully: column a should keep its Datetime dtype, and only column b should be coerced, to pl.Categorical.

Desktop (please complete the following information):

OS: macOS 14.6.1
Python 3.12.4
polars-lts-cpu 1.6.0
pandera 0.20.3


@antonioalegria antonioalegria added the bug Something isn't working label Sep 9, 2024
@antonioalegria
Author

Workaround: use pl.Categorical as the column type in the schema and enforce the allowed values with the isin check.

Ideally, Literal['a', 'b', 'c'] could be used as the column type in the model, but that doesn't seem to be supported: converting the model to a schema throws a different error.
