This repository has been archived by the owner on Nov 2, 2023. It is now read-only.

Design proposal for Fake Data generation #57

Open
nhuray opened this issue Mar 1, 2023 · 0 comments

nhuray commented Mar 1, 2023

Hi @JakobGM,

Thanks for this project, which brings together data modelling and validation with Pydantic and data transformation with Polars. I think your approach to reconciling those two worlds is really interesting! Kudos 💯

You have started implementing APIs for generating examples (fake data) based on the types defined in the Pydantic model. This is convenient, but I think we could go further by adding an example_factory attribute to pt.Field: a callable used to generate sample data for that field, similar to the default_factory introduced in Pydantic.

An example probably makes this easier to understand:

from typing import Literal, Optional
from mimesis import Generic
import patito as pt

Factory = Generic()


class Employee(pt.Model):
    first_name: str = pt.Field(example_factory=Factory.person.first_name)
    last_name: str = pt.Field(example_factory=Factory.person.last_name)
    age: int = pt.Field(example_factory=lambda: Factory.person.age(18, 65)) 

Here I'm using mimesis as the factory, but Faker would work as well.

Once an example_factory can be defined per field, we could extend the examples API to generate fake data in bulk:

# Create 50 employees (as a DataFrame)
df = Employee.examples(count=50)

# Create 50 employees (as a sequence of Dictionaries)
employees = Employee.examples(count=50).to_dicts()
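
To make the bulk generation idea concrete, here is a minimal sketch in plain Python of what examples(count=...) might do internally. It is not Patito's implementation: generate_columns, employee_factories and FIRST_NAMES are hypothetical names, and the standard random module stands in for mimesis or Faker.

```python
import random
from typing import Any, Callable, Dict, List

# Hypothetical stand-in data; in the proposal the values would come
# from a mimesis or Faker provider instead.
FIRST_NAMES = ["Ada", "Grace", "Linus"]


def generate_columns(
    factories: Dict[str, Callable[[], Any]], count: int
) -> Dict[str, List[Any]]:
    """Call each zero-argument factory `count` times, producing one list per column."""
    return {
        name: [factory() for _ in range(count)]
        for name, factory in factories.items()
    }


# One factory per field, mirroring the example_factory arguments above.
employee_factories = {
    "first_name": lambda: random.choice(FIRST_NAMES),
    "age": lambda: random.randint(18, 65),
}

columns = generate_columns(employee_factories, count=50)
```

A column-oriented dictionary like this could then be handed to a DataFrame constructor, which is presumably how the result would end up as a DataFrame.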

Obviously, we should continue to support the existing examples API (passing the data used to populate the DataFrame):

Employee.examples({"age": [25, 32, 49]})
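
One way the two modes might coexist is for explicitly passed columns to take precedence, with the factories filling in only the missing fields. The sketch below is an assumption about how that could behave, not Patito's actual logic: merge_examples and its parameters are hypothetical, as is the choice to infer the row count from the explicit data and to default to a single row otherwise.

```python
import random
from typing import Any, Callable, Dict, List, Optional


def merge_examples(
    factories: Dict[str, Callable[[], Any]],
    data: Optional[Dict[str, List[Any]]] = None,
    count: Optional[int] = None,
) -> Dict[str, List[Any]]:
    """Use explicit columns where given; fill the rest from per-field factories."""
    data = dict(data or {})
    unknown = set(data) - set(factories)
    if unknown:
        raise ValueError(f"Unknown fields: {sorted(unknown)}")

    lengths = {len(values) for values in data.values()}
    if len(lengths) > 1:
        raise ValueError("Explicit columns must all have the same length")
    if lengths:
        inferred = lengths.pop()
        if count is not None and count != inferred:
            raise ValueError("count does not match the length of the explicit data")
        count = inferred
    elif count is None:
        count = 1  # hypothetical default when nothing else fixes the row count

    return {
        name: list(data[name]) if name in data
        else [factory() for _ in range(count)]
        for name, factory in factories.items()
    }


factories = {
    "first_name": lambda: random.choice(["Ada", "Grace", "Linus"]),
    "age": lambda: random.randint(18, 65),
}

# Explicit ages are kept as-is; first names are generated to match their length.
result = merge_examples(factories, {"age": [25, 32, 49]})
```

With this rule, the existing call style keeps working unchanged, and count only matters when no explicit column fixes the number of rows.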

Let me know what you think about that design proposal.

Nicolas
