diff --git a/python-serialize/README.md b/python-serialize/README.md new file mode 100644 index 0000000000..beffbe6bca --- /dev/null +++ b/python-serialize/README.md @@ -0,0 +1,295 @@ +# Serialize Your Data With Python + +This folder contains the sample code for the tutorial [Serialize Your Data With Python](https://realpython.com/python-serialize-data/) published on Real Python. + +## Table of Contents + +- [Setup](#setup) +- [Usage](#usage) + - [Python Objects](#python-objects) + - [Standard Python](#standard-python) + - [Customize Pickle](#customize-pickle) + - [JSON Encode](#json-encode) + - [Foreign Formats](#foreign-formats) + - [Executable Code](#executable-code) + - [Pickle-Importable Code](#pickle-importable-code) + - [Code Objects](#code-objects) + - [Digital Signature](#digital-signature) + - [HTTP Payload](#http-payload) + - [Flask](#flask) + - [Django REST Framework](#django-rest-framework) + - [FastAPI](#fastapi) + - [Pydantic](#pydantic) + - [Hierarchical Data](#hierarchical-data) + - [XML, YAML, JSON, BSON](#xml-yaml-json-bson) + - [Tabular Data](#tabular-data) + - [CSV](#csv) + - [Apache Parquet](#apache-parquet) + - [Schema-Based Formats](#schema-based-formats) + - [Apache Avro](#apache-avro) + - [Protocol Buffers (Protobuf)](#protocol-buffers-protobuf) + +## Setup + +Create and activate a new virtual environment: + +```shell +$ python3 -m venv venv/ +$ source venv/bin/activate +``` + +Install the required third-party dependencies: + +```shell +(venv) $ python -m pip install -r requirements.txt +``` + +## Usage + +### Python Objects + +#### Standard Python + +```shell +(venv) $ cd python-objects/standard-python/ +(venv) $ python pickle_demo.py +(venv) $ python marshal_demo.py +(venv) $ python shelve_demo.py +(venv) $ python dbm_demo.py +``` + +#### Customize Pickle + +```shell +(venv) $ cd python-objects/customize-pickle/ +(venv) $ python main.py +``` + +#### JSON Encode + +```shell +(venv) $ cd python-objects/json-encode/ +(venv) $ python
main.py +``` + +#### Foreign Formats + +jsonpickle and PyYAML: + +```shell +(venv) $ cd python-objects/foreign-formats/ +(venv) $ python jsonpickle_demo.py +(venv) $ python pyyaml_demo.py +``` + +### Executable Code + +#### Pickle-Importable Code + +```shell +(venv) $ cd executable-code/pickle-importable/ +(venv) $ python main.py +``` + +#### Code Objects + +```shell +(venv) $ cd executable-code/code-objects/ +(venv) $ python dill_demo.py +``` + +#### Digital Signature + +```shell +(venv) $ cd executable-code/digital-signature/ +(venv) $ python main.py +``` + +### HTTP Payload + +#### Flask + +Start the web server: + +```shell +(venv) $ cd http-payload/flask-rest-api/ +(venv) $ flask --app main --debug run +``` + +Navigate to the "users" resource in your web browser: + + +Send an HTTP GET request to retrieve all users: + +```shell +$ curl -s http://127.0.0.1:5000/users | jq +[ + { + "name": "Alice", + "id": "512a956f-165a-429f-9ec8-83d859843072", + "created_at": "2023-11-13T12:29:18.664574" + }, + { + "name": "Bob", + "id": "fb52a80f-8982-46be-bcdd-605932d8ef03", + "created_at": "2023-11-13T12:29:18.664593" + } +] +``` + +Send an HTTP POST request to add a new user: + +```shell +$ curl -s -X POST http://127.0.0.1:5000/users \ + -H 'Content-Type: application/json' \ + --data '{"name": "Frank"}' | jq +{ + "name": "Frank", + "id": "f6d3cae7-f86a-4bc8-8d05-2fb65e8c6f3b", + "created_at": "2023-11-13T12:31:21.602389" +} +``` + +#### Django REST Framework + +Navigate to the folder: + +```shell +(venv) $ cd http-payload/django-rest-api/ +``` + +Apply the migrations if necessary: + +```shell +(venv) $ python manage.py migrate +``` + +Start the Django development web server: + +```shell +(venv) $ python manage.py runserver +``` + +Navigate to the "users" resource in your web browser: + + +You can use the web interface generated by Django REST Framework to send a POST request to add a new user, for example: + +```json +{"name": "Frank"} +``` + +#### FastAPI + +Start the web 
server: + +```shell +(venv) $ cd http-payload/fastapi-rest-api/ +(venv) $ uvicorn main:app --reload +``` + +Navigate to the "users" resource in your web browser: + + +Send an HTTP GET request to retrieve all users: + +```shell +$ curl -s http://127.0.0.1:8000/users | jq +[ + { + "name": "Alice", + "id": "512a956f-165a-429f-9ec8-83d859843072", + "created_at": "2023-11-13T12:29:18.664574" + }, + { + "name": "Bob", + "id": "fb52a80f-8982-46be-bcdd-605932d8ef03", + "created_at": "2023-11-13T12:29:18.664593" + } +] +``` + +Send an HTTP POST request to add a new user: + +```shell +$ curl -s -X POST http://127.0.0.1:8000/users \ + -H 'Content-Type: application/json' \ + --data '{"name": "Frank"}' | jq +{ + "name": "Frank", + "id": "f6d3cae7-f86a-4bc8-8d05-2fb65e8c6f3b", + "created_at": "2023-11-13T12:31:21.602389" +} +``` + +#### Pydantic + +Start the FastAPI server: + +```shell +(venv) $ cd http-payload/fastapi-rest-api/ +(venv) $ uvicorn main:app --reload +``` + +Run the REST API consumer: + +```shell +(venv) $ cd http-payload/pydantic-demo/ +(venv) $ python main.py +``` + +### Hierarchical Data + +#### XML, YAML, JSON, BSON + +```shell +(venv) $ cd hierarchical-data/ +(venv) $ python bson_demo.py +(venv) $ python yaml_demo.py +``` + +### Tabular Data + +#### CSV + +```shell +(venv) $ cd tabular-data/csv-demo/ +(venv) $ python main.py +``` + +#### Apache Parquet + +```shell +(venv) $ cd tabular-data/parquet-demo/ +(venv) $ python main.py +``` + +### Schema-Based Formats + +#### Apache Avro + +```shell +(venv) $ cd schema-based/avro-demo/ +(venv) $ python main.py +``` + +#### Protocol Buffers (Protobuf) + +Install the `protoc` compiler: + +```shell +$ sudo apt install protobuf-compiler +``` + +Generate Python code from IDL: + +```shell +(venv) $ cd schema-based/protocol-buffers-demo/ +(venv) $ protoc --python_out=. --pyi_out=. 
users.proto +``` + +Run the demo: + +```shell +(venv) $ python main.py +``` diff --git a/python-serialize/executable-code/code-objects/dill_demo.py b/python-serialize/executable-code/code-objects/dill_demo.py new file mode 100644 index 0000000000..37ec689401 --- /dev/null +++ b/python-serialize/executable-code/code-objects/dill_demo.py @@ -0,0 +1,22 @@ +import dill + + +def main(): + create_plus = deserialize(serialize()) + print(create_plus) + print(f"{create_plus(3)(2) = }") # noqa + + +def serialize(): + import plus + + plus.create_plus.__module__ = None + return dill.dumps(plus.create_plus, recurse=True) + + +def deserialize(data): + return dill.loads(data) + + +if __name__ == "__main__": + main() diff --git a/python-serialize/executable-code/code-objects/plus.py b/python-serialize/executable-code/code-objects/plus.py new file mode 100644 index 0000000000..b13ac30ee9 --- /dev/null +++ b/python-serialize/executable-code/code-objects/plus.py @@ -0,0 +1,9 @@ +def create_plus(x): + def plus(y): + return x + y + + return plus + + +plus_one = create_plus(1) +plus_two = lambda x: x + 2 # noqa diff --git a/python-serialize/executable-code/digital-signature/code.pkl b/python-serialize/executable-code/digital-signature/code.pkl new file mode 100644 index 0000000000..57d066753e Binary files /dev/null and b/python-serialize/executable-code/digital-signature/code.pkl differ diff --git a/python-serialize/executable-code/digital-signature/main.py b/python-serialize/executable-code/digital-signature/main.py new file mode 100644 index 0000000000..9078949112 --- /dev/null +++ b/python-serialize/executable-code/digital-signature/main.py @@ -0,0 +1,30 @@ +import pickle +from pathlib import Path + +from trustworthy import safe_dump, safe_load + + +def main(): + path = Path("code.pkl") + serialize(lambda a, b: a + b, path, b"top-secret") + code = deserialize(path, b"top-secret") + print(code) + print(f"{code(3, 2) = }") # noqa + try: + deserialize(path, b"incorrect-key") + except 
pickle.UnpicklingError as ex: + print(repr(ex)) + + +def serialize(code, path, secret_key): + with path.open(mode="wb") as file: + safe_dump(code, file, secret_key) + + +def deserialize(path, secret_key): + with path.open(mode="rb") as file: + return safe_load(file, secret_key) + + +if __name__ == "__main__": + main() diff --git a/python-serialize/executable-code/digital-signature/safe_unpickler.py b/python-serialize/executable-code/digital-signature/safe_unpickler.py new file mode 100644 index 0000000000..4a779aed0c --- /dev/null +++ b/python-serialize/executable-code/digital-signature/safe_unpickler.py @@ -0,0 +1,22 @@ +import importlib +import io +import pickle + + +class SafeUnpickler(pickle.Unpickler): + ALLOWED = { + "builtins": ["print"], + "sysconfig": ["get_python_version"], + } + + @classmethod + def safe_loads(cls, serialized_data): + file = io.BytesIO(serialized_data) + return cls(file).load() + + def find_class(self, module_name, name): + if module_name in self.ALLOWED: + if name in self.ALLOWED[module_name]: + module = importlib.import_module(module_name) + return getattr(module, name) + raise pickle.UnpicklingError(f"{module_name}.{name} is unsafe") diff --git a/python-serialize/executable-code/digital-signature/trustworthy.py b/python-serialize/executable-code/digital-signature/trustworthy.py new file mode 100644 index 0000000000..751c84a6d7 --- /dev/null +++ b/python-serialize/executable-code/digital-signature/trustworthy.py @@ -0,0 +1,23 @@ +import hashlib +import hmac + +import dill + + +def safe_dump(obj, file, secret_key): + serialized_data = dill.dumps(obj) + signature = sign(serialized_data, secret_key) + dill.dump(signature, file) + file.write(serialized_data) + + +def safe_load(file, secret_key): + signature = dill.load(file) + serialized_data = file.read() + if hmac.compare_digest(signature, sign(serialized_data, secret_key)): + return dill.loads(serialized_data) + raise dill.UnpicklingError("invalid digital signature") + + +def sign(message, secret_key,
algorithm=hashlib.sha256): + return hmac.new(secret_key, message, algorithm).digest() diff --git a/python-serialize/executable-code/pickle-importable/main.py b/python-serialize/executable-code/pickle-importable/main.py new file mode 100644 index 0000000000..9bf1cd5551 --- /dev/null +++ b/python-serialize/executable-code/pickle-importable/main.py @@ -0,0 +1,15 @@ +import pickle + +import plus + + +def main(): + function_raw = pickle.dumps(plus.create_plus) + function = pickle.loads(function_raw) + print(function_raw) + print(function) + print(f"{function(3)(2) = }") # noqa + + +if __name__ == "__main__": + main() diff --git a/python-serialize/executable-code/pickle-importable/plus.py b/python-serialize/executable-code/pickle-importable/plus.py new file mode 100644 index 0000000000..b13ac30ee9 --- /dev/null +++ b/python-serialize/executable-code/pickle-importable/plus.py @@ -0,0 +1,9 @@ +def create_plus(x): + def plus(y): + return x + y + + return plus + + +plus_one = create_plus(1) +plus_two = lambda x: x + 2 # noqa diff --git a/python-serialize/hierarchical-data/bson_demo.py b/python-serialize/hierarchical-data/bson_demo.py new file mode 100644 index 0000000000..9f1fadcf7e --- /dev/null +++ b/python-serialize/hierarchical-data/bson_demo.py @@ -0,0 +1,23 @@ +import json + +import bson + + +def main(): + binary_json = serialize() + print(binary_json) + print(deserialize(binary_json)) + + +def serialize(): + with open("data/training.json", encoding="utf-8") as file: + document = json.load(file) + return bson.encode(document) + + +def deserialize(data): + return bson.decode(data) + + +if __name__ == "__main__": + main() diff --git a/python-serialize/hierarchical-data/data/training.json b/python-serialize/hierarchical-data/data/training.json new file mode 100644 index 0000000000..0c5c2c80ab --- /dev/null +++ b/python-serialize/hierarchical-data/data/training.json @@ -0,0 +1,30 @@ +{ + "program": { + "metadata": { + "author": "John Doe", + "goals": ["health improvement", 
"fat loss"] + }, + "exercises": { + "plank": {"muscles": ["abs", "core", "shoulders"]}, + "push-ups": {"muscles": ["chest", "biceps", "triceps"]} + }, + "days": { + "rest-day": {"type": "rest"}, + "workout-1": { + "type": "workout", + "segments": [ + {"type": "@plank", "seconds": 60}, + {"type": "rest", "seconds": 10}, + {"type": "@push-ups", "seconds": 60} + ] + } + }, + "schedule": [ + "@workout-1", + "@rest-day", + "@rest-day", + "@workout-1", + "@rest-day" + ] + } +} diff --git a/python-serialize/hierarchical-data/data/training.xml b/python-serialize/hierarchical-data/data/training.xml new file mode 100644 index 0000000000..c688754cd5 --- /dev/null +++ b/python-serialize/hierarchical-data/data/training.xml @@ -0,0 +1,55 @@ + + + + John Doe + + health improvement + fat loss + + + + + + abs + core + shoulders + + + + + chest + biceps + triceps + + + + + + rest + + + workout + + + + + + + rest + + + + + + + + + + + + + + + + + diff --git a/python-serialize/hierarchical-data/data/training.yaml b/python-serialize/hierarchical-data/data/training.yaml new file mode 100644 index 0000000000..71da581946 --- /dev/null +++ b/python-serialize/hierarchical-data/data/training.yaml @@ -0,0 +1,38 @@ +program: + metadata: + author: John Doe + goals: + - health improvement + - fat loss + exercises: + - &plank + muscles: + - abs + - core + - shoulders + - &pushups + muscles: + - chest + - biceps + - triceps + days: + - &restday + type: rest + - &workout1 + type: workout + segments: + - type: *plank + duration: + seconds: 60 + - type: rest + duration: + seconds: 10 + - type: *pushups + duration: + seconds: 60 + schedule: + - day: *workout1 + - day: *restday + - day: *restday + - day: *workout1 + - day: *restday diff --git a/python-serialize/hierarchical-data/yaml_demo.py b/python-serialize/hierarchical-data/yaml_demo.py new file mode 100644 index 0000000000..3162b0bde3 --- /dev/null +++ b/python-serialize/hierarchical-data/yaml_demo.py @@ -0,0 +1,23 @@ +import json + +import yaml + 
+ +def main(): + yaml_data = serialize() + print(yaml_data) + print(deserialize(yaml_data)) + + +def serialize(): + with open("data/training.json", encoding="utf-8") as file: + document = json.load(file) + return yaml.safe_dump(document) + + +def deserialize(data): + return yaml.safe_load(data) + + +if __name__ == "__main__": + main() diff --git a/python-serialize/http-payload/django-rest-api/manage.py b/python-serialize/http-payload/django-rest-api/manage.py new file mode 100755 index 0000000000..e170f6bafc --- /dev/null +++ b/python-serialize/http-payload/django-rest-api/manage.py @@ -0,0 +1,22 @@ +#!/usr/bin/env python +"""Django's command-line utility for administrative tasks.""" +import os +import sys + + +def main(): + """Run administrative tasks.""" + os.environ.setdefault("DJANGO_SETTINGS_MODULE", "project.settings") + try: + from django.core.management import execute_from_command_line + except ImportError as exc: + raise ImportError( + "Couldn't import Django. Are you sure it's installed and " + "available on your PYTHONPATH environment variable? Did you " + "forget to activate a virtual environment?" + ) from exc + execute_from_command_line(sys.argv) + + +if __name__ == "__main__": + main() diff --git a/python-serialize/http-payload/django-rest-api/project/__init__.py b/python-serialize/http-payload/django-rest-api/project/__init__.py new file mode 100644 index 0000000000..e69de29bb2 diff --git a/python-serialize/http-payload/django-rest-api/project/asgi.py b/python-serialize/http-payload/django-rest-api/project/asgi.py new file mode 100644 index 0000000000..3bc6e62e38 --- /dev/null +++ b/python-serialize/http-payload/django-rest-api/project/asgi.py @@ -0,0 +1,16 @@ +""" +ASGI config for project project. + +It exposes the ASGI callable as a module-level variable named ``application``. 
+ +For more information on this file, see +https://docs.djangoproject.com/en/4.2/howto/deployment/asgi/ +""" + +import os + +from django.core.asgi import get_asgi_application + +os.environ.setdefault("DJANGO_SETTINGS_MODULE", "project.settings") + +application = get_asgi_application() diff --git a/python-serialize/http-payload/django-rest-api/project/settings.py b/python-serialize/http-payload/django-rest-api/project/settings.py new file mode 100644 index 0000000000..fe8f7e4e53 --- /dev/null +++ b/python-serialize/http-payload/django-rest-api/project/settings.py @@ -0,0 +1,127 @@ +""" +Django settings for project project. + +Generated by 'django-admin startproject' using Django 4.2.7. + +For more information on this file, see +https://docs.djangoproject.com/en/4.2/topics/settings/ + +For the full list of settings and their values, see +https://docs.djangoproject.com/en/4.2/ref/settings/ +""" + +from pathlib import Path + +# Build paths inside the project like this: BASE_DIR / 'subdir'. +BASE_DIR = Path(__file__).resolve().parent.parent + + +# Quick-start development settings - unsuitable for production +# See https://docs.djangoproject.com/en/4.2/howto/deployment/checklist/ + +# SECURITY WARNING: keep the secret key used in production secret! +SECRET_KEY = ( + "django-insecure-(fi2)bq9#t#%jl$0f@3&b1$0$zcx_xb-wv8h%28-op@sq^-57v" +) + +# SECURITY WARNING: don't run with debug turned on in production! 
+DEBUG = True + +ALLOWED_HOSTS = [] + + +# Application definition + +INSTALLED_APPS = [ + "django.contrib.admin", + "django.contrib.auth", + "django.contrib.contenttypes", + "django.contrib.sessions", + "django.contrib.messages", + "django.contrib.staticfiles", + "rest_framework", + "rest_api", +] + +MIDDLEWARE = [ + "django.middleware.security.SecurityMiddleware", + "django.contrib.sessions.middleware.SessionMiddleware", + "django.middleware.common.CommonMiddleware", + "django.middleware.csrf.CsrfViewMiddleware", + "django.contrib.auth.middleware.AuthenticationMiddleware", + "django.contrib.messages.middleware.MessageMiddleware", + "django.middleware.clickjacking.XFrameOptionsMiddleware", +] + +ROOT_URLCONF = "project.urls" + +TEMPLATES = [ + { + "BACKEND": "django.template.backends.django.DjangoTemplates", + "DIRS": [], + "APP_DIRS": True, + "OPTIONS": { + "context_processors": [ + "django.template.context_processors.debug", + "django.template.context_processors.request", + "django.contrib.auth.context_processors.auth", + "django.contrib.messages.context_processors.messages", + ], + }, + }, +] + +WSGI_APPLICATION = "project.wsgi.application" + + +# Database +# https://docs.djangoproject.com/en/4.2/ref/settings/#databases + +DATABASES = { + "default": { + "ENGINE": "django.db.backends.sqlite3", + "NAME": BASE_DIR / "db.sqlite3", + } +} + + +# Password validation +# https://docs.djangoproject.com/en/4.2/ref/settings/#auth-password-validators + +AUTH_PASSWORD_VALIDATORS = [ + { + "NAME": "django.contrib.auth.password_validation.UserAttributeSimilarityValidator", + }, + { + "NAME": "django.contrib.auth.password_validation.MinimumLengthValidator", + }, + { + "NAME": "django.contrib.auth.password_validation.CommonPasswordValidator", + }, + { + "NAME": "django.contrib.auth.password_validation.NumericPasswordValidator", + }, +] + + +# Internationalization +# https://docs.djangoproject.com/en/4.2/topics/i18n/ + +LANGUAGE_CODE = "en-us" + +TIME_ZONE = "UTC" + +USE_I18N = 
True + +USE_TZ = True + + +# Static files (CSS, JavaScript, Images) +# https://docs.djangoproject.com/en/4.2/howto/static-files/ + +STATIC_URL = "static/" + +# Default primary key field type +# https://docs.djangoproject.com/en/4.2/ref/settings/#default-auto-field + +DEFAULT_AUTO_FIELD = "django.db.models.BigAutoField" diff --git a/python-serialize/http-payload/django-rest-api/project/urls.py b/python-serialize/http-payload/django-rest-api/project/urls.py new file mode 100644 index 0000000000..b4217eef6a --- /dev/null +++ b/python-serialize/http-payload/django-rest-api/project/urls.py @@ -0,0 +1,23 @@ +""" +URL configuration for project project. + +The `urlpatterns` list routes URLs to views. For more information please see: + https://docs.djangoproject.com/en/4.2/topics/http/urls/ +Examples: +Function views + 1. Add an import: from my_app import views + 2. Add a URL to urlpatterns: path('', views.home, name='home') +Class-based views + 1. Add an import: from other_app.views import Home + 2. Add a URL to urlpatterns: path('', Home.as_view(), name='home') +Including another URLconf + 1. Import the include() function: from django.urls import include, path + 2. Add a URL to urlpatterns: path('blog/', include('blog.urls')) +""" +from django.contrib import admin +from django.urls import include, path + +urlpatterns = [ + path("admin/", admin.site.urls), + path("users/", include("rest_api.urls")), +] diff --git a/python-serialize/http-payload/django-rest-api/project/wsgi.py b/python-serialize/http-payload/django-rest-api/project/wsgi.py new file mode 100644 index 0000000000..05ee66ce09 --- /dev/null +++ b/python-serialize/http-payload/django-rest-api/project/wsgi.py @@ -0,0 +1,16 @@ +""" +WSGI config for project project. + +It exposes the WSGI callable as a module-level variable named ``application``. 
+ +For more information on this file, see +https://docs.djangoproject.com/en/4.2/howto/deployment/wsgi/ +""" + +import os + +from django.core.wsgi import get_wsgi_application + +os.environ.setdefault("DJANGO_SETTINGS_MODULE", "project.settings") + +application = get_wsgi_application() diff --git a/python-serialize/http-payload/django-rest-api/rest_api/__init__.py b/python-serialize/http-payload/django-rest-api/rest_api/__init__.py new file mode 100644 index 0000000000..e69de29bb2 diff --git a/python-serialize/http-payload/django-rest-api/rest_api/apps.py b/python-serialize/http-payload/django-rest-api/rest_api/apps.py new file mode 100644 index 0000000000..ed762758ae --- /dev/null +++ b/python-serialize/http-payload/django-rest-api/rest_api/apps.py @@ -0,0 +1,6 @@ +from django.apps import AppConfig + + +class RestApiConfig(AppConfig): + default_auto_field = "django.db.models.BigAutoField" + name = "rest_api" diff --git a/python-serialize/http-payload/django-rest-api/rest_api/migrations/0001_initial.py b/python-serialize/http-payload/django-rest-api/rest_api/migrations/0001_initial.py new file mode 100644 index 0000000000..38ebeef0ad --- /dev/null +++ b/python-serialize/http-payload/django-rest-api/rest_api/migrations/0001_initial.py @@ -0,0 +1,31 @@ +# Generated by Django 4.2.7 on 2023-11-13 13:38 + +import uuid + +import django.utils.timezone +from django.db import migrations, models + + +class Migration(migrations.Migration): + initial = True + + dependencies = [] + + operations = [ + migrations.CreateModel( + name="User", + fields=[ + ( + "id", + models.UUIDField( + default=uuid.uuid4, primary_key=True, serialize=False + ), + ), + ("name", models.CharField(max_length=200)), + ( + "created_at", + models.DateTimeField(default=django.utils.timezone.now), + ), + ], + ), + ] diff --git a/python-serialize/http-payload/django-rest-api/rest_api/migrations/__init__.py b/python-serialize/http-payload/django-rest-api/rest_api/migrations/__init__.py new file mode 100644 
index 0000000000..e69de29bb2 diff --git a/python-serialize/http-payload/django-rest-api/rest_api/models.py b/python-serialize/http-payload/django-rest-api/rest_api/models.py new file mode 100644 index 0000000000..065169e097 --- /dev/null +++ b/python-serialize/http-payload/django-rest-api/rest_api/models.py @@ -0,0 +1,10 @@ +import uuid + +from django.db import models +from django.utils import timezone + + +class User(models.Model): + id = models.UUIDField(primary_key=True, default=uuid.uuid4) + name = models.CharField(max_length=200) + created_at = models.DateTimeField(default=timezone.now) diff --git a/python-serialize/http-payload/django-rest-api/rest_api/serializers.py b/python-serialize/http-payload/django-rest-api/rest_api/serializers.py new file mode 100644 index 0000000000..295795a3b7 --- /dev/null +++ b/python-serialize/http-payload/django-rest-api/rest_api/serializers.py @@ -0,0 +1,15 @@ +from rest_framework import serializers + +from . import models + + +class UserSerializerOut(serializers.ModelSerializer): + class Meta: + model = models.User + fields = "__all__" + + +class UserSerializerIn(serializers.ModelSerializer): + class Meta: + model = models.User + fields = ["name"] diff --git a/python-serialize/http-payload/django-rest-api/rest_api/urls.py b/python-serialize/http-payload/django-rest-api/rest_api/urls.py new file mode 100644 index 0000000000..9f96dab657 --- /dev/null +++ b/python-serialize/http-payload/django-rest-api/rest_api/urls.py @@ -0,0 +1,5 @@ +from django.urls import path + +from . 
import views + +urlpatterns = [path("", views.handle_users)] diff --git a/python-serialize/http-payload/django-rest-api/rest_api/views.py b/python-serialize/http-payload/django-rest-api/rest_api/views.py new file mode 100644 index 0000000000..0adf175b9f --- /dev/null +++ b/python-serialize/http-payload/django-rest-api/rest_api/views.py @@ -0,0 +1,21 @@ +from rest_framework import status +from rest_framework.decorators import api_view +from rest_framework.response import Response + +from .models import User +from .serializers import UserSerializerIn, UserSerializerOut + + +@api_view(["GET", "POST"]) +def handle_users(request): + if request.method == "GET": + users = User.objects.all() + serializer = UserSerializerOut(users, many=True) + return Response(serializer.data) + elif request.method == "POST": + serializer_in = UserSerializerIn(data=request.data) + if serializer_in.is_valid(): + user = serializer_in.save() + serializer_out = UserSerializerOut(user) + return Response(serializer_out.data, status.HTTP_201_CREATED) + return Response(serializer_in.errors, status.HTTP_400_BAD_REQUEST) diff --git a/python-serialize/http-payload/fastapi-rest-api/main.py b/python-serialize/http-payload/fastapi-rest-api/main.py new file mode 100644 index 0000000000..4d447c281a --- /dev/null +++ b/python-serialize/http-payload/fastapi-rest-api/main.py @@ -0,0 +1,34 @@ +from datetime import datetime +from uuid import UUID, uuid4 + +from fastapi import FastAPI +from pydantic import BaseModel, Field + +app = FastAPI() + + +class UserIn(BaseModel): + name: str + + +class UserOut(UserIn): + id: UUID = Field(default_factory=uuid4) + created_at: datetime = Field(default_factory=datetime.now) + + +users = [ + UserOut(name="Alice"), + UserOut(name="Bob"), +] + + +@app.get("/users") +async def get_users(): + return users + + +@app.post("/users", status_code=201) +async def create_user(user_in: UserIn): + user_out = UserOut(name=user_in.name) + users.append(user_out) + return user_out diff --git 
a/python-serialize/http-payload/flask-rest-api/main.py b/python-serialize/http-payload/flask-rest-api/main.py new file mode 100644 index 0000000000..1841c92e9a --- /dev/null +++ b/python-serialize/http-payload/flask-rest-api/main.py @@ -0,0 +1,40 @@ +from dataclasses import dataclass +from datetime import datetime +from uuid import UUID, uuid4 + +from flask import Flask, jsonify, request + +app = Flask(__name__) + + +@dataclass +class User: + id: UUID + name: str + created_at: datetime + + @classmethod + def create(cls, name): + return cls(uuid4(), name, datetime.now()) + + +users = [ + User.create("Alice"), + User.create("Bob"), +] + + +@app.route("/users", methods=["GET", "POST"]) +def view_users(): + if request.method == "GET": + return users + elif request.method == "POST": + if request.is_json: + payload = request.get_json() + user = User.create(payload["name"]) + users.append(user) + return jsonify(user), 201 + + +if __name__ == "__main__": + app.run(debug=True) diff --git a/python-serialize/http-payload/pydantic-demo/main.py b/python-serialize/http-payload/pydantic-demo/main.py new file mode 100644 index 0000000000..df385057ef --- /dev/null +++ b/python-serialize/http-payload/pydantic-demo/main.py @@ -0,0 +1,27 @@ +from datetime import datetime +from uuid import UUID + +import httpx +from pydantic import BaseModel, Field, field_validator + + +class Metadata(BaseModel): + uuid: UUID = Field(alias="id") + created_at: datetime + + +class User(Metadata): + name: str + + @field_validator("name") + def check_user_name(cls, name): + if name[:1].isupper(): + return name + raise ValueError("name must start with an uppercase letter") + + +if __name__ == "__main__": + response = httpx.get("http://localhost:8000/users") + for item in response.json(): + user = User(**item) + print(repr(user)) diff --git a/python-serialize/python-objects/customize-pickle/main.py b/python-serialize/python-objects/customize-pickle/main.py new file mode 100644 index 0000000000..734ae76e81 ---
/dev/null +++ b/python-serialize/python-objects/customize-pickle/main.py @@ -0,0 +1,22 @@ +import pickle + +from models import User + + +def main(): + data = serialize(User("alice", "secret")) + user = deserialize(data) + print(data) + print(user) + + +def serialize(user): + return pickle.dumps(user) + + +def deserialize(data): + return pickle.loads(data) + + +if __name__ == "__main__": + main() diff --git a/python-serialize/python-objects/customize-pickle/models.py b/python-serialize/python-objects/customize-pickle/models.py new file mode 100644 index 0000000000..05c61bd981 --- /dev/null +++ b/python-serialize/python-objects/customize-pickle/models.py @@ -0,0 +1,19 @@ +import time +from dataclasses import dataclass + + +@dataclass +class User: + name: str + password: str + + def __getstate__(self): + state = self.__dict__.copy() + state["timestamp"] = int(time.time()) + del state["password"] + return state + + def __setstate__(self, state): + self.__dict__.update(state) + with open("/dev/random", mode="rb") as file: + self.password = file.read(8).decode("ascii", errors="ignore") diff --git a/python-serialize/python-objects/foreign-formats/jsonpickle_demo.py b/python-serialize/python-objects/foreign-formats/jsonpickle_demo.py new file mode 100644 index 0000000000..26d7035c1c --- /dev/null +++ b/python-serialize/python-objects/foreign-formats/jsonpickle_demo.py @@ -0,0 +1,22 @@ +import jsonpickle +from models import User + + +def main(): + user_json = serialize() + user = deserialize(user_json) + print(user_json) + print(user) + + +def serialize(): + user = User(name="John", password="*%!U8n9erx@GdqK(@J") + return jsonpickle.dumps(user, indent=4) + + +def deserialize(json_data): + return jsonpickle.loads(json_data) + + +if __name__ == "__main__": + main() diff --git a/python-serialize/python-objects/foreign-formats/models.py b/python-serialize/python-objects/foreign-formats/models.py new file mode 100644 index 0000000000..05c61bd981 --- /dev/null +++ 
b/python-serialize/python-objects/foreign-formats/models.py @@ -0,0 +1,19 @@ +import time +from dataclasses import dataclass + + +@dataclass +class User: + name: str + password: str + + def __getstate__(self): + state = self.__dict__.copy() + state["timestamp"] = int(time.time()) + del state["password"] + return state + + def __setstate__(self, state): + self.__dict__.update(state) + with open("/dev/random", mode="rb") as file: + self.password = file.read(8).decode("ascii", errors="ignore") diff --git a/python-serialize/python-objects/foreign-formats/pyyaml_demo.py b/python-serialize/python-objects/foreign-formats/pyyaml_demo.py new file mode 100644 index 0000000000..bc644e6163 --- /dev/null +++ b/python-serialize/python-objects/foreign-formats/pyyaml_demo.py @@ -0,0 +1,22 @@ +import yaml +from models import User + + +def main(): + user_yaml = serialize() + user = deserialize(user_yaml) + print(user_yaml) + print(user) + + +def serialize(): + user = User(name="John", password="*%!U8n9erx@GdqK(@J") + return yaml.dump(user) + + +def deserialize(yaml_data): + return yaml.unsafe_load(yaml_data) + + +if __name__ == "__main__": + main() diff --git a/python-serialize/python-objects/json-encode/main.py b/python-serialize/python-objects/json-encode/main.py new file mode 100644 index 0000000000..4281f3b039 --- /dev/null +++ b/python-serialize/python-objects/json-encode/main.py @@ -0,0 +1,26 @@ +import json + + +def main(): + data = {"weekend_days": {"Saturday", "Sunday"}} + json_string = json.dumps(data, default=serialize_custom) + python_dict = json.loads(json_string, object_hook=deserialize_custom) + print(repr(json_string)) + print(repr(python_dict)) + + +def serialize_custom(value): + if isinstance(value, set): + return {"type": "set", "elements": list(value)} + + +def deserialize_custom(value): + match value: + case {"type": "set", "elements": elements}: + return set(elements) + case _: + return value + + +if __name__ == "__main__": + main() diff --git 
a/python-serialize/python-objects/standard-python/data.pkl b/python-serialize/python-objects/standard-python/data.pkl new file mode 100644 index 0000000000..c678f9e044 --- /dev/null +++ b/python-serialize/python-objects/standard-python/data.pkl @@ -0,0 +1 @@ +K. \ No newline at end of file diff --git a/python-serialize/python-objects/standard-python/dbm_demo.py b/python-serialize/python-objects/standard-python/dbm_demo.py new file mode 100644 index 0000000000..1f31cdfa55 --- /dev/null +++ b/python-serialize/python-objects/standard-python/dbm_demo.py @@ -0,0 +1,5 @@ +import dbm + +with dbm.open("/tmp/cache.db") as db: + for key in db.keys(): + print(f"{key} = {db[key]}") diff --git a/python-serialize/python-objects/standard-python/marshal_demo.py b/python-serialize/python-objects/standard-python/marshal_demo.py new file mode 100644 index 0000000000..5ea59a59b9 --- /dev/null +++ b/python-serialize/python-objects/standard-python/marshal_demo.py @@ -0,0 +1,22 @@ +import marshal +import sysconfig +from importlib.util import cache_from_source +from pathlib import Path + + +def main(): + stdlib_dir = Path(sysconfig.get_path("stdlib")) + module_path = stdlib_dir / Path(cache_from_source("decimal.py")) + import_pyc(module_path) + print(Decimal(3.14)) # noqa + + +def import_pyc(path): + with path.open(mode="rb") as file: + _ = file.read(16) # Skip the file header + code = marshal.loads(file.read()) + exec(code, globals()) + + +if __name__ == "__main__": + main() diff --git a/python-serialize/python-objects/standard-python/pickle_demo.py b/python-serialize/python-objects/standard-python/pickle_demo.py new file mode 100644 index 0000000000..4847f49244 --- /dev/null +++ b/python-serialize/python-objects/standard-python/pickle_demo.py @@ -0,0 +1,25 @@ +import array +import pickle + + +def main(): + serialize(data=255, filename="data.pkl") + data = deserialize("data.pkl") + print(data) + print(pickle.dumps(data)) + print(pickle.loads(b"\x80\x05K\xff.")) + 
print(pickle.loads(array.array("B", [128, 4, 75, 255, 46]))) + + +def serialize(data, filename): + with open(filename, mode="wb") as file: + pickle.dump(data, file) + + +def deserialize(filename): + with open(filename, mode="rb") as file: + return pickle.load(file) + + +if __name__ == "__main__": + main() diff --git a/python-serialize/python-objects/standard-python/shelve_demo.py b/python-serialize/python-objects/standard-python/shelve_demo.py new file mode 100644 index 0000000000..d5cd1ae279 --- /dev/null +++ b/python-serialize/python-objects/standard-python/shelve_demo.py @@ -0,0 +1,13 @@ +import shelve + +with shelve.open("/tmp/cache.db") as shelf: + shelf["last_updated"] = 1696846049.8469703 + shelf["user_sessions"] = { + "jdoe@domain.com": { + "user_id": 4185395169, + "roles": {"admin", "editor"}, + "preferences": {"language": "en_US", "dark_theme": False}, + } + } + for key, value in shelf.items(): + print(f"['{key}'] = {value}") diff --git a/python-serialize/requirements.txt b/python-serialize/requirements.txt new file mode 100644 index 0000000000..d71aa3f174 --- /dev/null +++ b/python-serialize/requirements.txt @@ -0,0 +1,45 @@ +annotated-types==0.6.0 +anyio==3.7.1 +asgiref==3.7.2 +blinker==1.7.0 +certifi==2023.7.22 +click==8.1.7 +cramjam==2.7.0 +dill==0.3.7 +Django==4.2.7 +djangorestframework==3.14.0 +dnspython==2.4.2 +Faker==20.0.0 +fastapi==0.104.1 +fastavro==1.9.0 +fastparquet==2023.10.1 +Flask==3.0.0 +fsspec==2023.10.0 +h11==0.14.0 +httpcore==1.0.2 +httpx==0.25.1 +idna==3.4 +itsdangerous==2.1.2 +Jinja2==3.1.2 +jsonpickle==3.0.2 +MarkupSafe==2.1.3 +numpy==1.26.2 +packaging==23.2 +pandas==2.1.3 +protobuf==4.25.0 +pyarrow==14.0.1 +pydantic==2.4.2 +pydantic_core==2.10.1 +pymongo==4.6.0 +python-dateutil==2.8.2 +pytz==2023.3.post1 +PyYAML==6.0.1 +six==1.16.0 +sniffio==1.3.0 +sqlparse==0.4.4 +starlette==0.27.0 +types-protobuf==4.24.0.4 +typing_extensions==4.8.0 +tzdata==2023.3 +uvicorn==0.24.0.post1 +Werkzeug==3.0.1 diff --git 
a/python-serialize/schema-based/avro-demo/main.py b/python-serialize/schema-based/avro-demo/main.py new file mode 100644 index 0000000000..5029c4580a --- /dev/null +++ b/python-serialize/schema-based/avro-demo/main.py @@ -0,0 +1,26 @@ +from fastavro import reader, writer +from fastavro.schema import load_schema +from models import User + + +def main(): + serialize("users.avro", "user.avsc") + for user in deserialize("users.avro"): + print(user) + + +def serialize(filename, schema_filename): + users = [User.fake() for _ in range(5)] + with open(filename, mode="wb") as file: + schema = load_schema(schema_filename) + writer(file, schema, [user._asdict() for user in users]) + + +def deserialize(filename): + with open(filename, mode="rb") as file: + for record in reader(file): + yield User(**record) + + +if __name__ == "__main__": + main() diff --git a/python-serialize/schema-based/avro-demo/models.py b/python-serialize/schema-based/avro-demo/models.py new file mode 100644 index 0000000000..2460ca2956 --- /dev/null +++ b/python-serialize/schema-based/avro-demo/models.py @@ -0,0 +1,34 @@ +import random +from datetime import datetime +from enum import StrEnum +from typing import NamedTuple + +from faker import Faker + + +class Language(StrEnum): + DE = "de" + EN = "en" + ES = "es" + FR = "fr" + IT = "it" + + +class User(NamedTuple): + id: int + name: str + email: str + language: Language + registered_at: datetime + + @classmethod + def fake(cls): + language = random.choice(list(Language)) + generator = Faker(language) + return cls( + generator.pyint(), + generator.name(), + generator.email(), + language, + generator.date_time_this_year(), + ) diff --git a/python-serialize/schema-based/avro-demo/user.avsc b/python-serialize/schema-based/avro-demo/user.avsc new file mode 100644 index 0000000000..cf83e8e982 --- /dev/null +++ b/python-serialize/schema-based/avro-demo/user.avsc @@ -0,0 +1,21 @@ +{ + "name": "User", + "type": "record", + "fields": [ + {"name": "id", "type":
"long"}, + {"name": "name", "type": "string"}, + {"name": "email", "type": "string"}, + { + "name": "language", + "type": { + "name": "Language", + "type": "enum", + "symbols": ["de", "en", "es", "fr", "it"] + } + }, + { + "name": "registered_at", + "type": {"type": "long", "logicalType": "timestamp-millis"} + } + ] +} diff --git a/python-serialize/schema-based/avro-demo/users.avro b/python-serialize/schema-based/avro-demo/users.avro new file mode 100644 index 0000000000..b416f2eb06 Binary files /dev/null and b/python-serialize/schema-based/avro-demo/users.avro differ diff --git a/python-serialize/schema-based/protocol-buffers-demo/main.py b/python-serialize/schema-based/protocol-buffers-demo/main.py new file mode 100644 index 0000000000..96e0c7e426 --- /dev/null +++ b/python-serialize/schema-based/protocol-buffers-demo/main.py @@ -0,0 +1,44 @@ +from models import Language, User +from users_pb2 import Language as LanguageDAO +from users_pb2 import User as UserDAO +from users_pb2 import Users as UsersDAO + + +def main(): + buffer = serialize() + users = deserialize(buffer) + print(buffer) + print(users) + + +def serialize(): + users = [User.fake() for _ in range(5)] + users_dao = UsersDAO() + for user in users: + user_dao = UserDAO() + user_dao.id = user.id + user_dao.name = user.name + user_dao.email = user.email + user_dao.language = LanguageDAO.Value(user.language.name) + user_dao.registered_at.FromDatetime(user.registered_at) + users_dao.users.append(user_dao) + return users_dao.SerializeToString() + + +def deserialize(buffer): + users_dao = UsersDAO() + users_dao.ParseFromString(buffer) + return [ + User( + id=user_dao.id, + name=user_dao.name, + email=user_dao.email, + language=list(Language)[user_dao.language], + registered_at=user_dao.registered_at.ToDatetime(), + ) + for user_dao in users_dao.users + ] + + +if __name__ == "__main__": + main() diff --git a/python-serialize/schema-based/protocol-buffers-demo/models.py 
b/python-serialize/schema-based/protocol-buffers-demo/models.py new file mode 100644 index 0000000000..2460ca2956 --- /dev/null +++ b/python-serialize/schema-based/protocol-buffers-demo/models.py @@ -0,0 +1,34 @@ +import random +from datetime import datetime +from enum import StrEnum +from typing import NamedTuple + +from faker import Faker + + +class Language(StrEnum): + DE = "de" + EN = "en" + ES = "es" + FR = "fr" + IT = "it" + + +class User(NamedTuple): + id: int + name: str + email: str + language: Language + registered_at: datetime + + @classmethod + def fake(cls): + language = random.choice(list(Language)) + generator = Faker(language) + return cls( + generator.pyint(), + generator.name(), + generator.email(), + language, + generator.date_time_this_year(), + ) diff --git a/python-serialize/schema-based/protocol-buffers-demo/users.proto b/python-serialize/schema-based/protocol-buffers-demo/users.proto new file mode 100644 index 0000000000..0c2e63f61c --- /dev/null +++ b/python-serialize/schema-based/protocol-buffers-demo/users.proto @@ -0,0 +1,25 @@ +syntax = "proto3"; + +package com.realpython; + +import "google/protobuf/timestamp.proto"; + +enum Language { + DE = 0; + EN = 1; + ES = 2; + FR = 3; + IT = 4; +} + +message User { + int64 id = 1; + string name = 2; + string email = 3; + Language language = 4; + google.protobuf.Timestamp registered_at = 5; +} + +message Users { + repeated User users = 1; +} diff --git a/python-serialize/tabular-data/csv-demo/main.py b/python-serialize/tabular-data/csv-demo/main.py new file mode 100644 index 0000000000..da9f9b6eea --- /dev/null +++ b/python-serialize/tabular-data/csv-demo/main.py @@ -0,0 +1,26 @@ +import csv + +from models import User + + +def main(): + serialize("users.csv") + for user in deserialize("users.csv"): + print(user) + + +def serialize(filename): + users = [User.fake() for _ in range(50)] + with open(filename, mode="w", encoding="utf-8", newline="") as file: + writer = csv.writer(file) + 
writer.writerows(users) + + +def deserialize(filename): + with open(filename, mode="r", encoding="utf-8", newline="") as file: + reader = csv.DictReader(file, fieldnames=User._fields) + return [User.from_dict(row_dict) for row_dict in reader] + + +if __name__ == "__main__": + main() diff --git a/python-serialize/tabular-data/csv-demo/models.py b/python-serialize/tabular-data/csv-demo/models.py new file mode 100644 index 0000000000..461372f516 --- /dev/null +++ b/python-serialize/tabular-data/csv-demo/models.py @@ -0,0 +1,49 @@ +import random +from datetime import datetime +from enum import StrEnum +from typing import NamedTuple + +from faker import Faker + + +class Language(StrEnum): + DE = "de" + EN = "en" + ES = "es" + FR = "fr" + IT = "it" + + +class User(NamedTuple): + id: int + name: str + email: str + language: Language + registered_at: datetime + + @classmethod + def fake(cls): + language = random.choice(list(Language)) + generator = Faker(language) + return cls( + generator.pyint(), + generator.name(), + generator.email(), + language, + generator.date_time_this_year(), + ) + + @classmethod + def from_dict(cls, row_dict): + transforms = { + "id": int, + "name": str.title, + "language": Language, + "registered_at": datetime.fromisoformat, + } + return cls( + **{ + key: transforms.get(key, lambda x: x)(value) + for key, value in row_dict.items() + } + ) diff --git a/python-serialize/tabular-data/csv-demo/users.csv b/python-serialize/tabular-data/csv-demo/users.csv new file mode 100644 index 0000000000..ecf2ca0b4b --- /dev/null +++ b/python-serialize/tabular-data/csv-demo/users.csv @@ -0,0 +1,50 @@ +9758,Henriette Berthelot,matthieu52@example.com,fr,2023-02-21 04:05:44.093987 +3445,Debbie Salazar,millerjames@example.com,en,2023-04-10 03:44:32.702553 +9939,Kevin Foster,pjohns@example.com,en,2023-01-30 10:06:32.144663 +6880,Elpidio Amaya,prodenas@example.com,es,2023-05-09 22:45:52.133888 +1572,Margaud Hardy,anne82@example.org,fr,2023-11-11 16:37:18.211205 
+2363,Chema Estévez Ríos,rubiomatilde@example.com,es,2023-06-07 12:28:28.087932 +9083,Noa de Urrutia,fernandoribas@example.net,es,2023-10-24 10:48:39.021567 +4364,Odalis Correa Vilaplana,kmateo@example.net,es,2023-10-03 18:50:57.957958 +2518,Sig. Achille Franzese,angelicapacelli@example.org,it,2023-01-30 04:48:26.045722 +4213,Sandra Ray,ncarter@example.com,en,2023-08-21 17:38:53.852478 +1294,Martine Boulanger,margaux64@example.org,fr,2023-05-16 01:43:01.331903 +8920,Iolanda Lovato,kmercalli@example.org,it,2023-06-28 08:38:46.989600 +6201,Dogan Junken,schenkbekir@example.com,de,2023-03-30 08:58:41.834983 +6319,Nicolas Chauvet,menardjeannine@example.net,fr,2023-01-01 09:33:36.552130 +1545,Patrick de Lemoine,gjoseph@example.com,fr,2023-04-14 09:41:36.916881 +3923,Valérie de Colin,girardnicolas@example.org,fr,2023-05-18 15:00:09.999234 +5732,Rogelio Díez,tgelabert@example.com,es,2023-01-07 05:50:49.632160 +3594,Timothy Lee,charles72@example.org,en,2023-06-22 18:17:54.068984 +7079,Herr Hartwig Haering B.A.,kemalernst@example.com,de,2023-04-11 21:07:34.901169 +3634,Luc Morin,patrickperrin@example.org,fr,2023-08-27 01:57:16.371915 +1285,Matthew Moore,qjohnson@example.org,en,2023-03-18 09:11:48.317362 +2886,Mathilde Maurice,michelroussel@example.net,fr,2023-04-12 10:10:14.976291 +1236,Dino Guidotti-Florio,gianpietro33@example.org,it,2023-01-03 06:13:02.425219 +5053,Eustaquio Esteve-Perera,jescalona@example.org,es,2023-03-15 17:28:47.761729 +101,Bernardino Grande Falcón,almaneira@example.net,es,2023-10-31 16:19:00.517415 +9085,Monique-Geneviève Roche,gregoirecaroline@example.net,fr,2023-11-10 07:41:17.195860 +6467,Dr. Justine Eberhardt B.Eng.,rudolphzimmer@example.net,de,2023-01-20 15:23:26.985409 +2462,Herminio Vázquez Sosa,cconesa@example.org,es,2023-10-16 23:36:15.336254 +6172,Erhard Biggen,reimerschoenland@example.org,de,2023-02-02 23:04:13.797610 +1536,Mr. 
David May,phyllis06@example.net,en,2023-10-29 23:07:43.348745 +1983,James Sullivan,alexandra75@example.com,en,2023-03-23 05:08:04.023468 +3780,Ullrich Koch B.Sc.,mahmoudadolph@example.net,de,2023-03-27 22:35:04.419591 +2225,Carlos Albero Seco,nataliocantero@example.org,es,2023-04-04 11:40:47.346411 +5690,Franck Raymond,oleger@example.net,fr,2023-10-23 02:46:38.379807 +5303,Romolo Perozzo,zoppettigianfrancesco@example.com,it,2023-08-19 10:47:31.681339 +1893,Grégoire Pires,leclercmarianne@example.org,fr,2023-09-21 14:49:07.711476 +3099,Roberto Rubio,emily43@example.com,en,2023-04-01 18:09:12.746216 +9594,Rita Valier,vito75@example.net,it,2023-01-01 19:11:28.436761 +5167,Marcus Black,susanwright@example.org,en,2023-03-16 19:28:58.934455 +2951,Emmanuel Noël,gdidier@example.net,fr,2023-09-23 02:29:43.045283 +3104,Piersanti Antonioni,ligoriocarolina@example.com,it,2023-02-03 02:16:20.030199 +8133,Karl-Ludwig Röhrdanz,karl-heinz08@example.com,de,2023-06-26 20:22:00.082432 +7897,Andrew Rosenow,birgit11@example.com,de,2023-04-18 09:17:21.746676 +3887,Ruy Aroca Perales,hector49@example.org,es,2023-05-22 22:52:39.453287 +1342,Vinzenz Lindner-Kuhl,madeleinegraf@example.org,de,2023-10-15 15:49:32.140109 +7607,Filomena Planas-Jiménez,palmirajulia@example.net,es,2023-03-01 20:27:14.208775 +9399,Etelvina Costa Quiroga,ordonezyaiza@example.net,es,2023-07-05 15:04:41.464586 +6275,Susan Masse,william66@example.org,fr,2023-10-19 22:42:31.476621 +329,Jeanette Nerger,williamtlustek@example.org,de,2023-02-04 03:41:33.508890 +8075,Coriolano Alboni,pizzamanomatteo@example.org,it,2023-02-25 20:25:11.711279 diff --git a/python-serialize/tabular-data/parquet-demo/main.py b/python-serialize/tabular-data/parquet-demo/main.py new file mode 100644 index 0000000000..757d4459cd --- /dev/null +++ b/python-serialize/tabular-data/parquet-demo/main.py @@ -0,0 +1,46 @@ +import fastparquet +import pandas as pd +import pyarrow.parquet as pq + + +def main(): + users_df = pd.read_csv("users.csv") + 
serialize_with_pandas(users_df, "users.parquet") + + df1 = deserialize_with_pandas("users.parquet") + df2 = deserialize_with_pyarrow("users.parquet") + df3 = deserialize_with_fastparquet("users.parquet") + + print(f"{df1.equals(df2) = }") # noqa + print(f"{df2.equals(df3) = }") # noqa + + df = prune_and_filter("users.parquet") + print(df.head()) + + +def serialize_with_pandas(df, filename): + df.to_parquet(filename) + + +def deserialize_with_pandas(filename): + return pd.read_parquet(filename) + + +def deserialize_with_pyarrow(filename): + return pq.read_table(filename).to_pandas() + + +def deserialize_with_fastparquet(filename): + return fastparquet.ParquetFile(filename).to_pandas() + + +def prune_and_filter(filename): + return pd.read_parquet( + filename, + filters=[("language", "=", "fr")], + columns=["id", "name"], + ) + + +if __name__ == "__main__": + main() diff --git a/python-serialize/tabular-data/parquet-demo/users.csv b/python-serialize/tabular-data/parquet-demo/users.csv new file mode 100644 index 0000000000..d79035dee0 --- /dev/null +++ b/python-serialize/tabular-data/parquet-demo/users.csv @@ -0,0 +1,51 @@ +id,name,email,language,registered_at +3684,Alexandre Pages,elevy@example.net,fr,2023-02-20 19:37:51.504749 +4601,Dino Linke B.Eng.,jbolnbach@example.net,de,2023-01-28 08:41:25.955049 +208,Eutimio Jerez Molina,apaz@example.com,es,2023-04-06 16:04:41.463429 +6593,Joan Esteve Valverde,morellfelix@example.com,es,2023-10-04 05:35:40.844968 +7218,James Archer,tylermclean@example.com,en,2023-03-05 10:23:56.022739 +6981,Nadia Bragadin-Carfagna,fibonaccigirolamo@example.net,it,2023-11-06 02:31:45.759037 +698,Piermaria Salvemini,brogginiantonella@example.net,it,2023-01-15 11:43:53.604350 +9118,Berenice Basadonna,goridante@example.com,it,2023-04-25 15:07:28.558049 +1602,Dolores Klemm B.A.,niklas47@example.com,de,2023-10-15 01:38:36.430892 +7917,Ricciotti Missoni,annunziata97@example.net,it,2023-08-17 22:44:35.219119 +2268,Susan 
Chen,melaniehall@example.net,en,2023-04-03 06:33:23.903512 +842,Eugène Le Dos Santos,margotda-silva@example.org,fr,2023-01-12 14:07:30.274792 +1216,Mirko Weimer-Rust,mangoldgiovanna@example.org,de,2023-04-01 16:59:07.380842 +441,Haroldo Bayo-Cárdenas,teofilobernad@example.org,es,2023-10-25 19:30:31.373488 +7318,Philippine Weiss,legeragnes@example.org,fr,2023-10-26 05:39:44.033645 +8558,Graziella Farnese,nboezio@example.net,it,2023-07-19 10:21:41.336847 +4835,Éléonore Robin,stephanie53@example.net,fr,2023-08-08 08:59:19.177603 +2097,Stacy Carlson,colemanolivia@example.com,en,2023-03-05 22:16:20.904828 +4591,Begoña Román Alfaro,isaaclosada@example.net,es,2023-01-20 01:44:20.073290 +8413,Gabriella Cardano,leonardosolimena@example.com,it,2023-04-13 11:44:34.084960 +4118,Ángela Pérez,desiderio38@example.net,es,2023-04-04 08:15:00.260845 +872,Katrina Martin,xking@example.com,en,2023-11-12 10:16:23.902219 +4793,William Webster,xstrong@example.org,en,2023-08-15 04:29:27.461179 +2696,Gabriel Petitjean,dupuisastrid@example.com,fr,2023-08-31 03:55:32.363045 +7508,Mohammed Löffler,danielle08@example.com,de,2023-02-27 02:09:37.569016 +4832,Werner Becker-Weimer,ingeborg77@example.org,de,2023-07-07 06:57:37.816039 +4705,Pasquale Giannini,costantinocavanna@example.com,it,2023-06-20 11:12:29.187413 +6449,Dott. 
Alessia Mastroianni,dovaragiancarlo@example.com,it,2023-04-26 14:54:03.092463 +2935,Josefa Rey Barrena,banglada@example.net,es,2023-11-03 03:00:21.450536 +3321,Elvira Sáez Tolosa,palvarez@example.net,es,2023-05-28 18:14:38.540089 +3642,Arsenio Garzoni,matteojovinelli@example.org,it,2023-04-14 00:47:47.066963 +4076,Sabas Chacón Sureda,tsalinas@example.org,es,2023-06-09 12:37:42.536521 +7650,Elma Gieß-Staude,lucie28@example.com,de,2023-11-10 17:31:50.053576 +9503,Juan Antonio Sanabria Méndez,nildaserna@example.com,es,2023-03-16 02:40:52.596846 +5996,Joseph Norris,simonmegan@example.net,en,2023-09-30 21:35:28.686320 +3998,Gabriela Serra Porras,almeidaalondra@example.com,es,2023-08-30 15:51:21.751123 +4884,Encarnita Casanovas Guillen,leticia87@example.net,es,2023-06-03 11:22:08.328512 +5752,Nikolaus Blümel-Dobes,ignatzklemt@example.com,de,2023-06-03 04:50:14.594056 +3966,Luc Lemoine,humbertsusan@example.org,fr,2023-07-05 19:00:17.377279 +3513,Gerhart Oestrovsky,pergandearmin@example.com,de,2023-02-23 01:43:05.262126 +502,Antonietta Vollbrecht B.Eng.,gesagute@example.net,de,2023-03-08 19:31:22.663747 +9141,Nathan Schmidt,terry81@example.org,en,2023-06-22 14:03:50.851764 +3331,Dr. Jochem Flantz B.A.,adlerharro@example.org,de,2023-03-21 20:04:09.056015 +7330,Élodie Roux Le Weiss,gaillardadrien@example.org,fr,2023-04-19 14:27:36.660823 +5660,Irma Berengario,albericogiunti@example.net,it,2023-07-25 01:27:59.646990 +5420,Patricia Dias,lagardesabine@example.org,fr,2023-03-06 15:28:21.621831 +8881,Ing. 
Johanne Henschel B.A.,faustjose@example.net,de,2023-03-28 02:01:22.805135 +8995,Olivier de la Durand,yveslegros@example.com,fr,2023-08-14 10:42:19.666189 +157,Rolando Bembo,leonardoromiti@example.com,it,2023-09-09 12:54:00.615598 +8386,Severino Naccari,vanichini@example.com,it,2023-07-01 02:44:57.391064 diff --git a/python-serialize/tabular-data/parquet-demo/users.parquet b/python-serialize/tabular-data/parquet-demo/users.parquet new file mode 100644 index 0000000000..315f75c389 Binary files /dev/null and b/python-serialize/tabular-data/parquet-demo/users.parquet differ