Data Migration Script Draft #19

Open
wants to merge 11 commits into
base: main
from
127 changes: 127 additions & 0 deletions python/etlv2/.gitignore
@@ -0,0 +1,127 @@
# Editors
.vscode/
.idea/

# Vagrant
.vagrant/

# Mac/OSX
.DS_Store

# Windows
Thumbs.db

# Source for the following rules: https://raw.githubusercontent.com/github/gitignore/master/Python.gitignore
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

*.txt
42 changes: 42 additions & 0 deletions python/etlv2/README.md
@@ -0,0 +1,42 @@
Run the following command to create a virtual environment (replace myenv with your preferred name):
`python3 -m venv myenv`

## Activate the virtual environment:

#### On Windows: `myenv\Scripts\activate`

#### On macOS/Linux: `source myenv/bin/activate`

## Install requirements

`pip install -r requirements.txt`

## Set up Database Access Credentials in .env

Create a .env file in the project directory to hold your database login credentials.

`touch .env`

Edit the .env file, copy in the following key-value settings, and update the values to match your actual database credentials.

```
SRC_DB_HOST=localhost
SRC_DB_PORT=5432
SRC_DB_NAME=mydatabase
SRC_DB_USER=myuser
SRC_DB_PASSWORD=mypassword

DEST_DB_HOST=localhost
DEST_DB_PORT=5432
DEST_DB_NAME=mydatabase
DEST_DB_USER=myuser
DEST_DB_PASSWORD=mypassword
```

## Activate Environment Variables:

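Load the environment variables by sourcing the .env file in your terminal (bash/zsh). Because the file uses plain `KEY=value` lines, `source .env` alone sets shell variables without exporting them, so enable auto-export while sourcing so the Python process inherits them: `set -a; source .env; set +a`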

## Run Application:

Run the application using: `python migration_job.py`
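
As a rough sketch of how the variables above might be consumed, the snippet below collects the `SRC_DB_*` and `DEST_DB_*` values into connection settings. The helper name `load_db_settings` and the layout of the returned dictionary are assumptions for illustration; they are not taken from `migration_job.py`.

```python
import os


def load_db_settings(prefix, env=None):
    """Collect <prefix>_DB_* variables into a dict (hypothetical helper)."""
    env = os.environ if env is None else env
    keys = ("HOST", "PORT", "NAME", "USER", "PASSWORD")
    missing = [f"{prefix}_DB_{k}" for k in keys if f"{prefix}_DB_{k}" not in env]
    if missing:
        # Fail early with a clear message instead of a confusing connection error.
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    settings = {k.lower(): env[f"{prefix}_DB_{k}"] for k in keys}
    settings["port"] = int(settings["port"])  # environment values are strings
    return settings


# Example values only; a real run would pass nothing and read os.environ.
example_env = {
    "SRC_DB_HOST": "localhost",
    "SRC_DB_PORT": "5432",
    "SRC_DB_NAME": "mydatabase",
    "SRC_DB_USER": "myuser",
    "SRC_DB_PASSWORD": "mypassword",
}
print(load_db_settings("SRC", example_env))
```

Checking for missing variables up front makes a forgotten `source` step visible before any connection attempt is made.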
35 changes: 35 additions & 0 deletions python/etlv2/config.properties
@@ -0,0 +1,35 @@
#section name job - job configs
[job]
# interval between job runs in seconds; the value below is 5 seconds, use 86400 to run every 24 hours
job_interval_seconds=5
# number of iterations to run; 0 means run continuously until manually stopped
max_iterations=1
# batch size for each run; 1000 means each query fetches at most 1000 records
query_batch_size=1000
# list of organization_ids separated by commas (previous list: 15,1,13,14,2,3,4,5,6,30,31,32,11)
organization_ids=3312

#section name queries
[queries]
#[columns] will be replaced by list of columns specified in columns section below
organization_query=SELECT [columns] FROM entity WHERE entity.id > %s and (type = 'o' or type = 'O') and entity.id in (%s) ORDER BY entity.id LIMIT %s;
planter_query=SELECT [columns] FROM planter, entity WHERE planter.id > %s and entity.id = planter.organization_id and (entity.type = 'o' or entity.type = 'O') and planter.organization_id in (%s) ORDER BY planter.id LIMIT %s;
tree_query=SELECT [columns] FROM trees, planter, entity WHERE trees.id > %s and planter_id = planter.id and entity.id = planter.organization_id and (entity.type = 'o' or entity.type = 'O') and entity.id in (%s) ORDER BY trees.id LIMIT %s;
Comment on lines +15 to +17
Contributor


I feel you don't need these settings anymore, right?


#section name columns: columns to migrate
[columns]
organization_columns=entity.id, entity.type, entity.name, entity.first_name, entity.last_name, entity.email, entity.phone,
    entity.pwd_reset_required, entity.website, entity.wallet, entity.password, entity.salt, entity.active_contract_id,
    entity.offering_pay_to_plant, entity.tree_validation_contract_id, entity.logo_url, entity.map_name,
    entity.stakeholder_uuid
planter_columns=planter.id, planter.first_name, planter.last_name, planter.email, planter.organization, planter.phone,
    planter.pwd_reset_required, planter.image_url, planter.person_id, planter.organization_id, planter.image_rotation,
    planter.gender, planter.grower_account_uuid
tree_columns=trees.id, trees.time_created, trees.time_updated, trees.missing, trees.priority, trees.cause_of_death_id,
    trees.planter_id, trees.primary_location_id, trees.settings_id, trees.override_settings_id, trees.dead, trees.photo_id,
    trees.image_url, trees.certificate_id, trees.estimated_geometric_location, trees.lat, trees.lon, trees.gps_accuracy,
    trees.active, trees.planter_photo_url, trees.planter_identifier, trees.device_id, trees.note, trees.verified,
    trees.uuid, trees.approved, trees.status, trees.cluster_regions_assigned, trees.species_id,
    trees.planting_organization_id, trees.payment_id, trees.contract_id, trees.token_issued, trees.morphology, trees.age,
    trees.species, trees.capture_approval_tag, trees.rejection_reason, trees.matching_hash, trees.device_identifier,
    trees.images, trees.domain_specific_data, trees.token_id, trees.name, trees.earnings_id, trees.session_id
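
To illustrate how the `[queries]` and `[columns]` sections could fit together, here is a minimal Python sketch. It reads an INI-style config with `configparser`, substitutes the `[columns]` placeholder, and expands the `in (%s)` placeholder to one `%s` per organization id. The names `load_config` and `build_query` and the trimmed inline sample are assumptions for illustration, not this PR's actual code. Two `configparser` behaviors matter here: interpolation is disabled so the literal `%s` placeholders do not raise an interpolation error, and inline `#` comments are stripped only when `inline_comment_prefixes` is set.

```python
import configparser

# Trimmed, hypothetical sample in the same shape as config.properties.
SAMPLE = """
[job]
query_batch_size=1000
organization_ids=3312 #15,1,13

[queries]
organization_query=SELECT [columns] FROM entity WHERE entity.id > %s and entity.id in (%s) ORDER BY entity.id LIMIT %s;

[columns]
organization_columns=entity.id, entity.type,
    entity.name
"""


def load_config(text):
    """Parse INI text, keeping '%s' literal and stripping inline '#' comments."""
    parser = configparser.ConfigParser(
        interpolation=None, inline_comment_prefixes=("#",)
    )
    parser.read_string(text)
    return parser


def build_query(cfg, name):
    """Return (sql, org_ids) for the '<name>_query' template (hypothetical helper)."""
    template = cfg["queries"][f"{name}_query"]
    # Multi-line values keep their newlines; collapse them to single spaces.
    columns = " ".join(cfg["columns"][f"{name}_columns"].split())
    org_ids = [int(x) for x in cfg["job"]["organization_ids"].split(",") if x.strip()]
    # One driver placeholder per id, so the ids are bound as parameters.
    in_list = ", ".join(["%s"] * len(org_ids))
    sql = template.replace("[columns]", columns).replace("in (%s)", f"in ({in_list})")
    return sql, org_ids


cfg = load_config(SAMPLE)
sql, org_ids = build_query(cfg, "organization")
last_id = 0  # keyset cursor: the highest id seen in the previous batch
params = (last_id, *org_ids, cfg["job"].getint("query_batch_size"))
print(sql)
print(params)
```

Passing the ids as bound parameters, rather than formatting them into the SQL string, keeps the `id > %s ... LIMIT %s` batching pattern intact while avoiding string-built `IN` lists.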