This guide provides step-by-step instructions for setting up Apache Airflow on Ubuntu. It covers updating the system, installing Python, creating a virtual environment, installing Airflow, initializing the database, and starting the Airflow services.
- Ubuntu OS (tested on version 24.04).
- Basic familiarity with terminal commands.
# Update the package list and upgrade outdated packages
sudo apt update && sudo apt upgrade -y
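# Install Python, pip, and the venv module for managing virtual environments
sudo apt install -y python3 python3-pip python3-venv
# Verify Python and pip installation
python3 --version
pip3 --version
# No separate virtualenv install is needed: the built-in venv module is used below,
# and Ubuntu 24.04 blocks system-wide pip installs (PEP 668, "externally-managed-environment")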
# Create a virtual environment for Airflow
python3 -m venv ~/airflow_env
# Activate the virtual environment
source ~/airflow_env/bin/activate
# Set the AIRFLOW_HOME environment variable
export AIRFLOW_HOME=~/airflow
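# Install Apache Airflow using pip
# (the commands in this guide target Airflow 2.x; Airflow 3 replaced "airflow webserver" with "airflow api-server")
pip install "apache-airflow<3"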
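The Airflow project recommends installing against a constraints file so that dependency versions match a tested set. The following is a minimal sketch of how such a command can be assembled. The helper name `airflow_constraint_url` and the example version `2.10.5` are assumptions for illustration and do not come from this guide; substitute the release you actually want.

```shell
# Build the constraints-file URL for a given Airflow and Python version.
# airflow_constraint_url is a hypothetical helper, not part of Airflow.
airflow_constraint_url() {
  # $1 = Airflow version (e.g. 2.10.5), $2 = Python major.minor (e.g. 3.12)
  printf 'https://raw.githubusercontent.com/apache/airflow/constraints-%s/constraints-%s.txt\n' "$1" "$2"
}

AIRFLOW_VERSION="2.10.5"   # assumption: an example 2.x release; pick your own
PYTHON_VERSION="3.12"      # Ubuntu 24.04's default python3; confirm with python3 --version
CONSTRAINT_URL="$(airflow_constraint_url "$AIRFLOW_VERSION" "$PYTHON_VERSION")"

# Print the command for review; remove the leading echo to run it inside the virtual environment
echo pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
```

Pinning an exact version this way trades automatic upgrades for a reproducible install, which is usually the safer choice for a first setup.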
# Initialize the Airflow metadata database
# (airflow db init is deprecated since Airflow 2.7 in favor of airflow db migrate)
airflow db migrate
# Create an admin user for the Airflow Web UI
# (the password and email below are placeholders; replace them with your own)
airflow users create \
    --username admin \
    --password admin \
    --firstname Admin \
    --lastname User \
    --role Admin \
    --email admin@example.com
# Start the Airflow Scheduler (open a terminal)
airflow scheduler
# Start the Airflow Webserver (open another terminal, then activate the virtual environment and set AIRFLOW_HOME there too)
airflow webserver --port 8080
# Access the Airflow Web UI at http://localhost:8080
# To stop services, use CTRL+C.
# To reactivate the virtual environment after restarting the terminal:
source ~/airflow_env/bin/activate
# Ensure the AIRFLOW_HOME variable is set each time:
export AIRFLOW_HOME=~/airflow
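To avoid re-exporting AIRFLOW_HOME by hand in every new terminal, the export line can be appended once to a shell startup file. The following is a minimal sketch, assuming Bash and `~/.bashrc` as the startup file. `persist_airflow_home` is a hypothetical helper name, not an Airflow command.

```shell
# Append the AIRFLOW_HOME export to a startup file unless it is already present.
# persist_airflow_home is a hypothetical helper; the default target is ~/.bashrc.
persist_airflow_home() {
  profile="${1:-$HOME/.bashrc}"
  line='export AIRFLOW_HOME=~/airflow'
  touch "$profile"
  # -x matches the whole line, -F treats the pattern as a fixed string
  if ! grep -qxF "$line" "$profile"; then
    printf '%s\n' "$line" >> "$profile"
  fi
}

# Demonstrate on a scratch file; call persist_airflow_home with no argument to update ~/.bashrc
demo_profile="$(mktemp)"
persist_airflow_home "$demo_profile"
persist_airflow_home "$demo_profile"   # second call changes nothing
cat "$demo_profile"
```

The virtual environment still has to be activated in each new terminal; only the variable is persisted here.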
If you don’t want to see the example DAGs provided by Airflow, you can disable them:
- Open the airflow.cfg configuration file. It is typically located in the $AIRFLOW_HOME directory:
    nano $AIRFLOW_HOME/airflow.cfg
- Find the following line:
    load_examples = True
- Change it to:
    load_examples = False
- Save the file and exit.
- Restart the Airflow services to apply the changes (stop each with CTRL+C, then start it again in its own terminal):
    airflow scheduler
    airflow webserver --port 8080
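The same edit can be made without opening an editor. The following is a minimal sketch, assuming the setting appears as an uncommented `load_examples = ...` line in airflow.cfg. `disable_example_dags` is a hypothetical helper name, not an Airflow command. Airflow also reads this setting from the `AIRFLOW__CORE__LOAD_EXAMPLES` environment variable, which takes precedence over the file.

```shell
# Set load_examples to False in an Airflow config file.
# disable_example_dags is a hypothetical helper; with no argument it targets
# $AIRFLOW_HOME/airflow.cfg (falling back to ~/airflow/airflow.cfg).
disable_example_dags() {
  cfg="${1:-${AIRFLOW_HOME:-$HOME/airflow}/airflow.cfg}"
  # Rewrite only the load_examples line and keep a .bak copy of the original
  sed -i.bak -E 's/^load_examples[[:space:]]*=.*/load_examples = False/' "$cfg"
}

# Demonstrate on a scratch config file
demo_cfg="$(mktemp)"
printf '[core]\nload_examples = True\n' > "$demo_cfg"
disable_example_dags "$demo_cfg"
grep '^load_examples' "$demo_cfg"   # prints: load_examples = False
```

As an alternative that leaves the file untouched, `export AIRFLOW__CORE__LOAD_EXAMPLES=False` can be set in each terminal before starting the services.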