Commit 48351d1 (1 parent: 83a893e)

Feat: Airflow uses kubernetes_executor as default executor, git sync dags and add tutorials (#109)

Signed-off-by: xingcan-ltc <[email protected]>

Showing 12 changed files with 176 additions and 102 deletions.

### 1. Application Description

Airflow™ is a platform for programmatically authoring, scheduling, and monitoring workflows.

### 2. Quick Start

#### 2.1 Deployment

Airflow is deployed via the KDP web interface.

#### 2.2 Practical Usage

##### 2.2.1 Creating DAG Files

Here is a simple example DAG. Save the following content as demo.py (DAG files are managed in a Git repository; see section 2.2.3 below):

```python
from datetime import datetime

from airflow import DAG
from airflow.decorators import task
from airflow.operators.bash import BashOperator

# A simple DAG that runs daily at midnight.
with DAG(dag_id="demo", start_date=datetime(2024, 5, 20), schedule="0 0 * * *") as dag:

    # Classic operator style: run a shell command.
    hello = BashOperator(task_id="hello", bash_command="echo hello")

    # TaskFlow API style: a Python function decorated as a task.
    @task()
    def airflow():
        print("airflow")

    # Run "hello" first, then "airflow".
    hello >> airflow()
```

##### 2.2.2 Accessing Airflow Web via Browser

You can access the Airflow web interface through the configured ingress (http://airflow-web-kdp-data.kdp-e2e.io/home) or by using kubectl port-forward, as shown in the following command:

```shell
kubectl port-forward svc/airflow-webserver -n kdp-data 8080:8080
```

The default login username/password is `admin/admin`.

##### 2.2.3 Configuring DAG Files

DAG files are stored in a Git repository; the default installation points Airflow at a preconfigured DAG repository. To change the DAGs, fork that repository, modify the DAG files, and commit your changes. You can also update the DAG repository, branch, and related settings on the KDP application configuration page.
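
To verify that git-sync has pulled your DAG files into the scheduler, you can list the DAG folder inside the pod. A minimal sketch, assuming the scheduler deployment is named `airflow-scheduler` and DAGs are synced under `/opt/airflow/dags` (adjust names, namespace, and path to your install):

```shell
# List the synced DAG files inside the scheduler pod
# (deployment name, namespace, and path are assumptions).
kubectl exec -n kdp-data deploy/airflow-scheduler -- ls /opt/airflow/dags
```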

##### 2.2.4 Running DAGs

DAGs are paused by default and must be started manually. Activate the DAG named `hello_airflow` by clicking the toggle next to its name. This DAG runs once a day and, after activation, automatically catches up on the previous day's run. You can also trigger it manually by clicking the `Trigger DAG` button on the right side of the `hello_airflow` row.
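
The same actions can also be performed with the Airflow CLI inside the scheduler pod. A minimal sketch, again assuming the deployment is named `airflow-scheduler`:

```shell
# Unpause the DAG (equivalent to the web UI toggle), then trigger a run.
kubectl exec -n kdp-data deploy/airflow-scheduler -- airflow dags unpause hello_airflow
kubectl exec -n kdp-data deploy/airflow-scheduler -- airflow dags trigger hello_airflow
```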

### 3. FAQ

#### 3.1. DAG Execution Failure

Causes and troubleshooting:

- Check whether the DAG code was synchronized successfully by inspecting the git-sync logs: `kubectl logs -l component=scheduler,release=airflow -c git-sync -n kdp-data`
- Review the log output of the scheduler and worker pods (see the sketch after this list).
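
A sketch of the second check, assuming the same label scheme as the git-sync command above (container names and labels may differ in your install):

```shell
# Tail recent scheduler logs.
kubectl logs -l component=scheduler,release=airflow -n kdp-data -c scheduler --tail=100
# Tail recent worker logs.
kubectl logs -l component=worker,release=airflow -n kdp-data -c worker --tail=100
```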

### 4. Appendix

#### 4.1. Concept Introduction

**DAG**

A DAG (Directed Acyclic Graph) is the core concept of Airflow: it collects Tasks together and organizes them with dependencies and relationships that describe how they should run. It is the basic unit for describing a workflow in Airflow.

docs/en/user-tutorials/batch-job-scheduling-for-hive-sql-with-apache-airflow.md (51 changes: 51 additions & 0 deletions)
# Batch Job Scheduling for Hive SQL with Apache Airflow

# 1. Introduction

Apache Airflow is an open-source platform for orchestrating and automating batch jobs, allowing workflows to be easily created, scheduled, monitored, and managed. Airflow supports Hive SQL, making it straightforward to run Hive SQL tasks.

Apache Airflow represents workflows as Directed Acyclic Graphs (DAGs), which consist of task nodes and their dependencies. Task nodes can be Python operations, shell operations, SQL operations, and more. Airflow supports several executors, including the Local executor, the Celery executor, and the Kubernetes executor.

This article shows how to write Hive SQL tasks using `pyhive` and execute them with the Apache Airflow Kubernetes executor.

# 2. Writing a Hive SQL DAG

The full implementation is available on [Github](https://github.com/linktimecloud/example-datasets/blob/airflow/dags/hive-sql-example.py) or [Gitee](https://gitee.com/linktime-cloud/example-datasets/blob/airflow/dags/hive-sql-example.py).

The code is a DAG written with the Apache Airflow framework, designed to automate data-processing tasks. It performs two steps: creating a Hive table and inserting data, then identifying the top-scoring students in each subject.
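
The following is a minimal sketch of this pattern, not the actual tutorial DAG: the table schema, sample rows, connection host, port, and username are illustrative assumptions (see the linked repository for the real code).

```python
from datetime import datetime

from airflow import DAG
from airflow.decorators import task

with DAG(dag_id="hive_sql_sketch", start_date=datetime(2024, 5, 20), schedule=None) as dag:

    @task()
    def top_scores():
        # Import pyhive inside the task so the DAG file parses even
        # where pyhive is not installed on the scheduler.
        from pyhive import hive

        # Connection values are assumptions for a KDP-like deployment.
        conn = hive.Connection(host="hive-server2", port=10000, username="admin")
        cursor = conn.cursor()
        # Step 1: create a table and load some sample rows.
        cursor.execute(
            "CREATE TABLE IF NOT EXISTS student_scores (name STRING, subject STRING, score INT)"
        )
        cursor.execute(
            "INSERT INTO student_scores VALUES ('alice', 'math', 95), ('bob', 'math', 88)"
        )
        # Step 2: find the top-scoring student per subject with a window function.
        cursor.execute(
            """
            SELECT subject, name, score FROM (
                SELECT subject, name, score,
                       ROW_NUMBER() OVER (PARTITION BY subject ORDER BY score DESC) AS rn
                FROM student_scores
            ) t WHERE rn = 1
            """
        )
        for row in cursor.fetchall():
            print(row)
        conn.close()

    top_scores()
```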

# 3. Running the DAG

## 3.1 Component Dependencies

The following components need to be installed in KDP:

- mysql
- airflow
- zookeeper
- hdfs
- hive (hive metastore, hive server)
- hue, httpfs-gateway (optional)

## 3.2 Scheduling Jobs

After installing Airflow with default parameters in KDP, log in to the Airflow web interface using the username `admin` and password `admin`.

Start the DAG named `hive-sql-example`.

![Airflow Web Interface](./images/airflow01.png)

Upon successful execution, the results can be viewed through the Hue web interface. Alternatively, you can follow the `hive-server2` Quick Start guide to connect to Hive Server2 using beeline and view the results.

![Hue Web Interface](./images/airflow02.png)
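
For reference, a beeline session might look like the following sketch; the JDBC host, port, user, and table name are assumptions, so check the `hive-server2` Quick Start for the exact connection string:

```shell
# Hypothetical connection values -- adjust host/port/user to your deployment.
beeline -u "jdbc:hive2://hive-server2:10000" -n admin
# Then query the tutorial's output, e.g.:
#   SELECT * FROM student_scores;
```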