Articles published by Feng Xiaoxian

Airflow defines a set of dependent tasks in a DAG file and executes them in dependency order.

It is a Python-based workflow management platform with a web UI, a command line interface, and a scheduler.

Subtasks are written in Python, and many operators are available out of the box. See the list of supported operators.
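To make the "execute in dependency order" idea concrete: before running anything, Airflow topologically sorts the task graph. A minimal plain-Python sketch of that ordering (standard library only, no Airflow required; the task names are made up for illustration):

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each task maps to the set of tasks it depends on -- the same graph an
# Airflow DAG would express as: extract >> transform >> load
deps = {
    "transform": {"extract"},
    "load": {"transform"},
}

# static_order() yields tasks so that every dependency runs before its dependents
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['extract', 'transform', 'load']
```

This is only the scheduling concept; in a real DAG file the same structure is declared with operators and the `>>` dependency syntax.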

System dependencies

Install and update some dependencies

 sudo apt update && sudo apt upgrade
 sudo apt install build-essential unzip bzip2 zlib1g-dev pkg-config libncurses5-dev libffi-dev libreadline-dev libbz2-dev libsqlite3-dev libssl-dev liblzma-dev libmysqlclient-dev libkrb5-dev unixodbc

Installing pyenv

 curl https://pyenv.run | bash
 pyenv install 3.12.3
 pyenv versions
 pyenv global 3.12.3
 python -V
 pyenv virtualenv 3.12.3 airflow
 curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
 python get-pip.py

Runtime dependencies

requirements.txt

 apache-airflow==2.9.0
 airflow-clickhouse-plugin==1.4.0
 apache-airflow-providers-apache-flink==1.5.1
 apache-airflow-providers-apache-hdfs==4.6.0
 apache-airflow-providers-apache-hive==8.2.1
 apache-airflow-providers-apache-kafka==1.6.1
 apache-airflow-providers-apache-spark==4.11.3
 apache-airflow-providers-elasticsearch==5.5.3
 apache-airflow-providers-grpc==3.6.0
 apache-airflow-providers-redis==3.8.0
 apache-airflow-providers-postgres==5.14.0
 apache-airflow-providers-influxdb==2.7.1
 apache-airflow-providers-jdbc==4.5.3
 apache-airflow-providers-microsoft-azure==11.1.0
 apache-airflow-providers-mysql==5.7.4
 apache-airflow-providers-mongo==4.2.2
 apache-airflow-providers-neo4j==3.7.0
 apache-airflow-providers-odbc==4.8.1
 apache-airflow-providers-trino==5.9.0
 apache-airflow-providers-ssh==3.14.0
 apache-airflow-providers-amazon==9.1.0
 apache-airflow-providers-cncf-kubernetes==10.0.1
 apache-airflow-providers-http==4.13.3

Installing dependencies

 pip install -r requirements.txt
 # Metrics
 pip install apache-airflow[statsd]
 pip install apache-airflow[celery]

Running

 export AIRFLOW_HOME=~/airflow
 # The database needs to be initialized on first run
 airflow db init
 # -D runs the process in the background
 airflow webserver --port 9988
 # Create an administrator user
 airflow users create -e admin@abc.com -f my -l admin -r Admin -u admin -p Pa55w0rd
 # Administrator account/password after creation:
 # admin
 # Pa55w0rd
 airflow scheduler -D
 airflow triggerer -D
 # Requires standalone_dag_processor = True in airflow.cfg
 airflow dag-processor -D
 airflow celery flower -D
 airflow celery worker -D

High availability

Repeat all of the steps above on the new machine; the only process that needs to run there is the Celery worker:

 # -D runs the process in the background
 airflow celery worker -D
Because tasks are scheduled and executed across machines, the DAG files must be fully synchronized to all workers.
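One way to verify that the DAG directory really is identical on every worker is to compare per-file checksums. A small sketch (hypothetical helpers, not part of Airflow) that could be run on each host and diffed:

```python
import hashlib
import pathlib

def dag_digests(dag_dir):
    """Map each .py file under dag_dir (relative path) to its SHA-256 digest."""
    root = pathlib.Path(dag_dir)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*.py")
    }

def out_of_sync(local, remote):
    """Relative paths whose content differs, or that exist on only one host."""
    return sorted(k for k in set(local) | set(remote) if local.get(k) != remote.get(k))
```

Any path reported by `out_of_sync` still needs to be copied over (for example with rsync) before the worker can pick up the same DAGs.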

Maintenance

 # Start services
 /home/ubuntu/airflow/run.sh start
 # Stop services
 /home/ubuntu/airflow/run.sh stop

Start/stop script run.sh

 #!/bin/bash
 set -eux

 source /home/ubuntu/.pyenv/versions/airflow/bin/activate

 case $1 in
 "start") {
     echo "-------- Start airflow --------"
     airflow webserver --port 9988 -D
     airflow scheduler -D
     airflow triggerer -D
     airflow dag-processor -D
     airflow celery worker -D
     airflow celery flower -D
 };;
 "stop") {
     echo "-------- Stop airflow --------"
     airflow celery stop

     celery_flower_pid=$(ps -ef | egrep 'airflow celery flower' | grep -v grep | awk '{print $2}')
     if [[ $celery_flower_pid != "" ]]; then
         ps -ef | egrep 'airflow celery flower' | grep -v grep | awk '{print $2}' | xargs kill -15
         airflow_flower_pid_file="/home/ubuntu/airflow/airflow-flower.pid"
         if [ -f $airflow_flower_pid_file ]; then
             rm $airflow_flower_pid_file
         fi
     fi

     airflow_scheduler_pid=$(ps -ef | egrep 'airflow scheduler' | grep -v grep | awk '{print $2}')
     if [[ $airflow_scheduler_pid != "" ]]; then
         ps -ef | egrep 'airflow scheduler' | grep -v grep | awk '{print $2}' | xargs kill -15
         airflow_scheduler_pid_file="/home/ubuntu/airflow/airflow-scheduler.pid"
         if [ -f $airflow_scheduler_pid_file ]; then
             rm $airflow_scheduler_pid_file
         fi
     fi

     airflow_triggerer_pid=$(ps -ef | egrep 'airflow triggerer' | grep -v grep | awk '{print $2}')
     if [[ $airflow_triggerer_pid != "" ]]; then
         ps -ef | egrep 'airflow triggerer' | grep -v grep | awk '{print $2}' | xargs kill -15
         airflow_triggerer_pid_file="/home/ubuntu/airflow/airflow-triggerer.pid"
         if [ -f $airflow_triggerer_pid_file ]; then
             rm $airflow_triggerer_pid_file
         fi
     fi

     airflow_master_pid=$(ps -ef | egrep 'gunicorn: master' | grep -v grep | awk '{print $2}')
     if [[ $airflow_master_pid != "" ]]; then
         ps -ef | egrep 'gunicorn: master' | grep -v grep | awk '{print $2}' | xargs kill -15
         airflow_webserver_pid_file="/home/ubuntu/airflow/airflow-webserver.pid"
         if [ -f $airflow_webserver_pid_file ]; then
             rm $airflow_webserver_pid_file
         fi
         airflow_webserver_monitor_pid_file="/home/ubuntu/airflow/airflow-webserver-monitor.pid"
         if [ -f $airflow_webserver_monitor_pid_file ]; then
             rm $airflow_webserver_monitor_pid_file
         fi
         airflow_master_pid_file="/home/ubuntu/airflow/airflow-master.pid"
         if [ -f $airflow_master_pid_file ]; then
             rm $airflow_master_pid_file
         fi
         airflow_worker_pid_file="/home/ubuntu/airflow/airflow-worker.pid"
         if [ -f $airflow_worker_pid_file ]; then
             rm $airflow_worker_pid_file
         fi
     fi
 };;
 esac

Health Check Script health.sh

antonmedv/fx must be installed first; it is used here to process JSON and can be replaced with any equivalent tool.
 #!/bin/bash

 print() {
     echo -e "$(date) $1"
 }

 print "Start checking airflow health status"
 source ~/.pyenv/versions/airflow/bin/activate
 echo -e

 health_resp=$(curl -sL http://127.0.0.1:9988/health)
 echo $health_resp | /usr/local/bin/fx .
 echo -e

 print "Output service status"
 dag_processor_status=$(echo $health_resp | /usr/local/bin/fx .dag_processor.status)
 metadatabase_status=$(echo $health_resp | /usr/local/bin/fx .metadatabase.status)
 scheduler_status=$(echo $health_resp | /usr/local/bin/fx .scheduler.status)
 trigger_status=$(echo $health_resp | /usr/local/bin/fx .triggerer.status)
 printf "%20s: %10s\n" "dag_processor_status" $dag_processor_status "metadatabase_status" $metadatabase_status "scheduler_status" $scheduler_status "trigger_status" $trigger_status
 echo -e

 if [[ "$scheduler_status" != "healthy" ]]; then
     print "Restarting the airflow scheduler .."
     airflow scheduler -D
     print "Airflow scheduler started successfully!"
 fi

 if [[ "$trigger_status" != "healthy" ]]; then
     print "Restarting the airflow triggerer .."
     airflow triggerer -D
     print "Airflow triggerer started successfully!"
 fi

 # crontab entry (runs at minute 1 of every hour):
 # 1 * * * * /home/ubuntu/airflow/health.sh
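The same check can be done without fx: the /health endpoint returns JSON, and the unhealthy components can be picked out with a few lines of Python. The sample payload below is illustrative only; the component names follow the fields queried in health.sh above:

```python
import json

# Illustrative payload in the shape returned by the /health endpoint
health_resp = """{
  "metadatabase":  {"status": "healthy"},
  "scheduler":     {"status": "unhealthy"},
  "triggerer":     {"status": "healthy"},
  "dag_processor": {"status": "healthy"}
}"""

def unhealthy_components(raw):
    """Names of components whose reported status is not 'healthy'."""
    return sorted(
        name for name, info in json.loads(raw).items()
        if info.get("status") != "healthy"
    )

print(unhealthy_components(health_resp))  # ['scheduler'] -> restart the scheduler
```

A monitoring script could restart exactly the components this returns, instead of pattern-matching on text output.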

Third-party services

https://www.astronomer.io/product/

References

Had problems with the ghcr.io container registry today; refer to the GitHub documentation. Following it did not go smoothly at first, but it eventually succeeded.

If you use macOS and access it through a remote session (such as SSH), the following error may occur:

 Error saving credentials: error storing credentials - err: exit status 1, out: `error storing credentials - err: exit status 1, out: `User interaction is not allowed.`

Running security unlock-keychain in the terminal unlocks the keychain, after which the login succeeds (Login Succeeded).

Because hostker's old shell platform was shut down, I had to back up the website, and the backup then sat on my desktop; procrastination kept it there for several weeks.

Now the service is back online, running on a 12-core, 32 GB machine in a cabinet at my rented place (fog).


The world is a super big box,
The sky is a transparent glass cover,

The halo is chaotic,
The land is chaotic,

People are fleeing,
Rush out of this airtight junction.

I think it should be free,
It's the same color as the sun.

Rewatched The Godfather for its 50th anniversary, and thought how good it would be if we passers-by could die peacefully in old age like old Michael.

🕯🕯🕯

The year 2022 is coming, and we are preparing to spend our second Spring Festival away from home.

People often ask me whether celebrating the New Year away from home still feels festive, and whether the customs are still kept.

My answer is usually that as long as the concern is there in my heart, it is the same as ever. Which is why my mother, whenever asked, complains about her "heartless guy" 😂 Technology lets you call at any time, yet what she cares about is physical presence.

And I know that she is afraid of the unknown and a series of dangers I face.

But thinking of the recent Yale open course Philosophy: Death, don't people live for a feeling? And isn't losing that feeling a kind of death?

In 2018, when my grandfather died, I didn't show much emotion. That night, after he was gone, I lay on his bed wondering whether he would feel free, no longer constrained by his body ... and so I fell asleep.

Sometimes the way he talked with me suddenly flashes through my mind; aren't those fragments of him another kind of existence in my memory? I believe that although the body decays, he stays at that moment, while we carry his "kindling" and go on feeling the world.

As for the state of "death", since it cannot be relieved at this stage, it is better to "redeem oneself and others".

On the contrary, spend more time with your family.


Recently I met a girl with gentle eyes (my older sister hhh).

We have made an appointment to meet after the New Year. (It still depends on the epidemic situation staying stable, to avoid further trouble.)