Airflow run schedule settings: schedule
Airflow run schedule issues
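I recently started using Airflow in earnest, and a few things about schedule_interval and the "last run" shown in the web UI had never been quite clear to me. When I set up a task that runs once a week, I finally hit a problem: the task did not run when I expected it to.
After a round of Googling, I found that although Airflow's schedule_interval accepts cron expressions, it still differs from crontab in some respects.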
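About backfill
The backfill command is used to backfill data, that is, to run a task for dates in the past.
When the task runs daily, adding a start date is enough, for example:
    airflow backfill CKD_ALL_REPORT -s 2018-09-04
When the task runs only once every several days, however, this no longer works, and Airflow reports:
    No run dates were found for the given dates and dag interval.
This is because Airflow has the notion of a window:
"Airflow sets execution_date based on the left bound of the schedule period it is covering, not based on when it fires (which would be the right bound of the period)"
This is the most reasonable explanation I found on Stack Overflow. It means that, once start_date has passed, Airflow records the first point in time that matches schedule_interval as the execution_date, but only starts the run when the next such point arrives. In other words, because of this window, last run lags one period behind. If you inspect execution_date through Jinja, the problem becomes visible.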
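The window behaviour described above can be sketched with nothing but the standard library. This is a simplified model, not Airflow's scheduler code: it uses a fixed timedelta instead of a cron expression, the function names are hypothetical, and the weekly DAG starting on Monday 2018-08-27 is an invented example. It also assumes that a backfill given only -s covers just that single day, which would explain why a daily task finds a run date while a weekly one does not.

```python
from datetime import datetime, timedelta


def schedule_points(start_date, interval, until):
    """Yield every schedule point from start_date up to and including `until`."""
    point = start_date
    while point <= until:
        yield point
        point += interval


def runs_triggered(start_date, interval, now):
    """Return (execution_date, triggered_at) pairs for runs that have fired by `now`.

    A run labelled execution_date covers the period starting at execution_date
    and fires only once that period has ended (its right bound).
    """
    runs = []
    for execution_date in schedule_points(start_date, interval, now):
        triggered_at = execution_date + interval
        if triggered_at <= now:
            runs.append((execution_date, triggered_at))
    return runs


def backfill_run_dates(start_date, interval, range_start, range_end):
    """Return the schedule points that fall inside [range_start, range_end]."""
    return [
        point
        for point in schedule_points(start_date, interval, range_end)
        if point >= range_start
    ]


# Hypothetical weekly DAG whose first schedule point is Monday 2018-08-27.
start = datetime(2018, 8, 27)
weekly = timedelta(weeks=1)

# By Tuesday 2018-09-04 only one run has fired. It fired on 2018-09-03
# but carries execution_date 2018-08-27, one period earlier.
runs = runs_triggered(start, weekly, now=datetime(2018, 9, 4))
for execution_date, triggered_at in runs:
    print(execution_date.date(), "fired on", triggered_at.date())

# A one-day range contains a daily schedule point but no weekly one.
day = datetime(2018, 9, 4)
print(backfill_run_dates(start, timedelta(days=1), day, day))
print(backfill_run_dates(start, weekly, day, day))
```

Under these assumptions, a multi-day task would need a backfill range wide enough to contain at least one of its schedule points.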
Jinja templates
{{ ds }} | the execution date as YYYY-MM-DD |
{{ ds_nodash }} | the execution date as YYYYMMDD |
{{ yesterday_ds }} | yesterday’s date as YYYY-MM-DD |
{{ yesterday_ds_nodash }} | yesterday’s date as YYYYMMDD |
{{ tomorrow_ds }} | tomorrow’s date as YYYY-MM-DD |
{{ tomorrow_ds_nodash }} | tomorrow’s date as YYYYMMDD |
{{ ts }} | same as execution_date.isoformat() |
{{ ts_nodash }} | same as ts without - and : |
{{ execution_date }} | the execution_date, (datetime.datetime) |
{{ prev_execution_date }} | the previous execution date (if available)(datetime.datetime) |
{{ next_execution_date }} | the next execution date (datetime.datetime) |
{{ dag }} | the DAG object |
{{ task }} | the Task object |
{{ macros }} | a reference to the macros package, described below |
{{ task_instance }} | the task_instance object |
{{ end_date }} | same as {{ ds }} |
{{ latest_date }} | same as {{ ds }} |
{{ ti }} | same as {{ task_instance }} |
{{ params }} | a reference to the user-defined params dictionary |
{{ var.value.my_var }} | global defined variables represented as a dictionary |
{{ var.json.my_var.path }} | global defined variables represented as a dictionary with deserialized JSON object, append the path to the key within the JSON object |
{{ task_instance_key_str }} | a unique, human-readable key to the task instance formatted {dag_id}_{task_id}_{ds} |
{{ conf }} | the full configuration object located at airflow.configuration.conf which represents |
{{ run_id }} | the run_id of the current DAG run |
{{ dag_run }} | a reference to the DagRun object |
{{ test_mode }} | whether the task instance was called using the CLI’s test subcommand |
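To make the date variables in the table concrete, here is a minimal sketch that derives a few of them from a single execution_date using only the standard library. It follows the descriptions in the table rather than Airflow's own code, and template_values is a hypothetical helper. The sample date assumes a weekly run that actually fires on 2018-09-03, whose execution_date would therefore be 2018-08-27.

```python
from datetime import datetime, timedelta


def template_values(execution_date):
    """Build some of the date variables from the table for one execution_date."""
    ds = execution_date.strftime("%Y-%m-%d")
    ts = execution_date.isoformat()
    yesterday = execution_date - timedelta(days=1)
    tomorrow = execution_date + timedelta(days=1)
    return {
        "ds": ds,
        "ds_nodash": ds.replace("-", ""),
        "yesterday_ds": yesterday.strftime("%Y-%m-%d"),
        "tomorrow_ds": tomorrow.strftime("%Y-%m-%d"),
        "ts": ts,
        # Per the table: ts without "-" and ":".
        "ts_nodash": ts.replace("-", "").replace(":", ""),
    }


# execution_date is the left bound of the period, not the day the task ran.
values = template_values(datetime(2018, 8, 27))
for name, value in values.items():
    print(name, "=", value)
```

Note that yesterday_ds and tomorrow_ds are relative to execution_date, so for a weekly task they do not point at the previous or next run.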
References: stackoverflow/questions/39612488/airflow-trigger-dag-execution-date-is-the-next-day-why/39620901#39620901
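Copyright notice: this article, "Airflow run schedule settings: schedule", was contributed voluntarily by Deputy Director 林淑君 and reflects only the author's own views. To repost it, please contact the author and cite the source: http://www.xiehuijuan.com/baike/1686477609a71915.html. This site only provides information storage space, holds no ownership of the content, and bears no related legal liability. If you find content on this site suspected of plagiarism, infringement, or violation of laws or regulations, it will be removed immediately once verified.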