doc: Add more examples (#795)

This commit is contained in:
Yota Hamada 2025-01-17 07:07:33 -08:00 committed by GitHub
parent d06ff213fa
commit d642ea7a9b
GPG Key ID: B5690EEEBB952194
7 changed files with 579 additions and 85 deletions

README.md

@@ -28,6 +28,51 @@
Dagu is a powerful Cron alternative that comes with a Web UI. It allows you to define dependencies between commands in a declarative YAML Format. Additionally, Dagu natively supports running Docker containers, making HTTP requests, and executing commands over SSH. Dagu was designed to be easy to use, self-contained, and require no coding, making it ideal for small projects.
<h2><b>Table of Contents</b></h2>
- [Why Dagu?](#why-dagu)
- [Core Features](#core-features)
- [Common Use Cases](#common-use-cases)
- [Community](#community)
- [Installation](#installation)
- [Via Bash script](#via-bash-script)
- [Via GitHub Releases Page](#via-github-releases-page)
- [Via Homebrew (macOS)](#via-homebrew-macos)
- [Via Docker](#via-docker)
- [Quick Start Guide](#quick-start-guide)
- [1. Launch the Web UI](#1-launch-the-web-ui)
- [2. Create a New DAG](#2-create-a-new-dag)
- [3. Edit the DAG](#3-edit-the-dag)
- [4. Execute the DAG](#4-execute-the-dag)
- [Usage / Command Line Interface](#usage--command-line-interface)
- [Example DAG](#example-dag)
- [Minimal examples](#minimal-examples)
- [Named Parameters](#named-parameters)
- [Positional Parameters](#positional-parameters)
- [Conditional DAG](#conditional-dag)
- [Script Execution](#script-execution)
- [Variable Passing](#variable-passing)
- [Scheduling](#scheduling)
- [Calling a sub-DAG](#calling-a-sub-dag)
- [Running a docker image](#running-a-docker-image)
- [Environment Variables](#environment-variables)
- [Notifications on Failure or Success](#notifications-on-failure-or-success)
- [HTTP Request and Notifications](#http-request-and-notifications)
- [Execute commands over SSH](#execute-commands-over-ssh)
- [Advanced Preconditions](#advanced-preconditions)
- [Handling Various Execution Results](#handling-various-execution-results)
- [JSON Processing Examples](#json-processing-examples)
- [Web UI](#web-ui)
- [DAG Details](#dag-details)
- [DAGs](#dags)
- [Search](#search)
- [Execution History](#execution-history)
- [Log Viewer](#log-viewer)
- [Running as a daemon](#running-as-a-daemon)
- [Contributing](#contributing)
- [Contributors](#contributors)
- [License](#license)
## Why Dagu?
Dagu is a modern workflow engine that combines simplicity with power, designed for developers who need reliable automation without the overhead. Here's what makes Dagu stand out:
@@ -42,32 +87,45 @@ Dagu is a modern workflow engine that combines simplicity with power, designed f
- **Cloud Native Ready**: While running perfectly on local environments, Dagu is built to seamlessly integrate with modern cloud infrastructure when you need to scale.
## **Features**
## Core Features
- Environment variables
- Flexible parameter passing
- Conditional logic with regex and shell commands
- Automatic retries
- Parallel steps
- Repeat steps at regular intervals
- Running sub workflows
- Execution timeouts
- Automatic logging
- Lifecycle hooks (on failure, on exit, etc.)
- Email notifications
- Running Docker containers in steps
- JSON handling support
- Controlling remote Dagu nodes from a single UI
- SSH remote commands in steps
- Flexible scheduling with cron expressions
- **Workflow Management**
- Declarative YAML definitions
- Dependency management
- Parallel execution
- Sub-workflows
- Conditional execution with regex
- Timeouts and automatic retries
- **Execution & Integration**
- Native Docker support
- SSH command execution
- HTTP requests
- JSON processing
- Email notifications
- **Operations**
- Web UI for monitoring
- Real-time logs
- Execution history
- Flexible scheduling
- Environment variables
- Automatic logging
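To make these concrete, here is a small sketch combining a few of the features above (scheduling, retries, dependencies); the file and script names are hypothetical:

```yaml
schedule: "0 2 * * *"  # Run nightly at 02:00
steps:
  - name: fetch data
    command: curl -fsSL https://example.com/data.csv -o /tmp/data.csv
    retryPolicy:
      limit: 3
      intervalSec: 30
  - name: process data
    command: python process.py /tmp/data.csv
    depends: fetch data
```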
## **Community**
## Common Use Cases
- Data Processing
- Scheduled Tasks
- Media Processing
- CI/CD Automation
- ETL Pipelines
- Agentic Workflows
## Community
- Issues: [GitHub Issues](https://github.com/dagu-org/dagu/issues)
- Discussion: [GitHub Discussions](https://github.com/dagu-org/dagu/discussions)
- Chat: [Discord](https://discord.gg/gpahPUjGRk)
## **Installation**
## Installation
Dagu can be installed in multiple ways, such as using Homebrew or downloading a single binary from GitHub releases.
@@ -108,7 +166,7 @@ Note: The environment variable `DAGU_TZ` is the timezone for the scheduler and s
See [Environment variables](https://dagu.readthedocs.io/en/latest/config.html#environment-variables) to configure those default directories.
## **Quick Start Guide**
## Quick Start Guide
### 1. Launch the Web UI
@@ -142,7 +200,7 @@ steps:
You can execute the example by pressing the `Start` button. You can see "Hello Dagu" in the log page in the Web UI.
## **Usage / Command Line Interface**
## Usage / Command Line Interface
```sh
# Runs the DAG
@@ -182,15 +240,16 @@ dagu scheduler [--dags=<path to directory>]
dagu version
```
## **Example DAG**
## Example DAG
### Minimal examples
A DAG with two steps:
A simple example with a named parameter:
```yaml
params:
  - NAME: "Dagu"
steps:
  - name: Hello world
    command: echo Hello $NAME
@@ -217,20 +276,63 @@ steps:
    shell: bash # The default shell is `$SHELL` or `sh`.
```
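For reference, a self-contained sketch of a step that overrides the shell (the step name is illustrative):

```yaml
steps:
  - name: bash step
    command: echo {1..3}  # Brace expansion requires bash
    shell: bash           # The default shell is `$SHELL` or `sh`
```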
You can also define each step as a map instead of a list:
### Named Parameters
You can define named parameters in the DAG file and override them when running the DAG.
```yaml
# Default named parameters
params:
  NAME: "Dagu"
  AGE: 30
steps:
  step1:
    command: echo "Hello"
  step2:
    command: echo "Bye"
    depends: step1
  - name: Hello world
    command: echo Hello $NAME
  - name: Done
    command: echo Done!
    depends: Hello world
```
Run the DAG with custom parameters:
```sh
dagu start my_dag -- NAME=John AGE=40
```
### Positional Parameters
You can define positional parameters in the DAG file and override them when running the DAG.
```yaml
# Default positional parameters
params: input.csv output.csv 60 # Default values for $1, $2, and $3
steps:
  # Using positional parameters
  - name: data processing
    command: python
    script: |
      import sys
      import pandas as pd

      input_file = "$1"   # First parameter
      output_file = "$2"  # Second parameter
      timeout = "$3"      # Third parameter

      print(f"Processing {input_file} -> {output_file} with timeout {timeout}s")
      # Add your processing logic here
```
Run the DAG with custom parameters:
```sh
dagu start my_dag -- input.csv output.csv 120
```
### Conditional DAG
You can add conditional logic to a DAG:
You can define conditions to run a step based on the output of a command.
```yaml
steps:
@@ -241,12 +343,78 @@ steps:
expected: "re:0[1-9]" # Run only if the day is between 01 and 09
```
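A self-contained sketch of the same precondition pattern (`monthly.sh` is a hypothetical script):

```yaml
steps:
  - name: monthly job
    command: monthly.sh
    preconditions:
      - condition: "`date '+%d'`"
        expected: "01"  # Run only on the first day of the month
```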
### Script Execution
You can run a script using the `script` field.
```yaml
steps:
  # Python script example
  - name: data analysis
    command: python
    script: |
      import json
      import sys

      data = {'count': 100, 'status': 'ok'}
      print(json.dumps(data))
      sys.stderr.write('Processing complete\n')
    output: RESULT
    stdout: /tmp/analysis.log
    stderr: /tmp/analysis.error
  # Shell script with multiple commands
  - name: cleanup
    command: bash
    script: |
      #!/bin/bash
      echo "Starting cleanup..."

      # Remove old files
      find /tmp -name "*.tmp" -mtime +7 -exec rm {} \;

      # Archive logs
      cd /var/log
      tar -czf archive.tar.gz *.log

      echo "Cleanup complete"
    depends: data analysis
```
### Variable Passing
You can pass the output of one step to another step using the `output` field.
```yaml
steps:
  # Basic output capture
  - name: generate id
    command: echo "ABC123"
    output: REQUEST_ID
  - name: use id
    command: echo "Processing request ${REQUEST_ID}"
    depends: generate id
  # Capture JSON output
  - name: get config
    command: |
      echo '{"port": 8080, "host": "localhost"}'
    output: CONFIG
  - name: start server
    command: echo "Starting server at ${CONFIG.host}:${CONFIG.port}"
    depends: get config
```
### Scheduling
You can specify the schedule with cron expression:
You can specify flexible schedules using the cron format.
```yaml
schedule: "5 4 * * *" # Run at 04:05.
steps:
- name: scheduled job
command: job.sh
@@ -258,6 +426,7 @@ Or you can set multiple schedules.
schedule:
- "30 7 * * *" # Run at 7:30
- "0 20 * * *" # Also run at 20:00
steps:
- name: scheduled job
command: job.sh
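Depending on your Dagu version, the scheduler can also start and stop a long-lived process on a schedule. The `start`/`stop` keys below are an assumption based on the scheduler docs, so verify them against your version; `server.sh` is a hypothetical script:

```yaml
schedule:
  start: "0 8 * * *"   # Start the DAG at 08:00
  stop: "0 18 * * *"   # Stop it at 18:00
steps:
  - name: serve
    command: server.sh
```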
@@ -276,7 +445,7 @@ steps:
### Calling a sub-DAG
You can call a sub-DAG from a parent DAG:
You can call another DAG from a parent DAG.
```yaml
steps:
@@ -315,9 +484,315 @@ steps:
command: echo "hello"
```
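As a hedged sketch of this pattern (the `run` and `params` field names are assumed from contemporaneous Dagu docs, not confirmed by this snippet; `sub_dag` is a hypothetical DAG name):

```yaml
steps:
  - name: run sub workflow
    run: sub_dag       # Name of the sub-DAG to call (field name assumed)
    params: "ENV=dev"  # Parameters passed to the sub-DAG
```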
### Environment Variables
You can define environment variables and use them in the DAG.
```yaml
env:
  - DATA_DIR: ${HOME}/data
  - PROCESS_DATE: "`date '+%Y-%m-%d'`"
steps:
  - name: process logs
    command: python process.py
    dir: ${DATA_DIR}
    preconditions:
      - "test -f ${DATA_DIR}/logs_${PROCESS_DATE}.txt" # Check if the file exists
```
### Notifications on Failure or Success
You can send notifications on failure in various ways.
```yaml
env:
  - SLACK_WEBHOOK_URL: "https://hooks.slack.com/services/XXXXX/YYYYY/ZZZZZ"
dotenv:
  - .env
smtp:
  host: $SMTP_HOST
  port: "587"
  username: $SMTP_USERNAME
  password: $SMTP_PASSWORD
handlerOn:
  failure:
    command: |
      curl -X POST -H 'Content-type: application/json' \
        --data '{"text":"DAG Failed ($DAG_NAME)"}' \
        ${SLACK_WEBHOOK_URL}
steps:
  - name: critical process
    command: important_job.sh
    retryPolicy:
      limit: 3
      intervalSec: 60
    mailOn:
      failure: true # Send an email on failure
```
If you want to set it globally, you can create `~/.config/dagu/base.yaml` and define the common configurations across all DAGs.
```yaml
smtp:
  host: $SMTP_HOST
  port: "587"
  username: $SMTP_USERNAME
  password: $SMTP_PASSWORD
mailOn:
  failure: true
  success: true
```
You can also use mail executor to send notifications.
```yaml
params:
  - RECIPIENT_NAME: XXX
  - RECIPIENT_EMAIL: example@company.com
  - MESSAGE: "Hello [RECIPIENT_NAME]"
steps:
  - name: step1
    executor:
      type: mail
      config:
        to: $RECIPIENT_EMAIL
        from: dagu@dagu.com
        subject: "Hello [RECIPIENT_NAME]"
        message: $MESSAGE
```
### HTTP Request and Notifications
You can make HTTP requests and send notifications.
```yaml
dotenv:
  - .env
smtp:
  host: $SMTP_HOST
  port: "587"
  username: $SMTP_USERNAME
  password: $SMTP_PASSWORD
steps:
  - name: fetch data
    executor:
      type: http
      config:
        timeout: 10
    command: GET https://api.example.com/data
    output: API_RESPONSE
  - name: send notification
    executor:
      type: mail
      config:
        to: team@company.com
        from: team@company.com
        subject: "Data Processing Complete"
        message: |
          Process completed successfully.
          Response: ${API_RESPONSE}
    depends: fetch data
```
### Execute commands over SSH
You can execute commands over SSH.
```yaml
steps:
  - name: backup
    executor:
      type: ssh
      config:
        user: admin
        ip: 192.168.1.100
        key: ~/.ssh/id_rsa
    command: tar -czf /backup/data.tar.gz /data
```
### Advanced Preconditions
You can define complex conditions to run a step based on the output of a command.
```yaml
steps:
  # Check multiple conditions
  - name: daily task
    command: process_data.sh
    preconditions:
      # Run only on weekdays
      - condition: "`date '+%u'`"
        expected: "re:[1-5]"
      # Run only if disk space > 20%
      - condition: "`df -h / | awk 'NR==2 {print $5}' | sed 's/%//'`"
        expected: "re:^[0-7][0-9]$|^[1-9]$" # 0-79% used (meaning at least 20% free)
      # Check if input file exists
      - condition: "test -f input.csv"
  # Complex file check
  - name: process files
    command: batch_process.sh
    preconditions:
      - condition: "`find data/ -name '*.csv' | wc -l`"
        expected: "re:[1-9][0-9]*" # At least one CSV file exists
```
### Handling Various Execution Results
You can use `continueOn` to control when to fail or continue based on the exit code, output, or other conditions.
```yaml
steps:
  # Basic error handling
  - name: process data
    command: python process.py
    continueOn:
      failure: true # Continue on any failure
      skipped: true # Continue if preconditions aren't met
  # Handle specific exit codes
  - name: data validation
    command: validate.sh
    continueOn:
      exitCode: [1, 2, 3] # 1:No data, 2:Partial data, 3:Invalid format
      markSuccess: true   # Mark as success even with these codes
  # Output pattern matching
  - name: api request
    command: curl -s https://api.example.com/data
    continueOn:
      output:
        - "no records found"     # Exact match
        - "re:^Error: [45][0-9]" # Regex match for HTTP errors
        - "rate limit exceeded"  # Another exact match
  # Complex pattern
  - name: database backup
    command: pg_dump database > backup.sql
    continueOn:
      exitCode: [0, 1] # Accept specific exit codes
      output:          # Accept specific outputs
        - "re:0 rows affected"
        - "already exists"
      failure: false    # Don't continue on other failures
      markSuccess: true # Mark as success if conditions match
  # Multiple conditions combined
  - name: data sync
    command: sync_data.sh
    continueOn:
      exitCode: [1] # Exit code 1 is acceptable
      output:       # These outputs are acceptable
        - "no changes detected"
        - "re:synchronized [0-9]+ files"
      skipped: true     # OK if skipped due to preconditions
      markSuccess: true # Mark as success in these cases
  # Error output handling
  - name: log processing
    command: process_logs.sh
    stderr: /tmp/process.err
    continueOn:
      output:
        - "re:WARNING:.*" # Continue on warnings
        - "no logs found" # Continue if no logs
      exitCode: [0, 1, 2] # Multiple acceptable exit codes
      failure: true       # Continue on other failures too
  # Application-specific status
  - name: app health check
    command: check_status.sh
    continueOn:
      output:
        - "re:STATUS:(DEGRADED|MAINTENANCE)" # Accept specific statuses
        - "re:PERF:[0-9]{2,3}ms"             # Accept performance in range
      markSuccess: true # Mark these as success
```
### JSON Processing Examples
You can use `jq` executor to process JSON data.
```yaml
# Simple data extraction
steps:
  - name: extract value
    executor: jq
    command: .user.name # Get user name from JSON
    script: |
      {
        "user": {
          "name": "John",
          "age": 30
        }
      }
# Output: "John"

# Transform array data
steps:
  - name: get users
    executor: jq
    command: '.users[] | {name: .name}' # Extract name from each user
    script: |
      {
        "users": [
          {"name": "Alice", "age": 25},
          {"name": "Bob", "age": 30}
        ]
      }
# Output:
# {"name": "Alice"}
# {"name": "Bob"}

# Calculate and format
steps:
  - name: sum ages
    executor: jq
    command: '{total_age: ([.users[].age] | add)}' # Sum all ages
    script: |
      {
        "users": [
          {"name": "Alice", "age": 25},
          {"name": "Bob", "age": 30}
        ]
      }
# Output: {"total_age": 55}

# Filter and count
steps:
  - name: count active
    executor: jq
    command: '[.users[] | select(.active == true)] | length'
    script: |
      {
        "users": [
          {"name": "Alice", "active": true},
          {"name": "Bob", "active": false},
          {"name": "Charlie", "active": true}
        ]
      }
# Output: 2
```
More examples can be found in the [documentation](https://dagu.readthedocs.io/en/latest/yaml_format.html).
## **Web UI**
## Web UI
### DAG Details
@@ -351,7 +826,7 @@ Examine detailed step-level logs and outputs.
![DAG Log](assets/images/ui-logoutput.webp?raw=true)
## **Running as a daemon**
## Running as a daemon
The easiest way to make sure the process is always running on your system is to create the script below and execute it every minute using cron (this way, you don't need a `root` account):
@@ -374,12 +849,12 @@ exit
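As an illustrative sketch of such a keep-alive script (the `dagu start-all` command and paths are assumptions; adjust to your installation):

```shell
#!/bin/bash
# Keep-alive sketch: start Dagu (server + scheduler) unless it is already running.
# Assumes `dagu` is on PATH; exits quietly if it is not installed.
command -v dagu >/dev/null 2>&1 || exit 0
pgrep -f "dagu start-all" >/dev/null || nohup dagu start-all >/dev/null 2>&1 &
exit 0
```

Saved as, say, `~/keepalive.sh`, a crontab entry like `* * * * * /bin/bash ~/keepalive.sh` would re-launch Dagu within a minute of any crash.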
We welcome new contributors! Check out our [Contribution Guide](https://dagu.readthedocs.io/en/latest/contrib.html) for guidelines on how to get started.
## **Contributors**
## Contributors
<a href="https://github.com/dagu-org/dagu/graphs/contributors">
<img src="https://contrib.rocks/image?repo=dagu-org/dagu" />
</a>
## **License**
## License
Dagu is released under the [GNU GPLv3](./LICENSE.md).


@@ -187,29 +187,20 @@ Example:
username: "<username>"
password: "<password>"
params: RECIPIENT=XXX
params:
- RECIPIENT_NAME: XXX
- RECIPIENT_EMAIL: example@company.com
- MESSAGE: "Hello [RECIPIENT_NAME]"
steps:
- name: step1
executor:
type: mail
config:
to: <to address>
from: <from address>
subject: "Exciting New Features Now Available"
message: |
Hello [RECIPIENT],
We hope you're enjoying your experience with MyApp!
We're thrilled to announce that [] v2.0 is now available,
and we've added some fantastic new features based on your
valuable feedback.
Thank you for choosing MyApp and for your continued support.
We look forward to hearing from you and providing you with
an even better MyApp experience.
Best regards,
to: $RECIPIENT_EMAIL
from: dagu@dagu.com
subject: "Hello [RECIPIENT_NAME]"
message: $MESSAGE
.. _command-execution-over-ssh:


@@ -23,8 +23,8 @@ A powerful, self-contained Cron alternative with a clean Web UI and a `declarati
Quick Start
------------
:doc:`motivation`
   The motivation behind Dagu.
:doc:`intro`
   Introduction to Dagu.
:doc:`installation`
   How to install Dagu.
@@ -48,14 +48,12 @@ Quick Start
:ref:`schema-reference`
   Schema reference.
:ref:`changelog`
   History of changes.
.. toctree::
   :caption: Motivation
   :caption: About
   :hidden:

   motivation
   intro
   changelog
.. toctree::
   :caption: Installation
@@ -107,9 +105,3 @@ Quick Start
faq
contrib
.. toctree::
   :caption: Changelog
   :hidden:

   changelog

docs/source/intro.rst

@@ -0,0 +1,35 @@
.. _INTRO:
Introduction
=============
Background
----------
In many organizations, legacy systems still rely on hundreds of cron jobs running across multiple servers. These jobs are often written in various languages like Perl or Shell scripts, with implicit interdependencies. When one job fails, troubleshooting requires manually logging into servers via SSH and checking individual logs. To perform recovery, one must understand these implicit dependencies, which often rely on tribal knowledge. Dagu was developed to eliminate this complexity by providing a clear and understandable tool for workflow definition and dependency management.
Vision & Mission
----------------
Dagu sparks an exciting future where workflow engines drive software operations for everyone. It's free from language dependencies, runs locally with minimal overhead, and strips away unnecessary complexity to make robust workflows accessible to all.
Core Principles
----------------
Dagu was born out of a desire to make workflow orchestration feasible in any environment—from long-standing legacy systems to IoT systems and AI-driven pipelines. While Cron lacks clarity on dependencies and observability, and other tools such as Airflow, Prefect, or Temporal require you to write code in Python or Go, Dagu offers a simple, language-agnostic alternative. It's designed with a focus on the following principles:
1. **Local friendly**
Define and execute workflows in a single, self-contained environment without internet connection, making it simple to install and scale across diverse environments—from IoT devices to on-premise servers.
2. **Minimal Configuration**
Start with just a single binary and YAML file, using local file storage for DAG definitions, logs, and metadata without requiring external databases.
3. **Language Agnostic**
Any runtime—Python, Bash, Docker containers, or custom executors—works without extra layers of complexity.
4. **Keep it Simple**
Workflows are defined in clear, human-readable YAML, which keeps development quick and onboarding fast. Dagu is designed to be simple to understand and use, even for non-developers.
5. **Community-Driven**
As an open-source project, developers can contribute new executors, integrate emerging technologies, or propose enhancements via GitHub. This encourages rapid iteration and keeps Dagu aligned with real-world use cases.


@@ -1,19 +0,0 @@
.. _MOTIVATION:
Motivation
==========
Why we built Dagu
------------------
In many organizations, legacy systems still rely on hundreds of cron jobs running across multiple servers. These jobs are often written in various languages like Perl or Shell scripts, with implicit interdependencies. When one job fails, troubleshooting requires manually logging into servers via SSH and checking individual logs. To perform recovery, one must understand these implicit dependencies, which often rely on tribal knowledge. Dagu was developed to eliminate this complexity by providing a clear and understandable tool for workflow definition and dependency management.
A Lightweight and Self-Contained Solution
------------------------------------------
While Cron is lightweight and suitable for simple scheduling, it doesn't scale well for complex workflows or provide features like retries, dependencies, or observability out of the box. On the other hand, tools like Airflow or other workflow engines can be overly complex for smaller projects or legacy environments, with steep learning curves and a heavy maintenance burden. Dagu strikes a balance: it's easy to use, self-contained, and requires no coding, making it ideal for smaller projects.
Built By and For In-House Developers
-------------------------------------
Dagu's design philosophy stems from the real-world experience in managing complex jobs across diverse environments, from small startups to enterprise companies. By focusing on simplicity, transparency, and minimal setup overhead, Dagu aims to make life easier for developers who need a robust workflow engine without the heavy lift of a more complex tool.


@@ -229,6 +229,20 @@ These fields apply to the entire DAG. They appear at the root of the YAML file.
~~~~~~~~
A list of steps (tasks) to execute. Steps define your workflow logic and can depend on each other. See :ref:`Step Fields <step-fields>` below for details.
``smtp``
~~~~~~~~
SMTP server configuration for sending email notifications. This is necessary if you use the ``mail`` executor or ``mailOn`` field.
**Example**:
.. code-block:: yaml

   smtp:
     host: $SMTP_HOST
     port: "587"
     username: $SMTP_USER
     password: $SMTP_PASS
------------
.. _step-fields:


@@ -580,6 +580,7 @@ Complete list of DAG-level configuration options:
- ``MaxCleanUpTimeSec``: Cleanup timeout
- ``handlerOn``: Lifecycle event handlers
- ``steps``: List of steps to execute
- ``smtp``: SMTP settings
Example DAG configuration:
@@ -617,6 +618,11 @@ Example DAG configuration:
command: echo "canceled"
exit:
command: echo "finished"
smtp:
host: "smtp.foo.bar"
port: "587"
username: "<username>"
password: "<password>"
Step Fields
~~~~~~~~~~~