# NAME

scw - scheduled command wrapper with resilience, logging, metrics

# SYNOPSIS

**scw** \[ **\--config** *FILE* \] \[**\--set** *SETTING*=*VALUE*\]\...
*ACTION* \[*OPTION*\]\...

**scw** **run** \[ **\--force** \] \[ **\--strict** \] *ITEM*

**scw** **enable**\|**disable** *ITEM*

**scw** **status** *ITEM*

**scw** **list** \[**\--enabled**\|**\--disabled**\] \[ **\--info** \] \[ **\--all-users** \]

**scw** **update** \[ **\--all-users** \]

**scw** **faults** \[ **\--all-users** \]

**scw** **-h**\|**\--help**\
**scw** **-V**\|**\--version**

# DESCRIPTION

Wrap a scheduled command in a framework which adds concurrency locking,
prerequisites, dependency checks, conflict avoidance, randomised startup
delays, flexible logging, and monitoring metrics.

Each distinct scheduled command is referred to as an "*item*", and can
have its own configuration for each of these features:

**Concurrency locking**

:   Prevents an item from being run more than once at the same time, for
    example if it\'s scheduled to run every 2 minutes and occasionally
    takes longer than that to complete.

**Prerequisites**

:   Prevents an item from running if some prior condition is not met,
    for example checking whether this is the active node of a failover
    cluster, or whether some crucial underpinning service is running.

**Dependency checks**

:   Ensures that an item will run only if some other item has succeeded
    before it - with the possibility of waiting a short while for the
    dependency to finish rather than giving up straight away. Useful
    when linked items need to be scheduled separately at particular
    times but the later one can only run if the earlier one has
    succeeded. For example, a data load batch may have to run at some
    time in the early morning, and a subsequent data processing batch,
    which for business reasons has to run after a particular time of
    day, can only run if the early morning data load succeeded.

**Conflict avoidance**

:   Prevents an item from running if some other item is still running -
    with the possibility of waiting a short while for the conflicting
    item to finish rather than giving up straight away.

**Randomised startup delays**

:   Avoids resource overconsumption when the same item is scheduled to
    run on multiple systems.

**Flexible logging**

:   Item output can be sent to any combination of files, syslog, email,
    or HTTP, with or without timestamps. The standard output and
    standard error of items can be combined or separated, and a special
    status stream is also made available so that significant events
    (such as starting each step of a multi-step process) can be recorded
    separately.

**Metrics**

:   A collection of informational files is maintained for each item. Any
    monitoring agent can read these files and raise alerts based on
    their contents. An item list file (a JSON array of item
    descriptions) is automatically generated so that a system such as
    Zabbix can find and monitor all items without needing an operator to
    make adjustments when items are added or removed.

All of these features are optional.

In its simplest form, **scw** can be invoked from existing scheduler
entries by using the "**run**" action, so for example this
**crontab**(5) entry would change as follows:

>     # Original entry
>     0 * * * * /some/command --option ARGUMENT
>      
>     # Replacement
>     0 * * * * scw run mycommand -s Command="/some/command --option ARGUMENT"

In this example, the scheduled command will be known to **scw** as the
"*mycommand*" item, and all of the logs and metrics produced will use
that name.

To make use of the full range of features, place scheduled commands or
definition files into the item definition directory, and call
"*scw update -a*" to generate the crontab and the item list file.

# ACTIONS

**run** *ITEM*

:   Start running the item named *ITEM*, applying any prerequisite
    checks, dependency and conflict checks, startup delays, minimum
    interval constraints, and concurrency locks, after checking whether
    the item has been disabled (unless the "**\--force**" option was
    passed, in which case it will be run as if it were enabled).

    If standard input, output, and error are all connected to a
    terminal, any configured randomised startup delay will be skipped,
    and output will be written to the terminal as well as to the usual
    logging destinations.

**enable** *ITEM*

:   Enable the item named *ITEM* so that it runs as scheduled.

**disable** *ITEM*

:   Disable the item named *ITEM* so that it will not run as scheduled,
    unless run with the "**\--force**" option.

**status** *ITEM*

:   Show the current status of the item named *ITEM*. This only operates
    on items defined in the item definition directory.

**list**

:   List all defined items, or items that are defined and enabled (with
    the "**\--enabled**" option), or items that are defined and disabled
    (with "**\--disabled**").

    With the "**\--all-users**" option, items for all users are listed,
    not just the current user.

**update**

:   Update the crontab and the item list file from the items defined in
    the item definition directory (see the **FILES** section).

    With the "**\--all-users**" option, items for all users are
    considered, not just the current user, and the crontab file is
    written in the form expected in */etc/cron.d/*, with the additional
    username field.

    Note that schedules in all items must be valid, or the whole crontab
    may be skipped - see the **NOTES** section.

**faults**

:   Using the metrics files from the items defined in the item
    definition directory, list all current faults, such as "user/item:
    This item has gone too long without a successful run.". The same
    rules are applied as in the Zabbix template supplied with **scw**: a
    disabled item, or one which has gone too long without a successful
    run, is reported as an error; and an item which is currently
    overrunning is reported as a warning.

    With the "**\--all-users**" option, items for all users are
    considered, not just the current user.

# OPTIONS

With no action:

**-h**, **\--help**

:   Print a usage message on standard output and exit successfully.

**-V**, **\--version**

:   Print version information on standard output and exit successfully.

With any action:

**-c**, **\--config** *FILE*

:   Read configuration from *FILE* instead of the system-wide default
    location.

**-s**, **\--set** *SETTING*=*VALUE*

:   Set the item or configuration setting *SETTING* to *VALUE* (see the
    **CONFIGURATION** section).

With the "**run**" action:

**-f**, **\--force**

:   Run the item regardless of whether it has been marked as disabled.

**-S**, **\--strict**

:   Refuse to run the item if the **CheckLockFile** cannot be opened,
    the **MetricsDir** cannot be written to, the item lock cannot be
    opened, or a file used in an **OutputMap** cannot be opened.
    Normally, if any of these occur, a warning is produced, and the item
    runs anyway.

With the "**list**" action:

**-e**, **\--enabled**

:   Only list items which are currently enabled.

**-d**, **\--disabled**

:   Only list items which are currently disabled.

**-i**, **\--info**

:   Show extra information about each item - its schedule, success
    interval, last completion time, and any faults.

With the "**list**", "**update**", and "**faults**" actions:

**-a**, **\--all-users**

:   Look at items for all users, not just the current one. See the
    **NOTES** section.

# ITEM DEFINITIONS

The settings for an item are defined in a file named *ITEM.cf* under the
user\'s item definition directory, by default */etc/scw/items/USER/*
(see the **FILES** section).

When an item is defined in this way, its **Command** setting should be
provided, to state what to run.

Alternatively, shell or Perl scripts can be placed directly into the
directory, named *ITEM.sh* or *ITEM.pl*, so long as their first comment
block at the top of the file contains the item settings prefixed with
"**scw**", like this:

>     #!/bin/sh
>     #
>     # Perform action ABC.
>     #
>     # scw Description = Do ABC
>     # scw Schedule = 0 2 * * Mon
>     # scw MaxRunTime = 30 minutes
>     # scw SuccessInterval = 1 day 30 minutes
>     # 

This mechanism allows other packages to drop their own scheduled
commands directly into this framework, and so long as they run
"*scw update -a*" after installation, the items will be scheduled and
configured correctly.
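
As a rough illustration of how settings sit in a script header, the
sketch below prints the "**scw**" lines found in the leading comment
block of a file. The function name *extract_scw_settings* is
hypothetical, and this is not how **scw** itself is implemented; it
assumes that the comment block ends at the first line which does not
start with "**\#**".

```shell
# Hypothetical helper, not part of scw: print the "SETTING = VALUE" part
# of every "# scw ..." line in the leading comment block of the file $1.
extract_scw_settings() {
    awk '
        # Assumption: the first line not starting with "#" ends the block.
        !/^#/ { exit }
        # Keep only comment lines whose first word is "scw".
        /^#[ ]*scw[ ]/ {
            sub(/^#[ ]*scw[ ]+/, "")
            print
        }
    ' "$1"
}
```

Running it against the example script above would be expected to print
the four settings, one per line, starting with "*Description = Do ABC*".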

If an item has a script as well as a "*.cf*" file, the "*.cf*" settings
are applied first, and the **Command** is implied.

Defining items in this way allows software developers to include
scheduling information directly in their build artefacts, so that
packaging and deployment teams don\'t need to worry about a
developer-generated crontab running something with the wrong credentials
(assuming that the package has been configured to run the software under
its own user account).

# CONFIGURATION

Item definitions, and configuration files, share the same syntax:
*SETTING*=*VALUE* pairs, one per line. Blank lines are ignored, as are
comments (denoted by "**\#**"). Leading and trailing whitespace is
ignored.

Unless otherwise stated, each setting takes only one value, so setting
it again overrides whatever it was set to earlier. To remove a setting,
set it to an empty value.
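
The syntax rules above can be sketched as a small filter. This is an
illustration only, not the parser used by **scw**: the function name
*parse_settings* is hypothetical, comments are assumed to occupy whole
lines, and whitespace around the "**=**" is assumed to be ignored, as
the examples in this manual suggest.

```shell
# Hypothetical helper, not part of scw: read SETTING=VALUE lines on
# standard input and print "SETTING<TAB>VALUE" for each one.
parse_settings() {
    awk '
        # Leading and trailing whitespace is ignored.
        { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "") }
        # Blank lines and comment lines are ignored.
        $0 == "" || /^#/ { next }
        {
            eq = index($0, "=")        # split at the first "=" only
            if (eq == 0) next          # assumption: skip malformed lines
            name = substr($0, 1, eq - 1)
            value = substr($0, eq + 1)
            sub(/[ \t]+$/, "", name)
            sub(/^[ \t]+/, "", value)
            printf "%s\t%s\n", name, value
        }
    '
}
```

Splitting at the first "**=**" only means that a value such as
"*env A=1 /some/command*" is kept intact.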

## Placeholders

The following placeholders can be used in values:

> **{ITEM}**
>
> :   The name of this item.
>
> **{USER}**
>
> :   The username of the account currently running **scw**.
>
> **{HOSTNAME}**
>
> :   The network node hostname ("**uname -n**").
>
> **{DATE}**
>
> :   The date on which this instance of **scw** was invoked (before any
>     startup delays), in system local time, in YYYY-MM-DD format.
>
> **{COMMAND}**
>
> :   This item\'s **Command** value. This is typically only useful in
>     the **EmailSubject** setting.

A placeholder should only be used once in values for **UserConfigFile**
and **ItemsDir** - see the **NOTES** section.
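
To show what placeholder expansion produces, here is a minimal sketch
which substitutes the placeholders listed above into a value. It is not
taken from **scw**: the function name *expand_placeholders* and the
*PH\_* variable names are hypothetical, and "**id -un**" is assumed to
be an acceptable way to find the current username.

```shell
# Hypothetical helper, not part of scw.
# $1 = value containing placeholders, $2 = item name, $3 = item Command.
expand_placeholders() {
    PH_TEMPLATE=$1 PH_ITEM=$2 PH_COMMAND=${3-} \
    PH_USER=$(id -un) PH_HOSTNAME=$(uname -n) PH_DATE=$(date +%Y-%m-%d) \
    awk '
        # Replace every literal occurrence of "from" in "s" with "to".
        function subst(s, from, to,    out, i) {
            out = ""
            while ((i = index(s, from)) > 0) {
                out = out substr(s, 1, i - 1) to
                s = substr(s, i + length(from))
            }
            return out s
        }
        BEGIN {
            s = ENVIRON["PH_TEMPLATE"]
            s = subst(s, "{ITEM}", ENVIRON["PH_ITEM"])
            s = subst(s, "{USER}", ENVIRON["PH_USER"])
            s = subst(s, "{HOSTNAME}", ENVIRON["PH_HOSTNAME"])
            s = subst(s, "{DATE}", ENVIRON["PH_DATE"])
            s = subst(s, "{COMMAND}", ENVIRON["PH_COMMAND"])
            print s
        }'
}
```

For example, *expand_placeholders \'/var/log/scw/{ITEM}.log\' backup*
prints */var/log/scw/backup.log*. Note that this sketch uses the date
at the moment it is called, whereas **{DATE}** in **scw** refers to the
date on which **scw** was invoked.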

## Configuration settings

The following settings are available in configuration files:

**ItemsDir**

:   The directory in which to find item definitions and item scripts.
    The default value is usually */etc/scw/items/{USER}*.

**MetricsDir**

:   The directory in which to place metrics files for an item. The
    directory must already exist and be writable by the current user, or
    if it doesn\'t exist, its parent must be writable by the current
    user so that **scw** can create it. The default value is usually
    */var/spool/scw/{USER}/{ITEM}*.

**CheckLockFile**

:   The lock file to use when performing dependency or conflict checks.
    The default value is usually */var/spool/scw/{USER}/.lock*.

**Sendmail**

:   The command to use to send email. It should accept message headers
    and a message body on standard input. The default value is usually
    */usr/sbin/sendmail -FCronDaemon -i -odi -oem -oi -t*.

**TransmitForm**

:   The command to use to transmit messages encoded as a form message to
    an HTTP or HTTPS URL. When called, the environment variable
    *SCW_TIMEOUT* will be set to the item\'s **HTTPTimeout** value, the
    environment variable *SCW_FILE* will name a temporary file
    containing the data to post, and *SCW_URL* will contain the URL to
    post to. The default value is usually
    *curl -s -S -m \"\$SCW_TIMEOUT\" \--data-binary \"@\$SCW_FILE\" \"\$SCW_URL\"*.

**TransmitJSON**

:   The command to use to transmit messages encoded as a JSON array to
    an HTTP or HTTPS URL. The same environment variables are used as
    above. The default value is usually the same as the above, with the
    extra option *-H \"Content-Type: application/json\"* just before
    *\"\$SCW_URL\"*.

**UserConfigFile**

:   The per-user configuration file. When **scw** runs, it first loads
    the global configuration file, and then this one. The default value
    is usually */etc/scw/settings/{USER}.cf*. This setting can only be
    changed in the global configuration file.

**ItemListFile**

:   The file to which "*scw update*" will write a JSON array describing
    all items, which can be used by a monitoring system to discover what
    to monitor. The default value is usually
    */var/spool/scw/items.json*. This setting can only be changed in the
    global configuration file.

**CrontabFile**

:   The file to which "*scw update*" will write a crontab. The default
    value is usually */etc/cron.d/scw*. This setting can only be changed
    in the global configuration file.

**UpdateLockFile**

:   The lock file to use to ensure that only one instance of
    "*scw update*" is running at a time. The default value is usually
    */var/spool/scw/.update-lock*. This setting can only be changed in
    the global configuration file.

Configuration files may also contain any of the other settings listed
below for items, to set defaults which items can override.

## Item settings

The following settings are available for items.

**Description**

:   A short one-line description of what this item does. This is
    recorded in the item list file by "*scw update*".

**Command**

:   The command to run. This is passed to "*sh -c*".

**AmbiguousExitStatus**

:   If set to a number greater than zero, then if the **Command** exits
    with this status, it is treated as having neither succeeded nor
    failed: its *ended* metrics file will be updated, but neither the
    *succeeded* file nor the *failed* file will be altered.

    For example, use this when an item needs to be able to exit without
    an error if it can\'t run yet (such as if some necessary input
    isn\'t ready): it hasn\'t *failed* in that case, but it hasn\'t
    succeeded so you still want an alert if the **SuccessInterval** is
    reached.

**Schedule**

:   When to run this item. This takes time and date fields in the same
    format as **crontab**(5). An item can have up to 16 **Schedule**
    values. This is used by "*scw update*" when generating a crontab.

**RandomDelay**

:   Each time this item starts, there will be a random delay of up to
    this many seconds before continuing with checks and running the
    command.

    The time period can be specified in seconds, or with multiple
    numbers suffixed with "**w**" (weeks), "**d**" (days), "**h**"
    (hours), "**m**" (minutes), or "**s**" (seconds). Either the whole
    word can be given, or just the first letter. Spaces are optional.

    For example, "**1d5h7m6s**",
    "**1 day 5 hours 7 minutes 6 seconds**", and "**104826**" are all
    equivalent.

**MaxRunTime**

:   The maximum number of seconds to allow the command to run, after
    which it will be forcibly terminated. If this is not set, there is
    no limit. The same time period formatting rules as **RandomDelay**
    apply here.

**Prerequisite**

:   A command to run before attempting to run any item\'s **Command**.
    If the **Prerequisite** command exits with a non-zero status, the
    item is treated as if it was not scheduled to run at all, and so its
    command is not run.

    For example, a prerequisite command could check that this server is
    the active node of a failover cluster, so that scheduled commands
    only run on the active node.

    The output of the **Prerequisite** command is always discarded.

    **Note:** The **Prerequisite** command may be run multiple times for
    an item if any delays are involved, since it is invoked prior to any
    delay, and then again after any delay just before the item\'s
    **Command** is to be run.

**MinInterval**

:   The minimum number of seconds that must have elapsed since the item
    last ended (regardless of its success) before the next run is
    permitted. If this is not set, there is no limit. The same time
    period formatting rules as **RandomDelay** apply here.

    When the item is about to start but not enough time has passed since
    it last ended, it is treated as if it was not scheduled to run now.

    This can be useful with items that take a variable amount of time to
    run, or which are scheduled for a time which may cause the scheduler
    to misfire (such as during the repeated hour when clocks go back due
    to daylight saving time).

**SuccessInterval**

:   The number of seconds permitted between successful command runs
    before an alert should be raised. The same time period formatting
    rules as **RandomDelay** apply here.

    This is made available as a metrics file for your monitoring system
    to read - see the **Metrics** subsection under **FILES** - and also
    used by the "**faults**" action.

**ConcurrencyWait**

:   Instead of immediately abandoning the attempt to run this item if
    its previous run has not completed yet, wait up to this many
    seconds. The same time period formatting rules as **RandomDelay**
    apply here.

    If the previous run finishes within this time, the new run will
    proceed as normal.

**SilentConcurrency**

:   If the item was already running, and did not finish before the
    **ConcurrencyWait** timeout expired, then a second instance won\'t
    be started. By default, when this happens, the *overran* metrics
    file is created (unless **IgnoreOverrun** is true, see below); and
    no other metrics are affected. If the **SilentConcurrency** setting
    is "**no**", "**off**", "**false**", or "**0**", then this situation
    will instead be treated as if the command had run and failed.

**IgnoreOverrun**

:   As described above, if an item cannot start because a previous
    instance is still running, then the *overran* metrics file is
    created. If the **IgnoreOverrun** setting is "**yes**", "**on**",
    "**true**", or "**1**", then instead, the *overran* metrics file is
    never created.

    Use this setting for items that are expected to overrun, to avoid
    generating needless monitoring alerts.

    For example, an item which processes batches of incoming work could
    be scheduled to run every minute, and may occasionally take many
    minutes to run if a lot of work arrives at once. In this situation,
    an overrun alert is of no interest, and the **SuccessInterval** will
    be better for checking that the item is working properly and not
    overwhelmed.

**DependsOn**

:   The name of another item which must have successfully run since the
    previous run of this item. If the dependency is not met, this item
    will not run. An item can have up to 16 **DependsOn** values.

**DependencyWait**

:   If not all dependencies are met, keep waiting for them to be met for
    up to this many seconds. If, after waiting, the dependencies have
    been met, start the item as normal. The same time period formatting
    rules as **RandomDelay** apply here.

**SilentDependency**

:   By default, if an item\'s dependencies are not met, the item is
    treated as if it ran its command and it failed. If the
    **SilentDependency** setting is "**yes**", "**on**", "**true**", or
    "**1**", then this situation will instead be treated as if the item
    was not scheduled to be run at all, and no metrics will be updated.

**ConflictsWith**

:   The name of another item which must not be running at the same time
    as this item. If it is, this item will not run. An item can have up
    to 16 **ConflictsWith** values.

**ConflictWait**

:   If a conflicting item is running, keep waiting for it to finish for
    up to this many seconds. If, after waiting, the conflicts have been
    resolved, start the item as normal. The same time period formatting
    rules as **RandomDelay** apply here.

**SilentConflict**

:   By default, if an item can\'t start because of a conflict, it is
    treated as if it ran its command and it failed. If the
    **SilentConflict** setting is "**yes**", "**on**", "**true**", or
    "**1**", then this situation will instead be treated as if the item
    was not scheduled to be run at all, and no metrics will be updated.

**StatusMode**

:   A command can provide additional information about its progress
    through a status stream (see the **STATUS REPORTING** section). If
    **StatusMode** is set to "**fd**", then status information is read
    from the command\'s file descriptor 3. If it is set to "**stdout**"
    or "**stderr**", status information is derived from the command\'s
    standard output or standard error respectively, using the
    **StatusTag**.

**StatusTag**

:   When **StatusMode** is not "**fd**", any command output lines in the
    appropriate stream which start with the **StatusTag** will have that
    tag removed and the remainder will be used as the status
    information. See the **STATUS REPORTING** section.

**TimestampUTC**

:   By default, timestamps are expressed in the system\'s local time
    zone. If the **TimestampUTC** setting is "**yes**", "**on**",
    "**true**", or "**1**", then timestamps are expressed in UTC. Note
    that this only affects timestamps in the output - the **Schedule**
    always refers to the system\'s local time zone, as **cron**(8) does.

**HTTPInterval**

:   To improve efficiency when sending output to an HTTP or HTTPS URL
    (see below), lines are not sent immediately, but are collected and
    transmitted in batches, with this number of seconds between them.
    The same time period formatting rules as **RandomDelay** apply here.

**HTTPTimeout**

:   Terminate transmissions after this number of seconds. The same time
    period formatting rules as **RandomDelay** apply here.

**SharedSecret**

:   An optional secret string to use with HTTP or HTTPS transmissions to
    generate a validation hash.

    The values of the fields listed for the "**json**" output format in
    the "**Output mapping**" subsection below are concatenated, in the
    order shown, then this secret is appended, and the entire string is
    hashed with SHA-256.

    The resultant 64 hexadecimal digits are included in the transmission
    as an extra field named "*hash*".

    The receiver can validate the sender, if it knows this shared secret
    string, by performing the same calculation and comparing the hash.

    No hash is added if this setting is not provided.

**EmailMaxBodySize**

:   If the item output is larger than this, and the output is being sent
    by email, it will be sent as an attachment rather than as the body
    of the message. Values can be suffixed by "**K**" or "**M**" for
    kibibytes or mebibytes. A value of zero means that an attachment
    will always be used; a negative value or the word "**unlimited**",
    which is the default, means that an attachment will never be used.

**EmailBodyText**

:   When sending item output as an email attachment, place this text in
    the message body. The default is "*The output is attached.*".

**EmailAttachmentName**

:   When sending item output as an email attachment, give the attachment
    this filename. The default is "*output.txt*".

**EmailSender**

:   The value to use for the *From:* header when sending item output by
    email. The default is "*\"(Cron Daemon)\" \<{USER}\>*" - note the
    quotes around the displayed name. This matches the default behaviour
    of **cron**(8).

**EmailSubject**

:   The subject line to use when sending item output by email. The
    default is "*Cron \<{USER}@{HOSTNAME}\> {COMMAND}*". This matches
    the default behaviour of **cron**(8).

**OutputMap**

:   Map the command\'s output to a destination. See the "**Output
    mapping**" subsection below for more details. An item can have up to
    16 **OutputMap** values.

**ReceiverStrategy**

:   Which strategy to employ to receive command output. This must be one
    of "*pipe*", "*socket*", "*relay*", or the default, "*auto*". See
    the "**Receiver strategies**" subsection below for more details.
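
The time period format described under **RandomDelay**, and shared by
the other time settings, can be checked with a sketch like the one
below. The function name *period_to_seconds* is hypothetical and this
is not the parser used by **scw**; in particular it only looks at the
first letter of each unit, so it is more lenient than a real
implementation might be.

```shell
# Hypothetical helper, not part of scw: convert a time period such as
# "1d5h7m6s" or "1 day 30 minutes" to a number of seconds.
period_to_seconds() {
    printf '%s\n' "$1" | awk '
        {
            s = tolower($0)
            gsub(/ /, "", s)                 # spaces are optional
            if (s == "") exit 1
            total = 0
            # Peel off one number and its optional unit word at a time.
            while (match(s, /^[0-9]+[a-z]*/)) {
                tok = substr(s, 1, RLENGTH)
                s = substr(s, RLENGTH + 1)
                num = tok
                sub(/[a-z]+$/, "", num)
                unit = substr(tok, length(num) + 1, 1)
                if (unit == "w") mult = 604800
                else if (unit == "d") mult = 86400
                else if (unit == "h") mult = 3600
                else if (unit == "m") mult = 60
                else if (unit == "s" || unit == "") mult = 1
                else exit 1                  # unknown unit
                total += num * mult
            }
            if (s != "") exit 1              # trailing text not understood
            print total
        }'
}
```

With this sketch, "*period_to_seconds 1d5h7m6s*" prints 104826, which
is 86400 + 18000 + 420 + 6, matching the equivalence given under
**RandomDelay**.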

## Output mapping

The **OutputMap** settings take values of the form "*STREAM* *FORMAT*
*DESTINATION*", where *STREAM* selects one or more command output
streams, *FORMAT* selects how it should be formatted, and *DESTINATION*
selects where to send it to.

For example, an item might have this output map configuration, or it
could even be in the global configuration file as the default for this
server:

>     # Write stdout, stderr, and status, with timestamps, to a file
>     OutputMap = OES stamped /var/log/scw/{USER}/{ITEM}.log
>      
>     # Email stdout and stderr, without timestamps, to root@localhost
>     OutputMap = OE raw root@localhost
>      
>     # On failure, email stderr, without timestamps, to admin@company.com
>     OutputMap = !E raw admin@company.com
>      
>     # Write status messages to syslog as facility "user", level "notice".
>     OutputMap = S raw user.notice
>      
>     # Send status messages as JSON data via HTTPS POST
>     OutputMap = S json https://status.company.com/receiver

The *STREAM* is any combination of the following:

> **O**
>
> :   Standard output. "**-**" may also be used.
>
> **E**
>
> :   Standard error.
>
> **S**
>
> :   Status messages.
>
> **!**
>
> :   Spool until the command completes, and then only send to the
>     destination if the exit status was non-zero, indicating failure.

The *FORMAT* is one of the following:

> **raw**
>
> :   Lines of text exactly as output by the command.
>
> **stamped**
>
> :   Lines prefixed with a timestamp and, if multiple streams were
>     selected, an indicator of which stream they came from.
>
> **json**
>
> :   An array of JSON objects. Each object contains integers named
>     *epoch* and *pid*, and strings named *hostname*, *user*, *item*,
>     *stream*, and *message*; the *stream* will be one of "**stdout**",
>     "**stderr**", or "**status**". If a shared secret was provided, a
>     *hash* string will also be present (see **SharedSecret** above).
>
>     From version 0.7.4, an integer named *line* is also included in
>     each JSON object, providing the line number, but this is not
>     included in the hash calculation.
>
> **form**
>
> :   An HTTP form post (key=value pairs) containing the same fields as
>     the **json** format, each key being suffixed with a line number
>     counting from 1, such as *user1=root*. There is no separate *line*
>     field.
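
A receiver could recompute the *hash* field along the following lines.
This is a sketch under stated assumptions rather than a reference
implementation: the function names are hypothetical, the field values
are assumed to be joined with no separator, in the order *epoch*,
*pid*, *hostname*, *user*, *item*, *stream*, *message*, followed by the
secret, and the **sha256sum**(1) utility is assumed to be available.

```shell
# Hypothetical helpers, not part of scw.
# Arguments: epoch pid hostname user item stream message secret
scw_hash() {
    # Assumption: values are concatenated directly, with no separator.
    printf '%s%s%s%s%s%s%s%s' "$1" "$2" "$3" "$4" "$5" "$6" "$7" "$8" |
        sha256sum | cut -d ' ' -f 1
}

# Same arguments as scw_hash, plus the received hash as the ninth.
# Returns success only if the received hash matches the expected one.
verify_scw_hash() {
    expected=$(scw_hash "$1" "$2" "$3" "$4" "$5" "$6" "$7" "$8")
    [ "$expected" = "$9" ]
}
```

The *line* field is deliberately left out, since it is not included in
the hash calculation.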

The *DESTINATION* is a filename, a list of email addresses separated by
commas, a syslog priority in the form *facility*.*level*, or an HTTP or
HTTPS URL.

When the *DESTINATION* is a filename, it can be prefixed with a single
"**\>**" character to enable output checking. Normally, an output file
is opened when the item starts, and only closed at the end. With output
checking, the file will be closed and re-opened if it was rotated out
(by **logrotate**(8) or similar), so that the old file stops being used
as soon as possible. This does incur a small performance penalty, may be
unreliable over NFS, and there is a small risk that mixed streams may
arrive out of order if data arrives at the moment the file is rotated
out.

When the *DESTINATION* is an HTTP or HTTPS URL, the *FORMAT* may only be
"**json**" or "**form**". These formats may only be used with URLs and
no other destination types.

When the *DESTINATION* is one or more email addresses, the provided
value is used as the "**To:**" header in an email which is sent when the
command completes. No email is sent if there was no output at all. If
the *STREAM* contains "**!**", then the email will only be sent if the
command fails (exits with a non-zero status).
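
The rules above about which formats go with which destinations can be
summarised in a small checker. This is only a sketch: both function
names are hypothetical, and the way a destination is classified here
(URL by scheme, email by "**@**", file by "**/**", syslog by "**.**")
is an assumption, not a description of how **scw** decides.

```shell
# Hypothetical helpers, not part of scw.
classify_destination() {
    case "$1" in
        http://*|https://*) echo url ;;
        *@*)                echo email ;;
        */*)                echo file ;;    # includes ">" prefixed paths
        *.*)                echo syslog ;;  # facility.level
        *)                  echo file ;;
    esac
}

# $1 is an OutputMap value: "STREAM FORMAT DESTINATION".
# Prints the destination type, or fails if the combination is not allowed.
check_output_map() {
    read -r stream format dest <<EOF
$1
EOF
    [ -n "$dest" ] || { echo "expected: STREAM FORMAT DESTINATION" >&2; return 1; }
    case "$stream" in
        *[!OES!-]*) echo "unknown stream selector: $stream" >&2; return 1 ;;
    esac
    kind=$(classify_destination "$dest")
    case "$format:$kind" in
        # json and form are only for URLs; URLs accept nothing else.
        json:url|form:url) ;;
        raw:email|raw:file|raw:syslog) ;;
        stamped:email|stamped:file|stamped:syslog) ;;
        *) echo "format '$format' not allowed with $kind destination" >&2
           return 1 ;;
    esac
    printf '%s\n' "$kind"
}
```

A relative filename containing a dot but no slash would be taken for a
syslog priority by this sketch, which is one reason to prefer absolute
paths in examples like these.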

## Receiver strategies

For the output of the command to be mapped to the destinations in the
output map, it must be received by **scw** using one of the following
strategies, each with its own benefits and drawbacks:

**pipe**

:   The command is connected to **scw** using one pipe for each output
    stream. Although this is simple and uses the least resources, it is
    likely to experience lines from standard output and lines from
    standard error arriving out of order. This is because **scw** reads
    data from each pipe in turn until the pipe buffer is empty, so if
    one stream writes a large burst of data, all of it will be read
    before the next stream is checked.

**socket**

:   The command\'s output streams are connected to a local Unix socket,
    which **scw** receives messages from. This gets rid of the
    out-of-order streams problem, without using extra resources.
    However, it doesn\'t work properly on some non-Linux systems, may be
    subject to buffering issues on other non-Linux systems under high
    volume, and may fail to receive output from certain commands on
    Linux systems where SELinux is enforcing, due to some SELinux
    contexts blocking Unix socket writes.

**relay**

:   The command is connected to one relay subprocess per output stream,
    using pipes. Each subprocess relays the data from its pipe to a
    local Unix socket. This works around the issue with SELinux blocking
    some commands from writing directly to Unix sockets, but may be
    slightly more likely to send streams out of order when one stream
    writes a lot of data at once.

**auto**

:   On FreeBSD and OpenBSD, select "**socket**". On Linux, if SELinux is
    enforcing, select "**relay**", otherwise select "**socket**". On any
    other system type, select "**pipe**".

The default is "**auto**".
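
The selection made by "**auto**" can be pictured as the following
sketch. The function name *auto_receiver_strategy* is hypothetical, and
using "**uname -s**" and **getenforce**(8) to detect the platform and
the SELinux mode is an assumption; **scw** may detect these
differently.

```shell
# Hypothetical helper, not part of scw.
# $1 = operating system name (default: uname -s)
# $2 = SELinux mode (default: output of getenforce, or "Disabled")
auto_receiver_strategy() {
    os=${1:-$(uname -s)}
    selinux=${2:-$(getenforce 2>/dev/null || echo Disabled)}
    case "$os" in
        FreeBSD|OpenBSD)
            echo socket ;;
        Linux)
            if [ "$selinux" = "Enforcing" ]; then
                echo relay
            else
                echo socket
            fi ;;
        *)
            echo pipe ;;
    esac
}
```

Passing the system name and SELinux mode as arguments makes it easy to
see what would be chosen elsewhere, for example
"*auto_receiver_strategy Linux Enforcing*" prints "relay".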

If you have an item which is running on an SELinux enforcing system but
you know that everything it calls is able to write to Unix sockets, you
could set the **ReceiverStrategy** to "**socket**" to slightly reduce
overhead and improve performance.

If you have an item which generates a high volume of output and is
running on a FreeBSD or OpenBSD system, you could set the
**ReceiverStrategy** to "**pipe**" to avoid possible message buffering
problems, at the cost of increasing the risk of lines from different
streams being recorded out of order.

# STATUS REPORTING

When an item\'s command runs through several steps, it can pass details
of which step it\'s up to, and whether the previous step failed, to
**scw** using the status reporting mechanism.

The first word of the status report should be one of "**notice**",
"**ok**", "**warning**", or "**error**".

When running an item\'s command, **scw** inserts a special "**begin**"
status report at the start, and an "**end**" status report at the end.

Depending on the **StatusMode** and **StatusTag** settings, status
reporting might look like this:

>     #!/bin/sh
>     #
>     # scw StatusMode = fd
>      
>     printf '%s %s\n' 'notice' 'Starting step 1' >&3
>       # do some work here
>     if $succeeded; then
>         printf '%s %s\n' 'ok' 'Step 1 complete' >&3
>     else
>         printf '%s %s\n' 'error' 'Step 1 failed' >&3
>         exit 1
>     fi
>      
>     printf '%s %s\n' 'notice' 'Starting step 2' >&3
>       # do some more work here
>     # ...

Adding status information like this makes analysis easier, for example
discovering how the time taken for each step of a multi-step command
fluctuates, or clearly highlighting where a command failed.

Using a **StatusMode** of "**stdout**" or "**stderr**" means prefixing
the status message with the value of the **StatusTag**, like this:

>     #!/bin/sh
>     #
>     # scw StatusMode = stdout
>     # scw StatusTag = STATUS:
>      
>     printf 'STATUS: %s %s\n' 'notice' 'Starting step 1'
>     # ... and so on.

Writing status information this way may be easier than with the "**fd**"
method in some circumstances. It has the side effect that the status
messages will also be recorded in standard output or standard error
logs.
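
For scripts with many steps, the repeated *printf* calls can be wrapped
in a small function. The sketch below is for a **StatusMode** of
"**fd**"; the name *report_status* is hypothetical, and silently doing
nothing when file descriptor 3 is not open (such as when the script is
run by hand, outside **scw**) is a design choice of this sketch.

```shell
# Hypothetical helper for item scripts, assuming StatusMode = fd.
# Usage: report_status LEVEL MESSAGE...
report_status() {
    level=$1
    case "$level" in
        notice|ok|warning|error) ;;
        *) echo "report_status: unknown level: $level" >&2
           return 2 ;;
    esac
    shift
    # The subshell keeps a closed file descriptor 3 from stopping the
    # calling script; its error message is discarded.
    ( printf '%s %s\n' "$level" "$*" >&3 ) 2>/dev/null || true
}
```

A script would then call, for example,
"*report_status notice Starting step 1*" at the start of each step.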

# EXIT STATUS

The following exit status values apply to all actions:

**0**

:   Success: the action completed without error.

**5**

:   An unknown option, action, or setting was passed on the command
    line, or too many or too few arguments were provided for the chosen
    action. No action was taken.

**6**

:   The configuration file could not be read, or contains unrecoverable
    errors. No action was taken.

**7**

:   Some other error occurred that was not covered by any of the above.
    Action may have been partially completed.

## Exit status values for "run"

The "**run**" action can exit with one of the following:

**0**

:   Success: the action completed without error.

**1**

:   Item failed: the item\'s command was run, and it exited non-zero.

**2**

:   Item timed out: the item\'s command was run, but it reached its
    configured maximum run time, and was forcibly terminated.

**3**

:   The item\'s command ran successfully, but there was a problem
    accessing the metrics directory or a lock file, so no concurrency,
    dependency, or conflict checks were possible.

**4**

:   The item\'s command was run, and it exited non-zero, and there was a
    problem accessing the metrics directory or a lock file, so no
    concurrency, dependency, or conflict checks were possible.

**8**

:   Item has no command: the specified item has no entry in the item
    definition directory and there was no command in the remaining
    command line arguments to **scw**, so no command has been run.

**9**

:   Item not enabled: the item is currently disabled, and the
    "**\--force**" option was not provided, so the command has not been
    run.

**10**

:   Item prerequisites not met: the prerequisite check has failed, so
    the command has not been run.

**11**

:   Item dependencies not met: the items on which this item depends have
    not all run, so the command has not been run.

**12**

:   Item conflict: one of the items which this item conflicts with is
    currently running, and all options for startup delays have been
    exhausted, so the command has not been run.

**13**

:   Item already running: the item is already running, and all of its
    configured options for startup delays have been exhausted, so the
    command has not been run.

**14**

:   Not enough time has elapsed since the item\'s last run ended (the
    **MinInterval** constraint), so the command has not been run.

Note that any exit status lower than 5 indicates that the item\'s
command was definitely started; only an exit status of 0 or 3 indicates
that it succeeded.
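
A calling script may want to group these values rather than test each one. The sketch below follows the table above; the function name `run_outcome` and the four category words are illustrative choices, not anything **scw** itself provides.

```sh
#!/bin/sh
# Hypothetical helper, not part of scw: classify the exit status of
# "scw run" into one of four illustrative categories.
run_outcome () {
    case "$1" in
        0|3)
            echo succeeded ;;   # command ran and exited zero
        1|2|4)
            echo failed ;;      # command ran but failed or timed out
        5|6|7)
            echo error ;;       # usage, configuration, or other error
        8|9|10|11|12|13|14)
            echo skipped ;;     # command was not run
        *)
            echo unknown ;;     # not a documented value
    esac
}
```

For example, after running an item, `run_outcome "$?"` prints "skipped" if the item was disabled (exit status 9).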

## Exit status values for "status"

The "**status**" action exits with the sum of the following values:

**16**

:   Added if the item does **not** exist (meaning that it has no entry
    in the item definition directory).

**32**

:   Added if the item is disabled.

**64**

:   Added if the item is currently running.

For example, an exit status of 0 indicates that the item exists, is
enabled, and is not currently running. An exit status of 96 indicates an
item which is disabled, but currently running.
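
Because the value is a sum of distinct powers of two, each condition can be recovered with a bitwise test. A minimal sketch, in which the function name `describe_item_status` and its output format are illustrative only:

```sh
#!/bin/sh
# Hypothetical helper, not part of scw: decode the exit status of
# "scw status ITEM" into its three component flags.
describe_item_status () {
    code="$1"
    exists=yes
    enabled=yes
    running=no
    [ $(( $code & 16 )) -ne 0 ] && exists=no    # 16: item does not exist
    [ $(( $code & 32 )) -ne 0 ] && enabled=no   # 32: item is disabled
    [ $(( $code & 64 )) -ne 0 ] && running=yes  # 64: item is running
    printf 'exists=%s enabled=%s running=%s\n' "$exists" "$enabled" "$running"
}
```

Called with 96, this prints `exists=yes enabled=no running=yes`, matching the example above.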

## Exit status values for "faults"

The "**faults**" action exits with zero, regardless of whether there
were any faults.

# FILES

File locations may be adjusted by the installation process, so for
example paths listed here under */etc* may be under */usr/local/etc* on
your system. Locations may also be overridden by configuration settings.

*/etc/scw/default.cf*

:   Global default settings.

*/etc/scw/settings/USER.cf*

:   Settings to apply when running as user *USER*.

*/etc/scw/items/USER/\*.cf*

:   Item definitions for user *USER*.

*/etc/scw/items/USER/\*.sh*, */etc/scw/items/USER/\*.pl*

:   Item scripts for user *USER*, with their definitions embedded in a
    comment block at the top of the script (see the **ITEM DEFINITIONS**
    section).

*/var/log/scw/USER/\*.log*

:   The default location for log files generated by items owned by
    *USER*.

*/var/spool/scw/USER/ITEM/*

:   Metrics files for the item *ITEM* owned by the user *USER*. See the
    **Metrics** section below for details.

*/var/spool/scw/USER/.lock*

:   An empty file used for locking while checking for dependencies and
    conflicts.

*/var/spool/scw/items.json*

:   A JSON array describing all items, suitable for using as a Zabbix
    low-level discovery file. Updated by "*scw update*".

*/etc/cron.d/scw*

:   The default crontab written by "*scw update*".

*/var/spool/scw/.update-lock*

:   An empty file used for locking while running "*scw update*".

## Metrics

The metrics directory for an item can contain these files:

*disabled*

:   An empty file whose presence indicates that the item is disabled,
    and whose last-modification time indicates when it was disabled.

*success-interval*

:   The number of seconds permitted between successful command runs
    before an alert should be raised, followed by a newline. Monitoring
    systems should be instructed to raise an alert if the *succeeded*
    file\'s last-modification time is more than *success-interval*
    seconds ago **and** the *prerequisites-met* file exists.

*prerequisites-met*

:   An empty file which is created if the item\'s prerequisites are met,
    and deleted if they are not. Its last-modification time indicates
    when the prerequisites were last successfully checked.

*delay*

:   The number of seconds that this item is to be delayed by, followed
    by a newline. This file is created when the item is invoked with a
    non-zero **RandomDelay**, and is deleted as soon as the delay is
    complete - before the item actually starts.

*started*

:   An empty file whose last-modification time indicates when the item
    last started. It is not updated until the item\'s command actually
    starts running (so, after any startup delays).

*ended*

:   An empty file whose last-modification time indicates when the item
    last ended after running the item\'s command, regardless of whether
    the command succeeded. Its last-modification time is not updated
    unless the command actually ran.

*succeeded*

:   An empty file whose last-modification time indicates when the
    item\'s command last ran and ended with a zero exit status. It is
    **not** deleted on failure.

*failed*

:   An empty file which is created when the item\'s command runs and
    ends with a non-zero exit status. Its last-modification time
    indicates when the command **first** failed - it is not updated when
    subsequent runs fail. This file **is deleted** as soon as the
    command runs and exits with a zero exit status.

*overran*

:   An empty file which is created when the item could not run because
    it was already running and all startup delay options were exhausted.
    Its last-modification time indicates when this **first** happened.
    It **is deleted** the next time the item is able to start.

    This file is never created if the **IgnoreOverrun** setting is
    enabled.

*run-time*

:   The number of seconds the item\'s command most recently took to run,
    followed by a newline. It is updated each time the item\'s command
    finishes a run, not counting any startup delays, and regardless of
    the command\'s exit status.

*.lock*

:   An empty file which the item will lock while running.

*pid*

:   While the item is running, this file contains the item\'s process
    ID, followed by a newline. The file **is deleted** on exit.

*last-status*

:   The status message most recently reported by the item (see the
    **STATUS REPORTING** section).
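
The alerting rule described under *success-interval* can be sketched as a shell check for a monitoring system that runs scripts. This is an illustration, not the alerting template shipped with **scw**: the function name `item_is_stale` is hypothetical, and reading the modification time with `stat -c %Y` assumes GNU coreutils. An item that has never succeeded is treated as not stale.

```sh
#!/bin/sh
# Hypothetical helper, not part of scw. Returns 0 (true) if an alert
# should be raised for the metrics directory given as $1.
# Assumes GNU stat(1) for "stat -c %Y" (modification time in seconds).
item_is_stale () {
    dir="$1"
    # No alert without a success interval, met prerequisites, and at
    # least one previous success.
    [ -f "$dir/success-interval" ] || return 1
    [ -e "$dir/prerequisites-met" ] || return 1
    [ -e "$dir/succeeded" ] || return 1
    interval=$(cat "$dir/success-interval") || return 1
    case "$interval" in
        ''|*[!0-9]*) return 1 ;;    # not a plain number of seconds
    esac
    last=$(stat -c %Y "$dir/succeeded") || return 1
    now=$(date +%s)
    # Stale if the last success is more than $interval seconds ago.
    [ $(( now - last )) -gt "$interval" ]
}
```

It would be called with the path of one item's metrics directory, for example a directory under */var/spool/scw/USER/*.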

# NOTES

When any receiver strategy other than "*pipe*" is in use, commands will
always appear to take at least 1 second to complete, because **scw**
waits for output to cease after the process has ended. This ensures that
any remaining buffered data in the Unix socket is not lost when the
command terminates.

Time periods such as **MaxRunTime** may be written in seconds, or as any
combination of weeks, days, hours, minutes, and seconds, each number
suffixed with the unit. Spaces are allowed between each component of the
time period, but no other words or punctuation. These are all
equivalent:

>     MaxRunTime = 2 weeks 3 days 10 hours 5 minutes 2 seconds
>     MaxRunTime = 2w 3d 10h 5m 2s
>     MaxRunTime = 2w3d10h5m2s
>     MaxRunTime = 17 days 605 minutes 2 seconds
>     MaxRunTime = 1505102 seconds
>     MaxRunTime = 1505102

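The equivalence of those forms can be checked with a small converter. The sketch below is not the parser used by **scw**; `period_to_seconds` is a hypothetical name, and accepting singular unit names such as "week" is an assumption, since only the spellings in the examples above are documented here.

```sh
#!/bin/sh
# Hypothetical helper, not part of scw: convert a time period to seconds.
# Accepts the unit spellings shown in the examples, plus singular forms
# (the singular forms are an assumption).
period_to_seconds () {
    printf '%s\n' "$*" | awk '
    {
        s = tolower($0)
        gsub(/[ \t]+/, "", s)          # spaces between components are allowed
        if (s == "") exit 1
        if (s ~ /^[0-9]+$/) {          # a bare number is already in seconds
            printf "%d\n", s
            exit 0
        }
        total = 0
        while (s != "") {
            if (!match(s, /^[0-9]+/)) exit 1    # each component needs a number
            n = substr(s, 1, RLENGTH) + 0
            s = substr(s, RLENGTH + 1)
            if (!match(s, /^[a-z]+/)) exit 1    # ... followed by a unit
            u = substr(s, 1, RLENGTH)
            s = substr(s, RLENGTH + 1)
            if      (u ~ /^(w|weeks?)$/)   total += n * 604800
            else if (u ~ /^(d|days?)$/)    total += n * 86400
            else if (u ~ /^(h|hours?)$/)   total += n * 3600
            else if (u ~ /^(m|minutes?)$/) total += n * 60
            else if (u ~ /^(s|seconds?)$/) total += n
            else exit 1                         # unknown unit
        }
        printf "%d\n", total
    }'
}
```

Each of the six values shown above converts to 1505102 with this function.
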
An item\'s **Prerequisite** command is run when "**scw run**" starts. If
a delay then arises due to concurrency locks, dependencies not being met
yet, or conflicting items still running, the **Prerequisite** command is
run again, in case conditions have changed. The command should therefore
be chosen carefully so that it can safely be run twice per item, and
ideally has no side effects.

The name of an item may contain only letters, numbers, underscores, and
hyphens.
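
A deployment script can check a proposed name before creating the item. A minimal sketch, in which `valid_item_name` is a hypothetical name; restricting letters to ASCII and rejecting an empty name are both assumptions:

```sh
#!/bin/sh
# Hypothetical helper, not part of scw: succeed only if $1 consists
# solely of ASCII letters, digits, underscores, and hyphens.
# Rejecting the empty string is an assumption.
valid_item_name () {
    case "$1" in
        ''|*[!A-Za-z0-9_-]*) return 1 ;;
        *) return 0 ;;
    esac
}
```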

Item settings files and scripts in **ItemsDir**, the per-user
configuration file in **UserConfigFile**, and the global configuration
file, must be normal files, and must not be symbolic links.
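
Since `test -f` follows symbolic links, a pre-deployment check for this rule needs a separate `-L` test. A minimal sketch, with the hypothetical name `is_plain_file`:

```sh
#!/bin/sh
# Hypothetical helper, not part of scw: succeed only if $1 is a
# regular file and is not itself a symbolic link.
is_plain_file () {
    [ -f "$1" ] && [ ! -L "$1" ]
}
```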

Each setting which takes multiple values - **Schedule**, **DependsOn**,
**ConflictsWith**, **OutputMap** - is limited to 16 values in total
after applying rules from all relevant sources. For example, if the
global configuration defines 3 **OutputMap** values, an item may only
add 13 more unless it first clears the list by assigning an empty value
to **OutputMap**.

The **CrontabFile** written by "*scw update*" is in standard
**crontab**(5) format, unless the filename starts with */etc/cron.d/*,
or the "**\--all-users**" option was passed, in which case the system
crontab format is used, where each command is preceded by the username
of the user to run it as. This allows "*scw update -a*" to write a
system-wide crontab for all users, so applications which run under their
own user ID can have their packages place their schedules under
*/etc/scw/items/USER/*, and they will be run as *USER*.

The **Schedule** settings from each item are checked with a validator
before being added to the crontab, but it is possible that the validator
may accept a schedule that the cron daemon will not. When using crontabs
written by "*scw update*", ensure that monitoring is in place so that if
**cron**(8) rejects the entire crontab due to an invalid entry, you will
be alerted that items are not running.

When using the "**list**" and "**update**" actions with the
"**\--all-users**" option, users are enumerated first, and then each
user\'s items are enumerated. Users are found by replacing *{USER}* in
the **UserConfigFile** setting with a "**\***" and using it as a
**glob**(7) pattern, to find all users with their own distinct
configuration. Then, *{USER}* in each **ItemsDir** setting (its global
value, and any new values found in user config files) is replaced with a
"**\***" and used as a **glob**(7) pattern as well. This means that the
"**\--all-users**" option will not work properly if **UserConfigFile**
uses the placeholder twice in its value, and neither the "**list**" nor
the "**update**" action will work properly if **ItemsDir** uses the
placeholder twice, so values like */opt/{USER}/settings-{USER}.cf* are
not recommended.
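
The enumeration step can be illustrated in shell. This is a sketch of the idea only, not the code **scw** uses: `list_users` is a hypothetical name, and recovering each user name by stripping the text before and after the placeholder is an assumption about how a match maps back to a user.

```sh
#!/bin/sh
# Hypothetical helper, not part of scw: list the users matched by a
# path template containing a single {USER} placeholder.
list_users () {
    template="$1"
    placeholder='{USER}'
    prefix=${template%%"$placeholder"*}    # text before the first {USER}
    suffix=${template#*"$placeholder"}     # text after the first {USER}
    # Replace the placeholder with "*" and use the result as a glob.
    for path in "$prefix"*"$suffix"; do
        [ -e "$path" ] || continue         # an unmatched glob stays literal
        user=${path#"$prefix"}
        user=${user%"$suffix"}
        printf '%s\n' "$user"
    done
}
```

With a template such as */etc/scw/settings/{USER}.cf*, each matching file name yields one user. With two placeholders the suffix would itself contain *{USER}*, so the glob could not match real paths, which illustrates why such values are not recommended.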

Transmission over HTTP and HTTPS is implemented by calling **curl**(1),
which must be in the path. Its output is discarded, and errors are
written to stderr.

The value of the **EmailAttachmentName** setting must only contain 7-bit
ASCII characters and no double quotes, as it is not escaped.

Values of the **EmailSender** and **EmailSubject** settings must be
restricted to 7-bit ASCII for the same reason, although they do permit
double quotes.

By design, the alerting template, and the "**faults**" action, will not
report errors about items that have not succeeded if they have no
**SuccessInterval** defined; or if they have a **Prerequisite** command
which fails; or if they have never yet succeeded at all.

When **AmbiguousExitStatus** is used, "**scw run**" will exit 0 if the
command exits with an ambiguous status, since it completed without
error.

# EXAMPLES

The setup guide contains several examples. This is usually installed as
*/usr/share/doc/scw/SETUP.md*.

# REPORTING BUGS

Please report bugs or feature requests via the issue tracker linked from
the [**scw** home page](https://ivarch.com/p/scw).

# SEE ALSO

**crontab**(5), **cron**(8), **curl**(1)

# COPYRIGHT

Copyright © 2024-2025 Andrew Wood.

License GPLv3+: [GNU GPL version 3 or
later](https://www.gnu.org/licenses/gpl-3.0.html).

Includes SHA-256 functions which are copyright © 2021 Alain Mosnier,
licensed under the Zero Clause BSD License.

This is free software: you are free to change and redistribute it. There
is NO WARRANTY, to the extent permitted by law.
