NAME

pmx-qdigest - Generate digests of quarantined email


SYNOPSIS

   pmx-qdigest                          # send configured digests
   pmx-qdigest --quiet                  # good for scheduled jobs and scripts
   pmx-qdigest --verbose                # be verbose
   pmx-qdigest --dry-run                # test digest generation
   pmx-qdigest --min 1000 --max 2000    # specify which messages to include
   pmx-qdigest --earliest "YYYY-MM-DD hh:mm:ss" --latest "YYYY-MM-DD hh:mm:ss"
       # specify messages by quarantine time and date
   pmx-qdigest --digest=DIGEST          # specify which digest to send
   pmx-qdigest --addr=ADDRESS           # specify which address(es) to include
   pmx-qdigest --scan-only              # scan the quarantine, don't send mail
   pmx-qdigest --send-only              # send mail, don't scan the quarantine
   pmx-qdigest --dump                   # dump the state database
   pmx-qdigest --index                  # include the most recent messages
   pmx-qdigest --no-index               # skip recent messages
   pmx-qdigest --help                   # help on options


DESCRIPTION

The pmx-qdigest program generates 'digests' of quarantined messages. Digests contain a summary listing of messages that have been quarantined by PureMessage. These listings are sent to the users to whom the messages were originally addressed. The pmx-qdigest program places messages in the queue; messages in the queue are delivered by the pmx-queue program.

Users can release their quarantined messages by replying to digests; the pmx-qdigest-approve program handles the release requests.

There are two digest generation modes: local and centralized. In local mode, pmx-qdigest only scans messages in the filesystem-based quarantine that is local to the machine running the pmx-qdigest program. In this mode of operation, the digest state information (such as digest ID and the time pmx-qdigest runs) is also stored on the local machine. Multiple PureMessage servers can be configured to generate local digests.

In centralized mode, the pmx-qdigest program generates digests based on messages stored in the centralized, DBMS-based message quarantine. Digest state is stored in the central database as well. Only one machine on the network should be configured to run centralized digests.

The digest mode is set by the centralized parameter in etc/pmx-qdigest.conf. It accepts true or false; the default is false (local mode).

Depending on the mode of operation, the pmx-qdigest program accepts different options for specifying message selection criteria. See Options for more information.

Options

There are several options for debugging digests or generating one-time digests:

-q
--quiet
Suppresses all nonerror output. This is a good idea in scheduled jobs or scripts.

-v
--verbose
Normally, pmx-qdigest prints one line per digest it sends:
   Sending digest 'spam' for <harp@music.net>: 10 messages
   Sending digest 'spam' for <guitar@music.net>: 1 message
   ...

With one --verbose option:

   pmx-qdigest: scanning messages starting from 1150
   pmx-qdigest: scanned 942 messages.
   Sending digest 'spam' for <harp@music.net>: 10 messages
   Sending digest 'spam' for <guitar@music.net>: 1 message

If you repeat it, the output gets even noisier:

   pmx-qdigest: scanning messages starting from 1150
   pmx-qdigest: message 1150
   pmx-qdigest: including message 1150 in digest for harp@music.net
   pmx-qdigest: message 1151
   pmx-qdigest: including message 1151 in digest for guitar@music.net
   pmx-qdigest: including message 1151 in digest for harp@music.net
   pmx-qdigest: message 1152
   pmx-qdigest: including message 1152 in digest for harp@music.net
   ...
   pmx-qdigest: scanned 942 messages.
   Sending digest 'spam' for <harp@music.net>: 10 messages
   Sending digest 'spam' for <guitar@music.net>: 1 message

-n
--dry-run
Implies --verbose. Does not actually send any email or write digest files into var/digest/pending.

--min
--max
Specify those messages (by message ID) that should be selected from the message store. These options are available only in local mode. By default, the selection starts at:
   --min 'last_digest_id'+1

--earliest ``YYYY-MM-DD hh:mm:ss''
--latest ``YYYY-MM-DD hh:mm:ss''
Specify those messages (by quarantine time) that should be selected from the message store. These options are available only in centralized mode of operation. By default, all the messages quarantined after the last message scanned in the previous invocation of the pmx-qdigest program are scanned.
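The two selection modes can be sketched in Python as follows. This is an illustration, not pmx-qdigest's implementation. It assumes that the --min/--max and --earliest/--latest bounds are inclusive. It also assumes that "last scanned" is a single timestamp that still applies when only --latest is given. The function names are hypothetical.

```python
from datetime import datetime

# Timestamp format accepted by --earliest and --latest (see SYNOPSIS).
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def select_by_id(message_ids, last_digest_id, min_id=None, max_id=None):
    """Local mode: keep IDs in [min_id, max_id] (bounds assumed inclusive).

    min_id defaults to last_digest_id + 1, mirroring the --min default.
    """
    if min_id is None:
        min_id = last_digest_id + 1
    return [m for m in message_ids
            if m >= min_id and (max_id is None or m <= max_id)]


def select_by_time(messages, last_scanned, earliest=None, latest=None):
    """Centralized mode: keep IDs from (id, quarantine_time) pairs.

    Without --earliest, only messages quarantined after last_scanned qualify
    (assumption: last_scanned also applies when only --latest is given).
    """
    lo = datetime.strptime(earliest, TIME_FORMAT) if earliest else None
    hi = datetime.strptime(latest, TIME_FORMAT) if latest else None
    selected = []
    for msg_id, qtime in messages:
        too_early = qtime < lo if lo is not None else qtime <= last_scanned
        too_late = hi is not None and qtime > hi
        if not (too_early or too_late):
            selected.append(msg_id)
    return selected
```

For example, select_by_id(ids, 1149) keeps only IDs from 1150 upward, which matches the --min 'last_digest_id'+1 default.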

--digest DIGEST
Specifies the digest to generate. Normally, all digests defined in etc/pmx-qdigest.conf are generated.

--addr ADDRESS
Specifies the address or addresses for which to generate digests. Repeat the option to add multiple addresses. The syntax is the same as in the etc/digest-users configuration file.

If --addr is specified without --digest, then all configured digests for which the addresses are members are generated.

If --addr and --digest are both specified, then every specified digest is generated for every matching user, regardless of whether the users are subscribed to the digests.

For example, this generates all of tony@example.com's digests:

   pmx-qdigest --addr tony@example.com

This generates a spam digest for all subscribers:

   pmx-qdigest --digest spam

This generates a spam digest for mary@example.com, even if Mary is not subscribed to the spam digest:

   pmx-qdigest --addr mary@example.com --digest spam
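The rules for combining --addr and --digest can be modeled in a small Python sketch. It assumes fnmatch-style wildcards as a stand-in for PureMessage's own address matching. Following the note that only users with quarantined messages receive digests, it considers only those recipients. The function and parameter names are hypothetical.

```python
from fnmatch import fnmatchcase


def plan_digests(recipients, members, addrs=None, digests=None):
    """Return sorted (address, digest) pairs that one run would generate.

    recipients: addresses that have messages in the quarantine
    members:    digest name -> list of member patterns (e.g. from digest-users)
    addrs:      patterns given with --addr, or None
    digests:    names given with --digest, or None
    """
    def matches(address, patterns):
        return any(fnmatchcase(address, p) for p in patterns)

    # With both options, subscription is ignored for the named digests.
    ignore_membership = bool(addrs) and bool(digests)
    pairs = []
    for name in digests or members:
        for address in recipients:
            if addrs and not matches(address, addrs):
                continue
            if not ignore_membership and not matches(address, members.get(name, [])):
                continue
            pairs.append((address, name))
    return sorted(pairs)
```

With members = {"spam": ["tony@example.com"], "offensive": ["*@example.com"]}, passing addrs=["mary@example.com"] and digests=["spam"] yields a spam digest for Mary even though she is not subscribed to it.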

--scan-only
--send-only
By default, pmx-qdigest scans the quarantine for any messages that have not yet been scanned, accumulating digests for users, and then sends mail based on that data. These options let you restrict pmx-qdigest to one of those tasks.

--dump
Print the contents of the state database in a human-readable format. The state database tracks what pmx-qdigest has already done by user and by digest.

Output is unsorted, because the database quickly grows too large to sort efficiently. If you want sorted or filtered output, pipe the results through a tool such as sort or grep:

   pmx-qdigest --dump | sort
   pmx-qdigest --dump | grep spam

The output has four columns: user, digest, last_digest_id, and the time the user's last digest email was sent.

Output might look like this:

    @ spam 1149
    @ offensive 1149
    foo@example.com spam 1154 1056771578
    foo@example.com offensive 1154

The special @ user means ``any''. It tracks the last ID seen when pmx-qdigest is run without --addr or --min, and it supplies the default for the --min option.

While scanning the quarantine, pmx-qdigest adds a message to a user's digest cache only if the message's ID is higher than that user's last_digest_id. In the example database above, messages for foo@example.com would be skipped until the ID is 1155 or higher.
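The skip rule above can be expressed against --dump output in a short Python sketch. This page does not say what happens to a user who has no row of their own. The sketch assumes that such a user falls back to the special @ row. The function names are hypothetical.

```python
def parse_dump(text):
    """Parse `pmx-qdigest --dump` lines into {(user, digest): last_digest_id}."""
    state = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        user, digest, last_id = fields[0], fields[1], int(fields[2])
        state[(user, digest)] = last_id
    return state


def include_in_digest(state, user, digest, message_id):
    """A message joins a user's digest only if its ID exceeds last_digest_id.

    Assumption: a user with no entry of their own falls back to the '@' entry.
    """
    last_id = state.get((user, digest), state.get(("@", digest), 0))
    return message_id > last_id
```

Fed the sample output above, include_in_digest returns False for message 1154 and True for message 1155 for foo@example.com's spam digest.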

--index
By default, the digester does not ensure that the messages most recently added to the quarantine are included in the digest. Setting --index ensures that messages quarantined since the last time pmx-qindex was run are included in the digest.

--no-index
Skips the most recently quarantined messages. This is already the default behavior, so the option is no longer required; it is retained so that existing scripts that use it do not break.

--help
Prints out usage information.

Configuration File

The pmx-qdigest program reads its configuration from etc/pmx-qdigest.conf. A simple configuration file might look like this:

 approve_addr = pmx-auto-approve
 date_format = us
 approve_tmpl = approve-failure.tmpl
 consolidate = none
 centralized = false
 <digest>
     expire = 5d
     <spam>
         template = digest-spam.tmpl
         # Messages quarantined for (only) the given reason(s) are included.
         <reason>
             spam
         </reason>
         members = digest-users
     </spam>
 </digest>

The approve_addr setting determines which email address is used as the ``Reply-To'' for generated digests. When users reply to their digests, the script listening at that address releases the requested messages.

The sample configuration file above specifies a single digest type (spam). Site administrators can specify additional digest types, but should avoid defining any that are not needed. Each digest type can increase resource consumption during digest runs, whether or not any quarantined messages of that type exist. For example, if the site does not quarantine offensive messages, remove the offensive digest type entry from the configuration file. This matters mainly for digests operating in centralized mode.

The date_format setting specifies the format of the date used to expand the %%SINCE%% template variable, which the default template uses in the Subject line of the digest. The default is %b %d %H:%M. Other values include us (mm-dd-yyyy, equivalent to %m-%d-%Y), uk (dd-mm-yyyy, equivalent to %d-%m-%Y), iso-8601 (yyyy-mm-dd, equivalent to %Y-%m-%d), or any conversion specifiers defined by the ANSI C standard (C89). Consult your system's strftime() manpage for further details.

For example:

 date_format = "%A, %B %d, %Y"

generates a date whose weekday and month names follow the pmx user's locale (e.g. mardi, mars 30, 2004 in a fr_FR locale).
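Python's strftime uses the same C conversion specifiers, so the named date_format values can be checked with the sketch below. The alias table mirrors the equivalents listed above. The function name is hypothetical. Output assumes the default C locale; under fr_FR, %A and %B would give the French names.

```python
from datetime import datetime

# Named date_format values and their strftime equivalents, as listed above.
DATE_ALIASES = {
    "us": "%m-%d-%Y",
    "uk": "%d-%m-%Y",
    "iso-8601": "%Y-%m-%d",
}
DEFAULT_FORMAT = "%b %d %H:%M"


def expand_since(when, date_format=None):
    """Render a datetime the way %%SINCE%% would for a given date_format."""
    if not date_format:
        return when.strftime(DEFAULT_FORMAT)
    # Named aliases map to fixed patterns; anything else is a raw pattern.
    return when.strftime(DATE_ALIASES.get(date_format, date_format))
```

For example, expand_since(datetime(2004, 3, 30, 14, 5), "uk") gives 30-03-2004.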

The approve_tmpl setting specifies the template used to generate messages sent to end users when quarantine message approval requests fail.

The consolidate parameter can be set to merged or none (default). With consolidate set to none, quarantine digests are generated for each email account that has messages addressed to it in the quarantine. However, some users have multiple email accounts, and prefer that quarantined messages for all accounts be included in one quarantine digest. To enable this type of consolidation, set this option to merged.

Address mapping for consolidation is configured in the notifications file. Create an entry that maps all the addresses that you want consolidated into a single digest on the left side, and the address of the digest recipient on the right. Use spaces to separate multiple addresses. Use a colon to separate the addresses that you wish to consolidate from the digest recipient address. For example:

 jane.doe@example.org jane_doe@*.example.org: jdoe@example.org

The digest recipient specified on the right side must also be included (either explicitly or as part of a wildcard match) in the Quarantine Digest Users list for the consolidated digest to be sent.
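The following Python sketch shows the notifications syntax and how an address is routed through it. It assumes that lines starting with # are comments and that the first matching rule wins. It uses fnmatch wildcards as a stand-in for PureMessage's own matching. The function names are hypothetical.

```python
from fnmatch import fnmatchcase


def parse_notifications(text):
    """Parse 'addr1 addr2 ...: recipient' lines into (patterns, recipient) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        left, right = line.rsplit(":", 1)
        rules.append((left.split(), right.strip()))
    return rules


def digest_recipient(rules, address):
    """Return the consolidated recipient for address (first matching rule wins)."""
    for patterns, recipient in rules:
        if any(fnmatchcase(address, p) for p in patterns):
            return recipient
    return address  # unmapped addresses receive their own digest
```

With the example entry above, both jane.doe@example.org and jane_doe@mail.example.org map to jdoe@example.org.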

The centralized setting controls the mode of digest operation. The default is ``false''. Set it to ``true'' to enable centralized digests (see above for information on the centralized and local modes of operation).

Each subsection in the digest section defines a new type of digest. By default, only the spam digest is defined; it sends digests to those users who have mail quarantined for the specified reasons.

Digests are generated for the members of a digest. By default, this list is defined in etc/digest-users. Each line of the file corresponds to an email address. Wildcard characters are supported. To suppress mail to a particular address, precede the address with an exclamation mark (!). To mail digests to all users in a given domain, use the following format:

 **@<Domain>

Note: Only users with messages in the quarantine receive digests.
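A digest-users list might be evaluated as in the Python sketch below. It rests on three assumptions. An ! exclusion overrides any inclusion, regardless of line order. Lines starting with # are comments. fnmatch wildcards approximate PureMessage's matching. The function names are hypothetical.

```python
from fnmatch import fnmatchcase


def load_members(text):
    """Split etc/digest-users lines into include and exclude pattern lists."""
    include, exclude = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("!"):
            exclude.append(line[1:])  # '!' suppresses mail to this pattern
        else:
            include.append(line)
    return include, exclude


def is_member(address, include, exclude):
    """Assumes an exclusion overrides any inclusion, regardless of line order."""
    if any(fnmatchcase(address, p) for p in exclude):
        return False
    return any(fnmatchcase(address, p) for p in include)
```

Given the lines *@example.com and !ceo@example.com, every example.com user except ceo@example.com is a member.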

Digest routing can be modified by changing the etc/notifications configuration file. Rather than always sending the digest to the member, pmx-qdigest first transforms addresses using the notifications map. This makes it possible to handle users who have several aliases.

Each time a digest is sent, a digest file is written to the var/digest/pending directory. The file's name is the digest's MD5 checksum, and the file contains the recipient of the digest and the message IDs that were included in the email. These files are used by the pmx-qdigest-approve script to release email to digest recipients.
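The following Python sketch writes such a pending file. It assumes the checksum is taken over the rendered digest text. The layout inside the file (recipient on the first line, one message ID per line after it) is a hypothetical choice, not the format pmx-qdigest-approve actually reads.

```python
import hashlib
import os
import tempfile


def write_pending(pending_dir, digest_text, recipient, message_ids):
    """Record a sent digest so replies can later be matched to message IDs.

    Assumed: the name is the MD5 of the digest text; the one-item-per-line
    layout inside the file is hypothetical.
    """
    name = hashlib.md5(digest_text.encode("utf-8")).hexdigest()
    path = os.path.join(pending_dir, name)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(recipient + "\n")
        for msg_id in message_ids:
            fh.write(f"{msg_id}\n")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_pending(tmp, "digest body", "harp@music.net", [1150, 1152]))
```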

Scalability

This version of pmx-qdigest is faster and uses less memory than previous versions. Rather than accumulating the digest information in memory, digest cache files are accumulated in the var/digest/cache directory. Each address found while scanning the quarantine gets its own cache file. After pmx-qdigest has finished scanning, it processes the cache files, turning each into a digest email and storing it in the queue.
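The two-phase design can be illustrated in Python. The scan phase appends to one file per address, and the send phase reads the files back one at a time, so memory stays bounded by a single address's data. Naming each cache file after its address, and deleting files once processed, are assumptions. The function names are hypothetical.

```python
import os
import tempfile


def scan_to_cache(cache_dir, matches):
    """Scan phase: append each (address, message_id) to that address's cache file."""
    for address, msg_id in matches:
        path = os.path.join(cache_dir, address)  # hypothetical file naming
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(f"{msg_id}\n")


def drain_cache(cache_dir):
    """Send phase: yield (address, message_ids) one cache file at a time."""
    for name in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, name)
        with open(path, encoding="utf-8") as fh:
            ids = [int(line) for line in fh if line.strip()]
        os.remove(path)  # assumed: processed cache files are discarded
        yield name, ids


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        scan_to_cache(tmp, [("harp@music.net", 1150), ("guitar@music.net", 1151)])
        for address, ids in drain_cache(tmp):
            print(address, ids)
```

Because drain_cache is a generator, each digest can be queued before the next cache file is read.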

Per-user Digest ID

The last_digest_id counter is now tracked per address and per digest. This lets a one-off request for a single address update only that address's counters. For example:

   pmx-qdigest --addr foo@example.com

generates the configured digests of which foo@example.com is a member, and updates foo@example.com's digest counters so that the next scheduled digest omits the messages already digested.

Digest Templates

Each digest is constructed using a template email message. This section describes the template format.

Here is a simple template:

 From: %%ADMIN_ADDR%%
 Subject: Quarantined spam messages %%SINCE%%
 MIME-Version: 1.0
 Date: %%SENT_DATE%%
 Content-Type: text/plain; charset=UTF-8
 Content-Transfer-Encoding: 8bit
 The following messages are spam. Delete the lines you don't want
 approved.
 %{
 H:Id Time Score From Subject
 @[[[[[ @<<<< @]]]] @<<<<<<<<<<<<<< @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
 V:id time reason from subjbody
 %}
 Thanks,
 %%ADMIN_ADDR%%

There are several interesting features:

Template Variables
The %%ADMIN_ADDR%% and %%SINCE%% are template variables that are expanded when the template is parsed. The following template variables are available:
%%ADMIN_ADDR%%
The administrator address, set in etc/pmx.conf.

%%SINCE%%
The first time pmx-qdigest runs for this user, expands to the current date. For subsequent runs, this expands to the date of the last digest sent to the user. The format is specified by date_format in the configuration file.

%%SENT_DATE%%
The current date formatted in RFC-822 format for display in an email header.

%%REPLY_TO%%
Only available in the template's From header. This is intended to support mail clients that do not respect the Reply-To header.

Here is an example of how to set the From header to the dynamically-generated Reply-To header:

 From: %%REPLY_TO%%

%%MIME_BOUNDARY%%
A valid MIME multipart boundary string (see RFC 2046). Using it to build multipart messages is slightly tidier than copying an actual MIME boundary into the template.

For example:

 From: %%ADMIN_ADDR%%
 MIME-Version: 1.0
 Content-Type: multipart/alternative;
         boundary="%%MIME_BOUNDARY%%"
 --%%MIME_BOUNDARY%%
 Content-Type: text/plain; charset=UTF-8
 Content-Transfer-Encoding: 8bit
 plain-text part
 --%%MIME_BOUNDARY%%
 Content-Type: text/html; charset=UTF-8
 Content-Transfer-Encoding: 8bit
 <html>html-text part</html>
 --%%MIME_BOUNDARY%%--

Currently, this expands to a fixed string, rather than being dynamically generated.

To support various data encodings, template variables also support lists of formatters. These are colon-separated words following the name of the template variable. For example, to html-encode the %%ADMIN_ADDR%% variable:

 %%ADMIN_ADDR:html%%

The following formatters are available:

html
HTML-encode the data.

upper
Format the data in ALL UPPERCASE.

lower
Format the data in all lowercase.

ascii
The data is transformed to ASCII. Any non-ASCII characters are replaced with a ``?'' character.

address
The email address portion of the data.

mail_name
The email name portion of the data.

h_utf8
The data is treated as a mail header and encoded into UTF-8. This formatter is unnecessary for the from, subject, and subjbody digest fields, which are already UTF-8 encoded.

For a list of all built-in templates and formatters, see PureMessage::Template.
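The %%NAME:fmt1:fmt2%% syntax can be approximated with a regular expression, as in the Python sketch below. It is not PureMessage::Template. The sketch assumes that formatters apply left to right and that unknown variables are left untouched. It approximates address and mail_name with email.utils.parseaddr and omits h_utf8.

```python
import html
import re
from email.utils import parseaddr

# Approximations of the formatters listed above (h_utf8 omitted).
FORMATTERS = {
    "html": html.escape,
    "upper": str.upper,
    "lower": str.lower,
    "ascii": lambda s: "".join(c if ord(c) < 128 else "?" for c in s),
    "address": lambda s: parseaddr(s)[1],
    "mail_name": lambda s: parseaddr(s)[0],
}

VARIABLE = re.compile(r"%%([A-Z_]+)((?::[a-z_0-9]+)*)%%")


def expand(template, variables):
    """Replace %%NAME:fmt...%% with the variable's value, applying formatters
    left to right (assumed order). Unknown variables are left as-is."""
    def substitute(match):
        name, chain = match.group(1), match.group(2)
        if name not in variables:
            return match.group(0)
        value = variables[name]
        for fmt in filter(None, chain.split(":")):
            value = FORMATTERS[fmt](value)
        return value
    return VARIABLE.sub(substitute, template)
```

For example, with ADMIN_ADDR set to Postmaster <pm@example.com>, %%ADMIN_ADDR:html%% expands to Postmaster &lt;pm@example.com&gt;.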

Digest Blocks
Digest blocks expand to the actual data in the quarantine, and are delimited by %{ and %}.
 %{
 line 1
 line 2
 line 3
 %}

Each line in the Digest Block is parsed separately. Blank lines are skipped. Other lines are parsed according to the following rules:

  1. Pre-Header

    This line is prefixed with P:, then emitted verbatim just before the digest table.

    Example:

     P:----start-of-digest----

  2. Header

    The header line is emitted before the data. It is prefixed with H:, and it should have the same number of fields as the format specifies.

    Example:

     H:Id Time Score From Subject

  3. Footer

    The footer line is emitted after all the data. It is prefixed with F:, and it should have the same number of fields as the format specifies.

    Example:

     F:Id Time Score From Subject

  4. Separator

    The separator is emitted between each row in the table. It is prefixed with S: and emitted verbatim.

    Example:

     S:--------------

  5. Variables

    The names of the data fields to output, in order, prefixed with V:. Each name is replaced with that field's data for every row.

    Example:

     V:id time reason from subjbody

    See Digest Fields for the list of known fields.

  6. Format

    The format line. A sequence of space-separated fields, some of which are expanded to contain the data.

    Any field that does not start with @ is emitted verbatim; otherwise the rest of the characters in the field describe the type of field.

    The different types of fields are:

    <
    A left-justified fixed-width field. The width is the number of < plus one.

    >
    A right-justified fixed-width field. The width is the number of > plus one.

    |
    A centre-justified fixed-width field. The width is the number of | plus one.

    [
    A left-justified expanding field. The minimum width is the number of [ plus one.

    ]
    A right-justified expanding field. The minimum width is the number of ] plus one.

    I
    A centre-justified expanding field. The minimum width is the number of I plus one.

    Example:

     @[[[[ @]]]]] @[[[[[ @<<<<<<< @<<<<<<<
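The width rules for format lines can be illustrated with a Python sketch. It fills one format line with a row of values supplied in V: order. The description above does not specify what happens to over-long values in fixed-width fields; the sketch assumes they are truncated. The function names are hypothetical.

```python
# Justification per field-type character, as described above.
FIXED = {"<": "left", ">": "right", "|": "center"}
EXPANDING = {"[": "left", "]": "right", "I": "center"}


def justify(text, width, how):
    if how == "left":
        return text.ljust(width)
    if how == "right":
        return text.rjust(width)
    return text.center(width)


def render_row(format_line, values):
    """Fill one format line with values (supplied in V: order)."""
    out = []
    remaining = iter(values)
    for field in format_line.split():
        if not field.startswith("@") or len(field) < 2:
            out.append(field)  # fields not starting with @ are emitted verbatim
            continue
        kind = field[1]
        width = len(field)  # number of type characters plus one for the '@'
        value = str(next(remaining, ""))
        if kind in FIXED:
            # Assumption: fixed-width fields truncate over-long values.
            out.append(justify(value[:width], width, FIXED[kind]))
        elif kind in EXPANDING:
            # Expanding fields pad to a minimum width but never truncate.
            out.append(justify(value, width, EXPANDING[kind]))
        else:
            out.append(field)
    return " ".join(out)
```

For example, render_row("@<< @[[", ["abcdef", "abcdef"]) gives "abc abcdef": the fixed field is cut to three characters, while the expanding field grows to fit.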

Digest Fields
The following digest fields are available for use in the digest template:
id
The quarantine id of the message. This is extracted from replies by the auto-approve script.

release_href
A URL that, when clicked, requests the release of the message. Use it only in HTML mail; it does not work in plain-text clients.

time
The time the message was received.

date
The date the message was received.

reason
The reason the message was quarantined. If the reason is ``spam'', the spam probability is displayed instead.

from
The message's From header. This field is always UTF-8 encoded.

envfrom
The envelope sender of the message.

envto
The envelope recipient to whom the message was addressed.

subject
The subject of the message. This field is always UTF-8 encoded.

subjbody
The subject, plus a few lines of the body if the subject is short. This field is always UTF-8 encoded.

size
The size of the message in ``human readable'' format; i.e. with a suffix of ``M'' for megabytes and ``K'' for kilobytes.
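A ``human readable'' size could be produced as in the Python sketch below. The thresholds and rounding are assumptions for illustration: 1K = 1024 bytes, rounding to a whole number, and no suffix below 1K. pmx-qdigest's actual output may differ. The function name is hypothetical.

```python
def human_size(num_bytes):
    """Sketch of a 'human readable' size with K/M suffixes.

    Assumptions: 1K = 1024 bytes, whole-number rounding, no suffix below 1K.
    """
    if num_bytes >= 1024 * 1024:
        return f"{round(num_bytes / (1024 * 1024))}M"
    if num_bytes >= 1024:
        return f"{round(num_bytes / 1024)}K"
    return str(num_bytes)
```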

In addition, you can apply all the same formatters to the digest fields as for Template Variables. The most useful of these is html, for escaping data in HTML sections.

For example:

 V:release_href reason:html time:html from:html subjbody:html


SEE ALSO

the pmx-qdigest-approve manpage, the pmx-qdigest-expire manpage, the pmx-queue manpage


COPYRIGHT

Copyright (C) 2000-2008 Sophos Group. All rights reserved. Sophos and PureMessage are trademarks of Sophos Plc and Sophos Group.