(Logo and goto: Prof. Timo Salmi, Department of Accounting and Finance, University of Vaasa, Finland) <http://www.netikka.net/tsneti/info/proctips.php>
URN:NBN:fi-fe20051953
Copyright © 1999- by Prof. Timo Salmi    
Last modified Wed 26-Oct-2016 09:57:00


 

Timo's procmail tips and recipes  

Although there already is an abundance of procmail material on the net, here are some of my own tips and observations. This tips page is a companion of my Foiling Spam with an Email Password System page. The items on this page are in no particular order. There is some overlap in the items.
  1. I want to filter my email automatically. How do I get started with procmail?
  2. Building a testbench. How can I test individual procmail recipes?
  3. I know how to make "and" rules in procmail recipes, but how do I make "or" rules?
  4. How can one perform multiple shell commands on the action line?
  5. How can I find out what the subject of a posting is?
  6. How do I get a copy of the headers of all the incoming email into a separate file?
  7. Would you give some further hints for spam foiling recipes?
  8. I have limited disk space. How can I truncate long messages?
  9. How can I quickly test if my rules with regular expressions match?
  10. How can I detect if the email comes, say, from the .com domain?
  11. What alternatives do I have to detect a sender all through the various header-fields?
  12. How can I extract a valid address from the Reply-To field?
  13. How can I extract the address of the sender's postmaster?
  14. How can I weed out an inordinately long recipient list?
  15. What is this procmail scoring? How can I utilize it?
  16. How can I test if the subject is empty or if the subject field is missing altogether?
  17. How can I modify the "To:" field of the email I received?
  18. I have a long list of spammers in a separate file. How can I utilize it?
  19. How do I forward certain messages that I get, and preserve myself a copy?
  20. How do I forward certain messages to two different addresses?
  21. How do I automatically return certain email messages?
  22. My address has changed. How do I forward a copy to myself and tell the sender?
  23. How can I set variable values based on the text in the body of the email message?
  24. How can I insert some token text in front of the body of incoming email?
  25. Do you have any useful tips for regular expression matching?
  26. How can I test if two procmail variables have the same contents?
  27. I am having difficulties with "<". How does one match it?
  28. How can I insert identification text to the beginning of the subject line?
  29. I tried out your tips, but some of them failed on my system. What next?
  30. Is there a cure for the echo and grep blues?
  31. How do I know which of my many procmail recipes has been enacted?
  32. How can I detect Korean, Cyrillic, or Chinese to avoid such frequent spam?
  33. How can I change the subject line and include part of the message body into it?
  34. How can I remove the signature from the incoming email?
  35. What unix manuals relating to procmail should I get?
  36. Is it possible to use procmail to call the vacation program?
  37. How can I avoid duplicate messages sent in rapid succession?
  38. How can I skip logging a certain, matched recipe?
  39. Could you please solve for me this procmail problem of mine?
  40. I liked this material. Do you have anything else on programming?
  41. Exercises
  42. Acknowledgements for useful advice and/or feedback
Web pages or any other reproduction: This page is copyrighted ©. No part of this material, nor its index, nor the entire contents may be reproduced (in any language) on any other World Wide Web pages or in any other electronic, physical or similar manner.

Quoting: However, you are free to quote brief passages from this page or to post links to the items in your messages and Usenet news postings provided you clearly indicate the source.

Asking for programming advice: Please see the item Could you please solve for me this procmail problem of mine?

Submitting: Contributions on the net are acknowledged further down on this page. Note, however, that submissions of outside items to this collection are not invited. This basically is my own collection, not an open repository.

Disclaimer: The author shall not be liable to the user for any direct, indirect or consequential loss arising from the use of, or inability to use, any information, rule, script, program or file, howsoever caused. No warranty is given that the information, rules, scripts, programs or the advice given will work under all circumstances or that they are current. You use everything at your own risk.

I want to filter my email automatically. How do I get started with procmail?
 
Unix email can conveniently be preprocessed with automatic filters such as procmail, the "Autonomous mail processor". This item repeats what already is presented about getting started in many of the other FAQs, including mine on spamfoiling. Nevertheless, this is so crucial that I'll try to give the essential outline also here.
 
Find out what your email directory is. Go ("cd") to the directory where your email folders are located and type "pwd". Assume in this item that you get "/home/myid/Mail". Further assume in the example that "/home/myid" is your home directory so that you can use the variable "${HOME}" to denote it.
 
Find out where your system's Bourne shell is located by typing "which sh". Assume that you get "/usr/bin/sh".
 
Prepare a "~/.procmailrc" file with a suitable editor. For example you might use "emacs ~/.procmailrc". To start with, put something like this into the ~/.procmailrc file:
#Preliminaries
SHELL=/usr/bin/sh               #Use the Bourne shell (check your path!)
MAILDIR=${HOME}/Mail            #First check what your mail directory is!
LOGFILE=${MAILDIR}/procmail.log
LOG="--- Logging ${LOGFILE} for ${LOGNAME}, "

#Whatever recipes you'll use
#The order of the recipes is significant
:0
* ^From: scam@cyberspam\.com
/dev/null

# Accept all the rest to your default mailbox
:0:
${DEFAULT}
For the "~/.procmailrc" file a read permission for the user him/herself will be sufficient. To make sure, give the command "chmod u+r ~/.procmailrc".
 
Find out where the "procmail" program is located on your system by typing "which procmail". Assume below that you get "/usr/local/bin/procmail". Also check what your id is: "whoami". Assume that you get "myid".
 
Next comes the crucial step. Put the following line in your "~/.forward" file. Include the quotes (") into the ~/.forward file contents.
"|IFS=' ' && exec /usr/local/bin/procmail || exit 75 #myid"
Set adequate permissions for accessing the "~/.forward" file: "chmod 644 ~/.forward". Lastly, check ("ls -lFd ~/") that your main directory permissions are at least (the equivalent of) "drwx--s--x". If not, "chmod u+rwx ~/" and "chmod og+x ~/".
 
You should now be set to go. To check, send an email to yourself to see if it gets through. If there is a problem see the advice on troubleshooting.
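 
Before sending the test message, the prerequisites listed in this item can also be checked mechanically. Below is a minimal sketch of such a check. The function name check_procmail_setup is hypothetical (it is not part of procmail), and it only looks at three things mentioned above: a readable ~/.procmailrc, a ~/.forward that mentions procmail, and search permission on the home directory.

```shell
#!/bin/sh
# Pre-flight check for a procmail setup (a sketch, not an official tool).
# check_procmail_setup is a hypothetical helper; pass it the home directory.
check_procmail_setup() {
  home_dir=$1
  status=0
  if [ ! -r "${home_dir}/.procmailrc" ]; then
    echo "missing or unreadable: ${home_dir}/.procmailrc"
    status=1
  fi
  if [ ! -r "${home_dir}/.forward" ]; then
    echo "missing or unreadable: ${home_dir}/.forward"
    status=1
  elif ! grep -q 'procmail' "${home_dir}/.forward"; then
    echo "${home_dir}/.forward does not mention procmail"
    status=1
  fi
  if [ ! -x "${home_dir}" ]; then
    echo "no search (x) permission on ${home_dir}"
    status=1
  fi
  if [ "${status}" -eq 0 ]; then
    echo "setup looks plausible"
  fi
  return "${status}"
}
# Usage: check_procmail_setup "${HOME}"
```

A clean result does not prove that mail will flow, so the test email to yourself is still needed.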
How can I test individual procmail recipes? I do not wish to disturb my regular ~/.procmailrc recipes file in the process.
 
There are several options. One method is building a simple test environment as follows. It is a very convenient method. If you apply it right, it allows the testing without affecting your normal flow of email in any way. Create the following "proctest" file, preferably at your path. Make it executable using "chmod u+x proctest". Thus you'll have a new command "proctest" available.
 
#!/bin/sh
#The executable file named "proctest"
#
# You need a test directory.
TESTDIR=/home/myid/test
if [ ! -d "${TESTDIR}" ] ; then
  echo "Directory ${TESTDIR} does not exist; first create it"
  exit 1
fi
#
#Feed an email message to procmail. Apply proctest.rc recipes file.
#First prepare a mail.msg email file which you wish to use for the
#testing.
procmail ${TESTDIR}/proctest.rc < mail.msg
#
#Show the results.
less ${TESTDIR}/Proctest.log
clear
less ${TESTDIR}/Proctest.mail
#
#Clean up.
rm -i ${TESTDIR}/Proctest.log
rm -i ${TESTDIR}/Proctest.mail

The beauty of this method is that besides "proctest.rc" you can easily edit also "mail.msg" for testing different kinds of incoming mail and the behavior of your recipes in various situations. Note, however, that it is best to test only for one email message at a time. In other words, do not put more than one email message into the mail.msg test file.
 
A question remains. Where does one get the structure of a posting for the "mail.msg" test posting? Easy. Invoke elm, select a suitable, existing posting, and make a copy of it by pressing C (capital C) and answering mail.msg at the "Copy message to:" prompt. Other mail programs probably have similar options.
 
Below is the proctest.rc recipe file which I used in preparing for this item:
SHELL=/bin/sh
TESTDIR=/home/myid/test
MAILDIR=${TESTDIR}
LOGFILE=${TESTDIR}/Proctest.log
LOG="--- Logging for ${LOGNAME}, "

#Troubleshooting:
VERBOSE=yes
LOGABSTRACT=all

#Let's test stripping lines from the email message's header
:0 fwh
| egrep -vi "(^Content-|^MIME-Version:.)"

#If it is from myself, store the email message
:0:
* $ ^From:.*${LOGNAME}
${TESTDIR}/Proctest.mail

#Otherwise, discard the email message
:0
/dev/null
Feedback: The header stripping does not work if any of those header lines is continued. It is almost always an error to use grep/egrep/fgrep when filtering a message header. A better recipe would be the following, utilizing formail:
 
#Let's test stripping lines from the email message's header,
#but only when they're there
:0 fwh
* ^(Mime-Version:|Content-)
| formail -IMime-Version: -IContent-
 
To continue myself. The flags are as follows: "f" use the pipe as a filter, "w" execute before proceeding, "h" it is about the header of the email message.
 
The formail -I switch, when given just a field name, means that any existing fields of that name are simply removed. (It is the lowercase -i switch that renames an existing field by prepending an "Old-" prefix.)
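 
To see why line-based tools stumble on continued (folded) header lines, here is a small sketch that needs neither procmail nor formail. The sample header, the example.com address and the function names are made up for illustration. The awk variant is only a rough stand-in for what formail does properly.

```shell
#!/bin/sh
# A sample header in which Content-Type: is folded onto a second line.
sample_header() {
  printf '%s\n' \
    'From: someone@example.com' \
    'Content-Type: multipart/mixed;' \
    ' boundary="abc"' \
    'Subject: test'
}

# Line-based stripping: drops the first Content-Type line but leaves the
# continuation line behind, which now wrongly attaches to the From: field.
strip_with_grep() {
  grep -E -vi '^(Content-|MIME-Version:)'
}

# Field-aware stripping: a line starting with whitespace belongs to the
# previous field and shares its fate.
strip_with_awk() {
  awk '
    /^[ \t]/ { if (!skip) print; next }
    { skip = (tolower($0) ~ /^(content-|mime-version:)/) }
    !skip { print }
  '
}

echo "--- grep result:"; sample_header | strip_with_grep
echo "--- awk result:";  sample_header | strip_with_awk
```

In the grep result the orphaned boundary line is still there, which is exactly the problem the feedback points out.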
I know how to make "and" rules in procmail recipes, but how do I make "or" rules?
 
Just in case, let's first revisit an "and" rule by a common example:
#Trivial catching of potential spam towards the end of a ~/.procmailrc
#Place only after accepting all the mailing lists you want to receive
:0:
* ! ^TO_ts@([-a-z0-9_]+\.)*uvasa\.fi
* ! ^TO_timo\.salmi
${HOME}/.mail/PotentialSpam.mail
For entering an "or" rule, consider the following example:
#Accept email from Era Eriksson, the author of the major procmail FAQ
:0:
* ^From:.*reriksso@([-a-z0-9_]+\.)*helsinki\.fi|\
  ^From:.*era@iki\.fi
${DEFAULT}
Let's look at a few details: There are alternatives. Scoring could be used for the same purpose
:0:
* 1^0 ^From:.*reriksso@([-a-z0-9_]+\.)*helsinki\.fi
* 1^0 ^From:.*era@iki\.fi
${DEFAULT}
Likewise, you could alternatively use ( ) grouping
:0:
* ^From:.*(\
reriksso@([-a-z0-9_]+\.)*helsinki\.fi|\
era@iki\.fi)
${DEFAULT}

Feedback: That condition looks a bit ugly to me. Let me rephrase it to show you what I mean:
 
* ^From:.*(reriksso@([-a-z0-9]+\.)*helsinki|era@iki)\.fi
 
(an underscore can not be part of a hostname, as far as I know.)
 
Yes, many of the rules presented in this FAQ can be written more concisely and/or effectively. The rules, as presented in the FAQ, are often formulated for easier understanding than efficiency. But it is useful to improve on the efficiency after one first has got the basic logic of a rule outlined.
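 
When rewriting a condition more concisely it is easy to change its meaning by accident. A quick sanity check, sketched here with grep -E (whose matching is close to, but not identical with, procmail's), is to run the long and the short form against the same sample lines. The sample lines and the variable names are assumptions for illustration only.

```shell
#!/bin/sh
# Three ways of writing the same "or" condition, tried with grep -E.
# procmail matches case-insensitively by default, hence -i.
P_ALT='^From:.*reriksso@([-a-z0-9_]+\.)*helsinki\.fi|^From:.*era@iki\.fi'
P_GROUP='^From:.*(reriksso@([-a-z0-9_]+\.)*helsinki\.fi|era@iki\.fi)'
P_SHORT='^From:.*(reriksso@([-a-z0-9]+\.)*helsinki|era@iki)\.fi'

# Print, for each pattern, how many of the sample lines it matches.
count_matches() {
  for pattern in "${P_ALT}" "${P_GROUP}" "${P_SHORT}"; do
    printf '%s\n' \
      'From: Era Eriksson <reriksso@cc.helsinki.fi>' \
      'From: era@iki.fi' \
      'From: someone@example.com' \
      'To: era@iki.fi' |
      grep -E -i -c "${pattern}"
  done
}
count_matches
```

If the three counts differ, the rewrite has changed which messages are caught.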
How can one perform multiple shell commands on the action line?
 
See the action line below (i.e. the one starting with the "|" pipe). Separate the commands with "&&". If you wish to continue on a second line for readability, end the line with "\". Alternatively, just one long line could have been used. The recipe below is from a test with the testbench. Its purpose is just to show this method of giving multiple commands.
 
#Test if the message has a "Subject:" header and has a subject in it
#(The brackets [] contain a space and a tab)
:0:${TESTDIR}/Proctest.mail.lock
* ^Subject:
* ^Subject:[ ]*\/[^ ].*
| echo "A ^Subject: header found with" >> ${TESTDIR}/Proctest.mail &&\
  echo "${MATCH}" >> ${TESTDIR}/Proctest.mail

 
Likewise, a single command can be subdivided for easier documentation:
 
| echo "A ^Subject: header found but there is no subject" \
  >> ${TESTDIR}/Proctest.mail

 
Below is another example with a slightly different syntax using the semicolon ";" as the separator. The example also demonstrates how to save disk space by zipping email from a particular source. You'll need Info-ZIP's zip and unzip in order to be able to apply it. (They are available from the proper Unix section of Garbo program archives at the University of Vaasa, Finland.)
 
:0w:Test.mail.lock
* ^From:.*test
| unzip ${HOME}/mail/Test.zip; \
  cat >> Test.mail; \
  zip -oj9 ${HOME}/mail/Test.zip Test.mail; \
  rm -f Test.mail

 
What happens on the action line is this:
  1. The potentially existing "Test.zip" zip-file is unzipped to obtain the earlier email messages that already might be within Test.zip.
     
  2. The incoming email is appended to the extracted Test.mail file.
     
  3. The updated Test.mail file is compressed back into the Test.zip zip-file.
     
  4. The uncompressed Test.mail is deleted.
To be on the safe side procmail is told to wait (the "w" flag in ":0w:Test.mail.lock") until the pipe ("|") has been performed.
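 
The two separators are not interchangeable. With "&&" the next command runs only if the previous one succeeded, while with ";" it runs regardless. The sketch below shows the difference by handing a command line to "sh -c", much as procmail hands an action line to the shell. The function names are hypothetical.

```shell
#!/bin/sh
# "&&": the echo after a failing command is skipped.
and_chain() {
  sh -c 'false && echo "second command ran"; echo "done"'
}
# ";": the echo after a failing command still runs.
semicolon_chain() {
  sh -c 'false ; echo "second command ran"; echo "done"'
}
echo "with &&:"; and_chain
echo "with ; :"; semicolon_chain
```

This presumably is why the zip recipe uses ";": the very first time there is no Test.zip yet, so unzip fails, but the message must still be stored.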
How can I find out what the subject of a posting is?
 
Now is a good time to utilize my testbench in order to find out if a logic works. Build a /home/myid/test/proctest.rc file.
SHELL=/bin/sh
TESTDIR=/home/myid/test
MAILDIR=${TESTDIR}
LOGFILE=${TESTDIR}/Proctest.log
LOG="--- Logging for ${LOGNAME}, "
First, a few environment variables are included.
#Troubleshooting:
VERBOSE=yes
LOGABSTRACT=all
The above means: Use full reporting for the debugging.
#An auxiliary regular expression to detect text,
#The brackets [] contain a space and a tab
GETTEXT="[  ]*\/[^  ].*"
If the same expression is used several times in a recipe file, it is convenient to put the expression into an environment variable instead of writing it out repeatedly.
#Test if the message has a "Subject:" header and has a subject in it
:0c:${TESTDIR}/Proctest.mail.lock
* ^Subject:
* $ ^Subject:${GETTEXT}
| echo "A ^Subject: header found with" >> ${TESTDIR}/Proctest.mail &&\
  echo "${MATCH}" >> ${TESTDIR}/Proctest.mail

#Test if the message has a "Subject:" header but has no subject in it
:0c:${TESTDIR}/Proctest.mail.lock
* ^Subject:
* $ !^Subject:${GETTEXT}
| echo "A ^Subject: header found but there is no subject" \
  >> ${TESTDIR}/Proctest.mail
 
#Test if the message has a "Subject:" at all
:0c:${TESTDIR}/Proctest.mail.lock
* !^Subject:
| echo "No ^Subject: header was found" >> ${TESTDIR}/Proctest.mail
 
#Otherwise, discard the message
:0
/dev/null

 
After the recipes above have been testbenched and cleared, you know that the methods used in them will work for you in your own environment.
 
Of course, there are other options for extracting the subject into an environment variable. One is to utilize "formail" which is a companion to the procmail program. If you include the following expression at the beginning of your ~/.procmailrc recipes file, you will have the variable ${SUBJECT} available for the rest of the recipes file.
 
#Environment variables for procmail
#
#Get the subject
#Discard some dangerous special chars + any leading and trailing blanks
SUBJECT=`formail -xSubject: \
         | tr '\;\`\\' '   ' \
         | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

 
For an example of usage see the Foiling Spam with an Email Password System page.
 
Feedback: Extracting the header from inside procmail using the \/ token is _much_ faster than the formail solution.
 
Feedback: If the SUBJECT variable is left empty, apply quotes on the first line, i.e.
SUBJECT=`formail -x"Subject: "\
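 
The clean-up part of the SUBJECT pipeline can be tried out on its own, without procmail or formail at hand. In the sketch below extract_subject is a crude, hypothetical stand-in for "formail -xSubject:" (it does not handle folded lines), and clean_subject repeats the tr and sed steps, leaving out "expand".

```shell
#!/bin/sh
# extract_subject: read a message on stdin and print what follows the
# first "Subject:" in the header (stops at the first empty line).
extract_subject() {
  sed -n -e '/^$/q' -e 's/^Subject://p' | head -n 1
}

# clean_subject: blank out ; ` and \ and trim leading/trailing blanks.
clean_subject() {
  tr ';`\\' '   ' | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Example: a subject with padding and a semicolon in it.
printf 'From: a@example.com\nSubject:  Win; now  \n\nSubject: body line\n' |
  extract_subject | clean_subject
```

The "Subject:" line in the body is ignored because the extraction stops at the empty line that ends the header.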
How do I get a copy of the headers of all the incoming email into a separate file?
 
You can use
 
#Header logging
:0hc:${HOME}/.mail/Procmail.head.lock
| cat >> ${HOME}/.mail/Procmail.head
Feedback: Since appending to a file is the result of a normal mailbox delivery, that can be more efficiently written as simply:
 
:0 hc:
$HOME/headers.cut

That eliminates a cat and a shell process, plus the pipe and extra reads and writes.
 
Now, if you want to overwrite the file with each new message [or do some further shell operations within the pipe], then the cat command is a reasonable choice.
 
[A further point] That would have been an odd name for the lockfile. Why not $HOME/headers.cut.$LOCKEXT?


Would you give some further hints for spam foiling recipes?
 
Besides what is on my page Foiling Spam with an Email Password System and a separate item on detecting the sender, below are some instructive little tricks.
 
Perhaps the strongest generic trick against spam is to shirk any email that is not addressed to you directly, since most spam is addressed to some kind of mailing lists. Of course, you first will have to accept email from any legitimate mailing list which you have subscribed to. If you put a suitable recipe after your recipes that accept the legitimate email lists, much of the incoming spam will be caught. Below is a simplified (and a bit munged) version of what I do in my own ~/.procmailrc:
#Catch potential spam
:0
* !^TO_(ts|timo\.salmi)@([-a-z0-9_]+\.)*uwasa\.fi
{
  :0 fwh
  * ^Content-Length:
  | formail -IContent-Length:
  :0:
  Spam.mail
}
If you look carefully through this page, you'll find explanations for all the details in the above recipe. It will be a good exercise to do so. :-)
 
Since so much, if not practically all, spam comes from forged sender addresses it is much more effective to block certain suspect email routes than to try to match the elusive spammers. The scoring recipe example below treats as spam all email that is routed via dialsprint.net and that is not addressed to "me" personally.
#Spam avoidance of certain routes and if not for me personally
:0:
* -1^0
*  1^0 ? formail -x"Received:" | egrep -is "dialsprint\.net"
*  1^0 ! ^TO_(myid|myFirstName[ _\.]myLastName)@([-a-z0-9_]+\.)*myhost\.mydom
Spam.mail
Fairly often there is a tell-tale exhortation to email to a remove@ or a removeme@ address within the actual message. As you may know, these are just common ploys of the spammers to get your address confirmed to make matters even worse for you.
:0B:
* (remove@|removeme@)
PotentialSpam.mail
The allegedly more respectable [sic] unsolicited advertising has an "ADV" marker in upper case on the subject line. (For an imaginary legitimacy such spammers occasionally attach some xenophobic quibble about U.S. legislation, not very relevant on the international Internet.)
:0D:
* ^Subject:.*ADV
PotentialSpam.mail
There are some obvious code words that tend to appear on the subject line, such as "make money fast" and "$$$".
:0:
* (^Subject:.*make.*money.*fast|^Subject:.*\$\$\$)
PotentialSpam.mail
Don't overdo it, though, lest you end up weeding also some legitimate email.
 
Feedback: The regexp:
 
(remove@|removeme@)
 
is much slower than
 
remove(me)?@

 
Having the 'top-level' of the regexp be an alternation (via '|') slows down matching by quite a bit. The more that can be factored out at the beginning of the regexp, the better. The same goes for the recipe that matches against the Subject: header-field:
 
^Subject:.*(make.*money.*fast|\$\$\$)
 
is faster than:
 
(^Subject:.*make.*money.*fast|^Subject:.*\$\$\$)

 
My comment: Of course it is commendable to be efficient, especially where easy understanding is not compromised. However, if the two clash, I often prefer clarity of expression and convenience over a strict maximization of code efficiency. Don't we have our powerful modern computers to perform our tasks for us, not vice versa :-). (This is not about the particular feedback above. The improvements are useful. They are both legible and instructive.)
 
More feedback: The "* ^Subject:.*ADV" rule is overly simplistic and catches many non-spam subjects. Maybe rather something like "* ^Subject:\<*ADV\>"
 
My comment: Ok. Let's try
 
:0D:
#(The brackets [] start with a space and a tab)
* ^Subject:.*([ \{<]+)ADV([ :\}>]+|$) |\
  ^Subject:.*(\[+)ADV(:)?(\]+|$)
PotentialSpam.mail

 
It is far from perfect, but it should work reasonably well for regular purposes. Spam detection requires experimenting anyway. Regular expressions are not easy. They are quite a large subject area of their own.
 
The above assumes that there is (at least) one space after the "Subject:" header before the subject begins. This can be ensured by first applying "formail -z" which you can have high up your ~/.procmailrc. For example I have the upper two lines in mine.
 
:0 fwh
| formail -z -iContent-Length:
 
:0D:
* ^Subject:.*([ \{<]+)ADV([ :\}>]+|$) |\
  ^Subject:.*(\[+)ADV(:)?(\]+|$)
PotentialSpam.mail

 
See the other items in this tips file for an explanation of the "fwh" flags. The formail program with the "-z" switch will insert the desired blanks into the header. The "-iContent-Length:" switch (which is outside the theme of the current item) will replace the Content-Length: headers with Old-Content-Length: headers.
 
I use a slightly different recipe in my own ~/.procmailrc recipes file:
 
:0D
* ^Subject:.*([ ]|<|\[)ADV([ ]|>|:|\]|$)
{
  :0
  { RULE="Catch potential spam by detecting an ADV keyword" }
  :0
  /dev/null
}

 
If you wonder about the "RULE" variable, see the item about logging which rules have been used.
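 
Since spam detection requires experimenting, it helps to try a candidate ADV expression on a handful of subjects before trusting it with real mail. The sketch below uses grep -E with a portable approximation of the recipe above, written with bracket expressions. The pattern and the function name is_adv_subject are my assumptions, not the exact procmail condition.

```shell
#!/bin/sh
# Delimiters listed directly: space, < or [ before ADV;
# ], space, > or : after it (or the end of the line).
ADV_PATTERN='^Subject:.*[ <[]ADV([] >:]|$)'

# Case-sensitive on purpose, like the recipe's "D" flag (no -i here).
is_adv_subject() {
  printf '%s\n' "$1" | grep -E -q "${ADV_PATTERN}"
}

for subject in \
  'Subject: ADV: cheap stuff' \
  'Subject: [ADV] hello' \
  'Subject: ADVICE wanted' \
  'Subject: adv: lower case'
do
  if is_adv_subject "${subject}"; then
    echo "caught:  ${subject}"
  else
    echo "passed:  ${subject}"
  fi
done
```

The third subject is the kind of legitimate mail that the plain "^Subject:.*ADV" condition would have caught by mistake.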
 
On to a different facet. Some ISPs (Internet Service Providers) do not allow numbers in the email addresses. Thus, you may identify some of the forged spam by catching a violation in this respect. The following recipe catches email with numbers in the user id before the @ mark from all the various nodes on "respectable.net".
:0:
* ^From:.*[0-9].+@([-a-z0-9_]+\.)*respectable\.net
PotentialSpam.mail

Date: Thu, 19 Dec 2002 10:44:44 +1000
From: Philip Gunter
To: Timo Salmi
Subject: A procmail tidbit

Hi Timo, thanks for your excellent procmail reference.

Here is a small recipe you might like to add to your site.
It limits the number of emails being forwarded from an account,
useful to stop sms storms.

Cheers,
Philip.

:0
{
  :0
  {
    # remove any sms-alert files older than 5 minutes
    GLOP_=`find /var/tmp/sms -name sms-alert\* -cmin +5 -exec rm -f {} \;`

    # Create an sms-alert file for this message.
    GLOP_=`touch /var/tmp/sms/sms-alert$$`

    # Count the number of sms-alert files
    COUNT=`ls /var/tmp/sms | grep sms-alert | wc -l`
    COUNT1=`expr ${COUNT}`

    # Check if number of alerts in the last 5 minutes is less than 2
    ISLT=`expr ${COUNT1} \< 2`
  }
  :0:
  # if the expression is true then forward the email
  * ISLT ?? ^^1^^
    ! 0123456789@pager.net
}

I have limited disk space. How can I truncate long messages?
 
Before we proceed any further, there is a very important email feature to observe. If you alter the content-length of a message it is highly advisable first to discard any "Content-Length:" lines from the email's header. If you fail to do that, there is the danger that next time you read the relevant email folder your email program will break your folder because of erroneous length information. Many email programs are brain-dead that way.
 
#Truncate messages longer than 4000 bytes to 100 lines
:0
* > 4000
{
  :0 fwh
  * ^Content-Length:
  | formail -IContent-Length:
 
  :0:Truncated.mail.lock
  | head -100 >> Truncated.mail
}

 
Some details: Let's expand the recipe a bit.
 
#Truncate messages longer than 4000 bytes to 100 + 10 lines
:0
* > 4000
{
  :0 fwh
  * ^Content-Length:
  | formail -IContent-Length:
 
  :0c:Truncated.mail.lock
  | head -100 >> Truncated.mail &&\
    echo "-:-:-:- (snip) -:-:-:-" >> Truncated.mail
 
  :0:Truncated.mail.lock
  | tail -n +101 | tail -n 10 >> Truncated.mail
}
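 
The head and tail arithmetic of the expanded recipe can be checked outside procmail. In the recipe procmail feeds a fresh copy of the message to each action line. In the sketch below a temporary file plays that role. The function name truncate_message is hypothetical.

```shell
#!/bin/sh
# truncate_message: keep the first 100 lines, a snip marker, and the
# last 10 of the remaining lines. Reads stdin, writes stdout.
truncate_message() {
  tmp=$(mktemp) || return 1
  cat > "${tmp}"
  head -n 100 "${tmp}"
  echo "-:-:-:- (snip) -:-:-:-"
  tail -n +101 "${tmp}" | tail -n 10
  rm -f "${tmp}"
}

# Example: a 500-line input shrinks to 100 + 1 + 10 lines.
awk 'BEGIN { for (i = 1; i <= 500; i++) print i }' | truncate_message | wc -l
```

Taking the tail only from line 101 onwards guarantees that a message of, say, 105 lines does not get some of its lines stored twice.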

 
A few observations: Another option is to compress the incoming email instead of truncating it.
How can I quickly test if my rules with regular expressions match? The fuller procmail testbench is rather heavy machinery for quick testing.
 
Let's see. A lite version of the testbench could be the following. Put the expressions you wish to try out into a "greptest" file that runs them with egrep, since procmail's matching closely (but not quite!) follows egrep's. Make the file executable with "chmod u+x greptest". Then make a "mail.msg" file with the texts you wish to try to match (or not to match). Thus you might have:
#!/bin/sh
#The executable file named "greptest"
egrep -i '(ts|timo\.salmi)@([-a-z0-9_]+\.)*uvasa\.fi' mail.msg
#
#Allow a quick visual comparison on the screen
echo ""
cat mail.msg

#The mail.msg target file with the trial text for the matching
ts@uvasa.Fi
ts@loisto.uvasa.fi
Timo.Salmi@uvasa.Fi
Timo.Salmi
null@uvasa.fi
Then, just give the command "greptest" and visually compare the outputs.
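 
If comparing the two outputs by eye gets tedious, the script can label each line itself. Here is a hedged sketch of such a variant. The function name label_matches and the "MATCH" and "-----" labels are my own choices for illustration.

```shell
#!/bin/sh
# label_matches PATTERN FILE: print every line of FILE prefixed with
# "MATCH" or "-----" according to a case-insensitive grep -E test.
label_matches() {
  while IFS= read -r line; do
    if printf '%s\n' "${line}" | grep -E -i -q "$1"; then
      printf 'MATCH %s\n' "${line}"
    else
      printf '%s %s\n' '-----' "${line}"
    fi
  done < "$2"
}
# Usage: label_matches '(ts|timo\.salmi)@([-a-z0-9_]+\.)*uvasa\.fi' mail.msg
```

With the mail.msg above the first three lines should come out as matches and the last two should not.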
 
Miscellaneous notes:
How can I detect if the email comes, say, from the .com domain?
 
I have been puzzling over this item myself, because it is not as trivial as it first appears. The catch is that the ".com" must be exactly at the end of the address. The problem naturally is that in the email headers there can be text after the email address, such as the sender's name. E.g.
From: scam@cyberspam.com (The Big Bad Spammer)
The first solution that comes to mind is the following, but it is not entirely accurate.
:0:
* ^From:.*\.com
* !^From:.*\.com\.
* !^TO_(ts|timo\.salmi)@([-a-z0-9_]+\.)*uwasa\.fi
ProbableComSpam.mail
Quite possibly there are better solutions, but below is what I came up with for hopefully an accurate match:
# Get the sender's address
# Discard any leading and trailing whitespaces
FROMADDR_=`formail -rt -xTo: \
           | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

# Test if the email came from the .com domain
:0:
* $ ? echo ${FROMADDR_} | egrep -is '\.com$'
ComDomain.mail
There is one small convenience in the first, inaccurate recipe version. It is easy to include several domains into the same recipe. For example:
:0:
* ^From:.*\.hk|\
  ^From:.*\.kr|\
  ^From:.*\.tr
* !^From:.*\.hk\.
* !^From:.*\.kr\.
* !^From:.*\.tr\.
* !^TO_myid@([-a-z0-9_]+\.)*myhost\.mydom
ProbableSpam.mail
An aside: You could also utilize a more condensed format:
* ^From:.*\.(hk|kr|tr)
(Condensing the rest of the above recipe is left as an exercise.)
 
Using scoring is one option. The recipe could also be rewritten as
#Define getting the sender's address
#Discard any leading and trailing whitespaces
FROM_=`formail -rt -xTo: \
         | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

#Whatever other recipes in between.

#Spam screening of certain susceptible domains
:0:
* -1^0
*  1^0 $ ? echo ${FROM_} | egrep -is '\.hk$'
*  1^0 $ ? echo ${FROM_} | egrep -is '\.kr$'
*  1^0 $ ? echo ${FROM_} | egrep -is '\.tr$'
*  1^0 !^TO_myid@([-a-z0-9_]+\.)*myhost\.mydom
ProbableSpam.mail
There also is the option
 
:0:
* ^From:.*\.hk([ >]|$)|\
  ^From:.*\.kr([ >]|$)|\
  ^From:.*\.tr([ >]|$)
* ! ^TO_myid@([-a-z0-9_]+\.)*myhost\.mydom
ProbableSpam.mail
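 
The difference between the plain domain test and the one that demands a delimiter after the domain shows up quickly on a few sample lines. The sketch below uses grep -E as an approximation of procmail's matching. The sample From: lines (apart from the one quoted earlier in this item) and the names NAIVE, DELIMITED and matches are assumptions for illustration.

```shell
#!/bin/sh
NAIVE='^From:.*\.com'
DELIMITED='^From:.*\.com([ >]|$)'

# matches PATTERN LINE: succeed if LINE matches PATTERN (ignoring case).
matches() {
  printf '%s\n' "$2" | grep -E -i -q "$1"
}

for line in \
  'From: scam@cyberspam.com (The Big Bad Spammer)' \
  'From: The Big Bad Spammer <scam@cyberspam.com>' \
  'From: scam@cyberspam.com' \
  'From: someone@example.com.au'
do
  naive=no; delimited=no
  if matches "${NAIVE}" "${line}"; then naive=yes; fi
  if matches "${DELIMITED}" "${line}"; then delimited=yes; fi
  printf 'naive=%s delimited=%s  %s\n' "${naive}" "${delimited}" "${line}"
done
```

Only the last line tells the two apart: the naive form accepts the .com.au address, the delimited form does not.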

What alternatives do I have to detect a sender all through the various header-fields?
 
If we only look at the "From:" field in the header we have the familiar:
#Accept all email from myself, weed out autoreplies
:0:
* ^From:.*myid@([-a-z0-9_]+\.)*myhost\.mydom
* ! ^X-Loop: myid@myhost\.mydom
${DEFAULT}
Next, let's extend the matching to more fields in the header:
:0
* ? formail -x"From" -x"From:" -x"Reply-To:" -x"Errors-To:"\
    | egrep -i "scam@cyberspam\.com"
/dev/null
We can utilize a predefined expression to match the header-fields. The clever "FROM" expression below comes from Jari Aalto's procmail material.
 
FROM="^(From[ ]|(Old-|X-)?(Resent-)?(From|Reply-To|Sender):)(.*\<)?"
#(whatever else in between)
:0
* $ ${FROM}scam@cyberspam\.com
/dev/null
You may go even further in your detective work and include the information from the header's "Received:" lines. That is, you also can detect if something that you wish to avoid is along the route where the email came from.
:0
* ? formail -x"Received:"\
    | egrep -i "cyberspam\.com"
/dev/null
Spam email is sometimes indicated by a missing or an empty "From:" line in the header. Furthermore, the "From:" line might contain an empty <> instead of having a proper address within the <>. Using scoring we might have something like
 
:0:
* 1^0 ^From:([ ]$|$)
* 1^0 ! ^From:
#A catch: Don't use here the word-boundary operators \< \>
#Use just the plain <>
* 1^0 ^From:.*<>
NoFrom.mail

Under a worst-case scenario, the various sender headers might all be empty. To test for this unlikely eventuality we can utilize the fact that formail would put a "foo@bar" into the "FROM_" under such circumstances.
# Define getting the sender's address
# Discard any leading and trailing whitespaces
FROM_=`formail -rt -xTo: \
         | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

# Test if the sender could not be identified at all
:0:
* FROM_ ?? foo@bar
NoSender.mail
As always, there are several alternatives to solving a problem. Consider a potential case where a spammer poses as the mailer-daemon but the "From:" header is either missing or total gibberish. How to detect this situation? The second condition in the recipe below tests whether there is a "From:" line in the header with some elementary validity (at least one letter). Because the condition is negated, the recipe fires when there is none.
:0:
* ^From[  ]*MAILER-DAEMON
* ! ? formail -x"From:" | egrep -is "[a-z]"
ProbableSpam.mail

How can I extract a valid address from the Reply-To field, and that field only?
 
One trick is to utilize the following variable definition letting formail do the worrying about the proper address format.
REPLYTO_=`egrep "^Reply-To:" | head -1 \
         | formail -c -rt -xTo: \
         | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
If you put the REPLYTO_ definition high up in your ~/.procmailrc you will have the variable available to the rest of your recipes.
 
Feedback: Let me suggest this:
REPLYTO_=`formail -cXReply-To: | head -1 | formail -rtzxTo:`
Timo's further comments:
How can I extract the address of the sender's postmaster?
 
Put these definitions high up in your ~/.procmailrc :
#Get the sender's address, the generic version
FROM_=`formail -rt -xTo: \
         | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

#Get the sender's host
FHOST_=`echo "${FROM_}" | awk -F@ '{ print $2 }'`

#Build the postmaster's address
FMAST_="postmaster@${FHOST_}"
Thus, you have the postmaster's alleged address available as ${FMAST_} from this point on in your recipes file. Note, however, that all validity testing of the address is missing.
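 
Some of the missing validity testing can be added with a few lines of shell. The sketch below is deliberately minimal and the function name postmaster_for is hypothetical. It accepts only an address with exactly one "@" whose host part contains a dot and nothing but letters, digits, dots and hyphens.

```shell
#!/bin/sh
# postmaster_for ADDRESS: print postmaster@host, or fail if the host
# part does not look like a plausible domain name.
postmaster_for() {
  host=$(printf '%s\n' "$1" | awk -F@ 'NF == 2 { print $2 }')
  case "${host}" in
    *.*) ;;            # must contain at least one dot
    *) return 1 ;;
  esac
  if printf '%s\n' "${host}" | grep -E -q '[^A-Za-z0-9.-]'; then
    return 1
  fi
  printf 'postmaster@%s\n' "${host}"
}

postmaster_for 'scam@cyberspam.com'
```

A side effect of requiring a dot is that the "foo@bar" placeholder, which formail produces when no sender can be found, is rejected.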
 
What happens in the FROM_ formula: Formail uses a certain priority order in preparing the reply header. If there is a "Reply-To:" field in the header, the "FROM_" variable will contain that address. In some cases one may wish to ignore that field, for example to prevent malicious relaying. Here is how:
#Get the sender's address, ignore Reply-To:
FROM_=`formail -I"Reply-To:" -rt -xTo: \
         | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

How can I weed out an inordinately long recipient list? I am one of the recipients of a very useful professional mailing list, but it lists in its "To: " header-field all the recipients to the list. Furthermore, it repeats the messages in HTML format. The text format is sufficient for me.
 
The (only slightly modified) example below is based on a true situation from my own ~/.procmailrc.
#Ensure a whitespace exists between field name and content
#Comment "Old-" the Content-Length field from all the headers
:0 fwh
| formail -z -i"Content-Length:"

#(whatever else in between)

:0
* From:.*the-mailing-list-maintainer
* ^TO_the@first\.recipient\.edu
{
  :0 fw
  | formail -I"To:" -I"X-" -I"Content-Type:" -I"MIME-Version:"\
    -A "To: Maintainer's long recipient list suppressed" \
  | sed -e '/^This is a multi-part /,/^Content-Transfer-Encoding: /d' \
        -e '/------=_NextPart_/,$d'

  :0:
  ${DEFAULT}
}

What is this procmail scoring? How can I utilize it?
 
This is a somewhat complicated subject with material dispersed throughout the various procmail FAQs. Basically scoring is a method to count how many of the conditions are fulfilled in a recipe and if the "score" is positive, that is the score is 1 or more, the action line in the recipe will be performed. There is much, much more to scoring, but this is a good starting point.
 
Consider the following simple spam foiling recipe. It will put the email into the ProbableSpam.mail file if the score adds up to at least one. If the first condition is met, 1 is added to the score. Ditto for the second condition. Thus if either of the tell-tale spam signals occurs, the score will be positive (that is greater than zero) and the action (storing the email message into the ProbableSpam.mail file) will be enacted.
:0:
* 1^0 ^Subject:.*make money fast
* 1^0 ^Subject:.*\$\$\$
ProbableSpam.mail
The example above uses equally-weighted scoring. One can also have unequal scores. Below, a hit of the second condition gives two points while a hit of the first only gives one.
* 1^0 ^Subject:.*make money fast
* 2^0 ^Subject:.*\$\$\$
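Scoring can be used to build some extremely trivial artificial intelligence into the recipes. Consider the following. The score starts from -1, so a single hit only brings it to zero. At least two of the three conditions must match before the score is positive and the action is taken.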
:0:
* -1^0
*  1^0 ^Subject:.*money
*  1^0 ^Subject:.*fast
*  1^0 ^Subject:.*\$\$\$
ProbableSpam.mail
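To see why the recipe above needs two hits, the arithmetic can be imitated outside procmail. The Bourne shell sketch below only illustrates the counting. It is not procmail's actual scoring engine, the function name spam_score is hypothetical, and grep -i merely stands in for procmail's case-insensitive matching.

```shell
#!/bin/sh
# spam_score SUBJECT_LINE   (hypothetical helper)
# Starts from -1 and adds 1 for every pattern that matches,
# mirroring the "-1^0" and "1^0" lines of the recipe.
spam_score() {
  score=-1
  for pattern in '^Subject:.*money' '^Subject:.*fast' '^Subject:.*\$\$\$'; do
    if printf '%s\n' "$1" | grep -i -q -e "$pattern"; then
      score=$((score + 1))
    fi
  done
  printf '%s\n' "$score"
}

spam_score "Subject: Money matters"     # one hit, score stays at zero
spam_score "Subject: Make money fast"   # two hits, score becomes positive
```

Only a positive result corresponds to procmail performing the action line.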
An alternative formulation of scoring to foil spam is given below. This time it is required that at least three of the score-condition lines match. (The [] contain a space and a tab, as usual.)
 
:0:
* ^Subject:[ ]*\/[^ ].*
* -2^0
* 1^0 MATCH ?? ()\<easy\>
* 1^0 MATCH ?? ()\<fast\>
* 1^0 MATCH ?? ()\<(cash|money)\>
* 1^0 MATCH ?? \$\$\$
ProbableSpam.mail
Further, simple examples
#Catch potential spam by examining the email route
:0:
* 1^0 ? formail -x"Received:" | egrep -i "157\.161\.140\.2"
* 1^0 ? formail -x"Received:" | egrep -i "199\.217\.231\.46"
* 1^0 ? formail -x"Received:" | egrep -i "212\.106\.213\.36"
* 1^0 ? formail -x"Received:" | egrep -i "216\.154\.1\.82"
ProbableSpam.mail
This 'precision' recipe checks in the message header both the "From:" field and the "Received:" path of a forgery spam.
#Avoid a specific forgery spam
:0:
* -1^0
*  1^0 ^From:.*mikerobbins2000@hotmail\.com
*  1^0 ? formail -x"Received:" | egrep -is "psi\.net"
Spam.mail
Scoring and ordinary conditions can be mixed in the rules. For example the two recipes below achieve roughly the same thing, but the latter option takes fewer steps if the email is for you.
:0:
* -1^0
*  1^0 ? formail -c -x"Received:" | fgrep -is 'alladvantage.com'
*  1^0 ? formail -c -x"Received:" | fgrep -is 'ameritech.net'
*  1^0 ? formail -c -x"Received:" | fgrep -is 'bellatlantic.net'
*  1^0 ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
ProbableSpam.mail

:0:
* ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
* 1^0 ? formail -c -x"Received:" | fgrep -is 'alladvantage.com'
* 1^0 ? formail -c -x"Received:" | fgrep -is 'ameritech.net'
* 1^0 ? formail -c -x"Received:" | fgrep -is 'bellatlantic.net'
ProbableSpam.mail
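The formail switches in the above are
  1. -c concatenate continued header fields onto one line
  2. -x extract the contents of the given header field
The fgrep (search a file for a fixed-character string) switches in the above are
  1. -i ignore the case of the letters
  2. -s work silently (what exactly is suppressed varies between grep versions, so check your own man page)
The above example could also be written more efficiently without scoring as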
:0:
* ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
* ^Received:.*(\
alladvantage\.com|\
ameritech\.net|\
bellatlantic\.net)
ProbableSpam.mail

How can I test if the subject is empty or if the subject field is missing altogether?
 
Scoring seems to be the answer:
:0:
* 1^0 ^Subject:([  ]$|$)
* 1^0 !^Subject:
NoSubject.mail
As usual, the brackets [] contain a space and a tab.
 
There are other options to test for an empty "Subject:" or an entirely missing "Subject:" field. The one below puts the subject contents in a variable. The actual recipe then tests if the value of the "SUBJ_" variable is empty. (Also see the feedback about the syntax.)
#Get the subject discarding any leading and trailing blanks
SUBJ_=`formail -xSubject: \
       | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

#Test for an empty or missing subject
:0:
* SUBJ_ ?? ^^^^
NoSubject.mail

How can I modify the "To:" field of the email I received?
 
I am not exactly sure why you wish to do this, but here is how to replace the "To:" header-field of a message using formail. The formail "-i" option renames the old "To:" field to "Old-To:" and inserts the new "To:" header-field. The flags in the recipe are as follows: "f" use the pipe as a filter, "h" feed it the header of the email message only, "w" wait for the filter to finish before proceeding down the rest of the "~/.procmailrc".
:0 fhw
* ^To:.*myoldid@myoldhost\.myolddom
| formail -i "To: mynewid@mynewhost.mynewdom"

I have a long list of spammers and other Internet lowlife in a separate file. How can I utilize it?
 
The technique is fairly simple. Put this in your "~/.procmailrc" file:
MAILDIR=/home/myid/Mail   #The location of your own mail directory
# Whatever other preliminaries

# Whatever other recipes

# Test if the email's sender is in the blacklist
:0
* ? formail -x"From" -x"From:" -x"Sender:" \
    -x"Reply-To:" -x"Return-Path:" -x"To:" \
    | egrep -is -f black.lst
/dev/null
Prepare a "/home/myid/Mail/black.lst" file with contents something like:
abc23@airnewz.ccn
abdu@advis.com.tr
adexec@mail.com
dinner@dine.com
friend@public.com
helpingyou@mail.com
mk1977@ms1.kingnet.com.tw
nb8MAMxhq@mail.com
no@body.com
owieuj@peterlink.ru
patkline00@usa.net
promotions@web-vertise.com
unknown@unknown.com
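
Before letting a list like this send mail to /dev/null it is worth trying it out on a few sample header lines. The Bourne shell sketch below is one way to do that. The temporary file, the two sample entries and the function name is_blacklisted are made up for the illustration. Note that it uses grep -F (fixed strings) instead of the egrep of the recipe, so that the dots in the addresses are taken literally and not as regular expression wildcards.

```shell
#!/bin/sh
# Build a small throw-away blacklist (two sample entries only)
LIST=`mktemp` || exit 1
trap 'rm -f "$LIST"' EXIT
cat > "$LIST" <<'EOF'
no@body.com
unknown@unknown.com
EOF

# is_blacklisted HEADER_TEXT   (hypothetical helper)
# Succeeds if any list entry occurs in the text, ignoring case.
is_blacklisted() {
  printf '%s\n' "$1" | grep -F -i -q -f "$LIST"
}

if is_blacklisted "From: Nobody <NO@body.com>"; then
  echo "blacklisted"
fi
if ! is_blacklisted "From: friend <no@bodyXcom>"; then
  echo "not blacklisted"
fi
```

With egrep in place of grep -F the second sample would also be caught, since an unescaped dot matches any character.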

How do I forward certain messages that I get, and preserve myself a copy?
 
Below is an example:
#Get the sender's bare email address from the first "From" line
FROM_=`formail -c -x"From " \
         | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g' \
         | awk '{ print $1 }'`

#Get the original subject of the email
#Discard superfluous tabs and spaces
#On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -c -xSubject: \
         | expand \
         | sed -e 's/  */ /g' \
         | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

#Whatever other recipes you'll use

:0
* ^From:.*infolist@([-a-z0-9_]+\.)*infohost\.infodom
# Avoid email loops
* ! ^X-Loop: myid@myhost\.mydom
{
  :0c:   #Preserve a copy of the email
  Infolist.mail
  :0fwh  #Adjust some headers before forwarding
  | formail -A"X-Loop: myid@myhost.mydom" \
            -A"X-From-Origin: ${FROM_}" \
            -i"Subject: $SUBJ_ (fwd)"
  # Forward the email
  :0
  !mydept@myhost.mydom
}
Another example, another method for forwarding:
SENDMAIL=/usr/sbin/sendmail
FROM_=`formail -c -I"Reply-To:" -rt -xTo: \
  | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
SUBJ_=`formail -xSubject: \
       | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

# Make a copy of all email to my second address
:0
* ! ^X-Loop: myid@myhost\.mydom
{
  :0c:${HOME}/procmail.lock
  | formail -A"X-Loop: myid@myhost.mydom" \
    -I"Subject: ${SUBJ_} [autofwd]" \
    | ${SENDMAIL} -f"${FROM_}" my2ndId@my2ndHost.mydom
}

How do I forward certain messages to two different addresses?
 
I have the following recipe in my ~/.procmailrc file, but the email does not get forwarded to the myid2@myhost.mydom address.
 
  :0 c
  *^From.*info.gov
    ! friend@somehost.domain myid2@myhost.mydom

 
I am not sure what is wrong with that, but at least the solution below should work:
 
:0
* ^From.*info.gov
* ! ^X-Loop: myid@myhost\.mydom
{
  :0fwh
  | formail -A"X-Loop: myid@myhost.mydom"
  :0c
  ! friend@somehost.domain
  :0
  ! myid2@myhost.mydom
}

 
The X-Loop is not relevant from the point of view of the stated problem, but using it as a safeguard is always advisable.
 
Feedback: The reason that the first one does not work is that the recipients' addresses are separated by space while they should be separated by a comma [as in]
 
  :0
  ! friend@somehost.domain,myid2@myhost.mydom

 
(I have not tested this one.)
How do I automatically return certain email messages?
 
Ah! Another potential case of spam avoidance? (This is a companion page to Foiling Spam with an Email Password System, remember.) Below is an example. But be sensible in using the method, since most spam has forged senders.
 
#Define getting the sender's address
#Discard any leading and trailing whitespaces
FROM_=`formail -rt -xTo: \
  | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
 
#Whatever other recipes in between.
 
#Return certain email
:0
#
# Is the email from a frequent spam domain?
# (Note: fgrep takes no regular expressions)
* ? formail -c -x"Received:" | fgrep -is 'cyperspam.com'
#
# Is it for a mailing list rather than to me?
* ! ^TO_(myid|myFirst\.mySecond)@([-a-z0-9_]+\.)*myhost\.mydom
#
# Avoid forgeries that pretend to be from my own site
* ! $ ? echo ${FROM_} | fgrep -is 'myhost.mydom'
* $ ? echo ${FROM_} | fgrep -is '.'
* $ ? echo ${FROM_} | fgrep -is '@'
#
# Avoid email loops
* ! ^X-Loop: myid@myhost\.mydom
{
  # Make a temporary file of the message to be returned
  :0c:formail.lock
  # Discard whitespaces, insert a leading blank
  | expand | sed -e 's/[ ]*$//g' | sed -e 's/^/ /' > return.tmp
  # Prepare and send the rejection
  # Be sure to customize your sendmail path
  :0:formail.lock
  | (formail -r -I"Subject: Rejected mail: Recipient refusal" \
    -A"X-Loop: myid@myhost.mydom" ; \
    echo "--- begin rejected mail ---" ; \
    cat return.tmp ; \
    echo "--- end rejected mail ---" ; \
    rm -f return.tmp) \
    | /usr/lib/sendmail -t
}

There can be many variants of detecting and returning email which one does not wish to get. Below is a fictitious example utilizing variables to enhance the flexibility of the return address handling. (If you are baffled by the "RULE" variable, which is just a sideline here, see the item on identifying executed recipes.)
 
:0
* ^From:.*(charpie|charpie5266)@mydeja\.com
{ REJECT="charpie5266@mydeja.com" }
:0
* ^From:.*umidextr@([-a-z0-9_]+\.)*mindfall\.com
{ REJECT="umidextr@mindfall.com" }
:0
* ^From:.*(rasch|Greg.*\.Rasch)@([-a-z0-9_]+\.)*millkirn\.com
{ REJECT="rasch@millkirn.com" }
:0
* ^From:.*(daren|Daren[_\.]Risenthal)@([-a-z0-9_]+\.)*slunet\.org
{ REJECT="daren@slunet.org" }
 
:0
* ! REJECT ?? ^^^^
{
  :0
  { RULE="These users I do not want to talk with" }
  :0cw
  | expand | sed -e 's/[ ]*$//g' | sed -e 's/^/ /' > return.tmp
  :0:procmail.lock
  | (formail -r -I"To: ${REJECT}" \
    -I"Subject: Rejected mail: Recipient refusal" \
    -A"X-Loop: myid@myhost.mydom" ; \
    echo "--- begin rejected mail ---" ; \
    cat return.tmp ; \
    echo "--- end rejected mail ---" ; \
    rm -f return.tmp) \
    | /usr/lib/sendmail -t
}

 
Note how the above set of rules has two parts, the actual detection plus the return address definition, and the return action. The latter could be written in many alternative ways, including
 
:0
* ! REJECT ?? ^^^^
{
  :0cw
  | expand | sed -e 's/[ ]*$//g' | sed -e 's/^/ /' > return.tmp
  :0 fwh
  | formail -r \
    -A"Subject: Rejected mail: Recipient refusal" \
    -A"From: myid@myhost.mydom" \
    -A"X-Loop: myid@myhost.mydom" ; \
    echo "--- begin rejected mail ---" ; \
    cat return.tmp ; \
    echo "--- end rejected mail ---" ; \
    rm -f return.tmp
  :0
  ! ${REJECT}
}


My address has changed. How do I forward a copy to myself and tell the sender?
 
This is a theme whose constituents already are covered throughout this material. But also take a look at "man procmailex" for the "vacation database" idea even if a better name here would be something like "dejatold database".
#Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -c -xSubject: \
       | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

:0
# Was it to me
* ^TO_myoldid@myoldhost\.myolddom
# Ignore messages for daemons
* ! ^FROM_DAEMON
# Avoid email loops
* ! ^X-Loop: myid@myhost\.mydom
{
  :0 c
  ! myid@myhost.mydom
  :0:dejatold.lock
  | formail -rD 8192 dejatold.cache
  :0 eh
  | (formail -r \
     -A"X-Loop: myid@myhost.mydom" \
     -I"Subject: Changed email address" ; \
     echo "Dear Sender," ; \
     echo "" ; \
     echo "Thank you for your email about" ; \
     echo "\"${SUBJ_}\"" ; \
     echo "" ; \
     echo "My email address has changed." ; \
     echo "Old: myoldid@myoldhost.myolddom" ; \
     echo "New: myid@myhost.mydom" ; \
     echo "Your email has been forwarded to my new address." ) \
     | /usr/lib/sendmail -oi -t
}
Some explanations: Naturally, the recipe does not stand alone in the ~/.procmailrc but is a part of it. Thus you would e.g. have previous recipes that take care of the email that is not to you, and email that was for mailer daemons.
How can I set variable values based on the text in the body of the email message?
 
Let's start with another, much simpler question:
 
From: ts(ät)uwasa(dot)fi (Timo Salmi)
Newsgroups: comp.mail.misc
Subject: Re: Procmail: How do I filter by the body
Date: Sun Apr 23 09:34:38 EET DST 2000
X-Comment: Slightly modified
 
I am trying to save all the messages that come to me with "mypassword" in the body to a folder called password. How do I do that?
 
As the manuals state:
Flags can be any of the following:
B   Egrep the body.
Hence, all there is to it is
:0 B:
* mypassword
password
If you want your password case sensitive then use ":0 BD:".
All the best, Timo
..
 
From: ts(ät)uwasa(dot)fi (Timo Salmi)
Newsgroups: comp.mail.misc
Subject: Re: Question of procmail newbie
Date: Tue Nov 23 23:09:41 EET 1999
X-Comment: Slightly modified
 
How could I solve the following problem with procmail: I receive e-mails with a body like this:
Category: aaa
Subcategory: bbb
File: ccc
I need to store this mail to the folder aaa/bbb/ccc, so procmail should create directories aaa/bbb . What kind of .procmailrc should I write?
 
The trick is to extract the appropriate text from the body of the email message and to set procmail variable values on the basis of the results. This is how it can be done.
 
#Preliminaries
SHELL=/usr/bin/sh #Use the Bourne shell (check your path!)
 
CATE=`cat | egrep "^Category:" | awk '{ print $2 }'`
SCAT=`cat | egrep "^Subcategory:" | awk '{ print $2 }'`
FILE=`cat | egrep "^File:" | awk '{ print $2 }'`
 
#Whatever other recipes
 
:0B:Procmail.lock
* ^Category:[ ].+[a-z0-9]
* ^Subcategory:[ ].+[a-z0-9]
* ^File:[ ].+[a-z0-9]
| mkdir ${CATE} ; mkdir ${CATE}/${SCAT} ;\
  cat >> ${CATE}/${SCAT}/${FILE}
 
#Whatever other recipes

 
As a validity check the condition lines require that all the key-lines are present in the email message body and that the lines contain names.
All the best, Timo
Feedback: It would be much more efficient rewriting these definitions using awk's pattern matching, such as:
 
CATE=`cat | awk '/^Category:/ { print $2 }'`
etc
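
Spelled out for all three fields, the awk idea from the feedback might look like the sketch below. It is ordinary Bourne shell that can be tried outside procmail. The helper name body_field and the sample message are my own inventions for the illustration. Inside a ~/.procmailrc the message arrives on the standard input of the backquoted command, so the printf part would not be needed there.

```shell
#!/bin/sh
# body_field NAME   (hypothetical helper)
# Reads a message on standard input and prints the second word
# of the first line that starts with "NAME:".
body_field() {
  awk -v pat="^$1:" '$0 ~ pat { print $2; exit }'
}

# A made-up sample message
MESSAGE='From: sender@example.com
Subject: upload

Category: aaa
Subcategory: bbb
File: ccc'

CATE=`printf '%s\n' "$MESSAGE" | body_field Category`
SCAT=`printf '%s\n' "$MESSAGE" | body_field Subcategory`
FILE=`printf '%s\n' "$MESSAGE" | body_field File`
echo "${CATE}/${SCAT}/${FILE}"
```

Each field costs one awk process instead of a cat, an egrep and an awk.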

 
Apropos awk. On the Usenet there are dedicated newsgroups comp.lang.awk and alt.lang.awk. Furthermore, although used in quite another connection than procmail, there are several awk (actually GnuAWK) usage examples in my Assorted NT/2000/XP/.. CMD.EXE Script Tricks collection.
 


Next, let's consider a trickier task. Find from the body of the text the last line that potentially contains the string "mailto:". Insert the contents of that line into a MAILTO_ variable.
:0
* ^Subject:.*Whatever
{
  :0
  {
  MAILTO_=`sed -e '1,/^$/ d' \
           | egrep "mailto:" \
           | tail -1 \
           | expand \
           | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g' \
           | sed -e 's/[^o]://g' -e 's/^://g' \
           | awk -F: '{ print $2 }' | awk '{ print $1 }'`
  }
  :0:
  WhichEverFolderYouWant
}
Consider the MAILTO_ construct. (The rest of the recipe should be self-explanatory.) Should you wish to get the entire line with the "mailto:" into the MAILTO_ variable instead of just the email address there, simply leave out the last two lines from the MAILTO_ definition.
How can I insert some token text in front of the body of incoming email?
 
I have a really simple procmail question. All I want to do is add a line
"======= Forwarded Mail =========="
to the top of the body of all incoming messages, and forward them to another account.

 
Let's start by considering the first part of the question only. This is how it is done. The solution owes heavily to Philip Guenther.
:0
{
  :0 fhw
  | cat - ; \
  echo "===== Filtered email ====="
  :0:
  ${DEFAULT}
}
So far so good. Next let's add the forwarding so that the token will only appear in the forwarded message. (If you wish to change that, adjust the order of the rules.)
:0
{
  :0c:
  ${DEFAULT}
  :0 fhw
  | cat - ; \
  echo "======= Forwarded Mail =========="
  :0
  !forward@myhost.mydom
}
Finally, let's add avoiding email loops.
# Discard loops
:0
* ^X-Loop: myid@myhost\.mydom
/dev/null

:0
{
  :0c:
  ${DEFAULT}
  :0 fhw
  | cat - ; \
  echo "======= Forwarded Mail =========="
  :0 fhw
  | formail -A"X-Loop: myid@myhost.mydom"
  :0
  !forward@myhost.mydom
}

Do you have any useful tips for regular expression matching?
 
..
This is a terribly complicated subject involving many features which I do not know. Let's nevertheless look at some further example recipes.
# Matching a few undelivery and such reports
:0:
* ^Subject:.*Undeliver(ed|able) (e)?mail|\
  ^Subject:.*Returned (spam )?(e)?mail
* ^TO_(myid|firstname\.lastname)@([-a-z0-9_]+\.)*myhost\.mydom
Returned.mail
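Consider the first rule of the recipe above. It will match all email with any of the following on the "Subject:" line in the header:
  1. Undelivered mail
  2. Undelivered email
  3. Undeliverable mail
  4. Undeliverable email
The continuation line will match
  1. Returned mail
  2. Returned email
  3. Returned spam mail
  4. Returned spam email
What if you don't want to match "Re: Undelivered mail"? The following condition gives a more exact match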
* ^Subject:[  ]+Undeliver(ed|able) (e)?mail
In other words only spaces and/or tabs are allowed between "Subject:" and the start of the actual subject.
 
..
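Let's consider another example. Say that we have two hosts, one whose name ends with cyber.com and another where cyber.com is followed by a further dot-separated part. Here is how to catch email from the former, but not the latter: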
:0:
* ^From:.*cyber.com([^\.]|$)
ProbableSpam.mail
That is, do not allow a dot after the .com or alternatively require that the line ends there. However, cyber.comet would be matched! Thus, depending on what you want to achieve, you might have e.g.
 
:0:
* ^From:.*cyber.com( |"|>|$)
ProbableSpam.mail

 
..
What is the difference between the rules below?
* ^From:.*myid@([-a-z0-9_]+\.)*myhost.mydom
* ^From:.*myid@([-a-z0-9_]+\.)?myhost.mydom
* ^From:.*myid@([-a-z0-9_]+\.)+myhost.mydom
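The first one matches any of
  1. myid@myhost.mydom
  2. myid@subhost1.myhost.mydom
  3. myid@mypc.subhost1.myhost.mydom
The second one, with "?", allows at most one subhost part and thus matches only 1 and 2. The third one, with "+", requires at least one subhost part and thus matches only 2 and 3.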
..
To recount the purpose of the main special regexp symbols:
 
Symbol Interpretation
* Match zero or more times
? Match zero or one times
+ Match one or more times
. Any character
[ ] Match from the list within the brackets
^ The start of the line (within [] however, a negation)
$ The end of the line
\ Quote the next character to take it literally
( ) Grouping
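
The effect of the three repetition symbols can be tried out with grep before a rule goes into the ~/.procmailrc. The Bourne shell sketch below assumes that grep -E -i is a close enough stand-in for procmail's own regexp matching for this simple case. The function name count_matches and the three sample lines are made up for the illustration.

```shell
#!/bin/sh
# Three sample header lines with zero, one and two subhost parts
ADDRESSES='From: myid@myhost.mydom
From: myid@subhost1.myhost.mydom
From: myid@mypc.subhost1.myhost.mydom'

# count_matches QUANTIFIER   (hypothetical helper)
# Counts how many of the sample lines match when the subhost
# group is followed by the given quantifier (*, ? or +).
count_matches() {
  printf '%s\n' "$ADDRESSES" \
    | grep -E -i -c "^From:.*myid@([-a-z0-9_]+\.)${1}myhost\.mydom"
}

for q in '*' '?' '+'; do
  n=$(count_matches "$q")
  echo "quantifier $q matches $n of 3 lines"
done
```

The star accepts all three lines, while the question mark and the plus each accept two of them, though not the same two.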


How can I test if two procmail variables have the same contents?
 
Basically the syntax for variable value tests is
VAR1_=Whichever expression you devise
:0:
* VAR1_ ?? regexp
wherever
But you can build rules like
VAR1_=Whichever expression you devise
VAR2_=whatever
:0:
* $ VAR1_ ?? ${VAR2_}
wherever
Note, however, that the above still is regular expression matching, not an equality.
 
The blank after the first $ is significant. It tells that the variable references on the line (${VAR2_}) are to be expanded, not to be taken as a literal text.
 
Feedback: That's easily resolved using $\var expansion and anchoring both ends of the regexp:
        * VAR1_ ?? $ ^^$\VAR2_^^
That condition will succeed if and only if VAR1_ and VAR2_ have the same contents, with the possible exception of VAR1_ having one more trailing newline than VAR2_.
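
The difference between a regexp match and a true equality is easy to demonstrate in the shell. The sketch below is only an analogy in Bourne shell, with grep standing in for the regexp side. It does not run procmail, and the two function names are hypothetical.

```shell
#!/bin/sh
# regexp_match TEXT PATTERN   (hypothetical helper)
# Succeeds if PATTERN, taken as a regular expression, matches TEXT.
regexp_match() {
  printf '%s\n' "$1" | grep -q -e "$2"
}

# literal_equal A B   (hypothetical helper)
# Succeeds only if the two strings are identical.
literal_equal() {
  [ "$1" = "$2" ]
}

VAR1_='abc'
VAR2_='a.c'

if regexp_match "$VAR1_" "$VAR2_"; then
  echo "regexp match (the dot stands for any character)"
fi
if ! literal_equal "$VAR1_" "$VAR2_"; then
  echo "but the contents are not equal"
fi
```

This is the same pitfall that the $\var expansion with the anchors avoids on the procmail side.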
I am having difficulties with "<". How does one match it?
 
Date: 09 Dec 1999 23:06:41 -0600
From: Philip Guenther
Newsgroups: comp.mail.misc
Subject: Re: procmail, trivial html detection, and a quirk
 
ts(ät)uwasa(dot)fi (Timo Salmi) wrote:
> I just noted that, at least in procmail v3.13.1 1999/04/05
>
> :0B:
> * </body>
> * </html>
>
> does not work. Instead one has to apply
>
> :0B:
> * [<]/body>
> * [<]/html>
 
Yep. A leading '<' or '>' on a condition causes procmail to interpret the condition as a size test. If you want a normal regexp condition that starts by matching a literal '<' or '>' character you have to protect the leading character from such interpretation. There are several ways of doing so. The most efficient are to use parens or a backslash:
* ()</body>
or
* (<)/body>
or
* (</body>)
or
* \</body>
That last one is generally avoided because it looks like you're using the \< regexp special when you really aren't. Putting the '<' or '>' in brackets also works, as you did above, but it slows down the matching ever so slightly as a character class is slower to match than a single normal character. Thus, one of the above four methods is usually preferred.
 
Philip Guenther
 
(Timo's addendum: As far as I understand \< is a word-boundary in procmail. Hence \< is best avoided, when not used as an actual boundary.)
How can I insert identification text to the beginning of the subject line?
 
I know how to sort my incoming email with procmail into different folders, but how do I use formail to automatically add some suitable identification text to the subject line of the email that I receive?
 
The general idea is this
#Get the subject discarding any leading and trailing blanks
#Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -xSubject: \
       | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

:0
* YourFirstSelectionCriterion
{
  :0 fwh
  | formail -I"Subject: WhateverYouAdd_1 ${SUBJ_}"
  :0:
  YourFirstFolder
}

:0
* YourSecondSelectionCriterion
{
  :0 fwh
  | formail -I"Subject: WhateverYouAdd_2 ${SUBJ_}"
  :0:
  YourSecondFolder
}
The flags are as follows: "f" use the pipe as a filter, "w" execute before proceeding, "h" it is about the header of the email message.
 
The -I option in formail removes and replaces the old header. Should you wish to retain the old subject header with an "Old-" prefix added, use -i instead.
I tried out your tips, but some of them failed on my system. What next?
 
Here are a few ideas:
  1. Have you copied right? For example:
     
  2. Have you customized all your file-paths right? Some of the recipes may require a slightly different setup in your environment than assumed in this FAQ.
     
  3. Check that procmail is getting your proper path. Try "echo ${PATH}" and then include "PATH=WhatYouGot" high up in your ~/.procmailrc recipe file.
     
  4. Include "VERBOSE=yes" high up in your ~/.procmailrc recipe file. Then see what is in the logfile procmail produced for debugging. The testbench is a useful aid in the debugging.
     
  5. The shell you use may affect some actions. Check where your Bourne shell sh is with "which sh". If it is e.g. /bin/sh then include "SHELL=/bin/sh" at the beginning of your ~/.procmailrc recipe file and see if anything changes. Bourne shell is the shell I have used in preparing this tips page.
     
  6. Work systematically. Try to pinpoint which particular line is causing the offense and how. If the problem is with the condition part, make a version general enough to get it to match. Then narrow it down towards what you wanted until the recipe fails. If the problem is with an action, try to separate whether the problem is with the actual action or your procmail syntax. For example if you pipe the email to a program, try to separate if it is the call syntax that is in error (e.g. do you manage to convey the parameters right) or if it is the actual program you called that fails.
     
  7. If you have a procmail problem which you can't solve after trying properly, post your problem to the comp.mail.misc Usenet newsgroup and/or your corresponding local newsgroup. If you have genuine feedback about my procmail tips, your email is most welcome, but please refrain from using email for private consultation requests.

Echo and grep blues. I am having difficulties with echo and grep usages in procmail.
 
The combination of quoting and regular expressions can cause some subtle problems when the Unix echo and one of the greps (grep, fgrep, egrep) is used in the procmail recipes.
 
Consider
#Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -c -xSubject:`

# Responses to filter reports
:0:
* -1^0
*  1^0 $ ? echo \"${SUBJ_}\" | fgrep -is 'Re: Filter report'
*  1^0 ^TO_myid@([-a-z0-9_]+\.)*myhost\.mydom
Response.mail
Consider a more complicated expression to extract the subject:
#Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -c -xSubject: \
         | expand | tr '\;\|\$\`\\]/' '     ' \
         | sed -e 's/  */ /g' \
         | sed -e 's/(/\\\(/g' -e 's/)/\\\)/g' \
         | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
There is much more to the echo and grep interactions with the shell and the regular expressions. That is why sufficient trials using the testbench are advisable before including the more complicated recipes into one's "~/.procmailrc" file.
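
One way to sidestep part of the trouble, at least when a fixed text is all that is searched for, is sketched below in Bourne shell. It swaps echo for printf '%s\n', since some echo versions interpret backslash sequences, and it uses grep -F so that no character in the search text is taken as a regexp special. The function name subject_contains and the sample subject are made up for the illustration.

```shell
#!/bin/sh
# subject_contains SUBJECT TEXT   (hypothetical helper)
# Succeeds if TEXT occurs literally in SUBJECT, ignoring case.
subject_contains() {
  printf '%s\n' "$1" | grep -F -i -q -e "$2"
}

SUBJ_='Re: Filter report [a.b] (50% off) $$$'

if subject_contains "$SUBJ_" 'filter report'; then
  echo "response to a filter report"
fi
# With fixed strings the brackets and the dot are just characters
if ! subject_contains "$SUBJ_" '[aXb]'; then
  echo "no literal [aXb] in the subject"
fi
```

The quoting of the variable inside a procmail condition line remains a separate matter, so the testbench is still needed.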
How do I know which of my many procmail recipes has been enacted?
 
To get a log of what happens you set at the beginning of your ~/.procmailrc recipes file
SHELL=/usr/bin/sh                 # Use Bourne shell
MAILDIR=${HOME}/Mail              # Customize as appropriate
LOGFILE=${MAILDIR}/procmail.log   # Your procmail log
VERBOSE=yes                       # Produce full information
LOGABSTRACT=all                   #       - " -
However, this produces so much information that routine visual checking of the log is inconvenient. Instead, you can include a suitable (dummy) variable definition in each one of your recipes and then search the log file for occurrences of that variable. Here is an example demonstrating how it goes. Consider a recipe that originally is
# Discard probable spam mail, set 1
:0:
* ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
* 1^0 ^From:.*alladvantage.com
* 1^0 ^From:.*ameritech.net
* 1^0 ^From:.*bellatlantic.net
ProbableSpam.mail
Change this to be
:0
* ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
* 1^0 ^From:.*alladvantage.com
* 1^0 ^From:.*ameritech.net
* 1^0 ^From:.*bellatlantic.net
{
  :0
  { RULE="Discard probable spam mail, set 1" }
  :0:
  ProbableSpam.mail
}
Apply the same principle for all your recipes in your ~/.procmailrc file. Then, as email has arrived, you can check which rules have been used by searching the log file with the command grep "RULE=" ${HOME}/Mail/procmail.log. If you need this regularly, make the grep search one of your Unix scripts:
#!/usr/bin/sh
grep "Assigning \"RULE=" ${HOME}/Mail/procmail.log
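If a summary is more useful than the raw lines, the same search can be extended to count how often each rule has fired. The Bourne shell sketch below assumes that the verbose log contains lines of the form procmail: Assigning "RULE=...", which is what the grep pattern above relies on. The function name rule_counts and the log excerpt are made up for the illustration.

```shell
#!/bin/sh
# rule_counts LOGFILE   (hypothetical helper)
# Prints each RULE text found in the log, preceded by its number
# of occurrences, the most frequent rule first.
rule_counts() {
  grep 'Assigning "RULE=' "$1" \
    | sed -e 's/.*Assigning "RULE=//' -e 's/"$//' \
    | sort | uniq -c | sort -rn
}

# A made-up log excerpt for trying out the function
LOG=`mktemp` || exit 1
trap 'rm -f "$LOG"' EXIT
cat > "$LOG" <<'EOF'
procmail: Assigning "RULE=Discard probable spam mail, set 1"
procmail: Assigning "RULE=To my-mailing-list, probably legitimate"
procmail: Assigning "RULE=Discard probable spam mail, set 1"
EOF

rule_counts "$LOG"
```

For real use, call it with ${HOME}/Mail/procmail.log or wherever your LOGFILE points.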
In the altered procmail recipe, further up, carefully note some of the syntax.
 
Procmail recipe nesting can get fairly complicated. Consider the following example involving setting the RULE variable and procmail else if conditions ":0E".
:0
* ^TO_my-mailing-list
{
  :0
  * ^From:.*@([-a-z0-9_]+\.)*myhost\.mydom
    {
      :0
      { RULE="To my-mailing-list, probably legitimate" }
      :0:
      ${DEFAULT}
    }
  :0E
    {
      :0
      { RULE="To my-mailing-list, probably spam" }
      :0:
      Spam.mail
    }
}
Feedback: There is a method for logging which action took place without using VERBOSE=yes, which creates large log files. This method uses the LOG variable:
 
LOGFILE=$HOME/.MailFilter_log
SHELL=/bin/sh
 
:0 B
* .*spam
{
  LOG="TRAPPED SPAM - "
  :0
    /dev/null
}
 
#- Accept All other mail -#
:0
{
  LOG="ACCEPTED MAIL - "
  :0
  $ORGMAIL
}
 
The output looks something like this:
 
TRAPPED SPAM - From spammer@spam.com Thu May 16 03:52:42 2002
 Subject: Make Money Fast
  Folder: /dev/null 43140
ACCEPTED MAIL - From goodguy@save.com Thu May 16 03:54:08 2002
 Subject: Legitimate email message
  Folder: var/spool/mail/username 4683

My comment: If you look at the example for testing individual procmail recipes you'll see that for logging one sets (usually for troubleshooting)
#Troubleshooting:
VERBOSE=yes
LOGABSTRACT=all
For the method in the feedback above, leave those variables out or set
VERBOSE=no
However, do not set
LOGABSTRACT=no
because then you'll miss all but the actual log variable identification. Instead, just leave the line out.
How can I detect Korean, Cyrillic, or Chinese to avoid such frequent spam?
 
There is a very good page by Walter Dnes explaining the method. The method relies on ad-hoc approximation. In brief, scoring is used to detect if more than 5 per cent of the characters in the body of the message are high-bit characters typical of the said language codes. If you have gone through the items in my procmail FAQ, it should be easy to understand the inventive method given on Walter's page. See the exercise at the end of the current FAQ involving detecting Korean.
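
To get a feeling for the 5 per cent idea, the proportion of high-bit characters can be computed in the shell. The sketch below is not Walter's recipe, which uses procmail scoring. It is merely my own Bourne shell illustration of the measurement, with a hypothetical function name highbit_percent, and it counts bytes rather than characters.

```shell
#!/bin/sh
# highbit_percent   (hypothetical helper)
# Reads text on standard input and prints the whole-number
# percentage of bytes that have the high bit set (octal 200-377).
highbit_percent() {
  tmp=`mktemp` || return 1
  cat > "$tmp"
  total=`wc -c < "$tmp" | tr -d ' '`
  high=`LC_ALL=C tr -cd '\200-\377' < "$tmp" | wc -c | tr -d ' '`
  rm -f "$tmp"
  if [ "$total" -eq 0 ]; then
    echo 0
  else
    echo $(( 100 * high / total ))
  fi
}

printf 'plain ascii text\n' | highbit_percent
printf '\341\342\343\344 abcd\n' | highbit_percent
```

A message body giving a figure above 5 would be a candidate for the ProbableSpam.mail folder under the approximation described above.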
 
If you wish to be even more restrictive about what you receive, you could even filter all messages that have any of the following combinations appearing anywhere in the body of the message
    àé
    àì
    áà
    áò
    áô
    áù
    áú
    éè
    éê
    éî
    éù
    éú
and so on. Put those, and others you may wish to skip, in a bl_body.lst file and use a recipe like this:
 
# Probable spam mail, by message body
:0B
* $ ? fgrep -is -f bl_body.lst
/dev/null

How can I change the subject line and include part of the message body into it?
 
I have a cellular phone. I want to save the incoming email normally and also to send a modified copy to my second account (a Short Message Service). The forwarded copy should include the original subject AND five lines of the original message text. The original body should not be included. Is this possible with procmail?
 
Well yes, it is. It takes some figuring out, and it draws on many of the principles presented in the other items in my proctips collection. It also needs a few tricks with Bourne shell programming. Perhaps most importantly, this item demonstrates how to put the body of the message into a variable.
 
# Customize these paths if they do not match yours
SHELL=/usr/bin/sh
SENDMAIL=/usr/lib/sendmail
 
:0
* ^Subject:.*Timo testing
{
  # Put the email intact in the default folder
  :0c:
  ${DEFAULT}
  # The "c" flag above tells the recipe to continue
  # Now we prepare a different version of the message
  :0
  {
    # Get the subject into a variable
    # Expand the possible tabs into blanks
    # Discard any leading and trailing blanks
    # On some systems -xSubject: has to be -x"Subject: "
    SUBJ_=`formail -xSubject: \
      | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
    # Get the body of the message into a variable
    # Accept only the first five lines
    # Discard newlines, i.e. put everything on one line
    BODY_=`sed -e '1,/^$/ d' | head -5 | tr -d '\n'`
  }
  # Prepare and send a message with no body
  # -X "" extracts just the header (discards the body)
  # Plug in the new subject
  # Content fields might cause problems if not discarded
  # Change the To: address
  :0:proc.lock
  | formail -X "" \
      -I"Subject: ${SUBJ_} ${BODY_}" \
      -i"Content-Type:" \
      -i"Content-Length:" \
      -I"To: your@second.address" \
  | ${SENDMAIL} -t
}

 
The line
BODY_=`sed -e '1,/^$/ d' | head -5 | tr -d '\n'`
retrieves the first five lines from the body of the text. It would be more useful to retrieve a specified number of characters from it. Say we wish to retrieve 160 characters. This is how to do that.
BODY_=`sed -e '1,/^$/ d' | tr -d '\n' | dd bs=1 count=160`
Solving the alternative of having a maximum of 160 characters in the concatenated SUBJ_ and BODY_ is left as an exercise to the reader.
 
There also is another, more important improvement that can be made in the action above. Replace tr -d '\n' with tr '\n' ' ' so that when the lines are concatenated a space is put in between them.
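
One possible solution to the exercise of limiting the combined subject and body to 160 characters is sketched below in Bourne shell. The function name sms_subject is hypothetical. The sketch assumes that the header has already been removed from the input, as the sed -e '1,/^$/ d' part does in the recipe, and it already includes the tr '\n' ' ' improvement.

```shell
#!/bin/sh
# sms_subject SUBJECT   (hypothetical helper)
# Reads the message body on standard input and prints the subject
# followed by the body on one line, newlines turned into spaces,
# cut to at most 160 characters in total.
sms_subject() {
  body=`tr '\n' ' '`
  printf '%s %s\n' "$1" "$body" | cut -c1-160
}

printf 'first line\nsecond line\n' | sms_subject "Timo testing"
```

The result would then go as a whole into the -I"Subject: ..." option of formail instead of the separate SUBJ_ and BODY_ variables.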
How can I remove the signature from the incoming email?
 
The recipe below assumes that the signature properly adheres to the Internet "-- " convention to denote where the signature starts.
 
:0
* ^Subject: Whatever
{
  :0 fbw
  | sed -e '/^-- /,$ d'
  :0:
  ${DEFAULT}
}

In the recipe above, the sed script deletes everything in the message body from the "-- " line to the end of the incoming message. Substituting
 
sed -e '/^-- /,$ d'
 
with
 
sed -e '/^-- /,/^$/ d'
 
will instead delete everything starting from the "-- " until the first encountered empty line. Thus if there is e.g. an attachment after the signature, the attachment will not be thrown away.
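
The difference between the two sed scripts is easy to see on a small test body in the shell. The body below is made up for the illustration. It has a "-- " signature followed by an empty line and some trailing text.

```shell
#!/bin/sh
# A made-up message body: text, a "-- " signature, a blank line, more text
sample_body () {
  printf 'Hello,\nthis is the message.\n-- \nTimo\n\nTrailing part\n'
}

# Variant 1: delete from the "-- " line to the end of the input
CUT_TO_END=$(sample_body | sed -e '/^-- /,$ d')

# Variant 2: delete from the "-- " line to the first empty line
CUT_TO_BLANK=$(sample_body | sed -e '/^-- /,/^$/ d')

echo "$CUT_TO_END"
echo "---"
echo "$CUT_TO_BLANK"
```

With the second variant the "Trailing part" line survives, which is the behaviour wanted when something follows the signature.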

What unix manuals relating to procmail should I get?
 
Unix manuals are not very helpful as starting points, but after you have got the rudiments under your belt, you may wish to browse the following manuals for additional information. Below is a simple "manuals" Bourne shell script. It prepares plain text format files of some of the essential Unix man manuals for a procmail user, especially suited for offline reading.
 
Note that the "^H" is not a "^" and an "H", but a CTRL-H, i.e. ASCII 8 (the backspace character). To make the "manuals" file executable type "chmod u+x manuals".
 
#!/bin/sh
TODIR=${HOME}/myman
# Make sure the target directory exists before writing into it
mkdir -p ${TODIR}
echo ${TODIR}
man egrep      | sed -e 's/_^H//g' > ${TODIR}/egrep.man
man formail    | sed -e 's/_^H//g' > ${TODIR}/formail.man
man procmail   | sed -e 's/_^H//g' > ${TODIR}/procmail.man
man procmailex | sed -e 's/_^H//g' > ${TODIR}/procmaex.man
man procmailrc | sed -e 's/_^H//g' > ${TODIR}/procmarc.man
man regexp     | sed -e 's/_^H//g' > ${TODIR}/regexp.man
man sendmail   | sed -e 's/_^H//g' > ${TODIR}/sendmail.man
ls -lF ${TODIR}
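
If typing a literal CTRL-H into the script is awkward, the backspace can be generated with printf instead. The following is a minimal sketch of that idea. The function name strip_underline and the sample line are my own inventions for the illustration.

```shell
#!/bin/sh
# Build the backspace (ASCII 8) character without typing it literally
BS=$(printf '\b')

# strip_underline: remove the "_<backspace>" pairs that man uses for underlining
strip_underline () {
  sed -e "s/_${BS}//g"
}

# A made-up line in the overstrike style produced by man
printf '_\bp_\br_\bo_\bc_\bm_\ba_\bi_\bl reads its rcfile\n' | strip_underline
```

In the "manuals" script each sed call could then be written as "man procmail | strip_underline". Where the col utility is installed, "col -b" is another common way to drop the backspaces.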

Many of the recipes in this FAQ utilize sed and/or awk. Some useful links (note, however, as is common with links, I can't guarantee that they still are current):
Is it possible to use procmail to call the vacation program?
 
Yes, it is, but it is not quite as straightforward as one would expect.
 
Since this is a collection of procmail advice, not of vacation program advice, I'll assume that you are reasonably familiar with the vacation program. If not, start with "man vacation". You have to use procmail to customize the ~/.vacation.msg file because when invoked via procmail, the vacation $SUBJECT variable is not necessarily set.
 
Usually, when vacation is used, it is first called interactively to create the ~/.vacation.msg file and to replace the ~/.forward file. If you are going to use the procmail solution, it is very important not to do this. In particular, the ~/.forward file must not be touched in any way. The reason is that in this solution it is used to invoke procmail, not vacation. (The vacation program is, of course, called by procmail now.)
 
# Set a number of variables high up in your ~/.procmailrc
#
VACATION=/usr/bin/vacation
ONVACAT=yes
VACFREQ=5d
VACMSG=${HOME}/.vacation.msg
MYNAME_="MyFirstName MyLastName"
MYEMAIL_=myid@myhost.mydom
 
# Get the subject discarding any leading and trailing blanks
# Note: On some systems -xSubject: has to be -x"Subject: "
#
SUBJ_=`formail -xSubject: \
    | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
 
# Prepare the vacation message's base
# This is done only once in ~/.procmailrc
#
:0 cwi
* ONVACAT ?? ^^yes^^
| echo "From: ${MYEMAIL_}" > ${VACMSG} ;\
  echo "Subject: ${MYNAME_}, away from my mail" >> ${VACMSG} ;\
  echo "X-Loop: myid@myhost.mydom" >> ${VACMSG} ;\
  echo "" >> ${VACMSG} ;\
  echo "Thank you for your email about:" >> ${VACMSG} ;\
  echo "\"$SUBJ_\"" >> ${VACMSG} ;\
  echo "" >> ${VACMSG} ;\
  echo "Your email will be seen to when I return." >> ${VACMSG} ;\
  echo "" >> ${VACMSG} ;\
  cat ${HOME}/.signature >> ${VACMSG}
 
# Here we go invoking vacation and also saving the email
# You might have several, different of these recipes
#
:0
* ^Subject:.*Whatever
{
  :0
  { RULE="Testing" }
  :0 cwi
  * ONVACAT ?? ^^yes^^
  * ! ^X-Loop:.*myid@myhost\.mydom
  | ${VACATION} -t${VACFREQ} myid
  :0:
  WhateverFolder
}

Feedback: Maybe I [Collin Park] can add one more comment: I think you need a global LOCKFILE to cover the area from when you generate the vacation message to the place where you invoke $VACATION.
 
Otherwise, message #N may generate .vacation.msg, then message #N+1 overwrites it before #N invokes $VACATION.

How can I avoid duplicate messages sent in rapid succession?
 
One option, though not the only one, is the following heuristic. You will wish to customize and streamline it in accordance with your own preferences.
 
#Some variables
FROM2_=`formail -c -I"Reply-To:" -rt -xTo: \
 | tr '\;\|\$\`\\]/' '     ' \
 | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
DFROM2_=`echo /${FROM2_}/ \
 | expand | sed -e 's/[ \<\>\+\?\$]//g'`
SUBJ_=`formail -z -c -xSubject: \
 | expand | tr '\;\|\$\`\\]/' '     ' \
 | sed -e 's/  */ /g' \
 | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
DSUBJ_=`echo /${SUBJ_}/ | expand | sed -e 's/[ \<\>\+\?\$]//g'`
DWC_=/`wc -w`/
 
#Discard doubles
# W Wait for the filter or program to finish,
# suppress any 'Program failure' message.
:0W
* $ ? sed -n 1p LastIn | egrep -is '${DFROM2_}'
* $ ? sed -n 2p LastIn | egrep -is '${DSUBJ_}'
* $ ? sed -n 3p LastIn | egrep -is '${DWC_}'
{
  :0
  { RULE="Discard doubles" }
  :0
  /dev/null
}
 
#Store some information about the latest message
# c then continue
:0Wc
| echo "${DFROM2_}" > LastIn ;\
  echo "${DSUBJ_}" >> LastIn ;\
  echo "${DWC_}" >> LastIn
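
The bookkeeping behind the heuristic can be tried out in the shell without procmail. The sketch below is a simplification and not the recipe itself: it compares the three stored lines for strict equality, whereas the recipe uses egrep pattern matching, and the function names and the state file path are hypothetical.

```shell
#!/bin/sh
# Hypothetical state file playing the role of LastIn
STATE="${TMPDIR:-/tmp}/LastIn.$$"

# remember FROM SUBJECT WORDCOUNT: store the fingerprint of the latest message
remember () {
  printf '%s\n%s\n%s\n' "$1" "$2" "$3" > "$STATE"
}

# is_double FROM SUBJECT WORDCOUNT: succeed if all three equal the stored ones
is_double () {
  [ -f "$STATE" ] || return 1
  [ "$(sed -n 1p "$STATE")" = "$1" ] &&
  [ "$(sed -n 2p "$STATE")" = "$2" ] &&
  [ "$(sed -n 3p "$STATE")" = "$3" ]
}

remember '/someone@example.com/' '/Hellothere/' '/42/'
if is_double '/someone@example.com/' '/Hellothere/' '/42/'; then
  echo "double"
fi
rm -f "$STATE"
```

Only the most recent message is remembered, so a duplicate is caught only when it arrives directly after the original.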

How can I skip logging a certain, matched recipe? Say virus warnings from my postmaster.
 
The solution is rather simple. Direct LOGFILE to /dev/null (or anywhere you may wish) for the duration of the relevant recipe. For example
 
:
LOGFILE_=${LOGFILE}
LOGFILE=/dev/null
:0:
* ^Subject:.*Virus in a mail for you
* ^From:.*postmaster
VirusWarnings
LOGFILE=${LOGFILE_}
:

Alternatively you could likewise (re)set
VERBOSE=no
LOGABSTRACT=no
but the first solution is the more flexible.

Could you please solve for me this procmail problem of mine?
 
It is nice that you have found my proctips so useful that you ask for my personal advice. Nevertheless, if you ask me by email for individualized procmail consultation, my response has to be the same as for any request for programming advice: I do not do email consultation. If you have a procmail-related problem, please post your question to a Usenet newsgroup such as comp.mail.misc. The added advantage of posting is that in a newsgroup both the question and the potential answers will have a wider forum. That way everyone will benefit.
 
Please also be aware that I retired in 2011. My interests now lie elsewhere. I am not sufficiently motivated to invest the considerable effort required to look into other users' procmail or other programming problems, nor even particularly to maintain this procmail information. It is currently presented "as is".

On rare occasions I have also been asked to email my own personal ~/.procmailrc or my own spamfoiling scripts. The answer is a definite no. There are two main reasons. First, that material is private. Second, I have neither the willingness nor the time to send out material to users on individual requests. If and when I want to share my material, I make it available for users to retrieve themselves via WWW or FTP.
I liked this material. Do you have anything else on programming, etc?
 
Yes, notably this:
 
Programming
Timo's assorted NT/2000/XP/.. CMD.EXE Script Tricks 
NT/2000/XP command line programming material links 
MS-DOS batch programming material 
Turbo Pascal programming material 
Unix Bourne shell scripts programming material
Etc
More links to Timo's FAQ materials 


Some exercises
 
Let's see if we can put the methods presented in this FAQ to work on some tasks, some of which have come up on the Usenet news.
 
Ex.1) Keep a copy of incoming email, and at the same time, get only the first five lines from the message body and forward it to another account.
# Discard potential email loops
:0
* ^X-Loop: myid@myhost\.mydom
/dev/null

:0
* Any rule(s) you might wish to have
{
  # Keep a copy, but don't stop yet ( the c )
  :0c:
  ${DEFAULT}

  # Comment with "Old-" the Content-Length field from the header
  # Ensure that a whitespace exists between field name and content
  :0 fwh
  * ^Content-Length:
  | formail -z -i"Content-Length:"

  # Add the loop avoidance
  # ( f for piping; w for waiting for completion; h for headers )
  :0 fwh
  | formail -A"X-Loop: myid@myhost.mydom"

  # Truncate the body ( the b ) to five lines
  :0 fwb
  | head -5

  # Forward to the other account
  :0
  ! myid2@myhost.mydom
}
It is important to handle the Content-Length header field when the length of the email is altered. This ensures that the receiving email program will not break the forwarded message when it is read. The -i switch retains the information about the original message length, renamed with an "Old-" prefix, for the attention of the receiver.
 


Ex.2) Forward the first 10 lines of the message body to the user's second account while preserving all the original message headers. I.e., at the receiving side, the user wants to see all the message travel history and only the first 10 lines of the message body.
 
This is a more complicated version of the first exercise. The transformed task is not trivial, since when you forward, the original message headers will be replaced by your forwarding headers. Therefore, you'll have to see to preserving also the original headers. Below is how I would solve the problem based on several items in this FAQ.
 
# A trick to extract the subject into a variable
# Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -c -xSubject: | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
 
# The actual recipe to solve the exercise starts here
:0
* Whatever condition(s) you wish to select the messages for forwarding
# Avoid email loops
* ! ^X-Loop: myid@myhost\.mydom
{
  :0c: #If you want to, preserve a full copy of the email, else omit
  ${DEFAULT}
  :0fwh #Preserve the information about the original content length
  * ^Content-Length:
  | formail -z -i"Content-Length:"
  :0fwb #Truncate the body of the message to ten lines
  | head -10
  :0fwh #Insert a blank line at the beginning of the body for clarity
  | cat - ; echo ""
  :0fwh #Store the original headers, quoting them to avoid problems
  | sed -e 's/^/\> /'
  :0fwh #Insert some of your own information before forwarding
  | formail -A"X-Loop: myid@myhost.mydom" \
    -A"X-Info: Forwarded body truncated to 10 lines" \
    -i"Subject: $SUBJ_ (fwd)"
  #Finally, forward the adjusted email
  :0
  !my2dnId@myhost.mydom
}
 
# Discard potential email loops
:0
* ^X-Loop: myid@myhost\.mydom
/dev/null

 
Feedback: The recipe with head probably needs an "i" on the flags line, as:
:0 fwbi
| head -10
since write errors on the pipe are likely for messages larger than a certain size. (I've seen numbers like 4096 and 10240... it apparently varies with the system.)
 


Ex.3) Match a potential [TS999] identification in the Subject header, such as "[TS001] Timo testing". If found, insert a "Subject id: [TS999]" as the first line in the body of the message. (The rest of the original subject line must not reappear in the id.)
 
:0
* ^Subject:.*\/\[TS[0-9]+\]
{
  :0 fhw
  | cat - ; \
  echo "Subject id: ${MATCH}"
  :0:
  ${DEFAULT}
}
But what if you do want to include the rest of the original subject line? In that case use
* ^Subject:.*\/\[TS[0-9]+\].*
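
What the \/ operator leaves in MATCH can be approximated in the shell with sed, which is handy for checking the pattern before it goes into a recipe. This is only a stand-in for procmail's own matching. The function name subject_id is made up.

```shell
#!/bin/sh
# subject_id: print the [TS999]-style identifier found on a Subject: line
# (in a sed basic regular expression \[ is a literal "[" and a bare "]" is literal)
subject_id () {
  sed -n -e 's/^Subject:.*\(\[TS[0-9][0-9]*]\).*/\1/p'
}

printf 'Subject: [TS001] Timo testing\n' | subject_id
```

A subject without an identifier produces no output at all, which corresponds to the recipe's condition not matching.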



Ex.4) Multi-part messages (which typically include attachments) have in their headers a field like the two examples below:
 
Content-Type: multipart/mixed; boundary=ELM965173874-25050-0_
Content-Type: multipart/mixed; boundary="------------BA45271FBDAA479CECA7E20A"

Write a recipe that inserts into a variable (call it BOUND) the boundary string. Note that the potential quotes (") are not to be part of that string. Also note that the header might be divided on multiple lines as in
 
Content-Type: multipart/mixed;
  boundary=ELM965173874-25050-0_

 
There are alternative solutions, which are not necessarily quite equivalent. The first one is to put the following line(s) high up in your ~/.procmailrc recipe file:
 
BOUND1=`formail -z -x"Content-Type:" \
  | awk -F= '{ print $2 }' \
  | sed -e 's/\"//g' | tr -d '\n'`

 
A second one is:
 
:0h
* ^Content-Type:
{ BOUND2=`egrep -i 'boundary=' \
  | awk -F= '{ print $2 }' | sed -e 's/\"//g'` }

 
This was not in the exercise, but you can then have recipes like
 
:0:
* ! BOUND2 ?? ^^^^
WhateverFolder
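
The awk and sed part of the solutions can be checked in the shell against the sample headers of the exercise. The sketch below wraps the pipeline in a function of my own naming and feeds it the header lines directly, without formail.

```shell
#!/bin/sh
# extract_boundary: read Content-Type header lines on stdin and print the
# boundary string without quotes
extract_boundary () {
  grep -i 'boundary=' \
    | awk -F= '{ print $2 }' \
    | sed -e 's/"//g' | tr -d '\n'
}

# Header folded on two lines, as in the exercise
printf 'Content-Type: multipart/mixed;\n  boundary=ELM965173874-25050-0_\n' \
  | extract_boundary
echo
```

Because awk splits on "=", a boundary string that itself contains an "=" would be cut short. Treat this as a limitation of the simple approach.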

 


Ex.5) Identify if the arriving email is in Korean. If so, return the message to the sender and his/her postmaster. Ignore a potential Reply-To: field in the header. Avoid email loops. Avoid forgeries which appear to come from your own host. Avoid forgeries which lack a host name. Be careful not to take Finnish/Swedish or French as Korean.
 
This is quite a difficult exercise with many details involved.
 
# Get the sender's address, ignore Reply-To:
FROM_=`formail -c -I"Reply-To:" -rt -xTo: \
  | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
 
# Get the sender's host
FHOST_=`echo "${FROM_}" | awk -F@ '{ print $2 }'`
 
# Your path to sendmail
SENDMAIL="/usr/lib/sendmail"
 
# Reject probable Korean email using character scoring
:0
* ! ^X-Loop:.*myid@myhost\.mydom
* ! $ ? echo ${FHOST_} | fgrep -is 'myhost.mydom'
* $ ? echo ${FHOST_} | fgrep -is '.'
{
  :0BD
  *  -1^1 .
  *   2^1 =[0-9A-F][0-9A-F]
  *  20^1 [¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿]
  *  20^1 [ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß]
  *  20^1 [àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]
  *  20^1 =[89A-F][0-9A-F]
  * -20^1 [åÅäÄöÖàáâçèéêë]
  * -20^1 =(E5|C5|E4|C4|F6|D6|E0|E1|E2|E7|E8|E9|EA|EB)
  {
    :0
    { RULE="Probable Korean email" }
    #
    :0c:${HOME}/procmail.lock
    | expand | sed -e 's/[ ]*$//g' \
      | sed -e 's/^/ /' > ${HOME}/procmail.reject.korean
    #
    :0:${HOME}/procmail.lock
    | (formail -r -I"Subject: Autorejected email" \
      -I"To: ${FROM_}" \
      -I"Cc: postmaster@${FHOST_}" \
      -A"X-Loop: myid@myhost.mydom" ; \
      echo "--- begin rejected probable Korean email ---" ; \
      echo "" ; \
      cat ${HOME}/procmail.reject.korean ; \
      echo "--- end of rejected probable Korean email ---" ; \
      rm -f ${HOME}/procmail.reject.korean) \
        | ${SENDMAIL} -t
  }
}
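
The scoring in the inner recipe works because a condition of the form w^1 adds w to the score for every match. With -1 for each character and +20 for each 8-bit character, the score turns positive once more than about one character in twenty is 8-bit. The shell sketch below illustrates only that arithmetic. It is a simplification under my own assumptions: it counts bytes, and it leaves out the quoted-printable conditions and the negative weights for the Finnish/Swedish and French letters.

```shell
#!/bin/sh
# eightbit_score: read text on stdin and print
#   20 * (number of bytes with the high bit set) - (total number of bytes)
eightbit_score () {
  text=$(cat)
  total=$(printf '%s' "$text" | wc -c)
  high=$(printf '%s' "$text" | LC_ALL=C tr -cd '\200-\377' | wc -c)
  echo $(( 20 * high - total ))
}

# Plain ASCII text scores below zero
printf 'Hello there' | eightbit_score
```

A positive result corresponds to the recipe's conditions succeeding. The real recipe is more careful, so use this only to get a feel for the weights.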


Ex.6) If the subject of the email contains the identifier [INFO], in capitals, put the body of the incoming email into a temporary file. Ensure that the name of the temporary file is unique. Insert the full subject line at the top of the temporary file. (Why, and what then is beyond this exercise.)
 
#Get the subject discarding any leading and trailing blanks
#Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -xSubject: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
 
# Assign a temporary file name
TMPFILE_=proctemp.$$
 
:0D
* ^Subject.*\[INFO\]
{
  :0 fwbi
    | echo "Subject: ${SUBJ_}" > ${TMPFILE_}; \
    echo >> ${TMPFILE_}; \
    cat >> ${TMPFILE_}
}


Ex.7) If the email comes from a certain sender, check if the time-zone information is present in the Date header. If not, add it assuming +3 hours.
 
#Get the date discarding any leading and trailing blanks
DATE_=`formail -xDate: \
  | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
 
:0
* ^From:.*TheCertainSender
* ! ^Date:.*(EET|DST|GMT)
{
  :0 fwhi
  | formail -i"Date: ${DATE_} +0300 (EET DST)"
  :0:
  ${DEFAULT}
}


Ex.8) The simple spamfoiling recipe below won't work. Correct it.
 
:0:
* !^TO$USER@xxxxxxx.xxx
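ProbableSpam.mail
 
A corrected version (the condition needs the $ flag for the variable to be expanded, and the literal dots should be escaped):
 
:0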
{
  :0
  { USER=`whoami` }
  :0:
  * $ ! ^TO_${USER}@([-a-z0-9_]+\.)*xxxxxxx\.xxx
  ProbableSpam.mail
}
 
The ([-a-z0-9_]+\.)* is optional.
 
Another solution:
 
:0:
* $ ! ^TO_${LOGNAME}@([-a-z0-9_]+\.)*xxxxxxx\.xxx
ProbableSpam.mail
 


Ex.9) Insert at the beginning of the subject the date/time of receiving the incoming message in the YYYYMMDD HHMMSS format.
 
:0
* Whatever rules
{
  :0
  { SUBJ_=`formail -c -xSubject: \
    | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'` }
  :0
  { DATETIME_=`date "+%Y%m%d %H%M%S"` }
  :0 fhwi
  | formail -I"Subject: ${DATETIME_} ${SUBJ_}"
  :0:
  ${DEFAULT}
}
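
The date format string can be tried in a shell first. The sketch uses %H for the hour, which is always two digits. GNU date's %k pads a single-digit hour with a blank, which would not give the HHMMSS form before 10am.

```shell
#!/bin/sh
# %H gives the hour as two digits (00..23)
DATETIME_=$(date "+%Y%m%d %H%M%S")
echo "${DATETIME_}"
```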


Ex.10) This partly is based on an actual incident. Consider the following recipe with three small, but crucial syntax errors, and one omission. Find them.
 
:0
* ^From:.*(\
(abuse(-news)?|acct_closed)@
(pacificnet\.net|\
mindspring\.net|\
InfoAve\.net|\
netcom\.com\|
yahoo\.com|\
alladvantage\.com|\
hotmail\.com))
* ^TO_(myid|myFirstName\.mySecondName)@([-a-z0-9_]+\.)*myhost\.mydom
{
  :0
  {RULE="Abuse reception notes"}
  :0
  ReceivedNotes
}

The answer is a bit further down
  :
  :
  :
  :
  :
  :
  :
  :
  :
  :
  :
  :
  :
  :
 
:0
* ^From:.*(\
(abuse(-news)?|acct_closed)@\
(pacificnet\.net|\
mindspring\.net|\
InfoAve\.net|\
netcom\.com|\
yahoo\.com|\
alladvantage\.com|\
hotmail\.com))
* ^TO_(myid|myFirstName\.mySecondName)@([-a-z0-9_]+\.)*myhost\.mydom
{
  :0
  { RULE="Abuse reception notes" }
  :0:
  ReceivedNotes
}

The three syntax errors were the missing line-continuation backslash after the "@", the "\|" in place of "|\" on the netcom line, and the missing blanks inside the braces of { RULE=... }. The omission was the lock-file colon (":0:" instead of ":0") on the recipe that delivers to the ReceivedNotes folder.


Ex.11) Write a recipe to match the subject line below. The (RECENT) may or may not be there, and the numbers will change from posting to posting.
Subject: Re: [SpamCop:(RECENT)38.204.225.29,id:16135684] Make lotsof $$$

 
:0:
* ^Subject: Re: \[SpamCop:(\(RECENT\))?[0-9\.]+,id:[0-9]+\]
WhateverFolder
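
The pattern can be tested in the shell with grep -E before it is put into use. Note one difference: procmail ignores case by default whereas grep does not. The function name is hypothetical, and the third test subject is made up.

```shell
#!/bin/sh
# is_spamcop_subject: succeed if the line given as $1 matches the pattern
is_spamcop_subject () {
  printf '%s\n' "$1" \
    | grep -Eq '^Subject: Re: \[SpamCop:(\(RECENT\))?[0-9.]+,id:[0-9]+\]'
}

for subj in \
  'Subject: Re: [SpamCop:(RECENT)38.204.225.29,id:16135684] Make lotsof $$$' \
  'Subject: Re: [SpamCop:38.204.225.29,id:16135684] Make lotsof $$$' \
  'Subject: Re: Something else'
do
  if is_spamcop_subject "$subj"; then echo "match:    $subj"
  else echo "no match: $subj"; fi
done
```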


Ex.12) It is fairly common that spam email has the same sender and recipient in the From: and To: fields. Devise a recipe that detects such postings.
 
This is not quite as simple as it first sounds, since it is advisable to take into account the fact that the contents of the two fields may not be quite identical even when the actual addresses are the same. Thus I would use regular expression matching both ways, as below, as one of the possible solutions. By default, variable comparisons are regular expression matches, not strict equalities. Also note the conditions for avoiding email loops and for not falsely targeting email which one may have sent to oneself.
 
WHOFROM=`formail -xFrom: \
  | expand \
  | sed -e 's/  */ /g' \
  | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
 
WHOTO=`formail -xTo: \
  | expand \
  | sed -e 's/  */ /g' \
  | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
 
:0:
* -100^0 ^X-Loop: myid@myhost\.mydom
* -100^0 ^TO_(myid|myFirst\.mySecond)@([-a-z0-9_]+\.)*myhost\.mydom
* -100^0 ^From:.*LegitimateMailingList
* 1^0 $ WHOFROM ?? ${WHOTO}
* 1^0 $ WHOTO ?? ${WHOFROM}
ProbableSpam.mail
 


Ex.13) Write a (spam avoidance) recipe to detect email with more than seven recipients in the "To:" header field. Assume for simplicity that each address will have exactly one "@" character in it.
 
:0
* ^Subject:.*The information you requested
{
  :0
  {
    WHOTO=`formail -z -xTo:`
    COUNT=`echo ${WHOTO} | sed -e 's/[^@]//g' | wc -c`
    COUNT1=`expr ${COUNT} - 1`
    ISGT=`expr ${COUNT1} \> 7`
  }
  :0:
  * ISGT ?? ^^1^^
  ProbableSpam.mail
}
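
The counting step can be tried separately in the shell. The sketch below is a variation, not the recipe's exact pipeline: it counts with tr -cd instead of sed, so no newline is counted and the subtraction of one is not needed. The function name and the addresses are made up.

```shell
#!/bin/sh
# count_recipients: print the number of "@" characters in the text given as $1
count_recipients () {
  n=$(printf '%s' "$1" | tr -cd '@' | wc -c)
  echo $(( n ))
}

WHOTO='a@example.com, b@example.com, c@example.com'
COUNT=$(count_recipients "$WHOTO")
if [ "$COUNT" -gt 7 ]; then echo "probable spam"; else echo "ok ($COUNT)"; fi
```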


Ex.14) Make procmail forward email that arrives between 9am and 5pm to a predefined daytime email address.
 
:0
# Omit the condition line below if this is for all email
* ^Subject:.*Whatever
{
  :0
  {
    TIME=`date +%H%M`
    ISGT=`expr ${TIME} \> 0900`
    ISLT=`expr ${TIME} \< 1700`
  }
  :0
  * ISGT ?? ^^1^^
  * ISLT ?? ^^1^^
  ! daytime_forward_address
}
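
The two expr comparisons can be checked in the shell for a few sample times. The function name in_daytime is my own. Note that both comparisons are strict, so exactly 0900 and exactly 1700 fall outside the window.

```shell
#!/bin/sh
# in_daytime HHMM: succeed if HHMM is strictly between 0900 and 1700,
# using the same expr comparisons as the recipe
in_daytime () {
  [ "$(expr "$1" \> 0900)" = 1 ] && [ "$(expr "$1" \< 1700)" = 1 ]
}

for t in 0830 0900 1200 1659 1700; do
  if in_daytime "$t"; then echo "$t: forward"; else echo "$t: keep"; fi
done
```

If 9am sharp should be included, \>= can be used in place of \> in the first comparison.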


Ex.15) Write a Procmail recipe which detects if there is a Word document attached to the incoming email.
 
# Email with a Word document attached
:0
* ^Content-Type: multipart/
{
  :0 B
  * ^Content-.*attachment.*name=.*\.(doc|rtf)
  {
    :0
    { RULE="Email with a Word document attached" }
    :0:
    WordAttachmentEmail
  }
}


Ex.16) Write a recipe to detect a "whatever pattern" on exactly the second line of the body of an incoming message. Ignore case in the pattern.
 
:0B:
* ? sed -n 2p | egrep -is 'whatever pattern'
WhateverPatternMail
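
The condition can be tested in the shell by feeding a sample body to the same kind of pipeline. The sketch uses grep -E with -q, which only sets the exit status. The function name and the sample text are assumptions made for the illustration.

```shell
#!/bin/sh
# second_line_matches PATTERN: read a message body on stdin and succeed
# if its second line matches PATTERN, ignoring case
second_line_matches () {
  sed -n 2p | grep -Eiq "$1"
}

if printf 'first line\nSome Whatever Pattern here\nthird line\n' \
  | second_line_matches 'whatever pattern'
then echo "match"; else echo "no match"; fi
```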
 
A tip: Even if there is no direct relation with procmail, my collection of useful MS-DOS batch files and tricks contains several examples of the sed (and awk) usages. So does my collection of useful NT/2000/XP script tricks and tips.
 


Ex.17) Write a spam detection recipe that does the following:
1. Check the body of the message against the keywords (collected spam sites' www addresses etc.) in a BlackList.lst pattern-file. The pattern-file might contain something like:
   This letter may come to you as a surprise
   Urgent business proposal
   cheap-medz.com
   discreetdelivery.net
   http://homemarketplace.cjb.net
   mailto:reklamapoezd@
   quityourjobworkforus
   statesmoneyz.com
   www.badcrednp4u.biz
2. If a KEEPSPAM variable has been set to "yes" save the spam to Spam.mail, truncated to 100 lines. If not, discard the message.
# Probable spam mail, by message body
:0B
* $ ? fgrep -is -f BlackList.lst
{
  :0
  * KEEPSPAM ?? ^^yes^^
  {
    :0:MyProcmail.lock
    | sed -n 1,100p >> Spam.mail
  }
  :0E
  {
    :0
    /dev/null
  }
}
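
The pattern-file lookup can be tried in the shell before trusting it with real mail. The sketch below builds a small, made-up pattern file in a temporary location and uses grep -F, the current spelling of fgrep.

```shell
#!/bin/sh
# A small, made-up pattern file standing in for BlackList.lst
LIST="${TMPDIR:-/tmp}/BlackList.$$"
printf '%s\n' 'Urgent business proposal' 'cheap-medz.com' > "$LIST"

# body_is_blacklisted: read a message body on stdin and succeed if any
# line of the pattern file occurs in it as a fixed string, ignoring case
body_is_blacklisted () {
  grep -F -i -q -f "$LIST"
}

if printf 'Dear friend,\nAn URGENT business proposal awaits.\n' | body_is_blacklisted
then echo "probable spam"; else echo "ok"; fi
```

One thing to watch for: an empty line in the pattern file matches every message, so the file should not contain blank lines.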

Acknowledgements for useful advice and/or feedback:
 
 Aughey, John
 Bump, Jorey
 Davey, David
 Dnes, Walter
 Eriksson, Era
 Guenther, Philip
 Guckes, Sven
 Hebeisen, Christoph
 Hirvonen, Hannu
 Lane, John
 Melish, Jacob
 Menezes, Evandro
 Novak, Curtis
 Park, Collin
 Pettigrew, John
 van Tol, Ruud
 Van Steenkist, Vernon

Any errors and inadequacies are, however, solely my own responsibility.
 
§ The legal note: The author shall not be liable to the user for any direct, indirect or consequential loss arising from the use of, or inability to use, any information, rule, script, program or file, howsoever caused. No warranty is given that the information, rules, scripts, programs or the advice given will work under all circumstances or that they are current. You use everything at your own risk.

