1. The great Email Survey of '06...
====================================

The email survey is a collection of scripts written in Perl for collecting
fairly anonymous data about your email and returning it to the author so he
can work on some nice graphs for his thesis.

[ WARNING: the author can assume no liability for this program eating your
  mail. Software distributed under the License is distributed on an "AS IS"
  basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the
  License for the specific language governing rights and limitations under
  the License. ]

The thing is composed of several scripts, described below:

esrp.pl              Email Survey Response Parser, parses responses for
                     some simple stats.
survey.pl            This is the main program that runs all the others.
stats
 `->exchange         A symlink to imap that makes it ignore
                     "Public Folders/*".
 `->imap             Collects stats from IMAP accounts.
 `->mail.app         Collects stats from Apple's Mail.app program
                     (possibly broken, almost works).
 `->maildir-data     A symlink to mbox.
 `->maildir          Collects stats from maildirs.
 `->mbox             Collects stats from mbox files or similar files.
 `->msg-parser.pl    Accepts messages on STDIN and outputs stats to STDOUT,
                     so non-Perl stat collectors don't need to understand
                     MIME.
search.pl            Searches likely places on your system for mail.
sanity.pl            Checks a file is the stated format (wraps search.pl).
survey.pl.conf       Contains some configuration information for survey.pl.

When survey.pl runs, it optionally asks the user a number of questions about
how they want things set up. If they don't offer a list of mail locations,
it uses search.pl to find likely places for mail. Unless they switch the
check off, it then checks all listed locations with sanity.pl. Then it runs
the matching script from the stats directory on each location. So if there's
an entry of "mbox:/home/user/Mail/important" it will run:

    $ENV{CWD}/stats/mbox /home/user/Mail/important

and collect the output.
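
As a rough illustration of the mapping just described, the following shell
sketch splits a "type:location" entry and prints the command line that
would be run for it. The function name spec_to_command, the default ./stats
directory and the choice to split at the first colon are all assumptions
made for this example; survey.pl itself is written in Perl and may well do
this differently.

```shell
# Hypothetical helper, not part of the survey scripts: turn a
# "type:location" entry into the stats command line described above.
spec_to_command() {
    spec=$1
    stats_dir=${2:-./stats}   # assumption: stats/ sits beside survey.pl

    # Reject entries that have no "type:" prefix at all.
    case $spec in
        *:*) ;;
        *)   echo "not a type:location entry: $spec" >&2
             return 1 ;;
    esac

    type=${spec%%:*}          # text before the first colon, e.g. "mbox"
    location=${spec#*:}       # everything after the first colon
    printf '%s/%s %s\n' "$stats_dir" "$type" "$location"
}

spec_to_command "mbox:/home/user/Mail/important"
# prints: ./stats/mbox /home/user/Mail/important
```

Splitting at the first colon only means a location that itself contains a
colon is passed through to the stats script unchanged.
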
survey.pl then does as the user wants with the collected output: writing it
to a file, printing it to STDOUT (the user's terminal), or sending it via
email to the account specified in survey.pl.conf.


2. What Information Is Being Collected
======================================

The program primarily outputs two things: (A) a line identifying the source
of the data, and (B) the data itself.

(A):- The identification line.

  This is composed of the following things:
  - It's prefixed with # so that I can sort it out of the results easily.
  - It has the following fields (comma separated):
    - Name of the program run (normally survey.pl)
    - Version of the survey program (normally 0.1)
    - Username of the user running it
    - Fully Qualified Domain Name of the computer
    - The seconds since epoch (aka Unix time, seconds since Jan 1st 1970)

  If you'd rather not give away this information, the -a (Anonymize) flag
  is available on survey.pl. With it, the identification line is composed
  only of:
    - Name of the program run
    - Version of the program run
  That way, if the output changes format between versions, I can compensate
  by filtering different versions in different manners.

(B):- The data itself

  This is composed of a number of lines, one per email, with
  comma-separated fields. Each field is prefixed by a tag (^\S:) which is
  used to differentiate its type; there may be more than one item of each
  tag per email.
  - m:  MIME version, 0 for non-MIME mail, usually 1.0 for MIME mail.
  - t:  Total size of the mail, in bytes
  - h:  Size of the email's header, in bytes
  - b:  Total size of the email's body, including all MIME chunks
  - r:  Number of Received headers in the email's header
  - cs: A checksum of the headers and/or other information about the mail
        as a 32-bit int, so that the filter doesn't remove identically
        sized but different emails.
  - Then:
    - A variable number of sections, one per MIME chunk of the email
    - Each one bearing the tag 'mc'
    - Each in the following format:- type/subtype:size-in-bytes


3. How to run this Program
==========================

The survey is not designed to be installed into the file-system, since
there is little expectation of wanting to run it more than once. Instead it
is designed to be unpacked, run from within that directory, then removed
from the file system.

The following instructions are in two parts: first obtaining and unpacking
the program, then running it and returning the data from a survey. In both
cases $ is the shell prompt and should not be typed. X.Y stands for the
current version, such as 0.1 or 0.2.

3.1) Obtaining and unpacking the program:

    $ wget http://www.lancs.ac.uk/~tipper/projects/email-survey/email-survey_X.Y.tar.bz2
    $ tar xjf email-survey_X.Y.tar.bz2
    $ cd email-survey_X.Y/

At this point you should have it unpacked, and should consult the INSTALL
file for getting it working. If you can run something like:

    find . -perm -100 -type f -exec {} -v \;

which runs the -v (version) option on every executable, then it's probably
working. However, you'll probably need to install LibMagic and MIME-tools
first. Again, see the INSTALL file for details.

3.2) Surveying your mail and returning a report.

Probably the easiest method is to run the survey.pl script in interactive
mode:

    $ ./survey.pl -i

This will ask you what output you want (sent over email, to a file or
STDOUT) and whether you want the program to search for mail or want to
specify mail locations yourself. It will then run through as normal. See
./survey.pl -h for more information.

It should be noted that the email output option attempts (by default) to
use /usr/sbin/sendmail to send the email. If your system doesn't have its
own mail server, or a sendmail program that will at least forward mail to a
full SMTP server, then this won't work; however, many (most?) Unix systems
will be fine with this.
If you have any doubts that this will work, it's probably best to output
the results to a file and manually email it to the contact address listed
in survey.pl.conf.

If you want to see what the searching mode of the program will find in the
way of mail on your system, you can just run the search script on its own:

    ./search.pl

It should return a list of all the mailboxes it can find.

If you wish to use some shell-trickery to make generating this easier, you
could use a simple line such as the following to search your ~/Mail folder
for all mbox files (that aren't a backup folder, or mail you've sent) and
then feed that into survey.pl:

    ./survey.pl -n -f `date +%Y-%m-%d`_`hostname -f` `find ~/Mail -type f -not -name 'backup' -not -name 'sent' -exec echo "mbox:{}" \; | tr '\n' ',' | sed 's/,$/\n/'`

Such things will of course need adjusting to your local situation; the
interactive mode is probably easiest. (For example, on Mac OS X boxes
`hostname -f` doesn't work, but plain `hostname` will.)

3.3) Collecting stats from an IMAP or Exchange account

stats/imap and stats/exchange are special cases. Instead of accepting
locations for mail, they accept locations of files that describe the
accounts they should access. For example, if you type:

    ./survey.pl imap:foo

then it will attempt to open a file called "foo" and read the first three
lines, which it expects to be the username, the password and the IMAP
server to connect to. So if your username was "bar", your password "baz"
and your IMAP server "mail.example", you would create a file with the
following three lines in it:

    bar
    baz
    mail.example

It would then connect to mail.example as bar, give it the password baz and
survey all the mail there. stats/exchange works in exactly the same manner.
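
To make the three-line layout concrete, here is a minimal shell sketch that
writes such an account file and reads the three fields back in the order
given above. The function read_account_file and the acct_* variable names
are hypothetical and exist only for this example; the real reading is done
by stats/imap, whose code is not shown here. Because the file holds a
plain-text password, the sketch creates it with mktemp, which makes the
file readable only by its owner.

```shell
# Hypothetical sketch, not part of the survey scripts: read the username,
# password and server from the first three lines of an account file.
read_account_file() {
    acct_user='' acct_pass='' acct_server=''
    {
        IFS= read -r acct_user &&
        IFS= read -r acct_pass &&
        # Accept a last line that has no trailing newline.
        { IFS= read -r acct_server || [ -n "$acct_server" ]; }
    } < "$1"
}

# Example values taken from the text above.
account_file=$(mktemp)
printf '%s\n' bar baz mail.example > "$account_file"

if read_account_file "$account_file"; then
    # The password is deliberately not echoed.
    echo "user=$acct_user server=$acct_server"
else
    echo "account file needs three lines: user, password, server" >&2
fi
rm -f "$account_file"
# prints: user=bar server=mail.example
```

When creating the file by hand instead, something like `chmod 600 foo`
gives the same protection.
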