This is gonna be like a small info bulletin on how attachments are sent.

This work is copyright © 2000 Philip S Tellis
This work is licensed under a Creative Commons License:
http://creativecommons.org/licenses/by-nc-sa/2.0/

When you send mail using the SMTP protocol, you basically have only the
following commands:

HELO       - Greet the mail server.  Used once per session - at the
             beginning of the session.
MAIL FROM: - Announce who the sender is.  Used once per mail, before
             specifying any recipients for each mail, or after a RSET.
RCPT TO:   - Announce who the mail is to.  Multiple recipients are
             allowed, each must have its own RCPT TO: entered
             immediately after a MAIL FROM:
DATA       - Starts mail entry mode.  Everything entered on the lines
             following DATA is treated as the body of the message and
             is sent to the recipients.  The DATA terminates with a .
             (period) on a line by itself.  A mail may be queued or
             sent immediately when the . is entered.  It cannot
             however be reset at this stage.
RSET       - Reset the state of the current transaction.  The
             MAIL FROM: and RCPT TO: for the current transaction are
             cleared.
QUIT       - End the session.  No commits happen here.

I'll deal with return codes later (maybe a different mail).

All other commands simply extend the usage of these.

If you look at the commands above, you will notice a few things.

1. Almost all commands need an argument.  The exceptions are RSET and
   QUIT.
2. DATA takes a multiline argument on the lines after DATA has been
   acknowledged.
3. If multiline arguments (like DATA) terminate with a . on a line by
   itself, then how do you enter a . on a line by itself?
4. What about the other header fields?  Subject:, Reply-To:, custom
   headers etc?
5. There is no provision for attaching files.

Let's now look at these points.

1. HELO, MAIL FROM: and RCPT TO: tell the SMTP server all it needs to
   know to get the mail through, and bounce it in case of error.
   Actually, many servers will let you get away without HELO, even
   though strictly speaking the RFCs require HELO (or EHLO) at the
   start of a session.  It ensures that the client and server are
   speaking the same language.  (Ref: EHLO)

2. DATA tells the SMTP server what the contents of the mail are.  If
   you read through /var/spool/mail/$USER or the equivalent on your
   system, you will find it full of what was typed as DATA, with maybe
   a few lines added by the SMTP server and mail client.

3. Escaping a . is the same as escaping a \ in C.  Precede it with
   another .  In fact, the SMTP protocol states that any line that
   starts with a . should be preceded by another .  So if I wanted to
   say this:

       . is a period

   I would have to enter this:

       .. is a period

   The receiving SMTP server is supposed to translate .. at the start
   of a line back into .

4. If you send a mail directly using the SMTP protocol (telnet to the
   smtp server), and simply type the body of the message, then all you
   receive is the body along with the Date: and From: fields.  Try it.
   All other fields have to be entered as part of the body.  In fact,
   you could override the default Date and From fields in the body of
   the mail.  These two fields serve a dual purpose, acting as what is
   known as the UNIX_FROM_LINE in your mail file.  At the start of
   every mail, you will see a line like this:

       From philip Mon May 22 17:35:13 2000

   To enter other fields, just enter them as the first part of your
   DATA, separated from the actual body by a completely blank line.
   Not even a space is allowed on this line.  eg:

       DATA
       354 Enter Data end with .
       From: philip.tellis@iname.com
       To: linuxers@ilug-bom.org.in
       Subject: SMTP and Mail Attachments - [informational]

       This is gonna be like a small info bulletin on how...
       ...
       .
       250 Mail accepted for delivery

   When the mail is received, the MUA separates the header from the
   rest of the body.

5. So, now we come to the final question.  Attachments.

   Attachments may be binary or text files.  We do not know, nor
   should we care.  The main question is how do we get 8 bit data
   across all networks.
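
(A quick aside before we get to that.  The dot escaping from point 3
and the header/body split from point 4 can be sketched in a few lines
of Python.  This is only an illustration: the function names are made
up for this mail, and it uses a bare "\n" between lines where the real
protocol uses CRLF.)

```python
def dot_stuff(body: str) -> str:
    """Sender side: put an extra . in front of every line starting with a ."""
    lines = body.split("\n")
    return "\n".join("." + line if line.startswith(".") else line
                     for line in lines)


def dot_unstuff(data: str) -> str:
    """Receiver side: drop the first . of every line starting with a ."""
    lines = data.split("\n")
    return "\n".join(line[1:] if line.startswith(".") else line
                     for line in lines)


def split_message(message: str):
    """Split a message into header lines and body at the first blank line."""
    head, _, body = message.partition("\n\n")
    return head.split("\n"), body
```

Running dot_unstuff() over the output of dot_stuff() gives back the
original text, which is the whole point of the scheme.  Note that
split_message() does not join continuation lines (header lines that
start with white space) back onto the line before them.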
Note that the Internet is a collection of heterogeneous networks.  Most
speak TCP/IP, some speak X.25, some speak even more obscure and
outdated protocols.  Some of these protocols have a 6 bit character
set - 64 characters that may pass through their networks.  We must be
able to get all our data through in these 64 characters.  Fortunately
though, these are the most used characters in emails - A-Z, a-z, 0-9,
+/

Unfortunately though, we also have 192 other characters that are used,
though not as often.  Now, most of the networks on the Internet can
actually handle 7 bit ASCII data, so we don't really worry about it too
much.  With plain text that is.  When you're sending attachments, you
probably want them to get through correctly.

We therefore need to encode our 8 bit attachment into 7 or 6 bits.  We
figure, since we're gonna encode it anyway, let's go with 6 bits and
cover the whole net.  We use what is called Base64 encoding and I am
not going to go into it here.  There are other encoding formats -
quoted printable being very common - that only encode the characters
that fall outside printable 7 bit ASCII (plus the = sign, which quoted
printable uses as its escape character).  Base64 is the most used
though.

Now, how do we get the attachment into the mail?  Assuming that we have
already encoded it, two things remain.

i)  Add appropriate headers to the mail so that the MUA knows that
    there are attachments and where they can be found.
ii) Add headers to the attachments that will allow the MUA to properly
    decode and save the attached file.

The main mail /must/ contain the following headers:

    MIME-Version: 1.0
    Content-Type: multipart/mixed;
        boundary="<boundary>"

A note about mail headers... any line that starts with white space is
appended to the preceding line.  Thus, boundary is actually part of the
Content-Type.

The <boundary> is a random string of base64 allowed characters that
does not occur anywhere in the parts themselves.
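
To make the encoding and the main headers a bit more concrete, here is
a minimal sketch in Python.  The function names and the "Boundary"
prefix are my own inventions for illustration; base64.encodebytes() and
secrets.token_hex() come from the Python standard library.  The sketch
uses "\n" between lines, where a real SMTP conversation would use CRLF.

```python
import base64
import secrets


def make_boundary() -> str:
    """Return a random boundary string.

    token_hex() only produces 0-9 and a-f, all of which are in the
    base64 alphabet.  The "Boundary" prefix is an arbitrary choice.
    """
    return "Boundary" + secrets.token_hex(16)


def encode_attachment(data: bytes) -> str:
    """Base64 encode 8 bit data; encodebytes() wraps lines at 76 chars."""
    return base64.encodebytes(data).decode("ascii")


def top_level_headers(boundary: str) -> str:
    """Build the two headers the main mail must carry.

    The second line starts with white space, so it is a continuation
    of the Content-Type header.
    """
    return (
        "MIME-Version: 1.0\n"
        "Content-Type: multipart/mixed;\n"
        f'\tboundary="{boundary}"\n'
    )
```

For example, encode_attachment(b"Hello") gives "SGVsbG8=" followed by
a newline.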
You may also want to add a Content-Disposition: mentioning that the
attachments are attached inline with their own headers:

    Content-Disposition: inline

Now, each attachment /must/ start with this same boundary, preceded by
two hyphens --

    --<boundary>
    Attachment header

    encoded attachment

After the final attachment will be the boundary, preceded and
succeeded by two hyphens:

    --<boundary>--

Simple so far right?  Now, the attachment header.

Immediately after the boundary, you enter the attachment header.  The
only required field is the Content-Transfer-Encoding: which tells the
MUA what was used to encode the data.  There is also the Content-Type
and the Content-Disposition that tell the MUA what the original mime
type of the attachment was.  This would also mention the original file
name.

    Content-Transfer-Encoding: base64
    Content-Type: text/plain; name="<filename>"
    Content-Disposition: attachment; filename="<filename>"

As before, leave a blank line, and enter the attachment.

That pretty much looks like the end, except for a small addition.  Your
attachment itself could be a multipart attachment, in which case it
would have a multipart/mixed mime type, and a second boundary and
attachments under it.  So think of it as being pretty recursive.

Looking at it now, we could probably consider our entire mailbox file
as a single mail with each mail being an attachment and the
UNIX_FROM_LINE being the boundary.

Notes:

MUA  - Mail User Agent (Netscape, PINE, mutt etc).
SMTP - Simple Mail Transfer Protocol.
EHLO - Extended HELO - the successor to HELO.
MIME - Multipurpose Internet Mail Extensions

References:

RFC 821       - SMTP
RFC 2821      - SMTP
RFC 822       - Message headers
RFC 2045-2049 - MIME

Hope y'all found this interesting.  I'll put it up on my page soon.

Philip

-- 
Unfair animal names:
-- tsetse fly
-- bullhead
-- booby
-- duck-billed platypus
-- sapsucker
-- Clarence
        -- Gary Larson
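
P.S. For anyone who wants to play with this, here is a rough Python
sketch that puts the boundary and attachment header rules above
together by hand.  Treat it as an illustration only: build_part() and
build_multipart() are names I made up, the application/octet-stream
default is my assumption, and lines are joined with "\n" where the
wire format uses CRLF.  Python's own email package does all of this
properly, so use that for real work.

```python
import base64


def build_part(filename, data, mime_type="application/octet-stream"):
    """Return one attachment: its header, a blank line, then base64 data."""
    header = (
        "Content-Transfer-Encoding: base64\n"
        f'Content-Type: {mime_type}; name="{filename}"\n'
        f'Content-Disposition: attachment; filename="{filename}"\n'
    )
    encoded = base64.encodebytes(data).decode("ascii")
    return header + "\n" + encoded


def build_multipart(boundary, parts):
    """Join parts: --boundary before each one, --boundary-- after the last."""
    chunks = []
    for part in parts:
        if not part.endswith("\n"):
            part += "\n"          # the boundary must start on its own line
        chunks.append("--" + boundary + "\n" + part)
    chunks.append("--" + boundary + "--\n")
    return "".join(chunks)
```

The result goes into DATA after the main headers and a blank line.
Since a part can itself be the output of build_multipart() with a
different boundary (and a multipart/mixed header of its own), the
recursion mentioned above falls out naturally.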