 Son-of-RFC 1036 News Article Format and Transmission
7. Control Messages
The following sections document the  currently-defined  con-
trol  messages.   "Message"  is used herein as a synonym for
"article" unless context indicates otherwise.
 
Posting agents are warned that since certain control messages
require article bodies in quite specific formats, signatures
SHOULD NOT be appended to such articles, and it may be wise to
take greater care than usual to avoid unintended (although
perhaps well-meaning) alterations to text supplied by the
poster.  Relayers MUST assume that control messages mean what
they say; they MAY be obeyed as is or rejected, but MUST NOT
be reinterpreted.
 
The  execution  of the actions requested by control messages
is subject to local administrative restrictions,  which  MAY
deny   requests  or  refer  them  to  an  administrator  for
approval.  The descriptions below are generally  phrased  in
terms  suggesting mandatory actions, but any or all of these
MAY be subject to local administrative approval (either as a
class  or case-by-case).  Analogously, where the description
below specifies that a message or portion thereof is  to  be
ignored, this action MAY include reporting it to an adminis-
trator.
 
     NOTE: The  exact  choice  of  local  action  might
     depend   on   what   action  the  control  message
     requests, who it claims to come from, etc.
 
Relayers MUST propagate even control messages  they  do  not
understand.
 
In  the  following sections, each type of control message is
defined syntactically by  defining  its  arguments  and  its
body.   For example, "cancel" is defined by defining cancel-
arguments and cancel-body.
 
7.1. cancel
The cancel message requests that one or more previous  arti-
cles be "cancelled":
 
     cancel-arguments  = message-id *( space message-id )
     cancel-body       = body

The  argument(s)  identify  the articles to be cancelled, by
message ID.  The body is  a  comment,  which  software  MUST
ignore,  and SHOULD contain an indication of why the cancel-
lation was requested.  The cancel message SHOULD  be  posted
to  the same newsgroup(s), with the same distribution(s), as
the article(s) it is attempting to cancel.
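
As an illustration only, the cancel-arguments syntax above
might be checked along the following lines.  This is a sketch
in Python, not part of this Draft: the function name is
hypothetical, the message-id pattern is a deliberate
simplification of the real message-id syntax, and it is
assumed that a run of blanks or tabs is acceptable wherever
the grammar says "space".

```python
import re

# Simplified message-id shape: "<" something "@" something ">", with no
# blanks or angle brackets inside.  This is an assumption made for the
# sketch; the full message-id syntax is defined elsewhere in the Draft.
MESSAGE_ID = re.compile(r"<[^<>\s@]+@[^<>\s@]+>")
BLANKS = re.compile(r"[ \t]+")


def parse_cancel_arguments(arguments):
    """Return the list of target message IDs, or raise ValueError."""
    ids = BLANKS.split(arguments.strip(" \t"))
    # An empty argument string yields [""] here, which fails the check.
    if any(MESSAGE_ID.fullmatch(mid) is None for mid in ids):
        raise ValueError("malformed cancel-arguments: %r" % (arguments,))
    return ids
```

A relayer that predates multiple cancellation would presumably
accept only a single-element result.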
 
     NOTE: Using the same newsgroups and  distributions
     maximizes the chances of the cancel message propa-
     gating everywhere the target articles went.
 
     NOTE: RFC 1036 permitted only a single  message-id
     in  a cancel message.  Support for cancelling mul-
     tiple articles is highly desirable, especially for
     use  with  Supersedes (see section 6.14).  If sev-
     eral revisions of an article appear in  fast  suc-
     cession,  each using Supersedes to cancel the pre-
     vious one, it is possible for a middle revision to
     be  destroyed  by cancellation before it is propa-
     gated onward to cancel its predecessor.   Allowing
     each   article   to  cancel  several  predecessors
     greatly alleviates this problem.  (Posting  agents
     preparing a cancel of an article which itself can-
     cels other articles might wish to add those  arti-
     cles  to  the cancel-arguments.)  However, posters
     should be aware that much old  software  does  not
     implement   multiple  cancellation  properly,  and
     should avoid using it when  reliable  cancellation
     is vitally important.
 
When  an  article (the "target article") is to be cancelled,
there are four cases of interest: the article hasn't arrived
yet,  it  has  arrived  and  been filed and is available for
reading, it has expired and  been  archived  on  some  less-
accessible  storage  medium,  or  it  has  expired  and been
deleted.  The next few paragraphs discuss each case in  turn
(in reverse order, which is convenient for the explanation).
 
EXPIRED AND DELETED.  Take no action.

EXPIRED AND ARCHIVED.  If the article is readily accessible
and can be deleted or made unreadable easily, treat as under
AVAILABLE below.  Otherwise treat as under EXPIRED AND
DELETED.
 
     NOTE:  While it is desirable for archived articles
     to be cancellable, this can easily involve rewrit-
     ing  an  entire  archive volume just to get rid of
     one article, perhaps with manual actions  required
     to arrange it.  It is difficult to envision a sit-
     uation so dire as to require  such  measures  from
     hundreds  or  thousands  of administrators, or for
     that matter one  in  which  widespread  compliance
     with such a request is likely.
 
AVAILABLE.  Compare the mailing addresses from the From lines
of the cancel message and the target article, bearing in mind
that local parts (except for "postmaster") are case-sensitive
and domains are case-insensitive.  If they do not match,
either refer the issue to an administrator for a case-by-case
decision, or treat as if they matched.
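
The comparison rule can be sketched as follows.  This Python
fragment is illustrative only: the function names are
hypothetical, and it assumes the bare mailing address (local
part, "@", domain) has already been extracted from each From
header.

```python
def normalize_address(address):
    """Reduce a bare mailing address to a form suitable for comparison."""
    local, at, domain = address.rpartition("@")
    if not at:
        raise ValueError("no domain in address: %r" % (address,))
    # Local parts are case-sensitive, except for "postmaster".
    if local.lower() == "postmaster":
        local = "postmaster"
    # Domains are case-insensitive.
    return (local, domain.lower())


def from_addresses_match(cancel_from, target_from):
    """True if the two From addresses are the same under the rules above."""
    return normalize_address(cancel_from) == normalize_address(target_from)
```

A False result here calls for referral to an administrator or
for treating the addresses as matching; it is not grounds for
quietly ignoring the cancel.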
 
     NOTE: It is generally trivial to  forge  articles,
     so  nothing  short of cryptographic authentication
     is really adequate to ensure that  a  cancel  came
     from  the original article's author.  Moreover, it
     is highly desirable to  permit  authorities  other
     than  the  author to cancel articles, to allow for
     cases in which the author is unavailable,  uncoop-
     erative,  or malicious, and in which damage and/or
     legal problems may be minimized by prompt  cancel-
     lation.  Reliable authentication that would permit
     such administrative cancels would be a  worthwhile
     extension  to this Draft, and experimental work in
     this area is encouraged.
 
     NOTE: Meanwhile, a simple check  of  addresses  is
     useful  accident  prevention  and catches at least
     the most simple-minded forgers.  Since the  intent
     is  accident prevention rather than ironclad secu-
     rity, use of the From address is appropriate,  all
     the  more  so  because in the presence of gateways
     (especially  redundant  multiple  gateways),   the
     author may not have full control over Sender head-
     ers.
 
     NOTE: The "refer... or treat as if  they  matched"
     rule  is  intended  to specifically forbid quietly
     ignoring cancels with mismatched addresses.
 
If the addresses match, then if  technically  possible,  the
relayer  MUST delete the target article completely and imme-
diately.  Failing that, it  MUST  make  the  target  article
unreadable  (preferably  to  everyone, minimally to everyone
but the administrator) and  either  arrange  for  it  to  be
deleted  as  soon  as possible or notify an administrator at
once.
 
     NOTE:  To  allow  for  events  such  as   criminal
     actions,   malicious   forgeries,   and  copyright
     infringements, where damage and/or legal  problems
     may  be minimized by prompt cancellation, complete
     removal is strongly preferred over  merely  making
     the  target article unreadable.  The potential for
     malice is outweighed by the importance  of  really
     getting  rid of the target article in some legiti-
     mate cases.  (In cases  of  inadvertent  copyright
     violation  in  particular,  the ability to quickly
     remedy the  violation  is  of  considerable  legal
     importance.)   Failing  that, making it unreadable
     is better than nothing.
 
     NOTE: Merely annotating the article so that  read-
     ers  see  an  indication that the author wanted it
     cancelled is not acceptable.  Making  the  article
     unreadable is the minimum action.
 
     NOTE: There have been experiments with making can-
     celled articles unreadable,  so  that  local  news
     administrators  could  reverse  cancellations.  In
     practice, administrators almost never  find  cause
     to  do  so.  Removal appears to be clearly prefer-
     able where technically feasible.
 
NOT ARRIVED YET.  If practical, retain the cancel message
until the target article does arrive, or until there is no
further possibility of it arriving and being accepted (see
section 9.2), and then treat as under AVAILABLE.  Failing
that, arrange for the target article to be rejected and
discarded if it does arrive.
 
     NOTE:  It  may  well  be impractical to retain the
     control message, given uncertainty  about  whether
     the  target  article  will  ever arrive.  Existing
     practice in such cases is to assume that addresses
     would  match  and  arrange the equivalent of dele-
     tion.  This is often done  by  making  a  spurious
     entry  in  a  database of already-seen message IDs
     (see section 9.3), so that  if  the  article  does
     arrive, it will be rejected as a duplicate.
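
The spurious-entry technique described in the preceding note
might look like the following Python sketch.  The class and
its attributes are hypothetical.  It assumes the address check
has already passed, omits the EXPIRED AND ARCHIVED case, and
does not show onward propagation of the cancel message itself.

```python
class Relayer:
    """Toy relayer holding a history database and a spool of articles."""

    def __init__(self):
        self.history = set()  # message IDs already seen (or pre-cancelled)
        self.spool = {}       # message ID -> article text, for filed articles

    def receive_article(self, message_id, text):
        """File an arriving article; return False if rejected as a duplicate."""
        if message_id in self.history:
            return False
        self.history.add(message_id)
        self.spool[message_id] = text
        return True

    def process_cancel(self, target_ids):
        """Apply a cancel to each target message ID."""
        for mid in target_ids:
            if mid in self.spool:
                # AVAILABLE: delete completely and immediately.
                del self.spool[mid]
            elif mid not in self.history:
                # NOT ARRIVED YET: spurious history entry, so a later
                # arrival is rejected as a duplicate.
                self.history.add(mid)
            # Otherwise EXPIRED AND DELETED: take no action.
```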
 
The  cancel  message  MUST be propagated onward in the usual
fashion, regardless of which of the four cases  applied,  so
that the target article will be cancelled everywhere even if
cancellation and target article follow different routes.
 
     NOTE: RFC 1036 appeared to require stopping cancel
     propagation  in the NOT ARRIVED YET case, although
     the wording was somewhat unclear.  This appears to
     have  been  an  unwise  decision;  there are known
     cases of important  cancellations  (in  situations
     of, e.g., inadvertent copyright violation) achiev-
     ing rather  poorer  propagation  than  the  target
     article.   News  propagation  is often a much less
     orderly process  than  the  authors  of  RFC  1036
     apparently   envisioned.   Modern  implementations
     generally propagate the cancellation regardless.
 
Posting agents meant for  use  by  ordinary  posters  SHOULD
reject  an  attempt  to  post a cancel message if the target
article is available and the mailing  address  in  its  From
header  does  not match the one in the cancel message's From
header.
 
     NOTE: This, again, is primarily  accident  preven-
     tion.
 
7.2. ihave, sendme
The  ihave  and  sendme  control  messages implement a crude
batched predecessor of the NNTP [rrr]  protocol.   They  are
largely  obsolete  in the Internet, but still see use in the
UUCP environment, especially for backup feeds that  normally
are active only when a primary feed path has failed.
 
     NOTE:  The  ihave and sendme messages defined here
     have ABSOLUTELY NOTHING TO DO WITH  NNTP,  despite
     similarities of terminology.
 
The two messages share the same syntax:
 
     ihave-arguments   = *( message-id space ) relayer-name
     sendme-arguments  = ihave-arguments
     ihave-body        = *( message-id eol )
     sendme-body       = ihave-body

Message IDs MUST appear in either the arguments or the body,
but not both.  Relayers SHOULD  generate  the  form  putting
message  IDs  in  the  body, but the other form MUST be sup-
ported for backward compatibility.
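
A receiving relayer might handle both forms roughly as in the
following Python sketch.  It is illustrative only: the
function name is hypothetical, message IDs are recognized
merely by a leading "<", and a message carrying no message IDs
at all is treated as an error, which is one strict reading of
the rule above.

```python
def parse_ihave(arguments, body):
    """Return (relayer_name, message_ids) for an ihave or sendme message."""
    fields = arguments.split()
    # The relayer name is the last argument and is required.
    if not fields or fields[-1].startswith("<"):
        raise ValueError("missing relayer name")
    relayer_name = fields[-1]
    arg_ids = fields[:-1]
    body_ids = [line.strip() for line in body.splitlines() if line.strip()]
    if arg_ids and body_ids:
        raise ValueError("message IDs in both arguments and body")
    ids = arg_ids or body_ids
    if not ids:
        raise ValueError("no message IDs given")
    return relayer_name, ids
```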
 
     NOTE: RFC 1036 made the relayer name optional, but
     difficulties could easily ensue in determining the
     origin of the message, and this option is believed
     to be unused nowadays.  Putting the message IDs in
     the body is strongly preferred over  putting  them
     in the arguments because it lends itself much bet-
     ter to large numbers of message IDs and avoids the
     empty-body problem mentioned in section 4.3.1.
 
The  ihave  message  states that the named relayer has filed
articles with the specified message IDs,  which  may  be  of
interest to the relayer(s) receiving the ihave message.  The
sendme message requests that the relayer receiving  it  send
the  articles  having the specified message IDs to the named
relayer.
 
These control messages  are  normally  sent  essentially  as
point-to-point messages, by using "to." newsgroups (see sec-
tion 5.5) that are sent only to the relayer the messages are
intended  for.  The two relayers MUST be neighbors, exchang-
ing news directly with each other.  Each relayer  advertises
its new arrivals to the other using ihave messages, and each
uses sendme messages to request the articles it lacks.
 
     NOTE: Arguably these point-to-point  control  mes-
     sages  should  flow  by  some other protocol, e.g.
     mail, but administrative  and  interfacing  issues
     are  simplified if the news system doesn't need to
     talk to the mail system.
 
To reduce overhead, ihave and sendme messages SHOULD be sent
relatively  infrequently and SHOULD contain substantial num-
bers of message IDs.  If ihave and sendme are being used  to
implement  a  backup  feed,  it may be desirable to insert a
delay between reception of an  ihave  and  generation  of  a
sendme,  so that a slightly slow primary feed will not cause
large numbers of articles to be requested unnecessarily  via
sendme.
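
The delay suggested above could be arranged as in this Python
sketch.  The names, the use of numeric timestamps in seconds,
and the choice of delay are all assumptions made for
illustration.

```python
def build_sendme_body(pending, history, now, delay_seconds):
    """Build a sendme body from advertised message IDs still lacking.

    pending maps each advertised message ID to the time its ihave was
    received; history is the set of message IDs already seen locally.
    """
    due = [
        mid
        for mid, received_at in pending.items()
        # Wait out the delay, then skip anything the primary feed
        # delivered in the meantime.
        if now - received_at >= delay_seconds and mid not in history
    ]
    # One message ID per line, as in ihave-body.
    return "".join(mid + "\n" for mid in sorted(due))
```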
 
7.3. newgroup
The  newgroup  control message requests that a new newsgroup
be created:
 
     newgroup-arguments  = newsgroup-name [ space moderation ]
     moderation          = "moderated" / "unmoderated"
     newgroup-body       = body
                         / [ body ] descriptor [ body ]
     descriptor          = descriptor-tag eol description-line eol
     descriptor-tag      = "For your newsgroups file:"
     description-line    = newsgroup-name space description
     description         = nonblank-text [ " (Moderated)" ]

The first argument names the newsgroup to  be  created,  and
the  second  one (if present) indicates whether it is moder-
ated.  If there  is  no  second  argument,  the  default  is
"unmoderated".
 
     NOTE:  Implementors are warned that there is occa-
     sional use of other forms in the second  argument.
     It  is  suggested  that  such  violations  of this
     Draft, which are  also  violations  of  RFC  1036,
     cause  the  newgroup  message  to be ignored.  RFC
     1036 was slightly vague about how second arguments
     other than "moderated" were to be treated (specif-
     ically,  whether  they  were   illegal   or   just
     ignored),  but  it  is  thought  that all existing
     major implementations  will  handle  "unmoderated"
     correctly,  and it appears desirable to tighten up
     the specs to make it possible for other  forms  to
     be used in future.
 
The  body  is  a comment, which software MUST ignore, except
that if it contains a descriptor, the  description  line  is
intended  to be suitable for addition to a list of newsgroup
descriptions.  The  description  cannot  be  continued  onto
later  lines,  but  is  not  constrained  to  any particular
length.  Moderated newsgroups  have  descriptions  that  end
with the string " (Moderated)" (note that this string begins
with a blank).
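
The following Python sketch shows one way of extracting the
arguments and the description line.  It is illustrative only:
the function name and return shape are hypothetical.
Unrecognized second arguments cause the message to be ignored
(returned as None), following the suggestion in the note
above.

```python
DESCRIPTOR_TAG = "For your newsgroups file:"


def parse_newgroup(arguments, body):
    """Return (name, moderation, description), or None to ignore."""
    fields = arguments.split()
    if len(fields) == 1:
        name, moderation = fields[0], "unmoderated"
    elif len(fields) == 2 and fields[1] in ("moderated", "unmoderated"):
        name, moderation = fields
    else:
        return None
    description = None
    lines = body.splitlines()
    for i, line in enumerate(lines[:-1]):
        # The tag must match exactly, alone on its line, case-sensitively.
        if line == DESCRIPTOR_TAG:
            parts = lines[i + 1].split(None, 1)
            # description-line = newsgroup-name space description
            if len(parts) == 2 and parts[0] == name:
                description = parts[1]
            break
    return name, moderation, description
```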
 
     NOTE: It is unfortunate that the description  line
     is part of the body, rather than being supplied in
     a header, but this is established practice.  News-
     group  creators  are cautioned that the descriptor
     tag must be reproduced  exactly  as  given  above,
     alone  on  a  line,  and  is  case-sensitive.  (To
     reduce errors in this regard, posting agents might
     wish to question or reject newgroup messages which
     do not contain a descriptor.)   Given  the  desire
     for  short lines, description writers should avoid
     content-free  phrases  like  "discussion  of"  and
     "news  about",  and  stick  to  defining  what the
     newsgroup is about.
 
The remainder of the body SHOULD contain an  explanation  of
the  purpose of the newsgroup and the decision to create it.
 
     NOTE: Criteria for newsgroup creation vary  widely
     and  are  outside  the scope of this Draft, but if
     formal procedures of one kind or another were fol-
     lowed  in  the  decision,  the body should mention
     this.  Administrators often look for such informa-
     tion  when  deciding  whether  to comply with cre-
     ation/deletion requests.
 
A newgroup message which lacks an Approved  header  MUST  be
ignored.
 
     NOTE:  It would also be desirable to ignore a new-
     group message unless its Approved header  names  a
     person who is authorized (in some sense) to create
     such a newsgroup.  A cooperating subnet with  suf-
     ficiently  strong  coordination to maintain a cor-
     rect and current list of authorized creators might
     wish  to  do  so  for its internal newsgroups.  It
     also (or alternatively) might  wish  to  ignore  a
     newgroup  message  for  an internal newsgroup that
     was posted (or  cross-posted)  to  a  non-internal
     newsgroup.
 
     NOTE:  As  mentioned in section 6.10, some form of
     (cryptographic?) authentication of Approved  head-
     ers would be highly desirable, especially for con-
     trol messages.
 
It would be desirable to provide some  way  of  supplying  a
moderator's  address  in  a newgroup message for a moderated
newsgroup, but this will  cause  problems  unless  effective
authentication  is available, so it is left for future work.
 
     NOTE: This leaves news administrators  stuck  with
     the  annoying chore of arranging proper mailing of
     moderated-newsgroup submissions.  On Usenet,  this
     can  be  simplified  by  exploiting  a  forwarding
     facility that some major sites provide: they main-
     tain forwarding addresses, each the name of a mod-
     erated newsgroup with all periods (".", ASCII  46)
     replaced by hyphens ("-", ASCII 45), which forward
     mail to the current  newsgroup  moderators.   More
     advice  on the subject of forwarding to moderators
     can be found in the document titled "How  to  Con-
     struct  the  Mailpaths  File", posted regularly to
     the Usenet newsgroups news.lists, news.admin.misc,
     and news.answers.
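
The forwarding-address convention described in the note above
amounts to a simple substitution, sketched here in Python.
The function name is hypothetical, and the forwarding host is
a parameter because this Draft does not name any particular
site.

```python
def moderator_forwarding_address(newsgroup, forwarding_host):
    """Form the forwarding address for a moderated newsgroup."""
    # Periods (ASCII 46) in the newsgroup name become hyphens (ASCII 45).
    return newsgroup.replace(".", "-") + "@" + forwarding_host
```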
 
A newgroup message naming a newsgroup that already exists is
requesting a change in the moderation status or  description
of the newsgroup.  The same rules apply.
 
7.4. rmgroup
The rmgroup message requests that a newsgroup be deleted:
 
     rmgroup-arguments  = newsgroup-name
     rmgroup-body       = body

The sole argument is the newsgroup name.  The body is a com-
ment, which software  MUST  ignore;  it  SHOULD  contain  an
explanation of the decision to delete the newsgroup.
 
     NOTE:  Criteria for newsgroup deletion vary widely
     and are outside the scope of this  Draft,  but  if
     formal procedures of one kind or another were fol-
     lowed in the decision,  the  body  should  mention
     this.  Administrators often look for such informa-
     tion when deciding whether  to  comply  with  cre-
     ation/deletion requests.
 
A  rmgroup  message  which  lacks an Approved header MUST be
ignored.
 
     NOTE: It would  also  be  desirable  to  ignore  a
     rmgroup message unless its Approved header names a
     person who is authorized (in some sense) to delete
     such  a newsgroup.  A cooperating subnet with suf-
     ficiently strong coordination to maintain  a  cor-
     rect and current list of authorized deleters might
     wish to do so for  its  internal  newsgroups.   It
     also  (or  alternatively)  might  wish to ignore a
     rmgroup message for an internal newsgroup that was
     posted  (or  cross-posted) to a non-internal news-
     group.
 
Unexpected  deletion  of  a  newsgroup  being  a  disruptive
action,   implementations  are  strongly  advised  to  refer
rmgroup messages to an administrator by default, unless per-
haps the message can be determined to have originated within
a cooperating subnet whose members are considered  trustwor-
thy.  Abuses have occurred.
 
7.5. sendsys, version, whogets
The  sendsys  message  requests  that  a  description of the
relayer's news feeds to other  relayers  be  mailed  to  the
article's reply address:
 
     sendsys-arguments  = [ relayer-name ]
     sendsys-body       = body

If there is an argument, relayers other than the one named
by the argument MUST NOT respond.  The body is a comment,
which software MUST ignore; it SHOULD contain an explanation
of the reason for the request.
 
The version message requests that the name  and  version  of
the relayer software be mailed to the reply address:
 
     version-arguments  =
     version-body       = body

There  are no arguments.  The body is a comment, which soft-
ware MUST ignore; it SHOULD contain an  explanation  of  the
reason for the request.
 
The  whogets  message  requests  that  a  description of the
relayer and its news feeds to other relayers  be  mailed  to
the article's reply address:
 
     whogets-arguments  = newsgroup-name [ space relayer-name ]
     whogets-body       = body

The  first  argument  is the name of the "target newsgroup",
specifying the newsgroup for which  propagation  information
is desired.  This MUST be a complete newsgroup name, not the
name of a hierarchy or a portion of a newsgroup name that is
not  itself  the  name of a newsgroup.  If there is a second
argument, only the relayer named  by  that  argument  should
respond.  The body is a comment, which software MUST ignore;
it SHOULD contain an  explanation  of  the  reason  for  the
request.
 
     NOTE:  Whogets  is  intended  as a replacement for
     sendsys (and version) with  a  precisely-specified
     reply  format.   Since  the  syntax for specifying
     what newsgroups get sent to  what  other  relayers
     varies  widely  between different forms of relayer
     software, the only practical  way  to  standardize
     the  reply  format is to indicate a specific news-
     group and ask  where  THAT  newsgroup  propagates.
     The  requirement  that  it be a complete newsgroup
     name is intended to (largely) avoid the problem of
     having  to  answer "yes and no" in cases where not
     all newsgroups in a hierarchy are sent.
 
Any of these messages lacking an  Approved  header  MUST  be
ignored.   Response  to  any  of  these  messages  SHOULD be
delayed for at least 24 hours, and  no  response  should  be
attempted  if  the  message has been cancelled in that time.
Also, no response SHOULD be attempted unless the local  part
of    the    destination   address   is   "newsmap".    News
administrators SHOULD arrange for mail to "newsmap" on their
systems  to  be  discarded (without reply) unless legitimate
use is in progress.
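
Taken together, the conditions above suggest a check along the
lines of the following Python sketch.  It is illustrative
only: the function and parameter names are hypothetical, times
are numeric timestamps in seconds, and the caller is assumed
to know whether the request has been cancelled.

```python
MINIMUM_DELAY = 24 * 60 * 60  # 24 hours, in seconds


def may_respond(has_approved, reply_address, arrived_at, now, cancelled):
    """Decide whether a sendsys/version/whogets reply may be mailed now."""
    if not has_approved:
        return False  # no Approved header: ignore the message
    local_part = reply_address.rpartition("@")[0]
    if local_part != "newsmap":
        return False  # replies go only to "newsmap" mailboxes
    if now - arrived_at < MINIMUM_DELAY:
        return False  # too early; check again later
    if cancelled:
        return False  # request was cancelled during the delay
    return True
```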
 
     NOTE: Because these messages can cause many,  many
     relayers  to  send  mail  to one person, such mes-
     sages, specifying mailing to an innocent  person's
     mailbox, have been forged as a half-witted practi-
     cal joke.  A delay gives  administrators  time  to
     notice a fraudulent message and act (by cancelling
     the message, preparing to divert the flood of mail
     into the bit bucket, or both).  Restriction of the
     destination  address  to  "newsmap"  reduces   the
     appeal  of fraud by making it impossible to use it
     to harass a normal user.  (A site which  does  NOT
     discard  mail  to "newsmap", but rather bounces it
     back, may incur higher communications  costs  than
     if  the mail had been accepted into a user's mail-
     box... but a  malicious  forger  could  accomplish
     this  anyway, by using an address whose local part
     is very unlikely to be a legitimate mailbox name.)
 
     NOTE: RFC 1036 did not require the Approved header
     for these control messages.  This has  been  added
     because  of  the  possibility  that  cryptographic
     authentication of  Approved  headers  will  become
     available.
 
The  body of the reply to a sendsys message SHOULD be of the
form:
 
     sendsys-reply      = responder 1*sys-line
     responder          = "Responding-System:" space domain eol
     sys-line           = relayer-name ":" newsgroup-patterns [ ":" text ] eol
     newsgroup-patterns = newsgroup-name *( "," newsgroup-name )

The first line identifies the  responding  system,  using  a
syntax  resembling a header (but note that it is part of the
BODY).  Remaining lines indicate what newsgroups are sent to
what other systems.  The syntax of newsgroup patterns is not
well standardized; the form described is common (often  with
newsgroup  names  only  partially  given, denoting all names
starting with a particular set of components) but  not  uni-
versal.   The  whogets  message  provides  a  better-defined
alternative.
 
The reply to a version message is  of  somewhat  ill-defined
form,  with  a  body normally consisting of a single line of
text that somehow describes the version of the relayer soft-
ware.   The whogets message provides a better-defined alter-
native.
 
The body of the reply to a whogets message MUST  be  of  the
form:
 
     whogets-reply      = responder-domain responder-relayer response-date
                          responding-to arrived-via responder-version
                          whogets-delimiter *pass-line
     responder-domain   = "Responding-System:" space domain eol
     responder-relayer  = "Responding-Relayer:" space relayer-name eol
     response-date      = "Response-Date:" space date eol
     responding-to      = "Responding-To:" space message-id eol
     arrived-via        = "Arrived-Via:" path-list eol
     responder-version  = "Responding-Version:" space nonblank-text eol
     whogets-delimiter  = eol
     pass-line          = relayer-name [ space domain ] eol

The  first  six lines identify the responding relayer by its
Internet domain name  (use  of  the  ".uucp"  and  ".bitnet"
pseudo-domains is permissible, for registered hosts in them,
but discouraged) and its relayer name, specify the date when
the  reply  was  generated and the message ID of the whogets
message being replied to, give the path list (from the  Path
header)  of  the  whogets  message (which MAY, if absolutely
necessary, be truncated to a  convenient  length,  but  MUST
contain at least the leading three relayer names), and indi-
cate the version of relayer software responding.  Note  that
these  lines  are  part of the BODY even though their format
resembles that of  headers.   Despite  the  apparently-fixed
order  specified by the syntax above, they can appear in any
order, but there must be exactly one of each.
 
After those preliminaries, and an empty  line  to  unambigu-
ously  define their end, the remaining lines are the relayer
names (which MAY be accompanied by the corresponding  domain
names,  if  known)  of  systems  which the responding system
passes the target newsgroup to.   Only  the  names  of  news
relayers are to be included.
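
Software collecting whogets replies might parse them roughly
as in the following Python sketch.  It is illustrative only:
the function name and return shape are hypothetical, and the
values of the six preliminary lines are kept as uninterpreted
text.

```python
REQUIRED_TAGS = (
    "Responding-System", "Responding-Relayer", "Response-Date",
    "Responding-To", "Arrived-Via", "Responding-Version",
)


def parse_whogets_reply(body):
    """Return (fields, passes) from the body of a whogets reply."""
    lines = body.splitlines()
    if "" not in lines:
        raise ValueError("missing empty delimiter line")
    end = lines.index("")
    fields = {}
    # The preliminary lines may come in any order, exactly one of each.
    for line in lines[:end]:
        tag, colon, value = line.partition(":")
        if not colon or tag not in REQUIRED_TAGS or tag in fields:
            raise ValueError("bad preliminary line: %r" % (line,))
        fields[tag] = value.strip()
    missing = [tag for tag in REQUIRED_TAGS if tag not in fields]
    if missing:
        raise ValueError("missing lines: %s" % ", ".join(missing))
    passes = []
    for line in lines[end + 1:]:
        parts = line.split()
        if not 1 <= len(parts) <= 2:
            raise ValueError("bad pass line: %r" % (line,))
        # pass-line = relayer-name [ space domain ]
        passes.append((parts[0], parts[1] if len(parts) == 2 else None))
    return fields, passes
```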
 
     NOTE:  It is desirable for a reply to identify its
     source  by  both  domain  name  and  relayer  name
     because news propagation is governed by the latter
     but location in a broader context is  best  deter-
     mined by the former.  The date and whogets message
     ID should, in principle, be present  in  the  MAIL
     headers,  but are included in the body for robust-
     ness in the presence of  uncooperative  mail  sys-
     tems.   The  reason for the path list is discussed
     below.  Adding version information eliminates  the
     need for a separate message to gather it.
 
     NOTE: The limitation of pass lines to contain only
     names of news relayers is meant to  exclude  names
     used within a single host (as identifiers for mail
     gateways,  portions  of  ihave/sendme  implementa-
     tions, etc.), which do not actually refer to other
     hosts.
 
A relayer which is unaware of the existence of the target
newsgroup MUST NOT reply to a whogets message at all,
although this MUST NOT influence decisions on whether to
pass the article on to other relayers.
 
     NOTE:  While this may result in discontinuous maps
     in  cases  where  some  hosts  have  not   honored
     requests for creation of a newsgroup, it will also
     prevent a flood of useless responses in the  event
     that  a  whogets  message  intended to map a small
     region "leaks" out to a larger one.  The possibil-
     ity  of  discontinuous  recognition of a newsgroup
     does make it important that  the  whogets  message
     itself  continue  to  propagate (if other criteria
     permit).  This is also the reason for  the  inclu-
     sion  of  the  whogets  message's path list, or at
     least the leading portion of it, in the reply:  to
     permit  reconstruction  of  at least small gaps in
     maps.
 
Different networks set different rules for the legitimacy of
these  messages, given that they may reveal details of orga-
nization-internal topology  that  are  sometimes  considered
proprietary.
 
     NOTE:  On  Usenet,  in  particular, willingness to
     respond to these messages is held to be  a  condi-
     tion of network membership: the topology of Usenet
     is public information.  Organizations  wishing  to
     belong to such networks while keeping their inter-
     nal topology confidential might wish  to  organize
     their  internal news software so that all articles
     reaching outsiders appear  to  be  from  a  single
     "gatekeeper"  system, with the details of internal
     topology hidden behind that system.
 
     UNRESOLVED ISSUE: It might be useful to have a way
     to set some sort of hop limit for these.
 
7.6. checkgroups
The   checkgroups  control  message  contains  a  supposedly
authoritative list of the valid newsgroups within some  sub-
set of the newsgroup name space:
 
     checkgroups-arguments  =
     checkgroups-body       = [ invalidation ] valid-groups
                            / invalidation
     invalidation           = "!" plain-component *( "," plain-component ) eol
     valid-groups           = 1*( description-line eol )

There are no arguments.  The body lines (except possibly for
an initial invalidation) each contain a description line for
a  newsgroup, as defined under the newgroup message (section
7.3).
 
     NOTE: Some other, ill-defined, forms of the check-
     groups body were formerly used.  See appendix A.
 
The  checkgroups message applies to all hierarchies contain-
ing any of the newsgroups listed in the  body.   The  check-
groups  message asserts that the newsgroups it lists are the
only newsgroups in those hierarchies.  If there is an inval-
idation,  it asserts that the hierarchies it names no longer
contain any newsgroups.
 
Processing a checkgroups message MAY cause a local  list  of
newsgroup  descriptions to be updated.  It SHOULD also cause
the local lists of newsgroups  (and  their  moderation  sta-
tuses)  in  the  mentioned hierarchies to be checked against
the message.  The results of the check MAY be used for auto-
matic  corrective  action,  or  MAY  be reported to the news
administrator in some way.
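
The check against the local list might be sketched in Python
as follows.  This is illustrative only: the function name is
hypothetical, a hierarchy is taken to be identified by the
first component of a newsgroup name, and invalidation lines
and moderation statuses are left out.

```python
def check_groups(local_groups, listed_groups):
    """Compare local newsgroups with those listed in a checkgroups body.

    Both arguments are sets of newsgroup names.  Returns (missing, extra):
    listed groups absent locally, and local groups in the affected
    hierarchies that the message does not list.
    """
    # The message applies only to hierarchies containing a listed group.
    hierarchies = {name.split(".", 1)[0] for name in listed_groups}
    in_scope = {
        name for name in local_groups
        if name.split(".", 1)[0] in hierarchies
    }
    missing = sorted(listed_groups - in_scope)
    extra = sorted(in_scope - listed_groups)
    return missing, extra
```

In keeping with the advice in this section, the two lists are
better reported to the administrator than acted on
automatically.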
 
     NOTE:  Automatically  updating   descriptions   of
     existing  newsgroups  is  relatively safe.  In the
     case of newsgroup additions or  deletions,  simply
     notifying  the administrator is generally the wis-
     est action, unless  perhaps  the  message  can  be
     determined to have originated within a cooperating
     subnet whose members are considered trustworthy.
 
     NOTE: There is a problem with the checkgroups con-
     cept:  not all newsgroups in a hierarchy necessar-
     ily  propagate  to  the  same  set  of   machines.
     (Notably,  there  is  a set of newsgroups known as
     the "inet" newsgroups, which have relatively  lim-
     ited  distribution  but coexist in several hierar-
     chies with  more  widely-distributed  newsgroups.)
     The  advice  of checkgroups should always be taken
     with a grain of salt, and should never be followed
     blindly.
 
 
8. Transmission Formats
While  this  Draft  does  not  specify  transmission methods
except to place a few constraints on them,  there  are  some
data  formats  used only for transmission that are unique to
news.
 
8.1. Batches
For efficient bulk transmission and processing of news arti-
cles,  it is often desirable to transmit a number of them as
a single block of data, a "batch".  The format  of  a  batch
is:
 
     batch         = 1*( batch-header article )
     batch-header  = "#! rnews " article-size eol
     article-size  = 1*digit

A batch is a sequence of articles, each prefixed by a header
line that includes its size.  The article size is a  decimal
count of the octets in the article, counting each EOL as one
octet regardless of how it is actually represented.
 
     NOTE: A relayer might wish to accept either a sin-
     gle article or a batch as input.  Since "#" cannot
     appear in a header name, examination of the  first
     octet of the input will reveal its nature.
 
     NOTE:  In  the  header  line, there is exactly one
     blank before "rnews", there is exactly  one  blank
     after "rnews", and the EOL immediately follows the
     article size.  Beware that some  software  inserts
     non-standard trash after the size.
 
     NOTE: Despite the similarity of this format to the
     executable-script format used  by  some  operating
     systems,  it  is  EXTREMELY  unwise  to  just feed
     incoming batches to a command interpreter  in  the
     anticipation  that  it  will  run  a command named
     "rnews" to process the batch.  Unless arrangements
     are  made  to  very  tightly restrict the range of
     commands that can be executed by this  means,  the
     security implications are disastrous.
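
As an illustration only, the batch grammar above can be parsed
along the following lines.  This is a minimal sketch, not a
reference implementation: the function names are hypothetical,
and it assumes each EOL is stored as a single LF octet, so that
the declared article size equals the number of stored octets.

```python
def is_batch(data: bytes) -> bool:
    """True if the input looks like a batch rather than a single article."""
    # "#" cannot appear in a header name, so the first octet is decisive.
    return data[:1] == b"#"


def split_batch(data: bytes) -> list[bytes]:
    """Split a batch into its articles.

    Assumption: every EOL is one LF octet on this host.
    """
    prefix = b"#! rnews "
    articles = []
    pos = 0
    while pos < len(data):
        eol = data.find(b"\n", pos)
        if eol < 0:
            raise ValueError("unterminated batch header")
        header = data[pos:eol]
        size_text = header[len(prefix):]
        # Strict reading: one blank on either side of "rnews", then
        # digits only, then the EOL.
        if not header.startswith(prefix) or not size_text.isdigit():
            raise ValueError(f"bad batch header: {header!r}")
        start = eol + 1
        end = start + int(size_text)
        if end > len(data):
            raise ValueError("article truncated")
        articles.append(data[start:end])
        pos = end
    return articles
```

The strict header check rejects the non-standard trash that
some software inserts after the size; a more liberal receiver
might choose to tolerate it instead.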
 
8.2. Encoded Batches
When transmitting news, especially over communications links
that are slow or are billed by the bit, it is  often  desir-
able  to  batch  news  and  apply  data  compression  to the
batches.   Transmission  links  sending  compressed  batches
SHOULD use out-of-band means of communication to specify the
compression algorithm being used.  If there  is  no  way  to
send out-of-band information along with a batch, the follow-
ing encapsulation for a compressed batch MAY be used:
 
     ec-batch             = "#! " compression-keyword eol compressed-batch
     compression-keyword  = "cunbatch"
A line containing a keyword indicating the type of  compres-
sion  is  followed  by the compressed batch.  The only truly
widespread compression keyword  at  present  is  "cunbatch",
indicating  compression  using  the widely-distributed "com-
press" program.  Other compression keywords MAY be  used  by
mutual agreement between the hosts involved.
 
     NOTE:  An encapsulated compressed batch is NOT, in
     general, a text file, despite  having  an  initial
     text  line.  This combination of text and non-text
     data is often  awkward  to  handle;  for  example,
     standard  decompression  programs  cannot  be used
     without first stripping off the initial line,  and
     that  in  turn is painful to do because many text-
     handling tools that are  superficially  suited  to
     the  job  do  not  cope  well  with non-text data.
     Hence the recommendation that out-of-band communi-
     cation be used instead when possible.
 
     NOTE: For UUCP transmission, where a batch is typ-
     ically transmitted by invoking the remote  command
     "rnews"  with  the  batch  as  its input stream, a
     plausible out-of-band method for indicating a com-
     pression  type would be to give a compression key-
     word in an option to "rnews", perhaps in the form:
 
          rnews -d decompressor
     where  "decompressor"  is the name of a decompres-
     sion program (e.g. "uncompress" for a  batch  com-
     pressed  with  "compress"  or "gunzip" for a batch
     compressed with "gzip").  How  this  decompression
     program  is  located  and invoked by the receiving
     relayer is implementation-specific.
 
     NOTE: See the notes in section 8.1 on the inadvis-
     ability  of  feeding  batches  directly to command
     interpreters.
 
     NOTE: There is exactly one blank between "#!"  and
     the  compression  keyword, and the EOL immediately
     follows the keyword.
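
The encapsulation above might be taken apart as in the
following sketch.  The function name is hypothetical, and an LF
EOL is assumed.  The payload is treated strictly as octets and
is returned untouched; actual decompression is left to whatever
program the receiving relayer associates with the keyword.

```python
def split_encoded_batch(data: bytes) -> tuple[str, bytes]:
    """Return (compression keyword, compressed octets).

    Assumption: the initial text line ends with a single LF octet.
    """
    prefix = b"#! "
    eol = data.find(b"\n")
    if not data.startswith(prefix) or eol < 0:
        raise ValueError("not an encapsulated compressed batch")
    keyword = data[len(prefix):eol]
    # Exactly one blank after "#!", and the EOL immediately follows
    # the keyword, so the keyword itself may not be empty or contain
    # blanks.  (This also rejects a plain "#! rnews <size>" header.)
    if not keyword or b" " in keyword or b"\t" in keyword:
        raise ValueError(f"bad compression keyword: {keyword!r}")
    # Everything after the first EOL is binary data, not text.
    return keyword.decode("ascii"), data[eol + 1:]
```

Working on octets throughout sidesteps the difficulty the note
above describes with text-handling tools and non-text data.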
 
8.3. News Within Mail
It is often desirable to transmit news as mail,  either  for
the  convenience of a human recipient or because that is the
only type of transmission available on a restrictive  commu-
nication path.
 
Given  the  similarity  between the news format and the MAIL
format, it is superficially attractive to just send the news
article as a mail message.  This is typically a mistake, for
two reasons.  First, mail-handling software often feels free
to manipulate various headers in undesirable ways (in some
cases, such as Sender, such manipulation is actually
mandatory).  Second, mail transmission problems and the like
MUST be reported to the administrators responsible for the
mail transmission rather than to the article's author.  In
general, news sent as mail should be encapsulated to separate
the mail headers and the news headers.
 
When  the intended recipient is a human, any convenient form
of encapsulation may be used.  Recommended  practice  is  to
use   MIME  encapsulation  with  a  content  type  of  "mes-
sage/news", given that news articles have additional  seman-
tics beyond what "message/rfc822" implies.
 
     NOTE:  "message/news" was registered as a standard
     subtype by IANA 22 June 1993.
 
When mail is being used as a transmission path  between  two
relayers,  however,  a  standard  method is desirable.  Cur-
rently the standard method is to send the mail to an address
whose  local part is "rnews", with whatever mail headers are
necessary for successful  transmission.   The  news  article
(including its headers) is sent as the body of the mail mes-
sage, with an "N" prepended to each line.
 
     NOTE: The "N" reduces the probability of an  inno-
     cent line in a news article being taken as a magic
     command to mail software, and makes  it  easy  for
     receiving software to strip off any lines added by
     mail software (e.g. the trailing empty line  added
     by some UUCP mail software).
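
A minimal sketch of the "N" convention follows.  The function
names are hypothetical, and the sketch assumes the article is
held as text with LF line endings and ends with an EOL.

```python
def encapsulate(article: str) -> str:
    """Prepend "N" to every line of an article (headers included)."""
    lines = article.split("\n")
    if lines and lines[-1] == "":
        lines.pop()          # the final EOL does not start a new line
    return "".join("N" + line + "\n" for line in lines)


def decapsulate(body: str) -> str:
    """Recover the article from a mail body.

    Lines that do not begin with "N" were not part of the article
    (for example, a trailing empty line added by mail software)
    and are dropped.
    """
    return "".join(
        line[1:] + "\n" for line in body.split("\n") if line.startswith("N")
    )
```

Note that the empty line separating the article's headers from
its body becomes a line containing only "N", so it survives the
stripping of lines added in transit.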
 
This  method  has its weaknesses.  In particular, it assumes
that the mail  transmission  channel  can  transmit  nearly-
arbitrary body text undamaged.  When mail is being used as a
transmission path of last resort, however, the  mail  system
often has inconvenient preconceived notions about the format
of message bodies.  Various  ad-hoc  encoding  schemes  have
been used to avoid such problems.  The recommended method is
to send a news article or batch as the body of a  MIME  mail
message,  using content type "application/news-transmission"
and MIME's "base64" encoding (which is specifically designed
to survive all known major mail systems).
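
The recommended MIME method can be sketched with the Python
standard library's "email" package, as below.  The function
names and the addresses passed in are hypothetical; the sketch
shows only the content type and base64 encoding, not how the
mail is actually submitted.

```python
from email import message_from_bytes
from email.mime.application import MIMEApplication


def wrap_for_mail(payload: bytes, to_addr: str, from_addr: str):
    """Wrap an article or batch as application/news-transmission.

    MIMEApplication applies base64 encoding by default and sets the
    Content-Transfer-Encoding header accordingly.
    """
    msg = MIMEApplication(payload, "news-transmission")
    msg["To"] = to_addr
    msg["From"] = from_addr
    return msg


def unwrap_from_mail(raw: bytes) -> bytes:
    """Recover the original octets from a received mail message."""
    msg = message_from_bytes(raw)
    if msg.get_content_type() != "application/news-transmission":
        raise ValueError("not a news transmission")
    # decode=True undoes the base64 transfer encoding.
    return msg.get_payload(decode=True)
```

Because the whole article (headers and body) travels as an
opaque encoded body, intervening mail software has no news
headers to rewrite.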
 
     NOTE:  In  the  process, MIME conventions could be
     used to fragment and reassemble an  article  which
     is  too  large to be sent as a single mail message
     over a transmission path  that  restricts  message
     length.   In addition, the "conversions" parameter
     to the content type could be used to indicate what
     (if  any)  compression  method has been used.  And
     the Content-MD5 header [RFC 1544] can be used as a
     "checksum" to provide high confidence of detecting
     accidental damage to the contents.
 
     UNRESOLVED ISSUE: The "conversions"  parameter  no
     longer exists.  What should be done about this, if
     anything?
 
     NOTE: It might look tempting to use a content type
     such  as  "message/X-netnews",  but MIME bans non-
     trivial encodings of the entire body  of  messages
     with  content  type  "message".   The intent is to
     avoid obscuring nested structure underneath encod-
     ings.   For inter-relayer news transmission, there
     is no nested structure  of  interest,  and  it  is
     important  that  the entire article (including its
     headers, not just its body) be  protected  against
     the  vagaries  of intervening mail software.  This
     situation appears to fit the MIME  description  of
     circumstances in which "application" is the proper
     content type.
 
     NOTE:  "application/news-transmission",   with   a
     "conversions" parameter, was registered as a stan-
     dard subtype by IANA 22 June 1993.
 
     UNRESOLVED ISSUE: The "conversions"  parameter  no
     longer  exists  in  MIME.  What should we do about
     this?
 
8.4. Partial Batches
     UNRESOLVED ISSUE: The existing  batch  conventions
     assemble  (potentially)  many  articles  into  one
     batch.  Handling very large articles would be
     substantially less troublesome if there were also a
     fragmentation convention  for  splitting  a  large
     article  into  several  batches.   Is  this  worth
     defining at this time?
 
 
9. Propagation and Processing
Most aspects of news propagation and processing  are  imple-
mentation-specific.   The  basic propagation algorithms, and
certain details of how they  are  implemented,  nevertheless
need to be standard.
 
There  are  two  important principles that news implementors
(and administrators) need to keep in mind.  The first is the
well-known Internet Robustness Principle:
 
     Be liberal in what you accept, and conservative in what you send.
 
However, in the case of news there is an even more important
principle, derived from a much older code of  practice,  the
Hippocratic  Oath  (we  will  thus call this the Hippocratic
Principle):
 
     First, do no harm.
 
It is VITAL to realize that decisions which might be  merely
suboptimal  in a smaller context can become devastating mis-
takes when amplified by the actions of  thousands  of  hosts
within a few hours.
 
9.1. Relayer General Issues
Relayers  MUST not alter the content of articles unnecessar-
ily.  Well-intentioned attempts  to  "improve"  headers,  in
particular,  typically do more harm than good.  It is neces-
sary for a relayer to prepend its own name to the Path  con-
tent  (see section 5.6) and permissible for it to rewrite or
delete the Xref header (see  section  6.12).   Relayers  MAY
delete the thoroughly-obsolete headers described in appendix
A.3, although this behavior no longer seems useful enough to
encourage.   Other  alterations  SHOULD  be  avoided  at all
costs, as per the Hippocratic Principle.
 
     NOTE: As discussed in section 2.3, tidying up  the
     headers  of  a user-prepared article is the job of
     the posting agent, not the relayer.  The relayer's
     purpose  is  to  move  already-compliant  articles
     around efficiently without  damaging  them.   Note
     that  in  existing  implementations, specific pro-
     grams may contain both posting-agent functions and
     relayer  functions.  The distinction is that post-
     ing-agent functions are invoked only  on  articles
     posted   by   local  posters,  never  on  articles
     received from other relayers.
 
     NOTE: A particular corollary of this rule is  that
     relayers  should not add headers unless truly nec-
     essary.  In particular, this is not SMTP;  do  not
     add Received headers.
 
Relayers  MUST  not pass non-conforming articles on to other
relayers, except perhaps in a cooperating  subnet  that  has
agreed  to  permit certain kinds of non-conforming behavior.
This is a direct  consequence  of  the  Internet  Robustness
Principle.
 
The  two  preceding paragraphs may appear to be in conflict.
What  is  to  be  done  when  a  non-conforming  article  is
received?  The Robustness Principle argues that it should be
accepted but must not be passed on to other  relayers  while
still non-conforming, and the Hippocratic Principle strongly
discourages attempts at repair.  The  conclusion  that  this
appears  to lead to is correct: a non-conforming article MAY
be accepted for local filing and processing, or  it  MAY  be
discarded  entirely,  but  it MUST not be passed on to other
relayers.
 
A relayer MUST not respond to the arrival of an  article  by
sending mail to any destination, other than a local adminis-
trator, except by explicit prearrangement with  the  recipi-
ent.   Neither  posting an article (other than certain types
of control message, see section 7.5) nor being the moderator
of  a  moderated  newsgroup constitutes such prearrangement.
UNDER NO CIRCUMSTANCES WHATSOEVER may a relayer  attempt  to
send  mail to either an article's originator or a moderator.
 
     NOTE: Reporting apparent errors in message  compo-
     sition  is  the  job  of  a  posting  agent, not a
     relayer.  The same is true of  mailing  moderated-
     newsgroup  postings to moderators.  In networks of
     thousands of cooperating relayers,  it  is  simply
     unacceptable  for  there  to  be  any circumstance
     whatsoever that causes any significant fraction of
     them  to simultaneously send mail to the same des-
     tination.  (Some control messages are  exceptions,
     although  perhaps  ill-advised ones.)  What might,
     in a smaller network, be a useful notification  or
     forwarding becomes a deluge of near-identical mes-
     sages that can bring mail software  to  its  knees
     and  severely  inconvenience  recipients.  Modera-
     tors, in particular,  historically  have  suffered
     grievously from this.
 
Notification  of  problems  in  incoming  articles MAY go to
local administrators, or at most  (by  prearrangement!)   to
the administrators of the neighboring relayer(s) that passed
on the problematic articles.
 
     NOTE: It would be desirable to notify  the  author
     that his posting is not propagating as he expects.
     However, there is no known method for  doing  this
     that  will  scale  up gracefully.  (In particular,
     "notify only if within N relayers of the  origina-
     tor" falls down in the presence of commercial news
     services like UUNET:  there  may  be  hundreds  or
     thousands  of  relayers within a couple of hops of
     the originator.)  The best that can be done  right
     now is to notify neighbors, in hopes that the word
     will eventually propagate up the line, or organize
     regional monitoring at major hubs.
 
If it is necessary to alter an article, e.g. translate it to
another character  set  or  alter  its  EOL  representation,
strenuous  efforts should be made to ensure that such trans-
formations are reversible, and that relayers or other  soft-
ware  that might wish to reverse them know exactly how to do
so.
 
     NOTE:  For  example,  a  cooperating  subnet  that
     exchanges articles using a non-ASCII character set
     like EBCDIC should define a  standard,  reversible
     ASCII-EBCDIC mapping and take pains to see that it
     is used at all points where the subnet  meets  the
     outside.   If  the only reason for using EBCDIC is
     that the readers typically employ EBCDIC  devices,
     it  would  be  more  robust to employ ASCII as the
     interchange format and do  the  transformation  in
     the reading and posting agents.
 
9.2. Article Acceptance And Propagation
When  a  relayer  first  receives an article, it must decide
whether to accept it.  (This applies regardless  of  whether
the  article arrived by itself or as part of a batch, and in
principle regardless of whether it  originated  as  a  local
posting or as traffic from another relayer.)  In a cooperat-
ing subnet with well-controlled propagation paths,  some  of
the  tests  specified  here  MAY  be delegated to centrally-
located relayers; that is, relayers that  can  receive  news
ONLY  via  one of the central relayers might simplify accep-
tance testing based on the assumption that incoming  traffic
has  already  passed  the  full  set  of  tests at a central
relayer.
 
The wording that follows is based on a model in which  arti-
cles  arrive on a relayer's host before acceptance tests are
done.  However, depending on the degree  of  integration  of
the  transport  mechanisms  and  the relayer, some or all of
these tests MAY be  done  before  the  article  is  actually
transmitted,  so  that articles which definitely will not be
accepted need not be transmitted at all.
 
The wording that follows also specifies a  particular  order
for  the  acceptance tests.  While this order is the obvious
one, the tests MAY be done in any order.
 
First, the relayer MUST verify that the article is  a  legal
news  article, with all mandatory headers present with legal
contents.
 
     NOTE: This check in principle is done by the first
     relayer  to see an article, so an article received
     from another relayer should always be  legal,  but
     there  is  enough  old  software still operational
     that this cannot be taken  for  granted;  see  the
     discussion of the Internet Robustness Principle in
     section 9.1.
 
Second, the relayer MUST determine whether  it  has  already
seen  this  article (identified by its message ID).  This is
normally done by retaining a history of all article  message
IDs seen in the last N days, where the value of N is decided
by the relayer's administrator but SHOULD  be  at  least  7.
Since  N cannot practically be infinite, articles whose Date
content indicates that  they  are  older  than  N  days  are
declared "stale" and are deemed to have been seen already.
 
     NOTE:  This check is important because news propa-
     gation  topology  is  typically  redundant,  often
     highly  so,  and  it  is not at all uncommon for a
     relayer to receive the same article  from  several
     neighbors.   The  history  of already-seen message
     IDs can get quite large, hence the desire to limit
     its  length... but it is important that it be long
     enough that slowly-propagating  articles  are  not
     classed  as  stale.   News  propagation within the
     Internet is normally very  rapid,  but  when  UUCP
     links  are  involved, end-to-end delays of several
     days are not rare, so a week is not a particularly
     generous minimum.
 
     NOTE:  Despite generally more rapid propagation in
     recent times, it is still not unheard-of for  some
     propagation  paths  to  be  very  slow.   This can
     introduce the possibility of old articles arriving
     again after they are gone from the history.  Hence
     the "stale" rule.
 
Third, the relayer MUST determine whether any of  the  arti-
cle's newsgroups are "subscribed to" by the host, i.e. fit a
description of what hierarchies or newsgroups the site wants
to receive.
 
     NOTE:  This  check is significant because informa-
     tion  on  what  newsgroups  a  relayer  wishes  to
     receive  is often stored at its neighbors, who may
     not have up-to-date information  or  may  simplify
     the  rules for implementation reasons.  As a hedge
     against the possibility of missed or delayed  new-
     group  control  messages,  relayers  may  wish  to
     observe a notion of a newsgroup subscription  that
     is  independent of the list of newsgroups actually
     known to the relayer.  This would permit reception
     and  relaying  of  articles in newsgroups that the
     relayer is not (yet) aware  of,  subject  to  more
     general  criteria  indicating that they are likely
     to be of interest.
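
The three acceptance tests above might be combined roughly as
follows.  This is a sketch under several assumptions, all of
them hypothetical: headers arrive as a dictionary, the Date
content has already been parsed into a timezone-aware datetime,
subscriptions are written as shell-style patterns, and the
"legal contents" check is reduced to a presence check on the
headers that section 5 makes mandatory.

```python
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatchcase

MANDATORY = ("Date", "From", "Message-ID", "Subject", "Newsgroups", "Path")


class Acceptor:
    def __init__(self, subscriptions, history_days=7):
        self.subscriptions = list(subscriptions)   # e.g. ["comp.*"]
        self.keep = timedelta(days=history_days)   # N, at least 7
        self.history = {}                          # message ID -> arrival time

    def accept(self, headers, now):
        """Return (accepted?, reason) for one incoming article."""
        # First: all mandatory headers present (contents not checked here).
        for name in MANDATORY:
            if not headers.get(name):
                return False, f"missing {name}"
        # Second: already seen, or too old to tell ("stale").
        cutoff = now - self.keep
        self.history = {m: t for m, t in self.history.items() if t >= cutoff}
        msgid = headers["Message-ID"]
        if msgid in self.history:
            return False, "duplicate"
        if headers["Date"] < cutoff:
            return False, "stale"
        # Third: at least one newsgroup is subscribed to.
        groups = [g.strip() for g in headers["Newsgroups"].split(",")]
        if not any(fnmatchcase(g, p) for g in groups for p in self.subscriptions):
            return False, "unsubscribed"
        self.history[msgid] = now
        return True, "accepted"
```

Matching subscriptions by pattern rather than against a list of
known newsgroups reflects the hedge described in the preceding
note: articles in newsgroups the relayer has not yet heard of
can still be accepted.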
 
Once an article has been accepted, it may be  passed  on  to
other  relayers.  The fundamental news propagation rule is a
flooding algorithm: on receiving and accepting  an  article,
send  it to all neighboring relayers not already in its path
list that are sent its newsgroup(s) and distribution(s).
 
     NOTE: The path list's role in loop prevention  may
     appear  relatively unimportant, given that looping
     articles would typically be rejected as duplicates
     anyway.    However,   the   path  list's  role  in
     preventing superfluous transmissions is not  triv-
     ial.   In  particular,  the  path list is the only
     thing that prevents relayer  X,  on  receiving  an
     article  from relayer Y, from sending it back to Y
     again.  (Indeed, the usual  symptom  of  confusion
     about  relayer  names  is that incoming news loops
     back in this manner.)  The looping articles  would
     be rejected as duplicates, but doubling the commu-
     nications load on every news transmission path  is
     not to be taken lightly!
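
The flooding rule can be sketched as below.  The names are
hypothetical, and the sketch simplifies in three labeled ways:
path entries are assumed to be separated by "!" only, each
neighbor's feed is described by shell-style newsgroup patterns,
and distributions are ignored.

```python
from fnmatch import fnmatchcase


def neighbors_to_feed(path, newsgroups, feeds):
    """Pick the neighbors an accepted article should be sent to.

    path       -- Path content, e.g. "y!origin!user"
    newsgroups -- list of the article's newsgroups
    feeds      -- {neighbor name: [newsgroup patterns it is sent]}
    """
    already_visited = set(path.split("!"))
    targets = []
    for neighbor, patterns in feeds.items():
        if neighbor in already_visited:
            continue      # never send an article back along its path
        if any(fnmatchcase(g, p) for g in newsgroups for p in patterns):
            targets.append(neighbor)
    return targets
```

With feeds for "y", "z" and "w", an article that arrived from
"y" in a newsgroup that only "y" and "z" are sent goes to "z"
alone; no attempt is made to guess whether "z" will get it by
some other route.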
 
In  general,  relayers SHOULD not make propagation decisions
by "anticipation": relayer X, noting that the article's path
list  already  contains relayer Y, decides not to send it to
relayer Z because X anticipates that Z will get the  article
by  a  better  path.  If that is generally true, then why is
there a news feed from X to Z at all?  In fact, the  "better
path"  may  be running slowly or may be down.  News propaga-
tion is very robust precisely because some redundant  trans-
mission  is  done  "just  in  case".  If it is imperative to
limit unnecessary traffic on a path, use of  NNTP  [rrr]  or
ihave/sendme  (see  section  7.2) to pass articles only when
necessary is better than arbitrary  decisions  not  to  pass
articles at all.
 
Anticipation  is  occasionally  justified  in special cases.
Such cases should involve  both  (1)  a  cooperating  subnet
whose   propagation  paths  are  well-understood  and  well-
monitored, with failures and  slowdowns  noticed  and  dealt
with  promptly, and (2) a persistent pattern of heavy unnec-
essary traffic on a path that is either slow or costly.   In
addition,  there  should be some reason why neither NNTP nor
ihave/sendme is suitable as a solution to the problem.
 
9.3. Administrator Contact
It is desirable to have a standardized contact address for a
relayer's  administrators, in the spirit of the "postmaster"
address for mail administrators.  Mail addressed  to  "news-
master"  on a relayer's host MUST go to the administrator(s)
of  that  relayer.   Mail  addressed  to  "usenet"  on   the
relayer's  host  SHOULD be handled likewise.  Mail addressed
to either  address  on  other  hosts  using  the  same  news
database SHOULD be handled likewise.
 
     NOTE: These addresses are case-sensitive, although
     it would be desirable for sequences equivalent  to
     them  using case-insensitive comparison to be han-
     dled likewise.  While "newsmaster" seems the  pre-
     ferred  network-independent address, by analogy to
     "postmaster", there is  an  existing  practice  of
     using  "usenet"  for this purpose, and so "usenet"
     should be supported if at all possible (especially
     on  hosts  belonging  to  Usenet!).   The  address
     "news" is also sometimes used  for  purposes  like
     this, but less consistently.
 
 
10. Gatewaying
Gatewaying of traffic between news networks using this Draft
and those using other exchange mechanisms can be useful, but
must  be done cautiously.  Gateway administrators are taking
on significant responsibilities, and must recognize that the
consequences of error can be quite serious.
 
10.1. General Gatewaying Issues
This section will primarily address the problems of gateway-
ing traffic INTO news networks.  Little can  be  said  about
the  other  direction without some specific knowledge of the
network(s)  involved.   However,  the  two  issues  are  not
entirely  independent:  if  a  non-news network is gatewayed
into a news network at more than one point, traffic injected
into  the  non-news  network  by  one  gateway may appear at
another as a candidate for injection back into the news net-
work.
 
This raises a more general principle, the single most impor-
tant issue for gatewaying:
 
     Above all, prevent loops.
 
The normal loop prevention of news transmission  is  vitally
dependent on the Message-ID header.  Any gateway which finds
it necessary to remove this header, alter it,  or  supersede
it (by moving it into the body), MUST take equally effective
precautions against looping.
 
     NOTE: There are few things more effective at turn-
     ing  news readers into a lynch mob than a malfunc-
     tioning gateway, or pair of gateways,  that  takes
     in news articles, mangles them just enough to pre-
     vent news relayers from recognizing them as dupli-
     cates,  and  regurgitates  them back into the news
     stream.  This happens rather too often.
 
Gateway implementors should realize that gateways  have  all
the  responsibilities  of relayers, plus the added complica-
tions introduced by transformations between different infor-
mation  formats.   Much of section 9's discussion of relayer
issues is relevant to  gateways  as  well.   In  particular,
gateways SHOULD keep a history of recently-seen articles, as
described in section 9.2, and not assume that articles  will
never reappear.  This is particularly important for networks
that have their own concept  analogous  to  message  IDs:  a
gateway  should  keep  a  history  of traffic seen from BOTH
directions.
 
If at all possible, articles entering the  non-news  network
SHOULD  be  marked  in some way so that they will NOT be re-
gatewayed back into news.  Multiple gateways obviously  must
agree  on  the  marking method used; if it is done by having
them know each others' names, name changes MUST  be  coordi-
nated  with  great  care.   If  marking  cannot be done, all
transformations MUST be reversible so  that  a  re-gatewayed
article  is  identical to the original (except perhaps for a
longer Path header).
 
Gateways MUST not pass control messages (articles containing
Control, Also-Control, or Supersedes headers) without remov-
ing the headers that  make  them  control  messages,  unless
there  are compelling reasons to believe that they are rele-
vant to both sides and that conventions are compatible.   If
it  is truly desirable to pass them unaltered, suitable pre-
cautions MUST be taken to ensure that there is NO  POSSIBIL-
ITY of a looping control message.
 
     NOTE:  The damage done by looping articles is mul-
     tiplied a thousandfold  if  one  of  the  affected
     articles  is something like a sendsys message (see
     section  7.3)  that  requests  multiple  automatic
     replies.   Most  gateways  simply  should not pass
     control messages at all.  If some  unusual  reason
     dictates doing so, gateway implementors and admin-
     istrators are urged to consider bulletproof  rate-
     limiting  measures  for  the more destructive ones
     like sendsys, e.g. passing only one  per  hour  no
     matter how many are offered.
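
One way to realize the "one per hour" suggestion is a simple
per-type limiter such as the following sketch.  The class and
parameter names are hypothetical; the clock is injectable so
that the behavior can be tested without waiting.

```python
import time


class ControlRateLimiter:
    """Pass at most one control message of each type per interval."""

    def __init__(self, min_interval=3600.0, clock=time.monotonic):
        self.min_interval = min_interval   # seconds; one hour by default
        self.clock = clock
        self.last_passed = {}              # control type -> time last passed

    def allow(self, control_type):
        now = self.clock()
        last = self.last_passed.get(control_type)
        if last is not None and now - last < self.min_interval:
            return False                   # too soon: drop, do not queue
        self.last_passed[control_type] = now
        return True
```

Messages refused by the limiter are dropped rather than queued,
so a looping sendsys cannot accumulate a backlog that is
released later.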
 
Gateways,  like  relayers, SHOULD make determined efforts to
avoid mangling articles unnecessarily.  In the case of gate-
ways,  some  transformations  may be inevitable, but keeping
them to a minimum and ensuring that they are  reversible  is
still highly desirable.
 
Gateways  MUST avoid destroying information.  In particular,
the restrictions of section 4.2.2  are  best  taken  with  a
grain  of salt in the context of gateways.  Information that
does not translate directly  into  news  headers  SHOULD  be
retained, perhaps in "X-" headers, both because it may be of
interest to sophisticated readers and because it may be cru-
cial to tracing propagation problems.
 
Gateway implementors should take particular note of the dis-
cussion of mailed replies, or  more  precisely  the  ban  on
same,  in section 9.1.  Gateway problems MUST be reported to
the local administration, not to the innocent originator  of
traffic.   "Gateway  problems"  here  includes  all forms of
propagation anomaly on the non-news  side  of  the  gateway,
e.g.  unreachable  addresses  on  a mailing list.  Note that
this  requires  consideration  of  possible  misbehavior  of
"downstream" hosts, not just the gateway host.
 
10.2. Header Synthesis
News  articles prepared by gateways MUST be legal news arti-
cles.  In particular, they MUST include all of the mandatory
headers  (see  section  5)  and  MUST  fully  conform to the
restrictions on said headers.  This often  requires  that  a
gateway function not only as a relayer, but also partly as a
posting agent, aiding in the synthesis of a conforming arti-
cle from non-conforming input.
 
     NOTE:  The full-conformance requirement needs par-
     ticularly careful attention when gatewaying  mail-
     ing  lists to news, because a number of constructs
     that are legal in MAIL headers are NOT permissible
     in  news  headers.   (Note  also that not all mail
     traffic fully conforms to even the MAIL specifica-
     tion.)   The  rest of this section will be phrased
     in terms of mail-to-news gatewaying, but  most  of
     it is more generally applicable.
 
The mandatory headers generally present few problems.
 
If no date information is available, the gateway should sup-
ply a Date header with the gateway's current date.  If  only
partial  information  is available (e.g. date but not time),
this should be fleshed out to a full Date header  by  adding
default values, not by mixing in parts of the gateway's cur-
rent date.  (Defaults should be chosen so  that  fleshed-out
dates  will  not  be in the future!)  It may be necessary to
map timezone information to the restricted  forms  permitted
in the news Date header.  See section 5.1.
 
     NOTE:  The  prohibition  of mixing dates is on the
     theory that it is better to admit  ignorance  than
     to lie.
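
A sketch of the Date rules above follows.  The function name is
hypothetical, times are assumed to be expressed in GMT, and
midnight is chosen as the default time of day on the reasoning
that the earliest instant of a day cannot move a date into the
future.

```python
from datetime import date, datetime, time, timezone
from typing import Optional


def flesh_out_date(day: Optional[date], clock: Optional[time],
                   now: datetime) -> datetime:
    """Build a full Date value from whatever the original supplied."""
    if day is None:
        # No date information at all: use the gateway's current date.
        return now
    if clock is None:
        # Date but no time: add a fixed default, never part of "now".
        clock = time(0, 0, 0)
    return datetime.combine(day, clock, tzinfo=timezone.utc)
```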
 
If  the author's address as supplied in the original message
is not suitable for inclusion in a From header, the  gateway
MUST  transform it so it is, e.g. by use of the "% hack" and
the domain address of the gateway.  The desire  to  preserve
information  is  NOT  an excuse for violating the rules.  If
the transformation is drastic enough that there is reason to
suspect  loss of information, it may be desirable to include
the original form in an X- header,  but  the  From  header's
contents MUST be as specified in section 5.2.
 
If  the  message  contains a Message-ID header, the contents
should be dealt with as discussed in section 10.3.  If there
is no message ID present, it will be necessary to synthesize
one, following the news rules (see section 5.3).
 
Every effort should be made to produce a meaningful  Subject
header;  see section 5.4.  Many news readers select articles
to read based on Subject headers,  and  inserting  a  place-
holder  like  "<no  subject available>" is considered highly
objectionable.  Even synthesizing a Subject header by  pick-
ing  out  the  first  half-dozen nouns and adjectives in the
article body is better than using a  placeholder,  since  it
offers SOME indication of what the article might contain.
 
The contents of the Newsgroups header (section 5.5) are usu-
ally predetermined by gateway configuration, but  a  gateway
to  a network that has its own concept of newsgroups or dis-
cussions might have to make transformations.  Such transfor-
mations  should be reversible; otherwise confusion is likely
on both sides.
 
It will rarely be possible for gateways to  provide  a  Path
header  that is both an accurate history of the relayers the
article has passed  through  AS  NEWS  and  a  usable  reply
address.   The  history function MUST be given priority; see
the discussion in section 5.6.  It will usually be necessary
for  a  gateway to supply an empty path list, abandoning the
reply function.
 
It is desirable for gatewayed articles  to  convey  as  much
useful information as possible, e.g. by use of optional news
headers (see section 6) when  the  relevant  information  is
available.  Synthesis of optional headers can generally fol-
low similar rules.
 
Software synthesizing References  headers  should  note  the
discussion  in  section  6.5  concerning the incompatibility
between MAIL and news.  Also of interest is the  possibility
of  incorporating  information  from In-Reply-To headers and
from attribution lines in the body; an incomplete  or  some-
what  conjectural References header is much better than none
at all, and reading agents already have to cope with  incom-
plete or slightly erroneous References lists.
 
10.3. Message ID Mapping
This  section, like the previous one, is phrased in terms of
mail being gatewayed into news, but most of  the  discussion
should be more generally applicable.
 
A  particularly  sticky problem of gatewaying mail into news
is supplying legal news message IDs.  Note,  in  particular,
that  not  all  MAIL message IDs are legal in news; the news
syntax (specified in section 5.3, with related  material  in
5.2)  is  more  restrictive.   Generating a fully-conforming
news article from a mail message  may  require  transforming
the message ID somewhat.
 
Generation and transformation of message IDs assumes partic-
ular importance if a given mailing  list  (or  whatever)  is
being handled by more than one gateway.  It is highly desir-
able that the same article contents not appear twice in  the
same  newsgroup,  which  requires that they receive the same
message ID from all gateways.  Gateways SHOULD use the  fol-
lowing  algorithm (possibly modified by the later discussion
of gatewaying into more than  one  newsgroup)  unless  local
considerations dictate another:
 
 
     1. Separate message ID from surroundings, if necessary.
        A plausible method for this is to start at the first
        "<",  end at the next ">", and reject the message if
        no ">" is found or a second "<" is seen  before  the
        ">".  Also reject the message if the message ID con-
        tains no "@" or more than one "@", or if it contains
        no  ".".   Also reject the message if the message ID
        contains non-ASCII characters, ASCII control charac-
        ters, or white space.

          NOTE:  Any  legitimate domain will include at
          least one ".".  RFC 822 section 6.2.2 forbids
          white space in this context when passing mail
          on to non-MAIL software.

     2. Delete the leading "<" and trailing ">".   Separate
        message ID into local part and domain at the "@".

     3. In both components, transliterate leading dots (".",
        ASCII 46), trailing dots, and dots after the first in
        sequences  of  two  or  more  consecutive dots,  into
        underscores (ASCII 95).

     4. In both components, transliterate disallowed charac-
        ters  other  than  dots  (see  the  definition  of
        <unquoted-char>  in  section  5.2)  to  underscores
        (ASCII 95).

     5. Form the message ID as
        "<" local-part "@" domain ">".
 
     NOTE: This algorithm is approximately that of Rich
     Salz's successful gatewaying package.
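
The algorithm above might be sketched in  Python  as  follows.
This is an illustration under stated assumptions, not a ref-
erence implementation: the function names are invented here,
the  DISALLOWED  set is a guess at the complement of <unquoted-
char> (whose definition lies in section 5.2,  outside  this
section),  and  rejecting  an absent "<" or an empty local part
or domain is an extra precaution the text does not spell out.

```python
import re

# ASSUMPTION: printable characters excluded by <unquoted-char>
# (section 5.2), other than ".".  Verify against that section.
DISALLOWED = set('!()<>@,;:\\"[]')


def _underscores(match):
    return "_" * len(match.group())


def fix_component(text):
    """Apply the dot and disallowed-character transliterations."""
    text = re.sub(r"\A\.+", _underscores, text)   # leading dots
    text = re.sub(r"\.+\Z", _underscores, text)   # trailing dots
    # Keep the first dot of each remaining run, replace the rest.
    text = re.sub(r"\.(\.+)",
                  lambda m: "." + "_" * len(m.group(1)), text)
    return "".join("_" if c in DISALLOWED else c for c in text)


def map_message_id(header_value):
    """Return a news message ID, or None to reject the message."""
    start = header_value.find("<")
    if start < 0:
        return None                 # assumption: no "<" means reject
    end = header_value.find(">", start)
    if end < 0 or "<" in header_value[start + 1:end]:
        return None
    inner = header_value[start + 1:end]
    if inner.count("@") != 1 or "." not in inner:
        return None
    # Reject non-ASCII, ASCII control characters and white space.
    if any(not (32 < ord(c) < 127) for c in inner):
        return None
    local, domain = inner.split("@")
    if not local or not domain:
        return None                 # assumption: see lead-in
    return "<" + fix_component(local) + "@" + fix_component(domain) + ">"
```

Because the mapping depends only on the original message ID,
two  gateways  running  it  on the same mail message arrive at
the same news message ID.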
 
Despite  the  desire  to  keep message IDs consistent across
multiple gateways, there is also a more  subtle  issue  that
can  require a different approach.  If the same articles are
being gatewayed into more than one newsgroup, and it is  not
possible  to  arrange  that all gateways gateway them to the
same cross-posted set of newsgroups, then the message IDs in
the different newsgroups MUST be DIFFERENT.
 
     NOTE:  Otherwise,  arrival  of  an  article in one
     newsgroup  will  prevent  it  from  appearing   in
     another,  and which newsgroup a particular article
     appears in will be an accident of which  direction
     it  arrives  from  first.  It is very difficult to
     maintain a coherent discussion when each  partici-
     pant  sees a randomly-selected 50% of the traffic.
     The fundamental problem here  is  that  the  basic
     assumption  behind  message IDs is being violated:
     the gateways are assigning the same message ID  to
     articles  that  differ  in  an  important  respect
     (Newsgroups header).
 
In such cases, it is suggested that the newsgroup  name,  or
an agreed-on abbreviation thereof, be prepended to the local
part of the message ID (with a separating ".") by the  gate-
way.   This  will ensure that multiple gateways generate the
same message ID, while also ensuring  that  different  news-
groups can be read independently.
 
     NOTE:  It  is  preferable  to  have the gateway(s)
     cross-post the article, avoiding the  issue  alto-
     gether,  but  this may not be feasible, especially
     if one newsgroup is widespread and  the  other  is
     purely local.
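
A minimal Python sketch of  the  prefixing  suggestion  above
follows.   The  function name is an assumption made for this
example, and the input is taken to be a message ID that  has
already been made legal for news.

```python
def prefix_newsgroup(message_id, newsgroup):
    """Prepend the newsgroup name (or an agreed-on abbreviation)
    and a separating "." to the local part of message_id."""
    if not (message_id.startswith("<") and message_id.endswith(">")):
        raise ValueError("expected a message ID of the form <local@domain>")
    return "<" + newsgroup + "." + message_id[1:]
```

The result depends only on the original message ID  and  the
newsgroup  name, so gateways feeding the same newsgroup agree
with each other while gateways feeding different  newsgroups
do not collide.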
 
10.4. Mail to and from News
Gatewaying mail to news, and vice-versa, is the most obvious
form of news gatewaying.  It is common to  set  up  gateways
between news and mail rather too casually.
 
It  is hard to go very wrong in gatewaying news into a mail-
ing list, except for the non-trivial matter of  making  sure
that  error  reports  go  to the local administration rather
than to the authors of news articles.  (This requires atten-
tion  to  the  "envelope  address" as well as to the message
headers.)  Doing the reverse connection  correctly  is  much
harder than it looks.
 
     NOTE: In particular, just feeding the mail message
     to "inews -h" or the  equivalent  is  NOT,  repeat
     NOT,  adequate  to gateway mail to news.  Signifi-
     cant gatewaying software is  necessary  to  do  it
     right.   Not  all headers of mail messages conform
     to even the MAIL specifications,  never  mind  the
     stricter rules for news.
 
It  is  useful to distinguish between two different forms of
mail-to-news gatewaying: gatewaying a mailing  list  into  a
newsgroup,  and  operating a "post-by-mail" service in which
individual articles can be posted to a newsgroup by  mailing
them  to a specific address.  In the first case, the message
is already being  "broadcast",  and  the  situation  can  be
viewed  as  gatewaying  one  form of news into another.  The
second case is closer to that of a moderator posting submis-
sions to a moderated newsgroup.
 
In  either  case,  the discussions in the preceding two sec-
tions are relevant, as is the Hippocratic Principle of  sec-
tion  9.   However,  some additional considerations are spe-
cific to mail-to-news gatewaying.
 
As mentioned in section 6, point-to-point  headers  like  To
and  Cc  SHOULD  not  appear as such in news, although it is
suggested that they be transformed to "X-" headers, e.g.  X-
To  and X-Cc, to preserve their information content for pos-
sible use  by  readers  or  troubleshooters.   The  Received
header  is  entirely  specific to MAIL and SHOULD be deleted
completely  during  gatewaying,  except  perhaps   for   the
Received header supplied by the gateway host itself.
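
The To, Cc and Received handling just described could look
roughly like the following Python sketch.  The function  and
parameter  names  are  hypothetical.   The sketch also assumes
that the gateway host's own Received header is the first  one
in the message, on the grounds that each MAIL relay adds its
Received header at the top; a gateway that receives mail  by
other means would need a different test.

```python
def gateway_headers(headers, keep_own_received=True):
    """Transform mail headers for news.

    headers is a list of (name, value) pairs in original order.
    """
    result = []
    seen_received = False
    for name, value in headers:
        lower = name.lower()
        if lower in ("to", "cc"):
            # Keep the information, but not as a live header.
            result.append(("X-" + name, value))
        elif lower == "received":
            # ASSUMPTION: the first Received header is the one
            # supplied by the gateway host itself.
            if keep_own_received and not seen_received:
                result.append((name, value))
            seen_received = True
        else:
            result.append((name, value))
    return result
```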
 
The  Sender  header is a tricky case, one where mailing-list
and post-by-mail practice  should  differ.   For  gatewaying
mailing  lists, the mailing-list host should be considered a
relayer, and the From and Sender  headers  supplied  in  its
transmissions left strictly untouched.  For post-by-mail, as
for a moderator posting  a  mailed  submission,  the  Sender
header should reflect the poster rather than the author.  If
a post-by-mail gateway  receives  a  message  with  its  own
Sender  header,  it might wish to preserve the content in an
X-Sender header.
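
The two Sender policies can be contrasted in a  short  Python
sketch.   The  function name, the mode strings, and the idea
of passing the poster's address  as  a  parameter  are  all
assumptions made for illustration.

```python
def apply_sender_policy(headers, mode, poster_address=None):
    """headers is a list of (name, value) pairs.

    mode is "mailing-list" or "post-by-mail" (names invented
    for this sketch).
    """
    if mode == "mailing-list":
        # The list host is treated as a relayer: From and
        # Sender pass through untouched.
        return list(headers)
    if mode != "post-by-mail":
        raise ValueError("unknown mode: " + repr(mode))
    result = []
    for name, value in headers:
        if name.lower() == "sender":
            result.append(("X-Sender", value))   # preserve content
        else:
            result.append((name, value))
    result.append(("Sender", poster_address))    # poster, not author
    return result
```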
 
It will generally be necessary to transform  between  mail's
In-Reply-To/References convention and news's References/See-
Also convention, to preserve correct semantics of cross ref-
erences.   This also requires attention when going the other
way, from news to mail.  See the discussion of  the  differ-
ence in section 6.5.
 
10.5. Gateway Administration
Any  news  system will benefit from an attentive administra-
tor, preferably assisted by automated monitoring for  anoma-
lies.  This is particularly true of gateways.  Gateway soft-
ware SHOULD be instrumented  so  that  unusual  occurrences,
such  as  sudden  massive  surges  in  traffic, are reported
promptly.  It is desirable, in fact, to go further:  gateway
software  SHOULD endeavour to limit damage in the event that
the administrator does not respond promptly.
 
     NOTE: For example, software might limit the  gate-
     waying  rate by queueing incoming traffic and emp-
     tying the queue at a  finite  maximum  rate  (well
     below  the  maximum  that the host is capable of!)
     which is set  by  the  administrator  and  is  not
     raised automatically.
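
One possible shape for such a rate limit is sketched below in
Python.   The  class  and  its parameters are invented for this
example.  Articles are queued as they arrive  and  released
no faster than one per fixed interval; the rate is set when
the queue is created and nothing in the code raises it.

```python
import collections
import time


class ThrottledQueue:
    """Queue incoming articles; release them at a bounded rate."""

    def __init__(self, max_per_second, clock=time.monotonic):
        if max_per_second <= 0:
            raise ValueError("max_per_second must be positive")
        self._interval = 1.0 / max_per_second
        self._clock = clock
        self._queue = collections.deque()
        self._next_release = clock()

    def put(self, article):
        self._queue.append(article)

    def get(self):
        """Return the oldest article if the rate allows, else None."""
        now = self._clock()
        if not self._queue or now < self._next_release:
            return None
        self._next_release = now + self._interval
        return self._queue.popleft()

    def backlog(self):
        """Queue length, suitable for reporting to the administrator."""
        return len(self._queue)
```

A growing backlog() value is itself an  "unusual  occurrence"
of the kind that instrumentation might report.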
 
Traffic gatewayed into a news network SHOULD include a suit-
able  header,  perhaps  X-Gateway-Administrator,  giving  an
electronic  address  that  can  be  used to report problems.
This SHOULD be an address that goes direct to a  human,  not
to  a  "routine administrative issues" mailbox that is exam-
ined only occasionally, since the point is  to  be  able  to
reach  the  administrator  quickly in an emergency.  Gateway
administrators SHOULD arrange substitutes to  cover  gateway
operation  (with suitable redirection of mail) when they are
on vacation etc.
 
 
11. Security And Related Issues
Although the interchange format itself raises no significant
security issues, the wider context does.
 
11.1. Leakage
The  most  obvious  form  of  security  problem with news is
"leakage" of  articles  which  are  intended  to  have  only
restricted circulation.  The flooding algorithm is EXTREMELY
good at finding any path by which articles can leave a  sub-
net  with  supposedly-restrictive  boundaries.   Substantial
administrative effort is required to ensure that local news-
groups remain local, unless connections to the outside world
are tightly restricted.
 
A related problem is that the sendme control message can  be
used  to ask for any article by its message ID.  The useful-
ness of this has declined  as  message-ID  generation  algo-
rithms have become less predictable, but it remains a poten-
tial problem for "secure" newsgroups.  Hosts with such news-
groups  may  wish  to  disable  the  sendme  control message
entirely.
 
The sendsys, version,  and  whogets  control  messages  also
allow  "outsiders"  to  request  information  from "inside",
which may reveal details of internal topology  (etc.)   that
are  considered  confidential.   (Note that at least limited
openness about such matters may be a condition of membership
in such networks, e.g. Usenet.)
 
Organizations  wishing to control these forms of leakage are
strongly advised to designate a small  number  of  "official
gateway"  hosts to handle all news exchange with the outside
world, so that a bounded amount of administrative effort  is
needed   to  control  propagation  and  eliminate  problems.
Attempts to keep news out entirely, by refusing  to  support
an  official  gateway,  typically result in large numbers of
unofficial partial gateways appearing  over  time.   Such  a
configuration is much more difficult to troubleshoot.
 
A somewhat-related problem is the possibility of proprietary
material being disclosed unintentionally  by  a  poster  who
does  not  realize  how far his words will propagate, either
from sheer misunderstanding or because of  errors  made  (by
human or software) in followup preparation.  There is little
that can be done about this except education.
 
11.2. Attacks
Although the limitations of the medium restrict what can  be
done  to  attack  a host via news, some possibilities exist,
most of them problems news shares with mail.
 
If reading  agents  are  careless  about  transmitting  non-
printable  characters  to  output devices, malicious posters
may post articles  containing  control  sequences  ("letter-
bombs")  meant to have various destructive effects on output
devices.  Possible effects depend on the  device,  but  they
can  include  hardware  damage  (e.g. by repeated writing of
values into configuration memories that can tolerate only  a
limited number of write cycles) and security violation (e.g.
by reprogramming function keys potentially  used  by  privi-
leged readers).
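
A reading agent can reduce this exposure by filtering article
text  before display.  The Python sketch below is one simple
approach, with a hypothetical function name:  it  keeps  new-
lines, tabs, and printable characters, and replaces every-
thing else (including ESC, which introduces most  terminal
control sequences) with a visible placeholder.

```python
def sanitize_for_display(text, replacement="?"):
    """Replace characters that could act as device control codes."""
    out = []
    for ch in text:
        if ch in "\n\t" or ch.isprintable():
            out.append(ch)
        else:
            out.append(replacement)
    return "".join(out)
```

The remainder of a control sequence is left  as  harmless
printable text, which also makes the attempt visible to the
reader.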
 
A  more  sophisticated variation on the letterbomb is inclu-
sion of "Trojan horses"  in  programs.   Obviously,  readers
must  be  cautious  about  using software found in news, but
more subtly, reading agents must also exercise  care.   MIME
messages  can  include  material  that is executable in some
sense, such as PostScript documents (which  are  programs!),
and letterbombs may be introduced into such material.
 
Given  the  presence  of finite resources and other software
limitations,  some  degree  of  system  disruption  can   be
achieved  by  posting  otherwise-innocent  material in great
volume, either in single huge articles (see section 4.6)  or
in  a stream of modest-sized articles.  (Some would say that
the steady growth of Usenet volume constitutes a subtle  and
unintentional  attack  of  the latter type; certainly it can
have disruptive effects if administrators are  inattentive.)
Systems  need some ability to cope with surges, because sin-
gle huge articles occur occasionally as the result of  soft-
ware error, innocent misunderstanding, or deliberate malice,
and downtime at upstream hosts can cause droughts,  followed
by floods, of legitimate articles.  (There is also a certain
amount of normal variation; for example, Usenet  traffic  is
noticeably  lighter  on  weekends and during Christmas holi-
days, and rises noticeably at the start of the  school  term
of  North  American  universities.)   However,  a  site that
normally receives little traffic may be quite vulnerable  to
"swamping" attack if its software is insufficiently careful.
 
In general, careless implementation may open doors that  are
not  intrinsic  to  news.   In particular, implementation of
control messages (see sections 6.6  and  7)  and  unbatchers
(see sections 8.1 and 8.2) via a command interpreter requires
substantial precautions to ensure  that  only  the  intended
capabilities  are  available.   Care must also be taken that
article-supplied text is  not  fed  to  programs  that  have
escapes to command interpreters.
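
One way to keep article-supplied text away  from  a  command
interpreter  is  to  dispatch control messages through a fixed
table of handlers, as in the Python sketch below.  The names
are  hypothetical  and the handlers are supplied by the caller;
the point is only that the message  text  selects  an  entry
from a closed set and is otherwise treated as data.

```python
def make_dispatcher(handlers):
    """handlers maps control-message names to Python callables.

    Each callable receives the argument words as a list.
    """
    def dispatch(control_value):
        words = control_value.split()
        if not words:
            return None
        verb, args = words[0], words[1:]
        handler = handlers.get(verb)
        if handler is None:
            return None     # not in the table: take no local action
        return handler(args)
    return dispatch
```

Nothing here evaluates the text, so shell metacharacters in
the  arguments  reach  the handler as ordinary strings.  Each
handler must still validate its arguments before using them.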
 
Finally,  there  is considerable potential for malice in the
sendsys, version, and whogets control  messages.   They  are
not  harmful  to  the hosts receiving them as news, but they
can be used to enlist those  hosts  (by  the  thousands)  as
unwitting  allies  in a mail-swamping attack on a victim who
may not even receive news.   The  precautions  discussed  in
section  7.5  can reduce the potential for such attacks con-
siderably, but the hazard cannot be eliminated  as  long  as
these control messages exist.
 
11.3. Anarchy
The  highly  distributed nature of news propagation, and the
lack of adequate authentication  protocols  (especially  for
use  over  the less-interactive transport mechanisms such as
UUCP), make article forgery relatively straightforward.   It
may  be  possible to at least track a forgery to its source,
once it is recognized as such, but clever forgers  can  make
even  that  relatively difficult.  The assumption that forg-
eries will be recognized as such is also not to be taken for
granted;  readers  are notoriously prone to blindly assuming
authenticity.  If  a  forged  article's  initial  path  list
includes the relayer name of the supposed poster's host, the
article will never be sent to that  host,  and  the  alleged
author may learn about the forgery secondhand or not at all.
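
The path-list effect described above follows  from  the  way
relayers  avoid  sending  an  article to hosts it has already
visited.  The Python sketch below is a simplified model  with
invented names; it assumes a path list whose relayer names
are separated by "!" only.

```python
def hosts_to_feed(path, neighbours):
    """Return the neighbours an article would be offered to.

    path is the content of the Path header; neighbours is a
    list of relayer names.
    """
    already_seen = set(path.split("!"))
    return [n for n in neighbours if n not in already_seen]
```

With a forged path of "hub!victimhost!forger", a relayer whose
neighbours  are  "victimhost"  and  "other" offers the article
only to "other".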
 
A particularly noxious form of forgery is the  forged  "can-
cel"  control  message.  Notably, it is relatively straight-
forward to write software that will automatically send out a
(forged)  cancel message for any article meeting some crite-
rion, e.g. written by a specific author.  The authentication
problems discussed in section 7.1 make it difficult to solve
this without crippling cancel's important functionality.
 
A related problem is the possibility of  disagreements  over
newsgroup  creation,  on  networks where such things are not
decided by central authorities.  There have  been  cases  of
"rmgroup wars", where one poster persistently sends out new-
group messages to create a newsgroup  and  another,  equally
persistently,  sends  out rmgroup messages asking that it be
removed.  This is not particularly damaging, if relayers are
configured  to  be cautious, but can cause serious confusion
among innocent third parties who just want to  know  whether
they can use the newsgroup for communication or not.
 
11.4. Liability
News shares the legal uncertainty surrounding other forms of
electronic communication:  what  rules  apply  to  this  new
medium  of  information  exchange?   News  is a particularly
problematic case because it is  a  broadcast  medium  rather
than  a point-to-point one like mail, and analogies to older
forms of communication are particularly weak.
 
Are news-carrying hosts common carriers, like the phone com-
panies, providing communications paths without having either
authority over or responsibility for content?  Or  are  they
publishers,   responsible  for  the  content  regardless  of
whether they are aware  of  it  or  not?   Or  something  in
between?   Such  questions are particularly significant when
the content is technically criminal, e.g. some types of sex-
ually-oriented material in some jurisdictions, in which case
ignorance of its presence may not be an adequate defence.
 
Even in milder situations such as libel or copyright  viola-
tion,  the  responsibilities  of  the  poster, his host, and
other hosts carrying the traffic are unclear.  Note, in par-
ticular, the problems arising when the article is a forgery,
or when the alleged author claims it is a forgery but cannot
prove this.
 
 |