S&P: Maintaining the Merceworld Website

Superseded

This document (dated 2007) has been partly superseded by the new specification for the new Website(s) we are creating for the company. Wherever there is a conflict between the new specs (dated 2010) and this old one, the new specs will apply. Many of the file naming conventions and tools will remain unchanged. The biggest change will be the move from a single Website to a collection of sites, with domains *.merceworld.com.

The new specs are given here.

This document specifies what rules we follow for maintaining the Merceworld website (i.e. trial.merceworld.com and www.merceworld.com).

Staging area

We will always maintain a staging area, which will be a second Website. This will be called trial.merceworld.com. This Website will contain all the material which is being developed, edited and tested. The final Website's DocumentRoot area will never be edited or modified by hand.

The final Website's contents will always be generated by a program-based (not manual) copying of files from the trial area to the final area. This program will also reproduce all Apache configuration options of the staging server on the final server. It will restart the Apache Web server only if the virtual host configuration for the final Website has changed as a result of the copying.
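The copy-and-restart rule above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the actual site-copying script: the function name publish, its parameters, and the idea of passing in the final virtual host configuration as ready-made text are all hypothetical. The sketch skips names reserved for temporary files and reports whether a restart is needed, leaving the restart itself to the caller.

```python
import fnmatch
import shutil
from pathlib import Path

# Name patterns reserved for temporary files (see "File name standards").
RESERVED_PATTERNS = ("*tst*", "*test*", "*tmp*", "*temp*")


def is_temporary(name: str) -> bool:
    """True if a file or directory name matches a reserved temporary pattern."""
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in RESERVED_PATTERNS)


def publish(staging_root: Path, final_root: Path,
            new_final_conf: str, final_conf_path: Path) -> bool:
    """Copy staging content to the final area; return True if Apache must restart.

    new_final_conf is assumed to be the final site's virtual host configuration,
    already derived from the staging configuration by some other step.
    """
    def skip_temporary(_directory, names):
        # copytree calls this per directory; returned names are not copied.
        return [name for name in names if is_temporary(name)]

    shutil.copytree(staging_root, final_root,
                    ignore=skip_temporary, dirs_exist_ok=True)

    old_conf = final_conf_path.read_text() if final_conf_path.is_file() else None
    if old_conf == new_final_conf:
        return False  # configuration unchanged, so no restart is needed

    final_conf_path.parent.mkdir(parents=True, exist_ok=True)
    final_conf_path.write_text(new_final_conf)
    return True
```

Note that this sketch never deletes files from the final area that have disappeared from staging, and it requires Python 3.8 or later for dirs_exist_ok. A real script would need to decide how to handle both points.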

The staging Website will be on our public server but will be password protected. Only one username and password will be created for protecting this Website. The aim is not to provide individual AAA controls for site access, but to prevent automatic spiders and bots from trawling the site and adding its unfinished content to search engine indexes. It will also prevent complete outsiders from accessing our trial site.

The single username for accessing the staging site, and its password, will be replaced every few (initially, six) months. At any point in time, there will only be one valid userID defined for access to this site.

Directory structure

A directory structure for the Webpages will be decided a priori and all changes or additions to this directory structure will be done only after prior clearance by Shuvam. This is because it is important to maintain a coherent directory design, and it's best if one person maintains this coherence. If not, each new maintainer will add new directories based on his interpretation of what's appropriate and very soon, redundancies and inconsistencies will show up. (For instance, where should the Services section go? Should it be p/svc/ or p/merce/svc/?)

Directory structures will be decided after careful deliberation and will be expected to remain unchanged for years. These directory paths will form part of the URL for the pages inside the directories, therefore any change to the directory structure will render browser bookmarks, cached URLs or pointers from other referencing sites invalid. Hence, additions to the directory structure will be done with great care, and all changes will be done in a manner that will ensure that all old URLs remain valid. A URL based on the physical directory structure on our Web server's document area will henceforth be referred to as "a physical URL".

If we strive to create a valuable Website, but URLs to its pages begin to generate 404 errors, there is no point in maintaining the Website at all. URLs on the Internet have a habit of being retained for over ten years in some cases.

File name standards

  • Physical file and directory names will always be in pure lower-case. URL aliases can have mixed case, decided case by case (pun not intended).
  • Where a separator is needed in a name, we will use hyphens, never underscores.
  • We will use brief names for directories, which will be mnemonic only for our own team, not for an outsider who tries to read a directory name and deduce its meaning. In general, we'll try to lean towards short URLs.
  • We will never use any filenames of the following patterns for any real, permanent files or directories:
    • *tst*
    • *test*
    • *tmp*
    • *temp*

    These are reserved for temporary files, used for experiments, and will never be copied to the final Website from the staging area. Conversely, all temporary files and directories must have names matching these patterns.
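The naming rules above lend themselves to an automated check. The following is a minimal sketch, assuming a hypothetical helper called name_problems that a maintainer might run over new file names before committing them to the staging area; it is not an existing tool.

```python
import fnmatch
from typing import List

# Patterns reserved for temporary files and directories.
RESERVED_PATTERNS = ("*tst*", "*test*", "*tmp*", "*temp*")


def name_problems(name: str, temporary: bool = False) -> List[str]:
    """Return the naming rules that a file or directory name breaks."""
    problems = []
    if name != name.lower():
        problems.append("not lower-case")
    if "_" in name:
        problems.append("contains underscore")

    reserved = any(fnmatch.fnmatchcase(name.lower(), pattern)
                   for pattern in RESERVED_PATTERNS)
    if reserved and not temporary:
        problems.append("matches a pattern reserved for temporary files")
    if temporary and not reserved:
        problems.append("temporary name must match a reserved pattern")
    return problems
```

One consequence worth noticing: the patterns match anywhere in the name, so permanent names such as latest-news.html or template.html are rejected too, because they contain "test" and "temp".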

URL aliases

The Apache Alias directive will be used to create aliases to certain sections or pages. These URL aliases will be created to make some sections easy to reach for people who do not come from another Web page. This is the case when people read a URL on a TV ad or a brochure and want to type it in, to reach the Web page mentioned. A frequent ad seen currently is Microsoft's "software for the people-ready business". They use the URL

http://www.microsoft.com/business/peopleready/

to tell you more about what they are offering. This is not necessarily a part of their permanent content tree structure. This is a specially created URL which will probably have a life-time of two years. This is a case of a special-purpose URL alias.

Our first example of an alias for the Merce Website is of course /academia. This is the URL we are printing in the direct mailer that Red Hat is sending out to many colleges and universities.

  • URL aliases will always be short-lived. Well, they will certainly be more short-lived than the directory tree structure which, we expect, will be as permanent as can be on a Website.
  • URL aliases will have their wording and spelling chosen primarily by the Business Development Division, with some minor technical inputs from the Website maintenance team.
  • If a URL alias has mixed case, then we will always define a second URL alias with exactly the same letters, except that this one will be totally in lower-case. This will help us catch those visitors who type the URL without paying attention to case. (No, we will not have a third version of the alias in full UPPER CASE. Don't scream; I'm not deaf.)

We are expecting to carry special offers for Mtracks or Merce. Each special offer will be short-lived, and will have an easy-to-remember human-friendly URL, e.g. /for-rotary/ or /imc-members/. These will map to underlying physical URLs like /en/p/merce/sploff/2007/rotary/ and /en/p/merce/sploff/2007/imc/.
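To illustrate how an alias and its lower-case twin map onto a physical URL, here is a small sketch that generates the corresponding Apache Alias lines. The function name alias_directives, the document root /var/www/merce, and the mixed-case spelling /For-Rotary/ are assumptions made up for this example; only the alias /for-rotary/ and its physical URL come from the text above.

```python
from typing import List


def alias_directives(alias: str, physical_url: str, document_root: str) -> List[str]:
    """Build Apache Alias lines for a URL alias.

    A mixed-case alias also gets an all-lower-case variant, as the rules require.
    """
    target = document_root.rstrip("/") + physical_url
    names = [alias]
    if alias != alias.lower():
        names.append(alias.lower())
    return [f"Alias {name} {target}" for name in names]


# Example with a hypothetical document root and a hypothetical mixed-case alias.
for line in alias_directives("/For-Rotary/",
                             "/en/p/merce/sploff/2007/rotary/",
                             "/var/www/merce"):
    print(line)
```

The alias and the target both end in a slash here, which matches how the Alias directive expects URL paths and directory paths to be paired.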

URLs within our pages

When putting in URLs from one HTML page to another or from a page to images, it is necessary to follow the rules listed below:

  • Do not use relative URLs to any page above the current page. In other words, do not use href='../../../common/something.frag'. The reason for this is that the current page may be referenced by its physical URL, or by a URL alias. You have no way of knowing which it is, because URL aliases are created after you have published your page. And depending on whether your page has been reached by its physical URL or by a URL alias, the meaning of the '../../..' will be totally different. Therefore, in such cases, you must use '/en/common/something.frag'. Use the absolute URL.

    If you are inserting links to pages lower down in the physical tree, then relative hyperlinks are accepted and in fact encouraged.

  • Never define links in terms of URL aliases. Always define links only based on physical URLs. Aliases are short-lived, like your first crush.

  • Do not refer to a directory's "covering page" (usually index.html or index.shtml) explicitly by name. In other words, do not use the hyperlink "/en/p/merce/cs/index.shtml". Use the link "/en/p/merce/cs/" instead. This allows the Website maintainer to replace index.shtml with index.html or index.cgi or index.php at a later date, without having to edit all the hyperlinks which point to it. And in general, lots of hyperlinks point to the index page of any directory.

  • Never embed the site name in the hyperlink. It is not OK to have href="http://www.merceworld.com/en/p/merce/cs". If the reasons for this are not obvious to you after having read till this point, shift to some other role in the company; you do not know enough about Web technology to be permitted to maintain our Website.

  • It may be necessary to add the attribute class="genericlink" to the <a... tag for any hyperlink. Without this attribute, the hyperlink may not show up as an underlined text block. This of course applies only when the content being linked is plain text, not when it's an image.

  • When linking from the Merce Website to any page anywhere outside our site, e.g. somewhere on Red Hat's site, it is necessary to add a target="_blank" attribute to the <a... tag. This will make the browser open that target page in a new window. Our page stays open, which helps the reader keep his concentration on our Website and return to what he was reading.

    Remember not to put this attribute when linking from one page within our site to another page within our site.

  • When the text of the link is some words of text, not an image, then make sure that there is no space between the <a... tag and the start of the text, or between the end of the text and the </a>. For instance,

    See <a class="genericlink" href="/en/p/merce/cs/"> here </a> for more info.

    is not allowed. The right way to do this will be:

    See <a class="genericlink" href="/en/p/merce/cs/">here</a> for more info.

    The only difference between the two is the extra space character before and after the word "here". This seems like a very minor point now, but it looks ugly with the extra spaces in an actual Web page. I have had to go around removing these spaces from many places on our Website in the initial days, once I saw how ugly it looked.
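Several of the link rules above can be checked mechanically. The sketch below is one possible way to do it, with hypothetical names (OUR_HOSTS, href_problems, resolve) that do not refer to any existing tool. It also uses urljoin to show why a '../..' link behaves differently under a physical URL and under an alias.

```python
from typing import List
from urllib.parse import urljoin, urlsplit

# Host names treated as "our site" (taken from the sites named in this document).
OUR_HOSTS = {"www.merceworld.com", "trial.merceworld.com"}


def href_problems(href: str, target: str = "") -> List[str]:
    """Return the link rules that an <a> tag's href/target pair breaks."""
    problems = []
    parts = urlsplit(href)
    if parts.netloc:
        if parts.netloc.lower() in OUR_HOSTS:
            problems.append("site name embedded in an internal link")
        elif target != "_blank":
            problems.append('external link without target="_blank"')
        return problems

    if target == "_blank":
        problems.append('internal link with target="_blank"')
    if ".." in parts.path.split("/"):
        problems.append("relative link to a page above the current one")
    if parts.path.rsplit("/", 1)[-1].startswith("index."):
        problems.append("names the directory's covering page explicitly")
    return problems


def resolve(page_url: str, href: str) -> str:
    """Resolve href the way a browser would, relative to the page's URL."""
    return urljoin(page_url, href)
```

For example, resolve("http://www.merceworld.com/en/p/merce/sploff/2007/rotary/", "../../../common/something.frag") yields a URL under /en/p/merce/common/, while the same href resolved against the alias http://www.merceworld.com/for-rotary/ yields /common/something.frag. The same link points to two different places depending on how the page was reached.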

File ownership

There will be a special userID created on our public server, which will own all the files of both the staging and final Websites. This owner will not be root. Only those officers of the company who have access to edit the Website contents will have their individual SSH keys installed in this user's home directory. The infra user on apps1 will not have its key installed here. All editing will be done by logging in as this user, not as root.

This user will be allowed to run the script which copies the staging server contents to the final server. This script is supposed to also restart the Apache Web server. Since this action requires superuser (root) privileges, a small C program will be compiled and installed by root in this user's ~/bin directory, to allow Apache restart. This binary executable will be setuid-root, and will be programmed to execute only one command, the Apache restart command. It will also be programmed to check that only the site-editing userID can run this program, not any other user.

General matters of process discipline

Do not keep .bac or other junk files lying around. Once you finish spell-checks, delete the .bac files. Once you finish experimentation, remove the tmp* and test* files.
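A periodic sweep can help enforce this. The following sketch only lists candidate junk files and directories and deletes nothing; the function name find_junk is hypothetical, and the patterns are the ones mentioned in the paragraph above.

```python
import fnmatch
import os
from pathlib import Path
from typing import List

# Patterns for leftovers named in this section.
JUNK_PATTERNS = ("*.bac", "tmp*", "test*")


def find_junk(root: Path) -> List[Path]:
    """List files and directories under root whose names look like leftovers.

    Paths are returned relative to root, in sorted order.
    """
    junk = []
    for directory, subdirs, files in os.walk(root):
        for name in subdirs + files:
            if any(fnmatch.fnmatchcase(name, pattern) for pattern in JUNK_PATTERNS):
                junk.append(Path(directory, name).relative_to(root))
    return sorted(junk)
```

Reviewing the list by hand before removing anything is the safer choice, since a pattern like test* may occasionally catch a file someone still needs.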