HTTP Server

Purpose

This document provides enough detail about HTTP to allow someone to implement a simple HTTP server. It is not intended to be a complete specification of HTTP; anyone interested in fully implementing HTTP should refer to the HTTP 1.0 specification (RFC 1945). There is enough information here to implement a server that can serve plain text, HTML, GIF, and JPEG documents and images to commercial Web browsers. This information was used to implement such a Web server in Java.

HTTP Overview

When a web browser looks up a URL of the form

   http://hostname:port/pathname

it establishes a TCP connection to the server listening on the specified port on the specified hostname.  It then transmits the command:

   GET /pathname HTTP/1.0

followed by a blank line, and waits for the server to reply. This tells the server that you want to speak HTTP version 1.0 and that you want to get the file pathname. There is also a version 1.1 of the protocol. In version 1.1, the request line looks the same, except that it says "HTTP/1.1" rather than "HTTP/1.0"; version 1.1 also requires the client to send a Host header.
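As a rough illustration of the request described above, the following C sketch formats the bytes a browser would send for a given pathname. The function name build_get_request and the caller-supplied buffer are assumptions made for this example; they are not part of HTTP.

```c
#include <stdio.h>

/* Writes "GET <path> HTTP/1.0" followed by the terminating blank line
   into buf. Each line ends in \r\n, so the blank line is a bare \r\n.
   Returns the request length, or -1 if buf is too small.
   (build_get_request is a hypothetical helper name.) */
int build_get_request(char *buf, size_t size, const char *path)
{
    int n = snprintf(buf, size, "GET %s HTTP/1.0\r\n\r\n", path);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

A client would write the resulting bytes to the TCP connection and then read the server's reply from the same socket.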

If everything goes well, the server's reply then has the following form:

   HTTP/1.0 200 OK
   Attribute: value
   Attribute: value
   Attribute: value
   ...

   file data ...

This says that the server is speaking HTTP version 1.0 and that the request was OK. The attributes can specify things like the date the data was sent, the version of the server program that's running, etc. Two attributes you may want to pay attention to are Content-type and Content-length. The Content-type attribute says what kind of file data is about to be transmitted. For raw text files it should be "text/plain", for HTML files "text/html", and for ".jpg" and ".gif" files "image/jpeg" and "image/gif" respectively. The Content-length attribute gives the size of the file data in bytes. After the attributes, there's another blank line and finally the data for the file.

You should use \r\n to terminate each line, including the blank line. The browsers are picky about this. For example, when you print the OK message it should be something like

   write(sock, "HTTP/1.0 200 OK\r\n", strlen("HTTP/1.0 200 OK\r\n"));
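The choice of Content-type value described above can be sketched as a lookup on the file extension. This is only one possible approach: the function name content_type_for, the extra ".htm" and ".jpeg" spellings, and the fallback to text/plain for unknown extensions are assumptions of this example.

```c
#include <string.h>

/* Returns the Content-type value for a file name, based on its extension.
   Unknown or missing extensions fall back to text/plain, which is an
   assumption of this sketch rather than a rule of the protocol. */
const char *content_type_for(const char *path)
{
    const char *ext = strrchr(path, '.');   /* last '.' in the name */

    if (ext == NULL)
        return "text/plain";
    if (strcmp(ext, ".html") == 0 || strcmp(ext, ".htm") == 0)
        return "text/html";
    if (strcmp(ext, ".gif") == 0)
        return "image/gif";
    if (strcmp(ext, ".jpg") == 0 || strcmp(ext, ".jpeg") == 0)
        return "image/jpeg";
    return "text/plain";
}
```

The returned string can be placed directly after "Content-type: " when the header is written.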

An HTTP Session

All HTTP transactions follow the same general format. Each client request and server response has three parts: the request or response line, a header section, and the entity body. The client initiates a transaction as follows:

  1. The client contacts the server at a designated port number (by default, 80). Then it sends a document request by specifying an HTTP command called a method, followed by a document address, and an HTTP version number.

    For example:

    GET /index.html HTTP/1.0

    uses the GET method to request the document index.html using version 1.0 of HTTP.
  2. Next, the client sends optional header information to inform the server of its configuration and the document formats it will accept. All header information is given line by line, each with a header name and value. For example, this header information sent by the client indicates its name and version number and specifies several document preferences:

    User-Agent: Mozilla/2.02Gold (WinNT; I)
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*

    The client sends a blank line to end the header.
  3. After sending the request and headers, the client may send additional data. This data is mostly used by CGI programs using the POST method.
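
The request line described in step 1 above (method, document address, version) can be split apart with sscanf. The following is a minimal sketch; the function name parse_request_line and the buffer sizes are assumptions chosen for the example.

```c
#include <stdio.h>
#include <string.h>

/* Splits a request line such as "GET /index.html HTTP/1.0\r\n" into its
   three fields. %s stops at whitespace, so the trailing \r\n is dropped.
   Returns 0 on success, -1 if a field is missing or the version field
   does not start with "HTTP/". Buffer sizes are illustrative. */
int parse_request_line(const char *line, char method[8],
                       char path[1024], char version[16])
{
    if (sscanf(line, "%7s %1023s %15s", method, path, version) != 3)
        return -1;
    if (strncmp(version, "HTTP/", 5) != 0)
        return -1;
    return 0;
}
```

For the example request in step 1, this yields the method "GET", the address "/index.html", and the version "HTTP/1.0".
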
The server responds in the following way to the client's request:
  1. The server replies with a status line containing three fields: HTTP version, status code, and description. The HTTP version indicates the version of HTTP that the server is using to respond.

    The status code is a three digit number that indicates the server's result of the client's request. The description following the status code is just human-readable text that describes the status code. For example, this status line:

    HTTP/1.0 200 OK

    indicates that the server uses version 1.0 of HTTP in its response. A status code of 200 means that the client's request was successful and the requested data will be supplied after the headers.
  2. After the status line, the server sends header information to the client about itself and the requested document. For example:

    Date: Fri, 20 Sep 1998 08:17:58 GMT
    Server: Apache/1.5.2
    Last-modified: Mon, 17 Jun 1996 21:53:08 GMT
    Content-type: text/html
    Content-length: 2482

    A blank line ends the header.
  3. If the client's request is successful, the requested data is sent. This data may be a copy of a file, or the response from a CGI program. If the client's request could not be fulfilled, additional data may be a human-readable explanation of why the server could not fulfill the request.

    In HTTP 1.0, after the server has finished sending the requested data, it disconnects from the client and the transaction is over, unless a Connection: Keep-Alive header is sent. In HTTP 1.1, however, the default is for the server to maintain the connection and allow the client to make additional requests. Since many documents embed other documents as inline images, frames, applets, etc., this saves the overhead of the client having to repeatedly connect to the same server just to draw a single page. Under HTTP 1.1, therefore, the transaction might cycle back to the beginning, until either the client or server explicitly closes the connection.

Another HTTP Session

Note: The step numbers below carry no special meaning. They are used only to show order and to break an HTTP session into manageable units.

  1. The server listens on port 80, or another port. TCP/IP is the connection protocol.
  2. The client connects to port 80 on the server, or a different port as specified in the URL. For example, port 8080 can be accessed on

    www.cs.byu.edu

    with the URL

    http://www.cs.byu.edu:8080/

    It then writes a request to the server. Requests are of the form:

    GET <path> ...

    Many parameters and other pieces of information can follow the path in the client's request, but the server does not need them to fulfill the request. Once the client has determined the server destination, the client strips the server information from the URL and sends the rest as the path in the GET request. Example:

    http://students.cs.byu.edu/~richjack

    becomes the request

    GET /~richjack <probably with a bunch of junk after it>

    The HTTP version and content-types that the browser will accept are passed in the request. However, a simple HTTP server can ignore this information and still function properly. A more advanced server would look at this information and handle it. Further, the browser is not required to send this information. All browsers are expected, by the protocol, to understand plain text and HTML content-types.
  3. The server reads this request from the client's socket. (Hint: it might be helpful to remove the CR/LF characters from the end of the request.) As mentioned before, a simple web server just needs to recognize the "GET" request and the document being requested. Other information can be ignored in a simple implementation of an HTTP server.
  4. After parsing out the filename, the server must find the file and identify the file type. Depending on the file type, the reply will change slightly. If the file is a directory, a directory listing must be generated and returned as the requested document. If the file is a text, HTML, GIF, or JPEG file, the appropriate content-type is specified in the header and the file is written after the header. If the file does not exist, the server must set the status to "404 Not Found" and send some kind of error message file.

    The reply format is as follows:

    HTTP/1.0 <STATUS CODE><CRLF>
    (Following lines are optional, but recommended)
    MIME-Version:1.0<CRLF>
    Content-Type:<CONTENT TYPE><CRLF>
    Content-Length:<file-length in bytes><CRLF>
    <CRLF>
    <file>


    The default status code should be "200 OK". If the file is not found, "404 Not Found" should be the status code. A complete (as of version 1.0) listing of status codes is given in the HTTP 1.0 specification.

    Make sure that you don't put spaces at the beginning of lines or any additional spaces between fields.

    The basic content types are as follows:

    Text: text/plain
    Html: text/html
    GIF: image/gif
    JPEG: image/jpeg

    After the header is written to the client socket, the file is written to the client socket, separated from the header by a blank line (an extra carriage return/line feed). No modifications are made to the file; it should be written byte by byte as it is read.
  5. The browser reads the header and interprets the file based upon the content type. If the file requested was an image that was to be displayed inside an HTML document and a 404 status code is received, most browsers display some kind of "broken link" image. The content length can be used to verify that the entire document was received.
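
Steps 3 and 4 above can be sketched in C as three small helpers: one that strips the CR/LF from the request, one that builds a header in the reply format shown in step 4, and one that copies the file to the socket unmodified. The names strip_crlf, format_header, and send_file are hypothetical, and the error handling is deliberately minimal.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Cuts the line at the first CR or LF (the hint in step 3). */
void strip_crlf(char *line)
{
    line[strcspn(line, "\r\n")] = '\0';
}

/* Builds the reply header, including the blank line that ends it.
   status is text such as "200 OK". No spaces are added between fields.
   Returns the header length, or -1 if buf is too small. */
int format_header(char *buf, size_t size, const char *status,
                  const char *content_type, long length)
{
    int n = snprintf(buf, size,
                     "HTTP/1.0 %s\r\n"
                     "MIME-Version:1.0\r\n"
                     "Content-Type:%s\r\n"
                     "Content-Length:%ld\r\n"
                     "\r\n",
                     status, content_type, length);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Copies the file to the socket byte for byte, retrying short writes.
   Returns 0 on success, -1 on a read or write error. */
int send_file(int sock, FILE *fp)
{
    char chunk[4096];
    size_t got;

    while ((got = fread(chunk, 1, sizeof chunk, fp)) > 0) {
        size_t sent = 0;
        while (sent < got) {
            ssize_t w = write(sock, chunk + sent, got - sent);
            if (w <= 0)
                return -1;
            sent += (size_t)w;
        }
    }
    return ferror(fp) ? -1 : 0;
}
```

A server would typically call format_header, write the resulting bytes to the client socket, and then call send_file on the opened document. Image files should be opened in binary mode ("rb") on systems where that matters.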

CGIs:

GET <path>/<cgi-script>?query=<query text>+<param>|+<param>|+...

    The query text goes into the environment variable QUERY_STRING.
  1. To identify a CGI request, check for .cgi at the end of the document name.
  2. On Unix implementations, it is suggested that the server chroot to <path> before executing the script, for security reasons. The chroot call prevents anything from executing outside of the script's directory and its subdirectories by causing all processes that fork off the current process to treat that directory as the root directory.
  3. First you need to set the environment variables. The minimal environment variables are REQUEST_METHOD, QUERY_STRING, and SERVER_PORT. They can be set with the putenv system function. You then run the CGI script and it will inherit the environment variables. One easy way of executing the command is through the popen system function. Here is a brief example from the man page:

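#include <stdio.h>

int main(void)
{
    char *cmd = "/usr/bin/ls *.c";
    char buf[BUFSIZ];
    FILE *ptr;

    /* popen() runs cmd through the shell and returns a stream of its output */
    if ((ptr = popen(cmd, "r")) != NULL) {
        while (fgets(buf, BUFSIZ, ptr) != NULL)
            (void) printf("%s", buf);
        (void) pclose(ptr);   /* close the stream and wait for the command */
    }
    return 0;
}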
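
Putting step 3 together, a server might set the three minimal variables and then read the script's output through popen. The sketch below uses setenv rather than putenv so that no buffers have to outlive the call. The function name run_cgi and its parameters are assumptions made for this example.

```c
#define _POSIX_C_SOURCE 200809L   /* for popen, pclose, setenv */
#include <stdio.h>
#include <stdlib.h>

/* Runs command with the minimal CGI variables set and copies up to
   size - 1 bytes of its output into buf as a NUL-terminated string.
   Returns the number of bytes read, or -1 on failure.
   Note: command is passed to the shell, so it must never be built
   directly from untrusted request text. */
long run_cgi(const char *command, const char *query, const char *port,
             char *buf, size_t size)
{
    FILE *p;
    size_t n;

    if (size == 0)
        return -1;
    /* The child started by popen() inherits these variables. */
    if (setenv("REQUEST_METHOD", "GET", 1) != 0 ||
        setenv("QUERY_STRING", query, 1) != 0 ||
        setenv("SERVER_PORT", port, 1) != 0)
        return -1;

    p = popen(command, "r");
    if (p == NULL)
        return -1;
    n = fread(buf, 1, size - 1, p);
    buf[n] = '\0';
    (void) pclose(p);
    return (long)n;
}
```

Because setenv changes the server's own environment, a server that handles several requests at once would normally do this in a forked child process instead.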

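You might also be interested in this code, which sets up the environment variables:

/* Create the environment variables. putenv() keeps the pointer it is given,
   so these buffers are static. The buffer sizes are illustrative. */
static char query_dest[1024];
static char content[64];
char shell_cmd[1024];

snprintf(query_dest, sizeof query_dest, "QUERY_STRING=%s", query);
snprintf(content, sizeof content, "CONTENT_LENGTH=%s", length);
/* Command line that runs the script with its output redirected to handle */
snprintf(shell_cmd, sizeof shell_cmd, "%s > %s", filename, handle);
putenv("REQUEST_METHOD=GET");
putenv(content);
putenv(query_dest);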
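
Before any of the variables above can be set, the server has to separate the script name from the query text and check for the .cgi ending. A minimal sketch follows; the names split_query and is_cgi, and the example path /cgi-bin/search.cgi used in the note after it, are hypothetical.

```c
#include <string.h>

/* Splits a request path at the first '?'. The path is cut in place so
   that target holds only the script name. Returns a pointer to the query
   text, or to an empty string if the request had no '?'. */
char *split_query(char *target)
{
    char *q = strchr(target, '?');

    if (q == NULL)
        return target + strlen(target);
    *q = '\0';
    return q + 1;
}

/* Returns 1 if the document name ends in ".cgi", 0 otherwise. */
int is_cgi(const char *name)
{
    size_t len = strlen(name);

    return len >= 4 && strcmp(name + len - 4, ".cgi") == 0;
}
```

For a request path of /cgi-bin/search.cgi?query=http+server, split_query leaves /cgi-bin/search.cgi in the buffer and returns query=http+server, which is the text to place in QUERY_STRING.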

 



References


Last Updated July 30, 1999
Written by Jack Richins, modified by Ryan Bailey