HTTP Server
Purpose
This document provides enough detail about HTTP to let someone implement a simple HTTP server. It is not a complete specification of HTTP; anyone who wants to implement HTTP fully should refer to the HTTP 1.0 specification document listed below. There is enough information here to implement a server that can serve plain text, HTML, GIF, and JPEG documents and images to commercial Web browsers. This information was used to implement such a Web server in Java.
HTTP Overview
When a web browser looks up a URL of the form
http://hostname:port/pathname
it establishes a TCP connection to the server listening on the specified port on the specified hostname. It then transmits the command:
GET /pathname HTTP/1.0
followed by a blank line, and waits for the server to reply. This tells the server that you want to speak HTTP version 1.0 and that you want to get the file pathname. There is also a version 1.1 of the protocol. In version 1.1, the request line looks the same, except that it says "HTTP/1.1" rather than "HTTP/1.0"; version 1.1 also requires the client to send a Host header.
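On the server side, the first job is to pull the three fields out of that request line. The following is a minimal sketch, not a full parser: the struct `request_line`, the function `parse_request_line`, and the buffer sizes are all hypothetical names and limits chosen for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the three fields of a request line. */
struct request_line {
    char method[16];
    char path[1024];
    char version[16];
};

/* Split a line such as "GET /pathname HTTP/1.0\r\n" into its fields.
 * Returns 0 on success, -1 if the line does not have three fields or
 * the third field does not start with "HTTP/". */
int parse_request_line(const char *line, struct request_line *out)
{
    /* Field widths are one less than the buffer sizes to leave room
     * for the terminating '\0'. %s stops at whitespace, so the
     * trailing "\r\n" is not copied. */
    int n = sscanf(line, "%15s %1023s %15s",
                   out->method, out->path, out->version);
    if (n != 3)
        return -1;
    if (strncmp(out->version, "HTTP/", 5) != 0)
        return -1;
    return 0;
}
```

A caller would read one line from the socket, pass it to `parse_request_line`, and reply with an error status if the function returns -1.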
If everything goes well, the server's reply then has the following form:
HTTP/1.0 200 OK
Attribute: value
Attribute: value
Attribute: value

... file data ...
You should use \r\n to terminate each line. The browsers are picky about this. For example, when you print the OK message it should be something like
write(sock, "HTTP/1.0 200 OK\r\n", strlen ("HTTP/1.0 200 OK\r\n"));
This says that the server is speaking HTTP version 1.0 and that the request was OK. The attributes can specify things like the date the data was sent, the version of the server program that's running, etc. Two attributes you may want to pay attention to are Content-type and Content-length. The Content-type attribute says what kind of file data is about to be transmitted. For raw text files, it should be "text/plain" and for HTML files, it should be "text/html"; similarly, it should be "image/jpeg" for ".jpg" files and "image/gif" for ".gif" files. The Content-length attribute gives the size of the file data in bytes. After the attributes, there's another blank line and finally the data for the file.
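The reply header described above can be assembled in one buffer before it is written to the socket. This is only a sketch under stated assumptions: `format_ok_header` is a hypothetical helper name, and it emits just the two attributes discussed here.

```c
#include <stdio.h>

/* Write the status line, Content-type, Content-length, and the blank
 * line that ends the header into buf. Every line ends in "\r\n".
 * Returns the number of bytes written (not counting the '\0'), or -1
 * if buf is too small. */
int format_ok_header(char *buf, size_t buf_sz,
                     const char *content_type, long content_length)
{
    int n = snprintf(buf, buf_sz,
                     "HTTP/1.0 200 OK\r\n"
                     "Content-type: %s\r\n"
                     "Content-length: %ld\r\n"
                     "\r\n",
                     content_type, content_length);
    if (n < 0 || (size_t)n >= buf_sz)
        return -1;
    return n;
}
```

The returned length can be passed straight to write(), followed by a second write() for the file data itself.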
An HTTP Session
All HTTP transactions follow the same general format. Each client request and server response has three parts: the request or response line, a header section, and the entity body. The client initiates a transaction as follows:
For example:
GET /index.html HTTP/1.0
uses the GET method to request the document index.html using version 1.0 of HTTP.
User-Agent: Mozilla/2.02Gold (WinNT; I)
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
The client sends a blank line to end the header.
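A server that wants to look at these header lines can split each one at the first colon. The sketch below assumes one header per line; `parse_header_line` is a hypothetical helper, not part of any library.

```c
#include <string.h>

/* Split a header line such as "Name: value\r\n" at the first colon.
 * Leading blanks in the value and the trailing "\r\n" are dropped.
 * Returns 0 on success, -1 if there is no name before a colon or a
 * field does not fit. The blank line that ends the header has no
 * colon, so it also returns -1. */
int parse_header_line(const char *line,
                      char *name, size_t name_sz,
                      char *value, size_t value_sz)
{
    const char *colon = strchr(line, ':');
    const char *v;
    size_t name_len, value_len;

    if (colon == NULL || colon == line)
        return -1;
    name_len = (size_t)(colon - line);
    if (name_len >= name_sz)
        return -1;
    memcpy(name, line, name_len);
    name[name_len] = '\0';

    v = colon + 1;
    while (*v == ' ' || *v == '\t')
        v++;
    value_len = strcspn(v, "\r\n");
    if (value_len >= value_sz)
        return -1;
    memcpy(value, v, value_len);
    value[value_len] = '\0';
    return 0;
}
```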
The status code is a three digit number that indicates the server's result of the client's request. The description following the status code is just human-readable text that describes the status code. For example, this status line:
HTTP/1.0 200 OK
indicates that the server uses version 1.0 of HTTP in its response. A status code of 200 means that the client's request was successful and the requested data will be supplied after the headers.
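A status line like the one above can be generated from the numeric code. This sketch is illustrative only: `reason_phrase` and `format_status_line` are hypothetical helpers, and the codes other than 200 (400, 404, 500) are not discussed in this document; they are taken from the HTTP 1.0 specification.

```c
#include <stdio.h>

/* Map a status code to its human-readable description. Only a few
 * codes are listed; anything else falls back to "Unknown". */
const char *reason_phrase(int status)
{
    switch (status) {
    case 200: return "OK";
    case 400: return "Bad Request";
    case 404: return "Not Found";
    case 500: return "Internal Server Error";
    default:  return "Unknown";
    }
}

/* Write "HTTP/1.0 <code> <description>\r\n" into buf.
 * Returns the length written, or -1 if buf is too small. */
int format_status_line(char *buf, size_t buf_sz, int status)
{
    int n = snprintf(buf, buf_sz, "HTTP/1.0 %d %s\r\n",
                     status, reason_phrase(status));
    if (n < 0 || (size_t)n >= buf_sz)
        return -1;
    return n;
}
```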
Date: Fri, 20 Sep 1998 08:17:58 GMT
Server: Apache/1.5.2
Last-modified: Mon, 17 Jun 1996 21:53:08 GMT
Content-type: text/html
Content-length: 2482
A blank line ends the header.
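The Date and Last-modified values above share one timestamp layout, which the standard C library can produce. The helper name `format_http_date` is an assumption made for this sketch; gmtime() and strftime() are standard C.

```c
#include <time.h>

/* Format a time_t as an HTTP date such as
 * "Mon, 17 Jun 1996 21:53:08 GMT". The time is converted to UTC
 * first, since HTTP dates are always given in GMT.
 * Returns the length written, or -1 on failure. */
int format_http_date(char *buf, size_t buf_sz, time_t when)
{
    struct tm *utc = gmtime(&when);
    size_t n;

    if (utc == NULL)
        return -1;
    n = strftime(buf, buf_sz, "%a, %d %b %Y %H:%M:%S GMT", utc);
    return n == 0 ? -1 : (int)n;
}
```

Passing time(NULL) gives a value for Date; passing a file's modification time gives a value for Last-modified.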
In HTTP 1.0, after the server has finished sending the requested data, it disconnects from the client and the transaction is over unless a Connection: Keep-Alive header is sent. In HTTP 1.1, however, the default is for the server to maintain the connection and allow the client to make additional requests. Since many documents embed other documents as inline images, frames, applets, etc., this saves the overhead of the client having to repeatedly connect to the same server just to draw a single page. Under HTTP 1.1, therefore, the transaction might cycle back to the beginning, until either the client or server explicitly closes the connection.
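The rule in the paragraph above reduces to a small decision function. This is a sketch under assumptions: `should_keep_alive` and `equals_ignore_case` are hypothetical names, and the Connection value is assumed to be a single token that has already been extracted from the request headers (NULL if the header was absent).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive string equality; header values may be sent in any
 * mix of upper and lower case. */
static int equals_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Return 1 if the connection should stay open after the response,
 * 0 if the server should close it. An explicit Connection header
 * wins; otherwise HTTP/1.1 defaults to open and HTTP/1.0 to closed. */
int should_keep_alive(const char *version, const char *connection)
{
    if (connection != NULL) {
        if (equals_ignore_case(connection, "close"))
            return 0;
        if (equals_ignore_case(connection, "keep-alive"))
            return 1;
    }
    return strcmp(version, "HTTP/1.1") == 0;
}
```

A simple server can ignore this entirely and always close the connection after each response.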
Another HTTP Session
Note: The numbering of the steps is completely meaningless. It was just used to show order and break up the different steps of an HTTP session into manageable units.
www.cs.byu.edu
with the url
http://www.cs.byu.edu:8080/
GET <path>
...http://students.cs.byu.edu/~richjack
GET /~richjack <probably with a bunch of junk after it>
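The mapping from a URL to a host, a port, and the path sent in the GET line can be sketched as follows. The struct `http_url`, the function `parse_http_url`, and the buffer sizes are hypothetical; the default port of 80 when none is given is standard for http URLs.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the parts of an http URL. */
struct http_url {
    char host[256];
    int  port;
    char path[1024];
};

/* Split "http://hostname:port/pathname" into its parts. The port is
 * optional and defaults to 80; a missing path becomes "/".
 * Returns 0 on success, -1 on a malformed or oversized URL. */
int parse_http_url(const char *url, struct http_url *out)
{
    const char *prefix = "http://";
    const char *host, *end;
    size_t host_len;

    if (strncmp(url, prefix, strlen(prefix)) != 0)
        return -1;
    host = url + strlen(prefix);
    host_len = strcspn(host, ":/");
    if (host_len == 0 || host_len >= sizeof out->host)
        return -1;
    memcpy(out->host, host, host_len);
    out->host[host_len] = '\0';

    end = host + host_len;
    out->port = 80;
    if (*end == ':') {
        char *after;
        long port = strtol(end + 1, &after, 10);
        if (after == end + 1 || port < 1 || port > 65535)
            return -1;
        out->port = (int)port;
        end = after;
    }
    if (*end == '\0') {
        strcpy(out->path, "/");
        return 0;
    }
    if (*end != '/' || strlen(end) >= sizeof out->path)
        return -1;
    strcpy(out->path, end);
    return 0;
}
```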
The HTTP version and content-types that the browser
will accept are passed in the request. However, a simple HTTP server can
ignore this information and still function properly. A more advanced server
would look at this information and handle it. Further, the browser is not
required to send this information. All browsers are expected, by the protocol,
to understand plain text and html content-types.
HTTP/1.0 <STATUS CODE><CRLF>
(Following lines are optional, but recommended)
MIME-Version: 1.0<CRLF>
Content-Type: <CONTENT TYPE><CRLF>
Content-Length: <file-length in bytes><CRLF>
<CRLF>
<file>

Content types:
Text: text/plain
Html: text/html
GIF: image/gif
JPEG: image/jpeg
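One simple way to choose among these content types is to look at the file extension. The sketch below is one possible approach, not the only one: `content_type_for` is a hypothetical helper, and treating unknown extensions as plain text is an assumption made here.

```c
#include <string.h>

/* Pick a Content-Type value from the extension of the requested path.
 * Anything unrecognized is served as plain text. */
const char *content_type_for(const char *path)
{
    const char *dot = strrchr(path, '.');

    if (dot == NULL)
        return "text/plain";
    if (strcmp(dot, ".html") == 0 || strcmp(dot, ".htm") == 0)
        return "text/html";
    if (strcmp(dot, ".gif") == 0)
        return "image/gif";
    if (strcmp(dot, ".jpg") == 0 || strcmp(dot, ".jpeg") == 0)
        return "image/jpeg";
    return "text/plain";
}
```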
CGIs:
GET <path>/<cgi-script>?query=<query text>+<param>|+<param>|+...
The query text goes into the environment variable QUERY_STRING.
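Before the environment variable can be set, the server has to separate the script path from the query text at the '?'. A minimal sketch, assuming the request target has already been extracted from the GET line; `split_cgi_target` is a hypothetical helper name.

```c
#include <string.h>

/* Split "<path>/<cgi-script>?<query>" at the first '?'. If there is
 * no '?', the query comes back as an empty string.
 * Returns 0 on success, -1 if either part does not fit. */
int split_cgi_target(const char *target,
                     char *script, size_t script_sz,
                     char *query, size_t query_sz)
{
    size_t script_len = strcspn(target, "?");
    const char *q = (target[script_len] == '?')
                        ? target + script_len + 1
                        : "";

    if (script_len >= script_sz || strlen(q) >= query_sz)
        return -1;
    memcpy(script, target, script_len);
    script[script_len] = '\0';
    strcpy(query, q);
    return 0;
}
```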
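#include <stdio.h>

/* Run a shell command with popen() and copy its output to stdout. */
int main(void)
{
    char *cmd = "/usr/bin/ls *.c";
    char buf[BUFSIZ];
    FILE *ptr;

    if ((ptr = popen(cmd, "r")) != NULL) {
        while (fgets(buf, BUFSIZ, ptr) != NULL)
            (void) printf("%s", buf);
        (void) pclose(ptr);   /* wait for the command and free the stream */
    }
    return 0;
}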
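You might also be interested in this code that sets up environment variables:
/* Create the environment variables. putenv() keeps the pointer it is
   given, so query_dest and content must be large enough and must stay
   valid for as long as the CGI program needs them. */
sprintf(query_dest, "QUERY_STRING=%s", query);
sprintf(content, "CONTENT_LENGTH=%s", length);
/* Shell command that runs the script and redirects its output. */
sprintf(shell_cmd, "%s > %s", filename, handle);
putenv("REQUEST_METHOD=GET");
putenv(content);
putenv(query_dest);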
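An alternative to the putenv() calls above is POSIX setenv(), which copies its arguments so the caller's buffers do not have to outlive the call. This is a swapped-in technique rather than the code above, and `set_cgi_environment` is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L  /* makes setenv() visible */
#include <stdio.h>
#include <stdlib.h>

/* Set the three CGI variables used above for a GET request.
 * setenv() copies the name and value, so length_text can be a local.
 * Returns 0 on success, -1 if any variable could not be set. */
int set_cgi_environment(const char *query, long content_length)
{
    char length_text[32];

    snprintf(length_text, sizeof length_text, "%ld", content_length);
    if (setenv("REQUEST_METHOD", "GET", 1) != 0)
        return -1;
    if (setenv("QUERY_STRING", query, 1) != 0)
        return -1;
    if (setenv("CONTENT_LENGTH", length_text, 1) != 0)
        return -1;
    return 0;
}
```

Any program started afterwards with popen() or a similar call inherits these variables.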
The following cgi helps may be useful in writing the CGI part of your code.
References