From 7d34fb1eaf96f044519b8d37eedf3b4c68c1ff25 Mon Sep 17 00:00:00 2001 From: gaoluo888 Date: Sun, 13 Jun 2021 12:45:12 +0800 Subject: [PATCH] update @21/6/13 imported from https://sqlite.org/althttpd/info/tip --- Makefile | 5 + althttpd.c | 1483 +++++++++++++++++++++++++++++++++++++++------------ althttpd.md | 360 +++++++++++++ 3 files changed, 1513 insertions(+), 335 deletions(-) create mode 100644 Makefile create mode 100644 althttpd.md diff --git a/Makefile b/Makefile new file mode 100644 index 0000000..da4ccc1 --- /dev/null +++ b/Makefile @@ -0,0 +1,5 @@ +althttpd: althttpd.c + cc -Os -Wall -Wextra -o althttpd althttpd.c + +clean: + rm -f althttpd diff --git a/althttpd.c b/althttpd.c index 025bc3c..d78d083 100644 --- a/althttpd.c +++ b/althttpd.c @@ -1,14 +1,28 @@ /* -** A small, simple HTTP server. +** 2001-09-15 +** +** The author disclaims copyright to this source code. In place of +** a legal notice, here is a blessing: +** +** May you do good and not evil. +** May you find forgiveness for yourself and forgive others. +** May you share freely, never taking more than you give. +** +************************************************************************* +** +** This source code file implements a small, simple, stand-alone HTTP +** server. ** ** Features: ** -** * Launched from inetd +** * Launched from inetd/xinetd/stunnel4, or as a stand-alone server ** * One process per request -** * Deliver static content or run CGI +** * Deliver static content or run CGI or SCGI ** * Virtual sites based on the "Host:" property of the HTTP header -** * Very small code base (1 file) to facilitate security auditing -** * Simple setup - no configuration files to mess with. +** * Runs in a chroot jail +** * Unified log file in a CSV format +** * Small code base (this 1 file) to facilitate security auditing +** * Simple setup - no configuration files to misconfigure ** ** This file implements a small and simple but secure and effective web ** server. There are no frills. Anything that could be reasonably @@ -20,29 +34,191 @@ ** ** httpd -logfile logfile -root /home/www -user nobody ** -** It will automatically chroot to /home/www and become user nobody. +** It will automatically chroot to /home/www and become user "nobody". ** The logfile name should be relative to the chroot jail. ** -** (2) Directories of the form "*.website" (ex: www_hwaci_com.website) -** contain content. The directory is chosen based on HOST. If no -** HOST or the host directory is not found, "default.website" is used. +** (2) Directories of the form "*.website" (ex: www_sqlite_org.website) +** contain content. The directory is chosen based on the HTTP_HOST +** request header. If there is no HTTP_HOST header or if the +** corresponding host directory does not exist, then the +** "default.website" is used. If the HTTP_HOST header contains any +** charaters other than [a-zA-Z0-9_.,*~/] then a 403 error is +** generated. ** -** (3) Any file or directory whose name begins with "." or "-" is ignored. +** (3) Any file or directory whose name begins with "." or "-" is ignored, +** except if the URL begins with "/.well-known/" then initial "." and +** "-" characters are allowed, but not initial "..". The exception is +** for RFC-5785 to allow letsencrypt or certbot to generate a TLS cert +** using webroot. ** -** (4) Characters other than a-zA-Z0-9_.,*~/ in the filename are translated -** into _. This is a defense against cross-site scripting attacks and -** other mischief. +** (4) Characters other than [0-9a-zA-Z,-./:_~] and any %HH characters +** escapes in the filename are all translated into "_". This is +** a defense against cross-site scripting attacks and other mischief. ** -** (5) Executable files are run as CGI. All other files are delivered -** as is. +** (5) Executable files are run as CGI. Files whose name ends with ".scgi" +** trigger and SCGI request (see item 10 below). All other files +** are delivered as is. ** ** (6) For SSL support use stunnel and add the -https 1 option on the ** httpd command-line. ** -** (7) If a file named "-auth" exists in the same directory as file to +** (7) If a file named "-auth" exists in the same directory as the file to ** be run as CGI or to be delivered, then it contains information ** for HTTP Basic authorization. See file format details below. ** +** (8) To run as a stand-alone server, simply add the "-port N" command-line +** option to define which TCP port to listen on. +** +** (9) For static content, the mimetype is determined by the file suffix +** using a table built into the source code below. If you have +** unusual content files, you might need to extend this table. +** +** (10) Content files that end with ".scgi" and that contain text of the +** form "SCGI hostname port" will format an SCGI request and send it +** to hostname:port, the relay back the reply. Error behavior is +** determined by subsequent lines of the .scgi file. See SCGI below +** for details. +** +** Command-line Options: +** +** --root DIR Defines the directory that contains the various +** $HOST.website subdirectories, each containing web content +** for a single virtual host. If launched as root and if +** "--user USER" also appears on the command-line and if +** "--jail 0" is omitted, then the process runs in a chroot +** jail rooted at this directory and under the userid USER. +** This option is required for xinetd launch but defaults +** to "." for a stand-alone web server. +** +** --port N Run in standalone mode listening on TCP port N +** +** --user USER Define the user under which the process should run if +** originally launched as root. This process will refuse to +** run as root (for security). If this option is omitted and +** the process is launched as root, it will abort without +** processing any HTTP requests. +** +** --logfile FILE Append a single-line, CSV-format, log file entry to FILE +** for each HTTP request. FILE should be a full pathname. +** The FILE name is interpreted inside the chroot jail. The +** FILE name is expanded using strftime() if it contains +** at least one '%' and is not too long. +** +** --https Indicates that input is coming over SSL and is being +** decoded upstream, perhaps by stunnel. (This program +** only understands plaintext.) +** +** --family ipv4 Only accept input from IPV4 or IPV6, respectively. +** --family ipv6 These options are only meaningful if althttpd is run +** as a stand-alone server. +** +** --jail BOOLEAN Indicates whether or not to form a chroot jail if +** initially run as root. The default is true, so the only +** useful variant of this option is "--jail 0" which prevents +** the formation of the chroot jail. +** +** --max-age SEC The value for "Cache-Control: max-age=%d". Defaults to +** 120 seconds. +** +** --max-cpu SEC Maximum number of seconds of CPU time allowed per +** HTTP connection. Default 30. 0 means no limit. +** +** --debug Disables input timeouts. This is useful for debugging +** when inputs is being typed in manually. +** +** Command-line options can take either one or two initial "-" characters. +** So "--debug" and "-debug" mean the same thing, for example. +** +** +** Security Features: +** +** (1) This program automatically puts itself inside a chroot jail if +** it can and if not specifically prohibited by the "--jail 0" +** command-line option. The root of the jail is the directory that +** contains the various $HOST.website content subdirectories. +** +** (2) No input is read while this process has root privileges. Root +** privileges are dropped prior to reading any input (but after entering +** the chroot jail, of course). If root privileges cannot be dropped +** (for example because the --user command-line option was omitted or +** because the user specified by the --user option does not exist), +** then the process aborts with an error prior to reading any input. +** +** (3) The length of an HTTP request is limited to MAX_CONTENT_LENGTH bytes +** (default: 250 million). Any HTTP request longer than this fails +** with an error. +** +** (4) There are hard-coded time-outs on each HTTP request. If this process +** waits longer than the timeout for the complete request, or for CGI +** to finish running, then this process aborts. (The timeout feature +** can be disabled using the --debug command-line option.) +** +** (5) If the HTTP_HOST request header contains characters other than +** [0-9a-zA-Z,-./:_~] then the entire request is rejected. +** +** (6) Any characters in the URI pathname other than [0-9a-zA-Z,-./:_~] +** are converted into "_". This applies to the pathname only, not +** to the query parameters or fragment. +** +** (7) If the first character of any URI pathname component is "." or "-" +** then a 404 Not Found reply is generated. This prevents attacks +** such as including ".." or "." directory elements in the pathname +** and allows placing files and directories in the content subdirectory +** that are invisible to all HTTP requests, by making the first +** character of the file or subdirectory name "-" or ".". +** +** (8) The request URI must begin with "/" or else a 404 error is generated. +** +** (9) This program never sets the value of an environment variable to a +** string that begins with "() {". +** +** Security Auditing: +** +** This webserver mostly only serves static content. Any security risk will +** come from CGI and SCGI. To check an installation for security, then, it +** makes sense to focus on the CGI and SCGI scripts. +** +** To local all CGI files: +** +** find *.website -executable -type f -print +** OR: find *.website -perm +0111 -type f -print +** +** The first form of the "find" command is preferred, but is only supported +** by GNU find. On a Mac, you'll have to use the second form. +** +** To find all SCGI files: +** +** find *.website -name '*.scgi' -type f -print +** +** If any file is a security concern, it can be disabled on a live +** installation by turning off read permissions: +** +** chmod 0000 file-of-concern +** +** SCGI Specification Files: +** +** Content files (files without the execute bit set) that end with ".scgi" +** specify a connection to an SCGI server. The format of the .scgi file +** follows this template: +** +** SCGI hostname port +** fallback: fallback-filename +** relight: relight-command +** +** The first line specifies the location and TCP/IP port of the SCGI server +** that will handle the request. Subsequent lines determine what to do if +** the SCGI server cannot be contacted. If the "relight:" line is present, +** then the relight-command is run using system() and the connection is +** retried after a 1-second delay. Use "&" at the end of the relight-command +** to run it in the background. Make sure the relight-command does not +** send generate output, or that output will become part of the SCGI reply. +** Add a ">/dev/null" suffix (before the "&") to the relight-command if +** necessary to suppress output. If there is no relight-command, or if the +** relight is attempted but the SCGI server still cannot be contacted, then +** the content of the fallback-filename file is returned as a substitute for +** the SCGI request. The mimetype is determined by the suffix on the +** fallback-filename. The fallback-filename would typically be an error +** message indicating that the service is temporarily unavailable. ** ** Basic Authorization: ** @@ -60,6 +236,9 @@ ** ** There can be multiple "user" lines. If no "user" line matches, the ** request fails with a 401 error. +** +** Because of security rule (7), there is no way for the content of the "-auth" +** file to leak out via HTTP request. */ #include #include @@ -72,6 +251,7 @@ #include #include #include +#include #include #include #include @@ -82,6 +262,7 @@ #include #include #include +#include #ifdef linux #include #endif @@ -91,10 +272,13 @@ ** Configure the server by setting the following macros and recompiling. */ #ifndef DEFAULT_PORT -#define DEFAULT_PORT "80" +#define DEFAULT_PORT "80" /* Default TCP port for HTTP */ #endif #ifndef MAX_CONTENT_LENGTH -#define MAX_CONTENT_LENGTH 20000000 +#define MAX_CONTENT_LENGTH 250000000 /* Max length of HTTP request content */ +#endif +#ifndef MAX_CPU +#define MAX_CPU 30 /* Max CPU cycles in seconds */ #endif /* @@ -125,12 +309,15 @@ static char *zRealPort = 0; /* The real TCP port when running as daemon */ static char *zRemoteAddr = 0; /* IP address of the request */ static char *zReferer = 0; /* Name of the page that refered to us */ static char *zAccept = 0; /* What formats will be accepted */ +static char *zAcceptEncoding =0; /* gzip or default */ static char *zContentLength = 0; /* Content length reported in the header */ static char *zContentType = 0; /* Content type reported in the header */ static char *zQuerySuffix = 0; /* The part of the URL after the first ? */ static char *zAuthType = 0; /* Authorization type (basic or digest) */ static char *zAuthArg = 0; /* Authorization values */ static char *zRemoteUser = 0; /* REMOTE_USER set by authorization module */ +static char *zIfNoneMatch= 0; /* The If-None-Match header value */ +static char *zIfModifiedSince=0; /* The If-Modified-Since header value */ static int nIn = 0; /* Number of bytes of input */ static int nOut = 0; /* Number of bytes of output */ static char zReplyStatus[4]; /* Reply status code */ @@ -149,13 +336,57 @@ static int ipv6Only = 0; /* Use IPv6 only */ static int ipv4Only = 0; /* Use IPv4 only */ static struct rusage priorSelf; /* Previously report SELF time */ static struct rusage priorChild; /* Previously report CHILD time */ +static int mxAge = 120; /* Cache-control max-age */ +static char *default_path = "/bin:/usr/bin"; /* Default PATH variable */ +static char *zScgi = 0; /* Value of the SCGI env variable */ +static int rangeStart = 0; /* Start of a Range: request */ +static int rangeEnd = 0; /* End of a Range: request */ +static int maxCpu = MAX_CPU; /* Maximum CPU time per process */ + +/* +** Mapping between CGI variable names and values stored in +** global variables. +*/ +static struct { + char *zEnvName; + char **pzEnvValue; +} cgienv[] = { + { "CONTENT_LENGTH", &zContentLength }, /* Must be first for SCGI */ + { "AUTH_TYPE", &zAuthType }, + { "AUTH_CONTENT", &zAuthArg }, + { "CONTENT_TYPE", &zContentType }, + { "DOCUMENT_ROOT", &zHome }, + { "HTTP_ACCEPT", &zAccept }, + { "HTTP_ACCEPT_ENCODING", &zAcceptEncoding }, + { "HTTP_COOKIE", &zCookie }, + { "HTTP_HOST", &zHttpHost }, + { "HTTP_IF_MODIFIED_SINCE", &zIfModifiedSince }, + { "HTTP_IF_NONE_MATCH", &zIfNoneMatch }, + { "HTTP_REFERER", &zReferer }, + { "HTTP_USER_AGENT", &zAgent }, + { "PATH", &default_path }, + { "PATH_INFO", &zPathInfo }, + { "QUERY_STRING", &zQueryString }, + { "REMOTE_ADDR", &zRemoteAddr }, + { "REQUEST_METHOD", &zMethod }, + { "REQUEST_URI", &zScript }, + { "REMOTE_USER", &zRemoteUser }, + { "SCGI", &zScgi }, + { "SCRIPT_DIRECTORY", &zDir }, + { "SCRIPT_FILENAME", &zFile }, + { "SCRIPT_NAME", &zRealScript }, + { "SERVER_NAME", &zServerName }, + { "SERVER_PORT", &zServerPort }, + { "SERVER_PROTOCOL", &zProtocol }, +}; + /* ** Double any double-quote characters in a string. */ static char *Escape(char *z){ - int i, j; - int n; + size_t i, j; + size_t n; char c; char *zOut; for(i=0; (c=z[i])!=0 && c!='"'; i++){} @@ -173,17 +404,17 @@ static char *Escape(char *z){ } /* -** Convert a struct timeval into an integer number of milliseconds +** Convert a struct timeval into an integer number of microseconds */ -static int tvms(struct timeval *p){ - return (int)(p->tv_sec*1000000 + p->tv_usec); +static long long int tvms(struct timeval *p){ + return ((long long int)p->tv_sec)*1000000 + (long long int)p->tv_usec; } /* ** Make an entry in the log file. If the HTTP connection should be ** closed, then terminate this process. Otherwise return. */ -static void MakeLogEntry(int a){ +static void MakeLogEntry(int exitCode, int lineNum){ FILE *log; if( zTmpNam ){ unlink(zTmpNam); @@ -193,8 +424,11 @@ static void MakeLogEntry(int a){ struct tm *pTm; struct rusage self, children; int waitStatus; - char zDate[200]; char *zRM = zRemoteUser ? zRemoteUser : ""; + char *zFilename; + size_t sz; + char zDate[200]; + char zExpLogFile[500]; if( zScript==0 ) zScript = ""; if( zRealScript==0 ) zRealScript = ""; @@ -205,12 +439,18 @@ static void MakeLogEntry(int a){ gettimeofday(&now, 0); pTm = localtime(&now.tv_sec); strftime(zDate, sizeof(zDate), "%Y-%m-%d %H:%M:%S", pTm); + sz = strftime(zExpLogFile, sizeof(zExpLogFile), zLogFile, pTm); + if( sz>0 && sz 1900 ) year -= 1900; + for(mon=0; mon<12; mon++){ + if( !strncmp( azMonths[mon], zMonth, 3 )){ + int nDay; + int isLeapYr; + static int priorDays[] = + { 0, 31, 59, 90,120,151,181,212,243,273,304,334 }; + isLeapYr = year%4==0 && (year%100!=0 || (year+300)%400==0); + yday = priorDays[mon] + mday - 1; + if( isLeapYr && mon>1 ) yday++; + nDay = (year-70)*365 + (year-69)/4 - year/100 + (year+300)/400 + yday; + return ((time_t)(nDay*24 + hour)*60 + min)*60 + sec; + } + } + } + return 0; +} + +/* +** Test procedure for ParseRfc822Date +*/ +void TestParseRfc822Date(void){ + time_t t1, t2; + for(t1=0; t1<0x7fffffff; t1 += 127){ + t2 = ParseRfc822Date(Rfc822Date(t1)); + assert( t1==t2 ); + } } /* @@ -390,28 +699,28 @@ static void StartResponse(const char *zResultCode){ static void NotFound(int lineno){ StartResponse("404 Not Found"); nOut += printf( - "Content-type: text/html\r\n" + "Content-type: text/html; charset=utf-8\r\n" "\r\n" "Not Found \n" "

Document Not Found

\n" "The document %s is not available on this server\n" "\n", lineno, zScript); - MakeLogEntry(0); + MakeLogEntry(0, lineno); exit(0); } /* ** Tell the client that they are not welcomed here. */ -static void Forbidden(void){ +static void Forbidden(int lineno){ StartResponse("403 Forbidden"); nOut += printf( - "Content-type: text/plain\r\n" + "Content-type: text/plain; charset=utf-8\r\n" "\r\n" "Access denied\n" ); closeConnection = 1; - MakeLogEntry(0); + MakeLogEntry(0, lineno); exit(0); } @@ -423,13 +732,13 @@ static void NotAuthorized(const char *zRealm){ StartResponse("401 Authorization Required"); nOut += printf( "WWW-Authenticate: Basic realm=\"%s\"\r\n" - "Content-type: text/html\r\n" + "Content-type: text/html; charset=utf-8\r\n" "\r\n" "Not Authorized \n" "

401 Not Authorized

\n" "A login and password are required for this document\n" "\n", zRealm); - MakeLogEntry(0); + MakeLogEntry(0, 110); /* LOG: Not authorized */ } /* @@ -438,25 +747,30 @@ static void NotAuthorized(const char *zRealm){ static void CgiError(void){ StartResponse("500 Error"); nOut += printf( - "Content-type: text/html\r\n" + "Content-type: text/html; charset=utf-8\r\n" "\r\n" "CGI Program Error \n" "

CGI Program Error

\n" "The CGI program %s generated an error\n" "\n", zScript); - MakeLogEntry(0); + MakeLogEntry(0, 120); /* LOG: CGI Error */ exit(0); } /* -** This is called if we timeout. +** This is called if we timeout or catch some other kind of signal. +** Log an error code which is 900+iSig and then quit. */ -static void Timeout(int NotUsed){ - (void)NotUsed; +static void Timeout(int iSig){ if( !debugFlag ){ if( zScript && zScript[0] ){ - strcpy(zReplyStatus, "999"); - MakeLogEntry(0); + char zBuf[10]; + zBuf[0] = '9'; + zBuf[1] = '0' + (iSig/10)%10; + zBuf[2] = '0' + iSig%10; + zBuf[3] = 0; + strcpy(zReplyStatus, zBuf); + MakeLogEntry(0, 130); /* LOG: Timeout */ } exit(0); } @@ -468,11 +782,11 @@ static void Timeout(int NotUsed){ static void CgiScriptWritable(void){ StartResponse("500 CGI Configuration Error"); nOut += printf( - "Content-type: text/plain\r\n" + "Content-type: text/plain; charset=utf-8\r\n" "\r\n" "The CGI program %s is writable by users other than its owner.\n", zRealScript); - MakeLogEntry(0); + MakeLogEntry(0, 140); /* LOG: CGI script is writable */ exit(0); } @@ -484,13 +798,15 @@ static void Malfunction(int linenum, const char *zFormat, ...){ va_start(ap, zFormat); StartResponse("500 Server Malfunction"); nOut += printf( - "Content-type: text/plain\r\n" + "Content-type: text/plain; charset=utf-8\r\n" "\r\n" "Web server malfunctioned; error number %d\n\n", linenum); if( zFormat ){ nOut += vprintf(zFormat, ap); + printf("\n"); + nOut++; } - MakeLogEntry(0); + MakeLogEntry(0, linenum); exit(0); } @@ -499,8 +815,18 @@ static void Malfunction(int linenum, const char *zFormat, ...){ ** name not contain scheme or network location or the query string. ** It will be just the path. */ -static void Redirect(const char *zPath, int finish){ - StartResponse("302 Temporary Redirect"); +static void Redirect(const char *zPath, int iStatus, int finish, int lineno){ + switch( iStatus ){ + case 301: + StartResponse("301 Permanent Redirect"); + break; + case 308: + StartResponse("308 Permanent Redirect"); + break; + default: + StartResponse("302 Temporary Redirect"); + break; + } if( zServerPort==0 || zServerPort[0]==0 || strcmp(zServerPort,"80")==0 ){ nOut += printf("Location: %s://%s%s%s\r\n", zHttp, zServerName, zPath, zQuerySuffix); @@ -511,7 +837,7 @@ static void Redirect(const char *zPath, int finish){ if( finish ){ nOut += printf("Content-length: 0\r\n"); nOut += printf("\r\n"); - MakeLogEntry(0); + MakeLogEntry(0, lineno); } fflush(stdout); } @@ -576,6 +902,8 @@ void Decode64(char *z64){ ** authorization credentials are provided, and if so sets the ** REMOTE_USER to NAME. ** * "realm TEXT" sets the realm to TEXT. +** * "anyone" bypasses authentication and allows anyone to see the +** files. Useful in combination with "http-redirect" */ static int CheckBasicAuthorization(const char *zAuthFile){ FILE *in; @@ -584,9 +912,9 @@ static int CheckBasicAuthorization(const char *zAuthFile){ char *zName; char zLine[2000]; - in = fopen(zAuthFile, "r"); + in = fopen(zAuthFile, "rb"); if( in==0 ){ - NotFound(__LINE__); + NotFound(150); /* LOG: Cannot open -auth file */ return 0; } if( zAuthArg ) Decode64(zAuthArg); @@ -612,20 +940,22 @@ static int CheckBasicAuthorization(const char *zAuthFile){ } }else if( strcmp(zFieldName,"https-only")==0 ){ if( !useHttps ){ - NotFound(__LINE__); + NotFound(160); /* LOG: http request on https-only page */ fclose(in); return 0; } }else if( strcmp(zFieldName,"http-redirect")==0 ){ if( !useHttps ){ zHttp = "https"; - sprintf(zLine, "%s%s", zScript, zPathInfo); - Redirect(zLine, 1); + Redirect(zScript, 301, 1, 170); /* LOG: -auth redirect */ fclose(in); return 0; } + }else if( strcmp(zFieldName,"anyone")==0 ){ + fclose(in); + return 1; }else{ - NotFound(__LINE__); + NotFound(180); /* LOG: malformed entry in -auth file */ fclose(in); return 0; } @@ -802,6 +1132,7 @@ const char *GetMimeType(const char *zName, int nName){ { "stp", 3, "application/STEP" }, { "sv4cpio", 7, "application/x-sv4cpio" }, { "sv4crc", 6, "application/x-sv4crc" }, + { "svg", 3, "image/svg+xml" }, { "swf", 3, "application/x-shockwave-flash" }, { "t", 1, "application/x-troff" }, { "tar", 3, "application/x-tar" }, @@ -824,6 +1155,7 @@ const char *GetMimeType(const char *zName, int nName){ { "viv", 3, "video/vnd.vivo" }, { "vivo", 4, "video/vnd.vivo" }, { "vrml", 4, "model/vrml" }, + { "vsix", 4, "application/vsix" }, { "wav", 3, "audio/x-wav" }, { "wax", 3, "audio/x-ms-wax" }, { "wiki", 4, "application/x-fossil-wiki" }, @@ -872,7 +1204,7 @@ const char *GetMimeType(const char *zName, int nName){ ** The following table contains 1 for all characters that are permitted in ** the part of the URL before the query parameters and fragment. ** -** Allowed characters: 0-9 a-z A-Z ,-./:_~ +** Allowed characters: 0-9a-zA-Z,-./:_~ ** ** Disallowed characters include: !"#$%&'()*+;<=>?[\]^{|} */ @@ -886,8 +1218,42 @@ static const char allowedInName[] = { /* 5x */ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, /* 6x */ 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, /* 7x */ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, +/* 8x */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, +/* 9x */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, +/* Ax */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, +/* Bx */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, +/* Cx */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, +/* Dx */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, +/* Ex */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, +/* Fx */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; +/* +** Remove all disallowed characters in the input string z[]. Convert any +** disallowed characters into "_". +** +** Not that the three character sequence "%XX" where X is any byte is +** converted into a single "_" character. +** +** Return the number of characters converted. An "%XX" -> "_" conversion +** counts as a single character. +*/ +static int sanitizeString(char *z){ + int nChange = 0; + while( *z ){ + if( !allowedInName[*(unsigned char*)z] ){ + if( *z=='%' && z[1]!=0 && z[2]!=0 ){ + int i; + for(i=3; (z[i-2] = z[i])!=0; i++){} + } + *z = '_'; + nChange++; + } + z++; + } + return nChange; +} + /* ** Count the number of "/" characters in a string. */ @@ -897,6 +1263,361 @@ static int countSlashes(const char *z){ return n; } +/* +** Transfer nXfer bytes from in to out, after first discarding +** nSkip bytes from in. Increment the nOut global variable +** according to the number of bytes transferred. +*/ +static void xferBytes(FILE *in, FILE *out, int nXfer, int nSkip){ + size_t n; + size_t got; + char zBuf[16384]; + while( nSkip>0 ){ + n = nSkip; + if( n>sizeof(zBuf) ) n = sizeof(zBuf); + got = fread(zBuf, 1, n, in); + if( got==0 ) break; + nSkip -= got; + } + while( nXfer>0 ){ + n = nXfer; + if( n>sizeof(zBuf) ) n = sizeof(zBuf); + got = fread(zBuf, 1, n, in); + if( got==0 ) break; + fwrite(zBuf, got, 1, out); + nOut += got; + nXfer -= got; + } +} + +/* +** Send the text of the file named by zFile as the reply. Use the +** suffix on the end of the zFile name to determine the mimetype. +** +** Return 1 to omit making a log entry for the reply. +*/ +static int SendFile( + const char *zFile, /* Name of the file to send */ + int lenFile, /* Length of the zFile name in bytes */ + struct stat *pStat /* Result of a stat() against zFile */ +){ + const char *zContentType; + time_t t; + FILE *in; + char zETag[100]; + + zContentType = GetMimeType(zFile, lenFile); + if( zTmpNam ) unlink(zTmpNam); + sprintf(zETag, "m%xs%x", (int)pStat->st_mtime, (int)pStat->st_size); + if( CompareEtags(zIfNoneMatch,zETag)==0 + || (zIfModifiedSince!=0 + && (t = ParseRfc822Date(zIfModifiedSince))>0 + && t>=pStat->st_mtime) + ){ + StartResponse("304 Not Modified"); + nOut += DateTag("Last-Modified", pStat->st_mtime); + nOut += printf("Cache-Control: max-age=%d\r\n", mxAge); + nOut += printf("ETag: \"%s\"\r\n", zETag); + nOut += printf("\r\n"); + fflush(stdout); + MakeLogEntry(0, 470); /* LOG: ETag Cache Hit */ + return 1; + } + in = fopen(zFile,"rb"); + if( in==0 ) NotFound(480); /* LOG: fopen() failed for static content */ + if( rangeEnd>0 && rangeStartst_size ){ + StartResponse("206 Partial Content"); + if( rangeEnd>=pStat->st_size ){ + rangeEnd = pStat->st_size-1; + } + nOut += printf("Content-Range: bytes %d-%d/%d\r\n", + rangeStart, rangeEnd, (int)pStat->st_size); + pStat->st_size = rangeEnd + 1 - rangeStart; + }else{ + StartResponse("200 OK"); + rangeStart = 0; + } + nOut += DateTag("Last-Modified", pStat->st_mtime); + nOut += printf("Cache-Control: max-age=%d\r\n", mxAge); + nOut += printf("ETag: \"%s\"\r\n", zETag); + nOut += printf("Content-type: %s; charset=utf-8\r\n",zContentType); + nOut += printf("Content-length: %d\r\n\r\n",(int)pStat->st_size); + fflush(stdout); + if( strcmp(zMethod,"HEAD")==0 ){ + MakeLogEntry(0, 2); /* LOG: Normal HEAD reply */ + fclose(in); + fflush(stdout); + return 1; + } + if( useTimeout ) alarm(30 + pStat->st_size/1000); +#ifdef linux + { + off_t offset = rangeStart; + nOut += sendfile(fileno(stdout), fileno(in), &offset, pStat->st_size); + } +#else + xferBytes(in, stdout, (int)pStat->st_size, rangeStart); +#endif + fclose(in); + return 0; +} + +/* +** A CGI or SCGI script has run and is sending its reply back across +** the channel "in". Process this reply into an appropriate HTTP reply. +** Close the "in" channel when done. +*/ +static void CgiHandleReply(FILE *in){ + int seenContentLength = 0; /* True if Content-length: header seen */ + int contentLength = 0; /* The content length */ + size_t nRes = 0; /* Bytes of payload */ + size_t nMalloc = 0; /* Bytes of space allocated to aRes */ + char *aRes = 0; /* Payload */ + int c; /* Next character from in */ + char *z; /* Pointer to something inside of zLine */ + int iStatus = 0; /* Reply status code */ + char zLine[1000]; /* One line of reply from the CGI script */ + + if( useTimeout ){ + /* Disable the timeout, so that we can implement Hanging-GET or + ** long-poll style CGIs. The RLIMIT_CPU will serve as a safety + ** to help prevent a run-away CGI */ + alarm(0); + } + while( fgets(zLine,sizeof(zLine),in) && !isspace((unsigned char)zLine[0]) ){ + if( strncasecmp(zLine,"Location:",9)==0 ){ + StartResponse("302 Redirect"); + RemoveNewline(zLine); + z = &zLine[10]; + while( isspace(*(unsigned char*)z) ){ z++; } + nOut += printf("Location: %s\r\n",z); + rangeEnd = 0; + }else if( strncasecmp(zLine,"Status:",7)==0 ){ + int i; + for(i=7; isspace((unsigned char)zLine[i]); i++){} + nOut += printf("%s %s", zProtocol, &zLine[i]); + strncpy(zReplyStatus, &zLine[i], 3); + zReplyStatus[3] = 0; + iStatus = atoi(zReplyStatus); + if( iStatus!=200 ) rangeEnd = 0; + statusSent = 1; + }else if( strncasecmp(zLine, "Content-length:", 15)==0 ){ + seenContentLength = 1; + contentLength = atoi(zLine+15); + }else{ + size_t nLine = strlen(zLine); + if( nRes+nLine >= nMalloc ){ + nMalloc += nMalloc + nLine*2; + aRes = realloc(aRes, nMalloc+1); + if( aRes==0 ){ + Malfunction(600, "Out of memory: %d bytes", nMalloc); + } + } + memcpy(aRes+nRes, zLine, nLine); + nRes += nLine; + } + } + + /* Copy everything else thru without change or analysis. + */ + if( rangeEnd>0 && seenContentLength && rangeStart=contentLength ){ + rangeEnd = contentLength-1; + } + nOut += printf("Content-Range: bytes %d-%d/%d\r\n", + rangeStart, rangeEnd, contentLength); + contentLength = rangeEnd + 1 - rangeStart; + }else{ + StartResponse("200 OK"); + } + if( nRes>0 ){ + aRes[nRes] = 0; + printf("%s", aRes); + nOut += nRes; + nRes = 0; + } + if( iStatus==304 ){ + nOut += printf("\r\n\r\n"); + }else if( seenContentLength ){ + nOut += printf("Content-length: %d\r\n\r\n", contentLength); + xferBytes(in, stdout, contentLength, rangeStart); + }else{ + while( (c = getc(in))!=EOF ){ + if( nRes>=nMalloc ){ + nMalloc = nMalloc*2 + 1000; + aRes = realloc(aRes, nMalloc+1); + if( aRes==0 ){ + Malfunction(610, "Out of memory: %d bytes", nMalloc); + } + } + aRes[nRes++] = c; + } + if( nRes ){ + aRes[nRes] = 0; + nOut += printf("Content-length: %d\r\n\r\n%s", (int)nRes, aRes); + }else{ + nOut += printf("Content-length: 0\r\n\r\n"); + } + } + free(aRes); + fclose(in); +} + +/* +** Send an SCGI request to a host identified by zFile and process the +** reply. +*/ +static void SendScgiRequest(const char *zFile, const char *zScript){ + FILE *in; + FILE *s; + char *z; + char *zHost; + char *zPort = 0; + char *zRelight = 0; + char *zFallback = 0; + int rc; + int iSocket = -1; + struct addrinfo hints; + struct addrinfo *ai = 0; + struct addrinfo *p; + char *zHdr; + size_t nHdr = 0; + size_t nHdrAlloc; + int i; + char zLine[1000]; + char zExtra[1000]; + in = fopen(zFile, "rb"); + if( in==0 ){ + Malfunction(700, "cannot open \"%s\"\n", zFile); + } + if( fgets(zLine, sizeof(zLine)-1, in)==0 ){ + Malfunction(701, "cannot read \"%s\"\n", zFile); + } + if( strncmp(zLine,"SCGI ",5)!=0 ){ + Malfunction(702, "misformatted SCGI spec \"%s\"\n", zFile); + } + z = zLine+5; + zHost = GetFirstElement(z,&z); + zPort = GetFirstElement(z,0); + if( zHost==0 || zHost[0]==0 || zPort==0 || zPort[0]==0 ){ + Malfunction(703, "misformatted SCGI spec \"%s\"\n", zFile); + } + while( fgets(zExtra, sizeof(zExtra)-1, in) ){ + char *zCmd = GetFirstElement(zExtra,&z); + if( zCmd==0 ) continue; + if( zCmd[0]=='#' ) continue; + RemoveNewline(z); + if( strcmp(zCmd, "relight:")==0 ){ + free(zRelight); + zRelight = StrDup(z); + continue; + } + if( strcmp(zCmd, "fallback:")==0 ){ + free(zFallback); + zFallback = StrDup(z); + continue; + } + Malfunction(704, "unrecognized line in SCGI spec: \"%s %s\"\n", + zCmd, z ? z : ""); + } + fclose(in); + memset(&hints, 0, sizeof(struct addrinfo)); + hints.ai_family = AF_UNSPEC; + hints.ai_socktype = SOCK_STREAM; + hints.ai_protocol = IPPROTO_TCP; + rc = getaddrinfo(zHost,zPort,&hints,&ai); + if( rc ){ + Malfunction(704, "cannot resolve SCGI server name %s:%s\n%s\n", + zHost, zPort, gai_strerror(rc)); + } + while(1){ /* Exit via break */ + for(p=ai; p; p=p->ai_next){ + iSocket = socket(p->ai_family, p->ai_socktype, p->ai_protocol); + if( iSocket<0 ) continue; + if( connect(iSocket,p->ai_addr,p->ai_addrlen)>=0 ) break; + close(iSocket); + } + if( iSocket<0 || (s = fdopen(iSocket,"r+"))==0 ){ + if( iSocket>=0 ) close(iSocket); + if( zRelight ){ + rc = system(zRelight); + if( rc ){ + Malfunction(721,"Relight failed with %d: \"%s\"\n", + rc, zRelight); + } + free(zRelight); + zRelight = 0; + sleep(1); + continue; + } + if( zFallback ){ + struct stat statbuf; + int rc; + memset(&statbuf, 0, sizeof(statbuf)); + if( chdir(zDir) ){ + char zBuf[1000]; + Malfunction(720, /* LOG: chdir() failed */ + "cannot chdir to [%s] from [%s]", + zDir, getcwd(zBuf,999)); + } + rc = stat(zFallback, &statbuf); + if( rc==0 && S_ISREG(statbuf.st_mode) && access(zFallback,R_OK)==0 ){ + closeConnection = 1; + rc = SendFile(zFallback, (int)strlen(zFallback), &statbuf); + free(zFallback); + exit(0); + }else{ + Malfunction(706, "bad fallback file: \"%s\"\n", zFallback); + } + } + Malfunction(707, "cannot open socket to SCGI server %s\n", + zScript); + } + break; + } + + nHdrAlloc = 0; + zHdr = 0; + if( zContentLength==0 ) zContentLength = "0"; + zScgi = "1"; + for(i=0; i<(int)(sizeof(cgienv)/sizeof(cgienv[0])); i++){ + int n1, n2; + if( cgienv[i].pzEnvValue[0]==0 ) continue; + n1 = (int)strlen(cgienv[i].zEnvName); + n2 = (int)strlen(*cgienv[i].pzEnvValue); + if( n1+n2+2+nHdr >= nHdrAlloc ){ + nHdrAlloc = nHdr + n1 + n2 + 1000; + zHdr = realloc(zHdr, nHdrAlloc); + if( zHdr==0 ){ + Malfunction(706, "out of memory"); + } + } + memcpy(zHdr+nHdr, cgienv[i].zEnvName, n1); + nHdr += n1; + zHdr[nHdr++] = 0; + memcpy(zHdr+nHdr, *cgienv[i].pzEnvValue, n2); + nHdr += n2; + zHdr[nHdr++] = 0; + } + zScgi = 0; + fprintf(s,"%d:",(int)nHdr); + fwrite(zHdr, 1, nHdr, s); + fprintf(s,","); + free(zHdr); + if( zMethod[0]=='P' + && atoi(zContentLength)>0 + && (in = fopen(zTmpNam,"r"))!=0 ){ + size_t n; + while( (n = fread(zLine,1,sizeof(zLine),in))>0 ){ + fwrite(zLine, 1, n, s); + } + fclose(in); + } + fflush(s); + CgiHandleReply(s); +} + /* ** This routine processes a single HTTP request on standard input and ** sends the reply to standard output. If the argument is 1 it means @@ -911,17 +1632,21 @@ static int countSlashes(const char *z){ ** HTTP request may appear on the wire. */ void ProcessOneRequest(int forceClose){ - int i, c; + int i, j, j0; char *z; /* Used to parse up a string */ struct stat statbuf; /* Information about the file to be retrieved */ FILE *in; /* For reading from CGI scripts */ +#ifdef LOG_HEADER + FILE *hdrLog = 0; /* Log file for complete header content */ +#endif char zLine[1000]; /* A buffer for input lines or forming names */ /* Change directories to the root of the HTTP filesystem */ if( chdir(zRoot[0] ? zRoot : "/")!=0 ){ char zBuf[1000]; - Malfunction(__LINE__, "cannot chdir to [%s] from [%s]", + Malfunction(190, /* LOG: chdir() failed */ + "cannot chdir to [%s] from [%s]", zRoot, getcwd(zBuf,999)); } nRequest++; @@ -930,6 +1655,9 @@ void ProcessOneRequest(int forceClose){ ** We must receive a complete header within 15 seconds */ signal(SIGALRM, Timeout); + signal(SIGSEGV, Timeout); + signal(SIGPIPE, Timeout); + signal(SIGXCPU, Timeout); if( useTimeout ) alarm(15); /* Get the first line of the request and parse out the @@ -938,22 +1666,29 @@ void ProcessOneRequest(int forceClose){ if( fgets(zLine,sizeof(zLine),stdin)==0 ){ exit(0); } + gettimeofday(&beginTime, 0); omitLog = 0; nIn += strlen(zLine); + + /* Parse the first line of the HTTP request */ zMethod = StrDup(GetFirstElement(zLine,&z)); zRealScript = zScript = StrDup(GetFirstElement(z,&z)); zProtocol = StrDup(GetFirstElement(z,&z)); if( zProtocol==0 || strncmp(zProtocol,"HTTP/",5)!=0 || strlen(zProtocol)!=8 ){ StartResponse("400 Bad Request"); nOut += printf( - "Content-type: text/plain\r\n" + "Content-type: text/plain; charset=utf-8\r\n" "\r\n" "This server does not understand the requested protocol\n" ); - MakeLogEntry(0); + MakeLogEntry(0, 200); /* LOG: bad protocol in HTTP header */ exit(0); } - if( zScript[0]==0 ) NotFound(__LINE__); + if( zScript[0]!='/' ) NotFound(210); /* LOG: Empty request URI */ + while( zScript[1]=='/' ){ + zScript++; + zRealScript++; + } if( forceClose ){ closeConnection = 1; }else if( zProtocol[5]<'1' || zProtocol[7]<'1' ){ @@ -967,24 +1702,48 @@ void ProcessOneRequest(int forceClose){ && strcmp(zMethod,"HEAD")!=0 ){ StartResponse("501 Not Implemented"); nOut += printf( - "Content-type: text/plain\r\n" + "Content-type: text/plain; charset=utf-8\r\n" "\r\n" "The %s method is not implemented on this server.\n", zMethod); - MakeLogEntry(0); + MakeLogEntry(0, 220); /* LOG: Unknown request method */ exit(0); } + /* If there is a log file (if zLogFile!=0) and if the pathname in + ** the first line of the http request contains the magic string + ** "FullHeaderLog" then write the complete header text into the + ** file %s(zLogFile)-hdr. Overwrite the file. This is for protocol + ** debugging only and is only enabled if althttpd is compiled with + ** the -DLOG_HEADER=1 option. + */ +#ifdef LOG_HEADER + if( zLogFile + && strstr(zScript,"FullHeaderLog")!=0 + && strlen(zLogFile)=0 && x2>=x1 ){ + rangeStart = x1; + rangeEnd = x2; + }else if( n==1 && x1>0 ){ + rangeStart = x1; + rangeEnd = 0x7fffffff; + } } } +#ifdef LOG_HEADER + if( hdrLog ) fclose(hdrLog); +#endif - /* Disallow referring from certain clients */ + /* Disallow requests from certain clients */ if( zAgent ){ - if( strstr(zAgent, "Windows_9")!=0 - || strstr(zAgent, "Download_Master")!=0 - || strstr(zAgent, "Ezooms/")!=0 - /*|| strstr(zAgent, "bingbot")!=0*/ - || strstr(zAgent, "AhrefsBot")!=0 - ){ - Forbidden(); + const char *azDisallow[] = { + "Windows 9", + "Download Master", + "Ezooms/", + "HTTrace", + "AhrefsBot", + "MicroMessenger", + "OPPO A33 Build", + "SemrushBot", + "MegaIndex.ru", + "MJ12bot", + "Chrome/0.A.B.C", + "Neevabot/", + "BLEXBot/", + }; + size_t ii; + for(ii=0; iiMAX_CONTENT_LENGTH ){ StartResponse("500 Request too large"); nOut += printf( - "Content-type: text/plain\r\n" + "Content-type: text/plain; charset=utf-8\r\n" "\r\n" "Too much POST data\n" ); - MakeLogEntry(0); + MakeLogEntry(0, 270); /* LOG: Request too large */ exit(0); } + rangeEnd = 0; sprintf(zTmpNamBuf, "/tmp/-post-data-XXXXXX"); zTmpNam = zTmpNamBuf; - mkstemp(zTmpNam); - out = fopen(zTmpNam,"w"); + if( mkstemp(zTmpNam)<0 ){ + Malfunction(280, /* LOG: mkstemp() failed */ + "Cannot create a temp file in which to store POST data"); + } + out = fopen(zTmpNam,"wb"); if( out==0 ){ StartResponse("500 Cannot create /tmp file"); nOut += printf( - "Content-type: text/plain\r\n" + "Content-type: text/plain; charset=utf-8\r\n" "\r\n" "Could not open \"%s\" for writing\n", zTmpNam ); - MakeLogEntry(0); + MakeLogEntry(0, 290); /* LOG: cannot create temp file for POST */ exit(0); } - zBuf = SafeMalloc( len ); + zBuf = SafeMalloc( len+1 ); if( useTimeout ) alarm(15 + len/2000); n = fread(zBuf,1,len,stdin); nIn += n; @@ -1130,45 +1942,55 @@ void ProcessOneRequest(int forceClose){ ** ** This is a defense against various attacks, XSS attacks in particular. */ - for(z=zScript; *z; z++){ - unsigned char c = *(unsigned char*)z; - if( (c&0x80)!=0 || !allowedInName[c] ){ - *z = '_'; - if( c=='%' && z[1]!=0 && z[2]!=0 ){ - for(i=3; (z[i-2] = z[i])!=0; i++){} - } - } - } + sanitizeString(zScript); - /* Don't allow "/." or "/-" to to occur anywhere in the entity name. + /* Do not allow "/." or "/-" to to occur anywhere in the entity name. ** This prevents attacks involving ".." and also allows us to create - ** files and directories whose names begin with "-" which are invisible - ** to the webserver. + ** files and directories whose names begin with "-" or "." which are + ** invisible to the webserver. + ** + ** Exception: Allow the "/.well-known/" prefix in accordance with + ** RFC-5785. */ for(z=zScript; *z; z++){ if( *z=='/' && (z[1]=='.' || z[1]=='-') ){ - NotFound(__LINE__); + if( strncmp(zScript,"/.well-known/",13)==0 && (z[1]!='.' || z[2]!='.') ){ + /* Exception: Allow "/." and "/-" for URLs that being with + ** "/.well-known/". But do not allow "/..". */ + continue; + } + NotFound(300); /* LOG: Path element begins with "." or "-" */ } } /* Figure out what the root of the filesystem should be. If the ** HTTP_HOST parameter exists (stored in zHttpHost) then remove the ** port number from the end (if any), convert all characters to lower - ** case, and convert all "." to "_". Then try to find a directory - ** with that name and the extension .website. If not found, look - ** for "default.website". + ** case, and convert non-alphanumber characters (including ".") to "_". + ** Then try to find a directory with that name and the extension .website. + ** If not found, look for "default.website". */ - if( zScript[0]!='/' ) NotFound(__LINE__); - if( strlen(zRoot)+40 >= sizeof(zLine) ) NotFound(__LINE__); + if( zScript[0]!='/' ){ + NotFound(310); /* LOG: URI does not start with "/" */ + } + if( strlen(zRoot)+40 >= sizeof(zLine) ){ + NotFound(320); /* LOG: URI too long */ + } if( zHttpHost==0 || zHttpHost[0]==0 ){ - NotFound(__LINE__); + NotFound(330); /* LOG: Missing HOST: parameter */ }else if( strlen(zHttpHost)+strlen(zRoot)+10 >= sizeof(zLine) ){ - NotFound(__LINE__); + NotFound(340); /* LOG: HOST parameter too long */ }else{ sprintf(zLine, "%s/%s", zRoot, zHttpHost); for(i=strlen(zRoot)+1; zLine[i] && zLine[i]!=':'; i++){ - int c = zLine[i]; + unsigned char c = (unsigned char)zLine[i]; if( !isalnum(c) ){ + if( c=='.' && (zLine[i+1]==0 || zLine[i+1]==':') ){ + /* If the client sent a FQDN with a "." at the end + ** (example: "sqlite.org." instead of just "sqlite.org") then + ** omit the final "." from the document root directory name */ + break; + } zLine[i] = '_'; }else if( isupper(c) ){ zLine[i] = tolower(c); @@ -1182,7 +2004,7 @@ void ProcessOneRequest(int forceClose){ if( standalone ){ sprintf(zLine, "%s", zRoot); }else{ - NotFound(__LINE__); + NotFound(350); /* LOG: *.website permissions */ } } } @@ -1192,69 +2014,75 @@ void ProcessOneRequest(int forceClose){ */ if( chdir(zHome)!=0 ){ char zBuf[1000]; - Malfunction(__LINE__, "cannot chdir to [%s] from [%s]", + Malfunction(360, /* LOG: chdir() failed */ + "cannot chdir to [%s] from [%s]", zHome, getcwd(zBuf,999)); } /* Locate the file in the filesystem. We might have to append - ** the name "index.html" in order to find it. Any excess path - ** information is put into the zPathInfo variable. + ** a name like "/home" or "/index.html" or "/index.cgi" in order + ** to find it. Any excess path information is put into the + ** zPathInfo variable. */ - zLine[0] = '.'; + j = j0 = (int)strlen(zLine); i = 0; while( zScript[i] ){ - while( zScript[i] && zScript[i]!='/' ){ - zLine[i+1] = zScript[i]; - i++; + while( zScript[i] && (i==0 || zScript[i]!='/') ){ + zLine[j] = zScript[i]; + i++; j++; } - zLine[i+1] = 0; + zLine[j] = 0; if( stat(zLine,&statbuf)!=0 ){ int stillSearching = 1; - while( stillSearching && i>0 ){ - while( i>0 && zLine[i]!='/' ){ i--; } - strcpy(&zLine[i], "/not-found.html"); + while( stillSearching && i>0 && j>j0 ){ + while( j>j0 && zLine[j-1]!='/' ){ j--; } + strcpy(&zLine[j-1], "/not-found.html"); if( stat(zLine,&statbuf)==0 && S_ISREG(statbuf.st_mode) && access(zLine,R_OK)==0 ){ - zRealScript = StrDup(&zLine[1]); - Redirect(zRealScript, 1); + zRealScript = StrDup(&zLine[j0]); + Redirect(zRealScript, 302, 1, 370); /* LOG: redirect to not-found */ return; }else{ - i--; + j--; } } - if( stillSearching ) NotFound(__LINE__); + if( stillSearching ) NotFound(380); /* LOG: URI not found */ break; } if( S_ISREG(statbuf.st_mode) ){ if( access(zLine,R_OK) ){ - NotFound(__LINE__); + NotFound(390); /* LOG: File not readable */ } - zRealScript = StrDup(&zLine[1]); + zRealScript = StrDup(&zLine[j0]); break; } if( zScript[i]==0 || zScript[i+1]==0 ){ - strcpy(&zLine[i+1],"/index.html"); - if( stat(zLine,&statbuf)!=0 || !S_ISREG(statbuf.st_mode) - || access(zLine,R_OK) ){ - strcpy(&zLine[i+1],"/index.cgi"); - if( stat(zLine,&statbuf)!=0 || !S_ISREG(statbuf.st_mode) - || access(zLine,R_OK) ){ - NotFound(__LINE__); - } + static const char *azIndex[] = { "/home", "/index.html", "/index.cgi" }; + int k = j>0 && zLine[j-1]=='/' ? j-1 : j; + unsigned int jj; + for(jj=0; jj=sizeof(azIndex)/sizeof(azIndex[0]) ){ + NotFound(400); /* LOG: URI is a directory w/o index.html */ + } + zRealScript = StrDup(&zLine[j0]); if( zScript[i]==0 ){ /* If the requested URL does not end with "/" but we had to ** append "index.html", then a redirect is necessary. Otherwise ** none of the relative URLs in the delivered document will be ** correct. */ - Redirect(zRealScript, 1); + Redirect(zRealScript,301,1,410); /* LOG: redirect to add trailing / */ return; } break; } - zLine[i+1] = zScript[i]; - i++; + zLine[j] = zScript[i]; + i++; j++; } zFile = StrDup(zLine); zPathInfo = StrDup(&zScript[i]); @@ -1276,54 +2104,33 @@ void ProcessOneRequest(int forceClose){ /* Take appropriate action */ if( (statbuf.st_mode & 0100)==0100 && access(zFile,X_OK)==0 ){ + char *zBaseFilename; /* Filename without directory prefix */ + /* - ** The followings static variables are used to setup the environment - ** for the CGI script + ** Abort with an error if the CGI script is writable by anyone other + ** than its owner. */ - static char *default_path = "/bin:/usr/bin"; - static char *gateway_interface = "CGI/1.0"; - static struct { - char *zEnvName; - char **pzEnvValue; - } cgienv[] = { - { "AUTH_TYPE", &zAuthType }, - { "AUTH_CONTENT", &zAuthArg }, - { "CONTENT_LENGTH", &zContentLength }, - { "CONTENT_TYPE", &zContentType }, - { "DOCUMENT_ROOT", &zHome }, - { "GATEWAY_INTERFACE", &gateway_interface }, - { "HTTP_ACCEPT", &zAccept }, - { "HTTP_COOKIE", &zCookie }, - { "HTTP_HOST", &zHttpHost }, - { "HTTP_REFERER", &zReferer }, - { "HTTP_USER_AGENT", &zAgent }, - { "PATH", &default_path }, - { "PATH_INFO", &zPathInfo }, - { "QUERY_STRING", &zQueryString }, - { "REMOTE_ADDR", &zRemoteAddr }, - { "REQUEST_METHOD", &zMethod }, - { "REQUEST_URI", &zScript }, - { "REMOTE_USER", &zRemoteUser }, - { "SCRIPT_DIRECTORY", &zDir }, - { "SCRIPT_FILENAME", &zFile }, - { "SCRIPT_NAME", &zRealScript }, - { "SERVER_NAME", &zServerName }, - { "SERVER_PORT", &zServerPort }, - { "SERVER_PROTOCOL", &zProtocol }, - }; - char *zBaseFilename; /* Filename without directory prefix */ + if( statbuf.st_mode & 0022 ){ + CgiScriptWritable(); + } /* If its executable, it must be a CGI program. Start by ** changing directories to the directory holding the program. */ if( chdir(zDir) ){ char zBuf[1000]; - Malfunction(__LINE__, "cannot chdir to [%s] from [%s]", + Malfunction(420, /* LOG: chdir() failed */ + "cannot chdir to [%s] from [%s]", zDir, getcwd(zBuf,999)); } + /* Compute the base filename of the CGI script */ + for(i=strlen(zFile)-1; i>=0 && zFile[i]!='/'; i--){} + zBaseFilename = &zFile[i+1]; + /* Setup the environment appropriately. */ + putenv("GATEWAY_INTERFACE=CGI/1.0"); for(i=0; i<(int)(sizeof(cgienv)/sizeof(cgienv[0])); i++){ if( *cgienv[i].pzEnvValue ){ SetEnv(cgienv[i].zEnvName,*cgienv[i].pzEnvValue); @@ -1331,28 +2138,24 @@ void ProcessOneRequest(int forceClose){ } if( useHttps ){ putenv("HTTPS=on"); - } - - /* - ** Abort with an error if the CGI script is writable by anyone other - ** than its owner. - */ - if( statbuf.st_mode & 0022 ){ - CgiScriptWritable(); + putenv("REQUEST_SCHEME=https"); + }else{ + putenv("REQUEST_SCHEME=http"); } /* For the POST method all input has been written to a temporary file, ** so we have to redirect input to the CGI script from that file. */ if( zMethod[0]=='P' ){ - dup(0); + if( dup(0)<0 ){ + Malfunction(430, /* LOG: dup(0) failed */ + "Unable to duplication file descriptor 0"); + } close(0); open(zTmpNam, O_RDONLY); } - for(i=strlen(zFile)-1; i>=0 && zFile[i]!='/'; i--){} - zBaseFilename = &zFile[i+1]; - if( i>=0 && strncmp(zBaseFilename,"nph-",4)==0 ){ + if( strncmp(zBaseFilename,"nph-",4)==0 ){ /* If the name of the CGI script begins with "nph-" then we are ** dealing with a "non-parsed headers" CGI script. Just exec() ** it directly and let it handle all its own header generation. @@ -1370,149 +2173,50 @@ void ProcessOneRequest(int forceClose){ */ { int px[2]; - pipe(px); + if( pipe(px) ){ + Malfunction(440, /* LOG: pipe() failed */ + "Unable to create a pipe for the CGI program"); + } if( fork()==0 ){ close(px[0]); close(1); - dup(px[1]); + if( dup(px[1])!=1 ){ + Malfunction(450, /* LOG: dup(1) failed */ + "Unable to duplicate file descriptor %d to 1", + px[1]); + } close(px[1]); + for(i=3; close(i)==0; i++){} execl(zBaseFilename, zBaseFilename, (char*)0); exit(0); } close(px[1]); - in = fdopen(px[0], "r"); + in = fdopen(px[0], "rb"); } if( in==0 ){ CgiError(); + }else{ + CgiHandleReply(in); } - - /* Read and process the first line of the header returned by the - ** CGI script. - */ - if( useTimeout ) alarm(15); - while( fgets(zLine,sizeof(zLine),in) ){ - if( strncmp(zLine,"Location:",9)==0 ){ - int i; - RemoveNewline(zLine); - z = &zLine[10]; - while( isspace(*z) ){ z++; } - for(i=0; z[i]; i++){ - if( z[i]=='?' ){ - zQuerySuffix = StrDup(""); - } - } - - if( z[0]=='/' && z[1]=='/' ){ - /* The scheme is missing. Add it in before redirecting */ - StartResponse("302 Redirect"); - nOut += printf("Location: %s:%s%s\r\n",zHttp,z,zQuerySuffix); - break; /* DK */ - MakeLogEntry(0); - return; - }else if( z[0]=='/' ){ - /* The scheme and network location are missing but we have - ** an absolute path. */ - Redirect(z, 0); /* DK */ - break; - } - /* Check to see if there is a scheme prefix */ - for(i=0; z[i] && z[i]!=':' && z[i]!='/'; i++){} - if( z[i]==':' ){ - /* We have a scheme. Assume there is an absolute URL */ - StartResponse("302 Redirect"); - nOut += printf("Location: %s%s\r\n",z,zQuerySuffix); - break; /* DK */ - MakeLogEntry(0); - return; - } - /* Must be a relative pathname. Construct the absolute pathname - ** and redirect to it. */ - i = strlen(zRealScript); - while( i>0 && zRealScript[i-1]!='/' ){ i--; } - while( i>0 && zRealScript[i-1]=='/' ){ i--; } - while( *z=='.' ){ - if( z[1]=='/' ){ - z += 2; - }else if( z[1]=='.' && z[2]=='/' ){ - while( i>0 && zRealScript[i-1]!='/' ){ i--; } - while( i>0 && zRealScript[i-1]=='/' ){ i--; } - z += 3; - }else{ - break; - } - } - StartResponse("302 Redirect"); - nOut += printf("Location: %s://%s",zHttp,zServerName); - if( strcmp(zServerPort,"80") ){ - nOut += printf(":%s",zServerPort); - } - nOut += printf("%.*s/%s%s\r\n\r\n",i,zRealScript,z,zQuerySuffix); - MakeLogEntry(0); - return; - }else if( strncmp(zLine,"Status:",7)==0 ){ - int i; - for(i=7; isspace(zLine[i]); i++){} - nOut += printf("%s %s", zProtocol, &zLine[i]); - strncpy(zReplyStatus, &zLine[i], 3); - zReplyStatus[3] = 0; - statusSent = 1; - break; - }else{ - int i; - StartResponse("200 OK"); - nOut += printf("%s",zLine); - for(i=0; zLine[i] && !isspace(zLine[i]) && zLine[i]!=':'; i++){} - if( i<2 || zLine[i]!=':' ) break; - } - } - - /* Copy everything else thru without change or analysis. + }else if( lenFile>5 && strcmp(&zFile[lenFile-5],".scgi")==0 ){ + /* Any file that ends with ".scgi" is assumed to be text of the + ** form: + ** SCGI hostname port + ** Open a TCP/IP connection to that host and send it an SCGI request */ - if( useTimeout ) alarm(60*5); - while( (c = getc(in))!=EOF ){ - putc(c,stdout); - nOut++; - } - fclose(in); + SendScgiRequest(zFile, zScript); }else if( countSlashes(zRealScript)!=countSlashes(zScript) ){ /* If the request URI for static content contains material past the ** actual content file name, report that as a 404 error. */ - NotFound(__LINE__); + NotFound(460); /* LOG: Excess URI content past static file name */ }else{ /* If it isn't executable then it ** must a simple file that needs to be copied to output. */ - const char *zContentType = GetMimeType(zFile, lenFile); - - if( zTmpNam ) unlink(zTmpNam); - in = fopen(zFile,"r"); - if( in==0 ) NotFound(__LINE__); - StartResponse("200 OK"); - nOut += DateTag("Last-Modified", statbuf.st_mtime); - nOut += printf("Content-type: %s\r\n",zContentType); - nOut += printf("Content-length: %d\r\n\r\n",(int)statbuf.st_size); - fflush(stdout); - if( strcmp(zMethod,"HEAD")==0 ){ - MakeLogEntry(0); - fclose(in); - return; - } - if( useTimeout ) alarm(30 + statbuf.st_size/1000); -#ifdef linux - { - off_t offset = 0; - nOut += sendfile(fileno(stdout), fileno(in), &offset, statbuf.st_size); - } -#else - while( (c = getc(in))!=EOF ){ - putc(c,stdout); - nOut++; - } -#endif - fclose(in); + if( SendFile(zFile, lenFile, &statbuf) ) return; } fflush(stdout); - MakeLogEntry(0); + MakeLogEntry(0, 0); /* LOG: Normal reply */ /* The next request must arrive within 30 seconds or we close the connection */ @@ -1534,9 +2238,9 @@ typedef union { } address; /* -** Implement an HTTP server daemon listening on port iPort. +** Implement an HTTP server daemon listening on port zPort. ** -** As new connections arrive, fork a child and let child return +** As new connections arrive, fork a child and let the child return ** out of this procedure call. The child will handle the request. ** The parent never returns from this procedure. ** @@ -1683,19 +2387,26 @@ int main(int argc, char **argv){ zRoot = zArg; }else if( strcmp(z,"-logfile")==0 ){ zLogFile = zArg; + }else if( strcmp(z,"-max-age")==0 ){ + mxAge = atoi(zArg); + }else if( strcmp(z,"-max-cpu")==0 ){ + maxCpu = atoi(zArg); }else if( strcmp(z,"-https")==0 ){ useHttps = atoi(zArg); - zHttp = useHttps ? "http" : "https"; + zHttp = useHttps ? "https" : "http"; + if( useHttps ) zRemoteAddr = getenv("REMOTE_HOST"); }else if( strcmp(z, "-port")==0 ){ zPort = zArg; standalone = 1; + }else if( strcmp(z, "-family")==0 ){ if( strcmp(zArg, "ipv4")==0 ){ ipv4Only = 1; }else if( strcmp(zArg, "ipv6")==0 ){ ipv6Only = 1; }else{ - Malfunction(__LINE__, "unknown IP protocol: [%s]", zArg); + Malfunction(500, /* LOG: unknown IP protocol */ + "unknown IP protocol: [%s]\n", zArg); } }else if( strcmp(z, "-jail")==0 ){ if( atoi(zArg)==0 ){ @@ -1705,8 +2416,18 @@ int main(int argc, char **argv){ if( atoi(zArg) ){ useTimeout = 0; } + }else if( strcmp(z, "-input")==0 ){ + if( freopen(zArg, "rb", stdin)==0 || stdin==0 ){ + Malfunction(501, /* LOG: cannot open --input file */ + "cannot open --input file \"%s\"\n", zArg); + } + }else if( strcmp(z, "-datetest")==0 ){ + TestParseRfc822Date(); + printf("Ok\n"); + exit(0); }else{ - Malfunction(__LINE__, "unknown argument: [%s]", z); + Malfunction(510, /* LOG: unknown command-line argument on launch */ + "unknown argument: [%s]\n", z); } argv += 2; argc -= 2; @@ -1715,7 +2436,8 @@ int main(int argc, char **argv){ if( standalone ){ zRoot = "."; }else{ - Malfunction(__LINE__, "no --root specified"); + Malfunction(520, /* LOG: --root argument missing */ + "no --root specified"); } } @@ -1723,7 +2445,8 @@ int main(int argc, char **argv){ ** create a chroot jail there. */ if( chdir(zRoot)!=0 ){ - Malfunction(__LINE__, "cannot change to directory [%s]", zRoot); + Malfunction(530, /* LOG: chdir() failed */ + "cannot change to directory [%s]", zRoot); } /* Get information about the user if available */ @@ -1732,7 +2455,8 @@ int main(int argc, char **argv){ /* Enter the chroot jail if requested */ if( zPermUser && useChrootJail && getuid()==0 ){ if( chroot(".")<0 ){ - Malfunction(__LINE__, "unable to create chroot jail"); + Malfunction(540, /* LOG: chroot() failed */ + "unable to create chroot jail"); }else{ zRoot = ""; } @@ -1740,26 +2464,44 @@ int main(int argc, char **argv){ /* Activate the server, if requested */ if( zPort && http_server(zPort, 0) ){ - Malfunction(__LINE__, "failed to start server"); + Malfunction(550, /* LOG: server startup failed */ + "failed to start server"); } +#ifdef RLIMIT_CPU + if( maxCpu>0 ){ + struct rlimit rlim; + rlim.rlim_cur = maxCpu; + rlim.rlim_max = maxCpu; + setrlimit(RLIMIT_CPU, &rlim); + } +#endif + /* Drop root privileges. */ if( zPermUser ){ if( pwd ){ - setgid(pwd->pw_gid); - setuid(pwd->pw_uid); + if( setgid(pwd->pw_gid) ){ + Malfunction(560, /* LOG: setgid() failed */ + "cannot set group-id to %d", pwd->pw_gid); + } + if( setuid(pwd->pw_uid) ){ + Malfunction(570, /* LOG: setuid() failed */ + "cannot set user-id to %d", pwd->pw_uid); + } }else{ - Malfunction(__LINE__, "no such user [%s]", zPermUser); + Malfunction(580, /* LOG: unknown user */ + "no such user [%s]", zPermUser); } } if( getuid()==0 ){ - Malfunction(__LINE__, "cannot run as root"); + Malfunction(590, /* LOG: cannot run as root */ + "cannot run as root"); } - /* Get the IP address from when the request originates + /* Get the IP address from whence the request originates */ - { + if( zRemoteAddr==0 ){ address remoteAddr; unsigned int size = sizeof(remoteAddr); char zHost[NI_MAXHOST]; @@ -1769,6 +2511,13 @@ int main(int argc, char **argv){ zRemoteAddr = StrDup(zHost); } } + if( zRemoteAddr!=0 + && strncmp(zRemoteAddr, "::ffff:", 7)==0 + && strchr(zRemoteAddr+7, ':')==0 + && strchr(zRemoteAddr+7, '.')!=0 + ){ + zRemoteAddr += 7; + } /* Process the input stream */ for(i=0; i<100; i++){ @@ -1777,3 +2526,67 @@ int main(int argc, char **argv){ ProcessOneRequest(1); exit(0); } + +#if 0 +/* Copy/paste the following text into SQLite to generate the xref +** table that describes all error codes. +*/ +BEGIN; +CREATE TABLE IF NOT EXISTS xref(lineno INTEGER PRIMARY KEY, desc TEXT); +DELETE FROM Xref; +INSERT INTO xref VALUES(100,'Malloc() failed'); +INSERT INTO xref VALUES(110,'Not authorized'); +INSERT INTO xref VALUES(120,'CGI Error'); +INSERT INTO xref VALUES(130,'Timeout'); +INSERT INTO xref VALUES(140,'CGI script is writable'); +INSERT INTO xref VALUES(150,'Cannot open -auth file'); +INSERT INTO xref VALUES(160,'http request on https-only page'); +INSERT INTO xref VALUES(170,'-auth redirect'); +INSERT INTO xref VALUES(180,'malformed entry in -auth file'); +INSERT INTO xref VALUES(190,'chdir() failed'); +INSERT INTO xref VALUES(200,'bad protocol in HTTP header'); +INSERT INTO xref VALUES(210,'Empty request URI'); +INSERT INTO xref VALUES(220,'Unknown request method'); +INSERT INTO xref VALUES(230,'Referrer is devids.net'); +INSERT INTO xref VALUES(240,'Illegal content in HOST: parameter'); +INSERT INTO xref VALUES(250,'Disallowed user agent'); +INSERT INTO xref VALUES(260,'Disallowed referrer'); +INSERT INTO xref VALUES(270,'Request too large'); +INSERT INTO xref VALUES(280,'mkstemp() failed'); +INSERT INTO xref VALUES(290,'cannot create temp file for POST content'); +INSERT INTO xref VALUES(300,'Path element begins with . or -'); +INSERT INTO xref VALUES(310,'URI does not start with /'); +INSERT INTO xref VALUES(320,'URI too long'); +INSERT INTO xref VALUES(330,'Missing HOST: parameter'); +INSERT INTO xref VALUES(340,'HOST parameter too long'); +INSERT INTO xref VALUES(350,'*.website permissions'); +INSERT INTO xref VALUES(360,'chdir() failed'); +INSERT INTO xref VALUES(370,'redirect to not-found page'); +INSERT INTO xref VALUES(380,'URI not found'); +INSERT INTO xref VALUES(390,'File not readable'); +INSERT INTO xref VALUES(400,'URI is a directory w/o index.html'); +INSERT INTO xref VALUES(410,'redirect to add trailing /'); +INSERT INTO xref VALUES(420,'chdir() failed'); +INSERT INTO xref VALUES(430,'dup(0) failed'); +INSERT INTO xref VALUES(440,'pipe() failed'); +INSERT INTO xref VALUES(450,'dup(1) failed'); +INSERT INTO xref VALUES(460,'Excess URI content past static file name'); +INSERT INTO xref VALUES(470,'ETag Cache Hit'); +INSERT INTO xref VALUES(480,'fopen() failed for static content'); +INSERT INTO xref VALUES(2,'Normal HEAD reply'); +INSERT INTO xref VALUES(0,'Normal reply'); +INSERT INTO xref VALUES(500,'unknown IP protocol'); +INSERT INTO xref VALUES(501,'cannot open --input file'); +INSERT INTO xref VALUES(510,'unknown command-line argument on launch'); +INSERT INTO xref VALUES(520,'--root argument missing'); +INSERT INTO xref VALUES(530,'chdir() failed'); +INSERT INTO xref VALUES(540,'chroot() failed'); +INSERT INTO xref VALUES(550,'server startup failed'); +INSERT INTO xref VALUES(560,'setgid() failed'); +INSERT INTO xref VALUES(570,'setuid() failed'); +INSERT INTO xref VALUES(580,'unknown user'); +INSERT INTO xref VALUES(590,'cannot run as root'); +INSERT INTO xref VALUES(600,'malloc() failed'); +INSERT INTO xref VALUES(610,'malloc() failed'); +COMMIT; +#endif /* SQL */ diff --git a/althttpd.md b/althttpd.md new file mode 100644 index 0000000..b40fff0 --- /dev/null +++ b/althttpd.md @@ -0,0 +1,360 @@ +The Althttpd Webserver +====================== + +Althttpd is a simple webserver that has run the website +since 2004. Althttpd strives for simplicity, security, and low resource +usage. + +As of 2018, the althttpd instance for sqlite.org answers +about 500,000 HTTP requests per day (about 5 or 6 per second) +delivering about 50GB of content per day (about 4.6 megabits/second) +on a $40/month [Linode](https://www.linode.com/pricing). The load +average on this machine normally stays around 0.1 or 0.2. About 19% +of the HTTP requests are CGI to various [Fossil](https://fossil-scm.org/) +source-code repositories. + +Design Philosophy +---------------- + +Althttpd is usually launched from +[xinetd](https://en.wikipedia.org/wiki/Xinetd) or +[stunnel4](https://www.stunnel.org/). A separate process +is started for each incoming connection, and that process is +wholly focused on serving that one connection. A single althttpd +process will handle one or more HTTP requests over the same connection. +When the connection closes, the althttpd process exits. + +Althttpd can also operate stand-alone. Althttpd +itself listens on port 80 for incoming HTTP requests, then forks +a copy of itself to handle each inbound connection. Each connection +is still handled using a separate process. The only difference is +that the connection-handler process is now started by a master +althttpd instance rather than by xinetd or stunnel4. + +Althttpd has no configuration file. All configuration is handled +using a few command-line arguments. This helps to keep the +configuration simple and mitigates worries about about introducing +a security vulnerability through a misconfigured web server. + +Althttpd does not itself handle TLS connections. For HTTPS, althttpd +relies on stunnel4 to handle TLS protocol negotiation, decryption, and +encryption. + +Because each althttpd process only needs to service a single +connection, althttpd is single threaded. Furthermore, each process +only lives for the duration of a single connection, which means that +althttpd does not need to worry too much about memory leaks. +These design factors help keep the althttpd source code simple, +which facilitates security auditing and analysis. + + +Source Code +----------- + +The complete source code for althttpd is contained within a single +C-code file with no dependences outside of the standard C library. +The source code file is named "[althttpd.c](/file/althttpd.c)". +To build and install althttpd, run the following command: + +> + gcc -Os -o /usr/bin/althttpd althttpd.c + +The althttpd source code is heavily commented and accessible. +It should be relatively easy to customize for specialized needs. + +Setup Using Xinetd +------------------ + +Shown below is the complete text of the /etc/xinetd.d/http file on +sqlite.org that configures althttpd to server unencrypted +HTTP requests on both IPv4 and IPv6. +You can use this as a template to create your own installations. + +> + service http + { + port = 80 + flags = IPv4 + socket_type = stream + wait = no + user = root + server = /usr/bin/althttpd + server_args = -logfile /logs/http.log -root /home/www -user www-data + bind = 45.33.6.223 + } +> + service http + { + port = 80 + flags = REUSE IPv6 + bind = 2600:3c00::f03c:91ff:fe96:b959 + socket_type = stream + wait = no + user = root + server = /usr/bin/althttpd + server_args = -logfile /logs/http.log -root /home/www -user www-data + } + + +The key observation here is that each incoming TCP/IP connection on +port 80 launches a copy of /usr/bin/althttpd with some additional +arguments that amount to the configuration for the webserver. + +Notice that althttpd is run as the superuser. This is not required, but if it +is done, then althttpd will move itself into a chroot jail at the root +of the web document hierarchy (/home/www in the example) and then drop +all superuser privileges prior to reading any content off of the wire. +The -user option tells althttpd to become user www-data after entering +the chroot jail. + +The -root option tells althttpd where to find the document hierarchy. +In the case of sqlite.org, all content is served from /home/www. +At the top level of this document hierarchy is a bunch of directories +whose names end with ".website". Each such directory is a separate +website. The directory is chosen based on the Host: parameter of the +incoming HTTP request. A partial list of the directories on sqlite.org +is this: + +> + 3dcanvas_tcl_lang_org.website + 3dcanvas_tcl_tk.website + androwish_org.website + canvas3d_tcl_lang_org.website + canvas3d_tcl_tk.website + cvstrac_org.website + default.website + fossil_scm_com.website + fossil_scm_hwaci_com.website + fossil_scm_org.website + system_data_sqlite_org.website + wapp_tcl_lang_org.website + wapp_tcl_tk.website + www2_alt_mail_net.website + www_androwish_org.website + www_cvstrac_org.website + www_fossil_scm_com.website + www_fossil_scm_org.website + www_sqlite_org.website + +For each incoming HTTP request, althttpd takes the text of the Host: +parameter in the request header, converts it to lowercase, and changes +all characters other than ASCII alphanumerics into "_". The result +determines which subdirectory to use for content. If nothing matches, +the "default.website" directory is used as a fallback. + +For example, if the Host parameter is "www.SQLite.org" then the name is +translated into "www\_sqlite\_org.website" and that is the directory +used to serve content. If the Host parameter is "fossil-scm.org" then +the "fossil\_scm\_org.website" directory is used. Oftentimes, two or +more names refer to the same website. For example, fossil-scm.org, +www.fossil-scm.org, fossil-scm.com, and www.fossil-scm.com are all the +same website. In that case, typically only one of the directories is +a real directory and the others are symbolic links. + +On a minimal installation that only hosts a single website, it suffices +to have a single subdirectory named "default.website". + +Within the *.website directory, the file to be served is selected by +the HTTP request URI. Files that are marked as executable are run +as CGI. Non-executable files with a name that ends with ".scgi" +and that have content of the form "SCGI hostname port" relay an SCGI +request to hostname:port. All other non-executable files are delivered +as-is. + +If the request URI specifies the name of a directory within *.website, +then althttpd appends "/home", "/index.html", and "/index.cgi", in +that order, looking for a match. + +If a prefix of a URI matches the name of an executable file then that +file is run as CGI. For as-is content, the request URI must exactly +match the name of the file. + +For content delivered as-is, the MIME-type is deduced from the filename +extension using a table that is compiled into althttpd. + +Setup For HTTPS Using Stunnel4 +------------------------------ + +Althttpd itself does not do any encryption. +To set up an encrypted website using althttpd, the recommended technique +is to use [stunnel4](https://www.stunnel.org/). + +On the sqlite.org website, the relevant lines of the +/etc/stunnel/stunnel.conf file are: + +> + cert = /etc/letsencrypt/live/sqlite.org/fullchain.pem + key = /etc/letsencrypt/live/sqlite.org/privkey.pem + \[https\] + accept = :::443 + TIMEOUTclose = 0 + exec = /usr/bin/althttpd + execargs = /usr/bin/althttpd -logfile /logs/http.log -root /home/www -user www-data -https 1 + +This setup is very similar to the xinetd setup. One key difference is +the "-https 1" option is used to tell althttpd that the connection is +encrypted. This is important so that althttpd will know to set the +HTTPS environment variable for CGI programs. + +It is ok to have both xinetd and stunnel4 both configured to +run althttpd, at the same time. In fact, that is the way that the +SQLite.org website works. Requests to go through +xinetd and requests to go through stunnel4. + +Stand-alone Operation +--------------------- + +On the author's desktop workstation, in his home directory is a subdirectory +named ~/www/default.website. That subdirectory contains a collection of +files and CGI scripts. Althttpd can serve the content there by running +the following command: + +> + althttpd -root ~/www -port 8080 + +The "-port 8080" option is what tells althttpd to run in stand-alone +mode, listening on port 8080. + +The author of althttpd has only ever used stand-alone mode for testing. +Since althttpd does not itself support TLS encryption, the +stunnel4 setup is preferred for production websites. + +Security Features +----------------- + +To defend against mischief, there are restrictions on names of files that +althttpd will serve. Within the request URI, all characters other than +alphanumerics and ",-./:~" are converted into a single "_". Furthermore, +if any path element of the request URI begins with "." or "-" then +althttpd always returns a 404 Not Found error. Thus it is safe to put +auxiliary files (databases or other content used by CGI, for example) +in the document hierarchy as long as the filenames being with "." or "-". + +An exception: Though althttpd normally returns 404 Not Found for any +request with a path element beginning with ".", it does allow requests +where the URI begins with "/.well-known/". And file or directory names +below "/.well-known/" are allowed to begin with "." or "-" (but not +with ".."). This exception is necessary to allow LetsEncrypt to validate +ownership of the website. + +Basic Authentication +-------------------- + +If a file named "-auth" appears anywhere within the content hierarchy, +then all sibling files and all files in lower-level directories require +[HTTP basic authentication](https://en.wikipedia.org/wiki/Basic_access_authentication), +as defined by the content of the "-auth" file. +The "-auth" file is plain text and line oriented. +Blank lines and lines that begin with "#" are ignored. +Other lines have meaning as follows: + + * http-redirect + + The http-redirect line, if present, causes all HTTP requests to + redirect into an HTTPS request. The "-auth" file is read and + processes sequentially, so lines below the "http-redirect" line + are never seen or processed for http requests. + + * https-only + + The https-only line, if present, means that only HTTPS requests + are allowed. Any HTTP request results in a 404 Not Found error. + The https-only line normally occurs after an http-redirect line. + + * realm NAME + + A single line of this form establishes the "realm" for basic + authentication. Web browsers will normally display the realm name + as a title on the dialog box that asks for username and password. + + * user NAME LOGIN:PASSWORD + + There are multiple user lines, one for each valid user. The + LOGIN:PASSWORD argument defines the username and password that + the user must type to gain access to the website. The password + is clear-text - HTTP Basic Authentication is not the most secure + authentication mechanism. Upon successful login, the NAME is + stored in the REMOTE_USER environment variable so that it can be + accessed by CGI scripts. NAME and LOGIN are usually the same, + but can be different. + + * anyone + + If the "anyone" line is encountered, it means that any request is + allowed, even if there is no username and password provided. + This line is useful in combination with "http-redirect" to cause + all ordinary HTTP requests to redirect to HTTPS without requiring + login credentials. + +Basic Authentication Examples +----------------------------- + +The website contains a "-auth" file in the +toplevel directory as follows: + +> + http-redirect + anyone + +That -auth file causes all HTTP requests to be redirected to HTTPS, without +requiring any further login. (Try it: visit http://sqlite.org/ and +verify that you are redirected to https://sqlite.org/.) + +There is a "-auth" file at that looks +like this: + +> + realm Access To All Fossil Repositories + http-redirect + user drh drh:xxxxxxxxxxxxxxxx + +Except, of course, the password is not a row of "x" characters. This +demonstrates the typical use for a -auth file. Access is granted for +a single user to the content in the "private" subdirectory, provided that +the user enters with HTTPS instead of HTTP. The "http-redirect" line +is strongly recommended for all basic authentication since the password +is contained within the request header and can be intercepted and +stolen by bad guys if the request is sent via HTTP. + +Log File +-------- + +If the -logfile option is given on the althttpd command-line, then a single +line is appended to the named file for each HTTP request. +The log file is in the Comma-Separated Value or CSV format specified +by [RFC4180](https://tools.ietf.org/html/rfc4180). +There is a comment in the source code that explains what each of the fields +in this output line mean. + +The fact that the log file is CSV makes it easy to import into +SQLite for analysis, using a script like this: + +> + CREATE TABLE log( + date TEXT, /* Timestamp */ + ip TEXT, /* Source IP address */ + url TEXT, /* Request URI */ + ref TEXT, /* Referer */ + code INT, /* Result code. ex: 200, 404 */ + nIn INT, /* Bytes in request */ + nOut INT, /* Bytes in reply */ + t1 INT, t2 INT, /* Process time (user, system) milliseconds */ + t3 INT, t4 INT, /* CGI script time (user, system) milliseconds */ + t5 INT, /* Wall-clock time, milliseconds */ + nreq INT, /* Sequence number of this request */ + agent TEXT, /* User agent */ + user TEXT, /* Remote user */ + n INT, /* Bytes of url that are in SCRIPT_NAME */ + lineno INT /* Source code line that generated log entry */ + ); + .mode csv + .import httplog.csv log + + +The filename on the -logfile option may contain time-based characters +that are expanded by [strftime()](https://linux.die.net/man/3/strftime). +Thus, to cause a new logfile to be used for each day, you might use +something like: + +> + -logfile /var/logs/althttpd/httplog-%Y%m%d.csv