Why log HTTP requests?
If you’re running HTTP servers, you probably want to log HTTP requests.
They are helpful for diagnosing problems (which requests fail? how many requests do we serve per day? which requests are slow?).
They are necessary for analytics (which pages are popular? where do website visitors come from?).
This article describes everything you need to know to log HTTP requests in your own Go web servers.
This is not about a re-usable library but about the things you need to know to implement your own solution; a description of my logging choices.
To show what information is typically logged, here’s a sample log entry in Apache’s Extended Log File Format:
111.222.333.123 HOME - [01/Feb/1998:01:08:39 -0800] "GET /bannerad/ad.htm HTTP/1.0" 200 198 "http://www.referrer.com/bannerad/ba_intro.htm" "Mozilla/4.01 (Macintosh; I; PPC)"
We have:
111.222.333.123
: IP address of the client making the HTTP request
HOME
: domain (for the cases of serving multiple domains from the same web server)
-
: user authentication (empty in this case)
[01/Feb/1998:01:08:39 -0800]
: time when request was recorded
"GET /bannerad/ad.htm HTTP/1.0"
: HTTP method, URL and protocol version
200
: HTTP status code. 200 means the request was successfully handled
198
: size of the response in bytes
"http://www.referrer.com/bannerad/ba_intro.htm"
: referer
"Mozilla/4.01 (Macintosh; I; PPC)"
: so-called user agent, identifying the HTTP client (most likely a web browser)
We can log more information or choose to not log some of the above information.
Personally:
- I also log how long the server spent generating the response, in milliseconds (that’s enough resolution for me; recording in microseconds is possible but overkill)
- I don’t log the protocol (like HTTP/1.0)
- a server usually serves a single purpose, so there’s no need to log the domain
- if the service has user authentication, I also log the user id
The signature of Go’s standard HTTP serving function is:
func(w http.ResponseWriter, r *http.Request)
We’ll add logging as a so-called middleware, which is a way to add re-usable pieces of functionality into the HTTP serving pipeline.
We have a logRequestHandler function that takes an http.Handler interface as an argument and returns another http.Handler that wraps the original handler and adds logging:
func logRequestHandler(h http.Handler) http.Handler {
	fn := func(w http.ResponseWriter, r *http.Request) {
		// call the original http.Handler we're wrapping
		h.ServeHTTP(w, r)

		// gather information about request and log it
		uri := r.URL.String()
		method := r.Method
		// ... more information
		logHTTPReq(uri, method, ...)
	}

	// http.HandlerFunc wraps a function so that it
	// implements http.Handler interface
	return http.HandlerFunc(fn)
}
We can nest middleware handlers on top of each other, each adding some functionality.
Here’s how we would use it to add logging to all request handlers:
func makeHTTPServer() *http.Server {
	mux := &http.ServeMux{}
	mux.HandleFunc("/", handleIndex)
	// ... potentially add more handlers

	var handler http.Handler = mux
	// wrap mux with our logger; this will log all requests
	handler = logRequestHandler(handler)
	// ... potentially add more middleware handlers

	srv := &http.Server{
		ReadTimeout:  120 * time.Second,
		WriteTimeout: 120 * time.Second,
		IdleTimeout:  120 * time.Second, // introduced in Go 1.8
		Handler:      handler,
	}
	return srv
}
First, let’s define a struct to package all the logged information:
// HTTPReqInfo describes info about HTTP request
type HTTPReqInfo struct {
	// GET etc.
	method  string
	uri     string
	referer string
	ipaddr  string
	// response code, like 200, 404
	code int
	// number of bytes of the response sent
	size int64
	// how long it took to handle the request
	duration  time.Duration
	userAgent string
}
Here’s the full implementation of logRequestHandler:
func logRequestHandler(h http.Handler) http.Handler {
	fn := func(w http.ResponseWriter, r *http.Request) {
		ri := &HTTPReqInfo{
			method:    r.Method,
			uri:       r.URL.String(),
			referer:   r.Header.Get("Referer"),
			userAgent: r.Header.Get("User-Agent"),
		}
		ri.ipaddr = requestGetRemoteAddress(r)

		// this runs handler h and captures information about
		// HTTP request
		m := httpsnoop.CaptureMetrics(h, w, r)

		ri.code = m.Code
		ri.size = m.BytesWritten
		ri.duration = m.Duration
		logHTTPReq(ri)
	}
	return http.HandlerFunc(fn)
}
Let’s go over the easy cases:
- r.Method returns the HTTP method, like “GET”, “POST” etc.
- r.URL is the parsed URL, like /getname?id=5, and String() gives us its string representation
- r.Header is a struct representing HTTP headers; among other things it contains Referer and User-Agent
- to log how long the server takes to serve the request, we could remember timeStart at the beginning, call the handler and compute time.Since(timeStart); here httpsnoop captures the duration for us as m.Duration
Other information is harder to get.
The problem with getting the IP address of the client is that there could be an HTTP proxy involved. The client makes a request to the proxy, and the proxy makes a request to us. As a result, we get the IP address of the proxy, not the client.
Because of that, proxies usually augment the request with information about the client’s real IP address using the X-Real-Ip or X-Forwarded-For HTTP headers.
Here’s how to extract this info:
// Request.RemoteAddress contains port, which we want to remove i.e.:
// "[::1]:58292" => "[::1]"
func ipAddrFromRemoteAddr(s string) string {
	idx := strings.LastIndex(s, ":")
	if idx == -1 {
		return s
	}
	return s[:idx]
}

// requestGetRemoteAddress returns ip address of the client making the request,
// taking into account http proxies
func requestGetRemoteAddress(r *http.Request) string {
	hdr := r.Header
	hdrRealIP := hdr.Get("X-Real-Ip")
	hdrForwardedFor := hdr.Get("X-Forwarded-For")
	if hdrRealIP == "" && hdrForwardedFor == "" {
		return ipAddrFromRemoteAddr(r.RemoteAddr)
	}
	if hdrForwardedFor != "" {
		// X-Forwarded-For is potentially a list of addresses separated with ","
		parts := strings.Split(hdrForwardedFor, ",")
		for i, p := range parts {
			parts[i] = strings.TrimSpace(p)
		}
		// TODO: should return first non-local address
		return parts[0]
	}
	return hdrRealIP
}
Capturing the status code and the size of the response from ResponseWriter is the trickiest part.
http.ResponseWriter doesn’t expose this information. Luckily, it’s a simple interface:
type ResponseWriter interface {
	Header() Header
	Write([]byte) (int, error)
	WriteHeader(statusCode int)
}
It’s possible to write an implementation of this interface that wraps the original and records the information we want to know. Lucky for us, someone already did it in the package httpsnoop.
The Apache log format is compact and human-readable, but rather hard to parse.
At some point we need to read the logs back for analysis, and I don’t relish the idea of implementing a parser for this format.
From an implementation point of view, one easy option would be to log each record as JSON, separated by newlines.
What I don’t like about this approach: JSON is not easy to read.
As a middle ground, I’ve created the siser library, which implements a serialization format that is extensible, easy to implement and human-readable. It’s well suited for logging structured information and I use it in multiple projects.
Here’s how a single request is serialized:
171 1567185903788 httplog
method: GET
uri: /favicon.ico
ipaddr: 204.14.239.58
code: 404
size: 758
duration: 0
ua: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:68.0) Gecko/20100101 Firefox/68.0
The first line of each record contains the metadata:
171
: size of the record data that follows. Knowing the size in advance allows for a safe and efficient reader implementation
1567185903788
: timestamp (number of milliseconds since the Unix Epoch). It lets us skip recording a redundant timestamp in the data itself
httplog
: type of the record. This allows writing different kinds of records to the same file. In our case, all records are of the same type
What follows is data in key: value format.
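As an illustration of how simple the header is, parsing it could look like this (a hand-rolled sketch; it is not siser's actual implementation, which the library handles for you):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRecordHeader splits a siser-style header line like
// "171 1567185903788 httplog" into size, timestamp and record type.
// Illustrative sketch only.
func parseRecordHeader(line string) (size int, timestamp int64, name string, err error) {
	parts := strings.SplitN(line, " ", 3)
	if len(parts) != 3 {
		return 0, 0, "", fmt.Errorf("malformed header: %q", line)
	}
	size, err = strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, "", err
	}
	timestamp, err = strconv.ParseInt(parts[1], 10, 64)
	if err != nil {
		return 0, 0, "", err
	}
	return size, timestamp, parts[2], nil
}

func main() {
	size, ts, name, err := parseRecordHeader("171 1567185903788 httplog")
	if err != nil {
		panic(err)
	}
	fmt.Println(size, ts, name) // 171 1567185903788 httplog
}
```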
Here’s how we serialize a record and write it to the log file:
var (
	muLogHTTP sync.Mutex
)

func logHTTPReq(ri *HTTPReqInfo) {
	var rec siser.Record
	rec.Name = "httplog"
	rec.Append("method", ri.method)
	rec.Append("uri", ri.uri)
	if ri.referer != "" {
		rec.Append("referer", ri.referer)
	}
	rec.Append("ipaddr", ri.ipaddr)
	rec.Append("code", strconv.Itoa(ri.code))
	rec.Append("size", strconv.FormatInt(ri.size, 10))
	durMs := ri.duration / time.Millisecond
	rec.Append("duration", strconv.FormatInt(int64(durMs), 10))
	rec.Append("ua", ri.userAgent)

	muLogHTTP.Lock()
	defer muLogHTTP.Unlock()
	_, _ = httpLogSiser.WriteRecord(&rec)
}
Rotating logs daily
I usually deploy my servers on Ubuntu and place logs in the /data/<service-name>/log directory.
We can’t write to the same log file forever. Eventually we would run out of space.
For long term logs I usually have a log file per day, named by date, e.g. 2019-09-23.txt, 2019-09-24.txt etc.
This is sometimes called log rotation.
To avoid re-implementing this functionality multiple times, I wrote the library filerotate.
It implements Write, Close and Flush methods, so it’s easy to plug into existing code using io.Writer.
You specify a directory to use and a suffix for the file name.
You then write to it as to a regular file, and the code takes care of creating per-day files.
You can also provide a callback to be notified when rotation happens so that you can, for example, upload the file to online storage for archival or run analytics on it.
Here’s the code:
func didClose(path string, didRotate bool) {
fmt.Printf("closed file '%s', didRotate: %v\n", path, didRotate)
if didRotate {
// here you can implement e.g. compressing a log file and
// uploading them to s3 for long-term storage
}
}
f, err := filerotate.NewDaily(".", "log.txt", didClose)
if err != nil {
log.Fatalf("filerotate.NewDaily() failed with '%s'\n", err)
}
_, err = f.Write([]byte("hello"))
if err != nil {
log.Fatalf("f.Write() failed with '%s'\n", err)
}
err = f.Close()
if err != nil {
log.Fatalf("f.Close() failed with '%s'\n", err)
}
Long term storage and analytics
For long term storage I compress the log files with gzip and upload them to online storage. There are plenty of options: S3, Google Storage, Digital Ocean Spaces, BackBlaze.
I tend to use Digital Ocean Spaces or BackBlaze, as they are significantly cheaper (both in storage and bandwidth costs).
They both support the S3 protocol, so I use the minio-go library.
For analytics I do daily processing using my own code to generate basic summaries of the most useful information.
Another option would be to import the data into a system like BigQuery.
More Go resources
- siser is my library implementing a simple serialization format
- I wrote an in-depth article about the design of siser
- filerotate is my library implementing a file that rotates daily