HTTP Protocol: How It Works and Why It’s Designed This Way

14 min read · Feb 15, 2025

HTTP Protocol: The Cornerstone of the Internet and Essential Knowledge for Web Development

HTTP is one of the Internet's foundational protocols and essential knowledge for web development. HTTP/2 in particular has drawn extensive attention. This article traces the historical evolution and design ideas of the HTTP protocol, from HTTP/0.9 through HTTP/2, to help readers understand why it works the way it does.

I. HTTP/0.9: The Embryonic Form of Internet Communication

HTTP is an application-layer protocol built on top of TCP/IP. It specifies the format of the messages exchanged between clients and servers and leaves the transmission of data packets to the lower layers. By default it uses port 80. HTTP/0.9, released in 1991, is the earliest version of the protocol. Its design is extremely simple, with only one command, GET.

GET /index.html

This command means that, once the TCP connection is established, the client requests the web page index.html from the server. According to the protocol, the server can only respond with an HTML-formatted string and cannot respond in any other format. For example:

<html>
  <body>Hello World</body>
</html>

After the server finishes sending, it will immediately close the TCP connection. Although this version is simple, it laid the foundation for the subsequent development of the HTTP protocol and marked the establishment of a simple communication mode between the client and the server.
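
The exchange above can be modelled in a few lines. This is a minimal sketch that skips sockets entirely: PAGES and handle_http09 are illustrative names made up for this example, and the error pages are an assumption, since HTTP/0.9 has no status codes to report failures with.

```python
# Toy model of an HTTP/0.9 exchange (no sockets involved).
# PAGES and handle_http09 are illustrative names, not part of any library.
PAGES = {
    "/index.html": "<html>\n  <body>Hello World</body>\n</html>\n",
}

def handle_http09(request_line: str) -> str:
    """Return the raw HTML for a one-line HTTP/0.9 request."""
    parts = request_line.strip().split()
    # HTTP/0.9 has exactly one method and no version token or headers.
    if len(parts) != 2 or parts[0] != "GET":
        return "<html><body>Bad request</body></html>\n"
    # No status line or headers: the response is just the document itself.
    return PAGES.get(parts[1], "<html><body>Not found</body></html>\n")

print(handle_http09("GET /index.html"))
```

In a real server, the connection would be closed right after this string is written, which is how the client knows the document is complete.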

II. HTTP/1.0: The Initial Expansion of Functions

In May 1996, the HTTP/1.0 version was released. Compared with HTTP/0.9, its content increased significantly, bringing important changes to the development of the Internet.

2.1 Introduction

Diversification of content formats: HTTP/1.0 allows the sending of content in any format. This makes the Internet no longer limited to text transmission, and various types of data such as images, videos, and binary files can be transmitted over the network, laying a solid foundation for the diversified development of the Internet.
Rich interactive commands: In addition to the GET command, the POST and HEAD commands were introduced. POST is typically used to submit data to the server, such as the information entered during user registration and login. HEAD is used to obtain a resource's meta-information without returning the resource content itself. These commands greatly enriched the ways browsers and servers can interact.
Changes in request and response formats: Every message must now carry header information (HTTP headers) in addition to the data part. The headers describe metadata such as the source of the request, the type of client, and the acceptable data formats. HTTP/1.0 also added status codes, multi-charset support, multi-part types, authorization, caching, and content encoding. The status code indicates how the server handled the request: for example, 200 means the request succeeded and 404 means the resource was not found. Multi-charset support lets content in different languages display correctly, and caching reduces repeated requests and speeds up access.

2.2 Request Format

GET / HTTP/1.0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5)
Accept: */*

Compared with HTTP/0.9, the request format of version 1.0 has changed significantly. The first line is the request command, and the protocol version (HTTP/1.0) must be added at the end. It is followed by several lines of header information that describe the client. The User-Agent field identifies the type and version of the client, and the Accept field declares the data formats that the client can accept.
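
To make the request layout concrete, here is a minimal parsing sketch. The function name parse_request is an assumption for illustration; it expects a complete, well-formed request head and does no error handling.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP/1.0 request head into its parts."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("ascii")
    lines = head.split("\r\n")
    # First line: request command, target, and protocol version.
    method, target, version = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers

raw = (
    b"GET / HTTP/1.0\r\n"
    b"User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5)\r\n"
    b"Accept: */*\r\n"
    b"\r\n"
)
method, target, version, headers = parse_request(raw)
print(method, target, version)
print(headers["accept"])
```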

2.3 Response Format

HTTP/1.0 200 OK
Content-Type: text/plain
Content-Length: 137582
Expires: Thu, 05 Dec 1997 16:00:00 GMT
Last-Modified: Wed, 5 August 1996 15:55:28 GMT
Server: Apache 0.84

<html>
  <body>Hello World</body>
</html>

The server’s response format is “header information + a blank line (\r\n) + data”. The first line is “protocol version + status code + status description”. The status code 200 indicates a successful request, and OK is the status description. The Content-Type field indicates the type of data, the Content-Length field gives the length of the data in bytes, the Expires field specifies when the resource expires, the Last-Modified field gives the time the resource was last modified, and the Server field identifies the type and version of the server.
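
The “header information + blank line + data” layout can be sketched as follows. build_response is a hypothetical helper written for this article, not a library function, and it fills in Content-Length from the actual body rather than using the sample value above.

```python
def build_response(status: int, reason: str, headers: dict, body: bytes) -> bytes:
    """Serialize a response as status line + headers + blank line + body."""
    lines = [f"HTTP/1.0 {status} {reason}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # The empty line (a bare CRLF) marks the end of the header block.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

body = b"<html>\n  <body>Hello World</body>\n</html>\n"
response = build_response(
    200, "OK",
    {"Content-Type": "text/html", "Content-Length": len(body)},
    body,
)
# Splitting on the first blank line recovers the two halves again.
head, _, payload = response.partition(b"\r\n\r\n")
print(head.decode("ascii"))
```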

2.4 Content-Type Field

In HTTP/1.0, the header information must be in ASCII, while the data that follows can be in any format. Therefore, when the server responds, it needs to tell the client the format of the data through the Content-Type field. Common values of the Content-Type field include:

Text types: text/plain (plain text), text/html (HTML document), text/css (CSS style sheet).
Image types: image/jpeg (JPEG image), image/png (PNG image), image/svg+xml (SVG vector graphic).
Audio and video types: audio/mp4 (MP4 audio), video/mp4 (MP4 video).
Application types: application/javascript (JavaScript script), application/pdf (PDF document), application/zip (ZIP compressed file), application/atom+xml (Atom XML document).

These data types are collectively called MIME types. Each value consists of a primary type and a secondary type, separated by a slash. In addition to the predefined types, vendors can define custom types, such as application/vnd.debian.binary-package, which indicates that the data sent is a Debian binary package. A MIME type can also carry parameters after a semicolon. For example, Content-Type: text/html; charset=utf-8 indicates that the data sent is a web page encoded in UTF-8. When the client makes a request, it can use the Accept field to declare the data formats it can accept; Accept: */* means the client accepts data in any format. MIME types are used outside the HTTP protocol as well. For example, in an HTML page, the encoding and content type can be specified through <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> or <meta charset="utf-8" />.
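
As an illustration of the “type/subtype; parameter” structure, the sketch below splits a Content-Type value by hand. parse_content_type is a made-up name, and the naive split on semicolons is an assumption that breaks on quoted parameter values containing a semicolon.

```python
def parse_content_type(value: str):
    """Split 'type/subtype; key=value' into (type, subtype, params)."""
    media, *raw_params = [part.strip() for part in value.split(";")]
    main_type, _, subtype = media.lower().partition("/")
    params = {}
    for item in raw_params:
        if not item:
            continue  # tolerate a trailing semicolon
        key, _, val = item.partition("=")
        params[key.strip().lower()] = val.strip().strip('"')
    return main_type, subtype, params

print(parse_content_type("text/html; charset=utf-8"))
# ('text', 'html', {'charset': 'utf-8'})
```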

2.5 Content-Encoding Field

Since the data sent can be in any format, it can be compressed before sending to improve transmission efficiency. The Content-Encoding field indicates the compression method applied to the data. Common values are gzip, compress, and deflate. When the client makes a request, it uses the Accept-Encoding field to indicate the compression methods it can accept, such as Accept-Encoding: gzip, deflate.
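
The negotiation described above might look like this on the server side. This is a simplified sketch using Python's standard gzip module: encode_body is a hypothetical helper, and it ignores quality values such as q=0, which a real implementation would have to honor.

```python
import gzip

def encode_body(body: bytes, accept_encoding: str):
    """Return (payload, extra_headers) based on the Accept-Encoding value."""
    # Keep only the coding names; any ";q=..." weights are dropped (assumption).
    offered = [token.split(";")[0].strip().lower()
               for token in accept_encoding.split(",")]
    if "gzip" in offered:
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    # Fall back to sending the body unmodified.
    return body, {}

body = b"Hello World\n" * 100
payload, extra = encode_body(body, "gzip, deflate")
print(len(body), len(payload), extra)

# The client reverses the step after reading the Content-Encoding header.
restored = gzip.decompress(payload)
```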

2.6 Disadvantages

The main disadvantage of HTTP/1.0 is that each TCP connection can carry only one request. Once the data has been sent, the connection is closed, and a new connection must be established to request any other resource. Establishing a new TCP connection is relatively expensive: the client and server must perform a three-way handshake, and the initial transmission rate is low (slow start), so HTTP/1.0 performs poorly. As web pages loaded more and more external resources, this problem became more prominent. To work around it, some browsers sent a non-standard Connection field in the request:

Connection: keep-alive

This field asks the server not to close the TCP connection so that it can be reused for other requests. If the server also responds with this field, the connection stays open and reusable until the client or server actively closes it. However, this is not a standard field, and the behavior of different implementations may vary, so it is not a fundamental solution.

III. HTTP/1.1: A Classic Version Widely Used

In January 1997, HTTP/1.1 was released, only about eight months after version 1.0. It further refined the HTTP protocol and is still widely used today.

3.1 Persistent Connection

The biggest change in HTTP/1.1 is the introduction of the persistent connection: the TCP connection is not closed by default and can be reused by multiple requests without declaring Connection: keep-alive. Either side may close the connection when it finds that the other party has been inactive for a while. However, the standard practice is for the client to send Connection: close with its last request, explicitly asking the server to close the TCP connection. Currently, most browsers allow up to 6 persistent connections to the same domain name, which greatly improves the efficiency of the HTTP protocol.
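
The difference between the two defaults can be summarized as a small decision function. This is a sketch under assumptions: keeps_connection_open is an illustrative name, and the headers dictionary is assumed to use lowercase keys.

```python
def keeps_connection_open(version: str, headers: dict) -> bool:
    """Decide whether the connection stays open after this message."""
    tokens = {t.strip().lower()
              for t in headers.get("connection", "").split(",") if t.strip()}
    if "close" in tokens:
        return False                 # an explicit close always wins
    if version == "HTTP/1.1":
        return True                  # persistent by default
    return "keep-alive" in tokens    # HTTP/1.0 needs an explicit opt-in

print(keeps_connection_open("HTTP/1.0", {}))                             # False
print(keeps_connection_open("HTTP/1.0", {"connection": "keep-alive"}))   # True
print(keeps_connection_open("HTTP/1.1", {}))                             # True
print(keeps_connection_open("HTTP/1.1", {"connection": "close"}))        # False
```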

3.2 Pipelining

The HTTP/1.1 version also introduced the pipelining mechanism. In the same TCP connection, the client can send multiple requests simultaneously. For example, if the client needs to request two resources, the previous approach was to send request A first in the same TCP connection, then wait for the server’s response, and send request B after receiving the response. The pipelining mechanism allows the browser to send request A and request B at the same time. Although the server still responds to request A first in sequence and then responds to request B after completion, this method further improves the efficiency of the HTTP protocol.

3.3 Content-Length Field

In HTTP/1.1, a single TCP connection can carry multiple responses. Therefore, a mechanism is needed to tell which response a given piece of data belongs to. The Content-Length field does this by declaring the length of the data in the current response. For example:

Content-Length: 3495

This header tells the browser that the body of this response is 3495 bytes long and that any bytes after it belong to the next response. In version 1.0, the Content-Length field was not required, because the browser knew that all the data had been received once it detected that the server had closed the TCP connection. In version 1.1, since the TCP connection can be reused, the data length must be stated explicitly to separate one response from the next.
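
Here is a minimal sketch of how a client could use Content-Length to separate back-to-back responses. split_responses is a hypothetical helper, and it assumes the whole byte stream is already in memory and that every response declares its length.

```python
def split_responses(stream: bytes):
    """Yield (head, body) pairs from back-to-back responses on one connection."""
    while stream:
        head, _, rest = stream.partition(b"\r\n\r\n")
        length = 0
        for line in head.split(b"\r\n")[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        yield head, rest[:length]
        stream = rest[length:]       # whatever follows is the next response

stream = (
    b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
    b"HTTP/1.1 200 OK\r\nContent-Length: 6\r\n\r\nworld!"
)
bodies = [body for _, body in split_responses(stream)]
print(bodies)   # [b'hello', b'world!']
```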

3.4 Chunked Transfer Encoding

Using the Content-Length field requires the server to know the length of the response data before it starts sending. For time-consuming dynamic operations, this means the server has to wait until all the work is finished before sending anything, which is inefficient. To solve this problem, HTTP/1.1 allows the Content-Length field to be omitted in favor of “chunked transfer encoding”. If the request or response header contains a Transfer-Encoding field, the body consists of an undetermined number of data chunks. For example:

Transfer-Encoding: chunked

Before each non-empty data chunk, there is a hexadecimal value indicating the length of that chunk. Finally, a chunk of size 0 indicates that all the data in this response has been sent. The following is an example:

HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

25
This is the data in the first chunk
1C
and this is the second one
3
con
8
sequence
0

This method allows the server to send data as soon as it is generated, replacing the “buffer mode” with the “stream mode”, improving the efficiency of data transmission.
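
A decoder for this framing can be sketched in a few lines. decode_chunked is an illustrative name; the sketch assumes the complete body is available and ignores chunk extensions and trailers. In the sample above, the sizes 25 and 1C (37 and 28 in decimal) count the line break that ends the text of the first two chunks, so those line breaks are part of the data in the byte string below.

```python
def decode_chunked(data: bytes) -> bytes:
    """Reassemble a chunked body; ignores chunk extensions and trailers."""
    body = b""
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size = int(data[pos:line_end].split(b";")[0], 16)   # hex chunk size
        if size == 0:
            return body                                     # last chunk
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2                              # skip CRLF after the data

encoded = (
    b"25\r\nThis is the data in the first chunk\r\n\r\n"
    b"1C\r\nand this is the second one\r\n\r\n"
    b"3\r\ncon\r\n"
    b"8\r\nsequence\r\n"
    b"0\r\n\r\n"
)
print(decode_chunked(encoded).decode("ascii"))
```

The last two chunks join into the single word "consequence", which shows that chunk boundaries carry no meaning for the content itself.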

3.5 Other Functions

HTTP/1.1 also added several methods, such as PUT (used to create or replace a resource), OPTIONS (used to obtain information such as the request methods supported by the server), and DELETE (used to delete a resource). HEAD, which is similar to GET but returns only header information and not the resource content, was carried over from HTTP/1.0, and PATCH (used to partially update a resource) was added later as an extension. In addition, the client request header gained a Host field, which specifies the domain name of the server. For example:

Host: www.example.com

With the Host field, requests can be sent to different websites on the same server, laying the foundation for the rise of virtual hosts. Through the Host field, the server can provide different services according to different domain names, achieving resource sharing and isolation.
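
Host-based routing can be sketched as a simple lookup. The SITES table, the second domain name, and select_site are all hypothetical, chosen only to show one server process answering for two names.

```python
# Hypothetical site table: two domains served by one process on one IP address.
SITES = {
    "www.example.com": "<html><body>Example home</body></html>",
    "blog.example.com": "<html><body>Example blog</body></html>",
}

def select_site(headers: dict) -> str:
    """Pick the content for a request from its Host header (lowercase keys assumed)."""
    host = headers.get("host", "").split(":")[0].lower()   # drop an optional port
    if host not in SITES:
        return "<html><body>Unknown host</body></html>"
    return SITES[host]

print(select_site({"host": "www.example.com"}))
print(select_site({"host": "blog.example.com:80"}))
```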

3.6 Disadvantages

Although HTTP/1.1 allows TCP connections to be reused, all communication within one connection is strictly sequential. The server only starts on the next response after finishing the current one. If one response is particularly slow, all the requests behind it have to queue up and wait, which is called “head-of-line blocking”. There are only two ways to avoid this: reduce the number of requests, or open multiple persistent connections at the same time. This has led to many web page optimization techniques, such as merging scripts and style sheets, embedding images in CSS code, and domain sharding. If the HTTP protocol had been designed better, this extra work could have been avoided.
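
The cost of head-of-line blocking is easy to see with a little arithmetic. The sketch below uses made-up service times and a hypothetical helper, in_order_finish_times, which simply accumulates the time each response has to wait for the ones ahead of it.

```python
from itertools import accumulate

def in_order_finish_times(service_times):
    """Finish time of each response when responses must be sent strictly in order."""
    return list(accumulate(service_times))

# Hypothetical service times in milliseconds: one slow response, then two fast ones.
print(in_order_finish_times([500, 10, 10]))   # [500, 510, 520]
# The same work with the slow response last: the fast ones no longer wait for it.
print(in_order_finish_times([10, 10, 500]))   # [10, 20, 520]
```

In the first case the two 10 ms responses each take more than half a second to arrive, purely because of their position in the queue.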

IV. SPDY Protocol: The Precursor of HTTP/2

In 2009, Google publicly released its self-developed SPDY protocol, which mainly aimed to address the inefficiency of HTTP/1.1. After it proved feasible in the Chrome browser, SPDY was used as the basis for HTTP/2, which inherited its main features. SPDY improved transmission efficiency through optimizations such as header compression and multiplexing, providing valuable experience and a technical foundation for HTTP/2.

V. HTTP/2: The Efficient Next-Generation Protocol

In 2015, HTTP/2 was released. It is not called HTTP/2.0 because the standards committee does not plan to release sub-versions anymore. The next new version will be HTTP/3. HTTP/2 is of great significance in the development history of the HTTP protocol and has brought a series of remarkable improvements.

5.1 Binary Protocol

The header information in HTTP/1.1 is text (ASCII-encoded), and the data body can be text or binary. HTTP/2, by contrast, is a fully binary protocol. Both the header information and the data body are binary and are collectively called “frames”, including header frames and data frames. One advantage of a binary protocol is that additional frame types can be defined. HTTP/2 defines ten frame types, laying a good foundation for future advanced applications. Implementing these functions in text would make parsing very troublesome, whereas binary parsing is much more convenient. The binary format lets data be transmitted and processed more efficiently, improving the performance and flexibility of the protocol.
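
To show why binary parsing is convenient, the sketch below decodes the fixed 9-byte header that the HTTP/2 specification places in front of every frame: a 24-bit payload length, an 8-bit type, an 8-bit flags field, and a reserved bit followed by a 31-bit stream identifier. parse_frame_header is an illustrative name, and the sample bytes are constructed by hand.

```python
import struct

def parse_frame_header(header: bytes):
    """Decode the fixed 9-byte header that precedes every HTTP/2 frame."""
    if len(header) != 9:
        raise ValueError("an HTTP/2 frame header is exactly 9 bytes")
    # Big-endian: 1 + 2 bytes of length, 1 byte type, 1 byte flags, 4 bytes stream id.
    length_hi, length_lo, frame_type, flags, stream_id = struct.unpack(">BHBBI", header)
    length = (length_hi << 16) | length_lo      # 24-bit payload length
    stream_id &= 0x7FFFFFFF                     # top bit is reserved
    return length, frame_type, flags, stream_id

# A HEADERS frame (type 0x1) with END_HEADERS (0x4) set, 13-byte payload, stream 1.
header = bytes([0x00, 0x00, 0x0D, 0x01, 0x04, 0x00, 0x00, 0x00, 0x01])
print(parse_frame_header(header))   # (13, 1, 4, 1)
```

Every field sits at a fixed offset, so no scanning for line breaks or separators is needed.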

5.2 Multiplexing

HTTP/2 reuses the TCP connection. Within one connection, both the client and the server can send multiple requests or responses at the same time, and they do not need to correspond one-to-one in order, which avoids “head-of-line blocking” at the HTTP level. For example, suppose the server receives request A and request B at the same time on one TCP connection. It starts responding to request A. If it finds that processing A is very time-consuming, it sends the part of A's response that is ready, then responds to request B. Once B is complete, it sends the remaining part of A's response. This two-way, real-time communication is called multiplexing. Multiplexing lets HTTP/2 handle multiple requests and responses concurrently on the same connection, greatly improving transmission efficiency.

5.3 Data Streams

Since the data packets in HTTP/2 are sent out of order, consecutive data packets on the same connection may belong to different responses. Therefore, the data packets must be marked to indicate which response they belong to. HTTP/2 calls all the data packets of each request or response a data stream. Each data stream has a unique number, and every data packet carries the data stream ID so that the receiver can tell which stream it belongs to. In addition, data streams initiated by the client have odd-numbered IDs, and those initiated by the server have even-numbered IDs. While a data stream is partway through being sent, either the client or the server can send a signal (an RST_STREAM frame) to cancel it. In HTTP/1.1, the only way to cancel a data stream is to close the TCP connection. In HTTP/2, a single request can be cancelled while the TCP connection stays open and usable by other requests. The client can also specify the priority of a data stream. The higher the priority, the earlier the server will respond. Through data streams and the mechanisms around them, HTTP/2 achieves more flexible and efficient data transmission and management.
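
The role of the stream ID can be illustrated with a toy reassembly step. The frames here are plain (stream_id, payload) tuples, which is a simplification of real HTTP/2 frames, and reassemble is a hypothetical helper.

```python
from collections import defaultdict

def reassemble(frames):
    """Group interleaved (stream_id, payload) frames back into per-stream bodies."""
    streams = defaultdict(bytes)
    for stream_id, payload in frames:
        streams[stream_id] += payload
    return dict(streams)

# Hypothetical arrival order: response A (stream 1) is interrupted by response B (stream 3).
frames = [
    (1, b"first part of A, "),
    (3, b"all of B"),
    (1, b"rest of A"),
]
print(reassemble(frames))
# {1: b'first part of A, rest of A', 3: b'all of B'}
```

Both IDs are odd because both streams were opened by client requests.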

5.4 Header Compression

The HTTP protocol is stateless, so all information must be attached to each request. As a result, many fields are repeated from one request to the next, such as Cookie and User-Agent. Sending the same content with every request wastes a lot of bandwidth and slows things down. HTTP/2 addresses this with a dedicated header compression scheme called HPACK (SPDY had used general-purpose zlib compression for headers, which HTTP/2 replaced). On the one hand, header strings can be Huffman-coded before being sent. On the other hand, both the client and the server maintain a header table. Fields are stored in this table and assigned an index number, so a field that has already been sent does not need to be sent again; only its index number is sent. This mechanism substantially reduces the amount of data transmitted and improves transmission efficiency.
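
The indexing idea can be shown with a deliberately simplified table. This toy class is not HPACK: it has no static table, no size limits or eviction, and no Huffman coding, and the HeaderTable name and its methods are made up for illustration.

```python
class HeaderTable:
    """Toy indexed header table; real HTTP/2 uses HPACK, which is more involved."""

    def __init__(self):
        self.entries = []

    def encode(self, headers):
        out = []
        for field in headers:                  # field is a (name, value) tuple
            if field in self.entries:
                out.append(self.entries.index(field))   # send only the index
            else:
                self.entries.append(field)
                out.append(field)                       # send the full field once
        return out

    def decode(self, items):
        headers = []
        for item in items:
            if isinstance(item, int):
                headers.append(self.entries[item])      # look up a known field
            else:
                self.entries.append(item)               # learn a new field
                headers.append(item)
        return headers

request = [("user-agent", "Mozilla/5.0"), ("cookie", "session=abc")]
sender, receiver = HeaderTable(), HeaderTable()
first = sender.encode(request)
second = sender.encode(request)
print(first)    # full fields on the first request
print(second)   # [0, 1] on the repeat
decoded_first = receiver.decode(first)
decoded_second = receiver.decode(second)
```

Because both sides update their tables in the same order, the receiver can turn the two small integers of the second request back into the full header list.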

5.5 Server Push

HTTP/2 allows the server to actively send resources to the client without a request, which is called server push. A common scenario is when a client requests a web page that contains many static resources. Under normal circumstances, the client must receive the web page, parse the HTML source code, discover the static resources, and then send requests for the static resources. However, the server can anticipate that after the client requests the web page, it is likely to request the static resources, so it actively sends these static resources to the client along with the web page. The server push technology reduces the number of client requests and improves the user experience.

The history of the HTTP protocol is one of continuous evolution and optimization. From the simple HTTP/0.9 to the feature-rich HTTP/1.1 and then to the efficient HTTP/2, each version has addressed the problems of its predecessor and improved performance and functionality. HTTP/3, standardized in 2022 and built on QUIC rather than TCP, continues this evolution and is also worth watching as it drives Internet communication forward.