
From Scratch: Building HTTP/2 and WebSocket with Raw Python Sockets

Leapcell
7 min read · Jun 5, 2025

Leapcell: The Best of Serverless Web Hosting

Implementation of HTTP/1.0, HTTP/2.0, and WebSocket Protocols Using Pure Python Sockets

Introduction

Network protocols are the foundation of the internet. HTTP/1.0, HTTP/2.0, and WebSocket each serve modern web applications in different scenarios. This article implements the core logic of all three using pure Python sockets, to show how they communicate at the byte level. The example code is written for Python 3.8+ and touches on network programming, protocol parsing, and byte stream processing.

1. Implementation of HTTP/1.0 Protocol

1.1 Overview of HTTP/1.0 Protocol

HTTP/1.0 is an early stateless request-response protocol based on TCP connections. It uses short connections by default (closing the connection after each request). Its request consists of a request line, request headers, and a request body, while the response includes a status line, response headers, and a response body.
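
To make that message structure concrete, the sketch below assembles one minimal request and one response by hand. The host name `example.local` and the body text are illustrative assumptions, not values from the article.

```python
# Illustrative HTTP/1.0 messages; example.local and the body text are made-up values.
CRLF = '\r\n'

request = (
    'GET /hello HTTP/1.0' + CRLF      # request line: method, path, version
    + 'Host: example.local' + CRLF    # one request header
    + CRLF                            # blank line ends the header section
)                                     # this GET carries no body

body = 'Hello'
response = (
    'HTTP/1.0 200 OK' + CRLF          # status line: version, code, reason phrase
    + f'Content-Length: {len(body.encode("utf-8"))}' + CRLF
    + 'Content-Type: text/plain' + CRLF
    + CRLF                            # blank line separates headers from body
    + body
)

# Split the response back into its head and body at the first blank line
head, _, payload = response.partition(CRLF + CRLF)
print(head.split(CRLF)[0])  # the status line
print(payload)              # the body
```

Both messages are plain text on the wire, which is why a single blank line (`\r\n\r\n`) is enough to find the boundary between headers and body.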

1.2 Server-Side Implementation Steps

1.2.1 Creating a TCP Socket

import socket

def create_http1_server(port=8080):
    server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_socket.bind(('0.0.0.0', port))
    server_socket.listen(5)
    print(f"HTTP/1.0 Server listening on port {port}")
    return server_socket

1.2.2 Parsing Request Data

Use regular expressions to parse the request line and headers:

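import re

# Request line, zero or more header lines, a blank line, then the optional body
REQUEST_PATTERN = re.compile(
    r'^([A-Z]+) +(\S+) +HTTP/1\.\d\r\n'
    r'((?:[^\r\n]+\r\n)*)'
    r'\r\n(.*)',
    re.DOTALL
)

def parse_http1_request(data):
    try:
        text = data.decode('utf-8')
    except UnicodeDecodeError:
        return None
    match = REQUEST_PATTERN.match(text)
    if not match:
        return None
    method, path, headers_str, body = match.groups()
    headers = {}
    for line in headers_str.split('\r\n'):
        if ':' in line:
            # Tolerate headers written with or without a space after the colon
            name, _, value = line.partition(':')
            headers[name.strip()] = value.strip()
    return {
        'method': method,
        'path': path,
        'headers': headers,
        'body': body
    }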

1.2.3 Generating Response Data

from http import HTTPStatus

def build_http1_response(status_code=200, body='', headers=None):
    # Look up the reason phrase so that 400 and 404 are not reported as "OK"
    reason = HTTPStatus(status_code).phrase
    status_line = f'HTTP/1.0 {status_code} {reason}\r\n'
    header_lines = ['Content-Length: %d\r\n' % len(body.encode('utf-8'))]
    if headers:
        header_lines.extend([f'{k}: {v}\r\n' for k, v in headers.items()])
    return (status_line + ''.join(header_lines) + '\r\n' + body).encode('utf-8')

1.2.4 Main Processing Loop

def handle_http1_connection(client_socket):
    try:
        request_data = client_socket.recv(4096)
        if not request_data:
            return
        request = parse_http1_request(request_data)
        if not request:
            response = build_http1_response(400, 'Bad Request')
        elif request['path'] == '/hello':
            response = build_http1_response(200, 'Hello, HTTP/1.0!')
        else:
            response = build_http1_response(404, 'Not Found')
        client_socket.sendall(response)
    finally:
        client_socket.close()

if __name__ == '__main__':
    server_socket = create_http1_server()
    while True:
        client_socket, addr = server_socket.accept()
        handle_http1_connection(client_socket)

1.3 Key Feature Explanations

  • Short Connection Handling: Immediately closes the connection after processing each request (client_socket.close()).
  • Request Parsing: Matches the request structure using regular expressions to handle common GET requests.
  • Response Generation: Manually constructs the status line, response headers, and response body, ensuring the accuracy of the Content-Length header.
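
The short-connection behavior is easiest to see from the client side: because the server closes the socket after one response, a client can simply read until `recv` returns an empty byte string. The following is a minimal sketch of such a client; the helper names `build_http1_get`, `read_until_close`, and `http1_get` are hypothetical and not part of the server code above.

```python
import socket

def build_http1_get(path, host):
    """Serialize a minimal HTTP/1.0 GET request."""
    return f'GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n'.encode('ascii')

def read_until_close(sock):
    """Collect bytes until the peer closes the connection (recv returns b'')."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    return b''.join(chunks)

def http1_get(host, port, path='/hello'):
    """Send one GET over a fresh TCP connection and split the response."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(build_http1_get(path, host))
        raw = read_until_close(sock)
    head, _, body = raw.partition(b'\r\n\r\n')
    status_line = head.split(b'\r\n', 1)[0].decode('iso-8859-1')
    return status_line, body
```

With the server from section 1.2 running locally, `http1_get('127.0.0.1', 8080)` should return the status line and body of the `/hello` response. Every call opens a new TCP connection, which is the cost that short connections impose.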

2. Implementation of HTTP/2.0 Protocol (Simplified Version)

2.1 Core Features of HTTP/2.0

HTTP/2.0 is based on a binary framing layer and supports features such as multiplexing, header compression (HPACK), and server push. Its core is to decompose requests/responses into frames and manage communication through streams.

2.2 Frame Structure Definition

An HTTP/2.0 frame consists of the following parts:

+-----------------------------------------------+
|                 Length (24)                   |
+---------------+---------------+---------------+
|   Type (8)    |   Flags (8)   |
+-+-------------+---------------+-------------------------------+
|R|                 Stream Identifier (31)                      |
+=+=============================================================+
|                   Frame Payload (0...)                      ...
+---------------------------------------------------------------+
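
As a rough illustration of this layout, the sketch below packs and unpacks the fixed 9-byte frame header with the standard `struct` module. The function names `pack_frame_header` and `unpack_frame_header` are hypothetical helpers introduced only for this example.

```python
import struct

def pack_frame_header(length, frame_type, flags, stream_id):
    """Serialize the 9-byte header: 24-bit length, type, flags, reserved bit + 31-bit stream ID."""
    if not 0 <= length < 2 ** 24:
        raise ValueError('length must fit in 24 bits')
    # '>I' yields 4 big-endian bytes; dropping the first leaves the 24-bit length
    length_bytes = struct.pack('>I', length)[1:]
    # Masking with 0x7FFFFFFF keeps the reserved top bit at zero
    rest = struct.pack('>BBI', frame_type, flags, stream_id & 0x7FFFFFFF)
    return length_bytes + rest

def unpack_frame_header(header):
    """Parse a 9-byte header into (length, type, flags, stream_id)."""
    if len(header) != 9:
        raise ValueError('a frame header is exactly 9 bytes')
    length = int.from_bytes(header[:3], 'big')
    frame_type, flags, stream_id = struct.unpack('>BBI', header[3:])
    return length, frame_type, flags, stream_id & 0x7FFFFFFF

# A HEADERS frame (type 0x01) with END_HEADERS (0x04) on stream 1, 5-byte payload
header = pack_frame_header(5, 0x01, 0x04, 1)
print(header.hex())
print(unpack_frame_header(header))
```

The stream identifier occupies four bytes on the wire; the highest bit is reserved and ignored on receipt, which is why both functions mask it off.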

2.3 Simplified Implementation Approach

Due to the high complexity of HTTP/2.0, this example implements the following features:

  1. Handles GET request headers frames (HEADERS Frame) and data frames (DATA Frame).
  2. Does not implement HPACK compression, transmitting raw headers directly.
  3. Single-stream processing, does not support multiplexing.

2.4 Server-Side Code Implementation

2.4.1 Frame Constructors

def build_frame_header(length, frame_type, flags, stream_id):
    """Build the 9-byte frame header shared by every frame type"""
    return (
        length.to_bytes(3, 'big') +                   # 24-bit payload length
        bytes([frame_type, flags]) +                  # 8-bit type, 8-bit flags
        (stream_id & 0x7FFFFFFF).to_bytes(4, 'big')   # reserved bit (0) + 31-bit stream ID
    )

def build_headers_frame(stream_id, headers):
    """Build a HEADERS frame (simplified version without HPACK compression)"""
    header_block = ''.join([f'{k}:{v}\r\n' for k, v in headers.items()]).encode('utf-8')
    # TYPE=HEADERS (0x01), FLAGS=END_HEADERS (0x04): the whole header block fits in this frame
    return build_frame_header(len(header_block), 0x01, 0x04, stream_id) + header_block

def build_data_frame(stream_id, data):
    """Build a DATA frame"""
    # TYPE=DATA (0x00), FLAGS=END_STREAM (0x01): this frame ends the response
    return build_frame_header(len(data), 0x00, 0x01, stream_id) + data

2.4.2 Connection Handling Logic

HTTP2_PREFACE = b'PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n'

def recv_exact(sock, n):
    """Read exactly n bytes, or return None if the peer closes first."""
    buf = b''
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            return None
        buf += chunk
    return buf

def handle_http2_connection(client_socket):
    try:
        # The client opens with the connection preface; the server replies with a SETTINGS frame
        if recv_exact(client_socket, len(HTTP2_PREFACE)) != HTTP2_PREFACE:
            return
        client_socket.sendall(b'\x00\x00\x00\x04\x00\x00\x00\x00\x00')  # empty SETTINGS on stream 0

        # Read client frame (simplified processing, assuming the first frame is a HEADERS frame)
        frame_header = recv_exact(client_socket, 9)
        if frame_header is None:
            return
        length = int.from_bytes(frame_header[:3], 'big')
        frame_type = frame_header[3]
        # The last four header bytes hold the reserved bit plus the 31-bit stream ID
        stream_id = int.from_bytes(frame_header[5:9], 'big') & 0x7FFFFFFF
        if frame_type != 0x01:  # Non-HEADERS frame
            return
        # Read header data (simplified processing, does not parse HPACK)
        header_data = recv_exact(client_socket, length)
        if header_data is None:
            return
        headers = {}
        for line in header_data.split(b'\r\n'):
            # Search from index 1 so the leading ':' of a pseudo-header is not taken as the separator
            sep = line.find(b':', 1)
            if sep > 0:
                headers[line[:sep].decode()] = line[sep + 1:].decode().strip()
        # Process request path
        path = headers.get(':path', '/')
        if path == '/hello':
            response_data = b'Hello, HTTP/2.0!'
            response_headers = {
                ':status': '200',
                'content-type': 'text/plain',
                'content-length': str(len(response_data))
            }
        else:
            response_headers = {':status': '404'}
            response_data = b'Not Found'
        # Send response frames
        headers_frame = build_headers_frame(stream_id, response_headers)
        data_frame = build_data_frame(stream_id, response_data)
        client_socket.sendall(headers_frame + data_frame)
    except Exception as e:
        print(f"HTTP/2 Error: {e}")
    finally:
        client_socket.close()

2.5 Implementation Limitations

  • HPACK Compression Not Implemented: Transmits plaintext headers directly, differing from standard HTTP/2.
  • Single-Stream Processing: Each connection handles only one stream, does not implement multiplexing.
  • Simplified Frame Parsing: Handles only HEADERS and DATA frames, does not process error frames, settings frames, etc.
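
To give a sense of what the omitted HPACK layer does, here is a minimal sketch that decodes only the simplest HPACK representation: a one-byte indexed header field that points into the static table. It covers just the first 14 static-table entries from RFC 7541; the dynamic table, literal fields, Huffman coding, and multi-byte integers are all left out, and the name `decode_indexed_fields` is a hypothetical helper.

```python
# First 14 entries of the HPACK static table (RFC 7541, Appendix A)
STATIC_TABLE = {
    1: (':authority', ''),
    2: (':method', 'GET'),
    3: (':method', 'POST'),
    4: (':path', '/'),
    5: (':path', '/index.html'),
    6: (':scheme', 'http'),
    7: (':scheme', 'https'),
    8: (':status', '200'),
    9: (':status', '204'),
    10: (':status', '206'),
    11: (':status', '304'),
    12: (':status', '400'),
    13: (':status', '404'),
    14: (':status', '500'),
}

def decode_indexed_fields(block):
    """Decode a header block made only of one-byte indexed fields (high bit set)."""
    fields = []
    for byte in block:
        if not byte & 0x80:
            raise ValueError('only the indexed representation is handled in this sketch')
        index = byte & 0x7F  # the low 7 bits carry the table index
        if index not in STATIC_TABLE:
            raise ValueError(f'index {index} is outside the table subset used here')
        fields.append(STATIC_TABLE[index])
    return fields

# Three bytes are enough to express ":method: GET", ":scheme: http", ":path: /"
print(decode_indexed_fields(bytes([0x82, 0x86, 0x84])))
```

Three bytes replace roughly thirty characters of plaintext headers, which hints at why a real HTTP/2 peer will not understand the uncompressed header block used in this article.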

3. Implementation of WebSocket Protocol

3.1 Overview of WebSocket Protocol

WebSocket establishes a connection through an HTTP handshake and then provides full-duplex communication through binary frames. Its core process includes:

  1. HTTP Handshake: The client sends an upgrade request, and the server confirms the protocol switch.
  2. Frame Communication: Uses binary frames of a specific format to transmit data, supporting operations such as text, binary, and closing.

3.2 Handshake Protocol Implementation

3.2.1 Handshake Request Parsing

import base64
import hashlib

def parse_websocket_handshake(data):
    headers = {}
    lines = data.decode('utf-8').split('\r\n')
    for line in lines[1:]:  # Skip the request line
        if not line:
            break
        # partition() tolerates a header with no space after the colon
        key, _, value = line.partition(':')
        headers[key.strip().lower()] = value.strip()
    return {
        'sec_websocket_key': headers.get('sec-websocket-key'),
        'origin': headers.get('origin')
    }

3.2.2 Handshake Response Generation

def build_websocket_handshake_response(key):
    guid = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"
    hash_data = (key + guid).encode('utf-8')
    sha1_hash = hashlib.sha1(hash_data).digest()
    accept_key = base64.b64encode(sha1_hash).decode('utf-8')
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_key}\r\n"
        "\r\n"
    ).encode('utf-8')
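
As a quick check of the accept-key derivation, the sketch below runs the same SHA-1 and Base64 steps on the sample key given in RFC 6455. The function name `compute_accept_key` is a hypothetical helper that isolates the calculation.

```python
import base64
import hashlib

WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11'

def compute_accept_key(client_key):
    """Derive Sec-WebSocket-Accept from the client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + WS_GUID).encode('ascii')).digest()
    return base64.b64encode(digest).decode('ascii')

sample_key = 'dGhlIHNhbXBsZSBub25jZQ=='  # sample nonce from RFC 6455
print(compute_accept_key(sample_key))      # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A client performs the same calculation and rejects the connection if the server's value does not match, which confirms that the server actually understood the WebSocket upgrade rather than echoing an ordinary HTTP response.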

3.3 Frame Protocol Implementation

3.3.1 Frame Structure

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |            (16/64)            |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+-------------------------------+
|   Extended payload length continued, if payload len == 127   |
+---------------------------------------------------------------+
|                    Masking-key, if MASK set                   |
+---------------------------------------------------------------+
|                          Payload Data                         |
+---------------------------------------------------------------+
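
To connect the diagram to actual bytes, this small sketch splits the first two bytes of a frame into their bit fields. The function name `describe_frame_prefix` is a hypothetical helper; the sample bytes `0x81 0x85` open the masked "Hello" text frame used as an example in RFC 6455.

```python
def describe_frame_prefix(first_byte, second_byte):
    """Split the first two bytes of a WebSocket frame into their bit fields."""
    return {
        'fin': bool(first_byte & 0x80),        # bit 0: final fragment
        'rsv': (first_byte >> 4) & 0x07,       # bits 1-3: RSV1..RSV3
        'opcode': first_byte & 0x0F,           # bits 4-7: frame type
        'masked': bool(second_byte & 0x80),    # bit 8: MASK flag
        'payload_len': second_byte & 0x7F,     # bits 9-15: 7-bit length
    }

print(describe_frame_prefix(0x81, 0x85))
```

Here the opcode is 1 (text), the MASK bit is set as required for client-to-server frames, and the 7-bit length is 5, so no extended length field follows.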

3.3.2 Parsing Received Frames

def parse_websocket_frame(data):
    if len(data) < 2:
        return None
    first_byte, second_byte = data[0], data[1]
    fin = (first_byte >> 7) & 0x01
    opcode = first_byte & 0x0F
    mask = (second_byte >> 7) & 0x01
    payload_len = second_byte & 0x7F

    if payload_len == 126:
        payload_len = int.from_bytes(data[2:4], 'big')
        offset = 4
    elif payload_len == 127:
        payload_len = int.from_bytes(data[2:10], 'big')
        offset = 10
    else:
        offset = 2
    if mask:
        mask_key = data[offset:offset+4]
        offset += 4
    # Reject frames that have not fully arrived yet
    if len(data) < offset + payload_len:
        return None
    # Take exactly payload_len bytes so a following frame in the buffer is not mixed in
    raw = data[offset:offset + payload_len]
    if mask:
        payload = bytes(b ^ mask_key[i % 4] for i, b in enumerate(raw))
    else:
        payload = bytes(raw)
    return {
        'fin': fin,
        'opcode': opcode,
        'payload': payload
    }

3.3.3 Building Frames for Sending

def build_websocket_frame(data, opcode=0x01):  # Opcode 0x01 indicates a text frame
    payload = data.encode('utf-8') if isinstance(data, str) else data
    payload_len = len(payload)
    frame = bytearray()

    frame.append(0x80 | opcode)  # FIN=1, set opcode
    if payload_len < 126:
        frame.append(payload_len)
    elif payload_len <= 0xFFFF:
        frame.append(126)
        frame.extend(payload_len.to_bytes(2, 'big'))
    else:
        frame.append(127)
        frame.extend(payload_len.to_bytes(8, 'big'))
    frame.extend(payload)
    return bytes(frame)
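
The builder above produces unmasked frames, which is correct for the server-to-client direction. Frames travelling from client to server must be masked, so a test client needs the mirror image. The sketch below is one possible version limited to payloads under 126 bytes; `build_masked_frame` and `unmask_payload` are hypothetical names, and the fixed mask key in the example is the one from the RFC 6455 sample frame.

```python
import os

def build_masked_frame(payload, opcode=0x01, mask_key=None):
    """Build a client-to-server frame; payloads under 126 bytes only in this sketch."""
    if len(payload) >= 126:
        raise ValueError('extended lengths are omitted from this sketch')
    if mask_key is None:
        mask_key = os.urandom(4)  # a fresh random key per frame
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    # First byte: FIN=1 plus opcode; second byte: MASK=1 plus 7-bit length
    return bytes([0x80 | opcode, 0x80 | len(payload)]) + mask_key + masked

def unmask_payload(frame):
    """Recover the payload of a short masked frame produced above."""
    length = frame[1] & 0x7F
    key = frame[2:6]
    return bytes(b ^ key[i % 4] for i, b in enumerate(frame[6:6 + length]))

frame = build_masked_frame(b'Hello', mask_key=bytes.fromhex('37fa213d'))
print(frame.hex())            # 818537fa213d7f9f4d5158
print(unmask_payload(frame))  # b'Hello'
```

Masking is a plain XOR with a repeating 4-byte key, so applying the same operation twice returns the original bytes.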

3.4 Complete Server-Side Implementation

def handle_websocket_connection(client_socket):
    try:
        # Read the handshake request
        handshake_data = client_socket.recv(1024)
        handshake = parse_websocket_handshake(handshake_data)
        if not handshake['sec_websocket_key']:
            return
        # Send the handshake response
        response = build_websocket_handshake_response(handshake['sec_websocket_key'])
        client_socket.sendall(response)
        # Enter the message loop
        while True:
            frame_data = client_socket.recv(4096)
            if not frame_data:
                break
            frame = parse_websocket_frame(frame_data)
            if not frame:
                break
            if frame['opcode'] == 0x01:  # Text frame
                message = frame['payload'].decode('utf-8')
                print(f"Received: {message}")
                response_frame = build_websocket_frame(f"Echo: {message}")
                client_socket.sendall(response_frame)
            elif frame['opcode'] == 0x08:  # Close frame
                # Answer with a close frame before dropping the connection
                client_socket.sendall(build_websocket_frame(b'', opcode=0x08))
                break
    except Exception as e:
        print(f"WebSocket Error: {e}")
    finally:
        client_socket.close()
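
The message loop above treats each `recv` call as exactly one frame, but TCP may deliver half a frame or several frames at once. One way to handle that is to accumulate bytes in a buffer and extract frames only when they are complete. The class below is a minimal sketch of that idea; `FrameBuffer` is a hypothetical name, and the FIN bit and message fragmentation are ignored here.

```python
class FrameBuffer:
    """Accumulate raw bytes and return complete (opcode, payload) frames."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, data):
        """Add received bytes and return every frame that is now complete."""
        self._buf.extend(data)
        frames = []
        while True:
            frame = self._try_extract()
            if frame is None:
                return frames
            frames.append(frame)

    def _try_extract(self):
        buf = self._buf
        if len(buf) < 2:
            return None
        opcode = buf[0] & 0x0F
        masked = bool(buf[1] & 0x80)
        length = buf[1] & 0x7F
        offset = 2
        if length == 126:
            if len(buf) < 4:
                return None
            length = int.from_bytes(buf[2:4], 'big')
            offset = 4
        elif length == 127:
            if len(buf) < 10:
                return None
            length = int.from_bytes(buf[2:10], 'big')
            offset = 10
        key = b''
        if masked:
            if len(buf) < offset + 4:
                return None
            key = bytes(buf[offset:offset + 4])
            offset += 4
        if len(buf) < offset + length:
            return None  # wait for the rest of the payload
        payload = bytes(buf[offset:offset + length])
        if masked:
            payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
        del self._buf[:offset + length]  # consume the frame, keep any trailing bytes
        return opcode, payload
```

In the server loop, each chunk returned by `recv` would be passed to `feed`, and the returned list iterated over, so a split frame simply yields an empty list until its remaining bytes arrive.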

4. Conclusion

By implementing the three protocols using pure sockets, we have gained an in-depth understanding of the underlying mechanisms of network communication:

  • HTTP/1.0 is a basic request-response model suitable for simple scenarios.
  • HTTP/2.0 improves performance through binary framing and multiplexing, but its implementation complexity increases significantly.
  • WebSocket provides an efficient full-duplex channel for real-time communication and is widely used in modern web applications.

In real projects, prefer mature libraries and frameworks; implementing a protocol by hand is mainly a way to understand it better. Learning network protocols works best when specification documents (e.g., RFC 1945 for HTTP/1.0, RFC 7540 for HTTP/2, RFC 6455 for WebSocket) are combined with hands-on debugging, so that both the design ideas and the engineering details become familiar.
