Sometimes you want to have a globally accessible function like get_current_user() that somehow magically returns the current user for the duration of the request. How do you implement this? How is the user stored?
Or maybe you want get_current_request_id() or request_cache.add_to_cache(some_data) or get_tenant().
We are surely not going to pass user/request_id as arguments to every function in our app.
So, how is this made?
Let’s learn from an example
Let’s build get_current_request_id() as an example. I’m assuming we are talking about HTTP requests.
You can build all the other functions in the same way, even those that don’t have anything to do with HTTP requests.
First: we need to set the request_id somewhere
This somewhere is going to be the middleware (or a middleware, one of them).
Middlewares are called at the beginning of every request. We want to set the request_id at the beginning of every request and unset it after the request is done.
I don’t know what framework you are using, but most of them have a list of middleware classes/functions defined somewhere, usually in the settings.
In Django there is the setting MIDDLEWARE. In Flask, you can use the before_request decorator and the after_request decorator (or teardown_request, which also runs when the request raised an exception).
Pseudocode (a Django example is at the end of the article):
import request_storage  # <- our file
from uuid import uuid4

def my_middleware():
    # 1. Build a unique identifier for the request
    request_id = str(uuid4())
    # 2. Store the request_id (we'll implement this below)
    request_storage.set(request_id)
    try:
        ...  # this is where the request is processed
    finally:
        # 3. Unset the request_id after the request is done
        # The unset must happen even if there is an exception, this is why
        # finally is used and no except-block is needed.
        request_storage.set(None)
Level 1: Only used in school, on a single-process app
I don’t know what it is with schools, but the ones I’ve seen always want to teach concepts that are not used (anymore?) in the real world.
So… don’t use this in production 🫠. Or do; what do I know about what you’re building. 🙂
Create a Singleton class and store the request_id in it.
A Singleton is a class that can only have one instance in your app. It’s a design pattern. Trying to instantiate it 2x just returns the same instance both times.
Very simple example:
class RequestStorage:
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.request_id = None
        return cls._instance

    def set(self, request_id):
        self.request_id = request_id

    def get(self):
        return self.request_id
We can’t use this in production, because it’s not thread-safe.
Once you have 2 threads, they will both execute code at the same time, but they will each handle the request of a different user. They will both call RequestStorage().get() and get the SAME request_id. We don’t want this. We want a Singleton PER thread.
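To make the problem concrete, here is a small sketch of two "requests" running in two threads and sharing the Singleton. The two Event objects and the handler function names are my own additions; they only exist to force the interleaving that a busy server would produce by chance.

```python
import threading


class RequestStorage:
    """Same Singleton idea as above: one shared instance for the whole process."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.request_id = None
        return cls._instance


# Hypothetical helpers that force a specific interleaving of the two threads.
first_has_set = threading.Event()
second_has_set = threading.Event()
seen = {}


def handle_first_request():
    RequestStorage().request_id = "request-A"
    first_has_set.set()
    second_has_set.wait(timeout=5)  # the other request runs in the meantime
    seen["first"] = RequestStorage().request_id


def handle_second_request():
    first_has_set.wait(timeout=5)
    RequestStorage().request_id = "request-B"  # overwrites the shared value
    second_has_set.set()
    seen["second"] = RequestStorage().request_id


threads = [
    threading.Thread(target=handle_first_request),
    threading.Thread(target=handle_second_request),
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# The first request now reads the id that belongs to the second request.
print(seen["first"], seen["second"])
```

Both handlers end up reading "request-B", which is exactly the mix-up described above.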
Level 2: Multi-threading apps can use threading.local()
Python’s threading module handles all sorts of threading-related stuff. One of those things is handling data that is only visible to the current thread. That’s exactly what we are after.
Thread-local objects support the management of thread-local data. If you have data that you want to be local to a thread, simply create a thread-local object and use its attributes:
>>> mydata = local()
>>> mydata.number = 42
>>> mydata.number
42
- Excerpt from the docstring of threading.local() in CPython[1]
In our case we’d have something like so in request_storage.py:

import threading

_threading = threading.local()

def get() -> str | None:
    return getattr(_threading, "request_id", None)

def set(request_id: str | None):  # pylint: disable=redefined-builtin
    _threading.request_id = request_id
I didn’t bother with a class in this case, because we are storing just 1 string. But you can easily store more complex data if you want.
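If you want to convince yourself that every thread really gets its own value, a quick sketch like the following should do. The Barrier and the names set_request_id and handle_request are my own; the barrier only makes sure both threads have written before either one reads.

```python
import threading

_local = threading.local()


def get():
    return getattr(_local, "request_id", None)


def set_request_id(request_id):
    _local.request_id = request_id


# Hypothetical helper: both threads wait here until both have stored a value.
both_have_set = threading.Barrier(2)
seen = {}


def handle_request(name):
    set_request_id(f"request-{name}")
    both_have_set.wait(timeout=5)
    seen[name] = get()  # each thread reads back only its own value


threads = [threading.Thread(target=handle_request, args=(name,)) for name in ("A", "B")]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# The main thread never stored anything, so it falls back to None.
seen["main"] = get()
print(seen)
```

Thread A sees "request-A", thread B sees "request-B", and the main thread sees None.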
Level 3: I have asyncio code, threading.local() doesn’t work
Once you have some asyncio code, you need contextvars.ContextVar.
We introduced asyncio code to a part of our codebase, because we were doing a bunch of HTTP requests to other APIs and were mostly just waiting for them to respond.
threading.local() doesn’t work with asyncio, because asyncio runs on a single thread. It instead uses coroutines and tasks. You can read more about them in Python docs.
Our example would look like this in request_storage.py:

import contextvars

_request_id: contextvars.ContextVar[str | None] = contextvars.ContextVar("request_id")

def get() -> str | None:
    return _request_id.get(None)

def set(request_id: str | None):  # pylint: disable=redefined-builtin
    _request_id.set(request_id)
ContextVar objects work great for 2 reasons:
- if their value is modified inside an asyncio task, the modification is only visible to that task, because every task runs in its own copy of the context
- if we set the value from outside asyncio code, then the value is visible to all coroutines that run after that => we don’t have to manually pass request_id to asyncio-functions
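Here is a minimal sketch of both properties. The second variable _tenant and the function handle_request are made up for the illustration; asyncio.gather wraps each coroutine in its own task, which is where the per-task copy of the context comes from.

```python
import asyncio
import contextvars

_request_id = contextvars.ContextVar("request_id", default=None)
_tenant = contextvars.ContextVar("tenant", default=None)  # hypothetical second variable


async def handle_request(name):
    _request_id.set(f"request-{name}")  # only changes this task's copy
    await asyncio.sleep(0)  # give the other task a chance to run
    return _request_id.get(), _tenant.get()


async def main():
    # gather() runs each coroutine as a separate task with its own context copy
    results = await asyncio.gather(handle_request("A"), handle_request("B"))
    return results, _request_id.get()


_tenant.set("acme")  # set outside of asyncio code, before anything runs
results, outer_request_id = asyncio.run(main())
print(results, outer_request_id)
```

Each task reads back its own request_id, both see the tenant that was set beforehand, and main() still sees no request_id because the tasks only modified their own copies.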
What about concurrent.futures.ThreadPoolExecutor?
This is tricky.
Both the following statements are true:
- ThreadPoolExecutor uses threads, so go with threading.local().
- ThreadPoolExecutor re-uses threads, so if 1 thread sets its own value in threading.local() and this thread is then re-used (because the number of workers is less than the number of tasks), then the second task will see the threading.local() value set by the first task.
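The second statement is easy to reproduce. In this sketch I use a pool with a single worker so that the thread is guaranteed to be re-used; the task functions are invented for the demo.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()


def first_task():
    _local.request_id = "request-A"
    return _local.request_id


def second_task():
    # This task never sets a value itself.
    return getattr(_local, "request_id", None)


# One worker => both tasks are guaranteed to run on the same thread.
with ThreadPoolExecutor(max_workers=1) as executor:
    first = executor.submit(first_task).result()
    second = executor.submit(second_task).result()

print(first, second)
```

The second task reports "request-A" even though it never set anything, because the value is still sitting on the re-used worker thread.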
So… you have about the following options:
- you know your concurrent code doesn’t set any values in threading.local() => just use threading.local()
- use contextvars.ContextVar => you need to call contextvars.copy_context() on thread-init
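ThreadPoolExecutor doesn’t know about contextvars. By default, its worker threads don’t see the values set on ContextVar objects in the thread that created the pool.
We can fix this by taking a snapshot with contextvars.copy_context() before the pool starts and re-applying its values at the beginning of every worker thread. ThreadPoolExecutor has the initializer and initargs arguments for this logic.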
An example of how to copy the context:
import contextvars
from concurrent.futures import ThreadPoolExecutor

def _set_contextvars(context: contextvars.Context):
    # This is called first thing before a thread starts running your code
    for var, value in context.items():
        var.set(value)

parent_context = contextvars.copy_context()  # this stores the values before any thread has started
num_of_workers = 5
with ThreadPoolExecutor(num_of_workers, initializer=_set_contextvars, initargs=(parent_context,)) as executor:
    ...
Bonus: Using contextmanager, not a middleware
I’ve mostly talked about HTTP requests, but really they have nothing to do with global state. They just provide an easy-to-understand example.
Another common pattern for setting “global state” is with contextmanager. You set a variable as you __enter__ and unset it as you __exit__.
Here’s a silly example:
from contextlib import contextmanager

import user_storage  # just like request_storage, but for current user

@contextmanager
def impersonate_user(user: "User"):  # User stands for your app's user model
    """For the duration of this contextmanager, let's change the current_user to user"""
    old_user = user_storage.get()
    try:
        user_storage.set(user)
        yield
    finally:
        user_storage.set(old_user)
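If the storage is backed by a ContextVar, you can let the variable remember the old value for you: ContextVar.set() returns a token and ContextVar.reset(token) restores whatever was there before. A minimal sketch, with plain strings standing in for user objects and _current_user as an assumed variable name:

```python
import contextvars
from contextlib import contextmanager

_current_user = contextvars.ContextVar("current_user", default=None)


def get_current_user():
    return _current_user.get()


@contextmanager
def impersonate_user(user):
    token = _current_user.set(user)  # the token remembers the previous value
    try:
        yield
    finally:
        _current_user.reset(token)  # restore it, even after an exception


seen = []
_current_user.set("alice")
with impersonate_user("bob"):
    seen.append(get_current_user())
    with impersonate_user("carol"):  # nesting works as well
        seen.append(get_current_user())
    seen.append(get_current_user())
seen.append(get_current_user())
print(seen)
```

The list ends up as bob, carol, bob, alice: every exit puts back exactly the value that was current on entry.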
Bonus 2: Django example with middleware
All together now and just for Django and just the request_id example:
#
# File: myapp/middleware.py
#
import request_storage
from uuid import uuid4

class RequestIdMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        request_id = str(uuid4())
        request_storage.set(request_id)
        try:
            return self.get_response(request)
        finally:
            # Unset the request_id after the request is done
            request_storage.set(None)

# ------------------------------------
#
# File: request_storage.py
#
import contextvars

_request_id: contextvars.ContextVar[str | None] = contextvars.ContextVar("request_id")

def get() -> str | None:
    return _request_id.get(None)

def set(request_id: str | None):  # pylint: disable=redefined-builtin
    _request_id.set(request_id)

def get_current_request_id() -> str | None:
    return get()

# ------------------------------------
#
# File: anywhere
#
from request_storage import get_current_request_id

request_id = get_current_request_id()
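One place where "anywhere" pays off is logging. The sketch below attaches the current request_id to every log record with a logging.Filter. The class name RequestIdFilter, the logger name and the log format are my own choices, and the storage is inlined so that the snippet runs on its own.

```python
import contextvars
import io
import logging

# Inlined stand-in for request_storage.py so the example is self-contained.
_request_id = contextvars.ContextVar("request_id", default=None)


def get_current_request_id():
    return _request_id.get()


class RequestIdFilter(logging.Filter):
    """Hypothetical filter that copies the current request_id onto each record."""

    def filter(self, record):
        record.request_id = get_current_request_id() or "-"
        return True  # never drop the record, only annotate it


stream = io.StringIO()  # stands in for a real log destination
handler = logging.StreamHandler(stream)
handler.addFilter(RequestIdFilter())
handler.setFormatter(logging.Formatter("[%(request_id)s] %(message)s"))

logger = logging.getLogger("request_id_demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

_request_id.set("abc-123")  # what the middleware would do
logger.info("user logged in")
_request_id.set(None)  # what the middleware does when the request is done
logger.info("background job")

print(stream.getvalue(), end="")
```

The code that logs never mentions the request_id; it is picked up from the global state at the moment the record is handled.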