What do a good portion of database transactions do? They fail. But this is hardly surprising.
The whole reason we put the code into a transaction is because we expected to encounter some failures.
DB transactions give us the ability to handle failures gracefully and ensure our DB isn’t left in an inconsistent state. Wrapping them around some code is a crucial first step towards this goal. But did you know there is a second step in this process?
There are actually at least 4 possible second steps:
user_id / request_id / tenant_id
Sometimes you want to have a globally accessible function like get_current_user() that somehow magically stores the current user for the duration of the request. How do you implement this? How is the user stored?
Or maybe you want get_current_request_id() or request_cache.add_to_cache(some_data) or get_tenant().
We are surely not going to pass user/request_id as arguments to every function in our app.
So, how is this made?
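One possible answer, sketched here with Python's standard contextvars module. The helper names set_current_user, handle_request and business_logic are made up for this illustration; a real app would set the value in its request middleware.

```python
from contextvars import ContextVar
from typing import Optional

# Hypothetical module-level variable; each thread / async task sees its own value.
_current_user = ContextVar("current_user", default=None)


def set_current_user(user: Optional[str]):
    """Store the user for the current context and return a token for resetting."""
    return _current_user.set(user)


def get_current_user() -> Optional[str]:
    """Return the user stored for the current context, or None if nothing is set."""
    return _current_user.get()


def business_logic() -> str:
    # No user argument is passed in; the value comes from the context.
    return f"hello {get_current_user()}"


def handle_request(user: str) -> str:
    """Stand-in for a middleware that wraps a request handler."""
    token = set_current_user(user)
    try:
        return business_logic()
    finally:
        # Restore the previous value so nothing leaks into the next request.
        _current_user.reset(token)


print(handle_request("alice"))
```

The same pattern would work for a request id or a tenant: one ContextVar per value, set at the start of the request and reset at the end.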
The next item in my collection of “Aren’t we all just constantly re-creating the same bits of code?” is how to track the execution time of Celery tasks.
Firstly, it could be argued that there are 2 different “exec” times for every Celery task:
Both are important because our real motivation is understanding when the thing is done.
Aren’t we all just constantly re-creating the same bits of code?
This goes beyond boilerplate code. We are adding the same bits of code to every project, the same git shortcuts, the same logs formatters, the same permissions decorators, …
Here I’ve started putting together a personal collection of building blocks.
And I’m starting with: Code that isolates (insulates) code blocks.
Our code is so interdependent! Adding dependencies to projects is just how code is done.
Every time somebody sneezes a new dependency is added to a project somewhere.
It is kinda great that we are on this side of the copyright debate, where the default is building things on top of other people’s work, and not shooting at everybody who dares to even look in the direction of my code. But… who is going to upgrade all these dependencies all the time?
In the perfect world, we are a community and we all keep the libs upgraded to the latest versions, right?
Wrong! Devs need to be softly pushed to do the right thing. Yes, softly, so they don’t notice they are being pushed 😉🚧🚧🚧.
When you need it: when you want to lock the DB, but don’t have the row to lock yet. As in: you want to create a row.
… the other option would be to lock the whole table, but that only makes sense if you have like… 5 users… and the 4 don’t mind waiting for the 5th before they can do anything.. which is for sure not our case, cus we have a hugely popular app! Yes. 🙃
What it does: The locks can guarantee that only 1 thread at a time executes some specific code. As in: only 1 thread updates the bank account balance at one time.
The “can” is in italics (and underscored, to really make it stand out), because of what is written below: you have to check the locks manually. Postgres won’t do it for you.
This means the locks are only as good as you are 😉.
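The description above matches PostgreSQL advisory locks, so here is a minimal sketch under that assumption. The function names and the lock name are invented for the example, and the cursor is assumed to be a DB-API cursor that uses the %s parameter style (as psycopg2 does) inside an open transaction.

```python
import hashlib


def advisory_lock_key(name: str) -> int:
    """Map an arbitrary lock name to a signed 64-bit integer (a Postgres bigint)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


def acquire_xact_lock(cursor, name: str) -> int:
    """Wait for the named lock; Postgres releases it when the transaction ends.

    Assumption: `cursor` belongs to a connection with an open transaction.
    """
    key = advisory_lock_key(name)
    cursor.execute("SELECT pg_advisory_xact_lock(%s)", (key,))
    return key
```

Every code path that creates the row has to call acquire_xact_lock with the same name first, for example "create-account:42". Code that skips the call is not blocked, which is exactly the "only as good as you are" part.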
git rebaseToMaster
There are too many people on my project. 😅
I’m not complaining, but I did have to rebase my code to master a LOT.
And it’s always the same 5 lines (or so), so I wrote myself a little shortcut and transformed it into a git “alias”. 🍰
Here’s the thing: as far as I can tell, lots of (most?) web developers know shockingly little about HTTP headers or the HTTP standard as a whole.
I mean, I get it, most education curricula don’t include much HTTP-protocol-related information (mine certainly didn’t) and who sits at home on an idle Sunday morning and says to herself: “You know what? I’m going to pamper myself today by reading the HTTP standard.” Nobody. I know.
But, … fact is, for a web developer, which many, many of us are, not having some understanding of the HTTP standard is a glaring gap in our knowledge. So, let’s fix this problem.
Here’s the problem: I have 1 timeline, but 2 types of events appear on this timeline. I want to use events of type 1 as borders in my timeline, these borders will separate events of type 2 into groups.
For example: I want to know which bird sings first every morning after sunrise. I have a list of sunrises for the last month and a list of singing events. How do I get the result without doing a 2-level nested loop?
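One way to avoid the nested loop, assuming both lists are already sorted by time, is to walk them together with a single moving index. This is only a sketch: the function name is made up, and the timestamps are plain numbers (hours) to keep the example short.

```python
def first_event_per_border(borders, events):
    """Return {border: label of the first event in [border, next border)}.

    borders: sorted list of timestamps (the sunrises).
    events:  list of (timestamp, label) tuples sorted by timestamp (the songs).
    Borders with no event before the next border are left out of the result.
    """
    result = {}
    i = 0  # index into events; it only ever moves forward
    for n, border in enumerate(borders):
        next_border = borders[n + 1] if n + 1 < len(borders) else None
        # Skip events that happened before this border.
        while i < len(events) and events[i][0] < border:
            i += 1
        if i < len(events) and (next_border is None or events[i][0] < next_border):
            result[border] = events[i][1]
    return result


sunrises = [6, 30, 54]
songs = [(2, "owl"), (7, "robin"), (8, "wren"), (31, "wren"), (40, "robin")]
print(first_event_per_border(sunrises, songs))  # {6: 'robin', 30: 'wren'}
```

Each list is traversed once, so the work grows with len(borders) + len(events) instead of their product.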
At the moment, the IT industry is considered a pretty prestigious profession. It attracts lots and lots of people because the pay is good, the working conditions are good and the whole society is in awe of the masterful things programmers produce.
Potential workers are trying to join the IT train in droves. So many have been joining us that a third of working developers only have <= 4 years of work experience (according to Stack Overflow surveys).
This could have been a golden opportunity for IT companies to test all kinds of interview approaches on a vast number of people and develop a data-backed super approach. One that could finally tell us what makes a good (co-)worker.
Instead, we are willingly standing still, voluntarily propping up this broken and deformed interview process. Why do we do this?
If we mostly agree that the interview process does not produce dreamy IT teams, then why are we sticking with this process?
The answer is simple: it produces “proper” IT teams. It does not create great teams that deliver great products or services or are pleasant to work with or are innovative or are prolific. But it does create teams that look as if they are good IT teams.
We are mostly concerned with copy-pasting what we consciously or subconsciously believe is a good IT team: young, white, male, dressed in hoodies, socially awkward, having gone to the same universities we went to, having the same beer preferences we have, having had the same life experiences that we had, … .
GROUP BY together with MAX/MIN
Very recently I stumbled upon a new and curious solution to a very minor, but very annoying problem that I occasionally bump into with PostgreSQL. Admittedly, a perfectly adequate solution already exists for this problem, and Postgres’s limitations of the GROUP BY-logic, which are causing this problem, make perfect sense to me and I support them fully. But (and doesn’t every rule always trigger a “but can you make an exception this time”), I never liked that solution, because it is so verbose and difficult to read. Is there a better way?
You will never be able to accurately estimate the time and effort it will take you to build a piece of software .. for as long as you keep doing new things.
As long as your projects don’t resemble each other as your signatures do, you will not know how much effort it will take to finish a new project.
A lot has been said and written about software estimation, but one thing is clear: predictions of future effort are always based on the amounts of past effort.
Celery never retries your tasks, unless you tell it to. Here is how you can tell her to:
Have you ever heard of the continuum of theory-before-practice VS. practice-before-theory? Probably not, since I created the name just now 😏. But, though the name is new, the continuum is old. The question is simple: should I first study, study, study the documentation and only then, after I presumably fully understand the library and its logic, start using it in my code, or should I first dive into it, use it and abuse it before going back and reading its documentation?
“My time now is 16:44, what is that in UTC time?” is what I asked myself last week. “Do we use UTC dates everywhere with no timezone information or do we store timezones into the DB?” was another question. “And by the way, if I call datetime.now(), am I getting the correct time or should I adjust it with a timezone suffix?”. This time-business is so delicate. I never had much trouble with date-times, until last week, when I had to create a humble Celery task that needs to be a master of time, that needs to understand how all the dates on all these objects relate to one another. And suddenly I am finding myself appalled by all datetimes with no timezone info. What a grievous mistake it was to allow programmers to forget about timezones, to just call now() and hope for the best. How is my super-duper, time-lord of a Celery task supposed to lord the time if a single measly timezone-lacking datetime knocks it out of its balance?
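For the first question, a small sketch using only the standard library. The helper names utc_now and to_utc are made up, and the +02:00 offset is an assumed local zone picked purely for the example.

```python
from datetime import datetime, timedelta, timezone


def utc_now() -> datetime:
    """Timezone-aware 'now' in UTC, instead of a naive datetime.now()."""
    return datetime.now(timezone.utc)


def to_utc(value: datetime) -> datetime:
    """Convert an aware datetime to UTC; refuse naive ones instead of guessing."""
    if value.tzinfo is None or value.utcoffset() is None:
        raise ValueError("naive datetime: its timezone is unknown")
    return value.astimezone(timezone.utc)


# 16:44 in a zone two hours ahead of UTC (hypothetical offset).
local = datetime(2021, 3, 1, 16, 44, tzinfo=timezone(timedelta(hours=2)))
print(to_utc(local).isoformat())  # 2021-03-01T14:44:00+00:00
```

Raising on naive datetimes is a design choice for this sketch: it surfaces the timezone-lacking values early instead of silently treating them as local time.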
I always wanted to have this. The cool part of me, of course, wanted me to be the one who writes it, the pragmatic part just wanted to have access to a list like this and the hedonic part of me made me ignore the whole topic by telling me to chase after greater pleasures of life, at least greater than this blog post, no matter how magnificent it might eventually become, could ever be. But here I am, some years later, in the wrath of the epidemic lockdown, re-running Python tests in an infinite loop until I figure out which knobs and settings of this mock library I have to turn and set to get it to mock the damn remote calls.
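As a taste of those knobs, here is a minimal sketch of mocking one remote call with unittest.mock. The function fetch_status and the URL are invented for the example; the point is only that patch replaces urllib.request.urlopen, so no network traffic happens.

```python
import urllib.request
from unittest import mock


def fetch_status(url: str) -> int:
    """Hypothetical code under test: performs a real HTTP call when not mocked."""
    with urllib.request.urlopen(url) as response:
        return response.status


def fetch_status_with_mock() -> int:
    fake_response = mock.MagicMock()
    fake_response.status = 200
    # urlopen() is used as a context manager, so __enter__ must return our fake.
    fake_response.__enter__.return_value = fake_response

    with mock.patch("urllib.request.urlopen", return_value=fake_response) as fake_urlopen:
        status = fetch_status("https://example.invalid/health")

    # Check that the code called the (mocked) remote endpoint exactly once.
    fake_urlopen.assert_called_once_with("https://example.invalid/health")
    return status


print(fetch_status_with_mock())  # 200
```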
Nginx is “a web server which can also be used as a reverse proxy, load balancer, mail proxy and HTTP cache”. uWSGI is an implementation of the WSGI spec, which describes how a web server should communicate with a web app, which makes uWSGI also a type of web server. So, why does a typical server setup for a Python app consist of 2 web servers?
The first person to walk on the Moon was Neil Armstrong. I knew this before I knew how to speak English or knew how to spell his name. Why did the 8-year-old me know this? Why do we need a frontman for everything? And how do you build a team in such a culture?
Today, I came to 2 realizations, both of them surprising and both of them essential. During an innocuous debate about code review, I suddenly discovered that only a few basic ideas underlie all of my coding-related decisions. One of them is: simple is better than complex. And about 3 seconds later, I realized that this is neither a well-known mantra nor one that can be quickly explained. It’s a conviction that you grow into. But without having to wait for a few years, how do I explain it to my teammate?
Code readability is a hot topic. We do not agree on what it looks like and we do not agree on how much of it is needed. It is rarely discussed on a new project and practically never on a project with only 1 developer.
When did you first learn about Vim? Were you one of those unfortunate souls who just wanted to run a git command, but ended up trapped in vim? Many of us have been there. Ending up in vim accidentally is like being accidentally teleported to an unknown planet. How are you supposed to google your way out if you do not know where you even are?
I must concede, spying on a network (and everything and everybody on it) is just candid fun. Imagine, silently typing on your keyboard, exploring a network, examining what things are there and what they are up to. How would this not be equally intriguing as reading a mystery novel?
Every project inevitably gravitates towards messy code. As long as you are adding features to a project, you can be pretty sure its total “messiness” is not decreasing. Developers are notoriously bad at removing features, there is only ever going to be more logic and in Python, this means more dictionaries and more tuples being passed around. While we may not be able to win in the long run, we surely can fight it for a long time.
Because it is written in a boring, counterproductive style. But why is that? Because it copied the style of academic papers. But why are academic papers dry and boring? Because most schools have since forever talked to their pupils in a dry and boring manner. But why is that? Because until recently human society has been managed in an authoritarian manner. Everybody knew exactly who was above them and who was below them, who they have to obey and who they can give orders to. For the most part, fun was reserved for the afterlife and the rich. But come-on! we can do better, here are a few pointers.
The only time I feel like a hacker, a movie hacker, is when I suddenly notice that I’ve been writing git commands for “I don’t know how long”. Git is so simple and so easy to use and so elegant and I wouldn’t call it too verbose at all and still, I am regularly caught up in between git commands.
It turns out that a significant part of our work as developers is to write stuff down. Or to say it more eloquently, to outline, recount and illustrate our deliberations in writing. Given that English is not my first language I might be forgiven for lacking in expressive prowess, but everybody is sometimes just tired, annoyed or drained. Sometimes our literary capacities are narrowed down to only simple words, like “bad”, “good”, “ugly”.
Part 1: Scale your system Part 1: From 1 to 99 999 users
We are onto the 2nd part. How do we continue scaling from 100 000 users onwards?
In this series of posts, I wanted to create a list of stages, possible designs of a system as it caters to different-sized audiences. What is the minimum setup for a system if it has but 1 user a day, and how does it progress towards a system that serves 500M users per day?
Lately, this has become a common interview question: How many Bytes will some hypothetical app probably need? When in fact, planning the data storage capacity is usually a complicated and time-consuming operation. So how do you present your case in 45 minutes?
Consistent hashing is a strategy most notably used by distributed databases for determining to which slot a key belongs. Its main advantage is that if a new slot needs to be added, only K/n objects need to be moved (K = number of all keys, n = the number of slots). And this means adding and removing slots is relatively inexpensive.
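To make the K/n claim tangible, here is a small sketch of a hash ring. It is one common way to implement consistent hashing, not necessarily what any particular database does. The class name, the use of SHA-256 and the 100 virtual points per slot are all choices made for this example.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit position on the ring (Python's built-in hash() is salted)."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, slots, replicas=100):
        self.replicas = replicas  # virtual points per slot, smooths the spread
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> slot name
        for slot in slots:
            self.add_slot(slot)

    def add_slot(self, slot):
        for r in range(self.replicas):
            point = _hash(f"{slot}#{r}")
            self._owner[point] = slot
            bisect.insort(self._points, point)

    def slot_for(self, key):
        # First ring position clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"key-{i}" for i in range(10_000)]
ring = HashRing(["a", "b", "c", "d"])
before = {k: ring.slot_for(k) for k in keys}
ring.add_slot("e")
moved = [k for k in keys if ring.slot_for(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

With 5 slots after the addition, roughly a fifth of the keys should move, and every moved key lands on the new slot "e". The exact count depends on how the hash positions happen to fall.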