Synchronous And Asynchronous - Beyond The Basics
Understanding Synchronous and Asynchronous Workloads in an Easy and Fun Way
Introduction
If you're in programming, you've likely heard the terms synchronous and asynchronous very often. We understand the basics of synchronous - it's the step-by-step execution of processes, right? But do we ever wonder what occurs deep within the OS? How does the OS handle tasks synchronously or asynchronously? And what exactly does asynchronous entail beyond the basic understanding? Interesting? Let's dive in.
Before we delve into the technical specifics, let's revisit what synchronous and asynchronous mean at a fundamental level, with a real-life example. Imagine this - you've asked your mother for money to buy a new smartphone. In technical terms, you are the caller, and your mother is the responder - remember these terms, as we'll use them often. Now, back to the example. When you request money from your mother, you're essentially waiting for a response. You can't proceed to your next task - buying the smartphone - until you receive the response (the money) from your mother, right? This is a synchronous task.
So, what's asynchronous then? Let's expand on the previous example. Picture this - You wake up and plan to take a bath, have breakfast, and head to the market to buy a smartphone, all while needing money from your mother. Initially, you ask your mother for the money and continue with your other tasks like taking a bath and having your meal, repeatedly reminding your mother of the money!
This illustrates an asynchronous task. You've requested a task but proceeded with other tasks without waiting for its full completion.
Enough with examples! Now, let's delve into the world of OS!
One Liner Understanding
We can define synchronicity by asking a simple question: Can I work while waiting?
If the answer is yes, the task might be asynchronous; otherwise, it's synchronous.
Synchronous is like a wave - the caller and the responder move together. They are in sync, the way waves rise and fall smoothly.
Asynchronous is akin to a waterfall - there is no sync between the caller and the responder. The caller may have executed and concluded its task long ago, while the responder is only responding now! It's more like a waterfall - no sync between them.
Synchronous and Asynchronous In OS Level
How A Process Executes in the CPU and the Role of the OS
Before we explore synchronous execution in the OS, let's understand CPU context switching. The OS aims to use the CPU efficiently. What do I mean by efficient? It means the OS tries to keep the CPU fully utilized. To achieve this, the OS only allows actively runnable tasks into the CPU execution queue. The CPU then switches rapidly between those tasks, interleaving them so they appear to run simultaneously - this is known as context switching. If a task has nothing to execute, the OS blocks it until it does; the CPU processes only the tasks in its execution queue, as managed by the OS. This blocking occurs primarily during I/O operations. With a basic understanding of context switching, let's delve deeper.
How A Program Becomes Synchronous
Consider writing a program needing a network call or data read from a file - these are I/O operations. When the program requests to read a file, the workflow occurs as follows:
Your program requests an I/O (The caller).
The language runtime forwards the request to the OS via a system call.
The OS receives the I/O request, prompting the respective device controller/file controller (the responder) to carry out the I/O operation.
While the device controller performs its I/O operation, the OS removes the program thread from the CPU execution queue as the program is in an idle state. (However, it remains in memory/RAM as usual).
As the program thread is removed from the CPU queue, it can't execute any operations without computing resources. At this point, the remaining code waits for execution.
The OS replaces the program with another from memory in the CPU queue, continuing processing new tasks (Context Switching).
Once the I/O operation concludes, the OS returns the data to the program and reinserts the program's thread into the CPU execution queue. When the thread gets CPU time again, the code that was waiting on the I/O operation executes. This is synchronous execution: the code is forced to wait until the earlier task completes.
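The workflow above can be sketched in a few lines of Python. The file name is made up for the demo; the point is that the calling thread can do nothing between issuing `read()` and getting the bytes back:

```python
import time

# Hypothetical file, created here just for the demo.
PATH = "example.txt"
with open(PATH, "w") as f:
    f.write("hello from the responder\n")

start = time.perf_counter()
with open(PATH) as f:   # the caller issues the read request...
    data = f.read()     # ...and this thread blocks until the OS returns the bytes
elapsed = time.perf_counter() - start

# Nothing else in this thread ran while read() was in flight.
print(data.strip())
```

While the thread is blocked inside `read()`, the OS is free to schedule other threads on the CPU - exactly the context switching described above.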
How A Program Becomes Asynchronous
For asynchronous calls, the caller or program can continue other operations even after an I/O call.
This is achieved by a few mechanisms:

- Readiness-based event notification, such as epoll on Linux (or kqueue on BSD/macOS), where the program asks the OS which file descriptors are ready for I/O. This is what powers event loops like the one in Node.js (via libuv).
- Completion-based interfaces, such as io_uring on modern Linux or I/O Completion Ports (IOCP) on Windows, where the OS notifies the program once a submitted operation has finished.
- A clever third approach is creating/assigning a separate worker thread for the I/O operation. When a program hands an I/O task to a worker thread, it separates that task from the main thread's execution, enabling parallel processing. While the worker thread focuses on the I/O operation, the main thread carries on with its other instructions, not hindered or stalled by the ongoing I/O. Once the I/O task finishes, the worker thread notifies the main thread (for example, via a callback), enabling the program to operate asynchronously, with different threads managing tasks simultaneously.
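The worker-thread approach can be sketched with Python's `threading` module. The file name and helper function are made up for illustration:

```python
import threading

def read_file_async(path, callback):
    """Run the blocking read on a worker thread; call back when it finishes."""
    def worker():
        with open(path) as f:
            data = f.read()
        callback(data)                    # notify the caller: the I/O is done
    t = threading.Thread(target=worker)
    t.start()
    return t

# Hypothetical file, created just for the demo.
with open("data.txt", "w") as f:
    f.write("payload")

results = []
thread = read_file_async("data.txt", results.append)

# The main thread keeps working while the worker handles the I/O.
busy_work = sum(range(1_000))

thread.join()          # block only at the point where we truly need the result
print(results[0])
```

The main thread's `busy_work` runs regardless of how long the read takes; it only synchronizes with the worker at `join()`.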
Where Asynchronous Process is Used
One might think that asynchronous processes exist only in programming. However, the idea extends well beyond that. Let's explore some scenarios where asynchronous processing is employed.
Asynchronous Programming:
As already discussed, our first example is asynchronous programming. Asynchronous programming uses async and await constructs in languages like Python and JavaScript to manage operations that might take time, like an I/O operation or a network call. Easy, right? We already know that.
Asynchronous Backend Processing:
The next asynchronous example is Asynchronous Backend Processing.
In asynchronous backend processing, a client requests a long-running task and expects a response. Because the task will take a long time to finish, the backend puts the request in a queue and immediately responds to the client with a job id.
Although the request was for a long-running task, the backend was able to return an immediate response by queuing it, and it now has time to process the request in the background. The backend may assign worker threads to process the queued tasks one by one.
The client may send additional requests to the backend with the job id to query the current state of the task.
Frameworks like Celery in Python are designed for exactly this kind of asynchronous processing, allowing scalable and efficient task execution.
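A toy version of this pattern fits in a few lines with Python's standard library - no Celery required. All names here are invented for the sketch, and the "long" task is just an `upper()` call:

```python
import queue
import threading
import uuid

jobs = {}                # job_id -> {"status": ..., "result": ...}
tasks = queue.Queue()

def submit(payload):
    """Enqueue the work and return a job id immediately, without waiting."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "queued", "result": None}
    tasks.put((job_id, payload))
    return job_id

def worker():
    while True:
        job_id, payload = tasks.get()
        jobs[job_id]["status"] = "running"
        jobs[job_id]["result"] = payload.upper()   # the "long-running" work
        jobs[job_id]["status"] = "done"
        tasks.task_done()

threading.Thread(target=worker, daemon=True).start()

job_id = submit("resize my image")   # returns instantly with a job id
tasks.join()                         # a real client would poll the status instead
print(job_id, jobs[job_id])
```

The `jobs` dict plays the role of the status endpoint: the client polls it with the job id until the status flips to "done".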
Asynchronous Commits in PostgreSQL:
In PostgreSQL, there is a concept of asynchronous commits.
By default, commits in PostgreSQL are synchronous: PostgreSQL blocks the caller who issued the commit, writes the transaction's WAL records to disk, and only after those writes complete does it return a success message to the caller. This takes time, and the caller has to wait until the database responds with a successful commit message. This is how PostgreSQL provides synchronous commits.
So, what are these asynchronous commits in PostgreSQL then? Well, when the caller issues a commit, the database does not block the caller; it immediately returns a response and then writes the transaction's WAL records to disk in the background, while the caller continues with its other operations.
But it has a trade-off. What if the server crashes after the success message but before the WAL reaches disk? The caller has received a success message, yet the transaction is lost on recovery - data the caller believed was committed simply disappears. (Strictly speaking this is a durability risk rather than a dirty read: other sessions never see uncommitted data, but recently "committed" data can vanish after a crash.) That is the cost of asynchronous commit in PostgreSQL.
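In PostgreSQL this behavior is controlled by the `synchronous_commit` setting, which can be toggled per session or even per transaction - roughly like this:

```sql
-- Trade durability for commit latency in this session.
SET synchronous_commit = off;

BEGIN;
-- ... some writes ...
COMMIT;  -- returns before the WAL records are guaranteed to be on disk
```

Because the setting is per session, an application can keep synchronous commits for critical transactions and switch them off only for high-volume, loss-tolerant ones.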
Asynchronous Replication Of Database:
If we have primary and replica databases, and all the changes and transaction commits need to be replicated into other replica databases, there are two ways to do it.
Synchronously:
Synchronously, when the caller commits, the primary database signals all of its replica databases to commit the changes. Only after receiving success messages from all the replicas (or from a majority, depending on the database configuration) does the primary commit the transaction and pass a success message to the caller. If the replicas send an unsuccessful message, the primary won't commit the changes on its own, and it passes an unsuccessful message to the caller.
While the primary is waiting for the commit responses from its replicas, the caller is also blocked from performing any other tasks: the caller waits on the primary, and the primary waits on its replicas - as a result, the caller is blocked.
Asynchronously:
The same replication can also be done asynchronously.
Once the caller requests a commit, the primary database commits the transaction on its own, signals its replicas to apply it, and immediately returns a response to the caller without waiting for any confirmation from the replicas - so the caller is not blocked.
This is faster than the synchronous approach, but at a cost of consistency: the primary may have returned a success message to the caller while a replica later fails to apply the transaction, leaving the replicas out of sync with the primary.
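The fire-and-forget shape of asynchronous replication can be simulated with plain Python threads. The lists, delays, and function names below are all invented for the sketch:

```python
import threading
import time

primary, replica = [], []

def apply_on_replica(record):
    time.sleep(0.05)          # simulated network + replay delay
    replica.append(record)

def commit_async(record):
    """Commit locally, fire replication in the background, return at once."""
    primary.append(record)
    threading.Thread(target=apply_on_replica, args=(record,)).start()
    return "ok"               # the caller is never blocked on the replica

status = commit_async("txn-1")
lagging = (replica != primary)  # replication is typically still in flight here
time.sleep(0.2)                 # give the replica time to catch up
print(status, primary, replica)
```

Right after `commit_async` returns, the replica usually lags the primary - that window is exactly the consistency gap described above.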
Asynchronous OS File Writes (FS Cache) and fsync:
Let me ask you a question: how is a file written to the disk? Don't rack your brain - let me answer it for you!
Generally, a file write goes through an intermediary OS cache. When we write a file, the OS does not write directly to the disk; instead, the bytes land first in the OS page cache - the in-memory cache the kernel keeps in front of filesystems such as ext4 on Linux. The write happens first in this cache in the form of pages, and later the OS flushes the dirty pages to the disk in batches.
Without this intermediary step, if the system wrote every page directly to the disk, disk performance (and, for some devices, lifespan) would suffer drastically. This is one of the main reasons the intermediary process exists - but it also has a trade-off: the data takes some extra time to actually be reflected on the disk.
But in some database applications, the need for durability and consistency is very high, and we cannot afford to leave data sitting in the cache for even those extra few milliseconds. In such cases we want the data forced onto the disk rather than left in the cache. To achieve this, we can call fsync (on Linux), which blocks until the OS has flushed the file's data to the disk. Note that fsync itself is a synchronous operation - it is the OS's default write-behind caching that is asynchronous, and fsync is how an application opts out of it.
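A minimal sketch of a durable write on a POSIX system: the write lands in the page cache, and `os.fsync` blocks until the kernel has flushed it to disk (the file name is made up for the demo):

```python
import os

# Hypothetical path, used only for this demo.
path = "durable.log"

fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
os.write(fd, b"critical record\n")  # lands in the OS page cache first
os.fsync(fd)                        # block until the kernel flushes it to disk
os.close(fd)

with open(path, "rb") as f:
    print(f.read())
```

This is essentially what databases do with their WAL files: write, then fsync, and only then report the commit as durable.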
I Am Tired and Need A Nap!
That's an overall overview of how synchronous and asynchronous processes execute at the OS level, along with some examples of asynchronous workloads. I hope you've learned something new from today's post. I am Mahboob Alam, a software developer, and I love programming and can talk for hours about core technology concepts. Have feedback? Don't hesitate to comment!