Develup

Projects and developments by Supragya Raj

DirSync

A SYSTEM TO KEEP DIRECTORIES ON TWO DISTANT LINUX MACHINES IN SYNC

Project link: Github

The backstory

The summer of 2018 was just around the corner and I had to get an internship… somewhere, for I was in the third year of my degree and wanted to work on a real-life project (also, it’s kind of important to complete an industrial internship to get the B.Tech degree).

In any case, I applied to many different companies, but I was most excited about Socialcops.

I applied for the internship as a backend developer and got a great email.

The challenge

Dharmik’s email came as a great relief, for I had finally heard from an organization that was actually trying to have a conversation. I did not want to lose such an opportunity, so I got straight to work.

The following was the problem statement (you can skip unnecessary parts, I don’t mind).

Clone Wars 2.0
--------------

Prime Minister Lama Su,

I hope this letter finds you in the best of health.

The last batch of clones you built for us were faulty
and did not perform as expected (https://www.youtube.com/watch?v=b0DuUnhGBK4)

We unearthed some secrets about how the droid army was trained and hope that
you can use this information to make a better army this time around. With the
galaxy on the brink of another war, I cannot help but emphasize how much a 
large discount will help the Republic in its efforts.

One of our allies came across these schematics in an abandoned base that shed some
light on the droid training exercises, master Yoda concluded that a pair of droids
undergo various kinds of battle simulations during which each droid records its
progress and learning in a force, currently unfamilair to us, called "Data".
This force from both droids is then combined in a ritual called the
"Sync" resulting in both droids having an increased data force.

Please have a look at this schematic, your engineers may have better luck
decoding its mysteries.

            +----------------+                +----------------+
            |                |                |                |
            |   +--------+   |      Sync      |   +--------+   |
            |   |-|Data|-|   | +------------> |   |-|Data|-|   |
            |   +--------+   | <------------+ |   +--------+   |
            |                |                |                |
            |    Driod  A    |                |    Driod  B    |
            |                |                |                |
            +----------------+                +----------------+

May the force be with you.

- Sifo-Dyas


[....2 months later....]


Prime Minister Lama Su!,

I hope the army is coming along nicely. The force has given us more clarity in
the last few months. As it turns out, this "Data" that we were so worried about,
is just a method by which the droids store information about their experiences and
orders. Most importantly, the "Sync" ritual was just an exchange of files
from one droid to another in both directions. This is how their data force
increased after the ritual.

Master Windoo has been doing extensive research and has come up with a simplified
experiment to test if this training method can be implemented. He says that you
should start by figuring out how to synchronize data between a folder on one
device (say device A) and a folder on another device (say device B).
In addition to that, a change made to the data on one device should also be made 
available to the other device as well. If we have a way to do this then we could 
potentially improve the quality of the new clone army. I hope your engineers
are able to make sense of all of this information. Do write back to me if you
need more information.

Please share your method and implementation in great detail with us so
that it can be added to our records in the Jedi Temple. I wish you luck.

May the force be with you.

- Sifo-Dyas


                                 +---------------------+
                                 | Whats going on here?|
                                 +------------------+--+
                                                    |
                                                    |
  _                                                 |
  \\                                                |
   \\_          _.-._                               |
    X:\        (_/ \_)     <------------------------+
    \::\       ( ==  )
     \::\       \== /
    /X:::\   .-./`-'\.--.
    \\/\::\ / /     (    l
     ~\ \::\ /      `.   L.
       \/:::|         `.'  `
       /:/\:|          (    `.
       \/`-'`.          >    )
              \       //  .-'
               |     /(  .'
               `-..-'_ \  \
               __||/_ \ `-'
              / _ \ #  |
             |  #  |#  |   B-SD3 Security Droid
          LS |  #  |#  |      - Front View -

(http://www.ascii-art.de/ascii/s/starwars.txt)

It is kind of a cool problem statement, isn’t it? In a general sense, the following is what I had to do:

Create a system that keeps a directory on two different nodes in sync with each other.

The technology stack dilemma

Now I was stuck with two different things:

  1. I haven’t worked much with web-based technology stacks such as node.js and angular; that is just not what I deal with for the most part. I had working knowledge of them, but could not figure out which framework to use and which not to.
  2. I am quite proficient in C. In fact, I have worked quite a bit with the C programming language (you could add C++ too, but I haven’t had much experience with the STL, so it does not really count). C is not made for these problems per se; however, I had built quite complex systems with it.

So the dilemma was how to proceed: learn a new programming language and its frameworks to build the system, or continue the “C” way… I chose the C way.

inotify-tools

If you don’t know inotify-tools, it is a great way (on Linux) to watch a directory. It reports the changes that occur in the directory, and you can then act upon them as you like. I used inotify-tools to build my system.

Our experimental setup

Before we begin on how to achieve this, I need to explain to you what the experimental setup is and how we are going to build on top of it. It is pretty basic – two clean installations of Ubuntu 16.04 on different Virtual Machines did the job more or less. These will be our two droids.

 

The solution we chose

Now, let us look at the problem statement at hand: basically, we need to keep the data in some folder in sync between the two systems. There are several possible approaches.

  1. One naive approach is to copy the whole directory structure to the other droid at fixed intervals. However, this is very demanding on the network and unrealistic in an actual setting.
  2. Another approach is to use a mechanism that reports whenever the filesystem changes under a given directory, and to act on those reports. One then still has to decide how the data is transferred between the droids.
  3. The last approach is to send only the differences, as git diff does. This drastically reduces the data sent over the network. However, it is complex to reproduce, since it needs compression and a VCS on both sides, and it lacks the automatic triggering that approach 2 enjoys.

Ideally, the best approach triggers sends and receives the way the second approach does and, if possible, keeps the transfers as compact as git diff.

In developing this system, we use the second approach to keep the directories in sync. We do not implement the git diff approach (3); however, it can be built on top of the system discussed below.

We assume the following: the directories to be synced are empty on both droids when the sync program starts. The program should keep running for as long as sync is needed and may quit once it is not. However, sync is not guaranteed if the program is later rerun on the same directories.

And hence we need to do the following activities:

  1. Establish a way to track the changes made in a directory.
  2. Establish communication between two droids.
  3. Make a program to communicate changes in the directory – actual program for syncing.

For tracking activity in directories

For job 1, we can use inotify-tools. This little tool watches a given folder and notifies us whenever there is some activity there.

We install inotify-tools on both droids. This can be done using

sudo apt-get install inotify-tools
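
inotify-tools is built on the kernel's inotify interface, which a C program can also call directly. The following is a minimal sketch of watching a single directory that way; it is not the DirSync code itself. The names change_kind, classify_event and watch_directory are made up for illustration, and treating a move into or out of the directory as a create or delete is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical classification of the changes a sync tool cares about. */
enum change_kind { CHANGE_NONE, CHANGE_CREATED, CHANGE_DELETED, CHANGE_MODIFIED };

/* Map an inotify event mask to one of the kinds above. */
enum change_kind classify_event(uint32_t mask)
{
    if (mask & (IN_CREATE | IN_MOVED_TO))
        return CHANGE_CREATED;
    if (mask & (IN_DELETE | IN_MOVED_FROM))
        return CHANGE_DELETED;
    if (mask & (IN_MODIFY | IN_CLOSE_WRITE))
        return CHANGE_MODIFIED;
    return CHANGE_NONE;
}

/* Watch `dir` and print one line per change. Keeps running while the watch
 * is healthy; returns -1 if the watch cannot be set up or read() fails. */
int watch_directory(const char *dir)
{
    /* Event buffer, aligned as the inotify(7) man page recommends. */
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    static const char *const label[] = { "other", "created", "deleted", "modified" };

    assert(dir != NULL);
    int fd = inotify_init1(IN_CLOEXEC);
    if (fd < 0)
        return -1;
    if (inotify_add_watch(fd, dir, IN_CREATE | IN_DELETE | IN_CLOSE_WRITE |
                                   IN_MOVED_FROM | IN_MOVED_TO) < 0) {
        close(fd);
        return -1;
    }

    for (;;) {
        ssize_t len = read(fd, buf, sizeof buf);
        if (len <= 0)
            break;
        /* One read() may return several variable-length events back to back. */
        for (char *p = buf; p < buf + len; ) {
            const struct inotify_event *ev = (const struct inotify_event *)p;
            printf("%s %s%s\n", label[classify_event(ev->mask)],
                   ev->len ? ev->name : "", (ev->mask & IN_ISDIR) ? "/" : "");
            p += sizeof(struct inotify_event) + ev->len;
        }
    }
    close(fd);
    return -1;
}
```

Calling watch_directory on the synced folder and then creating or deleting files in it from another terminal should print one line per change. In a sync tool, the printf would be replaced by sending a message to the other droid.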

For establishing connection between droids

To establish a connection between the droids, both must be connected to a network and each must have an IP address of its own. To do this, we have to change the MAC address of the two running VMs. We can do this in the settings of each VM by giving it its own valid MAC address.

For this, we use a host-only network in VMware for both droids.
Once the wired connection is established on both droids, we can find their IP addresses using the command ifconfig.
For us, this worked out, and the two droids can ping each other too.

Finally, the DirSync program

Creating a program that syncs two droids is a two-step process. First, we need to find appropriate data structures for keeping track of changes and establish an algorithm to process them; this also includes creating a protocol for communication. Second comes building the actual system.
In our system, we use threads as follows (after communication is established):

Filewatcher watches the files in the directory and sends messages about changes in the directory to the other droid.
Receiver listens on the socket for incoming messages from the other droid. It acts on those messages and applies the filesystem changes on its end.
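
The post does not show the message format these two threads exchange. One possible shape, purely as an assumption, is a one-line text header (operation, payload size, path relative to the synced directory) followed by the file bytes. The sketch below encodes and decodes such a header; sync_op, sync_msg, encode_header and decode_header are hypothetical names, not taken from DirSync.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SYNC_PATH_MAX 256   /* keep in step with the %255 in decode_header */

/* Hypothetical operations; each is sent as a single letter. */
enum sync_op {
    OP_CREATE_FILE = 'C',
    OP_CREATE_DIR  = 'D',
    OP_REMOVE      = 'R',
    OP_WRITE       = 'W'
};

struct sync_msg {
    enum sync_op op;
    char path[SYNC_PATH_MAX];  /* relative to the synced directory */
    size_t payload_len;        /* bytes of file content after the header */
};

/* Write "<op> <payload_len> <path>\n" into out.
 * Returns the header length, or -1 if it does not fit or the path
 * contains a newline (which this line-based format cannot carry). */
int encode_header(const struct sync_msg *m, char *out, size_t cap)
{
    assert(m != NULL && out != NULL);
    if (strchr(m->path, '\n') != NULL)
        return -1;
    int n = snprintf(out, cap, "%c %zu %s\n", (char)m->op, m->payload_len, m->path);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Parse one header line. Returns 0 on success, -1 for anything malformed.
 * The path comes last so that it may contain spaces. */
int decode_header(const char *line, struct sync_msg *m)
{
    char op, path[SYNC_PATH_MAX];
    size_t len;

    if (sscanf(line, "%c %zu %255[^\n]", &op, &len, path) != 3)
        return -1;
    switch (op) {
    case OP_CREATE_FILE: case OP_CREATE_DIR: case OP_REMOVE: case OP_WRITE:
        break;
    default:
        return -1;          /* unknown operation letter */
    }
    m->op = (enum sync_op)op;
    m->payload_len = len;
    strcpy(m->path, path);  /* safe: both buffers are SYNC_PATH_MAX bytes */
    return 0;
}
```

With this format, Filewatcher would call encode_header and write the result to the socket, and Receiver would read a line and call decode_header before touching the filesystem. A line that fails to decode can then be treated as an invalid message.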

The third (grayed out) thread is used simply to exit the program safely. It sends an invalid message to the other droid, causing that droid to shut down, and then shuts down itself. This thread is constantly on the lookout for the string “shut” being input by the user.
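
The three-thread layout can be sketched with pthreads roughly as follows. This is only a skeleton under assumptions: the Filewatcher and Receiver bodies are placeholders that wait for the stop signal, the step of sending an invalid message to the peer is reduced to a comment, and end of input is treated like “shut” so that the sketch always terminates. All function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t stopped = PTHREAD_COND_INITIALIZER;
static int running;

/* Return 1 if the line typed by the user is exactly "shut". */
int is_shutdown_command(const char *line)
{
    size_t n = strcspn(line, "\r\n");   /* ignore the trailing newline */
    return n == 4 && strncmp(line, "shut", 4) == 0;
}

static void request_stop(void)
{
    pthread_mutex_lock(&lock);
    running = 0;
    pthread_cond_broadcast(&stopped);   /* wake every waiting thread */
    pthread_mutex_unlock(&lock);
}

static void wait_until_stopped(void)
{
    pthread_mutex_lock(&lock);
    while (running)
        pthread_cond_wait(&stopped, &lock);
    pthread_mutex_unlock(&lock);
}

/* Placeholder: the real thread would read inotify events and send them. */
static void *filewatcher(void *arg) { (void)arg; wait_until_stopped(); return NULL; }

/* Placeholder: the real thread would read the socket and apply changes. */
static void *receiver(void *arg) { (void)arg; wait_until_stopped(); return NULL; }

/* Third thread: read user input until "shut" (or end of input) is seen. */
static void *shutdown_watcher(void *arg)
{
    FILE *in = arg;
    char line[128];

    while (fgets(line, sizeof line, in) != NULL)
        if (is_shutdown_command(line))
            break;
    /* The post sends an invalid message to the other droid at this point. */
    request_stop();
    return NULL;
}

/* Start the three threads and wait for all of them to finish.
 * Returns 0 on a clean shutdown, -1 if a thread could not be started. */
int run_droid(FILE *commands)
{
    void *(*entry[3])(void *) = { filewatcher, receiver, shutdown_watcher };
    void *args[3] = { NULL, NULL, commands };
    pthread_t tid[3];
    int started = 0, rc = 0;

    assert(commands != NULL);
    running = 1;
    for (; started < 3; started++) {
        if (pthread_create(&tid[started], NULL, entry[started], args[started]) != 0) {
            rc = -1;
            request_stop();             /* release the threads already started */
            break;
        }
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return rc;
}
```

A program would call run_droid(stdin) once the connection is up. In a real implementation, the two worker threads block in read() on the inotify descriptor and on the socket rather than on a condition variable, so stopping them takes more care than shown here.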

A race condition

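Implementing the system as described so far is a problem because of the following race condition. Consider creating a file at client A. Filewatcher on A reports the creation to B, and Receiver on B creates the file there. That creation is in turn picked up by Filewatcher on B, which would report it back to A as if it were a new local change, so the same change can bounce between the droids.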

To solve this, we add a transaction history: whenever a message is received, the inotify events caused by applying it are not acted upon.
As two different threads access this transaction history, mutexes are used to avoid conflicts.
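
One way to picture the transaction history is a small table of changes the Receiver is about to apply, which the Filewatcher consults before forwarding an event. The sketch below is an assumption about how such a table could look, not the DirSync data structure; the fixed sizes, the single-letter operation codes, and the names history_init, history_expect and history_consume are all hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define HISTORY_SLOTS    64
#define HISTORY_PATH_MAX 256

struct history {
    pthread_mutex_t lock;
    char paths[HISTORY_SLOTS][HISTORY_PATH_MAX];
    char ops[HISTORY_SLOTS];            /* 0 marks a free slot */
};

void history_init(struct history *h)
{
    pthread_mutex_init(&h->lock, NULL);
    memset(h->ops, 0, sizeof h->ops);
}

/* Receiver side: record a change BEFORE applying it, so the entry exists
 * by the time inotify reports the change. Returns 0, or -1 if the table
 * is full or the path is too long. */
int history_expect(struct history *h, char op, const char *path)
{
    int rc = -1;

    assert(op != 0);
    if (strlen(path) >= HISTORY_PATH_MAX)
        return -1;
    pthread_mutex_lock(&h->lock);
    for (int i = 0; i < HISTORY_SLOTS; i++) {
        if (h->ops[i] == 0) {
            strcpy(h->paths[i], path);
            h->ops[i] = op;
            rc = 0;
            break;
        }
    }
    pthread_mutex_unlock(&h->lock);
    return rc;
}

/* Filewatcher side: returns 1 and clears the entry if this event is the
 * echo of a change the Receiver applied; returns 0 for a local change
 * that should be sent to the other droid. */
int history_consume(struct history *h, char op, const char *path)
{
    int found = 0;

    pthread_mutex_lock(&h->lock);
    for (int i = 0; i < HISTORY_SLOTS; i++) {
        if (h->ops[i] == op && strcmp(h->paths[i], path) == 0) {
            h->ops[i] = 0;
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&h->lock);
    return found;
}
```

The Receiver calls history_expect and then applies the change; the Filewatcher calls history_consume for every event and forwards only those for which it returns 0. Each entry is consumed once, so a later local change to the same path is still forwarded.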

Delving deeper into the directory

Whenever a directory is created, the Filewatcher thread spawns another thread to watch filesystem activity in the new directory.
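
A thread-per-directory watcher could look roughly like the sketch below, which uses the kernel inotify interface directly and spawns a detached thread for each newly created subdirectory. It is an illustration under assumptions, not the DirSync source: is_new_directory, join_path and spawn_watcher are hypothetical names, and forwarding the change to the other droid is left as a comment.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

#define WATCH_PATH_MAX 4096

/* True only for "a directory was created" events. */
int is_new_directory(uint32_t mask)
{
    return (mask & IN_CREATE) && (mask & IN_ISDIR);
}

/* Build "parent/name" in out. Returns 0, or -1 if it does not fit. */
int join_path(const char *parent, const char *name, char *out, size_t cap)
{
    int n = snprintf(out, cap, "%s/%s", parent, name);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}

int spawn_watcher(const char *dir);

/* Thread body: watch one directory until read() fails. */
static void *watch_thread(void *arg)
{
    char *dir = arg;                    /* heap copy owned by this thread */
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    int fd = inotify_init1(IN_CLOEXEC);

    if (fd >= 0 && inotify_add_watch(fd, dir, IN_CREATE | IN_DELETE | IN_CLOSE_WRITE) >= 0) {
        ssize_t len;
        while ((len = read(fd, buf, sizeof buf)) > 0) {
            for (char *p = buf; p < buf + len; ) {
                const struct inotify_event *ev = (const struct inotify_event *)p;
                if (ev->len && is_new_directory(ev->mask)) {
                    char child[WATCH_PATH_MAX];
                    if (join_path(dir, ev->name, child, sizeof child) == 0)
                        spawn_watcher(child);   /* one more thread per new directory */
                }
                /* Forwarding the change to the other droid would go here. */
                p += sizeof(struct inotify_event) + ev->len;
            }
        }
    }
    if (fd >= 0)
        close(fd);
    free(dir);
    return NULL;
}

/* Start a detached thread watching dir. Returns 0 on success, -1 on failure. */
int spawn_watcher(const char *dir)
{
    pthread_t tid;
    size_t n;
    char *copy;

    assert(dir != NULL);
    n = strlen(dir) + 1;
    copy = malloc(n);
    if (copy == NULL)
        return -1;
    memcpy(copy, dir, n);
    if (pthread_create(&tid, NULL, watch_thread, copy) != 0) {
        free(copy);
        return -1;
    }
    pthread_detach(tid);
    return 0;
}
```

One caveat noted in the inotify(7) man page applies to any design like this: files created inside a new directory before its watch is in place produce no events, so a fuller implementation would scan the directory once after adding the watch.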

See it in action

If you want to see the system working, go to Github. Here’s an example run:

Adios, and keep up the hustle!