There is one administrator fallacy
Mar 22, 2018
For the past year, as an early-career professional in software engineering, I have been very happy to be involved in distributed system development and to be exposed to several very good articles, books, and conferences in the field. That has helped me better understand the nature of distributed systems and see that there are many things we can explore and research in this field.
Between 1994 and 1997, several people at Sun Microsystems, including L. Peter Deutsch and James Gosling (the inventor of Java), introduced the eight fallacies of distributed computing. The fallacies are [1]:
- The network is reliable.
- Latency is zero.
- Bandwidth is infinite.
- The network is secure.
- Topology doesn’t change.
- There is one administrator.
- Transport cost is zero.
- The network is homogeneous.

Each fallacy seems obvious and self-explanatory, but in practice, for a newbie like me out in the wild, there are just many ways to do things wrong.
In this article we focus on the sixth fallacy: there is one administrator. I am not a system engineer or system administrator, so when I first read this fallacy it seemed obvious and easy to grasp, until I started doing administration myself in a distributed fashion.
This fallacy means different things in different contexts, but I will try to summarize. Before that, let me share my own background. The idea started to bother me after a leader of an open source project I contribute to reminded me that I need to do a few checks before actually doing anything on the servers we operate (I'll explain at the end of this post). I felt bad, because I already knew the fallacy, yet I still did not guard against it in practice. I also had another bad time when I did something stupid on a build server that I and other people maintain at my current company: I broke continuous integration for almost every team in the company, because I deleted local Docker images that had not been pushed to an online registry by running a command I did not fully understand.
Here is my take:
There will be different administrators with different levels of expertise
As I mentioned in the previous section, the leader of the open source project I contribute to told me to always check the logged-in users before doing an operation. That made me realize that, in the context of Linux administration, not every administrator has the same level of expertise and experience. I have little to no experience managing a server remotely when several people have access to it.
Coordination of upgrades
Have you ever imagined being an Android or iOS developer whose team ships a back-end service that is not backward compatible, so that only the latest version of your app implements the new spec? Your app will break in the wild, and your users will complain and give bad ratings. This is an example of the one administrator fallacy: the admin of the iPhones and Android phones that run your app is not a single person, but many kinds of people, each of whom decides when, or whether, to update.
Another example is when you upgrade your database, say MySQL: does your entire ecosystem already support that version? Someone also needs to update the library or the implementation details in their app.
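One way to make the MySQL question above concrete is to write down which version each consuming service is known to support and compare that against the upgrade target. The sketch below is only an illustration: the service names, the version numbers, and the `version_le` and `unsupported_consumers` helpers are all hypothetical, and it assumes GNU `sort -V` is available.

```shell
#!/usr/bin/env bash
# Hypothetical pre-upgrade check: every name and version here is made up.

# version_le A B: succeeds when version A <= version B (relies on GNU sort -V).
version_le() {
  [ "$1" = "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" ]
}

# unsupported_consumers TARGET: reads "name max_supported_version" lines on
# stdin and prints the consumers whose maximum supported version is below TARGET.
unsupported_consumers() {
  local target="$1" name max
  while read -r name max; do
    [ -z "$name" ] && continue
    if ! version_le "$target" "$max"; then
      printf '%s\n' "$name"
    fi
  done
}

# Example with made-up data; prints "report-job", the only consumer
# that does not yet support the 8.0 target.
printf '%s\n' 'billing-api 8.0' 'report-job 5.7' 'mobile-gateway 8.0' \
  | unsupported_consumers 8.0
```

An empty result would mean every listed consumer claims support for the target version. The list itself still has to be maintained by the people who administer each consumer, which is exactly the coordination this fallacy is about.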
Recall my story about deleting Docker images on the build machine: I mistakenly deleted all the images because I ran a command whose behaviour I did not 100% understand and, even worse, I passed `-f` (force) to it, which impacted many users of the services that server provides.
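A small guard could have slowed me down in the Docker incident. The following is a minimal sketch, not the command I actually ran: `confirm_run` is a hypothetical helper that prints the exact command and the machine name, then runs the command only after you type `yes`. The image name in the usage comment is also made up.

```shell
#!/usr/bin/env bash
# confirm_run CMD [ARGS...]: show what is about to run and on which host,
# then execute it only if the operator types "yes".
confirm_run() {
  local answer=""
  printf 'About to run on %s: %s\n' "$(uname -n)" "$*" >&2
  printf 'Type "yes" to continue: ' >&2
  read -r answer || true
  if [ "$answer" != "yes" ]; then
    echo 'Aborted.' >&2
    return 1
  fi
  "$@"
}

# Usage idea (left commented out so nothing is deleted by accident):
#   docker image ls                               # look first, read-only
#   confirm_run docker rmi -f example/image:tag   # hypothetical image name
```

Listing with `docker image ls` before removing anything is the important half. The prompt only buys a few seconds to notice that the command is about to run on a shared machine.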
How to reduce mistakes for newbies like me
Here are the simple tricks that the lead of the open source project I mentioned in the previous section taught me for working on a Unix system.
Check who is currently logged in with `who` or `w`:

    who
    #[email protected]:~# who
    #root pts/0 2018-03-21 14:17 (c-66-31-202-63.hsd1.ma.comcast.net)
    #root pts/1 2018-03-21 19:13 (188.8.131.52)
    #root pts/2 2018-03-21 20:21 (184.108.40.206)

    w
    #[email protected]:~# w
    # 20:22:18 up 6:05, 3 users, load average: 1.04, 1.03, 1.00
    #USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
    #root pts/0 c-66-31-202-63.h 14:17 1:04m 33.41s 33.20s top
    #root pts/1 220.127.116.11 19:13 1:06m 0.11s 0.11s -bash
    #root pts/2 18.104.22.168 20:21 1.00s 0.12s 0.03s w
Check the login history with `last`:

    last
    #[email protected]:~# last
    #root pts/2 22.214.171.124 Wed Mar 21 20:21 still logged in
    #root pts/1 126.96.36.199 Wed Mar 21 19:13 still logged in
    #root pts/2 188.8.131.52 Wed Mar 21 19:04 - 19:10 (00:05)
    #root pts/1 184.108.40.206 Wed Mar 21 18:32 - 19:13 (00:40)
    #root pts/1 220.127.116.11 Wed Mar 21 18:27 - 18:32 (00:05)
Do this every time you log in, and coordinate with the other logged-in users when you want to do risky stuff.
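The check is easy to forget, so it can be wrapped in a tiny script. This is a rough sketch under my own assumptions: `other_sessions` is a hypothetical helper that filters `who`-style output, and the script identifies your own session by terminal rather than by user name, because on a shared server everyone may be logged in as `root`.

```shell
#!/usr/bin/env bash
# other_sessions MY_TTY: reads `who`-style lines on stdin and prints every
# session that is not on MY_TTY, as "user tty".
other_sessions() {
  awk -v mine="$1" '$2 != mine { print $1, $2 }'
}

# Strip the /dev/ prefix so the value matches the second column of `who`.
my_tty="$(tty 2>/dev/null | sed 's#^/dev/##')"

others="$(who 2>/dev/null | other_sessions "$my_tty")"
if [ -n "$others" ]; then
  echo "Other sessions are active; coordinate before doing anything risky:"
  echo "$others"
else
  echo "No other sessions found."
fi
```

Calling a script like this from your shell profile would make the reminder show up on every login, though it is no substitute for actually talking to the other administrators.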
Use the `screen` utility when collaborating on a single machine with people in different places.

    # start screen
    screen
    # list all screen sessions
    screen -ls
    # detach screen
    # CTRL+A and D
    # resume screen
    screen -r
    # take over: detach and resume
    screen -d -r
You can learn more about the `screen` utility here.