To retry or not to retry – Exponential backoff, Cirtcuit breaker

Our applications live in an imperfect world, communicate using unstable network and call unguaranteed resources. Lately, with the rise of service world, even more failure points appeared. The source code often contains more of insurance code than the actual business logic. I’d like to write about one of their kind – when the service call is unsuccessful.

The fail result might differ in nature:
Transient – For instance,
– 503 Service unavailable – when the service is overloaded or temporary disabled for maintenance;
– 504 Gateway Timeout – when proxy servers don’t get response from back servers in time;
– Also any timeout, when there is no response from server at all.
These are transient errors and might resolve after some time.

Permanent – Incorrect password error will never resolve with time.

When we determine that we are getting a transient error, we can start retrying every few seconds. Just one issue: If we hit timeout because of a server overload, our retries will increase the request queue even more and prevent service from recovering. There are few design patterns to tackle this problem:

 

Exponential backoff

You might have noticed, when you have GMail opened in the browser and internet goes off, the notification comes up “Connecting in 1s…”. At first it will retry in 1 second, then in 2 seconds, then in 4; then 8 and increases the delay time exponentially like that. Sometimes it even reaches hours.
The case is not only with the browser-server communication. This kind of problematic connections can be present totally on the server side – among different components. Sometimes ‘randomness’ is introduced for better performance. For instance, both methods are used in Amazon AWS architecture: Exponential Backoff and Jitter.
In randomness I mean that instead of having 4 seconds delay, there could be X seconds, where X is random number between 1 and 4.

I use these numbers for the sake of explanation. Clearly, we will need several constants:
Base delay time, maximal number of attempts and maximal delay time.

In a similar fashion, when the invocation is unsuccessful, we can make more and more delay instead of retrying every other second and let the service recover.

Exponential backoff algorithm is also used in the Ethernet protocol. When two machines on the same network try to send the packet simultaneously, collision happens. If they repeat the action after the same delay, they will collide again and again forever. Consequently, the delay formula roughly looks like this:
0 ≤ r < 2^k where k = min (n, 10)
n is the number of collisions and r is selected randomly between 0 and 2^k. k is number of collisions topped by 10. So, the more collisions happen, the more top limit is increased (exponentially). Then there is even less probability to get same delay time randomly.

 

Circuit breaker

This algorithm is very much like an electric circuit breaker, which we have at home. An intermediary object is placed (on client side) between the client and a server, which serves as a service protector. The requests are sent through this object. When it notices a high rate of failed responses, it will trip to an ‘Open’ state and won’t pass client requests, but rather respond to them itself with a failure.

In case of an electric one, we have to manually switch it back to the initial state, but we can’t do it here, so after some time interval, this object should automatically switch into a “Half-Open” state and let one request pass to the service. Based on the response, it will either stay in the “Open” state, or move to “Closed” one and will pass all requests.

After several timeout results, the protector will trip and start replying to the client

Photo is taken from Martin Fowler’s blog

In some frameworks (especially while communicating with the database), these kinds of algorithms are already implemented.

If you have a public API and wish that unknown clients stick to a better mechanism of retry – i.e not to totally kill your service during troubles, then you can write client libraries yourself for several languages and users will use them instead.

Georgian Capital letters added to Unicode. Now what?

Last may Unicode approved 46 Capital letters of Georgian Mkhedruli alphabet.

Maybe it’s a bit early, but operating systems will support this change in future anyway. Out of curiosity I decided to do a little research about what will change for us – developers and I’m sharing it in this article.

Few definitions just in case:
Unicode – A standard, which maps every symbol with an unique number. Also it describes specific rules for different languages. This standard is used all over the technical world and everyone who needs text processing / representing, follows it – Operating systems, platforms, browsers…

UTF-8 – Unicode has the list of symbols and number codes, but it does not care how this information will be stored in memory. There are various encoding algorithms for this. UTF-8 is one of the most popular ones as it optimally uses memory and does not require extra bytes for a symbol which can be fit in just one. Other encoding examples are UCS-2, UTF-16, UTF-32…

Changes in a standard cause changes in implementations, which does not happen immediately. For instance, ₾ Georgian currency symbol was added to 8 version of Unicode on 17 may, 2015 and the Windows update for this symbol was released on 19 January, 2016.

Operating Systems should update keyboard drivers to enable Georgian users use CAPS mode to write Capital letters (there are 33 letters in Georgian, so shift+symbols method is already taken). Also system fonts should be updated, so correct symbols will appear during the font fallback.

Due to the fact, that the Capital and Small versions of the same letter have different codes, software developers usually need some considerations – up until now only for other languages, now for Georgian, too. For instance, when it’s necessary to compare strings, search, match regex patterns, sort, store into the database, etc.

 

Database

MS SQL server has built in Unicode support and during the operations it follows the standard anyway. Just make sure it follows the correct version: SQL Fiddle

It’s different with MySQL – Here each database, table or even a column might have corresponding collation defined, based on what kind of information it stores. We are accustomed to using utf8_general_ci, as it ‘processes’ Georgian letters, too. This collation does not completely implement the Unicode unlike utf8_unicode_ci. Generally, it was being used just because of better performance, however, there is not much difference with modern processors. utf8_unicode_ci will correctly process new Georgian alphabet upon version upgrade.

Here is an example:
Together with the unique codes, Unicode also defines the order of symbols, which is used during sorting. E.g. in this list all kinds of Georgian letter ‘ა’ are listed together – Nuskhuri, Asomtavruli and Mkhedruli. Then versions of ‘ბ’ letter appear. Probably new capital letters will be added in the same way.

SQL Fiddle

CREATE TABLE IF NOT EXISTS `test` (
  `content` varchar(200) NOT NULL
) DEFAULT CHARSET=utf8 COLLATE utf8_general_ci;
INSERT INTO `test` (`content`) VALUES
  ('აბგ'),  ('ააააა'),  ('Ⴁააააა'),  ('Ⴀააააა'),  ('bcd'),  ('ab.'),  ('Ⴄ'),  ('ж'),  ('Ж'),  ('ц'),  ('Ц');
  

CREATE TABLE IF NOT EXISTS `test_better` (
  `content` varchar(200) NOT NULL
) DEFAULT CHARSET=utf8 COLLATE utf8_unicode_ci;
INSERT INTO `test_better` (`content`) VALUES
  ('აბგ'),  ('ააააა'),  ('Ⴁააააა'),  ('Ⴀააააა'),  ('bcd'),  ('ab.'),  ('Ⴄ'),  ('ж'),  ('Ж'),  ('ц'),  ('Ц');


select * from `test` d order by d.content;
select * from `test_better` d order by d.content;

Result:

ab., bcd, Ж, ж, ц, Ц, Ⴀააააა, Ⴁააააა, Ⴄ, ააააა, აბგ
ab., bcd, ж, Ж, ц, Ц, ააააა, Ⴀააააა, აბგ, Ⴁააააა, Ⴄ

The MySQL 8 beta release, which appeared currently, has implemented Unicode version 9, our capital letters are in version 11 🙂

 

Javascript

Although there are many implementations, we can’t ignore V8 anyway, so I’ll discuss based on it.

Javascript has Unicode support, but some things still have problems (e.g. unicode regex). If we need sorting or filtering on our site, than ordinary sort won’t work any more and we should use Locale. Then it will consider Unicode rules. For instance:

let a = ['აბგ','ააააა','Ⴁააააა','Ⴀააააა','bcd','ab.','Ⴄ','ж','Ж','ц','Ц'];
console.log(a.sort());
console.log(a.sort(Intl.Collator('ru').compare));

Unfortunately it does not have support for Georgian collation at all. So, we cannot correctly sort together with Nuskhuri and Asomtavruli. Well, this is a very rare case anyway, so no need to worry. Casual function sorts based on the code points, so it will be according to alphabet (with the exception of capital letters).

That problem with capitals can be solved by converting strings to the same case. Giorgi suggested an idea:

myArray.sort(function(s1,s2){ return s1.toLowerCase() > s2.toLowerCase()}));

Probably it will work correctly for Georgian, too, after V8 renews the Unicode implementation. Currently it works like that for Asomtavruli and Nuskhuri: "Ⴀ".toLowerCase() => "ⴀ"

It seems that, as standard defined Asomtavruli as CAPITAL and Nuskhuri as SMALL, these alphabets are implemented as cases of single one instead of being two completely different alphabets. (v8 source file: unicode.cc, code points are mapped directly.)
Now Mkhedruli is caseless. It’s interesting how it will be marked. I think there is no other language with two kinds of Capital letters.
Anyway, this requires the version upgrade anyway.

Now I remembered, that V8 is an open source project and a volunteer can add Georgian locale. For the time being this results in an empty array:

Intl.Collator.supportedLocalesOf('ka')

 

Java

Java is not in a hurry to upgrade either. JDK 9 with the Unicode 8 implementation (where Lari currency symbol was added) was released after two years – September of 2017.
Here the strings are compared with ‘equals’. In future we’ll need to use the ‘equalsIgnoreCase’ method for Georgian, too:

"Ⴀ".equals("ⴀ")  => false
"Ⴀ".equalsIgnoreCase("ⴀ")  => true

As there is one Capital alphabet already, I’m testing with it. We just don’t use that alphabet generally.

Also, we can’t search with regex directly. Ordinary i – ignore case flag does not work, as Unicode is processed differently. So, we should write:

"A".matches("(?i)[a]")  => true
"Ⴀ".matches("(?i)[ⴀ]") => false

Pattern.compile("[ⴀ]", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE).matcher("Ⴀ").matches();  => true

Correspondingly, we should consider this wherever we use strings – maps, sets, etc.

 

PHP

Generally, working with unicode strings is not pleasant in PHP at all + more conversions will be added here, too.

 

 
We’ll also need change at other place – with very convenient search tools – grep and the similar ones. The case insensitive option of grep does not work for existing Georgian Asomtavruli capital alphabet even now. I hope the Unicode changes will be reflected in their upgrades, too. They are great apps for regex filtering and searching in large (or small) texts and files.

Many of Georgian application systems won’t be able to quickly upgrade their platforms, as testing would take huge amount of time. They will probably add some conversions and validations in front-end, to prevent user input capital strings from appearing in old Java or other systems.

Overall, I like that Capital letters were added (as a result of several persons hard work). It’s an important part of the Georgian language and should not be lost.

Do you have any ideas, what else will need to be changed?

Some resources about the topic:
On.ge – UNICODE-მა ქართული მხედრული ანბანის 46 მთავრული ასონიშანი დაამტკიცა
DevFest 2016: Akaki Razmadze –  ❤  [I LOVE UNICODE]
DevFest 2017: Akaki Razmadze – გუტენბერგი, სტივ ჯობსი, გუგლი, ხინკალი, უნიკოდი
DevFest 2016: Michael Everson – The confusing case history of Georgian in Unicode

My talk at DevFest 2017: Continuous Integration-Delivery-Deployment

The talks from the Developers’ festival are being published ^_^
I’m sharing my talk here, to keep it on my blog. I love this festival. Instead of few days, it took me one month to prepare the presentation cause of my little baby, but I really wanted to participate :)))

This is the demo url on Github:
https://github.com/elatsoshvili/DevFestDemo2017

Integration tests with databases (Node.js + Mocha)

Automation tests are divided into several categories. To be short, unit tests are used to test small fragments of code. For example, there is a function for formatting a phone number. We might have several unit tests for covering various scenarios, but if we want to check how user performs registration with this number and then passes authorization, we need to cover interaction of several components in our test – this is an integration test (or maybe even acceptance test).

Generally, we are facing an integration test, if it uses:

  • A database
  • A network
  • Any external system (e.g. a mail server)
  • I/O operations

The hard part is that unlike unit tests we cannot run test operations directly on external systems. E.g. we cannot send thousands of test mails to randomly generated addresses. there are several ways to solve this kind of problem depending on what we want to test. let’s look at the options:

 

Service imitation (Stubs, Mocks)

Let’s assume we’re writing a client application, which invokes services on various servers. I.e. our priority is testing a client and no need to actually use production operations. In this case we can create a service stub with exactly same functions and parameters as the real one. only instead of executing the real logic, it will return some fixed responses.

function sendMail(email, content) {
    console.log(‘Email sent to: ‘ + email);
    return true;
}

When we run our app in a test mode, we should make it use the fake service object instead of a real one (Let’s dive into details in future articles).

 

Using the database

Let’s say we are writing a service which heavily uses a database and we need integration tests to check it. clearly we can substitute the database layer with a stub and let select, insert,etc. operations return some predefined fixed values. However in most cases this is not practical and doesn’t really test the relations among various processes. For instance, I would like user to register, activate their account and perform authorization. This flow uses several tables and I would prefer to execute it on the database.

There are several solutions here, too. I prefer to have an empty database separately – neither in-memory, nor a lighter alternative, but exactly the same version of a database, just dedicated to testing. When my app runs in a test mode, it will fetch the test database path from corresponding configuration and will use for test operations. First it will clear the tables to avoid broken state.

I will use Node and Mocha for this example

In my previous post I was describing configuration of various environments. I don’t think of Mocha tests as a different environment, because we might have dev, test and even build servers and tests would be running on all of them. However I will follow the similar method – I’ll use environment variables for configuring testing runtime, too, and I’ll create .env.mocha file.

I’d like to note that the dotenv documentation clearly states – it’s not recommended to have multiple env files like .env, env.test, env.prod, etc, but we should have one .env file with different content on different servers. In my opinion .env.mocha serves completely different purpose and is not included in this rule.

The next step is to use .env.mocha file instead of a real one while app runs in a test mode. Currently there is no working cross-platform code on the internet and I like using Windows OS, so I’m offering my solution, and no need to load configuration in every test file either:

  • Create .env.mocha file in the project directory and configure properly with test values.
  • Create setup.js file under test directory and put this line into it:
    require('dotenv').config({path:__dirname + '/../.env.mocha'});
  • Create one more file under test directory – mocha.opts and put this line there:
    --require test/setup.js

That’s it.
When you run ‘npm test’ on the project, .env.mocha configuration will be used in every test automatically.

For the sake of insurance and to make sure that I’m not loading production configuration (not to drop all databases), I’ll add one more property into the .env.mocha file and execution of setup.js will continue only in case it is found (e.g. MOCHA_CONFIG_LOADED=yes)

I would also like to have empty tables before running tests. Mocha has various hooks and among them before(), which will be invoked before executing the test suite, if we put it inside ‘describe’. If we declare it globally, then it will be executed only once before all tests. That’s exactly what I need. It would be better if I could put this code in setup.js, but if you try, you’ll find that mocha is not yet loaded on that stage and ‘before’ variable won’t be defined. So, I added hooks.js file under the test directory and described my global hooks there.

If integration tests take too long to execute, it’s possible to configure scripts in package.json and make different commands for running unit and integration tests (separated on a directory level).

Memory optimization with multidimensional arrays in Java

Yesterday I hit a memory limit while submitting a problem solution to an online judge. All because I used a wrong declaration of a multidimensional array. In case of C or C++ language this would not be a problem, but due to the fact that in Java everything is an object (despite primitive types), we should calculate allocated memory differently for multidimensional arrays.

For instance, in C++ the size of this array is 96 bytes (assuming we have 4-byte int):
int arr[2][3][4];
2 * 3 * 4 = 24. Plus 4 bytes for each due to int type and we get total of 24 * 4 = 96.
If we declare the array like this, result would be the same: int arr[4][3][2]

Let’s compare these two declarations in Java:
int arr[][][] = new int[40000][200][2];
int arr[][][] = new int[2][200][40000];

There is not much different from the data storage viewpoint, however the first one requires much more memory. Here is why:
1. Array in Java is an object itself. Every object (not near the reference, but in the heap) holds few additional bytes for the object header information. This header contains data necessary for the VM, which later uses it for the garbage collector or some other purposes. As far as I know, generally it takes 8 bytes in case of a 32-bit machine and 16 bytes in case of a 64-bit one. Apart from this, an object holds information about the array size – that’s additional 4 bytes. Also padding in memory might take few more bytes. So, I won’t start precise calculations and let’s assume that each array object (excluding array elements) takes X bytes.

2. int[a][b] – In Java language this is ‘a’ number of arrays which contain ‘b’ number of elements. I.e. we have a+1 number of objects instead of just one.
int[a][b][k] In case of a three-dimensional one a * b + a + 1 number of object, etc.

Now let’s calculate and compare sizes of these arrays:
(40,000 * 200 + 40,000 + 1) * X + (40,000 * 200 * 2 * 4)
(2 * 200 + 2 + 1) * X + (40,000 * 200 * 2 * 4)

Clearly, the second part, which calculates the total size of elements based on int type, will be the same. But according to the first part we will have 8,039,598 excessive objects in case of the first array, consequently it will take considerable larger memory.

By the way, I cannot see if this number is real with profiler, do you have any idea how to check this?