Quick back-of-the-envelope calculations
During system design, it is often important to get a rough estimate of the number of servers (boxes) needed to support a given scale. In this post, I would like to summarize some quick hacks for estimating the number of machines from a few parameters.
First, let’s map powers of two (2^x) to powers of ten (10^y) for data estimation.
* 2^1 => 2 (1 bit can represent 2 items 0, 1)
* 2^2 => 4 (2 bits can represent 4 items 00, 01, 10, 11)
* 2^3 => 8 (can represent 8 items, octal system)
* 2^4 => 16 (can represent 16 items, hexadecimal system 0-F, two hexadecimal characters form a byte)
* 2^5 => 32
* 2^6 => 64
* 2^7 => 128 (can represent ASCII, which uses 7 of the 8 bits in a byte)
* 2^8 => 256 (values in one byte; can represent ISO-8859-1, and UTF-8 encodes each character in one to four bytes)
* 2^9 => 512
* 2^10 => 1 KB --> ~10^3 bytes (Thousand)
* 2^20 => 1 MB --> ~10^6 bytes (Million)
* 2^30 => 1 GB --> ~10^9 bytes (Billion)
* 2^40 => 1 TB --> ~10^12 bytes (Trillion)
* 2^50 => 1 PB --> ~10^15 bytes (Quadrillion)
Using the table above, you can quickly convert a power of two to base 10, for example 2^48 (256 TB):
2^48 => 2^40 (1 TB) * 2^8 (256) => 10^12 (Trillion) * 256 => 256 Trillion bytes
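As a rough sketch of the rule the table encodes (every 10 in the exponent is about a factor of 10^3), the conversion can be written in Python. The function name `approx_power_of_two` is made up for this illustration.

```python
def approx_power_of_two(exponent):
    """Approximate 2**exponent as multiplier * 10**decimal_exponent.

    Uses the rule of thumb 2**10 ~= 10**3; the leftover exponent
    (0-9) is kept as an exact power-of-two multiplier.
    """
    tens, remainder = divmod(exponent, 10)
    return 2 ** remainder, 3 * tens


multiplier, decimal_exponent = approx_power_of_two(48)
print(f"2^48 ~= {multiplier} * 10^{decimal_exponent}")  # 2^48 ~= 256 * 10^12

# The shortcut undercounts, because 2**10 is 1024, not 1000.
exact = 2 ** 48
estimate = multiplier * 10 ** decimal_exponent
print(f"exact: {exact}, estimate: {estimate}")
```

The estimate runs about 9% low for 2^48, which is fine for sizing but worth remembering at larger exponents.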
Common approximate quantities used to simplify the calculations
Approximate seconds in a day: 10^5 (100 Thousand; the exact value is 86,400)
Approximate days in a year: 400 (rounded up from 365)
Approximate connection throughput in a data center:
10 Gbps (gigabits per second) connection: 10 * 10^9 bits / 10 (rounding 8 bits per byte up to 10) => 10^9 bytes per second => transferring 1 GB of data takes about 1 sec.
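The throughput estimate can be sketched as a small Python helper. The name `transfer_seconds` and its `bits_per_byte` parameter are assumptions for illustration; the default of 10 mirrors the rounding used in this post, and passing 8 gives the unrounded figure.

```python
BITS_PER_BYTE_APPROX = 10  # rounded up from 8 to keep the arithmetic simple


def transfer_seconds(data_bytes, link_gbps, bits_per_byte=BITS_PER_BYTE_APPROX):
    """Estimate the seconds needed to move data_bytes over a link_gbps link."""
    bytes_per_second = link_gbps * 10**9 / bits_per_byte
    return data_bytes / bytes_per_second


print(transfer_seconds(10**9, 10))     # 1.0 (1 GB over 10 Gbps, rounded)
print(transfer_seconds(10**9, 10, 8))  # 0.8 (same transfer with 8 bits per byte)
```

Rounding to 10 bits per byte makes the estimate slightly pessimistic, which leaves some headroom for protocol overhead.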
How many unique keys can be created from a 64-character alphabet (a-zA-Z0-9-\_) with a key length of 7 characters: about 4 Trillion
Number of unique keys of 7 characters => 64 ^ 7
64 => 2 ^ 6
64 ^ 7 => 2 ^ (6 * 7) => 2 ^ 42 => 2 ^ 40 (Trillion) * 2 ^ 2 => 4 Trillion
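A quick way to check the key count is to build the alphabet and raise its size to the key length. This is only a sketch; the constant names are chosen for the example.

```python
import string

# a-z, A-Z, 0-9, plus "-" and "_" gives 64 symbols
ALPHABET = string.ascii_letters + string.digits + "-_"
KEY_LENGTH = 7

total_keys = len(ALPHABET) ** KEY_LENGTH

print(len(ALPHABET))        # 64
print(total_keys)           # 4398046511104
print(total_keys == 2**42)  # True
```

The exact count is about 4.4 trillion, so rounding to 4 trillion is slightly conservative.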
Given that 4 Trillion keys can be generated from a 64-character alphabet and 7 characters, how many years will they last when generating 1000 keys/second: 100 years
1000 keys per second: 1000 * 10^5 (seconds in a day) => 10^8 keys per day
Keys generated in a year: 400 * 10^8 => 4 * 10^10 => 40 Billion
Total keys => 4 * 10^12 keys
Number of years: (4 * 10^12) / (4 * 10^10) => 100 years
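The keyspace lifetime can be sketched in Python as follows. The function name and its parameters are hypothetical; the defaults are the rounded constants used in this post (10^5 seconds per day, 400 days per year), and exact values can be passed in for comparison.

```python
def years_until_exhausted(total_keys, keys_per_second,
                          seconds_per_day=10**5, days_per_year=400):
    """Estimate how many years a keyspace lasts at a steady generation rate."""
    keys_per_year = keys_per_second * seconds_per_day * days_per_year
    return total_keys / keys_per_year


# Rounded inputs, as in the estimate above
print(years_until_exhausted(4 * 10**12, 1000))  # 100.0

# Exact keyspace and calendar values, for comparison
exact_years = years_until_exhausted(2**42, 1000,
                                    seconds_per_day=86_400, days_per_year=365)
print(round(exact_years))
```

Both rounded constants overestimate the yearly key volume, so the real keyspace lasts longer than the 100-year estimate.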
How much storage is needed for the 40 Billion keys generated in a year: 240 GB
Bits needed to represent one of 64 characters: 64 => 2^6 => 6 bits (this assumes a custom codec for the characters (a-zA-Z0-9-\_))
Size of each key: 7 (characters) * 6 bits => 42 bits => 42/8 => 5.25, rounded up to 6 bytes
Keys generated in a year: 40 Billion
Storage needed: 40B * 6 (bytes) => 240 Billion bytes => 240 GB
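The storage estimate follows the same steps in code. This sketch assumes the custom 6-bit codec mentioned above and counts only the raw key bytes, with no index or per-record overhead; the variable names are illustrative.

```python
ALPHABET_SIZE = 64
KEY_LENGTH = 7
KEYS_PER_YEAR = 40 * 10**9

# Bits needed to distinguish 64 symbols: 6
bits_per_char = (ALPHABET_SIZE - 1).bit_length()

bits_per_key = KEY_LENGTH * bits_per_char  # 42
bytes_per_key = (bits_per_key + 7) // 8    # round up to whole bytes: 6

storage_bytes = KEYS_PER_YEAR * bytes_per_key
print(bits_per_char, bits_per_key, bytes_per_key)  # 6 42 6
print(storage_bytes // 10**9, "GB")                # 240 GB
```

Storing each character as a plain one-byte ASCII character would take 7 bytes per key instead, or 280 GB per year.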
Common approximations around users:
10% of users are active in a day
1 Billion users (10^9) -> 10% -> 100 Million users active in a day => 10^8
Read/write (most users only read, so caching really helps in such cases)
10% of active users write to the system in a day (create a post, upload an image)
100 Million active users * 10% => 10 Million users write every day
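The user funnel above is two successive 10% cuts, which a short Python sketch makes explicit. The percentages are the rules of thumb from this post, not measured values, and the names are illustrative.

```python
TOTAL_USERS = 10**9
ACTIVE_PERCENT = 10  # share of all users active on a given day
WRITER_PERCENT = 10  # share of active users who write on a given day

daily_active_users = TOTAL_USERS * ACTIVE_PERCENT // 100
daily_writers = daily_active_users * WRITER_PERCENT // 100

print(daily_active_users)  # 100000000 (10^8)
print(daily_writers)       # 10000000 (10^7)
```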
Data storage calculations: these estimate how much storage will be needed over a year or 10 years.
Active users write 100 KB posts every day; how much space is needed in a year:
10% of active users write 10 posts every day => 10 Million users (10^7) * 10 => 10^8 posts per day
100 KB (10^5 B) * 10^8 => 10^13 B every day => 10 TB every day.
In a year:
10 TB * 400 => 4000 TB => 4 PB (petabytes)
Number of disks needed to store 4 PB
With 10 TB hard drives: 4 PB / 10 TB => 400 drives; with a replication factor of 3, around 1200 drives
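Put together, the storage and drive count can be sketched like this. The inputs are the figures from the text; the names, and the assumption that drives are filled completely with no filesystem overhead, are simplifications for the example.

```python
DAILY_WRITERS = 10**7
POSTS_PER_WRITER = 10
POST_SIZE_BYTES = 10**5    # 100 KB
DAYS_PER_YEAR = 400        # rounded up from 365
DRIVE_SIZE_BYTES = 10**13  # 10 TB
REPLICATION_FACTOR = 3

daily_bytes = DAILY_WRITERS * POSTS_PER_WRITER * POST_SIZE_BYTES
yearly_bytes = daily_bytes * DAYS_PER_YEAR

# Ceiling division, so a partially filled drive still counts as a drive
drives = -(-yearly_bytes // DRIVE_SIZE_BYTES)
drives_with_replicas = drives * REPLICATION_FACTOR

print(daily_bytes // 10**12, "TB per day")    # 10 TB per day
print(yearly_bytes // 10**15, "PB per year")  # 4 PB per year
print(drives, drives_with_replicas)           # 400 1200
```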
QPS calculations:
Number of active users in a day:
100 Million => 10^8
Read QPS:
(All active users make 100 requests each day)
10^8 * 100 => 10^10 requests per day
QPS: 10^10 / 10^5 => 10^5 (100,000 requests/second)
Write QPS:
Number of active users in a day:
100 Million => 10^8
10% of the users write to the system 100 times every day => 10^7 * 100 => 10^9 requests per day
QPS: 10^9 / 10^5 => 10^4 (10,000 requests/second)
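The read and write QPS figures come from the same formula: requests per day divided by seconds per day. A minimal Python sketch, with a hypothetical helper `average_qps` that uses the rounded 10^5 seconds per day:

```python
SECONDS_PER_DAY = 10**5  # rounded up from 86,400


def average_qps(users, requests_per_user_per_day):
    """Average queries per second, assuming load is spread evenly over the day."""
    return users * requests_per_user_per_day // SECONDS_PER_DAY


read_qps = average_qps(10**8, 100)   # all active users read
write_qps = average_qps(10**7, 100)  # 10% of active users write

print(read_qps)   # 100000
print(write_qps)  # 10000
```

These are daily averages; real traffic is not evenly spread, so peak QPS will be higher than these numbers.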