“Most low-latency programmers think that merely writing a lock-free data structure, such as a lock-free circular buffer or a lock-free hash table, will increase their application's performance by a huge amount. This is wrong. If you do not respect the CPU cores, the communication between them, and the optimizations they perform, the cores may slow the lock-free data structure down considerably at some point under load. Always test what you have assumed against what actually happens on your target architectures.”
Design and coding criteria for low latency application development:
1. Know your architecture [x86 or something else].
2. Avoid false sharing as much as you can: place structure variables in different cache lines if they are likely to be accessed by different threads on different cores at the same time [with at least one of those threads writing], and they are not expected to be used by the same thread within the next few instructions.
3. Improve cache-line usage [and hence reduce L1 and L2 cache misses] by placing sequential variables [variables read in sequence by a single thread] in contiguous memory locations.
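Points 2 and 3 can be sketched in C++ roughly as follows. This is only an illustration under assumptions: the 64-byte cache line is typical for x86-64 but must be checked on your target, and the names kCacheLine, SpscIndices, OrderFields and same_cache_line are hypothetical, made up for this example.

```cpp
#include <atomic>
#include <cassert>   // assert, for callers that want to check the layout at runtime
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines (typical on x86-64; verify on your target).
constexpr std::size_t kCacheLine = 64;

// Point 2: indices written by different threads each get their own cache line,
// so a write to one does not invalidate the line holding the other.
struct SpscIndices {
    alignas(kCacheLine) std::atomic<std::uint64_t> head{0};  // written by the consumer
    alignas(kCacheLine) std::atomic<std::uint64_t> tail{0};  // written by the producer
};

// Point 3: fields read in sequence by a single thread are kept contiguous
// and aligned, so one cache line fetch brings all of them in.
struct alignas(kCacheLine) OrderFields {
    std::uint64_t id;
    std::uint64_t price;
    std::uint32_t quantity;
    std::uint32_t flags;
};

static_assert(offsetof(SpscIndices, tail) - offsetof(SpscIndices, head) >= kCacheLine,
              "head and tail must not share a cache line");
static_assert(sizeof(OrderFields) <= kCacheLine,
              "sequentially read fields should fit in one cache line");

// Helper: true if both addresses fall inside the same cache line.
inline bool same_cache_line(const void* a, const void* b) {
    return reinterpret_cast<std::uintptr_t>(a) / kCacheLine ==
           reinterpret_cast<std::uintptr_t>(b) / kCacheLine;
}
```

A caller could check the layout at runtime with something like assert(!same_cache_line(&idx.head, &idx.tail)). The padding costs memory, so it is worth applying only to variables that are really contended between cores.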
4. If your sequential container [SPSC case] is supposed to be used by two threads at the same time [one producer and one consumer]:
4a. If the producer produces data continuously, place that thread on one CPU core [set CPU affinity for this].
Push the data chunk to the other core as soon as possible, without flushing to main memory.
4b. If the consumer consumes data continuously, place that thread on one CPU core [set CPU affinity for this].
Pull the data from the producer core if the data to be read is not available in the consumer core [my core/self core].
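Setting the CPU affinity mentioned in 4a and 4b might look like the sketch below. It assumes Linux with glibc, where sched_setaffinity with pid 0 applies to the calling thread. The helper names first_allowed_cpu and pin_current_thread are hypothetical. Which core numbers to use for the producer and the consumer depends on your machine's topology and is not decided here.

```cpp
#include <cassert>   // assert, handy for checking the return values in debug builds
#include <sched.h>   // sched_setaffinity, sched_getaffinity, CPU_* macros (Linux/glibc)

// Returns the lowest-numbered CPU the calling thread may run on, or -1 on error.
inline int first_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) {
        return -1;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &set)) {
            return cpu;
        }
    }
    return -1;
}

// Pins the calling thread to exactly one CPU. Returns true on success.
// Call it at the start of the producer thread and of the consumer thread,
// each with its own core number.
inline bool pin_current_thread(int cpu) {
    if (cpu < 0 || cpu >= CPU_SETSIZE) {
        return false;
    }
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}
```

Pinning only fixes where each thread runs. Whether the data then moves from cache to cache without a trip to main memory is up to the hardware's coherence protocol, so measure it on the target machine.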
In this case give the