🚀 Performance Optimization

Learn advanced techniques for writing high-performance C++ code with memory and CPU optimization.

Performance Principles

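C++ gives you direct control over memory layout, allocation, and object lifetimes, and its compilers optimize aggressively. Writing efficient code still requires understanding memory management, CPU architecture (especially caches and SIMD units), and what the compiler can and cannot optimize for you.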

Memory Optimization

Cache-Friendly Data Structures

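#include <vector>

// AoS (Array of Structures): each particle's fields sit together in memory.
// Often slower when a loop reads only some fields: the unused ones
// (e.g. mass) are still pulled into the cache.
struct Particle {
    float x, y, z;
    float vx, vy, vz;
    float mass;
};
std::vector<Particle> particles;

// SoA (Structure of Arrays): each field is its own contiguous array, so a
// loop over a few fields reads only those and is easier to vectorize.
struct ParticleSystem {
    std::vector<float> x, y, z;
    std::vector<float> vx, vy, vz;
    std::vector<float> mass;
};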
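To make the trade-off concrete, here is a minimal sketch that advances particle positions by one time step in both layouts. The function names `integrate_aos`, `integrate_soa`, and `to_soa` are illustrative, not from any library. Both versions compute the same result; the difference is that the SoA loop reads only the position and velocity arrays, each as a contiguous run of floats, and never touches `mass`.

```cpp
#include <cstddef>
#include <vector>

// Array of Structures: all fields of one particle are adjacent.
struct Particle {
    float x, y, z;
    float vx, vy, vz;
    float mass;
};

// Structure of Arrays: one contiguous array per field.
struct ParticleSystem {
    std::vector<float> x, y, z;
    std::vector<float> vx, vy, vz;
    std::vector<float> mass;
};

// Hypothetical helper: converts an AoS container into the SoA layout.
ParticleSystem to_soa(const std::vector<Particle>& ps) {
    ParticleSystem s;
    for (const Particle& p : ps) {
        s.x.push_back(p.x);   s.y.push_back(p.y);   s.z.push_back(p.z);
        s.vx.push_back(p.vx); s.vy.push_back(p.vy); s.vz.push_back(p.vz);
        s.mass.push_back(p.mass);
    }
    return s;
}

// AoS update: every iteration loads a whole Particle (7 floats),
// including `mass`, which this loop never uses.
void integrate_aos(std::vector<Particle>& ps, float dt) {
    for (Particle& p : ps) {
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
    }
}

// SoA update: touches only the six position/velocity arrays, each read
// sequentially; the `mass` array is never loaded.
void integrate_soa(ParticleSystem& s, float dt) {
    const std::size_t n = s.x.size();
    for (std::size_t i = 0; i < n; ++i) {
        s.x[i] += s.vx[i] * dt;
        s.y[i] += s.vy[i] * dt;
        s.z[i] += s.vz[i] * dt;
    }
}
```

AoS is not always worse: if every access uses all fields of one particle together, keeping them adjacent is a good fit.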

Avoiding Unnecessary Allocations

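#include <string>
#include <string_view>
#include <vector>

// Take std::string_view to avoid copying the caller's string
void process(std::string_view sv) {
    // sv points at the caller's characters: no allocation, no copy
}

void example() {
    // Reserve capacity up front: one allocation instead of repeated regrowth
    std::vector<int> v;
    v.reserve(1000);

    // Small String Optimization (SSO): mainstream standard libraries store
    // short strings inside the std::string object itself. This is an
    // implementation detail, not a guarantee of the C++ standard.
    std::string small = "Hello";  // typically no heap allocation
    process(small);
}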
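The effect of `reserve` can be measured directly. The sketch below (the helpers `count_growths` and `first_word` are illustrative names) counts how often `push_back` has to grow a vector's buffer, with and without reserving first, and shows that `std::string_view::substr` returns a view into the original characters rather than a new string.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Counts how many times push_back had to grow the vector's buffer while
// appending n elements, optionally reserving capacity first.
std::size_t count_growths(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) {
        v.reserve(n);  // single allocation for all n elements
    }
    std::size_t growths = 0;
    std::size_t cap = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != cap) {  // capacity changes only on reallocation
            ++growths;
            cap = v.capacity();
        }
    }
    return growths;
}

// Returns the first space-separated word of `text` as a view into the same
// characters; substr on a string_view never allocates.
std::string_view first_word(std::string_view text) {
    return text.substr(0, text.find(' '));
}
```

The number of growths without `reserve` depends on the library's growth factor, so only the reserved case (zero growths) is predictable across implementations.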

Move Semantics

Avoid unnecessary copies with move semantics:

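#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

class Buffer {
    std::unique_ptr<char[]> data_;
    std::size_t size_ = 0;

public:
    explicit Buffer(std::size_t size)
        : data_(std::make_unique<char[]>(size)), size_(size) {}

    // Move constructor: take ownership and leave `other` empty
    Buffer(Buffer&& other) noexcept
        : data_(std::move(other.data_)),
          size_(std::exchange(other.size_, 0)) {}

    // Move assignment: release our buffer, then take `other`'s
    Buffer& operator=(Buffer&& other) noexcept {
        if (this != &other) {
            data_ = std::move(other.data_);
            size_ = std::exchange(other.size_, 0);
        }
        return *this;
    }
    // Copy operations are implicitly deleted because move operations are declared
};

void example() {
    // Use std::move to transfer ownership into the container (no copy is made)
    std::vector<Buffer> buffers;
    Buffer myBuffer(1024);
    buffers.push_back(std::move(myBuffer));  // myBuffer is left empty (size 0)
}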
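One way to confirm that `std::move` avoids copies is to instrument a type. In this sketch, the hypothetical `Tracked` type counts its copy and move constructions. Pushing an rvalue into a vector with spare capacity performs one move and no copies. Reallocation matters as well: when a vector grows, it moves its existing elements only if the move constructor is `noexcept` (or the type cannot be copied); otherwise it copies them to keep the strong exception guarantee.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical instrumented type: counts copy and move operations.
struct Tracked {
    static inline std::size_t copies = 0;  // C++17 inline static members
    static inline std::size_t moves = 0;

    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) noexcept { ++moves; }
    Tracked& operator=(const Tracked&) { ++copies; return *this; }
    Tracked& operator=(Tracked&&) noexcept { ++moves; return *this; }

    static void reset() { copies = 0; moves = 0; }
};

// Pushes one local object into `v`, either by moving or by copying it.
void push_one(std::vector<Tracked>& v, bool use_move) {
    Tracked t;
    if (use_move) {
        v.push_back(std::move(t));  // binds to push_back(T&&): one move
    } else {
        v.push_back(t);             // binds to push_back(const T&): one copy
    }
}
```

Removing `noexcept` from the move constructor would make the reallocation in the test below copy instead of move, since `Tracked` is copyable.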

Compiler Optimizations

// constexpr: the compiler can evaluate this function at compile time
constexpr int factorial(int n) {
    return n <= 1 ? 1 : n * factorial(n - 1);
}
constexpr int result = factorial(10);  // 3628800, computed at compile time

// Force inlining of performance-critical code. [[gnu::always_inline]] is a
// GCC/Clang attribute; MSVC uses __forceinline instead.
[[gnu::always_inline]] inline void hot_function() {
    // Critical code
}
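A `static_assert` is a convenient way to prove that a `constexpr` call really is evaluated by the compiler: if it could not be, the program would not compile. The sketch below, assuming C++17, also builds a small lookup table at compile time so that a runtime factorial is a single array read. `make_factorial_table`, `factorial_table`, and `fast_factorial` are illustrative names.

```cpp
#include <array>
#include <cstddef>

constexpr int factorial(int n) {
    return n <= 1 ? 1 : n * factorial(n - 1);
}

// Checked by the compiler; a wrong value would fail the build.
static_assert(factorial(10) == 3628800, "factorial(10) must be 3628800");

// Builds 0! .. 12! at compile time (13! overflows a 32-bit int).
constexpr std::array<int, 13> make_factorial_table() {
    std::array<int, 13> table{};
    for (std::size_t i = 0; i < table.size(); ++i) {
        table[i] = factorial(static_cast<int>(i));
    }
    return table;
}

// Initialized by the compiler; at runtime only the lookup remains.
inline constexpr std::array<int, 13> factorial_table = make_factorial_table();

// Runtime lookup, valid for 0 <= n <= 12.
int fast_factorial(int n) {
    return factorial_table[static_cast<std::size_t>(n)];
}
```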

SIMD and Parallelism

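#include <algorithm>
#include <cstddef>
#include <execution>
#include <immintrin.h>
#include <vector>

void parallel_examples() {
    std::vector<int> v(1000000);

    // Parallel algorithms (C++17). With GCC's libstdc++ the parallel
    // policies rely on Intel TBB, so link with -ltbb.
    std::sort(std::execution::par, v.begin(), v.end());
    std::for_each(std::execution::par_unseq, v.begin(), v.end(),
                  [](int& x) { x *= 2; });
}

// Manual SIMD with AVX intrinsics (compile with -mavx or equivalent).
// _mm256_load_ps/_mm256_store_ps require a, b, c to be 32-byte aligned;
// use _mm256_loadu_ps/_mm256_storeu_ps for unaligned data.
void add_vectors(const float* a, const float* b, float* c, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {  // 8 floats per iteration
        __m256 va = _mm256_load_ps(a + i);
        __m256 vb = _mm256_load_ps(b + i);
        _mm256_store_ps(c + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; ++i) {  // scalar tail for the last n % 8 elements
        c[i] = a[i] + b[i];
    }
}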
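`_mm256_load_ps` and `_mm256_store_ps` require 32-byte-aligned addresses. One way to guarantee that, assuming C++17, is to store data in a type declared `alignas(32)`: since C++17, `std::allocator` uses aligned `operator new` for over-aligned types, so every element of a `std::vector` of that type starts on a 32-byte boundary. The `Lane8`, `add_lanes`, and `is_aligned32` names are illustrative. The AVX path compiles only when `__AVX__` is defined (for example with `-mavx` on GCC/Clang); otherwise a scalar loop is used.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Eight floats on a 32-byte boundary: exactly one AVX register's worth.
struct alignas(32) Lane8 {
    float v[8];
};

// c[i] = a[i] + b[i] lane by lane; all three vectors must have the same size.
void add_lanes(const std::vector<Lane8>& a, const std::vector<Lane8>& b,
               std::vector<Lane8>& c) {
    for (std::size_t i = 0; i < a.size(); ++i) {
#if defined(__AVX__)
        // Aligned loads/stores are safe: every Lane8 starts on a 32-byte boundary.
        __m256 va = _mm256_load_ps(a[i].v);
        __m256 vb = _mm256_load_ps(b[i].v);
        _mm256_store_ps(c[i].v, _mm256_add_ps(va, vb));
#else
        // Scalar fallback when AVX is not enabled at compile time.
        for (int k = 0; k < 8; ++k) {
            c[i].v[k] = a[i].v[k] + b[i].v[k];
        }
#endif
    }
}

// True if p lies on a 32-byte boundary.
bool is_aligned32(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 32 == 0;
}
```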

Profiling Tools
