Burning Samba with perf and FlameGraph

Explore how Catalyst's Samba team enhanced performance for the Samba 4.5 release.

Samba, when run as an Active Directory Domain Controller (DC), is used in sectors ranging from education to large corporations because of its dependable implementation of the essential protocols. As users deploy Samba at ever larger scales, they expect performance to keep up. This article describes how Catalyst's Samba team improved performance for the Samba 4.5 release.

Identifying and Addressing Issues

Client feedback highlighted two critical issues:

  • Samba Replication Performance: Particularly noticeable with a large number of linked attributes.
  • Runtime Performance: Specifically, add/remove operations with linked attributes.

Reproducing the Problem

Work began by reproducing the client's environment: a domain with a large number of users, many of them members of groups. This was challenging because replication took so long. A pre-created database made the task easier.

Profiling with Perf Tools

The team used perf record, a low-overhead sampling profiler, to collect statistical profiles of replication operations. These profiles helped identify where the time was being spent.
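
As a rough illustration of what such a profiling run can look like, the sketch below builds a perf record command line from Python. The sampling rate, duration, output file and target PID are assumptions chosen for illustration; the source does not say which options the team used. The helper name perf_record_cmd is hypothetical. -F, -g, -p and -o are standard perf record options for sampling frequency, call-graph capture, attaching to a process and naming the output file.

```python
import shlex


def perf_record_cmd(pid, seconds=30, freq_hz=99, output="perf.data"):
    """Build an argv list for sampling one process with perf record.

    All default values here are illustrative assumptions.
    """
    return [
        "perf", "record",
        "-F", str(freq_hz),           # samples per second
        "-g",                         # record call graphs (stack traces)
        "-p", str(pid),               # attach to an existing process
        "-o", output,                 # file that receives the samples
        "--", "sleep", str(seconds),  # keep sampling for this long
    ]


if __name__ == "__main__":
    # 1234 is a placeholder PID, not a real Samba process.
    print(shlex.join(perf_record_cmd(1234)))
```

The printed command can be pasted into a shell, or the list can be passed to subprocess.run on a machine where perf is installed.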

Visualising with FlameGraph

Brendan Gregg's FlameGraph tool was used to visualise the profiles and expose performance bottlenecks. It revealed suboptimal processing within Samba, notably in tight loops and during database transactions.
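
To make the idea behind a flame graph concrete, here is a minimal sketch of the "folded stack" aggregation that FlameGraph takes as input: each unique call stack becomes one semicolon-separated line with a sample count, and the width of a frame in the graph is proportional to that count. This is not the FlameGraph code itself, and the function names in the sample data are hypothetical rather than real Samba symbols.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse sampled call stacks into folded-stack counts.

    Each sample is a list of function names ordered from the outermost
    caller to the innermost callee. The result maps "a;b;c" to the number
    of samples that had exactly that stack.
    """
    return Counter(";".join(stack) for stack in samples)


def inclusive_counts(folded):
    """Count the samples in which each function appears on the stack."""
    totals = Counter()
    for stack, count in folded.items():
        # Use a set so a recursive function is counted once per sample.
        for func in set(stack.split(";")):
            totals[func] += count
    return totals


if __name__ == "__main__":
    # Hypothetical stacks, for illustration only.
    samples = [
        ["main", "replicate", "apply_links", "compare"],
        ["main", "replicate", "apply_links", "compare"],
        ["main", "replicate", "commit"],
        ["main", "idle"],
    ]
    folded = fold_stacks(samples)
    for stack, count in sorted(folded.items()):
        print(stack, count)
    print(inclusive_counts(folded).most_common(3))
```

A function with a high inclusive count sits under a wide tower in the graph, which is how a tight loop or an expensive transaction stands out at a glance.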

Optimising Replication Process

  • Algorithmic Improvement: Avoiding costly processing steps (e.g., excessive checks and comparisons) significantly enhanced replication speed.
  • Early Attribute Application: By applying linked attributes earlier during replication, unnecessary traversal of lengthy lists was eliminated.
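
The source does not describe the exact code changes, so the following is only a generic sketch of why repeated checks against a long list are costly. It compares a duplicate check that scans the whole list for every incoming value with one that keeps the list sorted and finds each value by bisection. The function names and the data are hypothetical and are not taken from Samba.

```python
import bisect


def add_links_linear(existing, incoming):
    """Add each incoming value unless present, scanning the list each time.

    Returns the number of element comparisons performed.
    """
    comparisons = 0
    for value in incoming:
        found = False
        for item in existing:
            comparisons += 1
            if item == value:
                found = True
                break
        if not found:
            existing.append(value)
    return comparisons


def add_links_sorted(existing_sorted, incoming):
    """Same end result, but keep the list sorted and locate values by bisection.

    Returns the number of bisection lookups performed (one per value).
    """
    lookups = 0
    for value in incoming:
        lookups += 1
        pos = bisect.bisect_left(existing_sorted, value)
        if pos == len(existing_sorted) or existing_sorted[pos] != value:
            existing_sorted.insert(pos, value)
    return lookups


if __name__ == "__main__":
    n = 2000  # arbitrary size chosen for illustration
    print("linear comparisons:", add_links_linear([], range(n)))
    print("sorted lookups:", add_links_sorted([], range(n)))
```

With distinct values the linear version performs a number of comparisons that grows with the square of the list length, while the sorted version performs one logarithmic lookup per value. The insert into the sorted list still shifts elements, so this sketch illustrates the comparison cost only.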

Enhancing Memory Efficiency

  • NDR Parsing Optimisation: Implemented memory-efficient NDR parsing routines to reduce memory allocation overhead.
  • Talloc Optimisation: Investigated and improved the performance of talloc (Samba's memory management library), reducing the overhead of memory allocation and deallocation.

Results and Impact

The optimisations led to substantial improvements:

  • Reduced Replication Time: Achieved replication in half the previous duration.
  • Memory Efficiency: Enhanced NDR parsing and talloc performance across the Samba codebase.

Through meticulous profiling, algorithmic improvements, and memory optimisation, Catalyst's Samba team significantly enhanced replication performance in Samba 4.5. This effort ensures that Samba continues to meet the demands of large-scale deployments, reinforcing its reputation as a robust solution for diverse environments.
