LOC (Lines of Code) has long been a go-to proxy to measure developer productivity.
LOC is easy to quantify, but do more lines of code actually reflect more output?
In reality, LOC tells you nothing about the new features added, the effort spent, or the work quality.
In this post, we discuss how measuring LOC can mislead productivity assessments and explore better alternatives.
Measuring dev productivity by counting lines of code may seem straightforward, but this simplistic calculation can distort the picture and even harm code quality. For example, comments and other non-executable lines carry no functional weight and should not be counted as actual “code”.
If LOC is your main performance metric, developers may hesitate to refactor or simplify existing code because doing so reduces their line count, and code quality suffers as a result. LOC also ignores major contributions such as time spent on design, code review, debugging, and mentorship.
Consider two implementations of the same function: the verbose one scores higher on LOC, yet the concise one does the same job.

```python
# A verbose approach
def add(a, b):
    result = a + b
    return result

# A more concise alternative
def add(a, b): return a + b
```
Cyclomatic complexity (CC) measures how complex a piece of code is based on the number of independent paths through it. Although it takes more effort to compute, CC is a far better predictor of maintainability than LOC.
A high LOC with a low CC indicates that the code is easy to test due to fewer branches and more linearity but may be redundant. Meanwhile, a low LOC with a high CC means the program is compact but harder to test and comprehend.
Aiming for the perfect balance between these metrics is best for code maintainability.
Example Python script using the radon library to compute CC, the maintainability index, and raw metrics for a Python file:
```python
from radon.complexity import cc_visit
from radon.metrics import mi_visit
from radon.raw import analyze

def analyze_python_file(file_path):
    with open(file_path, 'r') as f:
        source_code = f.read()
    # Per-function/class cyclomatic complexity blocks
    print("Cyclomatic Complexity:", cc_visit(source_code))
    # Maintainability index (multi=True treats multiline strings as comments)
    print("Maintainability Index:", mi_visit(source_code, multi=True))
    # Raw metrics: LOC, LLOC, SLOC, comments, blank lines
    print("Raw Metrics:", analyze(source_code))

analyze_python_file('sample.py')
```
Python libraries like Pandas, Seaborn, and Matplotlib can be used to further visualize the correlation between your LOC and CC.
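For instance, here is a minimal sketch, assuming your source files live under a `src/` directory, that uses radon to collect LOC and CC per file and pandas with Matplotlib to plot them against each other:

```python
# Minimal sketch: plot LOC against CC per file (assumes sources under src/).
import glob

import pandas as pd
import matplotlib.pyplot as plt
from radon.complexity import cc_visit
from radon.raw import analyze

rows = []
for path in glob.glob("src/**/*.py", recursive=True):  # assumed project layout
    with open(path, "r") as f:
        code = f.read()
    rows.append({
        "file": path,
        "loc": analyze(code).loc,                                 # total lines in the file
        "cc": sum(block.complexity for block in cc_visit(code)),  # summed complexity of all blocks
    })

df = pd.DataFrame(rows)
df.plot(kind="scatter", x="loc", y="cc", title="LOC vs. Cyclomatic Complexity")
plt.show()
```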
Despite LOC’s limitations, it can still be a rough starting point for assessments, such as comparing projects within the same programming language or with similar coding practices.
A major drawback of LOC is its misleading nature: it rewards code length while ignoring direct contributors to performance such as readability, logical flow, and maintainability. LOC also fails to capture the how, what, and why behind code contributions, for example, how design changes were made, what functional impact the updates had, and why they were done.
That’s where Git-based contribution analysis helps.
PyDriller and GitPython are Python libraries that interact with Git repositories and help developers quickly extract data about commits, diffs, modified files, and source code.
```python
from git import Repo

repo = Repo("/path/to/repo")

# Walk the five most recent commits on the main branch
for commit in repo.iter_commits('main', max_count=5):
    print(f"Commit: {commit.hexsha}")
    print(f"Author: {commit.author.name}")
    print(f"Date: {commit.committed_datetime}")
    print(f"Message: {commit.message}")
```
From this data, you can derive metrics that separate consistent, genuine contributors from “code dumpers” who push large, infrequent batches of changes, as in the sketch below.
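A minimal sketch, assuming a local clone and GitPython, that summarizes per-author commit counts and average change size:

```python
# Minimal sketch: per-author commit count and average change size (GitPython).
from collections import defaultdict
from git import Repo

repo = Repo("/path/to/repo")
stats = defaultdict(lambda: {"commits": 0, "lines_changed": 0})

for commit in repo.iter_commits("main"):
    author = commit.author.name
    stats[author]["commits"] += 1
    stats[author]["lines_changed"] += commit.stats.total["lines"]  # insertions + deletions

for author, s in stats.items():
    print(f"{author}: {s['commits']} commits, "
          f"{s['lines_changed'] / s['commits']:.0f} lines changed per commit")
```

Many small, steady commits point to consistent contribution; a handful of huge commits is a signal worth investigating, not a verdict.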
A sole focus on output quantity as a performance measure leads developers to compromise work quality, especially in a collaborative, non-linear setup. Crucial non-code tasks like reviewing, debugging, or knowledge transfer may go unnoticed.
Variance analysis identifies and analyzes deviations across teams and projects. For example, one team may show a stable weekly commit pattern while another has sudden spikes that indicate code dumps.
```python
import pandas as pd
import matplotlib.pyplot as plt

# Mock commit data
df = pd.DataFrame({
    'team': ['A', 'A', 'B', 'B'],
    'week': ['W1', 'W2', 'W1', 'W2'],
    'commits': [50, 55, 20, 80]
})

df.pivot(index='week', columns='team', values='commits').plot(kind='bar')
plt.title("Commit Variance Between Teams")
plt.ylabel("Commits")
plt.show()
```
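To quantify what the chart shows visually, the same DataFrame can report per-team variance directly:

```python
# Per-team commit variance: a large spread flags erratic, spiky contribution patterns
print(df.groupby("team")["commits"].var())
```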
Using generic metrics like commit volume, LOC, or deployment speed to indicate performance across roles is a mistake. Developers focus more on code contributions, while architects spend more of their time on design reviews and mentoring. Normalization is therefore a must to evaluate role-wise effort fairly, as in the sketch below.
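A minimal sketch, using made-up review counts, that normalizes a metric within each role so people are compared against their own peer group:

```python
# Minimal sketch: z-score a metric within each role (made-up data).
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "Ben", "Caro", "Dev"],
    "role": ["developer", "developer", "architect", "architect"],
    "reviews": [5, 9, 30, 22],
})

# How far each person sits from the average of their own role
df["reviews_z"] = df.groupby("role")["reviews"].transform(
    lambda s: (s - s.mean()) / s.std()
)
print(df)
```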
Three more impactful performance metrics that weigh code quality, not just quantity, are defect density, change failure rate, and mean time to recovery (MTTR).
Defect density measures the number of defects per unit of code, ideally tracked per KLOC (a thousand lines of code) over time.
It’s the perfect metric to track code stability instead of volume as a performance indicator. A lower defect density indicates greater stability and code quality.
To calculate it, run a Python script over Git commit logs and bug tracker labels such as JIRA ticket tags or commit messages.
```python
# Defects per 1,000 lines of code
def defect_density(defects, kloc):
    return defects / kloc
```
Pair it with commit references and issue labels to count defects reliably, as in the sketch below.
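A minimal sketch, assuming a local clone and the convention that bug-fix commits mention "fix" or a "BUG-" ticket tag in their message:

```python
# Minimal sketch: estimate defect density from commit messages and current code size.
# Assumes bug fixes are tagged with "fix" or "BUG-" in commit messages.
import glob

from git import Repo
from radon.raw import analyze

repo = Repo("/path/to/repo")

# Count commits that reference a bug fix
defects = sum(
    1
    for commit in repo.iter_commits("main")
    if "fix" in commit.message.lower() or "BUG-" in commit.message
)

# Total lines of code across tracked Python files (assumed layout)
total_loc = 0
for path in glob.glob("/path/to/repo/**/*.py", recursive=True):
    with open(path, "r") as f:
        total_loc += analyze(f.read()).loc

print("Defect density:", defect_density(defects, total_loc / 1000))
```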
The change failure rate is a DORA metric that tells you the percentage of deployments that require a rollback or hotfix in production.
To measure it, combine Git and CI/CD pipeline logs to pull the total number of failed changes:

```bash
grep "deployment failed" jenkins.log | wc -l
```
Mean time to recovery (MTTR) measures the average time it takes to respond to a failure and get a safe fix deployed to production. It shows how quickly a team can adapt and deliver fixes.
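A minimal sketch, using made-up incident timestamps of the kind you might pull from pipeline logs or an incident tracker:

```python
# Minimal sketch: MTTR from failure/restore timestamps (made-up incident data).
from datetime import datetime

incidents = [
    {"failed_at": "2025-05-01 10:00", "restored_at": "2025-05-01 11:30"},
    {"failed_at": "2025-05-03 09:15", "restored_at": "2025-05-03 09:45"},
]

fmt = "%Y-%m-%d %H:%M"
minutes = [
    (datetime.strptime(i["restored_at"], fmt) - datetime.strptime(i["failed_at"], fmt)).total_seconds() / 60
    for i in incidents
]
print(f"MTTR: {sum(minutes) / len(minutes):.0f} minutes")
```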
Three ways you can implement the above metrics in real time:
Integrating your custom Python dashboard with GitHub or GitLab enables interactive data visualizations for metric tracking. For example, you could pull real-time data on commits, lead time, and deployment rate and display them visually on your Python dashboard.
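A minimal sketch, using the public GitHub REST API with placeholder owner, repo, and token values, that pulls recent commits for a dashboard to render:

```python
# Minimal sketch: fetch recent commits from the GitHub REST API (placeholder values).
import requests

OWNER, REPO = "your-org", "your-repo"             # placeholders
headers = {"Authorization": "Bearer YOUR_TOKEN"}  # personal access token placeholder

resp = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/commits",
    headers=headers,
    params={"per_page": 20},
)
resp.raise_for_status()

for item in resp.json():
    print(item["sha"][:7], item["commit"]["author"]["date"],
          item["commit"]["message"].splitlines()[0])
```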
If you want to skip the manual work, try tools like Prometheus, a monitoring system that collects and analyzes metrics across sources, with Grafana, a data visualization tool that displays your monitored data on customized dashboards.
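For custom engineering metrics, here is a minimal sketch using the prometheus_client library to expose a gauge that Prometheus can scrape and Grafana can chart; the deployment-count helper is hypothetical and stands in for whatever your CI/CD API provides:

```python
# Minimal sketch: expose a custom gauge for Prometheus to scrape.
import time
from prometheus_client import Gauge, start_http_server

deployments_today = Gauge(
    "deployments_today", "Number of production deployments so far today"
)

start_http_server(8000)  # metrics served at http://localhost:8000/metrics

while True:
    deployments_today.set(get_deployment_count())  # hypothetical helper reading your CI/CD API
    time.sleep(60)
```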
CI/CD pipelines are valuable data sources for these metrics because of the variety of logs and events captured across each run. For example, Jenkins logs can measure lead time for changes, and GitHub Actions artifacts can surface failure rates, slow-running jobs, and more.
Caution: Numbers alone don’t give you the full picture. Metrics must be paired with context and qualitative insights for a more comprehensive understanding. For example, pair metrics with team retros to better understand your team’s stance and behavioral shifts.
Combine quantitative and qualitative data for a well-balanced and unbiased developer performance model.
For example, include CC and code review feedback for code quality; DORA metrics such as change failure rate, alongside defect density, to track delivery stability; and qualitative measures of collaboration like PR reviews, pair programming, and documentation.
Metric gaming can invite negative outcomes like higher defect rates and unhealthy team culture. So, it’s best to look beyond numbers and assess genuine progress by emphasizing trends.
Although individual achievements still hold value, an overemphasis can demotivate the rest of the team. Acknowledging team-level success and shared knowledge is the way forward to achieve outstanding performance as a unit.
Lines of code are a tempting but shallow metric. Real developer performance is about quality, collaboration, and consistency.
With the right tools and analysis, engineering leaders can build metrics that reflect the true impact, irrespective of the lines typed.
Use Typo’s AI-powered insights to track vital developer performance metrics and make smarter choices.