AI-powered sprint planning: predicting scope creep before it happens

Scope creep does not appear out of nowhere. It accumulates one ticket at a time, one "it's just a small one" at a time. It took my team a sprint and a half to notice, and we only got ahead of it once we let AI do the watching for us. This post shares the architecture we actually run: Azure DevOps API + Python feature engineering + LLM risk scoring, enough for you to deploy in a week.

Sprint 14 and the price of "just one more small story"

Sprint 14 of the ERP module project: I still remember how that review felt. Expected velocity was 42 points and the team committed to 40. Sounded fine. But on day 8 of the 10-day sprint, the burndown chart still showed 28 points remaining.

The PM looked at me. I looked back at the backlog. And there it was: the sprint backlog had grown from 40 to 57 story points without anyone noticing. One story "re-estimated because the scope became clearer" (+5). One new dependency discovered during implementation (+4). One stakeholder request added mid-sprint because it was "urgent" (+8). And a string of acceptance criteria that had silently expanded.

It was not that anyone did something wrong. It was that nobody was looking at the whole picture.

After that sprint I sat down with the team: "We need to be able to predict this from Sprint 1, not in Sprint 2 when it is already too late."

What is scope creep, really, in an agile environment?

The textbook says scope creep is "adding features outside the plan". Reality is far more subtle.

In an agile environment, scope creep does not arrive as one big decision. It arrives as dozens of small ones:

Story point inflation: a story is re-estimated upward after planning because the devs understand it better (or fear it more)
Ticket churn: tickets are created, split, merged, or re-opened repeatedly during the sprint
Acceptance criteria drift: criteria are appended after planning without any story point adjustment
Late dependency discovery: a dependency on another team only surfaces once implementation reaches that point
A widening velocity-commitment gap: the team commits to more than its actual velocity for 2-3 consecutive sprints

None of these signals is worrying on its own. But when they show up together in the same sprint? That is scope creep taking shape.
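
One way to make "showing up together" concrete is to count how many signals are active at once instead of judging each in isolation. The sketch below is only an illustration: the threshold values, the `blocked_ratio` name, and the `min_concurrent` cutoff are assumptions, not numbers taken from our pipeline.

```python
from typing import Dict, List

# Hypothetical per-signal thresholds, chosen only for illustration.
THRESHOLDS: Dict[str, float] = {
    "point_inflation_pct": 10.0,  # % growth in story points since planning
    "churn_ratio": 0.15,          # (added + removed) / planned tickets
    "ac_drift_ratio": 0.15,       # share of tickets whose AC changed
    "blocked_ratio": 0.10,        # share of tickets blocked by late dependencies
    "overcommit_ratio": 1.15,     # commitment / average velocity
}


def active_signals(metrics: Dict[str, float]) -> List[str]:
    """Return the names of the signals whose value exceeds its threshold."""
    return [name for name, limit in THRESHOLDS.items()
            if metrics.get(name, 0.0) > limit]


def creep_is_forming(metrics: Dict[str, float], min_concurrent: int = 2) -> bool:
    """Flag a sprint only when several signals fire at the same time."""
    return len(active_signals(metrics)) >= min_concurrent


sprint = {"point_inflation_pct": 12.0, "churn_ratio": 0.20, "ac_drift_ratio": 0.05}
print(active_signals(sprint))    # ['point_inflation_pct', 'churn_ratio']
print(creep_is_forming(sprint))  # True
```

A single elevated signal stays quiet here; two or more at once trip the flag, which mirrors the idea that the combination is what matters.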

The problem is that humans are very bad at spotting this pattern in real time. AI is not.

The solution we chose: a compact AI pipeline

I did not want to over-engineer. The initial goal was very specific: raise an early warning before day 5 of the sprint when signs of scope creep are forming.

The architecture is simple, with 3 layers:

Azure DevOps REST API
        ↓
Python Feature Engineering (runs daily, or triggered on change)
        ↓
LLM Risk Scoring (GPT-4o or Claude, analyzes context)
        ↓
Teams/Slack Alert + Dashboard

No self-trained model needed. No data science team needed. You only need to understand your data and know how to ask the LLM the right questions.
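
To show how little glue the layers need, here is a minimal sketch that wires them together as plain callables. The `Pipeline` class, its field names, and the `alert_threshold` of 40 are hypothetical, and skipping the LLM call on quiet days is a design choice of this sketch rather than something our production setup prescribes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Pipeline:
    """Wires the layers together; each layer is a plain callable."""
    collect: Callable[[str], Dict]    # Layer 1: sprint id -> raw sprint data
    engineer: Callable[[Dict], Dict]  # Layer 2: raw data -> signals incl. "risk_score"
    assess: Callable[[Dict], str]     # Layer 3: signals -> human-readable assessment
    notify: Callable[[str], None]     # Teams/Slack sender
    alert_threshold: float = 40.0     # assumed cutoff, tune per team

    def run(self, sprint_id: str) -> Optional[str]:
        raw = self.collect(sprint_id)
        signals = self.engineer(raw)
        if signals["risk_score"] < self.alert_threshold:
            return None  # quiet day: skip the LLM call and the alert
        message = self.assess(signals)
        self.notify(message)
        return message


# Stub layers so the sketch runs without Azure DevOps or an LLM.
sent = []
pipeline = Pipeline(
    collect=lambda sprint_id: {"planned": 40, "current": 57},
    engineer=lambda raw: {"risk_score": 55.0, **raw},
    assess=lambda s: (f"Risk {s['risk_score']:.0f}/100: "
                      f"backlog grew {s['planned']} -> {s['current']} pts"),
    notify=sent.append,
)
print(pipeline.run("Sprint 14"))  # Risk 55/100: backlog grew 40 -> 57 pts
```

Because every layer is injected, each one can be swapped or tested on its own, which is what kept the real pipeline easy to debug.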

Technical details

Layer 1: Collecting data from Azure DevOps

We use the Azure DevOps REST API to pull sprint data. Here is a basic Python snippet:

import base64

import requests


class AzureDevOpsSprintCollector:
    """
    Collects sprint metrics from the Azure DevOps REST API.
    Feeds the scope creep detection pipeline.
    """

    def __init__(self, org: str, project: str, team: str, pat_token: str):
        self.base_url = f"https://dev.azure.com/{org}/{project}"
        self.team = team
        # Azure DevOps takes the PAT as HTTP Basic auth with an empty username
        token = base64.b64encode(f":{pat_token}".encode()).decode()
        self.headers = {"Authorization": f"Basic {token}"}

    def get_current_sprint_items(self, iteration_id: str) -> dict:
        """Fetch all work items in the current sprint."""
        url = f"{self.base_url}/{self.team}/_apis/work/teamsettings/iterations/{iteration_id}/workitems"
        resp = requests.get(url, headers=self.headers, params={"api-version": "7.1"}, timeout=30)
        resp.raise_for_status()
        return resp.json()

    def get_sprint_capacity(self, iteration_id: str) -> dict:
        """Fetch the team's capacity for the sprint."""
        url = f"{self.base_url}/{self.team}/_apis/work/teamsettings/iterations/{iteration_id}/capacities"
        resp = requests.get(url, headers=self.headers, params={"api-version": "7.1"}, timeout=30)
        resp.raise_for_status()
        return resp.json()

    def get_work_item_history(self, work_item_id: int) -> list:
        """
        Fetch the update history of one work item.
        This is where story point inflation and AC drift become visible.
        """
        # base_url already contains the https://dev.azure.com/{org}/{project} prefix
        url = f"{self.base_url}/_apis/wit/workItems/{work_item_id}/updates"
        resp = requests.get(url, headers=self.headers, params={"api-version": "7.1"}, timeout=30)
        resp.raise_for_status()
        return resp.json().get("value", [])
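
The history call only pays off once the updates are reduced to "what was estimated at planning" versus "what is there now". The following sketch shows one possible reduction. It assumes the Agile process template (story points live in `Microsoft.VSTS.Scheduling.StoryPoints`; a Scrum-template project would use a different field), that each field-changing update carries `System.ChangedDate`, and a caller-supplied `planning_end` timestamp. `summarize_history` is a hypothetical helper, not part of any SDK.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional

STORY_POINTS = "Microsoft.VSTS.Scheduling.StoryPoints"
ACCEPTANCE_CRITERIA = "Microsoft.VSTS.Common.AcceptanceCriteria"
CHANGED_DATE = "System.ChangedDate"


def _parse(ts: str) -> datetime:
    # Keep second precision only, so varying fractional digits never break parsing.
    return datetime.strptime(ts[:19], "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def summarize_history(updates: List[Dict], planning_end: datetime) -> Dict:
    """Reduce a work item's updates to planned vs. current points and AC drift."""
    planned_points: Optional[float] = None
    current_points: Optional[float] = None
    ac_changed_after_planning = False

    for update in updates:
        fields = update.get("fields") or {}  # link-only updates carry no fields
        changed = fields.get(CHANGED_DATE, {}).get("newValue")
        if changed is None:
            continue
        after_planning = _parse(changed) > planning_end

        if STORY_POINTS in fields:
            current_points = fields[STORY_POINTS].get("newValue")
            if not after_planning:
                planned_points = current_points  # last estimate before the cutoff
        if ACCEPTANCE_CRITERIA in fields and after_planning:
            ac_changed_after_planning = True

    return {
        "planned_points": planned_points,
        "current_points": current_points,
        "ac_changed_after_planning": ac_changed_after_planning,
    }


# Synthetic updates shaped like the API response, for illustration only.
updates = [
    {"fields": {CHANGED_DATE: {"newValue": "2024-03-01T09:00:00.12Z"},
                STORY_POINTS: {"newValue": 3.0}}},
    {"relations": {"added": []}},
    {"fields": {CHANGED_DATE: {"newValue": "2024-03-06T14:30:00Z"},
                STORY_POINTS: {"oldValue": 3.0, "newValue": 8.0},
                ACCEPTANCE_CRITERIA: {"newValue": "<div>with audit trail</div>"}}},
]
planning_end = datetime(2024, 3, 4, 12, 0, tzinfo=timezone.utc)
print(summarize_history(updates, planning_end))
```

Summing `planned_points` and `current_points` over all items in the sprint gives the two totals that point inflation is computed from.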

Layer 2: Feature engineering, computing the scope creep signals

This is the most important part. We compute 5 main features:

from dataclasses import dataclass


@dataclass
class SprintScopeSignals:
    """
    Container for the scope creep signals of one sprint.
    The higher the score, the greater the risk.
    """
    sprint_id: str
    sprint_name: str

    # Signal 1: Story point inflation (estimate at planning vs. now)
    planned_points: float
    current_points: float
    point_inflation_pct: float       # (current - planned) / planned * 100

    # Signal 2: Ticket churn (tickets added/removed/reopened during the sprint)
    tickets_added_mid_sprint: int
    tickets_removed_mid_sprint: int
    churn_ratio: float               # (added + removed) / total_planned_tickets

    # Signal 3: Velocity vs. commitment gap
    avg_velocity_last_3_sprints: float
    current_commitment: float
    overcommit_ratio: float          # current_commitment / avg_velocity

    # Signal 4: Acceptance criteria drift
    tickets_with_ac_changes: int
    total_tickets: int
    ac_drift_ratio: float            # tickets_with_ac_changes / total_tickets

    # Signal 5: Late dependency discovery
    new_dependencies_added: int
    blocked_tickets_count: int

    # Composite risk score (0-100), filled in after compute_risk_score runs
    risk_score: float = 0.0
    risk_level: str = "LOW"          # "LOW", "MEDIUM", "HIGH", "CRITICAL"


def compute_risk_score(signals: SprintScopeSignals) -> float:
    """
    Compute the composite risk score as a weighted combination of the signals.
    Weights are based on correlation with sprint failure in historical data.
    """
    score = 0.0

    # Point inflation: highest weight because it is the best predictor of scope creep
    if signals.point_inflation_pct > 20:
        score += 30
    elif signals.point_inflation_pct > 10:
        score += 15
    elif signals.point_inflation_pct > 5:
        score += 8

    # Overcommitment (velocity gap)
    if signals.overcommit_ratio > 1.3:
        score += 25
    elif signals.overcommit_ratio > 1.15:
        score += 12

    # Ticket churn
    if signals.churn_ratio > 0.25:
        score += 20
    elif signals.churn_ratio > 0.15:
        score += 10

    # AC drift
    if signals.ac_drift_ratio > 0.3:
        score += 15
    elif signals.ac_drift_ratio > 0.15:
        score += 7

    # Tickets blocked by late dependencies
    blocked_pct = signals.blocked_tickets_count / max(signals.total_tickets, 1)
    if blocked_pct > 0.2:
        score += 10
    elif blocked_pct > 0.1:
        score += 5

    return min(score, 100.0)
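
The snippet above leaves two steps implicit: how the ratios are derived from raw counts, and how a score maps to a risk level. Here is a minimal sketch of both. The ratio formulas follow the field comments; the level bands (25/50/75) and the ticket counts in the demo are assumptions for illustration, while the point and velocity figures are the Sprint 14 numbers from earlier.

```python
from typing import Dict


def _ratio(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 instead of raising when the denominator is 0."""
    return numerator / denominator if denominator else 0.0


def derive_ratios(planned_points: float, current_points: float,
                  tickets_added: int, tickets_removed: int, planned_tickets: int,
                  avg_velocity: float, commitment: float) -> Dict[str, float]:
    """Turn raw sprint counts into the ratios the scoring rules consume."""
    return {
        "point_inflation_pct": _ratio(current_points - planned_points, planned_points) * 100,
        "churn_ratio": _ratio(tickets_added + tickets_removed, planned_tickets),
        "overcommit_ratio": _ratio(commitment, avg_velocity),
    }


def risk_level_for(score: float) -> str:
    """Map a 0-100 score to a level; the band edges are assumed, not measured."""
    if score >= 75:
        return "CRITICAL"
    if score >= 50:
        return "HIGH"
    if score >= 25:
        return "MEDIUM"
    return "LOW"


# Sprint 14: 40 points planned, 57 by day 8, expected velocity 42.
# The ticket counts (3 added, 0 removed, 12 planned) are made up.
ratios = derive_ratios(planned_points=40, current_points=57,
                       tickets_added=3, tickets_removed=0, planned_tickets=12,
                       avg_velocity=42, commitment=40)
print(round(ratios["point_inflation_pct"], 1))  # 42.5
print(round(ratios["overcommit_ratio"], 2))     # 0.95
print(risk_level_for(30))                       # MEDIUM
```

With these numbers, point inflation alone contributes 30 points under `compute_risk_score`, while the overcommit ratio contributes nothing: the commitment looked healthy at planning, and the damage came entirely from mid-sprint growth.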

Layer 3: LLM risk scoring, the context that numbers cannot express

The score from Layer 2 tells us "how much", but the LLM tells us "why" and "what to do about it". We call the LLM from C# because the rest of our toolchain is .NET:

using Azure.AI.OpenAI;
using System.Text.Json;

/// <summary>
/// LLM-based sprint risk analyzer.
/// Takes structured sprint signals and returns a human-readable risk assessment.
/// </summary>
public class SprintRiskAnalyzer
{
    private readonly OpenAIClient _client;
    private readonly string _deploymentName;

    public SprintRiskAnalyzer(string endpoint, string apiKey, string deployment = "gpt-4o")
    {
        _client = new OpenAIClient(new Uri(endpoint), new Azure.AzureKeyCredential(apiKey));
        _deploymentName = deployment;
    }

    public async Task<SprintRiskReport> AnalyzeAsync(SprintSignalDto signals)
    {
        var prompt = BuildAnalysisPrompt(signals);
        
        var chatOptions = new ChatCompletionsOptions
        {
            DeploymentName = _deploymentName,
            Messages =
            {
                new ChatRequestSystemMessage(
                    "You are an Agile Coach AI specialized in sprint risk analysis. " +
                    "Analyze the signals below and give a concise, actionable assessment. " +
                    "Return JSON with the fields: risk_summary, top_risks (array), immediate_actions (array), " +
                    "predicted_completion_pct (0-100)."),
                new ChatRequestUserMessage(prompt)
            },
            MaxTokens = 800,
            Temperature = 0.3f  // Low temperature for consistent, predictable output
        };

        var response = await _client.GetChatCompletionsAsync(chatOptions);
        var content = response.Value.Choices[0].Message.Content;
        
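        // SprintRiskReport (not shown) must map the snake_case JSON fields,
        // for example with [JsonPropertyName("risk_summary")].
        return JsonSerializer.Deserialize<SprintRiskReport>(content)!;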
    }

    private static string BuildAnalysisPrompt(SprintSignalDto s)
    {
        return $"""
            Sprint: {s.SprintName} | Risk Score: {s.RiskScore}/100 | Level: {s.RiskLevel}
            
            SIGNALS:
            - Story point inflation: +{s.PointInflationPct:F1}% vs. planning
              (Planned: {s.PlannedPoints} pts → Current: {s.CurrentPoints} pts)
            - Overcommitment ratio: {s.OvercommitRatio:F2}x avg velocity
              (Committed: {s.CurrentCommitment} pts | Avg velocity: {s.AvgVelocityLast3} pts)
            - Ticket churn ratio: {s.ChurnRatio:F2}
              ({s.TicketsAddedMidSprint} tickets added, {s.TicketsRemovedMidSprint} removed)
            - Acceptance criteria drift: {s.AcDriftRatio:P0} of tickets have changed AC
            - Blocked tickets: {s.BlockedTicketsCount}/{s.TotalTickets}
              (from {s.NewDependenciesAdded} newly discovered late dependencies)
            
            Historical context: {s.HistoricalContext}
            
            Analyze the scope creep risk and propose specific actions.
            """;
    }
}

An example of what the LLM returns:

{
  "risk_summary": "The sprint is at high risk of scope creep. Point inflation of 18% combined with 3 tickets blocked by late dependencies suggests the initial estimates were inaccurate about technical dependencies.",
  "top_risks": [
    "Integration with the payment service has not been clarified; it could add 5-8 points",
    "2 stories related to the auth module are pending review from the security team",
    "The AC of US-234 has expanded from 'basic CRUD' to 'with audit trail' and has not been re-estimated"
  ],
  "immediate_actions": [
    "Book a 30-min scope refinement session with the PM today or tomorrow",
    "Re-estimate US-234 against the full new AC",
    "Move the payment service dependency to the next sprint if it is not unblocked within 48h"
  ],
  "predicted_completion_pct": 68
}
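
A reply like this is only useful if it really is valid JSON with the expected fields, and models do not always comply. The sketch below shows one defensive way to parse it on the Python side of the pipeline. `parse_risk_report` is a hypothetical helper, and the assumption that a reply may arrive wrapped in a Markdown code fence comes from general experience with chat models, not from our logs.

```python
import json
from typing import Any, Dict

REQUIRED_FIELDS = {
    "risk_summary": str,
    "top_risks": list,
    "immediate_actions": list,
    "predicted_completion_pct": (int, float),
}

FENCE = "`" * 3  # Markdown code fence marker


def parse_risk_report(raw: str) -> Dict[str, Any]:
    """Parse and validate the model's reply; raise ValueError on anything unexpected."""
    text = raw.strip()
    # Strip a surrounding code fence (with optional language tag) if present.
    if text.startswith(FENCE):
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    try:
        report = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"LLM reply is not valid JSON: {exc}") from exc
    if not isinstance(report, dict):
        raise ValueError("LLM reply must be a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        if not isinstance(report.get(name), expected):
            raise ValueError(f"field {name!r} is missing or has the wrong type")
    if not 0 <= report["predicted_completion_pct"] <= 100:
        raise ValueError("predicted_completion_pct must be within 0-100")
    return report


reply = (FENCE + "json\n"
         '{"risk_summary": "High risk", "top_risks": [], '
         '"immediate_actions": ["Re-estimate US-234"], "predicted_completion_pct": 68}\n'
         + FENCE)
print(parse_risk_report(reply)["immediate_actions"])  # ['Re-estimate US-234']
```

Failing loudly on a malformed reply lets the pipeline fall back to the plain numeric alert instead of posting a half-parsed report.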

Results after 6 sprints

I do not want to quote an "AI improved things by X%" number without context. Here is what my team actually observed:

The most visible change: we have a new meeting, the "Day 3 Scope Check": 15 minutes, looking only at the dashboard alert. Previously nobody thought a meeting on day 3 was needed. Now it is the most effective meeting in the sprint.

On prediction: the pipeline's alerts were correct about 75-80% of the time over the first 6 sprints. The remaining 20-25% were false positives: an alert fired but the sprint still finished on time. We accept this trade-off because the cost of a false positive (a 15-minute meeting) is far lower than the cost of a false negative (a failed sprint).
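
If you want to track the same figure for your own alerts, the bookkeeping is small. The sketch below computes alert precision from a log of outcomes; the `alert_stats` helper and the six log entries are made up for illustration and are not our actual sprint data.

```python
from typing import Dict, Iterable, Optional, Tuple


def alert_stats(log: Iterable[Tuple[bool, bool]]) -> Dict[str, Optional[float]]:
    """Each entry is (alert_fired, sprint_missed_its_commitment)."""
    true_pos = false_pos = false_neg = 0
    for fired, missed in log:
        if fired and missed:
            true_pos += 1
        elif fired:
            false_pos += 1   # alert, but the sprint finished on time
        elif missed:
            false_neg += 1   # no alert, but the sprint slipped
    alerts = true_pos + false_pos
    return {
        "precision": true_pos / alerts if alerts else None,
        "false_positives": false_pos,
        "false_negatives": false_neg,
    }


# Made-up outcome log, one entry per alert decision.
log = [(True, True), (True, True), (True, False),
       (False, False), (True, True), (True, True)]
print(alert_stats(log))  # {'precision': 0.8, 'false_positives': 1, 'false_negatives': 0}
```

Tracking false negatives next to precision matters because they are the expensive kind of error in this trade-off.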

Cultural change: this is the one I did not expect. Once the team knew an AI was tracking point inflation, they naturally became more careful when re-estimating. Not out of fear, but because they had data to make better decisions. The PM also cut back on "just a small story" additions, knowing that doing so would trigger an alert.

Lessons learned

1. Start from the data, not from the model

The most common mistake is to go looking for an AI tool first and think about the data afterwards. We spent 2 weeks cleaning and normalizing sprint history data before writing a single line of AI code. Those were the most valuable 2 weeks.

2. Simple signals often beat complex models

I once considered training a classification model with scikit-learn. Then I realized: 5 simple feature engineering rules + LLM context gave better results in practice and were much easier to debug. KISS wins.

3. Alerts must be actionable, not just informational

At first our alerts only said "Risk: HIGH". Nobody did anything. After we added the LLM layer to generate "immediate actions", the adoption rate went up noticeably. People need to know what to do next, not just that there is a problem.
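
The difference between the two kinds of alert is easy to see in code. This is a minimal sketch of a formatter that puts the next steps directly in the message; `format_alert` and the message layout are hypothetical, while the `risk_summary` and `immediate_actions` keys match the JSON fields requested from the LLM.

```python
from typing import Dict, List


def format_alert(sprint_name: str, risk_level: str, report: Dict) -> str:
    """Build a chat message that leads with the level and ends with next steps."""
    lines: List[str] = [f"[{risk_level}] {sprint_name}: {report['risk_summary']}"]
    actions = report.get("immediate_actions", [])
    if actions:
        lines.append("Next steps:")
        lines.extend(f"  {i}. {action}" for i, action in enumerate(actions, start=1))
    return "\n".join(lines)


report = {
    "risk_summary": "Point inflation of 18% with 3 blocked tickets.",
    "immediate_actions": ["Re-estimate US-234 against the full new AC",
                          "Book a 30-min scope refinement session with the PM"],
}
print(format_alert("Sprint 14", "HIGH", report))
```

Without the action list this collapses back to the bare level line, which is the version nobody acted on.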

4. Integrate into the existing workflow

We did not build a new dashboard that everyone has to remember to open. Alerts go straight into the Teams channel the team already uses every day. Zero friction = natural adoption.

5. Scope creep is not entirely bad

The most important insight: not every scope change is scope creep that must be stopped. AI helps us distinguish valuable scope change (product learning) from accidental scope creep (nobody noticed). The decision still belongs to humans; AI only makes sure the decision is made consciously.

Action steps: where to start?

If you want to apply this right away without building the full pipeline, here is a week-by-week roadmap:

Week 1 (baseline measurement):

Export sprint data for the last 6-10 sprints from Azure DevOps/Jira. Compute three numbers by hand for each sprint: planned points, final points, and the number of tickets added mid-sprint. If average point inflation is > 10%, you have a scope creep pattern.
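
If doing it by hand gets tedious, the same baseline fits in a few lines. In this sketch the history rows are invented for illustration, apart from Sprint 14's 40 planned and 57 final points; the 10% cutoff is the one from the paragraph above, and the helper names are hypothetical.

```python
from statistics import mean
from typing import List, Tuple

# (sprint, planned points, final points, tickets added mid-sprint)
# Invented numbers, except Sprint 14's 40 -> 57 points.
history: List[Tuple[str, float, float, int]] = [
    ("Sprint 9", 38, 41, 1),
    ("Sprint 10", 40, 44, 2),
    ("Sprint 11", 42, 42, 0),
    ("Sprint 12", 40, 47, 3),
    ("Sprint 13", 41, 46, 2),
    ("Sprint 14", 40, 57, 4),
]


def inflation_pct(planned: float, final: float) -> float:
    """Point inflation for one sprint, in percent of the planned points."""
    return (final - planned) / planned * 100


def average_inflation(rows: List[Tuple[str, float, float, int]]) -> float:
    """Mean of the per-sprint inflation percentages."""
    return mean(inflation_pct(planned, final) for _, planned, final, _ in rows)


def has_creep_pattern(rows: List[Tuple[str, float, float, int]],
                      cutoff_pct: float = 10.0) -> bool:
    return average_inflation(rows) > cutoff_pct


print(f"{average_inflation(history):.1f}")  # 15.0
print(has_creep_pattern(history))           # True
```

Averaging per-sprint percentages, rather than dividing total growth by total planned points, keeps one unusually large sprint from dominating the baseline; either choice works as long as you stay consistent.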

Week 2 (simple automation):

Set up a Python script that runs daily, calls the Azure DevOps API to take a sprint snapshot, and writes it to a spreadsheet or a simple database. That is all. No ML needed.
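
As a rough starting point, the storage half of that script can be a CSV file that grows by one row per day. The sketch below assumes a hypothetical column layout and an `append_snapshot` helper; the numbers in the demo are literals standing in for values you would compute from the API response.

```python
import csv
import tempfile
from datetime import date
from pathlib import Path
from typing import Optional

FIELDS = ["date", "sprint", "total_points", "ticket_count"]


def append_snapshot(path: Path, sprint: str, total_points: float,
                    ticket_count: int, today: Optional[date] = None) -> None:
    """Append one row per run; write the header only when the file is new."""
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "date": (today or date.today()).isoformat(),
            "sprint": sprint,
            "total_points": total_points,
            "ticket_count": ticket_count,
        })


# Demo against a temporary file; a real job would use a fixed path and
# take the numbers from the Azure DevOps API instead of literals.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "sprint_snapshots.csv"
    append_snapshot(out, "Sprint 14", 40, 12, today=date(2024, 3, 4))
    append_snapshot(out, "Sprint 14", 45, 13, today=date(2024, 3, 5))
    print(out.read_text(encoding="utf-8"))
```

Scheduling it is a matter of a cron entry or a scheduled pipeline run; the first row of each sprint doubles as the planning baseline for later comparisons.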

Week 3 (alert threshold):

Add one condition check: if current_points > planned_points * 1.1, send a message to Teams/Slack. Test whether the alert is meaningful to the team.
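
A minimal sketch of that check, with the sender injected so it can be tested without a network. The `{"text": ...}` body is the shape Slack incoming webhooks accept; a Teams webhook may expect a different payload, so treat `post_to_webhook` as an assumption to adapt. The function names and the message wording are hypothetical.

```python
import json
import urllib.request
from typing import Callable, Optional


def post_to_webhook(url: str, text: str) -> None:
    """POST a plain-text message as JSON with a "text" key."""
    body = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        resp.read()


def check_inflation(planned_points: float, current_points: float,
                    send: Callable[[str], None], factor: float = 1.1) -> Optional[str]:
    """Send an alert when current points exceed planned points by more than 10%."""
    if planned_points <= 0 or current_points <= planned_points * factor:
        return None
    pct = (current_points - planned_points) / planned_points * 100
    message = (f"Scope alert: sprint backlog is at {current_points:g} pts, "
               f"{pct:.1f}% above the {planned_points:g} pts planned.")
    send(message)
    return message


# Dry run: print instead of posting. To go live, pass something like
# send=lambda m: post_to_webhook(WEBHOOK_URL, m) with your own webhook URL.
check_inflation(40, 57, send=print)
# Scope alert: sprint backlog is at 57 pts, 42.5% above the 40 pts planned.
```

Running it in dry-run mode for a sprint or two is a cheap way to see whether the 10% threshold fires at moments the team actually cares about.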

Week 4+ (LLM layer):

Once you have data and working alerts, add an LLM prompt to contextualize the alert. This is the step that makes the biggest difference in adoption.

All the code in this post is enough to get started. The Azure DevOps Python SDK is available at github.com/microsoft/azure-devops-python-api. If you use Jira, Atlassian has a similar REST API with endpoints for sprint reports and issue history.

Takeaway

Scope creep does not defeat a sprint overnight. It comes from dozens of small decisions, each of which seemed reasonable at the time. The problem is that nobody is looking at the whole picture at once, until it is too late.

AI does not solve scope creep. But it does one thing: it makes sure you see the pattern early enough to do something about it. The rest is still a conversation between people: the PM, the tech lead, and the team making better decisions together with better data.

Architecture is a team sport. So is sprint planning. AI is just a teammate who happens to be very good at reading numbers.

What tool or approach are you using to track scope creep in your team? Or if you want to discuss a specific implementation, leave a comment or find me on LinkedIn.

Son Do — BKGlobal Tech Team

#BKGlobal #dotnet #architecture #1percentbetter