Test Development Strategy Guide¶
Author: Anderson Henrique da Silva
Date: 2025-10-22
Purpose: Prevent common test development pitfalls and ensure effective coverage expansion
🎯 Core Principle: API-First Testing¶
NEVER write tests before verifying the actual API exists.
The Problem¶
In multiple sessions, tests were created calling non-existent methods:
- _run_clustering() ❌ (doesn't exist in Anita)
- _analyze_user_behavior() ❌ (doesn't exist in Maria Quitéria)
- _select_agent_with_load_balancing() ❌ (doesn't exist in Ayrton Senna)
The Solution¶
3-Step Test Development Process:
# STEP 1: List actual methods
grep -n "^\s*async def \|^\s*def " src/agents/<agent_name>.py
# STEP 2: Read method signatures
# Verify parameters, return types, actual behavior
# STEP 3: Write tests for ACTUAL methods
# Not hypothetical ones!
📊 Coverage Expansion Workflow¶
Phase 1: Measure Current State¶
# Get baseline coverage for specific agent
JWT_SECRET_KEY=test SECRET_KEY=test \
venv/bin/pytest tests/unit/agents/test_<agent>.py \
--cov=src.agents.<agent> \
--cov-report=term-missing \
--no-cov-on-fail -q
Phase 2: Identify Gaps¶
# Example output:
# src/agents/drummond.py 420 48 112 13 87.78%
# Missing: 300-302, 389-393, 696, 704, 754, ...
Analyze Missing Lines:
1. Are they error handlers? (often hard to trigger)
2. Are they edge cases? (need specific test scenarios)
3. Are they fallback logic? (need mocked failures)
4. Are they private methods? (test via public API)
Phase 3: Read the Code¶
Inspect the uncovered lines before writing tests. Example:
# View the missing lines reported by coverage
sed -n '300,302p' src/agents/drummond.py
Phase 4: Write Targeted Tests¶
Focus on:
- ✅ Public API methods
- ✅ Error handling paths
- ✅ Edge cases
- ✅ Fallback logic
Avoid:
- ❌ Testing private methods directly
- ❌ Testing implementation details
- ❌ Calling non-existent methods
- ❌ Over-mocking (reduces test value)
Phase 5: Verify Improvement¶
# Run tests and check new coverage
JWT_SECRET_KEY=test SECRET_KEY=test \
venv/bin/pytest tests/unit/agents/test_<agent>*.py \
--cov=src.agents.<agent> \
--cov-report=term-missing
🎓 Lessons Learned (October 2025)¶
Lesson 1: Coverage Scope Matters¶
Issue: Maria Quitéria reported 78.27% coverage in one measurement and 23.23% in another
Root Cause: Different measurement scopes:
- Whole module vs. specific class
- With/without related imports
- Different test file combinations
Solution: Always specify the exact module, e.g. --cov=src.agents.<agent>
Lesson 2: Private Methods Are Not Test Targets¶
Issue: Created tests calling _analyze_user_behavior(), _run_clustering(), etc.
Root Cause: Misunderstanding test-driven development:
- Private methods are implementation details
- They change frequently
- Testing them creates brittle tests
Solution: Test public API, let private methods be covered implicitly
# ✅ Good - test the public API
async def test_process_security_audit():
    response = await agent.process(message, context)
    assert response.status == AgentStatus.COMPLETED

# ❌ Bad - test a private method
async def test_analyze_user_behavior():
    result = await agent._analyze_user_behavior(data, context)
Lesson 3: Agent Method Names Are Not Uniform¶
Issue: Assumed SecurityAgent class name, assumed method names
Root Cause: Didn't verify actual class/method names
Solution: Always grep first
# Find class name
grep "^class.*Agent" src/agents/<agent>.py
# Find method names
grep "^\s*async def " src/agents/<agent>.py
Lesson 4: Mock Complete Interfaces¶
Issue: Nanã tests failed due to incomplete Redis mock
Root Cause: Mocked only get/set, forgot setex, keys, delete, exists
Solution: Mock ALL methods used by code
@pytest.fixture
def mock_redis_client():
    client = AsyncMock()
    client.get.return_value = None
    client.set.return_value = True
    client.setex.return_value = True  # DON'T FORGET!
    client.keys.return_value = []
    client.delete.return_value = 1
    client.exists.return_value = False
    return client
Lesson 5: Error Paths Need Special Setup¶
Issue: Coverage missing on exception handlers (lines 300-302, 389-393)
Root Cause: Tests only cover happy path
Solution: Create tests that trigger errors
@pytest.mark.asyncio
async def test_initialization_failure():
    """Test agent handles initialization errors gracefully."""
    with patch.object(agent, '_setup_resources', side_effect=Exception("Setup failed")):
        # Should handle gracefully, not crash
        await agent.initialize()
Lesson 6: Type System Strictness¶
Issue: Pydantic models rejected strings when expecting dicts
Root Cause: Type validation is strict
Solution: Normalize inputs
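A minimal normalization sketch. The "wrap bare strings under a `text` key" convention is an assumption for illustration — adapt it to the actual model's schema:

```python
from typing import Any


def normalize_payload(value: Any) -> dict:
    """Coerce loose inputs into the dict shape a strict Pydantic model expects."""
    if isinstance(value, dict):
        return value
    if isinstance(value, str):
        # Hypothetical convention: wrap bare strings under a "text" key
        return {"text": value}
    raise TypeError(f"Cannot normalize payload of type {type(value).__name__}")


print(normalize_payload("audit request"))  # {'text': 'audit request'}
```

Normalizing at the boundary keeps test inputs flexible while satisfying strict validation.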
Lesson 7: LLM Client Differences¶
Issue: Mocked generate() but client uses chat_completion()
Root Cause: Different LLM providers have different APIs
Solution: Check actual client methods
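A sketch of mocking the method the code actually calls. The `chat_completion` name and its return shape are assumptions here — grep the real client first:

```python
import asyncio
from unittest.mock import AsyncMock

# Mock the method the agent actually calls (chat_completion), not generate()
mock_llm = AsyncMock()
mock_llm.chat_completion.return_value = {"content": "mocked reply"}  # assumed shape


async def agent_call(client):
    # Stand-in for agent code that delegates to the LLM client
    return await client.chat_completion(
        messages=[{"role": "user", "content": "hi"}]
    )


result = asyncio.run(agent_call(mock_llm))
print(result["content"])  # mocked reply
mock_llm.generate.assert_not_called()  # generate() was never the entry point
```

If the mock targets the wrong method name, the test silently exercises an auto-created `AsyncMock` instead of the configured return value — exactly the failure mode this lesson describes.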
📋 Coverage Expansion Checklist¶
Before writing ANY test, complete this checklist:
✅ Pre-Development¶
- Run coverage to get baseline percentage
- Identify specific missing lines
- Read code at those lines
- List actual method names with grep
- Verify class names
- Check method signatures (parameters, return types)
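The grep checks above can be scripted; this demo runs them against a stand-in agent file so the commands are visible end to end (the paths are illustrative, not the real repo):

```shell
# Create a stand-in agent file to demonstrate the checks
mkdir -p /tmp/demo/src/agents
cat > /tmp/demo/src/agents/example.py <<'EOF'
class ExampleAgent:
    async def process(self, message, context):
        return None

    def helper(self):
        return 42
EOF

# Verify the class name before referencing it in tests
grep "^class.*Agent" /tmp/demo/src/agents/example.py

# List the methods that actually exist, with line numbers
grep -n "def " /tmp/demo/src/agents/example.py
```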
✅ During Development¶
- Test public API, not private methods
- Mock complete interfaces (all methods)
- Create error-triggering scenarios
- Test edge cases
- Use actual method names (no assumptions!)
✅ Post-Development¶
- Run tests - verify they pass
- Measure new coverage
- Check if target reached (usually 80% or 90%)
- If not reached, analyze remaining gaps
- Commit working tests
🎯 Coverage Targets by Agent Tier¶
Tier 1: Operational Agents (10 agents)¶
Target: 90%+ coverage
Priority: HIGH
- Zumbi (88.26%) → needs 1.74%
- Anita (69.94%) → needs 20.06%
- Tiradentes (91.03%) → ✅ DONE
- Machado (93.55%) → ✅ DONE
- Senna (89.77%) → needs 0.23%
- Bonifácio (49.13%) → needs 40.87%
- Maria Quitéria (78.27%*) → needs verification
- Oxóssi (83.80%) → needs 6.20%
- Lampião (91.26%) → ✅ DONE
- Oscar Niemeyer (93.78%) → ✅ DONE
Note: * Coverage percentages vary by measurement scope
Tier 2: Framework Agents (5 agents)¶
Target: 50%+ coverage
Priority: MEDIUM
- Abaporu (13.37%) → needs 36.63%
- Nanã (55.26%) → ✅ MET (was 11.76%, now 55.26%)
- Drummond (87.78%) → almost Tier 1!
- Céuci (10.49%) → needs 39.51%
- Obaluaiê (13.11%) → needs 36.89%
Tier 3: Minimal Agents (1 agent)¶
Target: 30%+ coverage
Priority: LOW
- Dandara (86.32%) → ✅ EXCEEDS (actually very good!)
🔍 Quick Reference Commands¶
Find Agent Class Name¶
grep "^class.*Agent" src/agents/<agent>.py
List All Methods¶
grep -n "^\s*async def \|^\s*def " src/agents/<agent>.py
Check Coverage for Agent¶
JWT_SECRET_KEY=test SECRET_KEY=test \
venv/bin/pytest tests/unit/agents/test_<agent>*.py \
--cov=src.agents.<agent> \
--cov-report=term-missing -q
View Specific Lines¶
sed -n '<start>,<end>p' src/agents/<agent>.py
Count Test Cases¶
grep -c "def test_" tests/unit/agents/test_<agent>*.py
💡 Best Practices¶
DO ✅¶
- Verify before writing: Check method exists
- Test public API: Focus on user-facing methods
- Mock completely: Include all interface methods
- Test error paths: Force failures, test recovery
- Measure incrementally: Check coverage after each test batch
- Document learnings: Note what works/doesn't work
- Commit frequently: Small, working increments
DON'T ❌¶
- Assume method names: Always grep first
- Test private methods: They're implementation details
- Partial mocking: Mock complete interfaces
- Ignore errors: Coverage gaps often in error handlers
- Batch blindly: Verify each test adds coverage
- Commit broken tests: Only commit passing tests
- Skip documentation: Future you will thank current you
📊 Expected Coverage Timeline¶
Realistic Estimates (per agent)¶
High Coverage Agents (80%+):
- Time: 30-60 minutes
- Effort: Add 3-5 targeted tests
- Example: Senna 89.77% → 90%

Medium Coverage Agents (50-79%):
- Time: 2-3 hours
- Effort: Add 10-15 comprehensive tests
- Example: Nanã 55.26% → 80%

Low Coverage Agents (<50%):
- Time: 4-6 hours
- Effort: Add 20-30 tests + API audit
- Example: Bonifácio 49.13% → 80%

Very Low Coverage Agents (<20%):
- Time: Full day (8 hours)
- Effort: Complete test suite creation
- Example: Céuci 10.49% → 80%
🚀 Recommended Approach¶
Sprint Planning¶
Week 1: Quick Wins
- Senna: 89.77% → 90% (30 min)
- Zumbi: 88.26% → 90% (1 hour)
- Oxóssi: 83.80% → 90% (2 hours)

Week 2: Medium Effort
- Nanã: 55.26% → 80% (3 hours) - DONE ✅
- Drummond: 87.78% → 90% (1 hour)
- Maria Quitéria: API audit + tests (4 hours)

Week 3-4: High Effort
- Anita: 69.94% → 80% (4 hours)
- Bonifácio: 49.13% → 80% (6 hours)

Future Sprints:
- Abaporu, Céuci, Obaluaiê (Tier 2)
- Focus on implementation completion first
- Then add tests
📝 Test Template¶
Use this template for new test files:
"""
Coverage expansion tests for <Agent Name>
Target: Increase coverage from <current>% to <target>%
Focus: <main uncovered functionality>
"""
from unittest.mock import AsyncMock, patch
import pytest
from src.agents.<agent_module> import <AgentClass>
from src.agents.deodoro import AgentContext, AgentMessage, AgentStatus
@pytest.fixture
def agent_context():
"""Create agent context for testing."""
return AgentContext(
investigation_id="test_<agent>",
user_id="test_user",
session_id="test_session",
)
@pytest.fixture
def <agent>_agent():
"""Create <Agent Name> instance."""
return <AgentClass>()
class Test<FunctionalityName>:
"""Test <specific functionality> edge cases."""
@pytest.mark.asyncio
async def test_<specific_scenario>(
self, <agent>_agent, agent_context
):
"""Test <what this test does>."""
# Arrange
message = AgentMessage(
sender="test",
recipient="<AgentClass>",
action="<actual_action_name>", # VERIFY THIS EXISTS!
payload={"key": "value"},
)
# Act
response = await <agent>_agent.process(message, agent_context)
# Assert
assert response.status == AgentStatus.COMPLETED
assert "expected_key" in response.result
🎓 Summary¶
Key Takeaways:
1. Always verify the API before writing tests
2. Focus on public methods, not private ones
3. Mock complete interfaces
4. Test error paths explicitly
5. Measure coverage incrementally
6. Document learnings for future reference

Current Status (2025-10-22):
- ✅ 644/644 tests passing (100% success rate)
- ✅ All test failures eliminated
- ⚠️ Coverage expansion requires API verification
- 📋 Strategy guide created for future sessions
Generated: 2025-10-22
Last Updated: 2025-10-22
Status: Living Document - Update as patterns emerge