Flutter testing — unit tests, widget tests, golden tests, integration tests, mock strategies

Testing

Flutter ships four test layers out of the box, and the discipline is knowing which layer to use for which guarantee. The wrong layer wastes either time or confidence. This page covers each layer in depth, with production-grade patterns and the anti-patterns that waste your team's time.

Testing Pyramid in Flutter

The classic testing pyramid (many unit, fewer integration, fewest E2E) still applies, but Flutter shifts the proportions. Widget tests in Flutter are cheap — they run in a headless environment with no emulator, complete in milliseconds, and exercise real rendering logic. This makes them far cheaper than browser-based component tests in web frameworks, so you can afford more of them.

Layer	Speed	Confidence	Maintenance Cost	Recommended Volume
Unit	~1 ms per test	Logic correctness	Very low	Many
Widget	~10-50 ms per test	Rendering + interaction	Low	Many
Golden	~100 ms per test	Visual regression	Medium-high	Selective
Integration	Seconds-minutes	Full-stack behavior	High	Critical paths only

The practical distribution for most Flutter projects: 60% unit, 25% widget, 10% golden, 5% integration. Adjust based on your app's risk profile — a design-heavy app with strict brand guidelines deserves more goldens; a data-pipeline app deserves more unit tests.

The cost axis that matters most is maintenance, not writing time. A test that takes 10 minutes to write but never breaks is cheaper than a test that takes 2 minutes to write but breaks every sprint. Integration tests and golden tests are expensive not because they're hard to write, but because they break for reasons unrelated to bugs.

Unit Testing

Unit tests verify pure Dart logic with no Flutter framework involvement. They are the fastest to write, fastest to run, and most stable.

Testing Pure Dart Logic

Validators, formatters, parsers, and business rules are the highest-ROI test targets. They are deterministic, have no dependencies, and break only when the logic is wrong.

// lib/core/utils/email_validator.dart
class EmailValidator {
  static bool isValid(String email) {
    return RegExp(r'^[\w\-.]+@([\w-]+\.)+[\w-]{2,}$').hasMatch(email);
  }
}

// test/core/utils/email_validator_test.dart
import 'package:test/test.dart';
import 'package:myapp/core/utils/email_validator.dart';

void main() {
  group('EmailValidator', () {
    test('should accept valid email', () {
      expect(EmailValidator.isValid('user@example.com'), isTrue);
    });

    test('should reject email without domain', () {
      expect(EmailValidator.isValid('user@'), isFalse);
    });

    test('should reject empty string', () {
      expect(EmailValidator.isValid(''), isFalse);
    });
  });
}
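Because validators are deterministic, their cases can also be listed as data and turned into one test each. The following is a minimal sketch of that table-driven style; the validator is copied from above so the file stands alone, and the sample inputs in `cases` are hypothetical values chosen for illustration.

```dart
import 'package:test/test.dart';

// Copied from the validator above so this sketch is self-contained.
class EmailValidator {
  static bool isValid(String email) {
    return RegExp(r'^[\w\-.]+@([\w-]+\.)+[\w-]{2,}$').hasMatch(email);
  }
}

// Hypothetical sample inputs mapped to the expected result.
const cases = <String, bool>{
  'user@example.com': true,
  'first.last@mail.example.org': true,
  'user@': false,
  '@example.com': false,
  'user example@test.com': false,
  '': false,
};

void main() {
  group('EmailValidator', () {
    // One test per entry, so a failure names the exact input.
    cases.forEach((input, expected) {
      test('isValid("$input") is $expected', () {
        expect(EmailValidator.isValid(input), expected);
      });
    });
  });
}
```

Adding a regression case from a bug report then becomes a one-line change to the map.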

Testing Repository / Service Classes

Repositories and services have external dependencies (APIs, databases). Inject dependencies via constructor parameters so tests can substitute fakes or mocks.

// lib/features/order/data/order_repository.dart
class OrderRepository {
  final OrderApi _api;
  final OrderCache _cache;

  OrderRepository({required OrderApi api, required OrderCache cache})
      : _api = api,
        _cache = cache;

  Future<Order> getOrder(String id) async {
    final cached = _cache.get(id);
    if (cached != null) return cached;
    final order = await _api.fetchOrder(id);
    _cache.put(order);
    return order;
  }
}

// test/features/order/data/order_repository_test.dart
import 'package:mocktail/mocktail.dart';
import 'package:test/test.dart';

class MockOrderApi extends Mock implements OrderApi {}
class MockOrderCache extends Mock implements OrderCache {}

void main() {
  late OrderRepository repository;
  late MockOrderApi mockApi;
  late MockOrderCache mockCache;

  setUp(() {
    mockApi = MockOrderApi();
    mockCache = MockOrderCache();
    repository = OrderRepository(api: mockApi, cache: mockCache);
  });

  test('should return cached order when available', () async {
    final order = Order(id: '1', total: 99.99);
    when(() => mockCache.get('1')).thenReturn(order);

    final result = await repository.getOrder('1');

    expect(result, equals(order));
    verifyNever(() => mockApi.fetchOrder(any()));
  });

  test('should fetch from API and cache when not cached', () async {
    final order = Order(id: '1', total: 99.99);
    when(() => mockCache.get('1')).thenReturn(null);
    when(() => mockApi.fetchOrder('1')).thenAnswer((_) async => order);
    when(() => mockCache.put(order)).thenReturn(null);

    final result = await repository.getOrder('1');

    expect(result, equals(order));
    verify(() => mockCache.put(order)).called(1);
  });
}

Testing ChangeNotifier / ViewModel

ChangeNotifier-based ViewModels can be tested without widgets by listening for notifyListeners calls.

// lib/features/cart/presentation/cart_viewmodel.dart
import 'package:flutter/foundation.dart';

class CartViewModel extends ChangeNotifier {
  final CartRepository _repository;
  List<CartItem> _items = [];
  List<CartItem> get items => List.unmodifiable(_items);
  double get total => _items.fold(0, (sum, item) => sum + item.price);

  CartViewModel({required CartRepository repository})
      : _repository = repository;

  Future<void> loadItems() async {
    _items = await _repository.getItems();
    notifyListeners();
  }

  void removeItem(String id) {
    _items.removeWhere((item) => item.id == id);
    notifyListeners();
  }
}

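// test/features/cart/presentation/cart_viewmodel_test.dart
import 'package:flutter_test/flutter_test.dart';
import 'package:mocktail/mocktail.dart';

class MockCartRepository extends Mock implements CartRepository {}

void main() {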
  late CartViewModel viewModel;
  late MockCartRepository mockRepo;

  setUp(() {
    mockRepo = MockCartRepository();
    viewModel = CartViewModel(repository: mockRepo);
  });

  test('should load items and notify listeners', () async {
    final items = [CartItem(id: '1', name: 'Widget', price: 9.99)];
    when(() => mockRepo.getItems()).thenAnswer((_) async => items);

    var notified = false;
    viewModel.addListener(() => notified = true);

    await viewModel.loadItems();

    expect(viewModel.items, equals(items));
    expect(notified, isTrue);
  });

  test('should calculate total correctly', () async {
    final items = [
      CartItem(id: '1', name: 'A', price: 10.0),
      CartItem(id: '2', name: 'B', price: 20.0),
    ];
    when(() => mockRepo.getItems()).thenAnswer((_) async => items);
    await viewModel.loadItems();

    expect(viewModel.total, equals(30.0));
  });
}

Riverpod Testing with ProviderContainer

Riverpod providers can be tested without any widget tree by using ProviderContainer directly. Replace dependencies by passing provider overrides to the container's overrides parameter.

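// Provider under test
// apiProvider and cacheProvider are assumed to be defined elsewhere in the app.
final orderRepositoryProvider = Provider<OrderRepository>((ref) {
  // Use ref.watch inside provider bodies so dependents rebuild on change.
  return OrderRepository(
    api: ref.watch(apiProvider),
    cache: ref.watch(cacheProvider),
  );
});

final orderProvider = FutureProvider.family<Order, String>((ref, id) {
  return ref.watch(orderRepositoryProvider).getOrder(id);
});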

// Test
void main() {
  test('orderProvider returns order from repository', () async {
    final mockRepo = MockOrderRepository();
    final order = Order(id: '1', total: 50.0);
    when(() => mockRepo.getOrder('1')).thenAnswer((_) async => order);

    final container = ProviderContainer(
      overrides: [
        orderRepositoryProvider.overrideWithValue(mockRepo),
      ],
    );
    addTearDown(container.dispose);

    // Wait for the FutureProvider to resolve
    final result = await container.read(orderProvider('1').future);
    expect(result, equals(order));
  });
}

Always call container.dispose() in teardown. Leaked containers mean leaked subscriptions. Use addTearDown(container.dispose) immediately after creation to ensure cleanup even when assertions fail.
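One way to make that cleanup hard to forget is a small factory that registers the teardown itself. This is a sketch assuming Riverpod 2.x and `package:test`; `createContainer`, `greetingProvider`, and `shoutProvider` are hypothetical names introduced only for this example.

```dart
import 'package:riverpod/riverpod.dart';
import 'package:test/test.dart';

/// Creates a container and registers its disposal with the current test.
/// Call it from inside a test or setUp body so addTearDown applies.
ProviderContainer createContainer({
  List<Override> overrides = const [],
}) {
  final container = ProviderContainer(overrides: overrides);
  addTearDown(container.dispose);
  return container;
}

// Hypothetical providers used only to exercise the helper.
final greetingProvider = Provider<String>((ref) => 'hello');
final shoutProvider = Provider<String>(
  (ref) => ref.watch(greetingProvider).toUpperCase(),
);

void main() {
  test('override flows through dependent providers', () {
    final container = createContainer(
      overrides: [greetingProvider.overrideWithValue('hi')],
    );

    expect(container.read(shoutProvider), 'HI');
  });
}
```

Placing the helper in a shared test utility file keeps individual tests free of disposal boilerplate.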

Bloc Testing with bloc_test

The bloc_test package provides blocTest — a declarative way to test event-to-state transitions.

import 'package:bloc_test/bloc_test.dart';
import 'package:mocktail/mocktail.dart';
import 'package:test/test.dart';

void main() {
  group('AuthBloc', () {
    late MockAuthRepository mockAuthRepo;

    setUp(() {
      mockAuthRepo = MockAuthRepository();
    });

    blocTest<AuthBloc, AuthState>(
      'emits [loading, authenticated] when login succeeds',
      build: () {
        when(() => mockAuthRepo.login('user', 'pass'))
            .thenAnswer((_) async => User(name: 'user'));
        return AuthBloc(authRepository: mockAuthRepo);
      },
      act: (bloc) => bloc.add(LoginRequested('user', 'pass')),
      expect: () => [
        AuthState.loading(),
        AuthState.authenticated(User(name: 'user')),
      ],
    );

    blocTest<AuthBloc, AuthState>(
      'emits [loading, error] when login fails',
      build: () {
        when(() => mockAuthRepo.login(any(), any()))
            .thenThrow(AuthException('Invalid credentials'));
        return AuthBloc(authRepository: mockAuthRepo);
      },
      act: (bloc) => bloc.add(LoginRequested('user', 'wrong')),
      expect: () => [
        AuthState.loading(),
        AuthState.error('Invalid credentials'),
      ],
    );

    blocTest<AuthBloc, AuthState>(
      'starts from seeded state',
      seed: () => AuthState.authenticated(User(name: 'existing')),
      build: () => AuthBloc(authRepository: mockAuthRepo),
      act: (bloc) => bloc.add(LogoutRequested()),
      expect: () => [AuthState.unauthenticated()],
    );
  });
}

Async Testing Patterns

Dart's test framework has first-class support for Futures and Streams.

import 'dart:async';

import 'package:test/test.dart';

void main() {
  test('stream emits values in order', () {
    final controller = StreamController<int>();

    expectLater(
      controller.stream,
      emitsInOrder([1, 2, 3, emitsDone]),
    );

    controller
      ..add(1)
      ..add(2)
      ..add(3)
      ..close();
  });

  test('future completes with value', () {
    expect(
      Future.delayed(Duration(milliseconds: 10), () => 42),
      completion(equals(42)),
    );
  });

  test('future throws expected error', () {
    expect(
      Future.error(ArgumentError('bad')),
      throwsA(isA<ArgumentError>()),
    );
  });

  test('stream emits error then recovers', () {
    final stream = Stream.fromIterable([1, 2])
        .map((i) => i == 2 ? throw FormatException() : i)
        .handleError((_) {});

    expectLater(
      stream,
      emitsInOrder([1, emitsDone]),
    );
  });
}

Mocking with mockito and mocktail

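Both packages expose a similar core API (when, verify, argument matchers), but they differ in setup. mockito generates mock classes with build_runner and stubs calls as when(mock.method()); mocktail needs no code generation and wraps each call in a closure, as in when(() => mock.method()). Many Flutter projects prefer mocktail for that reason.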

import 'package:mocktail/mocktail.dart';

class MockUserApi extends Mock implements UserApi {}

void main() {
  late MockUserApi mockApi;

  setUpAll(() {
    // Register fallback values for argument matchers with custom types
    registerFallbackValue(CreateUserRequest(name: '', email: ''));
  });

  setUp(() {
    mockApi = MockUserApi();
  });

  test('verify interaction with argument matchers', () async {
    when(() => mockApi.createUser(any()))
        .thenAnswer((_) async => User(id: '1', name: 'Test'));

    final service = UserService(api: mockApi);
    await service.register('Test', 'test@example.com');

    verify(() => mockApi.createUser(
      any(that: isA<CreateUserRequest>()
          .having((r) => r.name, 'name', 'Test')
          .having((r) => r.email, 'email', 'test@example.com')),
    )).called(1);
  });

  test('stub sequential responses', () async {
    var callCount = 0;
    when(() => mockApi.fetchUser('1')).thenAnswer((_) async {
      callCount++;
      if (callCount == 1) throw NetworkException();
      return User(id: '1', name: 'Test');
    });

    final service = UserService(api: mockApi);
    // Await the first assertion so the two calls cannot interleave.
    await expectLater(
      () => service.getUser('1'),
      throwsA(isA<NetworkException>()),
    );

    final user = await service.getUser('1');
    expect(user.name, equals('Test'));
  });
}

Faking vs Mocking

Aspect	Mock	Fake
Definition	Stub object configured per test with `when`/`verify`	Hand-written implementation of the interface
Best for	Verifying interactions, quick setup	Complex stateful behavior, realistic simulation
Maintenance	Low per test, but `when` chains get noisy	Higher upfront, lower long-term for complex deps
Danger	Over-specifying interactions	Drift from real implementation

Decision tree: if the dependency is simple and you mainly care about "was it called correctly?" — mock it. If the dependency has complex state transitions (an in-memory database, a navigation stack) — write a fake. If in doubt, start with a mock; refactor to a fake when the mock setup becomes painful.

// Fake example — simulates real cache behavior
class FakeOrderCache implements OrderCache {
  final _store = <String, Order>{};

  @override
  Order? get(String id) => _store[id];

  @override
  void put(Order order) => _store[order.id] = order;

  @override
  void clear() => _store.clear();
}
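To show how a fake is used, here is a sketch of a repository test built on the fake cache. The `Order`, `OrderApi`, and `OrderCache` declarations are minimal stand-ins for the app's real types, and `CountingOrderApi` is a hypothetical hand-written fake added for this example.

```dart
import 'package:test/test.dart';

// Minimal stand-ins for the app types referenced above.
class Order {
  final String id;
  final double total;
  const Order({required this.id, required this.total});
}

abstract class OrderApi {
  Future<Order> fetchOrder(String id);
}

abstract class OrderCache {
  Order? get(String id);
  void put(Order order);
  void clear();
}

class OrderRepository {
  OrderRepository({required OrderApi api, required OrderCache cache})
      : _api = api,
        _cache = cache;

  final OrderApi _api;
  final OrderCache _cache;

  Future<Order> getOrder(String id) async {
    final cached = _cache.get(id);
    if (cached != null) return cached;
    final order = await _api.fetchOrder(id);
    _cache.put(order);
    return order;
  }
}

class FakeOrderCache implements OrderCache {
  final _store = <String, Order>{};

  @override
  Order? get(String id) => _store[id];

  @override
  void put(Order order) => _store[order.id] = order;

  @override
  void clear() => _store.clear();
}

// Hypothetical fake API that counts calls instead of using the network.
class CountingOrderApi implements OrderApi {
  int calls = 0;

  @override
  Future<Order> fetchOrder(String id) async {
    calls++;
    return Order(id: id, total: 10.0);
  }
}

void main() {
  test('second read is served from the cache', () async {
    final api = CountingOrderApi();
    final repository = OrderRepository(api: api, cache: FakeOrderCache());

    final first = await repository.getOrder('1');
    final second = await repository.getOrder('1');

    expect(api.calls, 1);
    expect(identical(first, second), isTrue);
  });
}
```

The test needs no `when` stubs: the fake carries state between calls, which is the situation where a fake tends to read better than a mock.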

Widget Testing

Widget tests render real widgets in a headless test environment. They exercise layout, rendering, interaction, and state management without an emulator. This is Flutter's biggest testing advantage over web frameworks.

testWidgets and WidgetTester

Every widget test uses testWidgets, which provides a WidgetTester for controlling the widget lifecycle.

import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

void main() {
  testWidgets('CounterPage increments on tap', (tester) async {
    await tester.pumpWidget(
      const MaterialApp(home: CounterPage()),
    );

    // Initial state
    expect(find.text('0'), findsOneWidget);

    // Tap the button
    await tester.tap(find.byIcon(Icons.add));
    await tester.pump(); // Rebuild after setState

    // Verify new state
    expect(find.text('1'), findsOneWidget);
  });
}

Key WidgetTester methods:

Method	When to use
`pumpWidget(widget)`	First render of the widget tree
`pump()`	Trigger a single frame rebuild
`pump(Duration)`	Advance clock by duration and rebuild
`pumpAndSettle()`	Pump until no more frames are scheduled (animations complete)
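`pumpAndSettle(interval, phase, timeout)`	Same, with an explicit timeout as the third positional argument (default 10 minutes); a single Duration argument sets the interval between pumps, not the timeout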

Avoid pumpAndSettle when the widget has infinite animations (progress indicators, shimmer effects). Such widgets never stop scheduling frames, so the call fails once its timeout elapses. Use pump(Duration) instead to advance a known amount of time.
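The sketch below illustrates the pump(Duration) approach. `SyncBanner` is a hypothetical widget written for this example: its spinner animates forever, and a label appears 500 ms after the first build.

```dart
import 'dart:async';

import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

// Hypothetical widget: endless spinner plus a label shown after 500 ms.
class SyncBanner extends StatefulWidget {
  const SyncBanner({super.key});

  @override
  State<SyncBanner> createState() => _SyncBannerState();
}

class _SyncBannerState extends State<SyncBanner> {
  Timer? _timer;
  bool _showLabel = false;

  @override
  void initState() {
    super.initState();
    _timer = Timer(const Duration(milliseconds: 500), () {
      setState(() => _showLabel = true);
    });
  }

  @override
  void dispose() {
    _timer?.cancel();
    super.dispose();
  }

  @override
  Widget build(BuildContext context) {
    return Column(
      children: [
        const CircularProgressIndicator(),
        if (_showLabel) const Text('Syncing'),
      ],
    );
  }
}

void main() {
  testWidgets('label appears while the spinner keeps animating',
      (tester) async {
    await tester.pumpWidget(
      const MaterialApp(home: Scaffold(body: SyncBanner())),
    );
    expect(find.text('Syncing'), findsNothing);

    // Advance the test clock past the 500 ms timer and rebuild once.
    // pumpAndSettle() would not settle here because the spinner keeps
    // scheduling frames.
    await tester.pump(const Duration(seconds: 1));

    expect(find.text('Syncing'), findsOneWidget);
    expect(find.byType(CircularProgressIndicator), findsOneWidget);
  });
}
```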

Finding Widgets

The find object provides finders for locating widgets in the rendered tree.

// By type
find.byType(ElevatedButton)

// By Key — the most reliable finder for testing
find.byKey(const Key('submit-button'))

// By text content
find.text('Submit')

// By icon
find.byIcon(Icons.add)

// By widget predicate — for complex conditions
find.byWidgetPredicate(
  // Text.rich leaves data null, so avoid the null assertion here.
  (widget) => widget is Text && (widget.data ?? '').startsWith('Error'),
)

// Descendant — find a Text inside a specific Card
find.descendant(
  of: find.byKey(const Key('order-card')),
  matching: find.text('\$99.99'),
)

Interaction Testing

testWidgets('login form validates and submits', (tester) async {
  await tester.pumpWidget(MaterialApp(home: LoginPage()));

  // Enter text
  await tester.enterText(find.byKey(const Key('email-field')), 'user@test.com');
  await tester.enterText(find.byKey(const Key('password-field')), 'secret');

  // Tap submit
  await tester.tap(find.byKey(const Key('login-button')));
  await tester.pumpAndSettle();

  // Verify navigation occurred
  expect(find.byType(HomePage), findsOneWidget);
});

testWidgets('swipe to dismiss removes item', (tester) async {
  await tester.pumpWidget(MaterialApp(home: ItemListPage()));

  // Swipe the first item left
  await tester.drag(find.text('Item 1'), const Offset(-500, 0));
  await tester.pumpAndSettle();

  expect(find.text('Item 1'), findsNothing);
});

testWidgets('long press shows context menu', (tester) async {
  await tester.pumpWidget(MaterialApp(home: NotesPage()));

  await tester.longPress(find.text('My Note'));
  await tester.pumpAndSettle();

  expect(find.text('Delete'), findsOneWidget);
  expect(find.text('Share'), findsOneWidget);
});

Testing with Provider / Riverpod Overrides

Widget tests need to supply dependencies. Wrap the widget under test with appropriate providers.

// Riverpod widget test with overrides
testWidgets('OrderPage shows order details', (tester) async {
  final mockRepo = MockOrderRepository();
  final order = Order(id: '1', title: 'Test Order', total: 42.0);
  when(() => mockRepo.getOrder('1')).thenAnswer((_) async => order);

  await tester.pumpWidget(
    ProviderScope(
      overrides: [
        orderRepositoryProvider.overrideWithValue(mockRepo),
      ],
      child: const MaterialApp(home: OrderPage(orderId: '1')),
    ),
  );
  await tester.pumpAndSettle();

  expect(find.text('Test Order'), findsOneWidget);
  expect(find.text('\$42.00'), findsOneWidget);
});

Testing Navigation

Mock NavigatorObserver to verify route transitions without rendering the destination page.

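class MockNavigatorObserver extends Mock implements NavigatorObserver {}

// mocktail needs a registered fallback before any() can stand in for a Route.
class FakeRoute extends Fake implements Route<dynamic> {}

testWidgets('tapping order navigates to detail page', (tester) async {
  registerFallbackValue(FakeRoute());
  final observer = MockNavigatorObserver();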

  await tester.pumpWidget(
    MaterialApp(
      home: OrderListPage(),
      navigatorObservers: [observer],
      routes: {
        '/order': (_) => const Scaffold(body: Text('Detail')),
      },
    ),
  );

  await tester.tap(find.text('Order #1'));
  await tester.pumpAndSettle();

  verify(() => observer.didPush(any(), any())).called(greaterThan(0));
  expect(find.text('Detail'), findsOneWidget);
});

Testing Async Widgets

Widgets backed by FutureBuilder or StreamBuilder need careful pumping to move through loading, data, and error states.

testWidgets('UserProfile shows loading then data', (tester) async {
  final completer = Completer<User>();

  await tester.pumpWidget(
    MaterialApp(
      home: UserProfile(userFuture: completer.future),
    ),
  );

  // Loading state
  expect(find.byType(CircularProgressIndicator), findsOneWidget);

  // Resolve the future
  completer.complete(User(name: 'Alice'));
  await tester.pumpAndSettle();

  // Data state
  expect(find.text('Alice'), findsOneWidget);
  expect(find.byType(CircularProgressIndicator), findsNothing);
});

MediaQuery / Theme / Locale Wrappers

Production widgets depend on MediaQuery, Theme, and Localizations. Create a shared test wrapper to avoid boilerplate.

// test/helpers/test_app.dart
Widget buildTestApp(
  Widget child, {
  Size screenSize = const Size(390, 844), // iPhone 14
  Brightness brightness = Brightness.light,
  Locale locale = const Locale('en'),
}) {
  return MediaQuery(
    data: MediaQueryData(size: screenSize),
    child: MaterialApp(
      theme: brightness == Brightness.light ? lightTheme : darkTheme,
      locale: locale,
      localizationsDelegates: AppLocalizations.localizationsDelegates,
      supportedLocales: AppLocalizations.supportedLocales,
      home: child,
    ),
  );
}

// Usage in tests
testWidgets('renders correctly on small screen', (tester) async {
  await tester.pumpWidget(
    buildTestApp(
      const ProductCard(product: testProduct),
      screenSize: const Size(320, 568), // iPhone SE
    ),
  );
  // assertions...
});
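A MediaQuery wrapper affects, at most, what MediaQuery.of(context) reports. The surface that widgets are actually laid out in keeps the flutter_test default of 800x600 logical pixels. The following sketch resizes the surface itself, assuming a Flutter version where `WidgetTester.view` is available (3.10 or later, to my knowledge); `smallPhone` is a hypothetical constant for this example.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

// Hypothetical logical size used for the small-screen case.
const smallPhone = Size(320, 568);

void main() {
  testWidgets('layout sees the small surface', (tester) async {
    // Logical size is physicalSize / devicePixelRatio, so pin the ratio to 1.
    tester.view.devicePixelRatio = 1.0;
    tester.view.physicalSize = smallPhone;
    // Restore the defaults so later tests are not affected.
    addTearDown(tester.view.reset);

    late BoxConstraints seen;
    await tester.pumpWidget(
      MaterialApp(
        home: LayoutBuilder(
          builder: (context, constraints) {
            seen = constraints;
            return const SizedBox.shrink();
          },
        ),
      ),
    );

    expect(seen.maxWidth, smallPhone.width);
    expect(seen.maxHeight, smallPhone.height);
  });
}
```

Overflow errors and breakpoint logic based on LayoutBuilder only show up when the surface is resized this way.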

Custom Matchers

Flutter's test framework provides several widget-specific matchers. Combine them with isA<T> for type-safe assertions.

// Built-in matchers
expect(find.byType(AppBar), findsOneWidget);
expect(find.text('Deleted'), findsNothing);
expect(find.byType(ListTile), findsNWidgets(3));
expect(find.byType(ListTile), findsAtLeast(1));

// Type-checking with property matchers
expect(
  tester.widget<Text>(find.byKey(const Key('price'))),
  isA<Text>()
      .having((t) => t.data, 'data', '\$42.00')
      .having((t) => t.style?.color, 'color', Colors.green),
);
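When the same property check recurs across many tests, it can be wrapped in a named matcher. This is a minimal sketch built on `CustomMatcher` from package:matcher, which flutter_test re-exports; `hasTextData` is a hypothetical helper name, not part of Flutter.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

/// Hypothetical matcher: passes when the actual value is a Text widget
/// whose `data` satisfies [valueOrMatcher].
class _HasTextData extends CustomMatcher {
  _HasTextData(Object? valueOrMatcher)
      : super('Text widget with data that is', 'data', valueOrMatcher);

  @override
  Object? featureValueOf(dynamic actual) => (actual as Text).data;
}

Matcher hasTextData(Object? valueOrMatcher) => _HasTextData(valueOrMatcher);

void main() {
  testWidgets('price label matches the custom matcher', (tester) async {
    await tester.pumpWidget(
      const MaterialApp(
        home: Text(r'$42.00', key: Key('price')),
      ),
    );

    final price = tester.widget<Text>(find.byKey(const Key('price')));
    // Accepts either a plain value or another matcher.
    expect(price, hasTextData(r'$42.00'));
    expect(price, hasTextData(startsWith(r'$')));
  });
}
```

On failure, the description passed to the constructor appears in the test output, which tends to be easier to scan than a chain of `having` calls.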

Golden Testing

Golden tests capture a screenshot of a widget and compare it against a saved reference image. They catch visual regressions that unit and widget tests miss — wrong colors, misaligned layouts, missing icons.

When Golden Tests Are Worth It

Golden tests have the highest maintenance cost of any Flutter test type. Use them selectively:

Good candidates	Poor candidates
Design system components (buttons, cards, inputs)	Screens with dynamic data
Brand-critical UI (logo, onboarding)	Lists with variable-length content
Complex custom painting (charts, graphs)	Widgets that change frequently during development

Setting Up Golden Files

testWidgets('PrimaryButton matches golden', (tester) async {
  await tester.pumpWidget(
    MaterialApp(
      theme: appTheme,
      home: Scaffold(
        body: Center(
          child: PrimaryButton(
            label: 'Continue',
            onPressed: () {},
          ),
        ),
      ),
    ),
  );

  await expectLater(
    find.byType(PrimaryButton),
    matchesGoldenFile('goldens/primary_button.png'),
  );
});

Updating Goldens

When you intentionally change a widget's appearance, update the reference images:

flutter test --update-goldens
# Or for a specific file:
flutter test --update-goldens test/widgets/primary_button_test.dart

Review the diff in your image diff tool or PR review before committing. Blind updates defeat the purpose of golden tests.

Multi-Platform Golden Divergence

Golden files are pixel-sensitive, and font rendering differs between macOS, Linux, and Windows. This causes goldens generated on one OS to fail on another.

Strategies:

Generate goldens in CI only. Run --update-goldens in a dedicated CI step on a fixed OS (typically Linux). Developers run the comparison tests locally, but only CI is authoritative.
Use a font that renders identically cross-platform. Load a bundled test font (e.g., a Roboto .ttf file checked into the repo) to reduce divergence.
Tolerance threshold. Some golden packages allow pixel-diff tolerances.
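For the tolerance strategy, flutter_test itself allows swapping the global `goldenFileComparator`. The sketch below is one possible shape, not a definitive implementation: `TolerantGoldenComparator` and `useTolerantGoldens` are hypothetical names, and the code assumes `ComparisonResult.diffPercent` is a fraction between 0.0 and 1.0.

```dart
import 'dart:typed_data';

import 'package:flutter/foundation.dart';
import 'package:flutter_test/flutter_test.dart';

/// Accepts a golden when the fraction of differing pixels is at most
/// [tolerance] (assumed to share the 0.0 to 1.0 scale of diffPercent).
class TolerantGoldenComparator extends LocalFileComparator {
  TolerantGoldenComparator(super.testFile, {required this.tolerance});

  final double tolerance;

  @override
  Future<bool> compare(Uint8List imageBytes, Uri golden) async {
    final result = await GoldenFileComparator.compareLists(
      imageBytes,
      await getGoldenBytes(golden),
    );
    if (result.passed || result.diffPercent <= tolerance) {
      return true;
    }
    // Write the usual failure images, then fail with the standard message.
    final error = await generateFailureOutput(result, golden, basedir);
    throw FlutterError(error);
  }
}

/// Call from setUpAll in a golden test file.
void useTolerantGoldens({double tolerance = 0.001}) {
  final current = goldenFileComparator;
  if (current is LocalFileComparator) {
    goldenFileComparator = TolerantGoldenComparator(
      // Only the directory of this URI is used as the base for golden paths.
      current.basedir.resolve('placeholder_test.dart'),
      tolerance: tolerance,
    );
  }
}
```

Keep the tolerance small: a generous threshold can hide exactly the regressions goldens are meant to catch.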

Font Loading in Tests

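By default, Flutter tests render text with a placeholder test font (Ahem, or the similar FlutterTest font on newer SDKs) that draws most glyphs as solid boxes. To render real text in goldens, load your app fonts: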

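// test/helpers/golden_helpers.dart
import 'package:flutter/services.dart';
import 'package:flutter_test/flutter_test.dart';

Future<void> loadAppFonts() async {
  // rootBundle needs an initialized binding.
  TestWidgetsFlutterBinding.ensureInitialized();
  // addFont takes a Future<ByteData>, so the load is not awaited here.
  final fontData = rootBundle.load('assets/fonts/Roboto-Regular.ttf');
  final fontLoader = FontLoader('Roboto')..addFont(fontData);
  await fontLoader.load();
}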

// In the test
void main() {
  setUpAll(() async {
    await loadAppFonts();
  });

  testWidgets('renders with real fonts', (tester) async {
    // ...
  });
}

CI Considerations

Font rendering differences are the most common cause of golden test failures in CI. Mitigations:

Pin the CI runner OS and version (e.g., ubuntu-22.04, not ubuntu-latest).
Include fonts in the repo rather than downloading at test time.
Run goldens as a separate CI job that can be retried independently.
Consider the golden_toolkit package for multi-device screenshot generation.

golden_toolkit

The golden_toolkit package simplifies multi-scenario golden testing:

import 'package:golden_toolkit/golden_toolkit.dart';

void main() {
  testGoldens('ProductCard renders across devices', (tester) async {
    final builder = DeviceBuilder()
      ..overrideDevicesForAllScenarios(devices: [
        Device.phone,
        Device.iphone11,
        Device.tabletPortrait,
      ])
      ..addScenario(
        widget: const ProductCard(product: sampleProduct),
        name: 'default',
      )
      ..addScenario(
        widget: const ProductCard(product: sampleProduct, isOnSale: true),
        name: 'on_sale',
      );

    await tester.pumpDeviceBuilder(builder);
    await screenMatchesGolden(tester, 'product_card_multi_device');
  });
}

Integration Testing

Integration tests run on a real device or emulator and exercise the full app stack — real rendering, real navigation, real animations, and optionally real network calls.

Setup

Add the integration_test package. It ships with the Flutter SDK, so it is declared as an SDK dependency rather than a versioned package from pub.dev:

# pubspec.yaml
dev_dependencies:
  integration_test:
    sdk: flutter
  flutter_test:
    sdk: flutter

Create the test entry point:

// integration_test/app_test.dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';
import 'package:integration_test/integration_test.dart';
import 'package:myapp/main.dart' as app;

void main() {
  IntegrationTestWidgetsFlutterBinding.ensureInitialized();

  testWidgets('full login flow', (tester) async {
    app.main();
    await tester.pumpAndSettle();

    // Enter credentials
    await tester.enterText(find.byKey(const Key('email')), 'test@test.com');
    await tester.enterText(find.byKey(const Key('password')), 'password123');
    await tester.tap(find.byKey(const Key('login-button')));
    await tester.pumpAndSettle(const Duration(seconds: 5));

    // Verify we landed on the home page
    expect(find.text('Welcome back'), findsOneWidget);
  });
}

Run with:

flutter test integration_test/app_test.dart
# On a specific device:
flutter test integration_test/app_test.dart -d <device-id>

Real vs Mock Backends

Approach	Pros	Cons
Real backend	Tests the full stack, catches API contract bugs	Slow, flaky, needs test data management
Mock backend	Fast, deterministic, runs offline	Misses real integration bugs
Recorded responses (VCR-style)	Deterministic + realistic data	Recordings go stale

For CI, use a mock backend or recorded responses. Run against the real staging backend in a nightly job, not on every PR.

Running on Real Devices vs Emulators

Emulators/simulators: faster to spin up, adequate for most integration tests. Use them in CI.
Real devices: needed for performance testing, camera/GPS/Bluetooth tests, and final pre-release validation.
Firebase Test Lab: runs integration tests on a matrix of real devices in Google's lab. Configure via gcloud firebase test android run.

patrol Package

The patrol package extends Flutter integration tests with native interaction capabilities — handling permission dialogs, system alerts, notifications, and WebView interactions that integration_test cannot reach.

// integration_test/permissions_test.dart
import 'package:flutter_test/flutter_test.dart';
import 'package:patrol/patrol.dart';

void main() {
  patrolTest('grants camera permission and takes photo', ($) async {
    await $.pumpWidgetAndSettle(const MyApp());

    await $.tap(find.text('Take Photo'));

    // Handle the native permission dialog
    await $.native.grantPermissionWhenInUse();

    await $.pumpAndSettle();
    expect(find.byType(PhotoPreview), findsOneWidget);
  });
}

CI Setup for Integration Tests

Integration tests in CI require a running emulator or a device farm.

GitHub Actions with Android emulator:

jobs:
  integration_test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: subosito/flutter-action@v2
      - name: Start emulator
        uses: reactivecircus/android-emulator-runner@v2
        with:
          api-level: 33
          script: flutter test integration_test/

Firebase Test Lab:

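# Build the app APK and the instrumentation test APK.
# Assumes the Android instrumentation runner from the integration_test
# setup guide is already configured under android/app/src/androidTest.
pushd android
flutter build apk
./gradlew app:assembleAndroidTest
./gradlew app:assembleDebug -Ptarget=integration_test/app_test.dart
popd

# Run on Test Lab
# (list valid model IDs with: gcloud firebase test android models list)
gcloud firebase test android run \
  --type instrumentation \
  --app build/app/outputs/apk/debug/app-debug.apk \
  --test build/app/outputs/apk/androidTest/debug/app-debug-androidTest.apk \
  --device model=Pixel6,version=33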

Mock Strategies

Platform Channel Mocking

Platform channels (camera, battery, file picker) need explicit mocking in tests. Use TestDefaultBinaryMessengerBinding to intercept method channel calls.

testWidgets('shows battery level', (tester) async {
  // Mock the platform channel
  TestDefaultBinaryMessengerBinding.instance.defaultBinaryMessenger
      .setMockMethodCallHandler(
    const MethodChannel('plugins.flutter.io/battery'),
    (call) async {
      if (call.method == 'getBatteryLevel') return 72;
      return null;
    },
  );

  await tester.pumpWidget(const MaterialApp(home: BatteryPage()));
  await tester.pumpAndSettle();

  expect(find.text('72%'), findsOneWidget);
});

HTTP Mocking

For Dio-based HTTP clients, use http_mock_adapter:

import 'package:dio/dio.dart';
import 'package:http_mock_adapter/http_mock_adapter.dart';

void main() {
  late Dio dio;
  late DioAdapter adapter;

  setUp(() {
    dio = Dio();
    adapter = DioAdapter(dio: dio);
  });

  test('fetches user from API', () async {
    adapter.onGet(
      '/users/1',
      (server) => server.reply(200, {'id': '1', 'name': 'Alice'}),
    );

    final service = UserService(dio: dio);
    final user = await service.fetchUser('1');

    expect(user.name, equals('Alice'));
  });

  test('handles 404', () async {
    adapter.onGet(
      '/users/999',
      (server) => server.reply(404, {'error': 'Not found'}),
    );

    final service = UserService(dio: dio);
    expect(
      () => service.fetchUser('999'),
      throwsA(isA<UserNotFoundException>()),
    );
  });
}

Mocking SharedPreferences

import 'package:shared_preferences/shared_preferences.dart';

void main() {
  test('reads saved theme preference', () async {
    // Set initial values before any code runs
    SharedPreferences.setMockInitialValues({'theme': 'dark'});

    final prefs = await SharedPreferences.getInstance();
    final service = ThemeService(prefs: prefs);

    expect(service.currentTheme, equals('dark'));
  });
}

Mocking Time

Time-dependent code (token expiry, cache staleness, rate limiting) should use an injectable clock rather than DateTime.now().

import 'package:clock/clock.dart';

class TokenStore {
  final Clock _clock;
  TokenStore({Clock? clock}) : _clock = clock ?? const Clock();

  bool isExpired(Token token) {
    return _clock.now().isAfter(token.expiresAt);
  }
}

// Test
void main() {
  test('detects expired token', () {
    final fixedTime = DateTime(2025, 6, 15, 12, 0);
    final clock = Clock.fixed(fixedTime);
    final store = TokenStore(clock: clock);

    final expired = Token(expiresAt: DateTime(2025, 6, 15, 11, 0));
    final valid = Token(expiresAt: DateTime(2025, 6, 15, 13, 0));

    expect(store.isExpired(expired), isTrue);
    expect(store.isExpired(valid), isFalse);
  });
}
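When threading a Clock through every constructor is impractical, the same package also offers an ambient clock: code reads the top-level `clock` getter and tests override it with `withClock`. A minimal sketch follows; `isStale` is a hypothetical helper written for this example.

```dart
import 'package:clock/clock.dart';
import 'package:test/test.dart';

// Hypothetical helper that reads the ambient clock instead of taking a
// Clock as a constructor argument.
bool isStale(
  DateTime fetchedAt, {
  Duration maxAge = const Duration(minutes: 5),
}) {
  return clock.now().difference(fetchedAt) > maxAge;
}

void main() {
  test('entry older than maxAge is stale', () {
    final now = DateTime(2025, 6, 15, 12, 0);

    // Inside the callback, clock.now() returns the fixed time.
    withClock(Clock.fixed(now), () {
      expect(isStale(now.subtract(const Duration(minutes: 10))), isTrue);
      expect(isStale(now.subtract(const Duration(minutes: 1))), isFalse);
    });
  });
}
```

Constructor injection keeps the dependency explicit, while the ambient clock keeps signatures short; either works as long as production code never calls DateTime.now() directly.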

When NOT to Mock

Mocking is not always the right answer. Do not mock when:

The real thing is fast and deterministic. An in-memory SQLite database or a pure Dart utility does not need mocking.
The mock would replicate the implementation. If your mock's when chain mirrors the source code line-for-line, the test proves nothing.
You are testing the integration itself. If the point of the test is "does my code work with SharedPreferences?", mocking SharedPreferences defeats the purpose.

Testing Architecture

What to Test vs What Not to Test

Not all code deserves a test. Prioritize by ROI:

High ROI (test these)	Low ROI (skip or test lightly)
Business logic, validators, formatters	Trivial getters/setters
State transitions (Bloc events, Notifier methods)	Framework boilerplate (MaterialApp setup)
Error handling paths	Auto-generated code (freezed, json_serializable)
Complex widget interaction flows	Simple pass-through widgets
Edge cases from bug reports	Styling (unless golden-tested)

Test Organization

Two viable strategies:

Mirror lib/ structure (recommended for large projects):

test/
├── features/
│   └── order/
│       ├── data/
│       │   └── order_repository_test.dart
│       └── presentation/
│           ├── order_page_test.dart
│           └── order_viewmodel_test.dart
├── core/
│   └── utils/
│       └── email_validator_test.dart
└── helpers/
    ├── test_app.dart
    └── mocks.dart

Flat test/ (acceptable for small projects): all test files at the top level. Stops scaling past about 30 tests.

Code Coverage Strategy

Coverage is a useful signal but a terrible target. The goal is not 100% coverage — the goal is confidence that important paths work.

# Generate coverage
flutter test --coverage

# Generate HTML report (requires lcov)
genhtml coverage/lcov.info -o coverage/html

Set a coverage floor (e.g., 70%) in CI to catch large drops, but do not set it to 100%. Chasing 100% leads to tests that verify trivial code and make refactoring expensive.

Coverage tells you what is NOT tested, not what IS tested well. A line covered by a test that never asserts anything useful shows as "covered" but provides zero confidence. Read coverage reports to find blind spots, not to celebrate green numbers.
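The coverage floor mentioned above can be enforced with a few lines of Dart run after `flutter test --coverage`. This is a sketch under stated assumptions: the script name and the 70% default are hypothetical, and it relies only on the LF (lines found) and LH (lines hit) records of the lcov tracefile.

```dart
// tool/check_coverage.dart (hypothetical location)
import 'dart:io';

/// Returns overall line coverage as a percentage, summed across all
/// files in an lcov tracefile.
double lineCoverage(String lcov) {
  var found = 0;
  var hit = 0;
  for (final line in lcov.split('\n')) {
    final trimmed = line.trim();
    if (trimmed.startsWith('LF:')) {
      found += int.parse(trimmed.substring(3));
    } else if (trimmed.startsWith('LH:')) {
      hit += int.parse(trimmed.substring(3));
    }
  }
  return found == 0 ? 0.0 : 100 * hit / found;
}

void main(List<String> args) {
  // First argument overrides the assumed default floor of 70%.
  final floor = args.isNotEmpty ? double.parse(args[0]) : 70.0;
  final report = File('coverage/lcov.info').readAsStringSync();
  final coverage = lineCoverage(report);

  stdout.writeln('Line coverage: ${coverage.toStringAsFixed(1)}%');
  if (coverage < floor) {
    stderr.writeln('Coverage is below the $floor% floor.');
    exitCode = 1; // Non-zero exit fails the CI step.
  }
}
```

A CI step could then run `dart run tool/check_coverage.dart 70` after the test step.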

Test Naming Conventions

Consistent names make test output scannable. The should_X_when_Y pattern reads well in failure reports:

group('OrderRepository', () {
  test('should return cached order when cache hit', () { ... });
  test('should fetch from API when cache miss', () { ... });
  test('should throw OrderNotFoundException when API returns 404', () { ... });
});

Alternative: sentence-style names that describe the behavior:

test('returns cached order on cache hit', () { ... });
test('fetches from API on cache miss', () { ... });

Pick one convention and enforce it in code review. The format matters less than consistency.
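
The test runner joins the group description and the test description with a space, so it helps to read the two together when choosing names. The following is a minimal runnable sketch of the sentence-style convention; OrderCache is a hypothetical in-memory class defined here only so the example is self-contained. In a Flutter project the same functions are available through package:flutter_test/flutter_test.dart.

```dart
import 'package:test/test.dart';

// Hypothetical in-memory cache, defined only so the example runs.
class OrderCache {
  final Map<String, String> _orders = {};

  void put(String id, String order) {
    _orders[id] = order;
  }

  String? get(String id) => _orders[id];
}

void main() {
  group('OrderCache', () {
    late OrderCache cache;

    setUp(() {
      cache = OrderCache();
    });

    // Reported as: "OrderCache returns stored order on cache hit"
    test('returns stored order on cache hit', () {
      cache.put('1', 'Order #1');
      expect(cache.get('1'), equals('Order #1'));
    });

    // Reported as: "OrderCache returns null on cache miss"
    test('returns null on cache miss', () {
      expect(cache.get('missing'), isNull);
    });
  });
}
```

Naming the group after the class under test keeps each individual test name short while the combined name still reads as a full sentence.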

Shared Test Utilities

Centralize repeated test setup in helper files.

// test/helpers/mocks.dart
class MockOrderRepository extends Mock implements OrderRepository {}
class MockAuthService extends Mock implements AuthService {}
class MockNavigatorObserver extends Mock implements NavigatorObserver {}

// test/helpers/builders.dart: test data builders with sensible defaults
class OrderBuilder {
  String _id = '1';
  double _total = 99.99;
  OrderStatus _status = OrderStatus.pending;

  OrderBuilder withId(String id) { _id = id; return this; }
  OrderBuilder withTotal(double total) { _total = total; return this; }
  OrderBuilder withStatus(OrderStatus status) { _status = status; return this; }

  Order build() => Order(id: _id, total: _total, status: _status);
}

// Usage
final order = OrderBuilder().withStatus(OrderStatus.shipped).build();
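
The directory layout earlier also lists test/helpers/test_app.dart. Its contents are not shown, so the following is only a sketch of what such a helper might contain: an extension (the name pumpApp is an assumption) that wraps the widget under test in the MaterialApp and Scaffold ancestors most widgets need. The main function is included only to show usage; in a real project the test would live in its own file and import the helper.

```dart
// test/helpers/test_app.dart (contents are an assumption)
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

/// Hypothetical helper: pumps [widget] inside a minimal app shell so it has
/// Directionality, MediaQuery, Theme, and Material ancestors.
extension PumpApp on WidgetTester {
  Future<void> pumpApp(Widget widget, {ThemeData? theme}) {
    return pumpWidget(
      MaterialApp(
        theme: theme,
        home: Scaffold(body: widget),
      ),
    );
  }
}

// Usage, shown inline for brevity.
void main() {
  testWidgets('renders the order title', (tester) async {
    await tester.pumpApp(const Text('Order #1'));
    expect(find.text('Order #1'), findsOneWidget);
  });
}
```

Keeping the shell in one place means a change to app-wide setup (theme, localization, providers) is made once rather than in every widget test.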

Anti-Patterns

Testing Implementation Details

Tests that assert on the exact widget tree structure break every time you refactor the UI, even when behavior is unchanged.

// Anti-pattern: testing widget tree structure
expect(find.byType(Padding), findsNWidgets(3));  // breaks if you add padding
expect(find.byType(Column), findsOneWidget);       // breaks if you switch to Row

// Better: test observable behavior
expect(find.text('Order #1'), findsOneWidget);
expect(find.byKey(const Key('submit')), findsOneWidget);
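
To illustrate why behavior-level assertions survive refactoring, the sketch below runs the same two assertions against a widget rendered once as a Column and once as a Row. OrderSummary and its horizontal flag are hypothetical, standing in for a before-and-after layout change.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

// Hypothetical widget; the flag stands in for a layout refactor.
class OrderSummary extends StatelessWidget {
  const OrderSummary({super.key, required this.horizontal});

  final bool horizontal;

  @override
  Widget build(BuildContext context) {
    final children = <Widget>[
      const Text('Order #1'),
      ElevatedButton(
        key: const Key('submit'),
        onPressed: () {},
        child: const Text('Submit'),
      ),
    ];
    return horizontal ? Row(children: children) : Column(children: children);
  }
}

void main() {
  for (final horizontal in [false, true]) {
    testWidgets('shows order and submit button (horizontal: $horizontal)',
        (tester) async {
      await tester.pumpWidget(
        MaterialApp(
          home: Scaffold(body: OrderSummary(horizontal: horizontal)),
        ),
      );

      // Assertions mention only what the user can see or interact with.
      expect(find.text('Order #1'), findsOneWidget);
      expect(find.byKey(const Key('submit')), findsOneWidget);
    });
  }
}
```

An assertion such as find.byType(Column) would hold for only one of the two layouts, even though the user-visible behavior is identical in both.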

Over-Mocking

Mocking every dependency creates tests that verify your mock wiring, not your actual code.

// Anti-pattern: mocking what you own for no reason
when(() => mockFormatter.format(42.0)).thenReturn('\$42.00');
expect(service.getDisplayPrice(42.0), equals('\$42.00'));
// This test proves nothing — it tests that the mock returns what you told it to.

// Better: use the real formatter
final service = PriceService(formatter: CurrencyFormatter());
expect(service.getDisplayPrice(42.0), equals('\$42.00'));
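
A runnable version of the "better" variant might look like the sketch below. The bodies of CurrencyFormatter and PriceService are assumptions made up for illustration; only their names and the getDisplayPrice call come from the snippet above. Because the real formatter runs, a formatting bug would actually fail the test.

```dart
import 'package:test/test.dart';

// Assumed implementation, for illustration only.
class CurrencyFormatter {
  String format(double amount) => '\$${amount.toStringAsFixed(2)}';
}

// Assumed implementation: delegates formatting to the injected formatter.
class PriceService {
  PriceService({required this.formatter});

  final CurrencyFormatter formatter;

  String getDisplayPrice(double amount) => formatter.format(amount);
}

void main() {
  late PriceService service;

  setUp(() {
    // Real collaborator: cheap, deterministic, and owned by this codebase.
    service = PriceService(formatter: CurrencyFormatter());
  });

  test('formats whole amounts with two decimals', () {
    expect(service.getDisplayPrice(42.0), equals('\$42.00'));
  });

  test('pads fractional amounts to two decimals', () {
    expect(service.getDisplayPrice(0.5), equals('\$0.50'));
  });
}
```

Mocks remain useful at boundaries that are slow, non-deterministic, or outside your control, such as HTTP clients or platform channels.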

Flaky Tests from Timing

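pumpAndSettle keeps pumping frames until no more frames are scheduled. A widget with an infinite animation (loading spinner, shimmer) always schedules another frame, so pumpAndSettle never settles and throws once its timeout elapses. The test then fails whenever the animation happens to be on screen at that point, which makes the failure look intermittent.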

// Flaky: pumpAndSettle with infinite animation
await tester.tap(find.text('Load'));
await tester.pumpAndSettle(); // times out if loading spinner is showing

// Stable: pump a known duration
await tester.tap(find.text('Load'));
await tester.pump(const Duration(milliseconds: 500));
expect(find.byType(CircularProgressIndicator), findsOneWidget);
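
A fuller sketch of the stable approach follows, assuming a hypothetical Loader widget that shows a spinner for one second and then a result. The widget test clock is fake, so pump with a duration advances time deterministically and fires the pending timer without any real waiting.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

// Hypothetical widget: spinner for one second, then a result.
class Loader extends StatefulWidget {
  const Loader({super.key});

  @override
  State<Loader> createState() => _LoaderState();
}

class _LoaderState extends State<Loader> {
  bool _loading = false;
  bool _done = false;

  Future<void> _load() async {
    setState(() {
      _loading = true;
    });
    await Future<void>.delayed(const Duration(seconds: 1));
    if (!mounted) return;
    setState(() {
      _loading = false;
      _done = true;
    });
  }

  @override
  Widget build(BuildContext context) {
    if (_loading) return const CircularProgressIndicator();
    if (_done) return const Text('Done');
    return TextButton(onPressed: _load, child: const Text('Load'));
  }
}

void main() {
  testWidgets('shows spinner, then result', (tester) async {
    await tester.pumpWidget(
      const MaterialApp(home: Scaffold(body: Center(child: Loader()))),
    );

    await tester.tap(find.text('Load'));
    await tester.pump(); // build the frame that shows the spinner
    expect(find.byType(CircularProgressIndicator), findsOneWidget);

    // Advance the fake clock past the delay instead of waiting to "settle".
    await tester.pump(const Duration(seconds: 1));
    expect(find.text('Done'), findsOneWidget);
    expect(find.byType(CircularProgressIndicator), findsNothing);
  });
}
```

Asserting on both the intermediate and the final state documents the loading behavior explicitly, which pumpAndSettle would skip over even in cases where it does settle.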

Golden Test Churn

Golden tests that cover entire screens break on every minor UI change — new spacing, updated copy, different icon. This trains the team to blindly update goldens, which defeats the purpose.

Limit golden tests to isolated, stable components (design system atoms, brand-critical visuals). Do not golden-test full pages unless they are truly static.

Integration Tests That Test Everything

A single integration test that walks through the entire app is a "mega-test." It is slow, brittle, and when it fails you have no idea what broke.

// Anti-pattern: mega-test
testWidgets('the entire app works', (tester) async {
  app.main();
  // 200 lines of taps, scrolls, and assertions...
});

// Better: focused integration tests
testWidgets('login flow succeeds with valid credentials', ...);
testWidgets('checkout flow charges correct total', ...);
testWidgets('search returns and displays results', ...);

Each integration test should cover one user journey. If it takes more than 30 seconds to run, it is probably testing too much.

The test suite is a product. It needs maintenance, refactoring, and pruning just like production code. A flaky test that the team ignores is worse than no test — it trains everyone to dismiss test failures. Delete or fix flaky tests immediately; never leave them red.
