IoT Testing: What Connected Device Teams Get Wrong (And How to Fix It)
The short answer: Most IoT product teams test their mobile app and call it done. The device firmware, communication protocol behaviour, OTA update resilience, cloud failover handling, device security, power and voltage resilience, and performance at scale — the layers that cause field failures — go undertested because the tooling is less familiar and the failure modes are less visible. Having validated IoT product launches for Xiaomi and Aqara across their UK smart home ranges, we have seen the full spectrum of what can go wrong. Here's how to approach IoT testing properly.
The Seven Layers of IoT Testing
Effective IoT testing requires coverage across seven distinct layers, each with different failure modes and different tooling requirements:
1. Device & Firmware Layer
This is where most IoT failures originate and where most teams have the least visibility. Firmware running on constrained microcontrollers — ARM Cortex-M, ESP32, Nordic nRF series — operates under memory constraints that create failure modes that never surface in desktop software: stack overflows under edge-case input, heap fragmentation after days of uptime, watchdog timer resets under unexpected load.
Key tests at this layer:
- Memory leak detection over extended runtimes — devices that work fine for the first 24 hours and crash after 72 are a common field failure pattern
- Watchdog timer behaviour — does the device recover correctly from a firmware hang, or does it reboot into a broken state?
- Factory reset integrity — after a factory reset, is the device genuinely clean of prior configuration?
- Multi-device simultaneous operation — in a smart home scenario, 20+ devices may be operated concurrently; does the hub maintain stability under this load?
When we tested Xiaomi's smart plug range for UK launch, firmware-level testing under sustained load revealed a heap fragmentation issue that manifested only after 96+ hours of continuous polling from the Mi Home app — a scenario that would have been invisible to any test run of less than four days.
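To make the memory-leak check above concrete, here is a minimal soak-test sketch: it polls the device's free-heap figure for 96 hours and fails the run if free heap trends steadily downward. The `/debug/heap` endpoint and its JSON shape are assumptions for illustration; substitute whatever diagnostic telemetry your firmware actually exposes.

```python
"""Multi-day soak test: poll free heap and flag a sustained downward trend.

The /debug/heap endpoint and its JSON shape are assumptions for illustration;
substitute whatever diagnostic telemetry the firmware actually exposes.
"""
import time
import requests

DEVICE = "http://192.168.1.50"        # lab address of the device under test
POLL_INTERVAL_S = 300                 # sample every 5 minutes
DURATION_S = 96 * 3600                # 96-hour soak, matching the field-failure window above

samples = []
deadline = time.time() + DURATION_S

while time.time() < deadline:
    resp = requests.get(f"{DEVICE}/debug/heap", timeout=10)   # hypothetical endpoint
    resp.raise_for_status()
    free_heap = int(resp.json()["free_bytes"])
    samples.append(free_heap)
    print(f"{time.strftime('%Y-%m-%d %H:%M:%S')}  free heap: {free_heap} bytes")
    time.sleep(POLL_INTERVAL_S)

# Crude trend check: compare the average of the first and last 10% of samples.
n = max(len(samples) // 10, 1)
early = sum(samples[:n]) / n
late = sum(samples[-n:]) / n
if late < early * 0.9:                # more than 10% of free heap lost over the run
    raise SystemExit(f"FAIL: free heap fell from ~{early:.0f} to ~{late:.0f} bytes")
print("PASS: no sustained heap decline detected")
```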
2. Communication Protocol Layer
Zigbee, Z-Wave, Thread, Matter, BLE, and Wi-Fi all have implementation quirks that sit at the edge of their respective specifications. Devices that pass certification testing can still fail in real-world deployments where multiple vendors, multiple protocols, and RF interference interact.
Key tests at this layer:
- Mesh network resilience — Zigbee and Thread networks self-heal when nodes drop; test whether that healing behaves correctly under realistic topology changes
- Pairing under RF congestion — dense 2.4GHz environments (common in UK flats and offices) create pairing failures that look like firmware bugs but are actually protocol-layer issues
- Message delivery under latency — what happens when a cloud command takes 3 seconds instead of 300 milliseconds? Does the device queue, discard, or duplicate the command?
- Protocol version compatibility — particularly relevant for Matter, where controller implementations across Apple Home, Google Home, and Amazon Alexa interpret edge cases differently
For Aqara's UK launch, protocol testing revealed that their Zigbee sensors exhibited message duplication under specific channel congestion conditions — a failure mode that manifested as phantom state changes in the app, but was caused by a retransmission timing issue at the protocol layer.
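Duplicate-delivery faults like this are cheap to catch with a passive listener on whatever transport the hub exposes in the lab. The sketch below assumes the hub republishes device state changes over MQTT; the broker address, topic filter, and two-second deduplication window are illustrative rather than any vendor's actual interface.

```python
"""Flag duplicate device state messages arriving within a short window.

Assumes the hub republishes device state over MQTT in the test lab; the broker
address, topic filter, and 2-second window are illustrative, not vendor APIs.
"""
import time
import paho.mqtt.client as mqtt

BROKER = "192.168.1.10"
TOPIC = "lab/devices/+/state"
WINDOW_S = 2.0

last_seen = {}   # (topic, payload) -> time of last arrival

def on_connect(client, userdata, flags, reason_code, properties):
    client.subscribe(TOPIC)

def on_message(client, userdata, msg):
    key = (msg.topic, msg.payload)
    now = time.monotonic()
    prev = last_seen.get(key)
    if prev is not None and now - prev < WINDOW_S:
        print(f"DUPLICATE on {msg.topic} within {now - prev:.2f}s: {msg.payload!r}")
    last_seen[key] = now

# paho-mqtt >= 2.0 callback API; adjust if pinned to an older client version.
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.on_connect = on_connect
client.on_message = on_message
client.connect(BROKER)
client.loop_forever()   # leave running while driving congestion/retransmission scenarios
```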
3. OTA Update Layer
Over-the-air firmware updates are one of the highest-risk operations in an IoT product's lifecycle. A failed OTA update can brick devices in the field at scale — and bricked consumer devices generate support costs, returns, and reputational damage that far exceed the cost of thorough OTA testing.
Key tests at this layer:
- Interrupted update recovery — cut power, interrupt connectivity, or deplete battery mid-update; does the device recover cleanly? (See the sketch after this list.)
- Rollback behaviour — if an update fails validation, does the device revert to the prior firmware version correctly?
- Update under load — can users continue to operate the device during an update, or does the update process cause functional interruption?
- Staged rollout validation — if you're rolling out updates in batches, test that the staged delivery mechanism behaves correctly and that v1 and v2 firmware devices coexist on the same network
- Post-update regression — a comprehensive functional regression test immediately after each OTA update is non-negotiable; we have seen OTA updates that shipped cleanly but silently broke scheduling functionality
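The interrupted-update scenario above lends itself to scripting: start an update, cut power at a random point in the transfer, then confirm the device boots into either the old or the new firmware and still responds. In the sketch below, `start_ota`, `set_lab_power`, and `get_firmware_version` are hypothetical placeholders for your device API and lab power-switch control.

```python
"""Interrupted-OTA recovery check (sketch).

The three helpers below are placeholders: wire them to the product's real OTA
trigger, your lab power switch, and the device's version report. Names are hypothetical.
"""
import random
import time

OLD_FW = "1.4.2"        # illustrative version strings
NEW_FW = "1.5.0"

def start_ota(device_addr: str, version: str) -> None:
    raise NotImplementedError("trigger the OTA transfer via your device/cloud API")

def set_lab_power(on: bool) -> None:
    raise NotImplementedError("drive the lab power switch feeding the device")

def get_firmware_version(device_addr: str) -> str:
    raise NotImplementedError("query the firmware version the device reports after boot")

def run_interrupted_ota_case(device_addr: str) -> str:
    start_ota(device_addr, NEW_FW)
    time.sleep(random.uniform(5, 60))     # interrupt at a random point in the transfer
    set_lab_power(on=False)               # hard power cut mid-update
    time.sleep(10)
    set_lab_power(on=True)
    time.sleep(120)                       # allow boot, then resume or rollback

    version = get_firmware_version(device_addr)
    # Acceptable outcomes: completed update or clean rollback. Anything else
    # (unknown version, boot loop, no response) is a field-brick risk.
    assert version in (OLD_FW, NEW_FW), f"unexpected firmware after interruption: {version!r}"
    return version

if __name__ == "__main__":
    # Interruption timing matters, so run the case many times rather than once.
    for i in range(20):
        print(i, run_interrupted_ota_case("192.168.1.50"))
```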
4. Cloud & Integration Layer
IoT products are distributed systems. The device, the cloud platform, the mobile app, and any third-party integrations (Alexa, Google Home, HomeKit, SmartThings) are all potential failure points. Testing each in isolation is insufficient — the interesting failures occur at the boundaries.
Key tests at this layer:
- Cloud failover and offline mode — when the cloud platform is unreachable, does the device operate in local mode? Does it resync correctly on reconnection? (See the sketch after this list.)
- API rate limiting behaviour — aggressive polling from companion apps or automation platforms can trigger rate limits; how does the device behave when rate-limited?
- Third-party platform certification — Apple HomeKit and Matter each have specific compliance requirements that go beyond functional compatibility; devices that work with Alexa may still fail HomeKit certification
- Webhook and event delivery reliability — automation platforms depend on reliable event delivery; test event loss rates, duplicate delivery, and delivery ordering under load
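As one way to exercise the failover case above in a lab, the sketch below blocks traffic to the vendor cloud at the lab gateway, checks that local control still works during the outage, and then checks that state resyncs once the block is lifted. The cloud IP and the two helper functions are placeholders; the iptables rule assumes the script runs on the gateway that routes the device's traffic.

```python
"""Cloud-failover check (sketch): block the vendor cloud at the lab gateway,
confirm local control still works, then confirm state resyncs on reconnection.

Assumes the script runs on the lab gateway that routes the device's traffic;
the cloud IP and the two helper functions are placeholders.
"""
import subprocess
import time

CLOUD_HOST_IP = "203.0.113.10"     # resolved vendor-cloud address (documentation range)
DEVICE_ADDR = "192.168.1.50"
DEVICE_ID = "dev-000111"

def set_cloud_block(enabled: bool) -> None:
    action = "-I" if enabled else "-D"
    subprocess.run(["iptables", action, "FORWARD", "-d", CLOUD_HOST_IP, "-j", "DROP"], check=True)

def send_local_command(device_addr: str) -> bool:
    raise NotImplementedError("issue a LAN command (e.g. a toggle) and report whether it executed")

def get_cloud_state(device_id: str) -> str:
    raise NotImplementedError("read the device's state back from the vendor cloud API")

set_cloud_block(True)
try:
    time.sleep(30)                                   # let the device notice the outage
    assert send_local_command(DEVICE_ADDR), "local control failed during cloud outage"
finally:
    set_cloud_block(False)                           # always restore connectivity

time.sleep(60)                                       # allow reconnection and resync
assert get_cloud_state(DEVICE_ID) == "on", "cloud state did not resync after the outage"
```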
UK PSTI Compliance: What IoT Teams Need to Know
The UK Product Security and Telecommunications Infrastructure (PSTI) Act came into force in April 2024, placing legal obligations on manufacturers and importers of connectable consumer products. Non-compliance is not a paperwork issue — it is a potential market withdrawal issue.
The PSTI Act requires three things:
1. No default passwords. Every device must ship with a unique password per unit, or must require the user to set a password on first use. Universal default passwords ("admin/admin", "1234") are prohibited.
2. A published vulnerability disclosure policy. Manufacturers must provide a published point of contact for security researchers to report vulnerabilities, and must specify the timelines within which they will respond.
3. A declared minimum security support period. The packaging and documentation must state how long the product will receive security updates.
Testing PSTI compliance involves validating all three requirements technically — confirming that default passwords genuinely cannot be used to access devices, that the disclosure process is reachable and functional, and that the declared support period commitment is documented in the correct locations.
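The first requirement is directly testable. A minimal sketch, assuming the device exposes an HTTP interface protected by basic auth on the LAN: try a list of well-known default credentials and fail if any of them is accepted. Swap the transport for Telnet, SSH, or MQTT as appropriate to the product.

```python
"""PSTI default-credential probe (sketch).

Assumes the device exposes an HTTP interface protected by basic auth on the
LAN; swap the transport for Telnet, SSH, or MQTT as the product requires.
"""
import requests

DEVICE = "http://192.168.1.50"          # lab address of the device under test
COMMON_DEFAULTS = [
    ("admin", "admin"),
    ("admin", "1234"),
    ("admin", "password"),
    ("root", "root"),
    ("user", "user"),
]

accepted = []
for username, password in COMMON_DEFAULTS:
    resp = requests.get(f"{DEVICE}/", auth=(username, password), timeout=10)
    if resp.status_code == 200:          # any success here is a PSTI failure
        accepted.append((username, password))

if accepted:
    raise SystemExit(f"FAIL (PSTI requirement 1): default credentials accepted: {accepted}")
print("PASS: no well-known default credentials accepted")
```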
For teams selling into both UK and EU markets, ETSI EN 303 645 (the European baseline cybersecurity standard for consumer IoT) covers a broader set of 13 provisions and aligns closely with what the EU Cyber Resilience Act will mandate from 2027. Building your compliance programme to the ETSI standard now provides a path to both markets.
5. Security Layer
IoT security testing is distinct from web application security testing — the attack surface is different and the tooling is different. Beyond PSTI baseline requirements, there are several security failure modes that are endemic to consumer IoT and that standard penetration testing engagements are not structured to find.
Unauthenticated local LAN APIs. Many smart home devices — including a number of Xiaomi and Aqara products in their earlier firmware revisions — expose HTTP or MQTT interfaces on the local network that require no authentication. An attacker on the same Wi-Fi network can control the device directly, bypassing the cloud authentication layer entirely. Testing for this requires network scanning and API probing on the local subnet, not just traffic analysis.
Weak or missing TLS between device and cloud. Some devices implement TLS but with weak cipher suites, expired certificates, or no certificate pinning — meaning the cloud communication can be intercepted via man-in-the-middle even though TLS is nominally present. We test TLS configurations using Wireshark and mitmproxy, validating cipher strength, certificate validity, and whether the device rejects certificates from untrusted authorities.
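A quick host-side check of the server end of that link can be done with Python's standard ssl module: negotiate a connection to the cloud endpoint the device talks to, then inspect the protocol version, cipher, and certificate expiry. The hostname below is illustrative, and this complements rather than replaces interception testing of the device itself with mitmproxy.

```python
"""Quick TLS posture check against the cloud endpoint a device talks to (sketch).

The hostname is illustrative. This inspects the server side of the link only;
whether the device itself validates certificates still needs an interception test.
"""
import socket
import ssl
import time

HOST = "iot.example-vendor-cloud.com"   # hypothetical cloud endpoint
PORT = 443

context = ssl.create_default_context()
with socket.create_connection((HOST, PORT), timeout=10) as sock:
    with context.wrap_socket(sock, server_hostname=HOST) as tls:
        version = tls.version()          # negotiated protocol
        cipher = tls.cipher()            # (cipher name, protocol, secret bits)
        cert = tls.getpeercert()

days_left = (ssl.cert_time_to_seconds(cert["notAfter"]) - time.time()) / 86400
print(f"protocol: {version}  cipher: {cipher[0]}  cert expires in {days_left:.0f} days")
assert version in ("TLSv1.2", "TLSv1.3"), "legacy TLS protocol negotiated"
assert days_left > 30, "server certificate close to expiry or already expired"
```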
CVE exposure in embedded components. Devices running embedded Linux (common in hubs and cameras) or RTOS variants carry third-party components with their own CVE histories — OpenSSL, BusyBox, libcurl, uClibc. We scan firmware images for component versions and cross-reference against current CVE databases. Unpatched critical CVEs in shipping firmware are a regulatory risk under PSTI and a reputational risk if discovered by security researchers post-launch.
Authentication bypass on companion app APIs. The cloud APIs that back companion apps frequently have BOLA (Broken Object Level Authorisation) vulnerabilities — endpoints where changing a device ID in the request allows one user to control another user's device. We test these using the OWASP API Security Top 10 methodology adapted for IoT API structures.
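A minimal BOLA probe looks like the sketch below: authenticate as one test account, issue a command against a device registered to a second test account, and expect a 403 or 404. The host, endpoint path, and bearer-token scheme are illustrative placeholders for the product's real API.

```python
"""BOLA probe against a companion-app cloud API (sketch).

Uses two lab accounts, each owning one device. The host, path, and bearer-token
scheme are illustrative placeholders for the product's real API.
"""
import requests

API = "https://api.example-vendor-cloud.com"     # hypothetical cloud API
USER_A_TOKEN = "REPLACE_WITH_TEST_ACCOUNT_A_TOKEN"
DEVICE_OWNED_BY_B = "dev-000222"                 # device registered to test account B

resp = requests.post(
    f"{API}/v1/devices/{DEVICE_OWNED_BY_B}/commands",
    headers={"Authorization": f"Bearer {USER_A_TOKEN}"},
    json={"command": "power_off"},
    timeout=10,
)

# Any 2xx response means account A just controlled account B's device: a BOLA finding.
assert resp.status_code != 401, "test setup problem: account A token was rejected outright"
assert resp.status_code in (403, 404), (
    f"BOLA finding: cross-account command returned HTTP {resp.status_code}"
)
print("PASS: cross-account device command correctly rejected")
```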
6. Power & Electrical Reliability Layer
Power-related failures are one of the most underappreciated categories in IoT testing. They are disproportionately represented in field returns and 1-star reviews, and they are almost never caught by software-only testing approaches.
Brownout and voltage edge testing. UK mains supply under BS EN 50160 can legitimately range from 207V to 253V. Smart plugs, lighting controllers, and other mains-powered devices must operate correctly across this entire range. We test using a programmable AC power source, stepping voltage through the tolerance range and deliberately inducing brownout conditions (sustained voltage drop to 85–90% of rated supply). Devices that reset, enter undefined states, or lose their configuration during brownout events represent a quality failure even if they pass at nominal voltage.
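A hedged sketch of how such a sweep can be scripted: drive the programmable AC source over SCPI (here via PyVISA), step through the tolerance band plus a deliberate brownout level, and confirm the device passes a health check at each step. The SCPI syntax and the health check are placeholders, since both vary by instrument and product.

```python
"""Voltage-tolerance and brownout sweep via a programmable AC source (sketch).

SCPI commands differ between instruments and the health check is product-
specific, so both are placeholders. Requires PyVISA plus a VISA backend.
"""
import time
import pyvisa

VOLTAGE_STEPS = [207, 216, 230, 244, 253]      # BS EN 50160 band (230 V +/-10%)
BROWNOUT_V = 196                               # ~85% of nominal, sustained dip
HOLD_S = 300                                   # dwell at each level for 5 minutes

def device_healthy() -> bool:
    raise NotImplementedError("poll the device: does it respond and retain its configuration?")

rm = pyvisa.ResourceManager()
source = rm.open_resource("TCPIP0::192.168.1.20::INSTR")   # lab AC source address (illustrative)

for volts in VOLTAGE_STEPS + [BROWNOUT_V, 230]:            # finish back at nominal
    source.write(f"VOLT {volts}")                          # SCPI syntax varies by instrument
    time.sleep(HOLD_S)
    assert device_healthy(), f"device failed health check at {volts} V"
    print(f"OK at {volts} V")
```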
Fast transient and surge response. Mains supply in residential environments contains fast transients — voltage spikes caused by other appliances switching on and off. We test device resilience to fast transients within the limits of IEC 61000-4-4, confirming that devices do not crash, corrupt their configuration, or exhibit erratic behaviour in response to normal mains-borne interference.
Battery-powered sensor profiling. For door sensors, motion detectors, temperature sensors, and other battery-powered devices, we profile current draw across all operating modes: active, idle, and deep sleep. We test functional behaviour at low battery thresholds — typically 10–15% charge — confirming that the device reports its low battery status correctly and degrades gracefully rather than failing silently. We also validate that reported battery percentage in the companion app correlates accurately with actual remaining capacity across the discharge curve.
Wake-from-sleep reliability. Battery-powered sensors typically spend most of their time in deep sleep, waking on a schedule or in response to a trigger event. Wake-from-sleep failures — where the device fails to wake, wakes but does not transmit, or wakes and reports stale data — are a common source of field complaints that are difficult to reproduce without extended hardware-in-the-loop testing.
When we profiled the battery drain characteristics of Aqara door sensors during UK launch testing, we identified a sleep mode regression in a late firmware build that increased current draw in idle mode by approximately 40% — reducing expected battery life from 2 years to under 14 months. Caught in testing; would have been a significant warranty and support cost in the field.
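The arithmetic behind battery-life projections is duty-cycle weighting: multiply the measured current in each mode by the fraction of time spent in that mode, then divide usable capacity by the weighted average. The figures in the sketch below are illustrative examples, not measurements from any particular product.

```python
"""Battery-life estimate from a measured current profile (illustrative figures).

Duty-cycle-weighted average draw, with capacity derated for self-discharge and
usable-voltage cutoff. The numbers are examples, not measurements of any product.
"""
BATTERY_MAH = 220 * 0.8                 # CR2032-class cell, derated to usable capacity

# (mode, measured average current in mA, fraction of time spent in that mode)
profile = [
    ("deep sleep", 0.003, 0.998),
    ("awake idle", 1.0,   0.0015),
    ("radio TX",   12.0,  0.0005),
]

avg_ma = sum(current * share for _, current, share in profile)
hours = BATTERY_MAH / avg_ma
print(f"weighted average draw: {avg_ma:.4f} mA")
print(f"estimated life: {hours / 24:.0f} days (~{hours / (24 * 365):.1f} years)")

# Re-running with regressed per-mode figures (e.g. a higher idle or sleep current
# from a late firmware build) quantifies the life impact before the device ships.
```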
7. Performance Layer
IoT performance testing is not about API response times in isolation. It is about the full command-to-action latency — the time from a user tapping a button in the app to the moment the device physically responds — and how that latency degrades as device count, network load, and hub processing demand increase.
Multi-device rig testing. We build test rigs scaled to represent realistic deployment scenarios — typically 20, 50, and 100 concurrent devices. We measure latency across the full stack: app → cloud → hub → device. For a smart home hub managing 50 Zigbee devices, we measure whether command-to-action latency remains under 500 milliseconds for direct device commands and under 1 second for scenes and automations — thresholds that correspond to perceived responsiveness for end users.
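Whatever the transport, the measurement itself reduces to timestamping the command, timestamping the observed response, and aggregating percentiles across devices and repetitions. In the sketch below, `send_command` and `wait_for_state_change` are hypothetical helpers: the first should follow the same path a user's app tap takes, the second should observe the physical or reported state change (hub event stream, power meter, or GPIO probe).

```python
"""Command-to-action latency benchmark across a multi-device rig (sketch).

`send_command` and `wait_for_state_change` are hypothetical placeholders: the
first should go through the same app/cloud path a user's tap takes, the second
should observe the response (hub event stream, power meter, or GPIO probe).
"""
import statistics
import time

DEVICES = [f"dev-{i:03d}" for i in range(50)]   # 50-device rig
REPETITIONS = 20
TARGET_P95_S = 0.5                              # direct-command threshold discussed above

def send_command(device_id: str) -> None:
    raise NotImplementedError("issue a toggle via the app/cloud path")

def wait_for_state_change(device_id: str, timeout: float = 5.0) -> float:
    raise NotImplementedError("block until the state change is observed; return time.monotonic() at arrival")

latencies = []
for _ in range(REPETITIONS):
    for dev in DEVICES:
        sent = time.monotonic()
        send_command(dev)
        latencies.append(wait_for_state_change(dev) - sent)

p95 = statistics.quantiles(latencies, n=100)[94]            # 95th percentile
print(f"n={len(latencies)}  median={statistics.median(latencies) * 1000:.0f} ms  p95={p95 * 1000:.0f} ms")
assert p95 <= TARGET_P95_S, f"p95 command-to-action latency {p95 * 1000:.0f} ms exceeds target"
```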
Hub CPU and memory exhaustion. Smart home hubs have fixed CPU and RAM budgets. As device count increases, hub resource utilisation increases. We test at maximum supported device counts and beyond, measuring at what point the hub begins to exhibit latency degradation, missed commands, or instability. The maximum supported device count in the product documentation should be a tested performance boundary, not a theoretical limit.
Mesh network throughput degradation. In Zigbee and Thread mesh networks, devices can act as routers as well as end nodes, and network traffic is routed dynamically through the mesh. As node count increases and mesh topology changes — nodes added, removed, or moved — throughput and latency characteristics change. We measure these characteristics systematically, confirming that the network performs within specification at maximum topology complexity.
Cloud round-trip latency under peak load. Consumer IoT cloud platforms experience peak traffic patterns — evenings and weekends when smart home usage is highest. We test cloud API latency under simulated peak load conditions, validating that the P95 round-trip time from device to cloud and back remains within acceptable bounds and that the cloud platform does not rate-limit or queue commands in ways that degrade user experience.
Common Mistakes We See
Testing only the happy path. IoT testing that only validates "device pairs correctly and responds to commands" is insufficient. The failure modes that matter — degraded connectivity, concurrent operations, long runtimes, edge-case inputs — require deliberate adversarial testing.
Ignoring protocol-layer behaviour. Teams with software backgrounds treat the device as a black box and test via the app and cloud API. This misses the protocol-layer failures that cause ghost commands, duplicate state changes, and mesh degradation.
Skipping OTA adversarial testing. Every product team intends to test OTA updates. Many test the happy path only — update works under ideal conditions. The interrupted-update and rollback scenarios are the ones that cause field disasters, and they require deliberate simulation.
Treating compliance as documentation only. PSTI and ETSI compliance are testable technical requirements. Running actual tests against the security provisions — attempting to use default credentials, probing the password enforcement implementation — is the only reliable way to confirm compliance before you ship.
Not testing at realistic device counts. Single-device testing is the default because it is the easiest. But hub performance degradation, mesh latency, and cloud rate limiting only manifest at realistic deployment scale. Testing with 50 devices is qualitatively different from testing with 5.
Skipping power edge testing. Smart plug and lighting controller teams routinely skip voltage tolerance and brownout testing because it requires hardware test equipment. The result is field returns from customers in areas with lower-quality mains supply — a predictable and preventable failure mode.
Key Takeaways
- IoT testing requires seven layers: device/firmware, communication protocol, OTA update, cloud/integration, security, power/electrical, and performance
- Firmware-layer issues (memory leaks, watchdog failures) only surface after extended runtimes — test for 72–96 hours minimum
- Local LAN API security testing is distinct from cloud security testing and requires network-level tooling
- Power edge testing (brownout, voltage tolerance, battery profiling) catches a class of field failures that software testing cannot find
- Command-to-action latency must be benchmarked at realistic device counts (50+), not in single-device lab conditions
- UK PSTI Act compliance (April 2024) is a legal requirement; ETSI EN 303 645 provides the path to both UK and EU market compliance