Debugging embedded systems is a daunting task, but Percepio’s Tracealyzer and Detect are changing the game with real-time observability and faster fault resolution. Speaking to Johan Kraft and Andreas Lifvendahl, EFY’s Nidhi Agarwal explores how these tools are transforming development workflows in India and worldwide.

Q. What led you to found Percepio, and how does your tracing approach differ from traditional methods?
A. Percepio originated from my PhD work on embedded software timing analysis, which highlighted the lack of practical runtime visibility in complex embedded systems. Traditional tracing relied on hardware-based, instruction-level tracing using physical probes. Percepio introduced a software-based approach in which runtime events are instrumented directly into embedded software, recorded in memory, and analysed using ‘Tracealyzer’. This enables detailed runtime insights without dedicated trace hardware.
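To make the software-based approach concrete, the sketch below shows the general pattern: instrumentation calls append small, fixed-size event records to a RAM ring buffer that a host tool later reads out. The names and record layout are illustrative assumptions, not Percepio’s actual TraceRecorder format.

```c
/* Minimal sketch of software-based tracing: instrumentation points
 * append fixed-size event records to a RAM ring buffer for later
 * readout by a host tool. Illustrative only; a real recorder would
 * also guard trace_record() with a critical section. */
#include <stdint.h>

typedef struct {
    uint32_t timestamp;  /* e.g. from a cycle counter or hardware timer */
    uint16_t event_id;   /* what happened: task switch, ISR entry, ...  */
    uint16_t param;      /* event-specific payload, e.g. a task handle  */
} trace_event_t;

#define TRACE_BUF_EVENTS 256
static trace_event_t trace_buf[TRACE_BUF_EVENTS];
static uint32_t trace_head;

/* Called from instrumentation points; wraps around, keeping the
 * most recent TRACE_BUF_EVENTS events for post-mortem analysis. */
void trace_record(uint16_t event_id, uint16_t param, uint32_t now)
{
    trace_event_t *e = &trace_buf[trace_head++ % TRACE_BUF_EVENTS];
    e->timestamp = now;
    e->event_id  = event_id;
    e->param     = param;
}
```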
Q. Can you briefly describe the product line?
A. Our core product, Tracealyzer, offers advanced trace visualisation and analysis for real-time operating systems and Linux-based embedded systems. It is used across industries like automotive, aerospace, and defence. We also have ‘Percepio Detect’, a server-based solution for systematic observability during testing. Whereas Tracealyzer is developer-focused, for debugging and performance analysis on individual systems, Detect is designed for teams and large-scale testing environments: it provides automated monitoring, anomaly detection, and crash diagnostics across many devices through a centralised dashboard, accelerating fault analysis during validation.
Q. Are your solutions entirely software-based, or do they require specialised hardware?
A. Our products are entirely software-based. We may use standard debug interfaces such as JTAG or Segger J-Link for data transport, but no proprietary hardware is required.
Q. How versatile are your products across industries, and how easily can they integrate into existing workflows or systems?
A. Tracealyzer is designed for flexibility. It is a desktop tool you can use on demand, with an intuitive interface and extensive documentation. It integrates easily into systems, with minimal overhead, through its TraceRecorder module, which can be enabled with a simple compile-time flag. For larger-scale solutions like Detect, we provide a Docker-based server and database that integrates seamlessly into continuous integration workflows. Although we do not offer bespoke integrations, everything is scriptable to enable easy pipeline integration.
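As a rough illustration of the compile-time integration described above, a recorder is typically pulled in behind a build flag, so production builds can compile it out entirely. The macro and function names below follow the TraceRecorder pattern but vary between versions, so treat them as assumptions and check the current documentation.

```c
/* Sketch of compile-time enabling of tracing; names are assumptions
 * based on the TraceRecorder pattern and may differ per version. */
#define PROJECT_TRACING_ENABLED 1   /* set to 0 to compile tracing out */

#if PROJECT_TRACING_ENABLED
#include "trcRecorder.h"            /* Percepio TraceRecorder header   */
#endif

int main(void)
{
#if PROJECT_TRACING_ENABLED
    xTraceEnable(TRC_START);        /* start recording before the RTOS
                                       scheduler is launched           */
#endif
    /* vTaskStartScheduler(); application code follows as usual */
    return 0;
}
```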
Q. Who are your primary customers, and which roles typically use your tools?
A. Traditionally, the primary users are embedded and firmware software engineers and team leaders. Our best ambassadors remain the engineers themselves, often bringing the tool to new companies. We are now expanding into broader enterprise adoption, with customers such as BMW using it extensively in their QA processes.
Q. Which industries are adopting your tools the fastest, and in what types of projects?
A. We see fast adoption in defence, consumer, and industrial electronics, especially in low-power radio frequency (RF) and Bluetooth devices, where timing and runtime performance directly impact end-user experience and brand reliability.
Q. What parts of embedded software are still the hardest to observe today?
A. In my experience, the hardest parts are timing-critical embedded systems, where adding instrumentation or software tracing introduces unacceptable overhead. It becomes even more challenging in heterogeneous systems built from third-party components, where we do not have full access to source code or internal behaviour. Modern architectures that combine centralised high-performance computing with legacy ECUs make root cause analysis difficult, as timing or latency issues in one node can manifest elsewhere. This complexity increases further with software-defined systems running AI workloads alongside traditional sensors and actuators. Overall, the difficulty is not tied to any one component; it is driven by system scale, integration depth, and limited observability across distributed components.
Q. How does observability change the way embedded engineers debug issues?
A. You see, observability reduces the need to reproduce failures. By proactively capturing trace and runtime data, engineers gain immediate insight when an issue occurs, whether during testing or in the field. This eliminates long reproduction cycles where crashes are vague and hard to diagnose. With observability, debugging can start directly from real system data, focusing on root cause rather than guesswork. As a result, the time to repair drops from weeks or months to, in many cases, the same day.
Q. How early in the development cycle should engineers start using tracing and observability?
A. We recommend starting as early as prototyping and technology evaluation. Tracing helps identify performance issues early, leading to up to a 90% reduction in debugging time and faster development cycles.
Q. What design choices were critical in making Tracealyzer work across many real-time operating system (RTOS) events?
A. One key decision we made was building an advanced internal system model within Tracealyzer that represents the platform behaviour. Tracealyzer has a native understanding of common RTOS concepts, including threads, semaphores, and message queues. When porting to a new RTOS like FreeRTOS or Zephyr, we only need to map RTOS-specific events to this generic model using XML. It allows us to support multiple RTOSes without compromising on analytical power.
Q. How do you balance deep runtime visibility with low overhead on highly constrained embedded devices?
A. This is something we have worked on for over a decade. We use a lightweight TraceRecorder that logs only the most valuable RTOS events, typically just eight to 16 bytes per event. Instead of tracing every function call, which would be too costly, we focus on core scheduling and timing events with high explanatory power. Event rates are typically a few thousand per second and add only a small, measurable central processing unit (CPU) load. There is no silver bullet, but with sound instrumentation choices, this approach is far more efficient than traditional print-based debugging.
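The figures quoted above translate into a simple back-of-envelope calculation; the snippet below works through it with assumed numbers (a 12-byte event, 5000 events per second, 200 CPU cycles per recorded event on a 100MHz microcontroller), which are illustrative rather than measured Percepio data.

```c
/* Back-of-envelope overhead estimate for event-based tracing. */
#include <stdio.h>

int main(void)
{
    const double events_per_s   = 5000.0;  /* typical event rate         */
    const double bytes_per_evt  = 12.0;    /* middle of the 8-16B range  */
    const double cycles_per_evt = 200.0;   /* assumed cost of one record */
    const double cpu_hz         = 100e6;   /* assumed 100MHz MCU clock   */

    printf("RAM bandwidth: %.1f KiB/s\n",
           events_per_s * bytes_per_evt / 1024.0);
    printf("CPU overhead : %.2f %%\n",
           100.0 * events_per_s * cycles_per_evt / cpu_hz);
    return 0;   /* prints ~58.6 KiB/s and 1.00% on these assumptions */
}
```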
Q. How do you optimise memory usage when storing core dumps on devices with limited random access memory (RAM)?
A. In the latest version of Detect, we have made core dumps selective, focusing on key data like the stack of the running thread and reducing dump size to just a few hundred bytes. A core dump captures processor registers and memory, but you do not need to include the entire RAM. Implementing this was somewhat tricky, especially around FreeRTOS-specific details, but the result is highly efficient. For example, our demo on GitHub shows core dumps of around 330 bytes, which can be easily uploaded and handled even on small microcontrollers.
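For a sense of how a selective core dump stays in the few-hundred-byte range, here is a sketch that keeps only the fault registers and a slice of the running thread’s stack. The layout and names are assumptions for illustration, not the format Detect actually uses; the struct happens to total roughly 330 bytes, in line with the demo figure mentioned above.

```c
/* Sketch of a selective core dump: capture only the register frame
 * and the top of the faulting thread's stack, not all of RAM. */
#include <stdint.h>
#include <string.h>

#define DUMP_STACK_BYTES 256         /* top-of-stack slice to keep */

typedef struct {
    uint32_t regs[17];               /* r0-r15 plus xPSR on a Cortex-M   */
    uint8_t  stack[DUMP_STACK_BYTES];/* hottest part of the thread stack */
    uint32_t stack_addr;             /* where the slice was copied from  */
} mini_core_dump_t;                  /* 328 bytes total, ~330 as above   */

/* Copy the saved register frame and the top of the stack; everything
 * else in RAM is deliberately left out to keep the dump small. */
void capture_dump(mini_core_dump_t *d,
                  const uint32_t *saved_regs, const uint8_t *sp)
{
    memcpy(d->regs, saved_regs, sizeof d->regs);
    memcpy(d->stack, sp, DUMP_STACK_BYTES);
    d->stack_addr = (uint32_t)(uintptr_t)sp;
}
```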
Q. How does your tool distinguish between a firmware freeze caused by CPU overload and one caused by an anomaly in an RTOS system?
A. We use our DFM library to monitor the system and capture fault exceptions automatically. Our task monitor, introduced in Percepio Detect 2025.2, checks per-thread CPU usage at configurable intervals and flags anomalies such as thread stops.
We also monitor latency anomalies by measuring the time between code points, using DFM API functions to distinguish overload-related freezes from RTOS anomalies.
Let me share a story. A customer’s device occasionally froze without explanation. With the task monitor, CPU usage deviations triggered automatic data capture before a power cycle, helping them identify the root cause while preserving critical information.
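A generic version of the per-thread CPU check described above might look like the sketch below: at a configurable interval, each thread’s share of CPU time is compared against user-defined bounds, and a deviation (such as a stopped thread at 0 per cent) raises an alert. This is an illustrative sketch, not the actual task monitor or DFM API.

```c
/* Generic per-thread CPU usage check with user-defined bounds. */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    const char *name;
    uint32_t    runtime_ticks;    /* accumulated CPU time since last check */
    uint32_t    min_pct, max_pct; /* user-configured healthy band, percent */
} task_stats_t;

/* Compare each task's CPU share over the last window against its
 * configured bounds and report any deviation as an alert. */
static void check_tasks(task_stats_t *tasks, int n, uint32_t window_ticks)
{
    for (int i = 0; i < n; i++) {
        uint32_t pct = 100u * tasks[i].runtime_ticks / window_ticks;
        if (pct < tasks[i].min_pct || pct > tasks[i].max_pct)
            printf("ALERT: %s at %u%% CPU (expected %u-%u%%)\n",
                   tasks[i].name, (unsigned)pct,
                   (unsigned)tasks[i].min_pct, (unsigned)tasks[i].max_pct);
        tasks[i].runtime_ticks = 0;  /* reset for the next window */
    }
}

int main(void)
{
    /* A stopped thread shows 0% usage and falls below its lower bound. */
    task_stats_t tasks[] = { { "sensor_task", 0, 5, 40 } };
    check_tasks(tasks, 1, 1000);
    return 0;
}
```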
Q. When your tools run on real-time systems, does detecting an intermittent anomaly affect system performance?
A. It can, depending on how much data you capture at runtime. If a device has crashed, a 10-100 millisecond delay in recording diagnostics is usually acceptable. For non-critical anomalies, we use clever methods to minimise impact, such as storing data in RAM buffers that survive restarts or logging through low-priority idle tasks to avoid disturbing higher-priority threads. With tracing, event filtering is essential to reduce overhead, and Detect saves data less frequently but efficiently. We also support fast interfaces such as IAR I-jet debug probes and Ethernet for alerts with latency under 10 milliseconds, while slower channels like UART are less suitable for real-time reporting.
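The “RAM buffers that survive restarts” technique mentioned above is typically implemented by placing the buffer in a linker section that the startup code does not zero, then validating it with a magic number after reboot. The sketch below shows the idea with GCC syntax; the section name and linker-script details are toolchain-specific assumptions.

```c
/* Sketch of a restart-surviving diagnostic buffer (GCC syntax). */
#include <stdint.h>

#define DIAG_MAGIC 0xD1A6D1A6u

typedef struct {
    uint32_t magic;      /* valid only if the buffer survived the reset */
    uint32_t len;
    uint8_t  data[512];  /* pending diagnostics to upload after reboot  */
} diag_buf_t;

/* Keep the buffer out of .bss/.data so the C runtime startup code does
 * not clear it; the linker script must map ".noinit" to ordinary RAM. */
static diag_buf_t diag __attribute__((section(".noinit")));

/* After reboot, check whether valid diagnostics are waiting. */
int diag_pending_after_boot(void)
{
    return diag.magic == DIAG_MAGIC && diag.len <= sizeof diag.data;
}
```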
Q. What runtime metrics best indicate the health, stability, and predictability of an embedded system?
A. We look at system health from both a detection and a diagnostics perspective. Hard fault exceptions, such as invalid memory access, are critical indicators and should be captured with a clear fault context. Beyond faults, we focus heavily on timing, scheduling, and resource usage to see how close the system operates to its limits. One often overlooked metric is runtime variability, especially in execution times and response latencies in multithreaded RTOS systems. A healthy system behaves predictably and consistently, with minimal variation between runs. Tracing helps us identify sources of nondeterminism, such as data-dependent algorithms, and adjust priorities or design to improve reliability.
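To show how the runtime-variability metric can be tracked in practice, here is a minimal sketch that records the minimum and maximum execution time per task and reports the spread as jitter; a healthy, predictable task keeps this value small. The names are illustrative, not part of any Percepio API.

```c
/* Minimal execution-time variability (jitter) tracker per task. */
#include <stdint.h>

typedef struct {
    uint32_t min_us, max_us, samples;
} exec_stats_t;

/* Feed in each measured execution time (e.g. from trace timestamps). */
void exec_stats_update(exec_stats_t *s, uint32_t elapsed_us)
{
    if (s->samples == 0 || elapsed_us < s->min_us) s->min_us = elapsed_us;
    if (elapsed_us > s->max_us) s->max_us = elapsed_us;
    s->samples++;
}

/* Jitter = max - min; a predictable task keeps this small run to run. */
uint32_t exec_stats_jitter(const exec_stats_t *s)
{
    return s->samples ? s->max_us - s->min_us : 0;
}
```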
Q. How can customers verify if your debugging or observability tools are compatible with their microcontroller and RTOS?
A. We provide a list of supported processor families on GitHub. If a microcontroller or RTOS is unsupported, customers can often add support with minimal code changes.
Q. Are there any limitations to the automatic anomaly detection in your tools?
A. Yes, there are some. Automatic detection primarily monitors resource usage and latency, so subtle issues such as occasional arithmetic errors may go unnoticed unless explicitly configured. It complements traditional testing, which works well for single-threaded code but often misses multi-threaded timing anomalies that our tools can reveal.
Q. How do you test and validate your tools, and how do you measure their effectiveness?
A. We rely on customer feedback, real-world usage, and expert input. Our products evolve through continuous iteration, and we use test-driven development and CI test automation to validate performance. We measure effectiveness by analysing customer and industry feedback.
Q. How does your tool handle detection sensitivity, false positives, and prioritisation of issues?
A. The tool does not produce false positives because it does not guess. It simply reports rule violations defined by the user, such as a thread exceeding a set CPU load. Some alerts may be benign, but they serve as learning opportunities, with users refining rules over time to improve reporting.
Q. Is AI or machine learning currently used in your tools, or are there plans to integrate it?
A. Not yet in the publicly released tools, but we are exploring prototypes for future releases. Trace data is well suited to AI because it captures patterns, which AI excels at recognising. Potential applications include using machine learning to analyse traces or leveraging large language models (LLMs) to explain data and assist with integration, acting as a virtual field application engineer (FAE). When used carefully, AI could significantly enhance insight and usability.
Q. Why do embedded teams often delay adopting observability tools, and what are the common barriers?
A. Teams often delay adoption due to confidence in their current practices or tight deadlines; many only recognise the value of observability in a crisis. Awareness, especially among younger engineers, is also a factor. Our free version, Percepio View, helps introduce observability early.
Q. Do you have an Indian presence or customers in India?
A. Yes, India is one of our fastest-growing markets. We work with an active local distributor, FTD Infocom, which supports our engagement in the region. Our customer base spans large telecom players as well as companies in the medical and life sciences sectors.
Q. What is your target customer base for this product in India?
A. We target industries like aerospace and medical devices, where regulatory requirements are high, and performance assurance early in development is critical.
Q. How do you see the maturity of embedded engineering teams in India, and are they early adopters of observability tools?
A. We see the full spectrum, but generally Indian teams show higher curiosity and openness compared to Europe, where legacy practices dominate. Many prospective and early customers in India are actively exploring tools like Detect. A notable trend is inbound interest driven by independent research: engineers are Googling, using ChatGPT, and reaching out with questions, demonstrating a proactive, exploratory mindset that we see less often in Europe.
Q. Are there any specific regulatory or market challenges you have faced in India?
A. Nothing India-specific. The large medical clients we now work with are typically multinational companies, some headquartered in India and some headquartered in the US with big development centres in India. Their end products, such as medical devices, need to comply with EU and FDA regulations and follow those standards, so the regulatory requirements are typically global for the products our customers work with. In terms of our own sales and marketing, we have not encountered any limitations operating in India, which we attribute to working with a very good, well-established distribution partner.
Q. Are you looking for partnerships in India with OEMs, system integrators, or silicon vendors?
A. Yes, we are actively exploring partnerships with system integrators and Indian companies serving global markets, to introduce best practices and expand adoption. We are happy to collaborate with more Indian OEMs and system integrators.
Q. Do you have any academic or startup-friendly programs in India?
A. Yes, we are exploring opportunities to engage with startups and the academic community. Although our company is small and travel has been limited, we plan to host a roadshow in 2026 with our distributor to meet with both customers and partners.