Distributed Request Tracing

Project Status: Define
Start: August 2, 2021
Estimated Completion: December 31, 2021

Problem Statement

The premise of this project is that by collaboratively combining logs, metrics, and distributed request traces across the services that support video playback, we can dramatically increase observability of the overall streaming video experience. Better observability should in turn drive higher quality of service, to the mutual benefit of content providers and their service providers.

Project Description

Content providers, who frequently rely on third-party software and services (players, CDNs, origin services), struggle to achieve the observability they need to meet their QoE goals. They are responsible for the entire customer experience, yet they can fully control or even observe only what they themselves instrument and what their service providers are able or willing to share. Service providers, for their part, struggle to deliver optimal experiences to their customers (the content providers) because of the same observability challenge. In short, telemetry is fragmented and siloed, making it virtually impossible to get the complete architectural or operational picture.
Data sharing and correlation between content providers and CDN service providers are emerging trends in the streaming video industry. Several CDNs offer logging data feeds with varying levels of sophistication, and content providers are starting to look at CDN and playback data together to better understand QoE. Conventions like the Common Media Client Data (CTA-5004) specification have established how player session information can be relayed to CDNs as metadata on HTTP requests and correlated with cache session logs.
This is a huge step forward, yet it is insufficient for achieving sustained, high levels of QoE. Shared, correlated logs and metrics are not enough to fully understand where things fail or why. Logs provide fine-grained, event-level, service-specific telemetry for deep, localized analysis. Metrics, which may be derived from logs or generated independently, are summarized calculations that can signal quality issues, but they frequently don't help operators and engineers develop a deep understanding of the services they support or rapidly address production issues. For that we must add distributed request tracing, the third leg of the observability stool.
Distributed request tracing builds upon logs and metrics and surpasses their utility by creating an observational map of a distributed system, frequently a cloud-based microservice architecture. The emerging standard for distributed request tracing is the OpenTracing project (recently merged with OpenCensus into the new OpenTelemetry project).
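To make the idea of an observational map more concrete, the following is a minimal sketch of how a trace might be carried from player to CDN to origin and then reassembled into a call tree. It assumes the W3C Trace Context `traceparent` header format, which OpenTelemetry uses by default; the project description does not say which propagation mechanism will be used, so treat that choice as an assumption. The `Span` class, the helper functions, and the service labels are hypothetical names invented for illustration. A real deployment would rely on an SDK such as OpenTelemetry rather than hand-rolled code.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    """One unit of work in one service (hypothetical, illustration only)."""
    service: str               # label such as "player", "cdn", "origin"
    trace_id: str              # 32 hex chars, shared by every span in a request
    span_id: str               # 16 hex chars, unique to this span
    parent_id: Optional[str]   # span_id of the caller, None for the root span


def start_span(service: str, traceparent: Optional[str] = None) -> Span:
    """Start a span, continuing the caller's trace if a header was received."""
    if traceparent is None:
        trace_id, parent_id = secrets.token_hex(16), None
    else:
        _version, trace_id, parent_id, _flags = traceparent.split("-")
    return Span(service, trace_id, secrets.token_hex(8), parent_id)


def to_traceparent(span: Span) -> str:
    """Serialize as a traceparent value: version 00, sampled flag 01."""
    return f"00-{span.trace_id}-{span.span_id}-01"


def render(spans: List[Span]) -> List[str]:
    """Rebuild the call tree from parent links as indented lines."""
    children: Dict[Optional[str], List[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)
    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in children.get(parent_id, []):
            lines.append("  " * depth + span.service)
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


# Each hop forwards its own span as the parent of the next hop.
player = start_span("player")
cdn = start_span("cdn", to_traceparent(player))
origin = start_span("origin", to_traceparent(cdn))

# Spans arrive at the collector in arbitrary order; parent links restore it.
tree = render([origin, player, cdn])
print("\n".join(tree))
```

Because all three spans share one trace ID and each records its caller, a collector can reconstruct the player-to-CDN-to-origin path without relying on timestamps or log ordering, which is what distinguishes a trace from a set of correlated logs.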

Current Document

There is not a document currently associated with this project.

Project Leads

Goals and Objectives

The deliverable for the first phase of this initiative is a detailed internal report and presentation to the SVA. This report will include specifics on the following:
  • The instrumentation and implementation performed to enable observability through logs, metrics, and traces across the participating services.
  • The scenarios simulated within the instrumented set of services.
  • The tools, visualizations, and analytics developed to enhance observability.
  • The results of the simulations and how the instrumentation and tools developed enable rapid resolution of real-world problems.
Two optional follow-ons are also under consideration:
  • Depending on the results, a white paper and presentation may be developed for public consumption.
  • The work done in phase 1 may be extended to more complex use cases such as live streaming, ad delivery, or other scenarios that are triaged out of phase 1.

Project Scope

The first phase of this project will focus on end-to-end tracing of the runtime streaming video playback path (from player to CDN to origin and back). Initially we will focus on VOD delivery, since live streaming dramatically increases scope and complexity. The project therefore requires representation (in the form of SMEs and/or development resources) for the following architectural elements: application, player, CDN, and origin service.
Telemetry will be collected, stored, and made available for analysis centrally by one or more data platforms. At least one streaming video player will be selected for instrumentation and integration with one or more telemetry collectors, using industry-standard content formats (e.g. H.264) and delivery protocols (e.g. DASH, HLS). We will start with a web-based JavaScript player for ease of instrumentation and rapid iteration. Collected telemetry data will be centralized with appropriate metrics and logs for analysis.
While the primary focus of this project is on traces as an analytics data source, we also want to build on work like the Common Media Client Data (CTA-5004) specification. To this end we will layer in metrics and log information that enables correlation between playback session logs and related CDN cache logs. It is through the combination of these enriched logs and trace data that we will build a holistic operational image of the larger, multi-service ecosystem.
While adding an ISP component to the project would increase observability, it will not be included in the first phase. Network technologies don't easily lend themselves to whitebox-level tracing techniques, and ISPs aren't well positioned to provide network logs for external consumption. Adding network telemetry will be considered for a later phase. For phase one we will rely on network information provided by the application, player, CDN, and origin services (mainly IP addresses and information that can be derived from them).
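The correlation between playback session logs and CDN cache logs described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation. It uses the CMCD query-argument form with a handful of CTA-5004 keys (`sid`, `br`, `bs`, `ot`, `sf`, `st`). The log record shapes, the `cdn.example` URLs, and the function names are hypothetical. The encoder and decoder are deliberately simplified: they do not escape quotes inside string values or handle commas within them.

```python
from typing import Dict, List, Union
from urllib.parse import parse_qs, quote, urlsplit

CmcdValue = Union[str, int, bool]

# Keys in this sketch whose values are bare tokens rather than quoted strings.
TOKEN_KEYS = {"ot", "sf", "st"}


def encode_cmcd(fields: Dict[str, CmcdValue]) -> str:
    """Encode fields as a CMCD query argument: sorted, comma-joined, URL-encoded."""
    parts = []
    for key in sorted(fields):
        value = fields[key]
        if isinstance(value, bool):      # check bool before int (bool is an int)
            if value:
                parts.append(key)        # true booleans are sent as the bare key
        elif isinstance(value, int) or key in TOKEN_KEYS:
            parts.append(f"{key}={value}")
        else:
            parts.append(f'{key}="{value}"')
    return "CMCD=" + quote(",".join(parts), safe="")


def decode_cmcd(url: str) -> Dict[str, CmcdValue]:
    """Parse the CMCD query argument out of a logged request URL."""
    raw = parse_qs(urlsplit(url).query).get("CMCD", [""])[0]
    fields: Dict[str, CmcdValue] = {}
    for part in filter(None, raw.split(",")):
        key, sep, value = part.partition("=")
        if not sep:
            fields[key] = True
        elif value.startswith('"'):
            fields[key] = value.strip('"')
        elif value.lstrip("-").isdigit():
            fields[key] = int(value)
        else:
            fields[key] = value
    return fields


def correlate(player_events: List[dict], cdn_log: List[dict]) -> Dict[str, dict]:
    """Group CDN cache-log rows under the player session that issued them."""
    joined = {e["sid"]: {"player": e, "cdn": []} for e in player_events}
    for row in cdn_log:
        sid = decode_cmcd(row["url"]).get("sid")
        if sid in joined:
            joined[sid]["cdn"].append(row)
    return joined


# Hypothetical data: one player-side event and two CDN cache-log rows.
query = encode_cmcd({"sid": "session-1", "br": 3200, "ot": "v",
                     "sf": "d", "st": "v", "bs": True})
other = encode_cmcd({"sid": "session-2", "br": 800})
player_events = [{"sid": "session-1", "event": "rebuffer", "t": 12.4}]
cdn_log = [
    {"url": f"https://cdn.example/video/seg_7.m4s?{query}", "cache": "MISS", "ttfb_ms": 840},
    {"url": f"https://cdn.example/video/seg_1.m4s?{other}", "cache": "HIT", "ttfb_ms": 12},
]

joined = correlate(player_events, cdn_log)
for sid, record in joined.items():
    print(sid, record["player"]["event"], [r["cache"] for r in record["cdn"]])
```

Joining on the session ID places a player-side rebuffer event next to the cache miss that served the same session. That pairing is the kind of enriched log view the trace data is meant to complement.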

Contributors

The following members have contributed to this project. Click on their name to visit their profile. If they have not published their profile, the link will redirect to their LinkedIn profile.

Additional References

The following are recommended readings prior to participating in this project:

Presentations

The following presentations delivered during Measurement/QoE working group sessions may provide additional information about this project.