Streamlining Dataset Migrations at Spotify: How Honk, Backstage, and Fleet Management Work Together

At Spotify, migrating thousands of datasets for downstream consumers used to be a painful, error-prone process. Engineers had to manually track dependencies, update configurations, and coordinate across teams. To solve this, Spotify developed a trio of internal tools—Honk, Backstage, and Fleet Management—that automate and simplify the migration flow. Below, we answer key questions about how these tools supercharge dataset migrations and reduce operational overhead.

What is Honk and how does it help with dataset migrations?

Honk is Spotify's background coding agent designed to automate the repetitive aspects of dataset migrations. It acts as a coordinator that manages the lifecycle of migration tasks—from analyzing existing dataset schemas to generating transformation code and validating changes. Honk works in the background, continuously scanning for datasets that need to be migrated based on upstream changes (e.g., a new schema version). It then generates migration scripts, applies them to test environments, and rolls out changes to production gradually. This eliminates manual scripting errors and reduces the time engineers spend on boilerplate migration work. Honk also tracks the state of each migration, provides rollback capabilities, and alerts teams if any downstream consumer fails. By automating the heavy lifting, Honk allows engineers to focus on higher-level logic rather than tedious data plumbing.
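Honk's internals aren't public, but the lifecycle described above (detect, generate, test, roll out, roll back) can be pictured as a small state machine that records every transition for auditing. The sketch below is purely illustrative; all class and dataset names are hypothetical, not Honk's actual API:

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class State(Enum):
    DETECTED = auto()          # upstream schema change spotted
    SCRIPT_GENERATED = auto()  # migration code produced
    TESTED = auto()            # validated in a test environment
    ROLLED_OUT = auto()        # applied to production
    ROLLED_BACK = auto()       # reverted after a failure

@dataclass
class Migration:
    dataset: str
    state: State = State.DETECTED
    history: list = field(default_factory=list)

    def advance(self, new_state: State) -> None:
        # Keep the full transition history so the migration is auditable.
        self.history.append(self.state)
        self.state = new_state

    def rollback(self) -> None:
        # Rollback is always reachable, whatever the current state.
        self.history.append(self.state)
        self.state = State.ROLLED_BACK

m = Migration("playlist_streams_v2")
m.advance(State.SCRIPT_GENERATED)
m.advance(State.TESTED)
m.rollback()
print(m.state.name)  # ROLLED_BACK
```

Tracking state explicitly is what makes "alert teams if any downstream consumer fails" and "provide rollback capabilities" cheap: the coordinator always knows exactly where each migration stopped.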

Source: engineering.atspotify.com

What role does Backstage play in the migration process?

Backstage, Spotify's developer portal, serves as the central hub for managing and visualizing dataset migrations. When Honk triggers a migration, Backstage provides a user-friendly interface where engineers can see a dashboard of all datasets, their migration status, and any potential conflicts. Backstage also integrates with Spotify's software catalog, allowing teams to discover which services consume each dataset. This visibility is crucial because it exposes the full dependency graph: which upstream changes affect which downstream consumers. Engineers can then approve or reject migration proposals directly in Backstage, view logs from Honk's execution, and track rollbacks. Additionally, Backstage offers self-service actions, such as pausing a migration for a specific consumer or re-running a failed step. Without Backstage, teams would have to manually coordinate through chat or tickets, leading to delays and miscommunication.
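The core of that dependency visibility is a transitive walk over the consumer graph. As a minimal sketch (the edge data and dataset names here are invented, not Spotify's actual catalog), finding everyone affected by a change is a breadth-first search:

```python
from collections import deque

# Hypothetical edges: dataset -> its direct downstream consumers.
EDGES = {
    "raw_streams": ["cleaned_streams"],
    "cleaned_streams": ["daily_top_tracks", "royalty_report"],
    "daily_top_tracks": ["homepage_service"],
}

def downstream(dataset: str) -> set:
    """Breadth-first walk returning every transitive consumer."""
    seen, queue = set(), deque([dataset])
    while queue:
        for consumer in EDGES.get(queue.popleft(), []):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return seen

print(sorted(downstream("raw_streams")))
# ['cleaned_streams', 'daily_top_tracks', 'homepage_service', 'royalty_report']
```

A change to `raw_streams` surfaces not just its direct consumer but everything reachable from it, which is exactly the list an engineer needs before approving a migration.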

How does Fleet Management support large-scale dataset migrations?

Fleet Management at Spotify is the infrastructure layer that handles the orchestration and execution of migrations across thousands of datasets. While Honk decides what to migrate and Backstage shows who is affected, Fleet Management controls how the migration is rolled out at scale. It manages the deployment of migration jobs to clusters, ensures resource limits are respected, and handles retries and error escalation. Fleet Management also implements gradual rollout strategies—migrating a small percentage of datasets first to catch issues, then ramping up. It monitors system health during the migration, automatically pausing if CPU usage spikes or if consumer latency increases. This guardrail protects downstream services from cascading failures. By decoupling the migration logic from the execution infrastructure, Fleet Management allows Honk and Backstage to remain focused on business logic while Fleet Management handles the operational challenges of moving petabytes of data.
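The gradual-rollout-with-guardrail pattern described above can be sketched in a few lines. This is an illustrative simplification, not Fleet Management's real implementation: the stage fractions, the `migrate` callback, and the `healthy` check are all stand-ins for whatever the orchestrator actually uses:

```python
def gradual_rollout(datasets, migrate, healthy, stages=(0.01, 0.1, 0.5, 1.0)):
    """Migrate in expanding waves, pausing at the first unhealthy check."""
    done = 0
    for fraction in stages:
        target = int(len(datasets) * fraction)
        for ds in datasets[done:target]:
            migrate(ds)
        done = target
        if not healthy():
            # Leave the remaining datasets untouched for manual inspection.
            return done, "paused"
    return done, "complete"

migrated = []
count, status = gradual_rollout(
    [f"ds_{i}" for i in range(100)],
    migrate=migrated.append,
    healthy=lambda: True,
)
print(count, status)  # 100 complete
```

The key design choice is that the health check runs between waves, so a bad migration touches 1% of datasets before anyone notices, not 100%.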

What challenges did Spotify face before using these tools together?

Before adopting Honk, Backstage, and Fleet Management, Spotify's dataset migrations were largely manual and error-prone. Engineers had to identify all downstream consumers of a dataset—a task that often required digging through code repositories and contacting multiple teams. Schema changes could break dependent services without warning, leading to incidents and fire drills. Each migration required custom scripts, code reviews, and careful timing to avoid data loss. The process was slow: a simple schema change might take weeks to propagate safely. Moreover, there was no centralized way to track migration progress or roll back changes quickly. Cross-team coordination was chaotic, with frequent miscommunications about which datasets had been migrated. These pain points motivated Spotify to build an integrated system that automates detection, visualization, and execution in one cohesive workflow.


What benefits have Spotify engineers seen after implementing this system?

Since deploying the combination of Honk, Backstage, and Fleet Management, Spotify engineers have reported dramatically faster and safer dataset migrations. The time to complete a typical migration dropped from weeks to days, and in many cases to hours for straightforward schema changes. The error rate decreased because automated scripts eliminate human typos and missing edge cases. Engineers now have full visibility into dependencies, so they can confidently approve migrations without fear of breaking downstream services. The system's gradual rollout and automatic rollback capabilities mean that if a migration causes unexpected behavior, it is reverted before affecting all consumers. Furthermore, teams spend less time in coordination meetings and more time building features. The unified dashboard in Backstage also gives managers a clear snapshot of migration health across the organization. Overall, Spotify has turned a painful operational task into a streamlined, self-service process.

Can other companies adopt a similar approach using open-source tools?

While Honk, Backstage, and Fleet Management are internal Spotify tools, the principles behind them can be replicated using open-source alternatives. For example, Backstage itself is open-source and can be used as the developer portal for visualizing dataset dependencies—just populate it with metadata from your data catalog. For the automation agent (Honk-like), one could use workflow engines like Apache Airflow or Prefect to orchestrate migration scripts, combined with schema-aware diff tools to detect changes. For fleet orchestration (Fleet Management-like), tools like Kubernetes with custom operators or Nomad can manage rollout strategies, resource limits, and health checks. Integrations are key: connect your data catalog (e.g., Apache Atlas or Amundsen) to your workflow engine, and expose actions via Backstage. The main takeaway is that automating migrations requires three layers: detection, visibility, and execution. By combining existing open-source solutions, any organization can reduce the pain of dataset migrations.
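The detection layer mentioned above (a "schema-aware diff tool") is the easiest of the three to build yourself. As a hedged sketch, assuming schemas are available as simple column-to-type mappings from your data catalog, a diff that classifies changes and flags breaking ones looks like this:

```python
def schema_diff(old: dict, new: dict) -> dict:
    """Compare two {column: type} schemas and classify the changes."""
    added = {c: t for c, t in new.items() if c not in old}
    removed = {c: t for c, t in old.items() if c not in new}
    changed = {c: (old[c], new[c]) for c in old.keys() & new.keys()
               if old[c] != new[c]}
    return {
        "added": added,
        "removed": removed,
        "changed": changed,
        # Dropped or retyped columns can break consumers; additions usually don't.
        "breaking": bool(removed or changed),
    }

old = {"track_id": "string", "plays": "int"}
new = {"track_id": "string", "plays": "long", "country": "string"}
diff = schema_diff(old, new)
print(diff["breaking"])  # True
```

A workflow engine such as Airflow or Prefect would run this diff on a schedule and open a migration proposal whenever `breaking` is true, mirroring Honk's detection role.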
