Overview

Most chord-conditioned melody models see only coarse chord information, such as root and quality, so they struggle with extended and altered chords and often mishandle tension, resolution, and harmonic motion.

We propose a Theory-Structured Harmonic Embedding that represents each chord as the sum of four interpretable parts: Root, Quality, Extension, and Tension. This keeps the chord vocabulary manageable while giving fine-grained, theory-aware control over how the model responds to harmony.
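The additive structure above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the component vocabulary sizes, the embedding dimension, and the index assignments are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary sizes for the four interpretable components.
N_ROOTS, N_QUALITIES, N_EXTS, N_TENSIONS = 12, 4, 4, 6
DIM = 8  # embedding dimension (illustrative)

# One small embedding table per component; a chord embedding is the sum
# of its component embeddings, so the tables stay small even though the
# space of full extended/altered chords is large.
root_emb = rng.normal(size=(N_ROOTS, DIM))
quality_emb = rng.normal(size=(N_QUALITIES, DIM))
ext_emb = rng.normal(size=(N_EXTS, DIM))
tension_emb = rng.normal(size=(N_TENSIONS, DIM))

def chord_embedding(root, quality, ext, tension):
    """Sum the four component embeddings (hypothetical indexing scheme)."""
    return (root_emb[root] + quality_emb[quality]
            + ext_emb[ext] + tension_emb[tension])

# e.g. indices standing in for a C dominant chord with a 7th and one altered tension
vec = chord_embedding(0, 2, 1, 3)
print(vec.shape)  # (8,)
```

Because the four tables are summed rather than crossed, adding a new tension type grows the vocabulary by one row instead of multiplying the number of chord entries.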

At inference time, Harmony-Aware Soft-Constrained Decoding adjusts pitch logits using music-theory priors on chord tones, allowed tensions, non-chord-tone resolution, and scale membership. A single scalar λ controls constraint strength and the trade-off between theory alignment and diversity. Our model improves chord-tone ratio, tension correctness, and non-chord-tone resolution over both a CMT-style baseline and an EC2-VAE baseline, while keeping pitch and rhythm statistics close to the dataset under MGEval.
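The decoding step can be sketched as a per-pitch logit shift before sampling. The prior values below (chord tones encouraged, out-of-scale tones discouraged) are a hypothetical stand-in for the paper's full rule set, and the 12-pitch-class vocabulary is chosen only to keep the example small.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def apply_harmony_constraints(logits, prior_bonus, lam):
    """Shift pitch logits by lam * prior before sampling.

    prior_bonus: per-pitch score from theory rules, e.g. positive for
    chord tones and allowed tensions, negative for out-of-scale notes.
    lam = 0 recovers the unconstrained model distribution.
    """
    return softmax(logits + lam * prior_bonus)

# Toy example over a 12-pitch-class vocabulary for a C major triad.
logits = np.zeros(12)                # a uniform model, for illustration
prior = np.full(12, -1.0)            # discourage non-scale tones
prior[[0, 2, 4, 5, 7, 9, 11]] = 0.0  # C major scale members are neutral
prior[[0, 4, 7]] = 1.0               # chord tones C, E, G are encouraged

p0 = apply_harmony_constraints(logits, prior, lam=0.0)
p1 = apply_harmony_constraints(logits, prior, lam=1.0)
print(p1[[0, 4, 7]].sum() > p0[[0, 4, 7]].sum())  # True: chord tones gain mass
```

Because the constraint is a soft logit shift rather than a hard mask, non-chord tones remain reachable at any finite λ; they are only down-weighted.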

Listening Examples: Comparing Models on the Same Chord Progressions

We compare three models that generate melodies over the same chord progressions: a CMT-style baseline, an EC2-VAE baseline, and our full model with theory-structured harmonic embeddings and harmony-aware soft-constrained decoding. For each case below, all three audio clips use the same chord progression, so differences highlight how each model handles chord tones, tensions, and resolution.

Case 1: Chord progression #3

CMT baseline
EC2-VAE baseline
Our model

Case 2: Chord progression #5

CMT baseline
EC2-VAE baseline
Our model

Case 3: Chord progression #14

CMT baseline
EC2-VAE baseline
Our model

Effect of Constraint Strength λ

In this section, we run our full model on a fixed chord progression and vary the soft-constraint strength λ at inference time. Higher λ encourages more theory-aligned behavior, such as heavier chord-tone usage and more consistent tension resolution, at the potential cost of some melodic variety.

λ = 0 (no soft constraints)
λ = 0.5 (default setting in the paper)
λ = 1.0 (strong constraints)
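The trade-off behind this sweep can be made concrete by looking at the probability mass the constrained decoder places on chord tones at a single step, which rises monotonically with λ. The prior scheme below is a hypothetical stand-in for the paper's rules, using the same toy 12-pitch-class setup as above.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def expected_chord_tone_ratio(logits, prior, chord_tones, lam):
    """Chord-tone probability mass under the λ-constrained distribution."""
    p = softmax(logits + lam * prior)
    return p[chord_tones].sum()

logits = np.zeros(12)                # a uniform model, for illustration
prior = np.full(12, -1.0)            # discourage non-scale tones
prior[[0, 2, 4, 5, 7, 9, 11]] = 0.0  # C major scale members are neutral
chord_tones = [0, 4, 7]              # C, E, G
prior[chord_tones] = 1.0             # chord tones are encouraged

for lam in (0.0, 0.5, 1.0):
    print(lam, round(expected_chord_tone_ratio(logits, prior, chord_tones, lam), 3))
```

At λ = 0 the ratio is just the unconstrained model's chord-tone mass; as λ grows, mass shifts toward chord tones and away from non-scale tones, which is the theory-alignment versus diversity trade-off audible in the clips above.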

Paper

For more details about the model architecture, training procedure, and evaluation, please refer to the full paper below.