Word-Level Timing Editor in Epitrite: Fine-Tune Lyric Sync to the Millisecond
Epitrite's AI transcription auto-times every word to the audio. For most songs, this is accurate within 100-200ms — close enough that viewers don't notice. But "close enough" doesn't always feel right. The Word-Level Timing Editor lets you push timing to perfection.
Here's the full guide.
What the Editor Does
The Word-Level Timing Editor lets you:
- See every word in your lyrics as a draggable element on the audio timeline
- Drag any word to adjust its appearance time
- Type exact millisecond values for precision
- Listen to the audio while watching word timing
- Adjust multiple words at once (bulk shift)
For most lyric videos: the default AI timing is fine. For sync-critical content (karaoke, congregational worship, viral hooks), word-level editing is the difference between "good" and "perfect."
When to Use It
The Word-Level Timing Editor is essential when:
- Karaoke Pro use — congregations sing along; timing must be precise
- Viral hook moments — that one lyric everyone screenshots must land on the beat
- Spotify Canvas — 8-second loop where every word matters
- Drop-heavy moments — when "the drop" lyric lands on the bass hit
- Cover song precision — matching original artist's exact phrasing
Skip the editor for:
- Standard TikTok lyric videos (AI default is fine)
- Background lyric atmosphere
- Multi-aspect bulk variants (default works)
How to Access the Editor
- Open your project in Epitrite
- Click the lyric layer panel
- Click "Edit Word Timing" (or similar — varies by interface version)
- The Word-Level Timing Editor opens
Editor Interface
The editor shows:
- Audio waveform at the top (visual representation of your audio)
- Lyric timeline below — each word is a draggable block
- Playhead that moves with playback
- Word details panel for selected word (start time, duration, exact ms)
- Bulk controls at the bottom (shift all, scale all, reset)
Basic Workflow
Step 1: Run AI Transcribe First
Always run AI transcription before manual word timing. The AI handles the bulk (95%+ of words); the manual editor handles the precision pass.
Step 2: Play Back at 1x Speed
Listen to the song with word timing visible. Identify words that feel:
- Slightly late (word appears after the vocal hits)
- Slightly early (word appears before the vocal hits)
- Way off (more than 300ms off — usually an AI mistake)
Note timestamps where you hear timing issues.
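The three categories above can be written down as a small rule. This is a minimal sketch, not Epitrite's own logic: the function name `classify` and the 50ms tolerance are assumptions made for illustration, while the 300ms "way off" threshold comes from the list above.

```python
def classify(word_ms, vocal_ms, tolerance_ms=50, way_off_ms=300):
    """Label a word's timing relative to where the vocal actually hits.

    tolerance_ms is an assumed threshold below which the offset is ignored.
    """
    diff = word_ms - vocal_ms  # positive: word appears after the vocal
    if abs(diff) > way_off_ms:
        return "way off"
    if diff > tolerance_ms:
        return "late"
    if diff < -tolerance_ms:
        return "early"
    return "ok"


print(classify(5100, 5000))  # late
print(classify(4900, 5000))  # early
print(classify(5400, 5000))  # way off
```

Tighten `tolerance_ms` for karaoke or drops, where even small offsets are noticeable.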
Step 3: Zoom Into the Problem Areas
In the editor, zoom into the audio waveform at problem timestamps. The waveform shows where vocal energy peaks — the word should land on those peaks.
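To make "land on the peak" concrete, here is a rough sketch of snapping a word to the loudest nearby moment. It assumes you already have a loudness envelope (one value per fixed-size frame, for example RMS per 10ms); the function `snap_to_peak`, the frame size, and the 150ms search radius are all hypothetical and not part of Epitrite.

```python
def snap_to_peak(envelope, frame_ms, start_ms, radius_ms=150):
    """Return the time (ms) of the loudest frame within radius_ms of start_ms."""
    center = start_ms // frame_ms
    radius = radius_ms // frame_ms
    lo = max(0, center - radius)
    hi = min(len(envelope), center + radius + 1)
    if lo >= hi:
        return start_ms  # word lies outside the envelope; leave it alone
    best = max(range(lo, hi), key=lambda i: envelope[i])
    return best * frame_ms


# Toy envelope: 10 ms frames, quiet except for one peak at frame 500 (5000 ms).
envelope = [0.05] * 1000
envelope[500] = 0.9
snapped = snap_to_peak(envelope, 10, 5100)
print(snapped)  # 5000
```

The loudest frame is not always the start of the sung word (sustained vowels can peak late), so treat the result as a suggestion and confirm by ear.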
Step 4: Drag or Type
For each problem word:
- Drag the word block to a new position on the timeline, or
- Type exact ms value in the word details panel (e.g., 5230ms)
After adjusting, listen again. Iterate until timing feels right.
Step 5: Bulk Shift if Needed
If many words feel slightly late (or early) by the same amount:
- Select all words
- Use "Bulk Shift" to move everything by +50ms (or -50ms)
- Re-listen
Bulk shifting is useful if the AI was consistently off in one direction.
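Conceptually, a bulk shift adds the same offset to every selected word's start time. The sketch below illustrates that idea only; the `Word` structure and `bulk_shift` function are hypothetical and say nothing about how Epitrite stores timing internally.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Word:
    text: str
    start_ms: int
    duration_ms: int


def bulk_shift(words, offset_ms):
    """Return new words shifted by offset_ms, clamped so no start goes below 0."""
    return [replace(w, start_ms=max(0, w.start_ms + offset_ms)) for w in words]


words = [Word("hold", 5100, 300), Word("on", 5450, 200)]
shifted = bulk_shift(words, -100)
print([w.start_ms for w in shifted])  # [5000, 5350]
```

Durations are left untouched, so the spacing between words is preserved.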
Common Timing Issues
Issue 1: Words Land Slightly Late
The most common AI timing error. For example, the vocal peaks at 5.0s but the AI places the word at 5.1s.
Fix: bulk shift all words -50 to -100ms.
Issue 2: Multi-Syllable Words Compressed
AI may compress a 3-syllable word ("syllable") into half its real duration.
Fix: extend the word's duration by dragging its end.
Issue 3: Run-On Lyrics
If the AI groups "I don't know" as one word, syllables are compressed.
Fix: split the word in the editor; assign individual timings to each piece.
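As a starting point before fine-tuning by ear, a grouped block can be divided in proportion to the length of each piece. This is only a sketch: `split_word` is a hypothetical helper, and character count is a crude stand-in for how long each word is actually sung.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    start_ms: int
    duration_ms: int


def split_word(word):
    """Split a multi-word block into one block per piece, dividing the
    duration in proportion to character count (a rough first guess)."""
    pieces = word.text.split()
    total_chars = sum(len(p) for p in pieces)
    result, cursor, used = [], word.start_ms, 0
    for i, piece in enumerate(pieces):
        if i == len(pieces) - 1:
            dur = word.duration_ms - used  # last piece absorbs rounding
        else:
            dur = word.duration_ms * len(piece) // total_chars
        result.append(Word(piece, cursor, dur))
        cursor += dur
        used += dur
    return result


parts = split_word(Word("I don't know", 12000, 900))
print([(p.text, p.start_ms, p.duration_ms) for p in parts])
# [('I', 12000, 90), ("don't", 12090, 450), ('know', 12540, 360)]
```

The pieces tile the original block exactly, so nothing outside it moves; each piece can then be dragged onto its own vocal peak.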
Issue 4: Repeat Words Misaligned
When chorus repeats, AI may have inconsistent timing across repetitions.
Fix: manually adjust each repetition for consistent timing.
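One way to get a consistent starting point is to hand-tune one chorus, then reuse its relative offsets for the other repetitions. The sketch below assumes the repetitions are sung with the same tempo and phrasing, which is often not true for live or loose performances; `align_repeat` is a hypothetical helper, and each repetition still needs a manual check.

```python
def align_repeat(reference, repeat_start_ms):
    """reference: list of (text, start_ms, duration_ms) for the hand-tuned chorus.

    Returns the same words re-based so the first word starts at repeat_start_ms.
    """
    base = reference[0][1]
    return [(text, repeat_start_ms + (start - base), dur)
            for text, start, dur in reference]


chorus_1 = [("hold", 30000, 300), ("on", 30350, 200)]
chorus_2 = align_repeat(chorus_1, 90000)
print(chorus_2)  # [('hold', 90000, 300), ('on', 90350, 200)]
```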
Issue 5: Adlibs in Wrong Place
Vocal ad-libs ("yeah", "uh") may appear as lyrics when they shouldn't.
Fix: delete the unwanted word, or move it out of the visible lyric flow.
Pro Tips for Precision
Tip 1: Listen with Headphones
Speaker playback masks timing imperfections. Headphones reveal everything.
Tip 2: Slow Playback Speed
Some timing issues only become obvious at 0.5x or 0.75x speed. Use the editor's playback speed control.
Tip 3: Watch the Waveform
Vocal energy peaks visible in the waveform are where words should land. Visual confirmation prevents subjective "feels right" mistakes.
Tip 4: Save Increments
Save the project after every major timing pass. If you mess up later, you can restore.
Tip 5: Test the Final Version
After editing, watch the rendered video. Timing that feels right in the editor may feel different once rendered, so verify.
Word Timing for Karaoke
Karaoke Pro uses word-level timing to highlight each word as it's sung. Karaoke timing has stricter requirements than standard lyric videos:
- Word highlight should activate exactly when the vocal hits (not 50ms early or late)
- Each word's highlight duration should match the vocal sustain
- Congregation timing matters — singers follow the highlighted word
For karaoke use: spend 20-30 minutes on word-level timing per song. The payoff is congregational singing accuracy.
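To see why both start time and duration matter for karaoke, consider how a player might decide which word to highlight at a given playback time. This is an illustrative sketch, not Karaoke Pro's actual implementation; `active_word_index` and its inputs are assumptions.

```python
from bisect import bisect_right


def active_word_index(starts_ms, durations_ms, t_ms):
    """Return the index of the word being sung at t_ms, or None in a gap.

    starts_ms must be sorted ascending.
    """
    i = bisect_right(starts_ms, t_ms) - 1  # last word starting at or before t_ms
    if i < 0:
        return None
    if t_ms < starts_ms[i] + durations_ms[i]:
        return i
    return None


starts = [1000, 1400, 2000]
durations = [300, 500, 400]
print(active_word_index(starts, durations, 1000))  # 0
print(active_word_index(starts, durations, 1350))  # None
print(active_word_index(starts, durations, 1899))  # 1
```

If a word's duration is too short, the highlight drops out (the `None` case) while the singer is still holding the note, which is what the "match the vocal sustain" rule prevents.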
Word Timing for Drops
For songs with drops (EDM, trap, hyperpop), the drop lyric should land on the bass hit:
- Identify the drop's exact ms timing
- Confirm the "drop lyric" word in the editor
- Manually move that word to land on the drop ms
- Even 50ms off feels wrong on a drop
This is the difference between a drop that hits and one that misses.
Word Timing for Cover Songs
For covers of famous songs:
- Audiences know the original phrasing
- Match the original artist's exact timing
- Reference the original's lyric video if it exists
- Manually verify every word lands on the matching moment
For viral covers: word-level timing precision pays off in engagement metrics.
Combining Word Timing with Bulk Create
After word-level timing on the source project:
- Run Bulk Create
- All variants inherit the timing
- Each variant has the same precise timing across all aspect ratios
This means: 30 minutes of word-level timing once → applied to 30+ bulk-created variants.
Time Investment
| Use case | Word timing time |
|---|---|
| Default AI is fine | 0 min |
| Light verification | 5 min |
| Standard precision pass | 10-15 min |
| Karaoke / sync critical | 20-30 min |
| Full precision (every word) | 30-60 min |
For most projects: 10-15 min of word-level editing is enough.
Word Timing + AI Transcription Workflow
The full workflow:
- Upload audio (1 min)
- AI transcribe (1-2 min)
- Read-through and verify lyrics (5 min)
- Word-level timing pass (10-15 min)
- Preview and final adjustments (5 min)
- Export (1-2 min)
Total: about 23-30 minutes for a precision lyric video. AI handles the bulk; you handle the polish.
Common Questions
Is word-level timing available on Free?
Yes — the editor is on Free tier.
Can I bulk-shift entire sections?
Yes — select a range of words, use bulk shift. Useful for fixing a verse that's all slightly off.
What if I'm not sure where a word should land?
Zoom into the waveform. Vocal energy peaks are visible. The word should land on those peaks.
Does the editor support keyboard shortcuts?
Yes — arrow keys nudge words by 10ms. Shift+arrow nudges by 50ms. Cmd/Ctrl+arrow nudges by 100ms.
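With those three step sizes, any offset that is a multiple of 10ms can be reached in a few key presses. The helper below is a hypothetical sketch that works out the combination; anything left over is finer than the smallest nudge and is easier to type as an exact value in the word details panel.

```python
def nudge_plan(offset_ms):
    """Break an offset into key presses using the 100/50/10 ms steps."""
    remaining = abs(offset_ms)
    plan = {}
    for name, step in (("cmd_ctrl", 100), ("shift", 50), ("plain", 10)):
        plan[name], remaining = divmod(remaining, step)
    plan["leftover_ms"] = remaining  # smaller than one plain nudge
    return plan


print(nudge_plan(-130))
# {'cmd_ctrl': 1, 'shift': 0, 'plain': 3, 'leftover_ms': 0}
print(nudge_plan(75))
# {'cmd_ctrl': 0, 'shift': 1, 'plain': 2, 'leftover_ms': 5}
```

The sign of the offset only decides which arrow direction to press; the counts are the same either way.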
Can I export word-level timing data as SRT?
Yes — Epitrite's SRT export uses the word-level timing automatically.
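For reference, SRT is a plain-text format: a cue number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the text, then a blank line. The sketch below shows how word timings map onto that format with one cue per word. That grouping is an assumption; Epitrite's exporter may combine words into lines, and the function names here are hypothetical.

```python
def srt_timestamp(ms):
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def words_to_srt(words):
    """words: list of (text, start_ms, duration_ms) tuples, one cue per word."""
    cues = []
    for n, (text, start, dur) in enumerate(words, start=1):
        time_range = f"{srt_timestamp(start)} --> {srt_timestamp(start + dur)}"
        cues.append(f"{n}\n{time_range}\n{text}\n")
    return "\n".join(cues)  # blank line between cues


srt = words_to_srt([("hold", 5000, 300), ("on", 5350, 200)])
print(srt)
```

Because each cue's start and end come straight from the word's start time and duration, any adjustment made in the timing editor carries through to the exported file.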
Takeaway
For 90% of lyric videos: AI default word timing is fine. For sync-critical content (karaoke, drops, viral hooks, covers): the word-level timing editor pushes accuracy to perfection.
15-30 minutes of fine-tuning per song. The result is the difference between a lyric video that "works" and one that "lands perfectly."
Try the word-level timing editor free — every Epitrite feature on the free tier.