Why can’t musicians jam with each other online without latency or other issues?

Caleb Dolister
Apr 5, 2020

It’s the year 2020. Broadband is everywhere. Networks are fast. We can join an MMO gaming server (massively multiplayer online) and support hundreds of players with low latency, but a couple of musicians can’t sync up and jam together?

For nearly every musician, COVID-19 has caused many disruptions. Health aside, this includes canceled tours, gigs, lessons, and likely most studio work. In an ideal world, this time off could be used productively to practice at home, but musicians need to rehearse and write with other musicians. Music teachers are also under pressure from parents and school boards to figure out a way to interact with their band students from a distance. It’s incredibly burdensome for a teacher to replace band classes with 40–50 individual lessons.

Netflix created “watching parties,” co-workers are having happy hours over Zoom, and millions of gamers can jump online together to fight virtual villains. It seems everyone has a way to work together, except musicians.

The idea of getting together online to play music is not new. Developers have been working on it for years and tech-savvy musicians have been enthusiastically downloading apps or buying and building hardware to try and make it work. There are some solutions, but latency is unavoidable, even in the most professional of setups.

It seems musicians all over are looking for answers to the question: even with a high-bandwidth connection and today’s tech, why doesn’t jamming online work?

Motion, Video Games, Baseball, and the Brain’s Fantastic Ability for Prediction

Latency has been tormenting online gamers since the dialup days. I grew up playing online games in the late ’90s in a small mountain town in Northern California and I remember feeling like I had a bomb-proof connection when my community’s dialup network provided 250ms.

Why? Honestly, I didn’t know better. There was no broadband. Rural “gamers” like myself learned how to play with latency by anticipating what other players would do, and compensating by analyzing motion. It was less about see-and-react and more about see-and-predict.

Similarly, Major League Baseball hitters talk about how hard it is to track a fastball from the moment it leaves the pitcher’s hand until it crosses the plate. It actually happens too fast to react in real time. Hitters instead look for movement that allows them to predict where the ball will go. It takes about 300–400ms for the pitch to arrive at the plate, roughly the same amount of time it takes to blink an eye, and batters have about 150ms to react before it’s too late to even swing the bat.

In truth, we already deal with latency in music all the time.

Sound travels at varying speeds depending on factors like altitude, temperature, humidity, and more. Here’s an interesting chart showing measured sonic latency. In general, we can assume sound takes about 18ms to travel 20' through air.
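To make the 18ms-per-20' rule of thumb concrete, here is a minimal Python sketch of the arithmetic. The 343 m/s figure is an assumption (dry air at roughly room temperature), and the function name is just for illustration; real conditions will shift the numbers slightly.

```python
# Sketch: one-way acoustic delay over a given distance.
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at about 20 degrees C
METERS_PER_FOOT = 0.3048


def acoustic_delay_ms(distance_ft: float) -> float:
    """Return the time in milliseconds for sound to travel distance_ft feet."""
    distance_m = distance_ft * METERS_PER_FOOT
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


# Print the delay for a few stage-sized and field-sized distances.
for feet in (10, 20, 200):
    print(f"{feet:>3} ft -> {acoustic_delay_ms(feet):.1f} ms")
```

Under that assumption, 20' works out to a little under 18ms, which lines up with the estimate above.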

Consider how far you typically set up from other musicians in a live setting. If you’re playing acoustic music on a big stage (no monitors), you’re already compensating for latency, comfortably, at distances up to about 15–20'. Anyone who has played a big stage without amps, or in an orchestra or marching band, can tell you that it becomes difficult to play at larger distances.

However, acceptable latency for recording is lower, usually around 10–12ms, though humans can detect latency at about 5ms. The studio challenge is that the notes you play reach your ears through headphones later than they do acoustically because of encoding and hardware latency.

Latency is a problem for musicians. For any readers who are unfamiliar with how time is calculated in music, speed is expressed as a number of beats per minute (bpm), called a tempo. 60bpm = 1 beat per second, 120bpm = 2 beats per second, and so on. If the tempo of a song is 120bpm, this equates to 500ms between beats (1sec = 1000ms, 0.5sec = 500ms). At a distance of 20', there is a natural latency of 18ms, causing that 500ms/120bpm to feel like 518ms, or roughly 116bpm. In layman’s terms, it feels like the other player is performing at a slower speed even if they are not. To compensate for this natural latency, larger bands are led by conductors so that there is a visual representation of time.
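The tempo arithmetic can be sketched in a few lines of Python. This follows the simplification used here, that the delay is simply added to each beat interval; the function names are made up for illustration.

```python
# Sketch: convert between tempo and beat spacing, then add a delay.

def beat_interval_ms(bpm: float) -> float:
    """Milliseconds between beats at a given tempo."""
    return 60_000.0 / bpm


def perceived_bpm(bpm: float, latency_ms: float) -> float:
    """Tempo that results if latency_ms is added to every beat interval."""
    return 60_000.0 / (beat_interval_ms(bpm) + latency_ms)


print(beat_interval_ms(120))             # spacing between beats at 120bpm
print(round(perceived_bpm(120, 18), 1))  # tempo felt with 18ms of added delay
```

At 120bpm with 18ms of delay, the interval stretches from 500ms to 518ms, which is a little under 116bpm.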

Natural latency can be dealt with on a stage because musicians can compensate when it’s consistent. It becomes impossible in a larger setting with multiple players and latencies, especially when those latencies are changing.

Consider a marching band where there may be well over 200 musicians spread out across an entire football field. Players on the edge can be separated by long distances, while some are just a few feet apart, and the band is always moving. At any given time, these players hear the person next to them at about 5ms, the person 10' away at about 9ms, 20' at 18ms, all the way up to players 200' away at 180ms. Without a conductor, it’s a mess. If latency is a challenge while consistent, dealing with multiple latencies that are changing becomes a crisis.

Jamming online, unfortunately, is just like being in different parts of the field in a marching band. Each player is going to have a different latency depending on where they are in relation to the common server, and that latency will change because of network congestion.

The Deal Breaker: Variation

For musicians who are unfamiliar with network pings and jitter, here is a little experiment: run a ping test to 8.8.8.8. That IP is the primary address of Google Public DNS and is one of the most consistent servers on the internet.

You’ll notice that the results can fluctuate for each packet sent. The time can jump by +5ms, -10ms, -1ms, +7ms, and so on, or the packet can get dropped entirely. It’s rarely the same.
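For readers who want to put a number on that fluctuation, here is a small Python sketch that summarizes a list of ping times. The sample values are hypothetical, not real measurements, and "jitter" here is taken to mean the average change between consecutive replies, which is one common way to describe it, not the only one.

```python
from statistics import mean


def summarize_ping(rtts_ms):
    """Summarize round-trip times in ms; None marks a dropped packet.

    Assumes at least one reply was received.
    """
    received = [r for r in rtts_ms if r is not None]
    # Jitter: mean absolute difference between consecutive received replies
    # (a dropped packet is skipped over rather than counted as a change).
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    return {
        "min_ms": min(received),
        "max_ms": max(received),
        "mean_ms": mean(received),
        "jitter_ms": mean(diffs) if diffs else 0.0,
        "loss_pct": 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms),
    }


# Hypothetical samples: jumps of +5, -10, -1, +7 ms with one dropped packet.
samples = [22.0, 27.0, 17.0, None, 16.0, 23.0]
summary = summarize_ping(samples)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With these made-up samples, the average change from one reply to the next is already larger than the roughly 5ms that musicians can feel.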

This is caused by network traffic. In a music setting it creates a bigger problem, because each musician in an online session is going to have these fluctuations. Not only is the sense of musical timing delayed, but it’s also moving. There is no way to predict and compensate for network traffic.

In conclusion

We can learn to deal with latency in the range of 20–30ms, maybe even 50ms or more if it were totally reliable. But public networks have traffic, and lots of it, causing even the best connections to fluctuate, drop packets, lag, reset, etc. Musicians are sensitive enough to feel a change of 5ms, so these latency variations are disruptive, especially because they are impossible to predict.

I have seen some musicians playing duo videos, even with latency. I would guess they’re not really playing written music, and it looks to work best with a rhythmic player on one side and someone soloing on the other. It seems to gel, but only until it’s time to play parts together. Adding a third player is where it ultimately breaks down.

And for anyone thinking that 5G connections are just around the corner: while these can theoretically support consistent 1–5ms round-trip times, the networks are still going to be affected by fluctuations caused by network traffic, and you still have your local environment and home-network latencies. Most audio platforms create latency through digital audio encoding, and video platforms have to encode both audio and video data. And unless you are using a wired connection to the internet, you have a mess of latency between your audio source and your WiFi router.

Even with the most ideal setup, assuming there is a direct fiber connection between two hosts, can it work?

I’ve learned a lot since publishing the first version of this post. In one case, I learned about LoLa (https://lola.conts.it/). This seems to be the most capable setup for allowing musicians to play over distances. Unfortunately, it’s not a home-consumer product and requires very specific hardware and a powerful 1Gbps clear-path connection between locations. The hardware helps reduce latency, and the clear-path data connection helps prevent spikes due to network traffic. The project has installations all over the world thanks to universities and research labs and has enabled musicians to rehearse and in some cases perform.

Another promising option (audio only) is JackTrip.
https://ccrma.stanford.edu/groups/soundwire/software/jacktrip/
It’s getting quite a bit of attention right now thanks to a mention by NPR and seems to work best over shorter distances, 500 miles max. JackTrip vastly shortens the time that it takes to encode your audio signal into packets that can be sent over the internet.

According to mathematician & physicist Philippe Kahn, we still have one main challenge that prevents musicians from being able to achieve a real-time experience: Einstein’s theory of relativity, which holds that nothing can travel faster than the speed of light. In addition to mathematics, one of Philippe’s many passions is his life-long practice of classical and jazz music.

“No matter how efficient the network & equipment, latency is unavoidable. Therefore the problem of real-time remote music performance comes down to ‘What is the acceptable latency?’ My personal opinion is that a consistent 10ms is a minimum to serve all musical styles. The less the better. But there is always going to be some latency. You can’t beat Einstein and the laws of physics, except in science fiction books where we travel in time, which is a lot of fun!”

Light takes about 5ms to travel a distance of 1500km. A network connection, even under perfect conditions, cannot deliver a signal faster than this.
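As a rough check on that floor, here is a Python sketch of the propagation delay over a given distance. The vacuum speed of light is a known constant; the 0.67 velocity factor for optical fiber is an assumption (signals in glass travel at roughly two thirds of the vacuum speed), and the function name is made up for illustration. Encoding, routing, and hardware delays are ignored entirely.

```python
# Sketch: best-case one-way propagation delay, ignoring all equipment.
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # in a vacuum
FIBER_VELOCITY_FACTOR = 0.67  # assumption: typical optical fiber, about 2/3 c


def one_way_delay_ms(distance_km: float, velocity_factor: float = 1.0) -> float:
    """Milliseconds for a signal to cover distance_km at a fraction of c."""
    speed_km_per_s = SPEED_OF_LIGHT_KM_PER_S * velocity_factor
    return distance_km / speed_km_per_s * 1000.0


vacuum_ms = one_way_delay_ms(1500)
fiber_ms = one_way_delay_ms(1500, FIBER_VELOCITY_FACTOR)
print(f"1500 km in a vacuum: {vacuum_ms:.2f} ms one way")
print(f"1500 km in fiber:    {fiber_ms:.2f} ms one way")
print(f"1500 km in fiber:    {2 * fiber_ms:.2f} ms round trip")
```

Under those assumptions, a straight fiber run of 1500km already costs more than the 5ms vacuum figure before any equipment is involved, and a round trip doubles it.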

Unfortunately, most of us don’t have perfect network connections. We’re stuck with network traffic and trying to balance fluctuations from multiple musicians trying to connect with varied latencies. Achieving a real-time feel online won’t work for music despite our brain’s ability to learn how to compensate and predict.

That doesn’t mean we can’t hop online and have a little fun.
