
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:media="http://search.yahoo.com/mrss/"
	
	>

<channel>
	<title>Preston Hess</title>
	<link>https://rphess.cargo.site</link>
	<description>Preston Hess</description>
	<pubDate>Tue, 28 Apr 2026 00:13:24 +0000</pubDate>
	<generator>https://rphess.cargo.site</generator>
	<language>en</language>
	
		
	<item>
		<title>Main Slideshow</title>
				
		<link>https://rphess.cargo.site/Main-Slideshow</link>

		<pubDate>Thu, 26 Oct 2023 18:48:35 +0000</pubDate>

		<dc:creator>Preston Hess</dc:creator>

		<guid isPermaLink="true">https://rphess.cargo.site/Main-Slideshow</guid>

		<description>
&#60;img width="1797" height="1211" width_o="1797" height_o="1211" data-src="https://freight.cargo.site/t/original/i/8e180942b1844db9d645d4288ccfbc2790cb367b5a0dac23835d0e2589e84ee0/bcs_headshot.png" data-mid="248161595" border="0"  src="https://freight.cargo.site/w/1000/i/8e180942b1844db9d645d4288ccfbc2790cb367b5a0dac23835d0e2589e84ee0/bcs_headshot.png" /&#62;
&#60;img width="4032" height="3024" width_o="4032" height_o="3024" data-src="https://freight.cargo.site/t/original/i/9348c16538ec5a0e0e72afb5ec71956edcd95a15146552df79ef8e3e9fd1ab42/IMG_0274.JPG" data-mid="194957550" border="0" alt="Sitting in the Speaker Array I built for MIT's Lab for Computational Audition" data-caption="Sitting in the Speaker Array I built for MIT's Lab for Computational Audition" src="https://freight.cargo.site/w/1000/i/9348c16538ec5a0e0e72afb5ec71956edcd95a15146552df79ef8e3e9fd1ab42/IMG_0274.JPG" /&#62;
&#60;img width="828" height="1104" width_o="828" height_o="1104" data-src="https://freight.cargo.site/t/original/i/6ebe4c137ea7e62d80f44fbd3467bb92f4130d58d4e26f5da15afe0e382d6f34/IMG_1100.JPG" data-mid="194957553" border="0" alt="Teaching a Cog Sci Class in Bremen, Germany" data-caption="Teaching a Cog Sci Class in Bremen, Germany" src="https://freight.cargo.site/w/828/i/6ebe4c137ea7e62d80f44fbd3467bb92f4130d58d4e26f5da15afe0e382d6f34/IMG_1100.JPG" /&#62;
&#60;img width="4032" height="3024" width_o="4032" height_o="3024" data-src="https://freight.cargo.site/t/original/i/78655e58d405a53edb75107b26b72564e8eb3945bcb549f73f1058ee016be942/IMG_1622.JPG" data-mid="194957555" border="0" alt="w/ Kat Chan: MIT Chapel Bookends, 2023 (Rockite, Metal, Spray Paint)" data-caption="w/ Kat Chan: MIT Chapel Bookends, 2023 (Rockite, Metal, Spray Paint)" src="https://freight.cargo.site/w/1000/i/78655e58d405a53edb75107b26b72564e8eb3945bcb549f73f1058ee016be942/IMG_1622.JPG" /&#62;
&#60;img width="3024" height="4032" width_o="3024" height_o="4032" data-src="https://freight.cargo.site/t/original/i/ea52424fda5baaab36f2214a0eba0e1ce25458e7283c6d25c1ecb37ae34324ec/IMG_2512.JPEG" data-mid="194957558" border="0" alt="Serving as Captain for MIT's Football Team" data-caption="Serving as Captain for MIT's Football Team" src="https://freight.cargo.site/w/1000/i/ea52424fda5baaab36f2214a0eba0e1ce25458e7283c6d25c1ecb37ae34324ec/IMG_2512.JPEG" /&#62;
&#60;img width="2219" height="2219" width_o="2219" height_o="2219" data-src="https://freight.cargo.site/t/original/i/3705faaf840f8272a4f87d12d337ad29b296c9db7297920a4a58e24982eaceb0/IMG_0827.jpg" data-mid="194957552" border="0" alt="Arm Chair and 1/4 Size Model, 2023 (Wood, Paint; Masonite)" data-caption="Arm Chair and 1/4 Size Model, 2023 (Wood, Paint; Masonite)" src="https://freight.cargo.site/w/1000/i/3705faaf840f8272a4f87d12d337ad29b296c9db7297920a4a58e24982eaceb0/IMG_0827.jpg" /&#62;
&#60;img width="828" height="552" width_o="828" height_o="552" data-src="https://freight.cargo.site/t/original/i/52cc8d15a86e3f637af4681057bc6cc1d30d59cb108610d400415fbf7882a4a2/IMG_0672.JPG" data-mid="194957551" border="0" alt="Serving as President in my Fraternity" data-caption="Serving as President in my Fraternity" src="https://freight.cargo.site/w/828/i/52cc8d15a86e3f637af4681057bc6cc1d30d59cb108610d400415fbf7882a4a2/IMG_0672.JPG" /&#62;
&#60;img width="3847" height="3847" width_o="3847" height_o="3847" data-src="https://freight.cargo.site/t/original/i/61d9927719cb09dc00eb7dd9c955b514e461307d8865c2d287172ed4cd306ca4/4.021_project_3_preston_h_square.JPEG" data-mid="194957548" border="0" alt="Desk Mounted Guitar Stand, 2022 (Metal, Leather, Powder Coat)" data-caption="Desk Mounted Guitar Stand, 2022 (Metal, Leather, Powder Coat)" src="https://freight.cargo.site/w/1000/i/61d9927719cb09dc00eb7dd9c955b514e461307d8865c2d287172ed4cd306ca4/4.021_project_3_preston_h_square.JPEG" /&#62;
&#60;img width="1536" height="2048" width_o="1536" height_o="2048" data-src="https://freight.cargo.site/t/original/i/3816e2534eb06eed4f8e413e3a45b1ea29bf1ec032577bec8268963f0ee6e177/IMG_9380.jpg" data-mid="194960789" border="0" alt="WWOOFing on a Farm in Hawaii" data-caption="WWOOFing on a Farm in Hawaii" src="https://freight.cargo.site/w/1000/i/3816e2534eb06eed4f8e413e3a45b1ea29bf1ec032577bec8268963f0ee6e177/IMG_9380.jpg" /&#62;
&#60;img width="4032" height="3024" width_o="4032" height_o="3024" data-src="https://freight.cargo.site/t/original/i/8d6745181f94355f3dab206b6b632dae38be8dee6573440a42949fa395c661bd/IMG_9695.JPG" data-mid="194957560" border="0" alt="Coffee Filter Lamp, 2022 (Paper, Coffee Grounds, Wood, Paint, Metal)" data-caption="Coffee Filter Lamp, 2022 (Paper, Coffee Grounds, Wood, Paint, Metal)" src="https://freight.cargo.site/w/1000/i/8d6745181f94355f3dab206b6b632dae38be8dee6573440a42949fa395c661bd/IMG_9695.JPG" /&#62;
</description>
		
	</item>
		
		
	<item>
		<title>About Me</title>
				
		<link>https://rphess.cargo.site/About-Me</link>

		<pubDate>Thu, 26 Oct 2023 18:48:36 +0000</pubDate>

		<dc:creator>Preston Hess</dc:creator>

		<guid isPermaLink="true">https://rphess.cargo.site/About-Me</guid>

		<description>

About Me
&#38;nbsp;

“It is the job of artists to open doors and invite in prophesies, the unknown, the unfamiliar; it’s where their work comes from, although its arrival signals the beginning of the long disciplined process of making it their own. Scientists too, as J. Robert Oppenheimer once remarked, ‘live always at the “edge of mystery” - the boundary of the unknown.’ But they transform the unknown into the known, haul it in like fishermen; artists get you out into that dark sea.” - A Field Guide to Getting Lost, Rebecca Solnit (2006)
My name is Preston Hess. I am a PhD student in MIT Brain and Cognitive Sciences. I spent my undergraduate years at MIT majoring in Computation and Cognition and minoring in Design, with a Humanities concentration in Music.
I am currently researching sound localization and auditory attention at MIT Brain and Cognitive Sciences’ Laboratory for Computational Audition, headed by Dr. Josh McDermott. Over the last three years in the lab, I have built a large speaker array (in slideshow above), run human psychophysics experiments, and dedicated significant time to building, hacking, and testing Convolutional Neural Networks inspired by the human auditory system.
My passion for creating and my admiration of artists such as Tom Sachs inspired me to pursue a minor in design, where I experimented with materials and fabrication techniques to make many objects that now furnish my room. A humanities concentration in music gave me a deeper understanding of music theory and technology, enriching my 13 years of guitar playing.
Around MIT, I played on MIT’s football team, where I was voted team captain for the 2023 season. I also held positions in MIT’s chapter of the Sigma Chi fraternity, such as President, House Manager, Social Chair, and Brotherhood Chair. I acted as the lead for the MIT student government’s Sexual Assault Awareness &#38;amp; Gender Equity (SAAGE) Committee for two years, organizing events to educate the MIT population on issues surrounding Title IX. Now, I act as a Social Chair for the MIT BCS Graduate Student Council and host a radio show on WMBR, the student radio station.
My hobbies include specialty coffee, nice sounds, and reading. I would love to connect with you on any of those topics, and am open to discussing my time and experiences at MIT and in life.
	︎&#38;nbsp;︎&#38;nbsp;︎ CV

</description>
		
	</item>
		
		
	<item>
		<title>Projects</title>
				
		<link>https://rphess.cargo.site/Projects</link>

		<pubDate>Thu, 04 Jan 2024 20:23:15 +0000</pubDate>

		<dc:creator>Preston Hess</dc:creator>

		<guid isPermaLink="true">https://rphess.cargo.site/Projects</guid>

		<description>Publications and Research

Papers


  2026&#38;nbsp;&#38;nbsp;
  
    
      Optimized feature gains explain and predict successes and failures of human selective listening
    
  
  Ian M. Griffith, R. Preston Hess, Josh H. McDermott
  March 2026. Nature Human Behaviour.


  2025&#38;nbsp;&#38;nbsp;
  
    
      Training Transformers with Enforced Lipschitz Bounds
    
  
  Laker Newhouse*, R. Preston Hess*, Franz Cesista*, Andrii Zahorodnii, Jeremy Bernstein, Phillip Isola
  July 2025. arXiv:2507.13338. *Equal Contribution.

Conference Presentations &#38;amp; Posters


  2026&#38;nbsp;&#38;nbsp;
  Optimized feature gains explain and predict successes and failures of human selective listening


  Ian M. Griffith, R. Preston Hess, Josh H. McDermott
  49th MidWinter Meeting of the Association for Research in Otolaryngology. February 2026. Poster Presentation.


  2025&#38;nbsp;&#38;nbsp;
  Human-like feature attention emerges in task-optimized models of the cocktail party problem


  Ian M. Griffith, R. Preston Hess, Josh H. McDermott
  48th MidWinter Meeting of the Association for Research in Otolaryngology. February 2025. Podium Talk.


  2025&#38;nbsp;&#38;nbsp;
  Cross-cultural influences of beating on music perception


  Josh H. McDermott, Bryan Medina, R. Preston Hess, Malinda McPherson, Eduardo Undurraga, Ricardo Godoy
  48th MidWinter Meeting of the Association for Research in Otolaryngology. February 2025. Poster Presentation.


  2024&#38;nbsp;&#38;nbsp;
  Human-like feature attention emerges in task-optimized models of the cocktail party problem


  Ian M. Griffith, R. Preston Hess, Josh H. McDermott
  Advances and Perspectives in Auditory Neuroscience 2024. October 2024. Poster Presentation.


  2024&#38;nbsp;&#38;nbsp;
  
    
      Human-like feature attention emerges in task-optimized models of the cocktail party problem
    
  
  Ian M. Griffith, R. Preston Hess, Josh H. McDermott
  7th Annual Conference on Cognitive Computational Neuroscience. August 2024. Poster Presentation.


  2024&#38;nbsp;&#38;nbsp;
  Modeling auditory attention with machine learning


  Ian M. Griffith, R. Preston Hess, Josh H. McDermott
  47th MidWinter Meeting of the Association for Research in Otolaryngology. February 2024. Poster Presentation.


  2023&#38;nbsp;&#38;nbsp;
  Measuring and modeling real-world sound localization


  Sagarika Alavilli, Andrew Francl, R. Preston Hess, Josh H. McDermott
  Advances and Perspectives in Auditory Neuroscience 2023. November 2023. Poster Presentation.


Music


  2025&#38;nbsp;&#38;nbsp;
  Form and Void, techno radio show hosted with Isabella Dobrinov, mixed by me. Due to the free tier of SoundCloud, only three episodes are currently up.


  Episode 1
  Episode 2
  Episode 6


  2024&#38;nbsp;&#38;nbsp;
  Creative demos for Interactive Music Systems. These were a chance to show creativity using techniques we learned during a particular problem set. Audio and visuals programmed by me.


  1: Programmed a Simple Synth
  2: Programmed a Drum Machine for Practicing
  3: Musical Version of Conway’s Game of Life (personal favorite)
  4: Click and Drag “Smart Soloist” with 3 Different Modes
  5: Hand-Tracking Controlled Digital String Instrument
  6: Guitar Hero!


  2023&#38;nbsp;&#38;nbsp;
  
    5-Minute Untitled Ambient Track
  

Design


  2023&#38;nbsp;&#38;nbsp;
  
    Class Portfolio, Design Studio: Design Techniques and Technologies


Miscellaneous


  2023&#38;nbsp;&#38;nbsp;
  
    Blog Post for Phillip Isola’s graduate course, Deep Learning


  2022&#38;nbsp;&#38;nbsp;
  
    Podcast Episode for Simply Neuroscience, where I talk about my experience as a neuro undergraduate at MIT.
</description>
		
	</item>
		
		
	<item>
		<title>Solving the cocktail party problem with a design inspired by the brain</title>
				
		<link>https://rphess.cargo.site/Solving-the-cocktail-party-problem-with-a-design-inspired-by-the-brain</link>

		<pubDate>Tue, 28 Apr 2026 00:13:24 +0000</pubDate>

		<dc:creator>Preston Hess</dc:creator>

		<guid isPermaLink="true">https://rphess.cargo.site/Solving-the-cocktail-party-problem-with-a-design-inspired-by-the-brain</guid>

		<description>

Solving the cocktail party problem with a design inspired by the brain
On our recent paper, “Optimized feature gains explain and predict successes and failures of human selective listening,” by Ian Griffith, Josh McDermott, and me
By Preston Hess, May 6th, 2026
&#60;img width="7110" height="4740" width_o="7110" height_o="4740" data-src="https://freight.cargo.site/t/original/i/0ccc9ea18dfa3220fc991db19aede709dba037b42adc8ae2ff3c9cb9eb8ae4df/cocktailparty_adobe.jpeg" data-mid="247644985" border="0" data-scale="79" src="https://freight.cargo.site/w/1000/i/0ccc9ea18dfa3220fc991db19aede709dba037b42adc8ae2ff3c9cb9eb8ae4df/cocktailparty_adobe.jpeg" /&#62;

The cocktail party problem
It’s Friday night and you’re walking into the hottest new Italian spot with a group of friends. The restaurant is buzzing with sound, a good sign that you are about to have some great food. A hostess guides your group to your table and you take your seats. Once seated, you greet your friends. The conversation is easy. You listen to the person across from you without thinking about it. Then, the table next to you erupts into laughter and gets louder. You realize that someone nearby has a voice that sounds just like your friend’s. Suddenly, you’re completely lost. You smile and nod at your friend across the table but you can’t really hear what they said.

The challenge of listening to one person’s voice among many other voices and sounds is called the “cocktail party problem.” Neuroscientists have studied it for a long time and have shown that sometimes it is easy for people, and sometimes it is almost impossible. Why does this happen? Why is it easy to pay attention to one voice in a noisy room sometimes, and why does it feel impossible other times? We can answer these questions with a different approach: instead of asking when listening succeeds or fails, we can ask what the brain is actually doing when it decides whom to listen to in a buzzy new Italian restaurant on a Friday night.

Before getting into the science, try it yourself. First listen to the cue, or “target voice,” alone. Then listen to the mixture and see if you can follow that same person. (These are a bit quiet, so you may have to turn your volume up to hear them.)


  Play Cue
  Play Mixture


How the brain pays attention
One way the brain makes sense of the world is by breaking it into features. A face, a voice, or a word is not represented by one neuron for every example you see or hear. Instead, many neurons respond to smaller features of the signal such as edges, colors, pitches, textures, locations, and so on. Attention seems to change how strongly some of those feature-detectors matter. How did we find this out? We turn to a really cool result from the late 1950s and another one from the late 1990s.

In 1959, two scientists, David Hubel and Torsten Wiesel, were trying to figure out what individual neurons in the visual part of the brain respond to. To do this, they recorded from neurons in the brains of cats. Interestingly, they found that the brain represents objects by representing the “features” that make up those objects [1]. For example, if I asked you to look for a green box, you might have some neurons representing features that boxes have, like corners and straight lines, and then some neurons representing the color green. Hearing can be thought of in a similar way. A voice has features like pitch, tone, accent, location, rhythm, and other acoustic details that help separate one person from another. Neurons typically represent a certain thing that they “like,” and when you are seeing or hearing the thing a neuron likes, that neuron is very active. Roughly speaking, one neuron might act like a little detector that says, “this feature is present.”


    [Figure: Mock tuning curve of a visual neuron that responds to the color “green.” Axes: color (x) vs. firing rate (y, from 0 to 1).]
  

This also extends to many other animals, including monkeys. In the late ’90s, already knowing that some neurons in the monkeys’ brains were sensitive to particular features, the scientists Stefan Treue and Julio Martínez Trujillo asked, “What happens to these neurons when we ask the monkeys to pay attention to the things they prefer?” What they found is, in a word, bananas. When the monkey pays attention to something that the neuron already ‘likes,’ the neuron becomes even more active. Conversely, if the monkey pays attention to something the neuron doesn’t fire for, the neuron becomes less active [2]. The most important takeaway from this study is that the brain does not seem to simply switch those neurons on or off. It scales their activity up or down, like turning a dial. Scientists call this kind of scaling a “multiplicative gain.” In plain English, it means boosting the activity of neurons that care about the thing you are trying to attend to.


    [Figure: Attention multiplicatively changes the response of a neuron. Curves: pay attention to green, no attention, pay attention to another color. Axes: color (x) vs. firing rate (y, from 0 to 1).]

    No attention shows the baseline response. Attending to green increases the neuron’s response. Attending elsewhere suppresses it.
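
For readers who like to see ideas as code, here is a minimal Python sketch of multiplicative gain. It is only an illustration, not the analysis from the original study: the Gaussian tuning curve, the function names, and the gain values of 1.3 and 0.7 are all assumptions chosen for the example.

```python
import math


def tuning_curve(stimulus, preferred=0.0, width=1.0, peak=1.0):
    """Gaussian tuning curve: the firing rate peaks at the preferred feature value."""
    return peak * math.exp(-((stimulus - preferred) ** 2) / (2 * width ** 2))


def attended_response(stimulus, gain, preferred=0.0, width=1.0):
    """Multiplicative gain: the whole curve is scaled, its shape is unchanged."""
    return gain * tuning_curve(stimulus, preferred, width)


# Hypothetical stimulus axis: 0.0 stands for "green", other values for other colors.
stimuli = [-2.0, -1.0, 0.0, 1.0, 2.0]

baseline = [attended_response(s, gain=1.0) for s in stimuli]    # no attention
boosted = [attended_response(s, gain=1.3) for s in stimuli]     # attend to green
suppressed = [attended_response(s, gain=0.7) for s in stimuli]  # attend elsewhere
```

Dividing the boosted curve by the baseline gives the same factor at every color, which is what makes the gain multiplicative rather than additive.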

Now, while the results of this study help us to understand attention’s effect on neurons, they leave us wondering, “If attention works by boosting the features of what we care about, could that alone explain how we listen in noisy environments?”
Let’s try it in a machine
Artificial neural networks are computer models loosely inspired by the brain. They are trained on many examples until they learn to recognize patterns, such as objects in images or words in speech. One useful thing about these models is that, like brains, their “neurons” respond to features. This became especially clear in vision models after the deep learning breakthroughs of the early 2010s, when researchers found that artificial networks trained to recognize images developed feature detectors that looked surprisingly similar to ones studied in brains [3].


    Features learned by a neural network
  

  &#60;img src="https://files.cargocollective.com/c2058912/feature_tuning.png" alt="Examples of visual features learned by AlexNet" class="figure-image"&#62;

  
    These small images show patterns that neurons of a vision neural network learned to detect after being trained to recognize objects. The model became sensitive to simple visual patterns, like edges, colors, and textures. This helped show that artificial neural networks can develop feature detectors that resemble some of the feature detectors found in the visual region of the brain (Krizhevsky et al., 2012).
  

Knowing that these features also exist in artificial neural networks opens up the possibility for us to use them as a tool to answer the question we posed at the end of the last section. If we boost the model’s internal features the way attention seems to boost brain responses, can we make the model behave like it is paying attention? Does this work for hearing the same way it works for vision?

There was already a clue that this might work. In 2018, Grace Lindsay and Ken Miller tested a model of feature boosting in vision. They showed a neural network images containing several objects at once. When they boosted the model features associated with one object, the model became more likely to report that object [4]. Their study suggested that feature boosts may be enough to create attention-like behavior in a machine. However, it was still unclear whether the same idea would work for speech, or whether this type of model could actually predict human behavior in new environments.
What we actually did
Our model played a simple listening game. On each trial, it first heard one person speaking alone. This was the cue, or the voice to listen for. Then it heard a mixture where that same person was speaking at the same time as other sounds. The model’s job was to report the word spoken by the target person in the mixture, just as you tried to do at the beginning of this blog post. This gave us a well-defined way to train the model. We knew the correct answer on every trial, so the model could learn from its mistakes (this is called supervised learning in the machine learning world). This task was inspired by the idea of using the memory of the sound of a voice to pick that voice out later.
The key change in our model was simple: we gave it a way to learn which features to boost and how much to boost them. When the model heard the target talker alone, it could ‘remember’ which internal features became active. Then, when the model heard the mixture, it could boost those same features. In that sense, we let the model try to pay attention in the same way that we think the brain pays attention.


    [Figure: How the model listens for the target voice. Cue voice (target speaker alone): these features mark the voice to listen for. → Mixture (target voice plus other sounds): cue-matching features get boosted. → Target word (model’s answer): “tomorrow.”]

    The model uses the cue voice to decide which internal features matter. When the mixture arrives, features that match the cue are boosted, helping the model recover the target word.
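
The cue-then-boost idea can be sketched in a few lines of Python. This is a toy version under stated assumptions, not the model from the paper: in the real model the mapping from cue to gains is learned during training, whereas the `cue_to_gains` rule, the `strength` parameter, and the four made-up feature values below are hypothetical.

```python
def cue_to_gains(cue_features, strength=1.0):
    """Turn cue activations into one gain per feature.

    Hypothetical rule: gain = 1 + strength * (activation / largest activation).
    Assumes activations are non-negative. Features silent in the cue keep gain 1.
    """
    top = max(cue_features)
    if top == 0:
        return [1.0 for _ in cue_features]
    return [1.0 + strength * f / top for f in cue_features]


def apply_gains(mixture_features, gains):
    """Multiply each mixture feature by its gain (multiplicative attention)."""
    return [g * f for g, f in zip(gains, mixture_features)]


# Made-up activations of four internal features.
cue_features = [0.9, 0.1, 0.0, 0.6]      # target talker alone
mixture_features = [0.5, 0.8, 0.7, 0.4]  # target plus other sounds

gains = cue_to_gains(cue_features)
attended = apply_gains(mixture_features, gains)
# The feature that was most active in the cue receives the largest boost.
```

In the full model, the boosted features then feed the later stages that report the target word; this sketch stops at the boosting step.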
  

Did that actually work?
After the model was trained, we were able to show it new examples that it had not seen during training. It was the same task, but with talkers and mixtures it hadn’t heard before. Then we gave humans the same listening tests, so we could compare the model’s mistakes and successes to human behavior. Like the model, human listeners heard the target voice, then the mixture, and then guessed the target word. In the first set of experiments, these were presented to humans over headphones. To imitate this, we presented the sounds to the model as if they were coming through headphones, too.

We saw some pretty cool things. Overall, the model matched human behavior pretty well when it came to getting the word right. The two plots further down show human accuracy and model accuracy for two subsets of the experiment. Before we look at those results, let’s listen to a quick example. In the mixture, the additional voice is called the distractor: it’s the voice you’re supposed to ignore. The example below uses the same target talker as above, but now the mixture has the target talker speaking at the same time as a distractor talker of the same sex. Take a listen. It suddenly becomes harder than when the distractor talker was of a different sex, right?


  Play Cue
  Play Mixture


The first result showed that this is also true for the model. When the distractor sounded more like the target, both humans and the model struggled more. Same-sex distractors were harder than different-sex distractors. English distractors were harder than Mandarin distractors for English-speaking listeners. The model showed the same pattern. This makes intuitive sense. The more similar the background voice is to the voice you are trying to attend to, the harder it is to keep them apart.


    [Figure: The model made human-like listening errors. Panel a: distractor sex (same-sex vs. different-sex distractor). Panel b: distractor language (English vs. Mandarin distractor). Each panel shows human listeners and the feature-gain model. Axes: SNR (dB), from -9 to 3, vs. accuracy, from 0 to 1.]

    a. Same-sex distractors were harder than different-sex distractors for both humans and the model. b. English distractors were harder than Mandarin distractors for both humans and the model. SNR means signal-to-noise ratio: negative values mean the target voice is quieter than the background, so the listening problem is harder.
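
To make the SNR axis concrete, here is a small Python sketch of how a target and a distractor could be mixed at a chosen SNR. It assumes SNR is computed from root-mean-square (RMS) levels, which is a common convention but not necessarily the exact procedure used in these experiments; the function names and toy signals are illustrative only.

```python
import math


def rms(signal):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def snr_db(target, background):
    """SNR in decibels: negative when the target is quieter than the background."""
    return 20.0 * math.log10(rms(target) / rms(background))


def mix_at_snr(target, background, desired_snr_db):
    """Rescale the background to hit the desired SNR, then add it to the target."""
    scale = rms(target) / (rms(background) * 10 ** (desired_snr_db / 20.0))
    scaled = [scale * x for x in background]
    mixture = [t + b for t, b in zip(target, scaled)]
    return mixture, scaled


# Toy signals standing in for a target voice and a distractor voice.
target = [1.0, -1.0, 1.0, -1.0]
distractor = [0.5, 0.5, -0.5, -0.5]

# At -6 dB the distractor is rescaled to be louder than the target.
mixture, scaled_distractor = mix_at_snr(target, distractor, -6.0)
```

Lowering `desired_snr_db` makes the rescaled distractor louder relative to the target, which corresponds to moving left along the SNR axis in the plots.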

We want the model to match all aspects of human behavior. That means if humans are getting a specific example wrong much more than they are getting it right, the model should also do the same thing. To measure this, we looked at cases with only two talkers. In these cases, humans could make a very particular kind of mistake. Sometimes people did not just guess randomly. They reported the word spoken by the wrong person. We call this particular type of mistake a ‘confusion.’
That kind of error is especially revealing because it means attention selected the distractor instead of the target. As you can see below, the model didn’t only get things right in the same way, it got things wrong in the same way as humans, too!

  
    [Figure: The model made the same kind of mistakes as humans. Panels: human listeners and feature-gain model. Axes: SNR (dB), from -9 to 3, vs. proportion of trials on which the distractor word was reported, from 0 to 1.]

    Humans and the model were most likely to report the distractor word when the target voice was hardest to hear. As the SNR increased, the target became easier to separate from the distractor, and confusions decreased. SNR means signal-to-noise ratio: negative values mean the target voice is quieter than the distractor, so the listening problem is harder.

Listening is easier when voices come from different places
The experiments above showed differences when the properties of the two talkers were more or less similar to each other, but they were presented over headphones, with no information about spatial location. Let’s think back to our buzzy Italian restaurant. When you are actually talking in a restaurant, some of the other people talking are closer to or farther from the person you are trying to listen to. In fact, we know that the farther away the distractor talker is from the target talker, the easier it is for humans to hear the target talker. To show this, we used a large speaker array that let us place voices at different locations around a listener. Unlike the first set of experiments, these trials included information about where sounds were coming from, instead of just what they were. We were able to test the exact same examples on our model and compare its performance with that of human listeners. The model showed the same basic pattern as humans. The farther apart the voices were, the easier the task became.


    [Figure: Separating voices in space made listening easier. Curves: feature-gain model and human listeners. x-axis: distractor angle from the target voice, in degrees (0, 5, 10, 20, 40). y-axis: listening benefit, from -18 to 6 (lower values mean the target was easier to hear).]

    When the distractor voice moved farther away from the target voice, both humans and the model needed less help from the target being louder. In other words, spatial separation made it easier to pick out the voice of interest. Human data were scanned in from the original publication [5] and replotted.
Finding something new with our model
Up to this point, the story was pretty clear. Our model behaves a lot like people do. It succeeds in the same situations and fails in many of the same ones. But the most exciting test was not whether the model could explain results we already knew about. It was whether the model could point us toward something new.
We were very interested in the spatial effects of attention. Because running experiments on people is slow, it is impractical to test every possible combination of target and distractor locations. But with the model, we could evaluate them all. By first exploring this huge range of listening conditions in the model, we found a few especially interesting patterns that we hadn’t seen reported in humans before. Using these specific location settings, we designed focused experiments to see whether humans behaved the same way.

At first, you might think separation is separation. In other words, if two voices are farther apart, listening should get easier no matter which direction they move. But the model predicted something more specific. Separating voices left or right made hearing the target word easier than separating them the same amount up or down. Intrigued, we ran the same experiment in humans, and the same pattern showed up. One possible reason for this is that left-right location gives your two ears different information, while up-down location depends on subtler cues shaped by the outer ear.


    [Figure: Left-right separation helped people pick out the correct word more than up-down separation. Panels: human listeners and feature-gain model. Curves: up-down separation and left-right separation. x-axis: how far the distractor moved, in degrees away from the target voice (0, 10, 60). y-axis: listening benefit, from -9 to 6 (lower values mean the target was easier to hear).]

    Moving the distractor left or right made the listening task much easier for both humans and the model. Moving it up or down helped much less. This suggests that the model learned to use spatial cues in a way that resembles human listening.
  

Second, we saw that when the target voice was directly in front of the model, even a small distance left or right produced a noticeable improvement in picking out the word that the target talker said. But when the target voice was off to the side, the model needed a much larger separation to get the same benefit. It’s as if the “spotlight” of attention is sharper at the center and becomes less precise off to the side. This isn’t something that had been established by past literature, but once again, when we tested it in humans, it was real.

  
    [Figure: Attention works best when the target voice is directly in front. Panels: human listeners and feature-gain model. Curves: target in front and target off to the side. x-axis: separation between voices (0, 10, 30, 90). y-axis: accuracy, from 0.4 to 0.9.]

    When the target voice was directly in front, even a small separation from other voices made a big difference. But when the target voice was off to the side, the system needed much more separation before it helped. This pattern was the same in people and in the model, suggesting that attention is most precise at the center and less precise away from it.
  

What’s really interesting is that neither of these effects was engineered into the model or discovered from testing humans first. They emerged naturally from a model that was only trained to pick out one voice in a mixture using feature-based attention. The model went beyond just matching behavior, and became a tool for exploring it and revealing patterns we never knew about before!
What this means
So what should we take away from all of this?
Our conclusion is that selective listening may not require a mysterious extra ingredient. A surprisingly simple idea went a long way: boost the features of the voice you want to attend to. When we built that idea into a model and trained it to recognize words in mixtures, it ended up successfully paying attention to the target talker in many of the same situations as human listeners.

What’s perhaps even more exciting is not just that the model succeeds when humans do, but that it fails in the same ways, too. That suggests that some of our difficulties hearing people in specific settings are not just mistakes or lapses, but reflect the limitations of the strategy our brains use to separate one voice from another. There are situations where feature-based gain simply isn’t enough to cleanly separate one voice from another.

We still have a long way to go, of course. For example, in real life you do not always get a clean preview of someone’s voice before the restaurant gets buzzy. You can also decide to listen to “the person on my left” or “the person who just said my name,” which our model does not fully capture yet. But we are pretty excited about this model being a good first step.
Closing
Thank you so much for reading. I hope you think about this next time you’re at dinner with your friends! To make this study more digestible, I’ve skipped over a lot of details, but the full paper [6] has much more about the model, the experiments, and the results.

This is my first blog post, so I’m still figuring out what works. If anything was unclear, confusing, or particularly interesting, I’d love to hear your thoughts. If you’re curious about the project or would like to discuss any part of it further, please feel free to reach out.

References
1. Hubel, D. H., &#38;amp; Wiesel, T. N. Receptive fields of single neurones in the cat's striate cortex. The Journal of physiology, 148(3), 574–591 (1959). https://doi.org/10.1113/jphysiol.1959.sp006308
2. Treue, S., &#38;amp; Martínez Trujillo, J. C. Feature-based attention influences motion processing gain in macaque visual cortex. Nature 399, 575–579 (1999). https://doi.org/10.1038/21176
3. Krizhevsky, A., Sutskever, I., &#38;amp; Hinton, G. E. ImageNet classification with deep convolutional neural networks. Advances in neural information processing systems 25 (2012).
4. Lindsay, G. W., &#38;amp; Miller, K. D. How biological attention mechanisms improve task performance in a large-scale visual system model. eLife (2018). https://doi.org/10.7554/eLife.38105

5. Byrne, A. K., Conroy, C., &#38;amp; Kidd, G. Individual differences in speech-on-speech masking are correlated with cognitive and visual task performance. Journal of the Acoustical Society of America 154, 2137-2153 (2023). https://doi.org/10.1121/10.0021301
6. Griffith, I. M., Hess, R. P., &#38;amp; McDermott, J. H. Optimized feature gains explain and predict successes and failures of human selective listening. Nature Human Behaviour (2026). https://doi.org/10.1038/s41562-026-02414-7


</description>
		
	</item>
		
	</channel>
</rss>