Clustering Images with Autoencoders and Attention Maps

(Note: You can find the full notebook for this project here, or you can just scroll down to see the cool images it makes.)

I recently approached a new project where I wanted to create a model that sorted images into similar, automatically-generated groups. I hadn’t done an unsupervised clustering project with neural networks before, so this idea was intriguing to me. I looked through the Keras documentation for a clustering option, thinking this might be an easy task with a built-in method, but I didn’t find anything. I knew I wanted to use a convolutional neural network for the image work, but it looked like I would have to figure out how to feed that output into a clustering algorithm elsewhere (spoiler: it’s just scikit-learn’s K-Means).

Why not just feed the images into KMeans directly? Well, it could work, but the number of features would be out of control for my processing abilities, even after renting a GPU. My dataset was 200,000 images of faces from IMDB and Wikipedia, centered and pre-cropped around each face. Even with the nicely formatted data, the preprocessing was fairly extensive, and eventually included resizing all the images to 150×150 pixels. With three channels (RGB), that means (150x150x3) = 67,500 features and 200,000 examples. That’s a lot of information, and a lot more than we need to cluster effectively.

The solution I found was to build an autoencoder, grab an attention map (basically just the compressed image) from the intermediate layers, then feed that lower-dimension array into KMeans. Okay, so what does that mean?


The basic idea behind a CNN autoencoder is that you take the training data, feed it through alternating convolutional and pooling layers until you get to the desired compression, then feed it through convolutional and upsampling layers until you get back to the original size. You set the output to be the same as the input, so the model is learning to compress and unpack images to get them to be as close to the original as possible. My model looked like this:

CONV → MaxPooling → CONV → MaxPooling → CONV → MaxPooling → CONV → UpSampling → CONV → UpSampling → CONV → UpSampling → CONV

I made sure the padding and pool shapes lined up so the output was the same dimension as the input, then trained it on X_input = X_output. Voila! A lot of information is lost in the process, but luckily we’re not using this solely as an image compression technique.
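To make "lined up" concrete, here is a small shape-bookkeeping sketch in plain Python (no Keras needed). It assumes 2×2 pooling with 'same' padding, which rounds odd sizes up and is consistent with the (19, 19, 8) encoded shape reported later in this post, 2×2 upsampling, and convolutions that preserve spatial size. The function names are made up for illustration and are not from the original notebook.

```python
import math

def pool_same(size, factor=2):
    """Output length of a pooling layer with 'same' padding (rounds up)."""
    return math.ceil(size / factor)

def upsample(size, factor=2):
    """Output length of an upsampling layer that repeats each pixel."""
    return size * factor

def trace_sizes(side, n_stages=3):
    """Follow one spatial dimension through the encoder, then the decoder.

    Assumes every convolution keeps the spatial size unchanged.
    """
    encoder = [side]
    for _ in range(n_stages):
        encoder.append(pool_same(encoder[-1]))
    decoder = [encoder[-1]]
    for _ in range(n_stages):
        decoder.append(upsample(decoder[-1]))
    return encoder, decoder

encoder_sizes, decoder_sizes = trace_sizes(150)
print(encoder_sizes)  # [150, 75, 38, 19]
print(decoder_sizes)  # [19, 38, 76, 152]
```

Under these assumptions, naive doubling overshoots 150 by two pixels, which is why the padding has to be chosen deliberately. Cropping the output or using one unpadded convolution in the decoder are two common ways to close that gap; which one the original notebook used is not stated here.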

Before encoding:

After decoding:

The idea here is that the autoencoder is capturing the essence of these images. Ideally it is keeping only the most important features. And it should be noted that nothing about this model is trained on finding the faces; this clustering works because the images all have a similar formatting. It could be modified to work on a model that is specifically trained on finding face components, though.

So I have a model that both encodes and decodes images. I want the compressed image, so I have to grab the intermediate output, i.e., the output from the last pooling layer before decoding begins. This turned out to be easier than I thought, and takes just one line of code:

get_encoded = K.function([auto_model.layers[0].input], [auto_model.layers[5].output])

But don’t worry about the code. All this does is run the model from layer 0 to layer 5 — the encoding portion. The output of this compression process is an array of shape (19, 19, 8). So we’ve gone from 67,500 features to (19x19x8) = 2,888. Much better. We could have started with larger images and compressed more before grabbing the output. Can we visualize this compressed array to see what we’ve got?
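To put those two feature counts in perspective, here is a quick back-of-the-envelope calculation. The 4 bytes per value is an assumption (float32 storage); the real memory footprint depends on how the arrays are actually stored.

```python
n_images = 200_000
raw_features = 150 * 150 * 3      # pixels x channels per image
encoded_features = 19 * 19 * 8    # size of the encoded array per image
bytes_per_value = 4               # assumption: float32

def gigabytes(n_rows, n_cols, itemsize=bytes_per_value):
    """Size of a dense (n_rows, n_cols) array in decimal gigabytes."""
    return n_rows * n_cols * itemsize / 1e9

raw_gb = gigabytes(n_images, raw_features)
encoded_gb = gigabytes(n_images, encoded_features)
ratio = raw_features / encoded_features

print(f"raw: {raw_gb:.1f} GB, encoded: {encoded_gb:.2f} GB, ratio: {ratio:.1f}x")
# raw: 54.0 GB, encoded: 2.31 GB, ratio: 23.4x
```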

Attention Map

One of the main criticisms of neural networks is that they are very black-boxy. It’s hard to figure out why they do what they do. With image data, however, we have a convenient way of seeing which values the convolutional layers pass on to the final layers of a model. For our autoencoder, these internal layers are actually the whole point of the model. So let’s run the model just through the encoding layers (or for a supervised model, you could run just as far as the final convolutional layer).

Now the problem is that the output has too many channels to visualize properly (remember, our array is 19x19x8). We can just pool once more over the final dimension (like, encoded_array.max(axis=-1)) to get an array that is (19x19x1). This should tell us which pixels are important, although we do still lose the information as to why they are important (that is contained in the dimension we just pooled over). Here’s what it looks like:

Cool. The model is mostly interested in bright spots and solid colors and celebrities. Now let’s actually do something with those compressed images.
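For reference, the channel-wise pooling behind the attention map can be sketched in NumPy like this. The random array is only a stand-in for one real encoded image, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for one encoded image; real values would come from the encoder.
encoded_array = rng.random((19, 19, 8))

# Max over the channel axis: one value per spatial position.
attention_map = encoded_array.max(axis=-1)

# keepdims=True keeps a trailing channel axis of length 1, which some
# plotting or image utilities expect.
attention_map_3d = encoded_array.max(axis=-1, keepdims=True)

print(attention_map.shape)     # (19, 19)
print(attention_map_3d.shape)  # (19, 19, 1)
```

Note that a plain `max(axis=-1)` returns a 2-D array; pass `keepdims=True` if you want the explicit (19, 19, 1) shape.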


Here we go! K-Means seems like the most straightforward model for the task, and I found that 25 clusters gave enough variety without making the clusters overly broad. This part of the project is as simple as plugging the encoded array, shaped (200,000, 19, 19, 8) and flattened to one row per image, into scikit-learn and grabbing the labels as the output.
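A minimal sketch of that step, assuming scikit-learn's `KMeans` with mostly default settings. The random array stands in for the real encoded images (and is far smaller than 200,000 so it runs quickly); `n_init` and `random_state` are my own choices for reproducibility, not settings taken from the original notebook.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in for the encoded images; the real array has 200,000 rows.
encoded = rng.random((500, 19, 19, 8)).astype(np.float32)

# KMeans expects a 2-D array, so flatten each image to one row of 2,888 values.
flat = encoded.reshape(len(encoded), -1)

kmeans = KMeans(n_clusters=25, n_init=10, random_state=0)
labels = kmeans.fit_predict(flat)  # one cluster index (0-24) per image

print(flat.shape)    # (500, 2888)
print(labels.shape)  # (500,)
```

On random data the clusters are meaningless, of course; the point is only the reshape and the fit/predict call.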

Here are images from a few of the clusters it created:

Cluster 1:

Cluster 5:

Cluster 12:

Cluster 19:

Cluster 24:

Okay, this is pretty cool, but with reservations. While some elements are obvious (Cluster 24 is low-lit, Cluster 19 has white backgrounds, Cluster 12 is more gray/neutral), it’s hard to tell what the defining features are for each cluster from just a few images. Remember, each cluster has 4,000 – 15,000 images. This is way too many to look through by hand (by eye?). We could average all the images themselves, but I think it would be better to average the encoded versions of all the images so we’re looking at the same thing the clustering algorithm was looking at.

Again, it’s just one line of code, this time taking the mean over the first dimension (axis=0). Here’s what the average encoded image looks like for each cluster:
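The averaging step can be sketched in NumPy as follows. The random arrays stand in for the real encoded images and K-Means labels, and `cluster_means` is a hypothetical helper name rather than something from the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for the real encoded images and their cluster labels.
encoded = rng.random((100, 19, 19, 8))
labels = rng.integers(0, 5, size=100)

def cluster_means(encoded, labels):
    """Average the encoded images within each cluster (mean over axis 0)."""
    return {int(k): encoded[labels == k].mean(axis=0) for k in np.unique(labels)}

means = cluster_means(encoded, labels)
first_cluster = min(means)
print(means[first_cluster].shape)  # (19, 19, 8)
```

Each averaged array keeps the encoded shape, so it can be pooled over the channel axis for display or passed on to the decoding half of the model.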

Alright! Now we’re getting somewhere. These clusters are clearly distinct from one another, so something is going right. The last thing I want to do is see what these averages look like as full RGB images. How can we make an encoded image into a full size image? Decode it! So we’ll just run these averaged images through the second half of our autoencoder model to bring them back to life. Here we go!

And immediately I realize I’ve created something creepy. Really cool, but creepy. This makes it a lot clearer what each cluster is looking at. For this dataset and how I’ve built the autoencoder, it mostly focuses on lighting and shape, but it’s pretty obvious that this could be used to profile by skin color or other physical features. Of course, it could also be used to separate unlabeled, non-face images into groups. Given the amount of unlabeled data in the world (which is a big deal in the autonomous vehicle world), this could be a really useful technique.

This is also a good reminder of how powerful these techniques are. I was able to build this model on my own, in a couple weeks, using only publicly-available data. The sophistication of similar techniques at Google or Facebook must be staggering.

Thanks for reading! Feel free to leave feedback or comments or links to a project you’ve done like this one. Happy clustering!

Short List for Healthy Living

Here’s what healthy living means to me, in its most basic components:

1. Put good things in my body.

Pretty much the only things that count are water and vegetables. A good protein and some fish oil. Most other things are so-so.

2. Don’t put bad things in my body.

Sugar is the main thing I try to avoid, which includes alcohol (it has a lot of sugar in it). My body does fine with grains, and I drink too much caffeine.

3. Do some kind of intentional movement every day.

The more the better. If I can do yoga, lift weights, and go rock climbing all in one day I feel blissed out.

4. Interact with real people more than virtual people.

Depending on work, I can’t necessarily get away from screens, but limiting social media and connecting face-to-face is crucial.

5. Get enough good sleep.

I need to be in my bed for eight hours a night most nights. Doing all the other things makes the actual ‘sleep’ part a lot easier.

6. Notice how I feel.

For me, this includes physical sensations, emotions, social and spiritual reactions, and starting to understand that those are all manifestations of the same thing. It’s harder to do this when the feelings I notice are bad, but also more important.

7. Be open and have fun.

And don’t be too strict.

Dichotomy and Integration

Visual representation of Hegel's dialectic of Thesis, Antithesis, Synthesis

So many things in life feel impossibly true. Or rather, they could be true, if only their opposite didn’t also feel so obviously true. What’s the deal? Are our minds too rigid to understand nuanced things? Is there some deeper universal truth that manifests itself in two apparently incompatible ways? Like wave-particle duality in quantum mechanics? I don’t know at all. But I’ve been thinking about it a lot.

Yesterday I made a list of the apparent polar opposites I’ve been struggling with:

  • Computers vs. Nature
  • Science vs. Mysticism
  • Self Sufficiency vs. Communal Dependency
  • Solitude vs. Interconnectedness
  • Art vs. Art
  • Being Too Much vs. Being Too Little
  • Life and Everything Matters vs. Nothing Matters and We Die
  • Internet vs. No Internet
  • The Mind as a Solution vs. The Mind as the Problem
  • Accepting Life vs. Making Shit Happen

It was a fun day. But some of these things are really challenging to me, particularly when I start to think about deeper purpose in my life. As a practical example, I’ve been thinking about getting a job in tech and machine learning, doing data analysis work similar to the research I did in grad school. This is a major departure from teaching meditation, which I do now (and which I’d continue on the side). I’m excited for the intellectual possibilities, using my brain in an analytical way that I haven’t done in a while. But I’m also hesitant about the idea of being in front of a computer for long hours and at a desk again for the first time in a few years.

Is it possible to integrate these two worlds? Can I live a fluid, flexible, creative life, and also work a full-time, data-focused, tech job? I suspect that this barrier is largely self-imposed, and that maybe you don’t even see these two as being separate. And you are probably right. My specific personal experiences and biases are categorizing these two ideas of how to live, finding all the differences and ignoring the similarities and opportunities.

I have a vague memory of reading or hearing or imagining this idea one time. Maybe it was yoga. “The more it seems like two opposites can’t both be true, the more likely they both are.” Or something. But that’s the gist of it. Some of these things are hard to admit. That people can be racist and loving at the same time. That people who commit heinous crimes deserve our deepest compassion. That the world is terrible and beautiful. That our minds both destroy us and create us.

And in a practical way, that doesn’t really help me answer the question of whether I’ll be happy working at an office again. But it does remind me that there are not only as many ways to live as there are people in the world, but that every person might live a different life every day. And that, chances are, it’s probably a good bet to stay open to the world, to connection, and to new experience, and to treat everyone the best I can along the way.

#015: 20-Minute Breath/Body/Heart Meditation

If you’ve tried out some of the 10-minute meditations, see if you can find twenty minutes for this sit. If it feels daunting to do that, notice where those feelings are coming from. Does it feel like you don’t have enough time? Are you worried you’ll get impatient? Is it just scary? Try it out, and see what happens. Twenty minutes is a wonderful length for a meditation, and allows for deeper work than just ten.

I hope you’ve been enjoying the podcast! Your kind words (and of course any ratings, reviews, subscribes, etc) are greatly appreciated.

21 minutes.

Taking the Plunge Part V: Sadness and Living

Where to start. I just turned 33. I feel kind of old. I have a knee injury from playing ultimate frisbee. I live in Portland, OR, and bought a house. I saw the total solar eclipse and it was the most amazing thing I’ve ever seen (Chile/Argentina 2019 anyone??). I’ve been exploring parts of myself I haven’t explored much before. Sadness, mostly.

I just read through a lot of my old blog posts. I used to do this thing where I’d write an update on my life (Pt. 1, Pt. 2, Pt. 3, Pt. 3.5, Pt. 4), so I guess this is kind of like a continuation of that. But also kind of a different one. Because here’s the thing: I don’t really write anymore. Or rather, I haven’t written anything much for a long time. I like the things I used to write. But I only made one blog post in all of 2016. I had a few earlier this year, mostly about how I’m scared/curious about robots and economics. Very forced. Nothing about my feelings, except that short, cryptic thing about wolves. I also don’t do most of those things I wrote about being energized by in those previous posts: art, music, running (I hurt my knee), yoga (maybe once a week).

I want to get into that in this post. I’m probably going to be a little more long-winded and honest than is comfortable (Hi Mom!), but hey, if there can’t be a little honesty on the internet, what’s the point? (I’m also not going to touch on how fucked up and crazy the world and our country have been through the last year, and how awful some of those things have been for so long, and especially with Harvey happening right now, but those have all affected me deeply through this process.)

I haven’t been writing, or doing those other things, because I’ve been deeply sad and hurt for a long time. It’s hard to say that. Or write it, I guess, but imagine I’m saying it. I would maybe consider saying that I’ve been depressed, but I don’t want to use a word that might belittle other people’s experiences. But I have been utterly, completely sad. I’m starting to understand it better now, and I feel like writing about it might help. And that maybe the things I’ve learned so far might resonate with you, kind reader.

The catalyst for my sadness was a relationship that fell apart, repeatedly and painfully. I was sad in the relationship. I was sad when we were apart, and when we were back together. We dated for about two years, we had high highs, low lows. It was the kind of relationship that made me think about kids and family, but also there was something weird and off, and I was still sad, and then she broke things off for good a few months ago. And since then I’ve been heart-broken, a stronger word for the kind of sad I’ve been feeling, but slowly I’ve also become more understanding. I’ve cried more in the last six months than in the rest of my post-infant life combined. That has been terrible and wonderful and devastating and healing.

Some of the sadness has been acute: curling up in a ball, wailing (Men, I highly recommend you try some more wailing. It’s been trained out of us, but is the best), breaking down in front of friends or by myself or in public. Some of the sadness has been slow, lingering: feeling uninspired, bored, unmotivated, tired, unattractive. Sometimes I’ve felt like I’ve begun to rebuild my sense of self and wellbeing, and something will happen, or I’ll hear something, or I’ll see her, or nothing will happen, and it will be gone. My Lego castle of health and okay-ness will be overturned by some tantrum, the pieces scattered around the room and all the different thematic sets mixed together. I’ve begun to learn that sadness is not a bad thing, but it is a hard thing.

Part of the reason, I think, that this break up has been so hard for me, is that it has brought me so close to some of my deepest, most primal wounds. The first one I knew about, at least in some vague intellectual way: abandonment. My dad died when I was young, and this wound will always be with me. It is the kind that doesn’t fully heal but that I will slowly learn to have as part of me, I think. I don’t mind being alone (actually, I need it a lot), but I hate being left behind, or ignored, or excluded. Those things are going to continue to happen throughout my life, and now I’m realizing I need to learn what that feels like and how to be okay when it does happen. And to not try so hard to bend situations to my own liking in order to avoid having to feel that. I do this a lot, and it makes things worse over time.

The other one, which I’m only starting to have a grasp on, is feeling misunderstood. I hate it when I feel like people don’t get me, or that I’m not heard, or can’t express myself. But I also have this story I tell myself, that I’ve told myself for most of my life, that I’m complicated and hard to understand. I do things to perpetuate this story, even though it hurts me. And I’ve recently been playing with this idea that part of what drew me into this relationship in the first place (and why it’s been so painful to be out of touch since the breakup, unable to explain myself to her) was that I felt misunderstood and unheard by her, and that the relationship helped me keep this story alive. The truth is, I’m not that complicated or enigmatic. I’m smart and thoughtful and capable, and sometimes I don’t clearly vocalize my intentions or feelings, but that doesn’t mean it can’t be done. And lots of people are like this. I can learn to say what I need to say.

The sadness, the heart-break lingers, and I keep reminding myself of a few things. I’m not the only one who has felt this way. The fact that I can feel this way means my humanity works: I am capable of love. Time will help me heal, even though it’s already been so much longer than I feel is reasonable, and even though she seems to have gotten through it so much faster. Everybody is different, and this is my path. And I was sad when we were together, too.

These last two seem important, because they both point to this important thing. I am responsible for my own life. In retrospect, I can see that our relationship was destructive for me. I don’t mean to put blame on my ex-partner — she is a wonderful human and has her own path. But together we created this thing that allowed me to break myself down, to forget who I am, to fall apart. I feel less comfortable and confident and creative in my life than I did three years ago. I feel diminished. But this is something I had to learn. I’m in a hard spot, but I’m beginning to realize that I will be more compassionate, whole, and wise with the experience than without it.

The moments of calm and clarity are beginning to spring up closer together and stay for longer, even though I am occasionally (often) derailed. And this is a big part of why I say “sad” and hesitate with “depressed”: I do feel like I’m growing and, little by little, living more fully. I am realizing that feeling diminished now will allow me to thrive more deeply and thoroughly in the future. I will be more ready for future relationships. And I am slowly discovering myself again. All these threads of sadness and anger and confusion from my ex and from the relationship lead back to me. They are my things to deal with and learn from, not anyone else’s.

I am responsible for creating purpose in my own life.

I am responsible for how I interpret my experience of the world.

I am responsible for learning about myself, especially about the difficult parts.

I am responsible for how I express those difficult things in the world.

I am responsible for feeling my feelings.

Most importantly, I have loving friends and family who want to help me do all those things.

So, thank you, and I love you, and I’m sorry, and please forgive me, and thank you again.

And now I’m crying at a coffee shop again, but in a good way.