diff --git a/2011-07-12-converting-subselects-to-joins.html b/2011-07-12-converting-subselects-to-joins.html
new file mode 100644
index 0000000..49bf5fa
--- /dev/null
+++ b/2011-07-12-converting-subselects-to-joins.html
@@ -0,0 +1,144 @@
+
+
+
+
+
+
+mysql> SELECT Title
+ -> FROM Articles
+ -> WHERE ArticleId IN (
+ -> SELECT ArticleId
+ -> FROM Views
+ -> );
++-------------------------+
+| Title |
++-------------------------+
+| Interesting things |
+| More interesting things |
++-------------------------+
+2 rows in set (0.00 sec)
+
+
+What's wrong with this statement? It looks like it's trying to get a list of article names that have been viewed, and it seems to be doing its job. It's easy to read and to tell what's going on, even for someone with limited SQL experience. So what's there to fix?
+
+Notice that there are two SELECT statements above. The inner one is called a subselect or subquery. Just like parentheses in a mathematical expression ("5 * (2 + 8)"), you're walling off part of your statement and asking for it to be executed completely first. If that inner statement produces a huge data set (imagine these are views of, say, reddit articles), the engine may have to store that entire result before it can move on to finding the associated articles. In practice, database engines can often optimize this and avoid storing the whole result set, but there are no guarantees.
+
+Fortunately, most subselects can be converted directly to joins. Let's look at a few simple examples. Given the tables:
+
+CREATE TABLE Articles (
+ ArticleId bigint(20) unsigned NOT NULL AUTO_INCREMENT,
+ Title varchar(255) NOT NULL,
+ PRIMARY KEY (ArticleId)
+) ENGINE=InnoDB;
+
+CREATE TABLE Views (
+ ArticleId bigint(20) unsigned NOT NULL,
+ ViewedAt timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
+ KEY ArticleId (ArticleId),
+ CONSTRAINT Views_ibfk_1 FOREIGN KEY (ArticleId) REFERENCES Articles (ArticleId)
+) ENGINE=InnoDB;
+
+
+We'll start with a simple query:
+
+SELECT Title
+ FROM Articles
+ WHERE ArticleId IN (
+ SELECT ArticleId
+ FROM Views
+ );
+
+
+This is easy because it's a positive relationship: "IN" as opposed to "NOT IN". As a join, it looks like:
+
+SELECT DISTINCT Title
+ FROM Articles
+ JOIN Views USING (ArticleId);
+
+
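+"DISTINCT" is required because Views -> Articles is many -> one, and we only want each article title once. Note that it de-duplicates on Title, so two different articles that share a title collapse into one row, which the "IN" version would not do. We can use "USING" instead of "ON" because the column name is the same in both tables.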
+
+So, what if we have a negative query? Say we're looking for unviewed articles:
+
+SELECT Title
+ FROM Articles
+ WHERE ArticleId NOT IN (
+ SELECT ArticleId
+ FROM Views
+ );
+
+
+We can turn this into a join by using an outer join. A left outer join gives us back every row in the left table, paired with each matching row from the right table, or with NULLs in the right table's columns if no match exists. An outer join between these two tables would look like:
+
+mysql> SELECT Title, ViewedAt
+ -> FROM Articles
+ -> LEFT JOIN Views USING (ArticleId);
++-------------------------+---------------------+
+| Title | ViewedAt |
++-------------------------+---------------------+
+| Interesting things | 2011-07-12 14:09:28 |
+| More interesting things | 2011-07-12 14:09:29 |
+| More interesting things | 2011-07-12 14:09:31 |
+| Rather boring things | NULL |
++-------------------------+---------------------+
+4 rows in set (0.00 sec)
+
+
+We can then filter back down to just unviewed articles. We'll test the join column itself, Views.ArticleId, instead of referencing any other column; it's declared NOT NULL, so a NULL there can only mean that no matching row exists:
+
+mysql> SELECT Title
+ -> FROM Articles
+ -> LEFT JOIN Views USING (ArticleId)
+ -> WHERE Views.ArticleId IS NULL;
++----------------------+
+| Title |
++----------------------+
+| Rather boring things |
++----------------------+
+1 row in set (0.00 sec)
+
+
+Now take a query that filters inside the subselect, looking for all articles that have not been viewed since a certain timestamp:
+
+mysql> SELECT Title
+ -> FROM Articles
+ -> WHERE ArticleId NOT IN (
+ -> SELECT ArticleId
+ -> FROM Views
+ -> WHERE ViewedAt > '2011-07-12 14:09:30'
+ -> );
++----------------------+
+| Title |
++----------------------+
+| Interesting things |
+| Rather boring things |
++----------------------+
+2 rows in set (0.00 sec)
+
+
+This is slightly more complex to convert, because the naive conversion returns the wrong answer:
+
+mysql> SELECT Title
+ -> FROM Articles
+ -> LEFT JOIN Views USING (ArticleId)
+ -> WHERE ViewedAt > '2011-07-12 14:09:30'
+ -> AND Views.ArticleId IS NULL;
+Empty set (0.00 sec)
+
+
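+The naive version fails because both conditions are applied after the join: any row where Views.ArticleId is NULL also has a NULL ViewedAt, so the timestamp comparison can never be true for it. To solve this, we want the ViewedAt condition applied as part of the join, in the ON clause, and the Views.ArticleId condition applied afterwards, in the WHERE clause. We can rewrite this to: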
+
+mysql> SELECT Title
+ -> FROM Articles
+ -> LEFT JOIN Views ON (Articles.ArticleId = Views.ArticleId
+ -> AND ViewedAt > '2011-07-12 14:09:30')
+ -> WHERE Views.ArticleId IS NULL;
++----------------------+
+| Title |
++----------------------+
+| Interesting things |
+| Rather boring things |
++----------------------+
+2 rows in set (0.00 sec)
+
+
+
diff --git a/index.html b/index.html
index 2737257..e305428 100644
--- a/index.html
+++ b/index.html
@@ -20,6 +20,7 @@
2016-Feb-15: Cable modem channel party
2016-Feb-01: How to enrage your cable modem
2016-Feb-01: Hall of 2.4 GHz Shame, 2016 Edition
+2011-Jul-12: Converting subselects to joins
2011-Apr-22: Avoid MySQL round trips
2011-Apr-19: Video sharing sucks
2011-Apr-01: A new generation of Google MySQL tools
diff --git a/markdown/2011-07-12-converting-subselects-to-joins.md b/markdown/2011-07-12-converting-subselects-to-joins.md
new file mode 100644
index 0000000..e4882dc
--- /dev/null
+++ b/markdown/2011-07-12-converting-subselects-to-joins.md
@@ -0,0 +1,134 @@
+
+
+
+
+
+ mysql> SELECT Title
+ -> FROM Articles
+ -> WHERE ArticleId IN (
+ -> SELECT ArticleId
+ -> FROM Views
+ -> );
+ +-------------------------+
+ | Title |
+ +-------------------------+
+ | Interesting things |
+ | More interesting things |
+ +-------------------------+
+ 2 rows in set (0.00 sec)
+
+What's wrong with this statement? It looks like it's trying to get a list of article names that have been viewed, and it seems to be doing its job. It's easy to read and to tell what's going on, even for someone with limited SQL experience. So what's there to fix?
+
+Notice that there are two SELECT statements above. The inner one is called a subselect or subquery. Just like parentheses in a mathematical expression ("5 * (2 + 8)"), you're walling off part of your statement and asking for it to be executed completely first. If that inner statement produces a huge data set (imagine these are views of, say, reddit articles), the engine may have to store that entire result before it can move on to finding the associated articles. In practice, database engines can often optimize this and avoid storing the whole result set, but there are no guarantees.
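+
+One way to check what your engine actually does with the subselect is to ask for the query plan. This is only a sketch: the plan that comes back depends on the MySQL version and on table statistics, so no output is shown here.
+
+    -- Ask MySQL how it plans to execute the subselect version.
+    EXPLAIN SELECT Title
+      FROM Articles
+      WHERE ArticleId IN (
+        SELECT ArticleId
+          FROM Views
+      );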
+
+Fortunately, most subselects can be converted directly to joins. Let's look at a few simple examples. Given the tables:
+
+ CREATE TABLE Articles (
+ ArticleId bigint(20) unsigned NOT NULL AUTO_INCREMENT,
+ Title varchar(255) NOT NULL,
+ PRIMARY KEY (ArticleId)
+ ) ENGINE=InnoDB;
+
+ CREATE TABLE Views (
+ ArticleId bigint(20) unsigned NOT NULL,
+ ViewedAt timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
+ KEY ArticleId (ArticleId),
+ CONSTRAINT Views_ibfk_1 FOREIGN KEY (ArticleId) REFERENCES Articles (ArticleId)
+ ) ENGINE=InnoDB;
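+
+To follow along, the tables can be populated with rows that match the output shown in this post. This is a sketch: the titles and timestamps are taken from the results below, but the ArticleId values 1 through 3 are an assumption (they are what AUTO_INCREMENT would assign on an empty table).
+
+    -- Three articles; the ids 1, 2, 3 are assumed.
+    INSERT INTO Articles (ArticleId, Title) VALUES
+      (1, 'Interesting things'),
+      (2, 'More interesting things'),
+      (3, 'Rather boring things');
+
+    -- One view of article 1, two views of article 2, none of article 3.
+    INSERT INTO Views (ArticleId, ViewedAt) VALUES
+      (1, '2011-07-12 14:09:28'),
+      (2, '2011-07-12 14:09:29'),
+      (2, '2011-07-12 14:09:31');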
+
+We'll start with a simple query:
+
+ SELECT Title
+ FROM Articles
+ WHERE ArticleId IN (
+ SELECT ArticleId
+ FROM Views
+ );
+
+This is easy because it's a positive relationship: "IN" as opposed to "NOT IN". As a join, it looks like:
+
+ SELECT DISTINCT Title
+ FROM Articles
+ JOIN Views USING (ArticleId);
+
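+"DISTINCT" is required because Views -> Articles is many -> one, and we only want each article title once. Note that it de-duplicates on Title, so two different articles that share a title collapse into one row, which the "IN" version would not do. We can use "USING" instead of "ON" because the column name is the same in both tables.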
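+
+If titles aren't guaranteed to be unique, a variant that de-duplicates on the key instead might look like the following sketch. It returns ArticleId as an extra column, which may or may not be acceptable to the caller.
+
+    -- De-duplicate per article rather than per title.
+    SELECT DISTINCT Articles.ArticleId, Title
+      FROM Articles
+      JOIN Views USING (ArticleId);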
+
+So, what if we have a negative query? Say we're looking for unviewed articles:
+
+ SELECT Title
+ FROM Articles
+ WHERE ArticleId NOT IN (
+ SELECT ArticleId
+ FROM Views
+ );
+
+We can turn this into a join by using an outer join. A left outer join gives us back every row in the left table, paired with each matching row from the right table, or with NULLs in the right table's columns if no match exists. An outer join between these two tables would look like:
+
+ mysql> SELECT Title, ViewedAt
+ -> FROM Articles
+ -> LEFT JOIN Views USING (ArticleId);
+ +-------------------------+---------------------+
+ | Title | ViewedAt |
+ +-------------------------+---------------------+
+ | Interesting things | 2011-07-12 14:09:28 |
+ | More interesting things | 2011-07-12 14:09:29 |
+ | More interesting things | 2011-07-12 14:09:31 |
+ | Rather boring things | NULL |
+ +-------------------------+---------------------+
+ 4 rows in set (0.00 sec)
+
+We can then filter back down to just unviewed articles. We'll test the join column itself, Views.ArticleId, instead of referencing any other column; it's declared NOT NULL, so a NULL there can only mean that no matching row exists:
+
+ mysql> SELECT Title
+ -> FROM Articles
+ -> LEFT JOIN Views USING (ArticleId)
+ -> WHERE Views.ArticleId IS NULL;
+ +----------------------+
+ | Title |
+ +----------------------+
+ | Rather boring things |
+ +----------------------+
+ 1 row in set (0.00 sec)
+
+Now take a query that filters inside the subselect, looking for all articles that have not been viewed since a certain timestamp:
+
+ mysql> SELECT Title
+ -> FROM Articles
+ -> WHERE ArticleId NOT IN (
+ -> SELECT ArticleId
+ -> FROM Views
+ -> WHERE ViewedAt > '2011-07-12 14:09:30'
+ -> );
+ +----------------------+
+ | Title |
+ +----------------------+
+ | Interesting things |
+ | Rather boring things |
+ +----------------------+
+ 2 rows in set (0.00 sec)
+
+This is slightly more complex to convert, because the naive conversion returns the wrong answer:
+
+ mysql> SELECT Title
+ -> FROM Articles
+ -> LEFT JOIN Views USING (ArticleId)
+ -> WHERE ViewedAt > '2011-07-12 14:09:30'
+ -> AND Views.ArticleId IS NULL;
+ Empty set (0.00 sec)
+
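+The naive version fails because both conditions are applied after the join: any row where Views.ArticleId is NULL also has a NULL ViewedAt, so the timestamp comparison can never be true for it. To solve this, we want the ViewedAt condition applied as part of the join, in the ON clause, and the Views.ArticleId condition applied afterwards, in the WHERE clause. We can rewrite this to: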
+
+ mysql> SELECT Title
+ -> FROM Articles
+ -> LEFT JOIN Views ON (Articles.ArticleId = Views.ArticleId
+ -> AND ViewedAt > '2011-07-12 14:09:30')
+ -> WHERE Views.ArticleId IS NULL;
+ +----------------------+
+ | Title |
+ +----------------------+
+ | Interesting things |
+ | Rather boring things |
+ +----------------------+
+ 2 rows in set (0.00 sec)
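+
+To see why this works, it can help to look at the joined rows before the WHERE filter is applied. The query below is a sketch for inspection only. With the three views shown earlier, it should produce one row per article: "More interesting things" paired with its 14:09:31 view, and the other two titles paired with NULL.
+
+    -- Same join as above, without the IS NULL filter.
+    SELECT Title, ViewedAt
+      FROM Articles
+      LEFT JOIN Views ON (Articles.ArticleId = Views.ArticleId
+                          AND ViewedAt > '2011-07-12 14:09:30');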
+
+
diff --git a/markdown/index.md b/markdown/index.md
index ce1b20b..2d35c40 100644
--- a/markdown/index.md
+++ b/markdown/index.md
@@ -19,6 +19,7 @@
1. 2016-Feb-15: [Cable modem channel party](2016-02-15-cable-modem-channel-party.html)
1. 2016-Feb-01: [How to enrage your cable modem](2016-02-01-how-to-enrage-your-cable-modem.html)
1. 2016-Feb-01: [Hall of 2.4 GHz Shame, 2016 Edition](2016-02-01-hall-of-2-4-ghz-shame-2016-edition.html)
+1. 2011-Jul-12: [Converting subselects to joins](2011-07-12-converting-subselects-to-joins.html)
1. 2011-Apr-22: [Avoid MySQL round trips](2011-04-22-avoid-mysql-round-trips.html)
1. 2011-Apr-19: [Video sharing sucks](2011-04-19-video-sharing-sucks.html)
1. 2011-Apr-01: [A new generation of Google MySQL tools](2011-04-01-a-new-generation-of-google-mysql-tools.html)