Database best practices for future scalability

Ian Gulliver
2019-04-15 03:41:48 +00:00
parent 9bc8a771c9
commit c06104b589
5 changed files with 198 additions and 0 deletions


@@ -0,0 +1,42 @@
<!--# set var="title" value="Database best practices for future scalability" -->
<!--# set var="date" value="August 8, 2011" -->
<!--# include file="include/top.html" -->
<p>There's a perpetual debate about how much effort to put into scalability when first designing and building a modern web application. The opposing arguments are roughly: “Scalability doesn't matter if your app isn't successful; features should come first” vs. “When you know that you need to scale, you won't have time to do it.” Below are some suggestions for scalable MySQL schema design that should get you on the right path, without being onerous enough to slow you down. Let's start with things that are almost free:</p>
<ol>
<li><p>Use InnoDB. You're eventually likely to want transactions, foreign keys, row-level locking or relative crash safety. You can Google and find lots of InnoDB vs. MyISAM comparisons, but if you don't want to dig too deeply, “just use InnoDB” is a safe place to start.</p></li>
<li><p>Don't store data that you don't intend to use relationally. Using MySQL as a key/value store or to store encoded data (XML, <a href="http://code.google.com/p/protobuf/">protocol buffers</a>, etc.) in BLOB/TEXT may work, but dedicated key/value stores are likely to be more efficient.</p></li>
<li><p>Try to design your schema with as few hierarchies as possible. For example, take the table layout shown below, with arrows representing many-to-one relationships. Perhaps A=“Users”, B=“DVDs Owned”, C=“Logins”, D=“Times Watched”, while E=“Administrators” and F=“Changes”. These are two hierarchies, since A and E have no links to each other. Minimizing the number of hierarchies (keeping it to just one is awesome!) makes it easier to shard later. Schemas with cross-links (say F also links to A, or a table records transfers between two different users, linking to A twice) are very difficult to shard.<br>
<img src="data:image/png;base64,<!--# include file="images/db-hierarchy.png.base64" -->" alt="" style="background: white;"></p></li>
<li><p>Use BIGINT UNSIGNED NOT NULL (64-bit required numeric) primary keys on every table. AUTO_INCREMENT is fine, at least to start. You can skip this for many-to-many link tables; just put both link columns in the primary key. Having single-column, numeric primary keys makes it easier to do things like drift checking and traversing between tables.</p></li>
<li><p>Use BIGINT instead of INT for all keys. The cost in space (4 vs. 8 bytes) and compute time is so small that you'll never notice it, but the cost of a schema change later to increase the field size, or an accidental wraparound, can be enormous. As you expand shards, your key space becomes sparse and grows rapidly, so wide keys are critical.</p></li>
<li><p>Split off your data access code into an <a href="http://en.wikipedia.org/wiki/Object-relational_mapping">ORM layer</a>, even if it's very simple to begin with. This will be where your read/write split, shard lookups and change tracking will live later.</p></li>
<li><p>Don't use <a href="http://dev.mysql.com/doc/refman/5.1/en/triggers.html">triggers</a> or <a href="http://dev.mysql.com/doc/refman/5.1/en/stored-routines.html">stored routines</a>. Keep this logic in your ORM layer instead, to give yourself a single source of truth under one versioning system.</p></li>
<li><p><a href="2011-07-12-converting-subselects-to-joins.html">Avoid subselects; use joins instead.</a></p></li>
<li><p>Don't use <a href="http://dev.mysql.com/doc/refman/5.1/en/views.html">views</a>, unless you're using a third-party ORM (Rails, Django) that mandates a schema structure that isn't ideal.</p></li>
<li><p>Avoid network round-trips whenever you can. Use the <a href="http://dev.mysql.com/doc/refman/5.5/en/insert.html">multi-row insert syntax</a> where possible. Enable the <a href="http://dev.mysql.com/doc/refman/5.1/en/mysql-real-connect.html">CLIENT_MULTI_STATEMENTS</a> flag at connect time, then send groups of statements separated by ";".</p></li>
</ol>
<p>Then, things that cost development time, in increasing order of complexity:</p>
<ol>
<li><p>Use <a href="http://dev.mysql.com/doc/refman/5.1/en/ansi-diff-foreign-keys.html">foreign keys</a>. Don't make references nullable; use a flag field to mark whole (sub-)hierarchies deleted instead. Combined with the hierarchy rule above, this guarantees that you'll never end up with orphaned rows.</p></li>
<li><p><a href="http://dev.mysql.com/doc/refman/5.1/en/replication.html">Write to masters; read from slaves</a>. This can be quite complex, since you have to worry about replication delay. For example, you can't have one web page hit cause a write, then have the next hit render the results by reading from a slave, because the write might not have replicated yet. However, this enables significant scaling, because hooking up more slaves is much easier than sharding.</p></li>
<li><p>Don't store event-based data as one row per event. If you record page views or clicks in the database, aggregate that data into one row per hour, or per day. You can keep logs of events outside of the database in case you need to change aggregation and re-generate historical data, but don't keep every event in a hot table.</p></li>
<li><p>Stop using AUTO_INCREMENT. Instead, keep a table <a href="http://www.reddit.com/r/mysql/comments/jcw8o/database_best_practices_for_future_scalability/c2b2o4v">IdSequences</a>, and do something like: </p>
<p><code>
BEGIN;
UPDATE IdSequences SET LastId=LAST_INSERT_ID(LastId+Increment)
WHERE TableName='A' AND ColumnName='b';
INSERT INTO A (b, c) VALUES (LAST_INSERT_ID(), foo);
COMMIT;
</code></p>
<p>This lets you change IdSequences later to modify your sharding scheme.</p></li>
<li><p>Create an empty shard (new database, same schema, no data) and add test rows. Teach your application to choose which shard to talk to. This will require some method to look up a shard for the root of each hierarchy; keep all data linked to a particular root on the same shard, so you can JOIN it. At its simplest, the lookup can be (ID mod NumShards). If you have uneven shard growth, you may need an indirection table to map from virtual shard (determined by modular division with a large divisor) to physical database.</p></li>
</ol>
<!--# include file="include/bottom.html" -->


@@ -0,0 +1,103 @@
[Base64-encoded PNG image data omitted (103 lines); apparently the hierarchy diagram that the page above includes as images/db-hierarchy.png.base64]


@@ -20,6 +20,7 @@
<li>2016-Feb-15: <a href="2016-02-15-cable-modem-channel-party.html">Cable modem channel party</a></li>
<li>2016-Feb-01: <a href="2016-02-01-how-to-enrage-your-cable-modem.html">How to enrage your cable modem</a></li>
<li>2016-Feb-01: <a href="2016-02-01-hall-of-2-4-ghz-shame-2016-edition.html">Hall of 2.4 GHz Shame, 2016 Edition</a></li>
<li>2011-Aug-08: <a href="2011-08-08-database-best-practices-for-future-scalability.html">Database best practices for future scalability</a></li>
<li>2011-Jul-12: <a href="2011-07-12-converting-subselects-to-joins.html">Converting subselects to joins</a></li>
<li>2011-Apr-22: <a href="2011-04-22-avoid-mysql-round-trips.html">Avoid MySQL round trips</a></li>
<li>2011-Apr-19: <a href="2011-04-19-video-sharing-sucks.html">Video sharing sucks</a></li>


@@ -0,0 +1,51 @@
<!--# set var="title" value="Database best practices for future scalability" -->
<!--# set var="date" value="August 8, 2011" -->
<!--# include file="include/top.html" -->
There's a perpetual debate about how much effort to put into scalability when first designing and building a modern web application. The opposing arguments are roughly: “Scalability doesn't matter if your app isn't successful; features should come first” vs. “When you know that you need to scale, you won't have time to do it.” Below are some suggestions for scalable MySQL schema design that should get you on the right path, without being onerous enough to slow you down. Let's start with things that are almost free:
1. Use InnoDB. You're eventually likely to want transactions, foreign keys, row-level locking or relative crash safety. You can Google and find lots of InnoDB vs. MyISAM comparisons, but if you don't want to dig too deeply, “just use InnoDB” is a safe place to start.
1. Don't store data that you don't intend to use relationally. Using MySQL as a key/value store or to store encoded data (XML, [protocol buffers](http://code.google.com/p/protobuf/), etc.) in BLOB/TEXT may work, but dedicated key/value stores are likely to be more efficient.
1. Try to design your schema with as few hierarchies as possible. For example, take the table layout shown below, with arrows representing many-to-one relationships. Perhaps A=“Users”, B=“DVDs Owned”, C=“Logins”, D=“Times Watched”, while E=“Administrators” and F=“Changes”. These are two hierarchies, since A and E have no links to each other. Minimizing the number of hierarchies (keeping it to just one is awesome!) makes it easier to shard later. Schemas with cross-links (say F also links to A, or a table records transfers between two different users, linking to A twice) are very difficult to shard.<br>
<img src="data:image/png;base64,<!--# include file="images/db-hierarchy.png.base64" -->" alt="" style="background: white;">
1. Use BIGINT UNSIGNED NOT NULL (64-bit required numeric) primary keys on every table. AUTO\_INCREMENT is fine, at least to start. You can skip this for many-to-many link tables; just put both link columns in the primary key. Having single-column, numeric primary keys makes it easier to do things like drift checking and traversing between tables.
1. Use BIGINT instead of INT for all keys. The cost in space (4 vs. 8 bytes) and compute time is so small that you'll never notice it, but the cost of a schema change later to increase the field size, or an accidental wraparound, can be enormous. As you expand shards, your key space becomes sparse and grows rapidly, so wide keys are critical.
1. Split off your data access code into an [ORM layer](http://en.wikipedia.org/wiki/Object-relational_mapping), even if it's very simple to begin with. This will be where your read/write split, shard lookups and change tracking will live later.
1. Don't use [triggers](http://dev.mysql.com/doc/refman/5.1/en/triggers.html) or [stored routines](http://dev.mysql.com/doc/refman/5.1/en/stored-routines.html). Keep this logic in your ORM layer instead, to give yourself a single source of truth under one versioning system.
1. [Avoid subselects; use joins instead.](2011-07-12-converting-subselects-to-joins.html)
1. Don't use [views](http://dev.mysql.com/doc/refman/5.1/en/views.html), unless you're using a third-party ORM (Rails, Django) that mandates a schema structure that isn't ideal.
1. Avoid network round-trips whenever you can. Use the [multi-row insert syntax](http://dev.mysql.com/doc/refman/5.5/en/insert.html) where possible. Enable the [CLIENT\_MULTI\_STATEMENTS](http://dev.mysql.com/doc/refman/5.1/en/mysql-real-connect.html) flag at connect time, then send groups of statements separated by ";".
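
The multi-row insert advice in the last item above can be sketched in Python. This is a minimal illustration rather than code from the original post: the helper name `build_multi_row_insert`, the `Logins` table and its columns are hypothetical, and the `%s` placeholder style assumes a MySQL driver such as PyMySQL or mysqlclient.

```python
from typing import Any, List, Sequence, Tuple


def build_multi_row_insert(
    table: str, columns: Sequence[str], rows: Sequence[Sequence[Any]]
) -> Tuple[str, List[Any]]:
    """Build one INSERT statement that carries every row in `rows`.

    `table` and `columns` are interpolated directly into the SQL text, so
    they must come from trusted code, never from user input. The row values
    are returned separately so the driver can bind them as parameters.
    """
    if not rows:
        raise ValueError("need at least one row")
    width = len(columns)
    if any(len(row) != width for row in rows):
        raise ValueError("every row must have one value per column")

    # One "(%s, %s, ...)" group per row, all in a single VALUES clause.
    row_placeholder = "(" + ", ".join(["%s"] * width) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), ", ".join([row_placeholder] * len(rows))
    )
    # Flatten the rows into one parameter list, in placeholder order.
    params = [value for row in rows for value in row]
    return sql, params


# Hypothetical example data: two logins sent in a single statement.
sql, params = build_multi_row_insert(
    "Logins",
    ["UserId", "LoginTime"],
    [(1, "2011-08-08 10:00:00"), (2, "2011-08-08 10:05:00")],
)
```

With a DB-API cursor, the pair would be sent as `cursor.execute(sql, params)`, which costs one round trip no matter how many rows are in the batch.
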
Then, things that cost development time, in increasing order of complexity:
1. Use [foreign keys](http://dev.mysql.com/doc/refman/5.1/en/ansi-diff-foreign-keys.html). Don't make references nullable; use a flag field to mark whole (sub-)hierarchies deleted instead. Combined with the hierarchy rule above, this guarantees that you'll never end up with orphaned rows.
1. [Write to masters; read from slaves](http://dev.mysql.com/doc/refman/5.1/en/replication.html). This can be quite complex, since you have to worry about replication delay. For example, you can't have one web page hit cause a write, then have the next hit render the results by reading from a slave, because the write might not have replicated yet. However, this enables significant scaling, because hooking up more slaves is much easier than sharding.
1. Don't store event-based data as one row per event. If you record page views or clicks in the database, aggregate that data into one row per hour, or per day. You can keep logs of events outside of the database in case you need to change aggregation and re-generate historical data, but don't keep every event in a hot table.
1. Stop using AUTO\_INCREMENT. Instead, keep a table [IdSequences](http://www.reddit.com/r/mysql/comments/jcw8o/database_best_practices_for_future_scalability/c2b2o4v), and do something like:
```
BEGIN;
UPDATE IdSequences SET LastId=LAST_INSERT_ID(LastId+Increment)
WHERE TableName='A' AND ColumnName='b';
INSERT INTO A (b, c) VALUES (LAST_INSERT_ID(), foo);
COMMIT;
```
This lets you change IdSequences later to modify your sharding scheme.
1. Create an empty shard (new database, same schema, no data) and add test rows. Teach your application to choose which shard to talk to. This will require some method to look up a shard for the root of each hierarchy; keep all data linked to a particular root on the same shard, so you can JOIN it. At its simplest, the lookup can be (ID mod NumShards). If you have uneven shard growth, you may need an indirection table to map from virtual shard (determined by modular division with a large divisor) to physical database.
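
The shard lookup described in the last item above can be sketched in Python. This is a minimal sketch under stated assumptions, not the author's implementation: the constant `NUM_VIRTUAL_SHARDS`, the mapping `VIRTUAL_TO_PHYSICAL`, the database names and the function names are all invented for illustration, and in a real deployment the mapping would live in the indirection table rather than in a module-level dict.

```python
# Hypothetical "large divisor": many more virtual shards than databases,
# so that individual virtual shards can be moved as load becomes uneven.
NUM_VIRTUAL_SHARDS = 4096

# Hypothetical physical databases.
PHYSICAL_DATABASES = ["db0", "db1"]

# Stand-in for the indirection table: virtual shard -> physical database.
# It starts as an even spread across the physical databases.
VIRTUAL_TO_PHYSICAL = {
    v: PHYSICAL_DATABASES[v % len(PHYSICAL_DATABASES)]
    for v in range(NUM_VIRTUAL_SHARDS)
}


def virtual_shard(root_id: int) -> int:
    """Map the ID of a hierarchy root (e.g. a user) to a virtual shard."""
    return root_id % NUM_VIRTUAL_SHARDS


def physical_database(root_id: int) -> str:
    """Find the database holding all rows linked to this hierarchy root."""
    return VIRTUAL_TO_PHYSICAL[virtual_shard(root_id)]


def move_virtual_shard(shard: int, database: str) -> None:
    """Repoint one virtual shard; its rows must already have been copied."""
    VIRTUAL_TO_PHYSICAL[shard] = database
```

Because every row in a hierarchy is looked up through its root ID, all of a root's data lands in the same database and can still be joined. Repointing a virtual shard changes the lookup result for every root that maps to it, without changing any IDs.
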
<!--# include file="include/bottom.html" -->


@@ -19,6 +19,7 @@
1. 2016-Feb-15: [Cable modem channel party](2016-02-15-cable-modem-channel-party.html)
1. 2016-Feb-01: [How to enrage your cable modem](2016-02-01-how-to-enrage-your-cable-modem.html)
1. 2016-Feb-01: [Hall of 2.4 GHz Shame, 2016 Edition](2016-02-01-hall-of-2-4-ghz-shame-2016-edition.html)
1. 2011-Aug-08: [Database best practices for future scalability](2011-08-08-database-best-practices-for-future-scalability.html)
1. 2011-Jul-12: [Converting subselects to joins](2011-07-12-converting-subselects-to-joins.html)
1. 2011-Apr-22: [Avoid MySQL round trips](2011-04-22-avoid-mysql-round-trips.html)
1. 2011-Apr-19: [Video sharing sucks](2011-04-19-video-sharing-sucks.html)