3.6. Design for optimised query resolution

  • Skillswap talk
  • 28th May 2008

3.6.01. Introduction

  • Pace of talk
  • Principals not query syntax
  • Online notes
  • Questions welcome anytime
  • Prompt thought about the issues

3.6.02. Terms

3.6.03. Domains and sets

  • Domains
    • Set of possible data
    • 8-bit unsigned integer example
  • Powersets
    • Set
      • Odd numbers less than 10
      • (1,3,5,7,9)
    • Powerset
      • Set of sets
        • 5 digit collection of non-repeating integers less than 10
        • ((1,3,5,7,9),(2,3,5,7,9),(1,4,5,7,9),(1,3,6,7,9)...)

3.6.04. Query and snapshot

  • Query + database snapshot returns one set of actual data
    • but a query alone defines a powerset of possible data
  • Load
    • Metrics for query performance
    • Time is easy to measure
    • Load is better in the real world
      • Query might be fast to execute, but require a lot of memory

3.6.05. Design in context

  • Query design needs context
    • Hardware information
    • Competition information (other Application Servers)
      • MySQL, Apache, Tomcat, SVN
    • Data information
      • Likely number of rows, lifetime of information, interval distribution within domains
    • Database design information
    • Database implementation information
      • Transaction granularity
    • Use information
      • Balance between read (SELECT) and write (INSERT, UPDATE, DELETE) ops

3.6.06. Elementary steps

  • Ask MySQL what it's doing
    • Query analysis with EXPLAIN
  • Cache the query
    • no free lunch, there's almost always a cost
    • Simple
      • Repetitive queries
      • Rare since requires constant snapshot
      • What's the point of the query

3.6.07. Query caching

  • Complex - Summary table
    • Multiple queries drawing data from subset of DB data
    • Cache subset
      • Query subset rather than whole database
  • Complex - Age analyses
    • Webstats, meaningful even if not last 24 hours
      • Run the heavyweight analysis queries during downtime
      • Cache results

3.6.08. Avoid the query

  • Cache the object
    • Higher level primitives
    • Object persistence
  • Get only the critical data
    • What's critical now
    • What's required in 10s, 1m, 5m

3.6.09. Simplify

  • Route/path to information
    • Introduction
      • Join is an arc, join time is arc length
      • Table is a node
    • Graph traversal problem
      • Dijkstra's shortest path
        • Heuristic
          • Search for optimal
          • General, each node defines a population of tuples

3.6.10. Traversal

  • Traversal of graph
    • (aka. positive closure)
    • gives intermediate relation
  • Goal
    • Intermediate relation minimisation
      • Space (memory, disk triangle)
      • Time (fast joins versus slow joins)
        • Fewer relations
        • Fewer active attributes (horizontal cropping)
        • Fewer rows (vertical cropping)
          • Role of single-relation WHERE statements

3.6.11. Optimise (Indexing)

  • Giving the Optimiser hints
  • Indexing
    • Simple - single attribute
      • Index(A)
      • Reduce sparse domains (e.g. text)
    • Composite - multiple attributes (typically pairs)
      • Index (A,B)
      • Some subtleties in here

3.6.12. Optimise (Limit)

  • Limit
    • Valueable on sorts (0,10 faster than 0,1000!)
    • Beware limit referencing
      • Limit 1000,10 will fetch the first 1000, then give you the 10 you want
        • Search engines
      • Better to use WHERE restriction if possible

3.6.13. Optimise (Delayed Joins)

  • SELECT FROM (SELECT id FROM ... LIMIT 0,10)
    • Subqueries not the silver bullet
    • Often not cached or indexed
  • Views help manage complex relationships but not performance
  • Changing picture
    • Even between MySQL 5.0 and 5.1
    • MySQL 6.0
      • Falcon (transactional) Storage
      • Better subquery optimisation

3.6.14. References