3.6. Design for optimised query resolution
- Skillswap talk
- 28th May 2008
3.6.01. Introduction
- Pace of talk
- Principals not query syntax
- Online notes
- Questions welcome anytime
- Prompt thought about the issues
3.6.02. Terms
- Tables/relations
- Rows/tuples
- Columns/attributes
- Joins
- Indexes
3.6.03. Domains and sets
- Domains
- Set of possible data
- 8-bit unsigned integer example
- Powersets
- Set
- Odd numbers less than 10
- (1,3,5,7,9)
- Powerset
- Set of sets
- 5 digit collection of non-repeating integers less than 10
- ((1,3,5,7,9),(2,3,5,7,9),(1,4,5,7,9),(1,3,6,7,9)...)
3.6.04. Query and snapshot
- Query + database snapshot returns one set of actual data
- but a query alone defines a powerset of possible data
- Load
- Metrics for query performance
- Time is easy to measure
- Load is better in the real world
- Query might be fast to execute, but require a lot of memory
3.6.05. Design in context
- Query design needs context
- Hardware information
- Competition information (other Application Servers)
- MySQL, Apache, Tomcat, SVN
- Data information
- Likely number of rows, lifetime of information, interval distribution within domains
- Database design information
- Database implementation information
- Use information
- Balance between read (SELECT) and write (INSERT, UPDATE, DELETE) ops
3.6.06. Elementary steps
- Ask MySQL what it's doing
- Query analysis with EXPLAIN
- Cache the query
- no free lunch, there's almost always a cost
- Simple
- Repetitive queries
- Rare since requires constant snapshot
- What's the point of the query
3.6.07. Query caching
- Complex - Summary table
- Multiple queries drawing data from subset of DB data
- Cache subset
- Query subset rather than whole database
- Complex - Age analyses
- Webstats, meaningful even if not last 24 hours
- Run the heavyweight analysis queries during downtime
- Cache results
3.6.08. Avoid the query
- Cache the object
- Higher level primitives
- Object persistence
- Get only the critical data
- What's critical now
- What's required in 10s, 1m, 5m
3.6.09. Simplify
- Route/path to information
- Introduction
- Join is an arc, join time is arc length
- Table is a node
- Graph traversal problem
- Dijkstra's shortest path
- Heuristic
- Search for optimal
- General, each node defines a population of tuples
3.6.10. Traversal
- Traversal of graph
- (aka. positive closure)
- gives intermediate relation
- Goal
- Intermediate relation minimisation
- Space (memory, disk triangle)
- Time (fast joins versus slow joins)
- Fewer relations
- Fewer active attributes (horizontal cropping)
- Fewer rows (vertical cropping)
- Role of single-relation WHERE statements
3.6.11. Optimise (Indexing)
- Giving the Optimiser hints
- Indexing
- Simple - single attribute
- Index(A)
- Reduce sparse domains (e.g. text)
- Composite - multiple attributes (typically pairs)
- Index (A,B)
- Some subtleties in here
3.6.12. Optimise (Limit)
- Limit
- Valueable on sorts (0,10 faster than 0,1000!)
- Beware limit referencing
- Limit 1000,10 will fetch the first 1000, then give you the 10 you want
- Better to use WHERE restriction if possible
3.6.13. Optimise (Delayed Joins)
- SELECT FROM (SELECT id FROM ... LIMIT 0,10)
- Subqueries not the silver bullet
- Often not cached or indexed
- Views help manage complex relationships but not performance
- Changing picture
- Even between MySQL 5.0 and 5.1
- MySQL 6.0
- Falcon (transactional) Storage
- Better subquery optimisation
3.6.14. References
- External References for Section 3.6 only