Query performance considerations

Content Advanced Search queries can become complex, and certain patterns may lead to performance degradation or increased resource consumption. This document highlights common pitfalls and best practices to avoid them.

Performance considerations

Wildcard field expansion

Consideration: Wildcards that expand to many fields can significantly impact query performance and resource usage.

Potential issues:

  • Increased query execution time
  • Higher memory consumption
  • Possible query rejection if expansion is excessive
  • Expanded fields may include unintended data

Best practices:

| Practice | Reason |
| --- | --- |
| Prefer explicit field lists over wildcards | Predictable performance and clearer intent |
| Use wildcards only when the field count is known to be reasonable | Prevents unexpected expansions |
| Test wildcard queries with explain mode to understand field expansion | Visibility into actual fields being searched |
| Consider refactoring data structure if wildcards are required for many fields | Better data modelling can eliminate need for broad wildcards |

Example: Avoid:

{
  "text": {
    "query": "championship",
    "path": "content.*" // Unknown expansion count
  }
}

Example: Preferred:

{
  "text": {
    "query": "championship",
    "path": ["content.text", "content.altText", "heroMedia.title"] // Explicit, predictable
  }
}
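
One way to catch wildcard paths before a query is sent is a small client-side check. The sketch below is a hypothetical helper, not part of the search API: `find_wildcard_paths` is an illustrative name, and it assumes a query is held as a plain Python dict in which every `path` value is a string or a list of strings, as in the examples above.

```python
from typing import Any, Iterator


def find_wildcard_paths(node: Any) -> Iterator[str]:
    """Yield every "path" value containing "*" anywhere in a query dict."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "path":
                # "path" may be a single string or a list of strings.
                paths = value if isinstance(value, list) else [value]
                for path in paths:
                    if isinstance(path, str) and "*" in path:
                        yield path
            else:
                yield from find_wildcard_paths(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_wildcard_paths(item)


avoid = {"text": {"query": "championship", "path": "content.*"}}
preferred = {
    "text": {
        "query": "championship",
        "path": ["content.text", "content.altText", "heroMedia.title"],
    }
}

wildcards_in_avoid = list(find_wildcard_paths(avoid))
wildcards_in_preferred = list(find_wildcard_paths(preferred))
print(wildcards_in_avoid)      # ['content.*']
print(wildcards_in_preferred)  # []
```

A check like this only flags that a wildcard is present. It cannot tell how many fields the wildcard expands to, which is what explain mode is for.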

Query nesting depth

Consideration: Deeply nested compound queries can be difficult to understand and maintain, and may impact performance.

Potential issues:

  • Increased query complexity and execution time
  • Harder to debug and maintain
  • Possible stack or recursion limits in implementation

Best practices:

| Practice | Reason |
| --- | --- |
| Keep nesting shallow (prefer 2-3 levels maximum) | Easier to understand and maintain |
| Flatten nested queries when possible | Simpler structure with same logical outcome |
| Use multiple clauses at the same level | Takes advantage of Boolean logic without deep nesting |
| Consider breaking complex queries into simpler parts | May be easier to test and optimise individually |

Example: Avoid (deeply nested):

{
  "compound": {
    "must": [
      {
        "compound": {
          "must": [
            {
              "compound": {
                // Too many levels of nesting
              }
            }
          ]
        }
      }
    ]
  }
}

Example: Preferred (flattened):

{
  "compound": {
    "must": [
      { "text": { ... } },
      { "equal": { ... } },
      { "range": { ... } }
    ],
    "filter": [
      { "equal": { ... } },
      { "range": { ... } }
    ]
  }
}
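
Nesting depth can be measured and, in the simplest case, reduced mechanically. The sketch below assumes queries are plain Python dicts shaped like the examples above. `compound_depth` and `flatten_must` are hypothetical helper names, not API functions. Only one rewrite is attempted: a compound that contains nothing but `must` and that itself sits inside a `must` list is hoisted into its parent, since an AND of ANDs matches the same documents as a single AND. Whether relevance scores stay identical after such a rewrite is not confirmed by this document, so treat it as a starting point.

```python
from typing import Any

CLAUSE_TYPES = ("must", "should", "filter", "mustNot")


def compound_depth(query: Any) -> int:
    """Count the "compound" operators nested along the deepest branch."""
    if not isinstance(query, dict) or not isinstance(query.get("compound"), dict):
        return 0
    compound = query["compound"]
    children = [
        clause
        for clause_type in CLAUSE_TYPES
        for clause in compound.get(clause_type, [])
    ]
    return 1 + max((compound_depth(child) for child in children), default=0)


def flatten_must(query: Any) -> Any:
    """Hoist must-only compounds found inside "must" into the parent "must"."""
    if not isinstance(query, dict) or not isinstance(query.get("compound"), dict):
        return query
    flattened = {}
    for key, clauses in query["compound"].items():
        if not isinstance(clauses, list):
            flattened[key] = clauses  # leave any non-list setting untouched
            continue
        clauses = [flatten_must(clause) for clause in clauses]
        if key == "must":
            hoisted = []
            for clause in clauses:
                inner = clause.get("compound") if isinstance(clause, dict) else None
                if (
                    isinstance(inner, dict)
                    and set(inner) == {"must"}
                    and isinstance(inner["must"], list)
                ):
                    hoisted.extend(inner["must"])  # AND of ANDs is a single AND
                else:
                    hoisted.append(clause)
            clauses = hoisted
        flattened[key] = clauses
    return {**query, "compound": flattened}


text_clause = {"text": {"query": "football", "path": "heroMedia.title"}}
nested = {
    "compound": {
        "must": [{"compound": {"must": [{"compound": {"must": [text_clause]}}]}}]
    }
}
flat = flatten_must(nested)
print(compound_depth(nested), compound_depth(flat))  # 3 1
```

Compounds that mix clause types (for example `must` with `should`) are left in place, because hoisting them would change the query's logic.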

Number of clauses

Consideration: Queries with many clauses in a compound operator can become unwieldy and may impact performance.

Potential issues:

  • Longer execution time as more conditions are evaluated
  • Increased memory usage
  • Difficult to maintain and debug
  • May indicate the need for better data modelling

Best practices:

| Practice | Reason |
| --- | --- |
| Use in operator for multiple value matching | More efficient than many equal clauses in should |
| Consider restructuring data if many clauses are needed | Better schema design can simplify queries |
| Break complex queries into multiple simpler queries | Easier to test, debug, and optimise |
| Group related conditions in separate clause types | Use must, should, filter, mustNot appropriately |

Example: Avoid (many clauses):

{
  "compound": {
    "should": [
      { "equal": { "path": "tags", "value": "tag1" } },
      { "equal": { "path": "tags", "value": "tag2" } }
      // Many more clauses...
    ]
  }
}

Example: Preferred (single operator):

{
  "in": {
    "path": "tags",
    "values": ["tag1", "tag2", "tag3", "tag4", "tag5"]
  }
}
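
When queries are generated in code, the rewrite from many `equal` clauses to one `in` clause can be automated. The following is a minimal sketch under the assumption that clauses are plain Python dicts shaped like the examples above. `collapse_equals_to_in` is a hypothetical helper name. Any scoring difference between several `equal` clauses in `should` and a single `in` clause is not covered here and would need checking against the actual search service.

```python
from typing import Any


def collapse_equals_to_in(should: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge "equal" clauses that share a path into a single "in" clause."""
    values_by_path: dict[str, list[Any]] = {}
    others = []
    for clause in should:
        equal = clause.get("equal")
        is_simple_equal = (
            set(clause) == {"equal"}
            and isinstance(equal, dict)
            and set(equal) == {"path", "value"}
            and isinstance(equal["path"], str)
        )
        if is_simple_equal:
            values_by_path.setdefault(equal["path"], []).append(equal["value"])
        else:
            others.append(clause)  # anything else is passed through unchanged

    merged = []
    for path, values in values_by_path.items():
        if len(values) == 1:
            # A lone value gains nothing from "in"; keep it as "equal".
            merged.append({"equal": {"path": path, "value": values[0]}})
        else:
            merged.append({"in": {"path": path, "values": values}})
    return merged + others


should = [
    {"equal": {"path": "tags", "value": "tag1"}},
    {"equal": {"path": "tags", "value": "tag2"}},
    {"equal": {"path": "tags", "value": "tag3"}},
]
collapsed = collapse_equals_to_in(should)
print(collapsed)
# [{'in': {'path': 'tags', 'values': ['tag1', 'tag2', 'tag3']}}]
```

Merged clauses are placed before the untouched ones, so clause order may change. That is harmless for the Boolean outcome of `should`, but worth knowing when comparing generated queries.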

Large value arrays

Consideration: Using the in operator with very large arrays of values can impact performance and maintainability.

Potential issues:

  • Query payload size increases
  • More values to evaluate during query execution
  • May indicate better data modelling is needed
  • Difficult to maintain and test

Best practices:

| Practice | Reason |
| --- | --- |
| Keep in operator value arrays reasonably sized | Easier to maintain and better performance |
| Consider data model changes for very large value sets | For example, use categories instead of individual tags |
| Use range or prefix queries when applicable | More efficient for certain types of data |
| Break into multiple queries if value set is very large | May be more maintainable |
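
Breaking a very large value set into several queries can be sketched as below. This is an illustration only: `chunked_in_queries` is a hypothetical helper, and `max_values` is a limit chosen by the caller, since this document does not state a maximum array size. Values are assumed to be hashable (strings or numbers) so that duplicates can be dropped.

```python
from typing import Any, Iterator, Sequence


def chunked_in_queries(
    path: str, values: Sequence[Any], max_values: int
) -> Iterator[dict[str, Any]]:
    """Yield "in" queries that each carry at most max_values values."""
    if max_values < 1:
        raise ValueError("max_values must be at least 1")
    unique = list(dict.fromkeys(values))  # drop duplicates, keep first-seen order
    for start in range(0, len(unique), max_values):
        yield {"in": {"path": path, "values": unique[start:start + max_values]}}


tags = [f"tag{i}" for i in range(7)] + ["tag0"]  # 8 values, one duplicate
queries = list(chunked_in_queries("tags", tags, max_values=3))
print([len(q["in"]["values"]) for q in queries])  # [3, 3, 1]
```

The results of the separate queries then have to be merged by the caller, so this trades one large payload for extra round trips. A data model change may still be the better fix.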

Query complexity

Consideration: Complex queries should be monitored to ensure they perform well in production.

Best practices:

| Practice | Reason |
| --- | --- |
| Use explain mode during development | Understand query cost and performance characteristics |
| Monitor query performance in production | Identify slow queries early |
| Set reasonable timeouts | Prevent runaway queries from consuming resources |
| Test queries with production-like data volumes | Ensures performance is acceptable at scale |
| Document complex queries | Makes maintenance easier |
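
Monitoring can start on the client side with a simple timing wrapper. The sketch below makes several assumptions: `run_query` stands for whatever function your application already uses to send a query (it is not an API defined in this document), `timed_query` is a hypothetical helper, and the 0.5 second threshold is an arbitrary example value. Server-side timeouts are not shown because their configuration is not described here.

```python
import logging
import time
from typing import Any, Callable

logger = logging.getLogger("search.performance")


def timed_query(
    run_query: Callable[[dict[str, Any]], Any],
    query: dict[str, Any],
    slow_threshold_s: float = 0.5,
) -> tuple[Any, float]:
    """Run a query, measure wall-clock time, and log it if it is slow."""
    start = time.perf_counter()
    result = run_query(query)
    elapsed = time.perf_counter() - start
    if elapsed >= slow_threshold_s:
        logger.warning("Slow query (%.3fs): %s", elapsed, query)
    return result, elapsed


def fake_run_query(query: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for a real client call, used only for this example."""
    return {"hits": []}


result, elapsed = timed_query(
    fake_run_query, {"text": {"query": "football", "path": "heroMedia.title"}}
)
```

Logging the query alongside its duration makes it easier to identify which query shapes are slow once real traffic is flowing.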

Combining filters and searches

Consideration: Combining multiple filters and search conditions can lead to complex queries that may affect performance.

Best practices:

| Practice | Reason |
| --- | --- |
| Use filters for non-scoring conditions | Filters are generally more efficient than search clauses |
| Combine related conditions into single clauses | Reduces overall query complexity |
Example: Avoid:

{
  "compound": {
    "must": [
      {
        "range": {
          "path": "publishDate",
          "gte": "now-7d"
        }
      },
      {
        "text": {
          "query": "football",
          "path": "heroMedia.title"
        }
      }
    ]
  }
}

Example: Preferred:

{
  "compound": {
    "must": [
      {
        "text": {
          "query": "football",
          "path": "heroMedia.title"
        }
      }
    ],
    "filter": [
      {
        "range": {
          "path": "publishDate",
          "gte": "now-7d"
        }
      }
    ]
  }
}

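In the preferred form, the date range sits in filter because it does not need to contribute to relevance scoring. Isolating such conditions from scored clauses allows the search engine to optimise execution.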
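
The same move from `must` to `filter` can be applied programmatically when queries are built in code. This is a sketch under stated assumptions: `move_non_scoring_to_filter` is a hypothetical helper, and treating `range` and `equal` as non-scoring operators is a choice made for this example, not a rule stated by the API. Adjust the set to match the conditions that should not affect ranking in your queries.

```python
from typing import Any

# Assumption for this example: these operators never need to affect ranking.
NON_SCORING_OPERATORS = {"range", "equal"}


def move_non_scoring_to_filter(query: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a compound query with non-scoring "must" clauses in "filter"."""
    compound = dict(query["compound"])  # shallow copy; the input is not mutated
    scored, unscored = [], []
    for clause in compound.get("must", []):
        operator = next(iter(clause), None)
        if len(clause) == 1 and operator in NON_SCORING_OPERATORS:
            unscored.append(clause)
        else:
            scored.append(clause)
    if not unscored:
        return query  # nothing to move
    if scored:
        compound["must"] = scored
    else:
        # Whether a filter-only compound is accepted is not confirmed here.
        compound.pop("must", None)
    compound["filter"] = list(compound.get("filter", [])) + unscored
    return {"compound": compound}


date_range = {"range": {"path": "publishDate", "gte": "now-7d"}}
title_text = {"text": {"query": "football", "path": "heroMedia.title"}}

avoid = {"compound": {"must": [date_range, title_text]}}
preferred = move_non_scoring_to_filter(avoid)
print(preferred == {"compound": {"must": [title_text], "filter": [date_range]}})  # True
```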

Monitoring query performance

Use explain mode to understand how your queries perform:

{
  "query": {
    // your query
  },
  "explain": true
}

Key metrics to review:

| Metric | What to look for |
| --- | --- |
| _explain.cost.total | Overall query cost; lower is better |
| _explain.cost.breakdown.fields | Number of fields being searched; check wildcard expansions |
| Response time | How long the query takes to execute |
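
Reading these metrics in code might look like the sketch below. It assumes the dotted metric names map to nested keys in the response body (`_explain`, then `cost`, then `total` or `breakdown.fields`), which this document does not confirm. `with_explain` and `summarise_explain` are hypothetical helpers, and the sample response uses made-up numbers purely for illustration.

```python
from typing import Any


def with_explain(query: dict[str, Any]) -> dict[str, Any]:
    """Wrap a query in a request body that has explain mode switched on."""
    return {"query": query, "explain": True}


def summarise_explain(response: dict[str, Any]) -> dict[str, Any]:
    """Pull the key cost metrics out of a response, tolerating missing keys."""
    cost = response.get("_explain", {}).get("cost", {})
    return {
        "total": cost.get("total"),
        "fields": cost.get("breakdown", {}).get("fields"),
    }


request_body = with_explain(
    {"text": {"query": "championship", "path": "content.*"}}
)

# Illustrative response only; the real shape and values may differ.
sample_response = {"_explain": {"cost": {"total": 12, "breakdown": {"fields": 40}}}}
summary = summarise_explain(sample_response)
print(summary)  # {'total': 12, 'fields': 40}
```

A field count far above what you expect for a given `path` is a hint that a wildcard is expanding more widely than intended.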

See Explain mode for detailed information on query diagnostics.