AutoML Hyperparameter Optimization

AutoML and hyperparameter optimization rules for Python ML projects that use Ray Tune, Optuna, PyCaret, and time-series AutoML libraries.

How to use
  1. Copy the rule content.
  2. In your project root, create .cursorrules or .cursor/rules/automl-hyperparameter-optimization.mdc.
  3. Paste the content and save.

AutoML and Hyperparameter Optimization Rules

Scope

  • Use AutoML to accelerate model exploration, not to bypass problem framing, validation design, or explainability.
  • Start with a simple baseline model and fixed metric before launching a search.
  • Keep training, evaluation, feature generation, and search configuration separate.
  • Record datasets, splits, metric definitions, random seeds, library versions, and search spaces for every run.
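
One way to make the record-keeping rule concrete is to write a small manifest per run. The sketch below uses only the standard library; the function names, field names, and example values are assumptions for illustration, not a format required by any tracker.

```python
import hashlib
import json
import platform
from importlib import metadata


def package_version(name):
    """Return the installed version of a package, or a marker if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def build_run_record(dataset_bytes, split, metric, seed, search_space, packages):
    """Collect what is needed to reproduce one search run into a plain dict."""
    return {
        "dataset_sha256": hashlib.sha256(dataset_bytes).hexdigest(),
        "split": split,
        "metric": metric,
        "seed": seed,
        "search_space": search_space,
        "python": platform.python_version(),
        "packages": {name: package_version(name) for name in packages},
    }


record = build_run_record(
    dataset_bytes=b"feature,label\n1.0,0\n2.0,1\n",  # stand-in for the real file contents
    split={"scheme": "holdout", "test_fraction": 0.2},  # hypothetical split description
    metric="roc_auc",
    seed=42,
    search_space={"learning_rate": {"low": 1e-4, "high": 1e-1, "log": True}},
    packages=["optuna", "scikit-learn"],
)
print(json.dumps(record, indent=2, sort_keys=True))
```

The resulting dict is plain JSON, so it can be logged as an artifact or as parameters in whichever tracker the project already uses.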

Experiment Design

  • Define the target metric before selecting tooling.
  • Use nested cross-validation, or hold out a final untouched test split, before making model selection claims.
  • Use time-aware splits for time-series problems; never shuffle across time boundaries.
  • Prevent leakage by fitting preprocessing (scalers, encoders, imputers) only on training folds, then applying the fitted transforms to validation and test data.
  • Include simple baselines such as linear models, random forests, or naive time-series forecasts.
  • Use early stopping and resource limits for expensive searches.
  • Prefer structured search spaces with domain-informed ranges over arbitrary broad grids.
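
The time-aware split and train-only preprocessing rules above can be sketched without any ML library. This is a minimal illustration, assuming an expanding training window and a fixed forecast horizon; `expanding_window_splits`, `fit_scaler`, and the toy series are hypothetical names and data. In a real project, scikit-learn's `TimeSeriesSplit` covers similar ground.

```python
from statistics import mean, pstdev


def expanding_window_splits(n_samples, n_folds, horizon):
    """Yield (train_idx, test_idx) pairs where every test index follows every train index."""
    first_test_start = n_samples - n_folds * horizon
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested folds and horizon")
    for fold in range(n_folds):
        test_start = first_test_start + fold * horizon
        yield list(range(test_start)), list(range(test_start, test_start + horizon))


def fit_scaler(values):
    """Compute standardization statistics from training values only."""
    scale = pstdev(values)
    return mean(values), scale if scale > 0 else 1.0


series = [float(v) for v in range(20)]  # toy series, ordered by time
folds = list(expanding_window_splits(len(series), n_folds=3, horizon=4))
for train_idx, test_idx in folds:
    # Statistics come from the training window; the test window is only transformed.
    center, scale = fit_scaler([series[i] for i in train_idx])
    scaled_test = [(series[i] - center) / scale for i in test_idx]
    print(len(train_idx), test_idx, [round(v, 2) for v in scaled_test])
```

No index is ever shuffled, and the scaler never sees values from the window it is evaluated on.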

Tooling

  • Use Ray Tune or Optuna for custom training loops, distributed trials, pruning, and scheduler control.
  • Use PyCaret for quick low-code comparisons when the dataset and metric are straightforward.
  • Use AutoTS, Merlion, PyAF, or project-approved time-series tooling when forecast-specific validation, seasonality, and horizon handling matter.
  • Store run metadata in MLflow, Weights & Biases, TensorBoard, or a project-approved tracker.
  • Use uv or the existing project package manager for reproducible environments.
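
To make "pruning" in the first bullet concrete, here is a library-free sketch of a median stopping rule: a running trial is stopped when its latest validation loss is worse than the median of earlier trials at the same step. Optuna and Ray Tune ship their own pruners and schedulers; this is not their implementation, and the function name, `min_peers` parameter, and loss values are made up for illustration.

```python
from statistics import median


def should_prune(finished_curves, current_curve, min_peers=3):
    """Return True when the trial's latest loss is worse than the peer median.

    finished_curves: per-step validation losses of earlier trials (lower is better).
    current_curve: losses reported so far by the running trial.
    """
    if not current_curve:
        return False  # nothing reported yet
    step = len(current_curve) - 1
    peers = [curve[step] for curve in finished_curves if len(curve) > step]
    if len(peers) < min_peers:
        return False  # too little evidence to stop anything yet
    return current_curve[-1] > median(peers)


# Made-up validation losses, one list per finished trial.
finished = [
    [0.90, 0.70, 0.55, 0.50],
    [0.95, 0.80, 0.70, 0.65],
    [0.85, 0.60, 0.45, 0.40],
]

print(should_prune(finished, [0.99, 0.90]))  # latest loss is above the peer median
print(should_prune(finished, [0.88, 0.62]))  # latest loss is below the peer median
```

The `min_peers` guard matters in practice: pruning before a few trials have finished tends to discard configurations that simply start slowly.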

Search Spaces

  • Keep search spaces explicit and reviewed.
  • Use log-scale sampling for learning rates, regularization strengths, tree counts, and other values whose useful range spans orders of magnitude.
  • Constrain model complexity to avoid unrealistic training time or memory use.
  • Include preprocessing choices only when they can be applied without leakage.
  • Do not tune on the test set.
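
The sketch below shows one way to keep a search space explicit and to sample scale-sensitive values on a log scale. The parameter names and ranges are hypothetical, and the sampler is a plain seeded random search rather than any particular library's API.

```python
import math
import random

# Hypothetical search space: each entry states its range and scale explicitly.
SEARCH_SPACE = {
    "learning_rate": {"low": 1e-4, "high": 1e-1, "log": True},
    "l2_penalty": {"low": 1e-6, "high": 1e-2, "log": True},
    "max_depth": {"low": 2, "high": 8, "integer": True},
}


def validate_space(space):
    """Reject ranges that are empty or incompatible with log sampling."""
    for name, spec in space.items():
        if spec["low"] >= spec["high"]:
            raise ValueError(f"{name}: low must be below high")
        if spec.get("log") and spec["low"] <= 0:
            raise ValueError(f"{name}: log scale needs a positive lower bound")


def sample_value(spec, rng):
    """Draw one value; log-scale entries are uniform in log space."""
    if spec.get("integer"):
        return rng.randint(spec["low"], spec["high"])  # inclusive bounds, linear scale
    if spec.get("log"):
        value = math.exp(rng.uniform(math.log(spec["low"]), math.log(spec["high"])))
    else:
        value = rng.uniform(spec["low"], spec["high"])
    return min(max(value, spec["low"]), spec["high"])  # guard against float rounding


def sample_config(space, rng):
    """Draw one full configuration from the space."""
    return {name: sample_value(spec, rng) for name, spec in space.items()}


validate_space(SEARCH_SPACE)
rng = random.Random(0)  # fixed seed so the search is repeatable
configs = [sample_config(SEARCH_SPACE, rng) for _ in range(1000)]
small = sum(cfg["learning_rate"] < 1e-3 for cfg in configs) / len(configs)
print(f"share of learning rates below 1e-3: {small:.2f}")
```

With a log scale, each decade of the learning-rate range receives roughly the same share of samples; uniform sampling over the same range would place almost all draws in the top decade.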

Reporting

  • Report the selected model, metric, confidence interval or variance, validation scheme, and final test result.
  • Include the best parameters and the search budget.
  • Compare the chosen model against the baseline and at least one non-AutoML alternative.
  • Document operational constraints such as inference latency, memory use, retraining cost, and explainability.
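
A report covering the items above might be assembled as follows. This is a sketch under stated assumptions: the scores, model name, parameters, and budget are placeholders, not results from a real run, and the percentile bootstrap over a handful of fold scores is only a rough uncertainty estimate.

```python
import json
import random
from statistics import mean


def bootstrap_mean_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of a list of scores."""
    rng = random.Random(seed)
    means = sorted(
        mean(rng.choices(scores, k=len(scores))) for _ in range(n_resamples)
    )
    lower = means[int(n_resamples * alpha / 2)]
    upper = means[int(n_resamples * (1 - alpha / 2)) - 1]
    return lower, upper


# Placeholder numbers for illustration only; they are not results from any real run.
candidate_scores = [0.81, 0.79, 0.84, 0.80, 0.82]
baseline_scores = [0.76, 0.75, 0.78, 0.74, 0.77]

low, high = bootstrap_mean_ci(candidate_scores)
report = {
    "selected_model": "gradient_boosting",  # hypothetical model name
    "metric": "roc_auc",
    "validation_scheme": "5-fold cross-validation",
    "cv_mean": round(mean(candidate_scores), 4),
    "cv_ci_95": [round(low, 4), round(high, 4)],
    "baseline_cv_mean": round(mean(baseline_scores), 4),
    "final_test_score": None,  # fill in once, after the search is frozen
    "best_params": {"learning_rate": 0.05, "max_depth": 4},
    "search_budget": {"n_trials": 50, "timeout_minutes": 60},
}
print(json.dumps(report, indent=2))
```

Leaving `final_test_score` empty until the search is frozen keeps the test split out of the selection loop.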

Common Mistakes

  • Do not treat leaderboard rank as proof of production readiness.
  • Do not mix train/test data during feature engineering.
  • Do not run massive searches before validating labels and data quality.
  • Do not ignore class imbalance, calibration, or business cost asymmetry.
  • Do not deploy an AutoML model without reproducible training code and pinned dependencies.
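
The cost-asymmetry point can be illustrated with a decision-threshold choice. The sketch assumes a binary classifier that outputs probabilities on a validation set; the labels, probabilities, and costs below are invented for illustration.

```python
def total_cost(y_true, y_prob, threshold, cost_fp, cost_fn):
    """Sum misclassification costs when predicting positive at prob >= threshold."""
    cost = 0.0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 0:
            cost += cost_fp  # false positive
        elif predicted == 0 and label == 1:
            cost += cost_fn  # false negative
    return cost


def best_threshold(y_true, y_prob, cost_fp, cost_fn, candidates):
    """Pick the candidate threshold with the lowest total cost."""
    return min(
        candidates,
        key=lambda t: total_cost(y_true, y_prob, t, cost_fp, cost_fn),
    )


# Invented validation labels and predicted probabilities (imbalanced: 3 positives in 10).
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.45, 0.6, 0.7, 0.9]
candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

print(best_threshold(y_true, y_prob, cost_fp=1, cost_fn=1, candidates=candidates))
print(best_threshold(y_true, y_prob, cost_fp=1, cost_fn=10, candidates=candidates))
```

When a missed positive costs ten times more than a false alarm, the preferred threshold on this toy data moves well below the one chosen under equal costs, even though the model itself is unchanged.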
