AutoML and hyperparameter optimization rules for Python ML projects using Ray Tune, Optuna, PyCaret, and time-series AutoML libraries
.cursorrules or .cursor/rules/automl-hyperparameter-optimization.mdc

# AutoML and Hyperparameter Optimization Rules

## Scope

- Use AutoML to accelerate model exploration, not to bypass problem framing, validation design, or explainability.
- Start with a simple baseline model and a fixed metric before launching a search.
- Keep training, evaluation, feature generation, and search configuration separate.
- Record datasets, splits, metric definitions, random seeds, library versions, and search spaces for every run.

## Experiment Design

- Define the target metric before selecting tooling.
- Use nested validation or a final untouched test split for model selection claims.
- Use time-aware splits for time-series problems; never shuffle across time boundaries.
- Prevent leakage by fitting preprocessing only on training folds.
- Include simple baselines such as linear models, random forests, or naive time-series forecasts.
- Use early stopping and resource limits for expensive searches.
- Prefer structured search spaces with domain-informed ranges over arbitrary broad grids.

## Tooling

- Use Ray Tune or Optuna for custom training loops, distributed trials, pruning, and scheduler control.
- Use PyCaret for quick low-code comparisons when the dataset and metric are straightforward.
- Use AutoTS, Merlion, PyAF, or project-approved time-series tooling when forecast-specific validation, seasonality, and horizon handling matter.
- Store run metadata in MLflow, Weights & Biases, TensorBoard, or a project-approved tracker.
- Use `uv` or the existing project package manager for reproducible environments.

## Search Spaces

- Keep search spaces explicit and reviewed.
- Use log-scale sampling for learning rates, regularization, tree counts, and other scale-sensitive values.
- Constrain model complexity to avoid unrealistic training time or memory use.
- Include preprocessing choices only when they can be applied without leakage.
- Do not tune on the test set.
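
The log-scale rule for search spaces can be illustrated with a minimal, standard-library sketch. The parameter names and ranges below (`learning_rate`, `l2_penalty`, `n_estimators`, `max_depth`) are hypothetical placeholders, not values prescribed by these rules. In a real project the same idea is usually expressed through the tuning library itself, for example Optuna's `trial.suggest_float(..., log=True)` or Ray Tune's `tune.loguniform(...)`.

```python
import math
import random


def sample_log_uniform(rng: random.Random, low: float, high: float) -> float:
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    if not (0 < low < high):
        raise ValueError("log-scale bounds must satisfy 0 < low < high")
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_config(rng: random.Random) -> dict:
    """Hypothetical search space: names and ranges are illustrative only."""
    return {
        "learning_rate": sample_log_uniform(rng, 1e-5, 1e-1),
        "l2_penalty": sample_log_uniform(rng, 1e-6, 1e-2),
        # Integer parameter sampled on a log scale, then rounded.
        "n_estimators": round(sample_log_uniform(rng, 50, 2000)),
        # A bounded depth keeps training time and memory use realistic.
        "max_depth": rng.randint(2, 10),
    }


rng = random.Random(42)  # fixed seed so the draws are reproducible
configs = [sample_config(rng) for _ in range(5)]
for cfg in configs:
    print(cfg)
```

Sampling the exponent uniformly gives each order of magnitude roughly the same share of trials, whereas uniform sampling on the raw range would spend almost all of the budget near the upper bound. Passing an explicit seeded `random.Random` instance also makes the sampled configurations easy to record alongside the run.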
## Reporting

- Report the selected model, metric, confidence interval or variance, validation scheme, and final test result.
- Include the best parameters and the search budget.
- Compare the chosen model against the baseline and at least one non-AutoML alternative.
- Document operational constraints such as inference latency, memory use, retraining cost, and explainability.

## Common Mistakes

- Do not treat leaderboard rank as proof of production readiness.
- Do not mix train/test data during feature engineering.
- Do not run massive searches before validating labels and data quality.
- Do not ignore class imbalance, calibration, or business cost asymmetry.
- Do not deploy an AutoML model without reproducible training code and pinned dependencies.
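
As a rough illustration of the reporting rules above (metric with variance or a confidence interval, plus a comparison against the baseline), the sketch below summarizes per-fold scores using only the standard library. The helper `summarize_scores` and the score lists are hypothetical; the numbers are placeholders, not results from any real run. The interval is a simple normal approximation, which is an assumption: cross-validation folds are correlated and few in number, so treat it as indicative rather than exact.

```python
import math
import statistics
from typing import Sequence


def summarize_scores(scores: Sequence[float], z: float = 1.96) -> dict:
    """Mean, sample std, and a normal-approximation interval for the mean."""
    if len(scores) < 2:
        raise ValueError("need at least two scores to estimate variance")
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample standard deviation (n - 1)
    half_width = z * std / math.sqrt(len(scores))
    return {
        "mean": mean,
        "std": std,
        "ci_low": mean - half_width,
        "ci_high": mean + half_width,
    }


# Placeholder per-fold scores for illustration only.
baseline_scores = [0.80, 0.82, 0.79, 0.81, 0.80]
selected_scores = [0.84, 0.86, 0.83, 0.85, 0.84]

baseline = summarize_scores(baseline_scores)
selected = summarize_scores(selected_scores)

for name, summary in (("baseline", baseline), ("selected", selected)):
    print(
        f"{name}: mean={summary['mean']:.3f} std={summary['std']:.3f} "
        f"interval=[{summary['ci_low']:.3f}, {summary['ci_high']:.3f}]"
    )
print(f"improvement over baseline: {selected['mean'] - baseline['mean']:.3f}")
```

Reporting the baseline with the same summary function keeps the comparison on identical terms; the final test result should still come from the untouched test split, not from these validation folds.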
Cursor rules for TypeScript, React, Node.js, clean architecture, testing, and WHY-oriented engineering guidance.
Cursor rules for WordPress development on macOS.
Cursor rules for manifest development with YAML integration.
Cursor rules for Pandas development with scikit-learn guide integration.
Cursor rules for Python LLM & ML development with workflow integration.
Cursor rules for PyTorch development with scikit-learn integration.