Data Science & Machine Learning

A comprehensive reference covering statistics, machine learning, deep learning, computer vision, NLP, and applied data science, from mathematical foundations through production deployment.

Foundations

  • [[math-precalculus]] - number systems, equations, functions, sets, combinatorics
  • [[math-logic]] - propositional logic, first-order logic, proof techniques, computability
  • [[math-for-ml]] - calculus, optimization, gradient descent, backpropagation
  • [[math-linear-algebra]] - vectors, matrices, eigenvalues, SVD
  • [[math-probability-statistics]] - probability theory, estimation, MLE, confidence intervals

Statistics & Probability

  • [[descriptive-statistics]] - central tendency, spread, shape, correlation, z-scores
  • [[probability-distributions]] - Bernoulli, binomial, Poisson, normal, exponential, CLT
  • [[hypothesis-testing]] - A/B testing, statistical tests, CUPED, experiment design
  • [[causal-inference]] - DiD, propensity score matching, synthetic control, DAGs
  • [[bias-variance-tradeoff]] - overfitting, underfitting, regularization, ensemble tradeoffs

Tools & Languages

  • [[python-for-ds]] - Python fundamentals for data science, Jupyter/Colab
  • [[numpy-fundamentals]] - array operations, linear algebra, random generation
  • [[pandas-eda]] - DataFrame manipulation, groupby, filtering, EDA workflow
  • [[data-visualization]] - matplotlib, seaborn, plotly, chart selection
  • [[sql-for-data-science]] - queries, window functions, CTEs, analytics patterns

Classical Machine Learning

  • [[linear-models]] - linear/logistic regression, gradient descent, regularization
  • [[gradient-boosting]] - CatBoost, XGBoost, LightGBM, Random Forest, hyperparameters
  • [[knn-and-classical-ml]] - KNN, SVM, decision trees, algorithm selection guide
  • [[unsupervised-learning]] - K-Means, DBSCAN, PCA, t-SNE, UMAP, SVD
  • [[bayesian-methods]] - Bayes' theorem, Naive Bayes, Bayesian inference

Deep Learning

  • [[neural-networks]] - architecture, training, activation functions, optimizers, regularization
  • [[cnn-computer-vision]] - convolutions, architectures (ResNet, YOLO), detection, segmentation
  • [[nlp-text-processing]] - tokenization, TF-IDF, embeddings, transformers, BERT
  • [[rnn-sequences]] - LSTM, GRU, bidirectional, sequence-to-sequence
  • [[generative-models]] - GANs, VAEs, diffusion models, CycleGAN
  • [[transfer-learning]] - pre-trained models, fine-tuning strategies, domain adaptation
  • [[data-augmentation]] - image/text/tabular augmentation, SMOTE

Techniques & Evaluation

  • [[feature-engineering]] - scaling, encoding, imputation, selection, pipelines
  • [[model-evaluation]] - metrics (MAE, ROC AUC, F1), cross-validation, confusion matrix
  • [[time-series-analysis]] - stationarity, ARIMA, seasonality, feature engineering for time series
  • [[monte-carlo-simulation]] - simulation, portfolio optimization, risk metrics
  • [[recommender-systems]] - collaborative filtering, content-based, evaluation

Applied & Production

  • [[ds-workflow]] - end-to-end project methodology, pitfalls, reproducibility
  • [[bi-dashboards]] - BI systems, dashboard design, KPIs, analytics SQL
  • [[ml-production]] - model serialization, serving, monitoring, drift detection
  • [[financial-data-science]] - portfolio theory, derivatives, risk metrics, financial ratios
  • [[ai-video-production]] - AI video pipeline, tool chain, prompt engineering for video
  • [[python:python-fundamentals]] - general Python beyond DS
  • [[sql-databases:sql-fundamentals]] - database theory and administration
  • [[algorithms:algorithm-complexity]] - computational complexity
  • [[data-engineering:etl-pipelines]] - data pipeline infrastructure
  • [[llm-agents:prompt-engineering]] - prompt engineering for LLMs