Data Science & Machine Learning

Comprehensive reference covering statistics, machine learning, deep learning, computer vision, NLP, and applied data science, from mathematical foundations through production deployment.

Foundations

[[math-precalculus]] - number systems, equations, functions, sets, combinatorics
[[math-logic]] - propositional logic, first-order logic, proof techniques, computability
[[math-for-ml]] - calculus, optimization, gradient descent, backpropagation
[[math-linear-algebra]] - vectors, matrices, eigenvalues, SVD
[[math-probability-statistics]] - probability theory, estimation, MLE, confidence intervals

Statistics & Probability

[[descriptive-statistics]] - central tendency, spread, shape, correlation, z-scores
[[probability-distributions]] - Bernoulli, binomial, Poisson, normal, exponential, CLT
[[hypothesis-testing]] - A/B testing, statistical tests, CUPED, experiment design
[[causal-inference]] - DiD, propensity score matching, synthetic control, DAGs
[[bias-variance-tradeoff]] - overfitting, underfitting, regularization, ensemble tradeoffs

Tools & Languages

[[python-for-ds]] - Python fundamentals for data science, Jupyter/Colab
[[numpy-fundamentals]] - array operations, linear algebra, random generation
[[pandas-eda]] - DataFrame manipulation, groupby, filtering, EDA workflow
[[data-visualization]] - matplotlib, seaborn, plotly, chart selection
[[sql-for-data-science]] - queries, window functions, CTEs, analytics patterns

Classical Machine Learning

[[linear-models]] - linear/logistic regression, gradient descent, regularization
[[gradient-boosting]] - CatBoost, XGBoost, LightGBM, Random Forest, hyperparameters
[[knn-and-classical-ml]] - KNN, SVM, decision trees, algorithm selection guide
[[unsupervised-learning]] - K-Means, DBSCAN, PCA, t-SNE, UMAP, SVD
[[bayesian-methods]] - Bayes' theorem, Naive Bayes, Bayesian inference

Deep Learning

[[neural-networks]] - architecture, training, activation functions, optimizers, regularization
[[cnn-computer-vision]] - convolutions, architectures (ResNet, YOLO), detection, segmentation
[[nlp-text-processing]] - tokenization, TF-IDF, embeddings, transformers, BERT
[[rnn-sequences]] - LSTM, GRU, bidirectional, sequence-to-sequence
[[generative-models]] - GANs, VAEs, diffusion models, CycleGAN
[[transfer-learning]] - pre-trained models, fine-tuning strategies, domain adaptation
[[data-augmentation]] - image/text/tabular augmentation, SMOTE

Techniques & Evaluation

[[feature-engineering]] - scaling, encoding, imputation, selection, pipelines
[[model-evaluation]] - metrics (MAE, ROC AUC, F1), cross-validation, confusion matrix
[[time-series-analysis]] - stationarity, ARIMA, seasonality, feature engineering for time series
[[monte-carlo-simulation]] - simulation, portfolio optimization, risk metrics
[[recommender-systems]] - collaborative filtering, content-based, evaluation

Applied & Production

[[ds-workflow]] - end-to-end project methodology, pitfalls, reproducibility
[[bi-dashboards]] - BI systems, dashboard design, KPIs, analytics SQL
[[ml-production]] - model serialization, serving, monitoring, drift detection
[[financial-data-science]] - portfolio theory, derivatives, risk metrics, financial ratios
[[ai-video-production]] - AI video pipeline, tool chain, prompt engineering for video

Cross-Topic Links

[[python:python-fundamentals]] - general Python beyond DS
[[sql-databases:sql-fundamentals]] - database theory and administration
[[algorithms:algorithm-complexity]] - computational complexity
[[data-engineering:etl-pipelines]] - data pipeline infrastructure
[[llm-agents:prompt-engineering]] - prompt engineering for LLMs