Data Science & Machine Learning
Comprehensive reference covering statistics, machine learning, deep learning, computer vision, NLP, and applied data science, from mathematical foundations through production deployment.
Foundations
- [[math-precalculus]] - number systems, equations, functions, sets, combinatorics
- [[math-logic]] - propositional logic, first-order logic, proof techniques, computability
- [[math-for-ml]] - calculus, optimization, gradient descent, backpropagation
- [[math-linear-algebra]] - vectors, matrices, eigenvalues, SVD
- [[math-probability-statistics]] - probability theory, estimation, MLE, confidence intervals
Statistics & Probability
- [[descriptive-statistics]] - central tendency, spread, shape, correlation, z-scores
- [[probability-distributions]] - Bernoulli, binomial, Poisson, normal, exponential, central limit theorem (CLT)
- [[hypothesis-testing]] - A/B testing, statistical tests, CUPED, experiment design
- [[causal-inference]] - difference-in-differences (DiD), propensity score matching, synthetic control, DAGs
- [[bias-variance-tradeoff]] - overfitting, underfitting, regularization, ensemble tradeoffs
Tools & Languages
- [[python-for-ds]] - Python fundamentals for data science, Jupyter/Colab
- [[numpy-fundamentals]] - array operations, linear algebra, random generation
- [[pandas-eda]] - DataFrame manipulation, groupby, filtering, EDA workflow
- [[data-visualization]] - matplotlib, seaborn, plotly, chart selection
- [[sql-for-data-science]] - queries, window functions, CTEs, analytics patterns
Classical Machine Learning
- [[linear-models]] - linear/logistic regression, gradient descent, regularization
- [[gradient-boosting]] - CatBoost, XGBoost, LightGBM, Random Forest, hyperparameters
- [[knn-and-classical-ml]] - KNN, SVM, decision trees, algorithm selection guide
- [[unsupervised-learning]] - K-Means, DBSCAN, PCA, t-SNE, UMAP, SVD
- [[bayesian-methods]] - Bayes' theorem, Naive Bayes, Bayesian inference
Deep Learning
- [[neural-networks]] - architecture, training, activation functions, optimizers, regularization
- [[cnn-computer-vision]] - convolutions, architectures (ResNet, YOLO), detection, segmentation
- [[nlp-text-processing]] - tokenization, TF-IDF, embeddings, transformers, BERT
- [[rnn-sequences]] - LSTM, GRU, bidirectional, sequence-to-sequence
- [[generative-models]] - GANs, VAEs, diffusion models, CycleGAN
- [[transfer-learning]] - pre-trained models, fine-tuning strategies, domain adaptation
- [[data-augmentation]] - image/text/tabular augmentation, SMOTE
Techniques & Evaluation
- [[feature-engineering]] - scaling, encoding, imputation, selection, pipelines
- [[model-evaluation]] - metrics (MAE, ROC AUC, F1), cross-validation, confusion matrix
- [[time-series-analysis]] - stationarity, ARIMA, seasonality, time-based feature engineering
- [[monte-carlo-simulation]] - simulation, portfolio optimization, risk metrics
- [[recommender-systems]] - collaborative filtering, content-based, evaluation
Applied & Production
- [[ds-workflow]] - end-to-end project methodology, pitfalls, reproducibility
- [[bi-dashboards]] - BI systems, dashboard design, KPIs, analytics SQL
- [[ml-production]] - model serialization, serving, monitoring, drift detection
- [[financial-data-science]] - portfolio theory, derivatives, risk metrics, financial ratios
- [[ai-video-production]] - AI video pipeline, tool chain, prompt engineering for video
Cross-Topic Links
- [[python:python-fundamentals]] - general Python beyond DS
- [[sql-databases:sql-fundamentals]] - database theory and administration
- [[algorithms:algorithm-complexity]] - computational complexity
- [[data-engineering:etl-pipelines]] - data pipeline infrastructure
- [[llm-agents:prompt-engineering]] - prompt engineering for LLMs