Research Archive
An archive of what I have studied and researched through papers, books, and websites.
I aim to provide references, study notes, and reproducible source code together.
Experimentation
- Bao, W. (2023, March 28). How to Size For Online Experiments With Ratio Metrics. Expedia Group Technology. https://medium.com/expedia-group-tech/how-to-size-for-online-experiments-with-ratio-metrics-3d57362f1967
- Blocker, C., Conway, J., Demortier, L., Heinrich, J., Junk, T., Lyons, L., & Punzi, G. (2006). Simple Facts about P-Values.
- Deng, A., Lu, J., & Litz, J. (2017). Trustworthy Analysis of Online A/B Tests: Pitfalls, challenges and solutions. Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, 641–649. https://doi.org/10.1145/3018661.3018677
- Fabijan, A., Dmitriev, P., Arai, B., Drake, A., Kohlmeier, S., & Kwong, A. (2023). A/B Integrations: 7 Lessons Learned from Enabling A/B testing as a Product Feature. 2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), 304–314. https://doi.org/10.1109/ICSE-SEIP58684.2023.00033
- Gupta, S., Kohavi, R., Tang, D., Xu, Y., Andersen, R., Bakshy, E., Cardin, N., Chandran, S., Chen, N., Coey, D., Curtis, M., Deng, A., Duan, W., Forbes, P., Frasca, B., Guy, T., Imbens, G. W., Saint Jacques, G., Kantawala, P., … Yashkov, I. (2019). Top Challenges from the first Practical Online Controlled Experiments Summit. ACM SIGKDD Explorations Newsletter, 21(1), 20–35. https://doi.org/10.1145/3331651.3331655
- Gupta, S., Ulanova, L., Bhardwaj, S., Dmitriev, P., Raff, P., & Fabijan, A. (2018). The Anatomy of a Large-Scale Experimentation Platform. 2018 IEEE International Conference on Software Architecture (ICSA), 1–109. https://doi.org/10.1109/ICSA.2018.00009
- Huang, C., & Tang, Y. (2022, May 24). Meet Dash-AB — The Statistics Engine of Experimentation at DoorDash. DoorDash Engineering Blog. https://doordash.engineering/2022/05/24/meet-dash-ab-the-statistics-engine-of-experimentation-at-doordash/
- Kohavi, R. (2023, October). Trustworthy A/B Tests: Causality and Pitfalls. Google Docs. https://drive.google.com/file/d/1mbwrCkR52kIfcjkQibI4lRMwOSJJjdv6/view?usp=sharing&usp=embed_facebook
- Kohavi, R., Deng, A., Frasca, B., Longbotham, R., Walker, T., & Xu, Y. (2012). Trustworthy online controlled experiments: Five puzzling outcomes explained. Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 786–794. https://doi.org/10.1145/2339530.2339653
- Kohavi, R., Deng, A., & Vermeer, L. (2022). A/B Testing Intuition Busters: Common Misunderstandings in Online Controlled Experiments. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 3168–3177. https://doi.org/10.1145/3534678.3539160
- Kohavi, R., & Longbotham, R. (2011). Unexpected results in online controlled experiments. ACM SIGKDD Explorations Newsletter, 12(2), 31–35. https://doi.org/10.1145/1964897.1964905
- Kohavi, R., Tang, D., & Xu, Y. (2020). Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing. Cambridge University Press. https://experimentguide.com/
- Kohavi, R., Tang, D., Xu, Y., Hemkens, L. G., & Ioannidis, J. P. A. (2020). Online randomized controlled experiments at scale: Lessons and extensions to medicine. Trials, 21(1), 150. https://doi.org/10.1186/s13063-020-4084-y
- Machmouchi, W., Gupta, S., Zhang, R., & Fabijan, A. (2020, July 31). Patterns of Trustworthy Experimentation: Pre-Experiment Stage. Microsoft Research. https://www.microsoft.com/en-us/research/group/experimentation-platform-exp/articles/patterns-of-trustworthy-experimentation-pre-experiment-stage/
- Microsoft. (2021, January 25). Patterns of Trustworthy Experimentation: During-Experiment Stage. Microsoft Research. https://www.microsoft.com/en-us/research/group/experimentation-platform-exp/articles/patterns-of-trustworthy-experimentation-during-experiment-stage/
- Schultzberg, M., Kjellin, O., & Rydberg, J. (2020). Statistical Properties of Exclusive and Non-exclusive Online Randomized Experiments using Bucket Reuse (arXiv:2012.10202). arXiv. https://doi.org/10.48550/arXiv.2012.10202
- Schultzberg, M., Kjellin, O., & Rydberg, J. (2021, March 10). Spotify’s New Experimentation Coordination Strategy. Spotify Engineering. https://engineering.atspotify.com/2021/03/spotifys-new-experimentation-coordination-strategy/
- Thumbtack, E. (2020, June 1). Seedfinder—Infrastructure to Improve Sample Balance in Online A/B Tests. Thumbtack Engineering. https://medium.com/thumbtack-engineering/seedfinder-infrastructure-to-improve-sample-balance-in-online-a-b-tests-1b8c3ae7dbe8
- Xie, H., & Aurisset, J. (2016). Improving the Sensitivity of Online Controlled Experiments: Case Studies at Netflix. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 645–654. https://doi.org/10.1145/2939672.2939733
Time Series
1 Inferential Modeling · Regression
Spurious regression
Regression with ARIMA errors
Distributed lag model
- 나종화. R 응용 시계열분석 [Applied Time Series Analysis with R]. 자유아카데미, 2020.
- 🔗 Study notes
Distributed lag non-linear model
- Gasparrini, Antonio, Benedict Armstrong, and M.G. Kenward. “Distributed Lag Non-Linear Models.” Statistics in Medicine 29 (September 20, 2010): 2224–34. https://doi.org/10.1002/sim.3940.
- Gasparrini, Antonio. “Distributed Lag Linear and Non-Linear Models in R: The Package Dlnm.” Journal of Statistical Software 43 (July 1, 2011): 1–20. https://doi.org/10.18637/jss.v043.i08.
- 🔗 Study notes
- 🔗 PPT
- 🔗 R tutorial
2 Predictive Modeling · Forecasting
Exponential Smoothing
- 나종화. R 응용 시계열분석 [Applied Time Series Analysis with R]. 자유아카데미, 2020.
- 🔗 Study notes
- 🔗 R tutorial: Time series analysis following tidyverse principles
ARIMA model
- 나종화. R 응용 시계열분석 [Applied Time Series Analysis with R]. 자유아카데미, 2020.
- 🔗 Study notes
Prophet
Hierarchical Time Series Forecasting
- Athanasopoulos, George, Roman A. Ahmed, and Rob J. Hyndman. “Hierarchical Forecasts for Australian Domestic Tourism.” International Journal of Forecasting 25, no. 1 (January 1, 2009): 146–66. https://doi.org/10.1016/j.ijforecast.2008.07.004.
- Athanasopoulos, George, Rob Hyndman, Roman Ahmed, and Han Lin Shang. “Optimal Combination Forecasts for Hierarchical Time Series.” Computational Statistics & Data Analysis 55 (September 1, 2011): 2579–89. https://doi.org/10.1016/j.csda.2011.03.006.
- Hyndman, Rob J, George Athanasopoulos, and Han Lin Shang. “Hts: An R Package for Forecasting Hierarchical or Grouped Time Series,” n.d., 12.
- 🔗 Study notes
- 🔗 R tutorial
3 Other techniques
Intervention analysis (Interrupted Time Series)
- Slides. “Intervention Analysis.” Accessed April 17, 2022. https://slides.com/tonyg/intervention-analysis.
- 🔗 Reference material
- 🔗 Study notes
- 🔗 R code
- 🔗 R code: arimax() tutorial
Dynamic Time Warping (DTW)
- Berndt, Donald J., and James Clifford. “Using Dynamic Time Warping to Find Patterns in Time Series.” In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, 359–70. AAAIWS’94. Seattle, WA: AAAI Press, 1994.
- A dissimilarity (distance) measure that can detect two time series sharing a similar pattern even when one leads or lags the other.
- DTW distances can be used as the input to hierarchical clustering.
- 🔗 Study notes
- 🔗 R tutorial
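The dynamic-programming recurrence behind DTW can be sketched as follows. This is a minimal Python sketch and not the code from the linked R tutorial; the function names `dtw_distance` and `pairwise_dtw`, the absolute-difference local cost, and the toy series are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Classic DTW via dynamic programming, with |x - y| as the local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def pairwise_dtw(series):
    """Symmetric matrix of DTW distances between every pair of series."""
    k = len(series)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            dist[i][j] = dist[j][i] = dtw_distance(series[i], series[j])
    return dist


# Toy series (made up): `lagged` is `base` delayed by one step, `flat` is unrelated.
base = [0, 0, 1, 2, 1, 0, 0]
lagged = [0, 0, 0, 1, 2, 1, 0]
flat = [2, 2, 2, 2, 2, 2, 2]

euclid = math.sqrt(sum((x - y) ** 2 for x, y in zip(base, lagged)))
dist_matrix = pairwise_dtw([base, lagged, flat])

print("DTW(base, lagged):", dist_matrix[0][1])
print("Euclidean(base, lagged):", euclid)
print("DTW(base, flat):", dist_matrix[0][2])
```

Warping absorbs the one-step lag, so the lagged pair ends up much closer under DTW than under the pointwise Euclidean distance. A matrix like `dist_matrix` is the kind of input a hierarchical clustering routine expects, for example `scipy.cluster.hierarchy.linkage` after condensing it with `scipy.spatial.distance.squareform`.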
Discrete Wavelet Transform (DWT)
- Graps, Amara. “An Introduction to Wavelets.” IEEE Comp. Sci. Engi. 2 (February 1, 1995): 50–61. https://doi.org/10.1109/99.388960.
- Li, Daoyuan, Tegawendé F. Bissyandé, Jacques Klein, and Y. L. Traon. “Time Series Classification with Discrete Wavelet Transformed Data: Insights from an Empirical Study.” In SEKE, 2016. https://doi.org/10.18293/SEKE2016-067.
- An effective dimensionality-reduction method for classification when each time series is laid out across the columns of a dataset.
- A form of feature engineering for time series.
- 🔗 Study notes
- 🔗 R tutorial
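The dimensionality-reduction idea can be illustrated with the simplest wavelet, the Haar wavelet. The sketch below is a Python illustration under stated assumptions, not the method of the cited papers or the linked R tutorial: the helper names `haar_step` and `haar_features`, the choice of the Haar basis, and the example series are all hypothetical.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    # Scaled pairwise sums capture the coarse shape, pairwise differences the detail.
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_features(signal, levels):
    """Keep only the approximation coefficients after `levels` Haar steps."""
    approx = list(signal)
    for _ in range(levels):
        approx, _detail = haar_step(approx)
    return approx


# Made-up series of length 8; two levels reduce it to 8 / 2**2 = 2 features.
series = [4, 6, 10, 12, 8, 6, 5, 5]
features = haar_features(series, levels=2)
print(len(series), "->", len(features), features)
```

Each level halves the number of columns a classifier has to work with, while the retained approximation coefficients still summarize the coarse shape of the series. The number of levels is a tuning choice.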
Statistical/Machine Learning
Prerequisite
- Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
- 🔗 Study notes: Prerequisite 1 Machine learning terminology
Ensemble methods
- Chen, Tianqi, and Carlos Guestrin. “XGBoost: A Scalable Tree Boosting System.” Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 13, 2016, 785–94. https://doi.org/10.1145/2939672.2939785.
- Chen, Lilly. “Basic Ensemble Learning (Random Forest, AdaBoost, Gradient Boosting)- Step by Step Explained.” Medium, January 2, 2019. https://towardsdatascience.com/basic-ensemble-learning-random-forest-adaboost-gradient-boosting-step-by-step-explained-95d49d1e2725.
- Morde, Vishal. “XGBoost Algorithm: Long May She Reign!” Medium, April 8, 2019. https://towardsdatascience.com/https-medium-com-vishalmorde-xgboost-algorithm-long-she-may-rein-edd9f99be63d.
- “Light GBM vs XGBOOST: Which Algorithm Takes the Crown.” Accessed April 17, 2022. https://www.analyticsvidhya.com/blog/2017/06/which-algorithm-takes-the-crown-light-gbm-vs-xgboost/.
- Random Forest, AdaBoost, Gradient Boosting, XGBoost, Light GBM
- 🔗 Study notes
- 🔗 R tutorial: Machine learning following tidyverse principles
Logistic regression
- James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. “An Introduction to Statistical Learning.” An Introduction to Statistical Learning. Accessed April 17, 2022. https://www.statlearning.com.
- Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference and Prediction. 2nd ed. Springer, 2009. http://www-stat.stanford.edu/~tibs/ElemStatLearn/.
- StatQuest with Josh Starmer. Logistic Regression Details Pt 2: Maximum Likelihood, 2018. https://www.youtube.com/watch?v=BfKanl1aSG0.
- Chatterjee, Samprit, and Ali S. Hadi. “Regression Analysis by Example, Fifth Edition.”
- 🔗 Study notes
Generalized Linear Model (GLM) and Generalized Additive Model (GAM)
- Hayes, Genevieve. “Beyond Linear Regression: An Introduction to GLMs.” Medium, December 24, 2019. https://towardsdatascience.com/beyond-linear-regression-an-introduction-to-glms-7ae64a8fad9c.
- James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. “An Introduction to Statistical Learning.” An Introduction to Statistical Learning. Accessed April 17, 2022. https://www.statlearning.com.
- GLM
- 🔗 Study notes
- GAM
Deep Learning
Prerequisites
- Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
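- 🔗 Study notes: Prerequisite 1 Motivation and history of deep learning
- 🔗 Study notes: Prerequisite 2 Introduction to the objects of linear algebra
- 🔗 Study notes: Prerequisite 3 Matrix transpose and broadcasting
- 🔗 Study notes: Prerequisite 4 Multiplying matrices and vectors
- 🔗 Study notes: Prerequisite 5 Linear equations, linear dependence, and span
- 🔗 Study notes: Prerequisite 6 Norms
- 🔗 Study notes: Prerequisite 7 Special kinds of matrices and vectors
- 🔗 Study notes: Prerequisite 8 Eigendecomposition
- 🔗 Study notes: Prerequisite 9 Singular value decomposition and the generalized inverse
- 🔗 Study notes: Prerequisite 10 The trace operator and the determinant
- 🔗 Study notes: Prerequisite 11 Deriving principal components with linear algebra
- 🔗 Study notes: Prerequisite 12 Machine learning terminology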
High-Dimensional Data Analysis
- Breheny, Patrick. High-Dimensional Data Analysis. The University of Iowa, 2016. https://myweb.uiowa.edu/pbreheny/7600/s16/index.html.
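- Methods for predictive modeling of data where n approaches p (n → p) or n < p, a setting that conventional machine-learning-based predictive modeling handles poorly (here n is the number of observations and p is the number of predictors).
- These methods are also used to improve the predictive performance of regression models even when the data are not high-dimensional.
- The course also covers solutions to the high-dimensional problems that arise in statistical hypothesis testing.
1 Predictive Modeling for High-Dimensional Data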
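To make the n < p difficulty concrete, the sketch below fits a ridge estimator on simulated data with fewer observations than predictors. It is an illustrative Python sketch, not material from the course or the study notes: the function name `ridge_fit`, the simulated data, and the penalty values are assumptions, and the intercept and predictor standardization are omitted for brevity.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: solve (X'X + lam * I) beta = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


rng = np.random.default_rng(0)
n, p = 20, 50                        # fewer observations than predictors
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.0, 0.5]     # made-up sparse signal
y = X @ beta_true + 0.1 * rng.normal(size=n)

# X'X is p x p but has rank at most n, so the least-squares solution is not unique.
rank_xtx = np.linalg.matrix_rank(X.T @ X)

# Adding lam * I makes the system invertible for any lam > 0.
beta_small = ridge_fit(X, y, lam=1.0)
beta_large = ridge_fit(X, y, lam=100.0)

print("rank of X'X:", rank_xtx, "out of", p)
print("coefficient norm, lam=1:  ", np.linalg.norm(beta_small))
print("coefficient norm, lam=100:", np.linalg.norm(beta_large))
```

A larger penalty shrinks the coefficient vector further toward zero. In practice the penalty would be chosen by cross-validation rather than fixed by hand as it is here.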
Prerequisites
Ridge regression
- 🔗 Study notes
Lasso regression
- 🔗 Study notes
Bias reduction of Lasso estimator
- 🔗 Study notes
Variance reduction of Lasso estimator
- 🔗 Study notes
Penalized logistic regression
- 🔗 Study notes
Penalized robust regression
- 🔗 Study notes
2 High-Dimensional Problems in Statistical Hypothesis Testing
Prerequisites
Family-Wise Error Rates (FWER)
- 🔗 Study notes
False Discovery Rates (FDR)
- 🔗 Study notes
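The difference between FWER and FDR control can be sketched with two standard procedures: the Bonferroni correction, which controls the FWER, and the Benjamini-Hochberg step-up procedure, which controls the FDR for independent tests. This is a minimal Python sketch and is not taken from the study notes; the function names and the p-values are made up for illustration.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Step-up rule: find the largest k with p_(k) <= k * alpha / m,
    then reject the k hypotheses with the smallest p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values from m = 10 simultaneous tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]

bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)
print("Bonferroni rejections:", sum(bonf))
print("Benjamini-Hochberg rejections:", sum(bh))
```

On these values Bonferroni rejects only the smallest p-value, while Benjamini-Hochberg rejects the two smallest. FDR control is the less conservative of the two criteria.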
Statistics
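- An archive of material on statistics and statistical hypothesis testing.
Interpreting interval estimates: the classical (frequentist) view and the Bayesian view
- 🔗 Study notes
On statistical power and the power function
- 🔗 Study notes
Degrees of freedom
- 🔗 Study notes
Standard deviation and standard error
- 🔗 Study notes
Why claims such as “the alternative hypothesis is true” should be avoided
- 🔗 Study notes
The meaning of the central limit theorem
Fixed effects and random effects
- 🔗 Study notes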