References

707 distinct citations across the book (1498 total occurrences).

  1. Hume, D. (1748). An Enquiry Concerning Human Understanding. London: A. Millar. https://davidhume.org/texts/e/ ↗
    ×1
  2. Mill, J. S. (1843). A System of Logic, Ratiocinative and Inductive: Being a Connected View of the Principles of Evidence and the Methods of Scientific Investigation. London: John W. Parker. https://www.gutenberg.org/ebooks/27942 ↗
    ×1
  3. Fisher, R. A. (1925). Statistical Methods for Research Workers. Edinburgh: Oliver & Boyd. https://psychclassics.yorku.ca/Fisher/Methods/ ↗
    ×1
  4. Rosenblatt (1958). The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. https://doi.org/10.1037/h0042519 ↗
    ×1
  5. Robinson, J. A. (1965). A Machine-Oriented Logic Based on the Resolution Principle. Journal of the ACM, 12(1), 23–41. https://dl.acm.org/doi/10.1145/321250.321253 ↗
    ×1
  6. Fikes, R. E., & Nilsson, N. J. (1971). STRIPS: A new approach to the application of theorem proving to problem solving. Artificial Intelligence, 2(3–4), 189–208. https://www.sciencedirect.com/science/article/abs/pii/0004370271900105 ↗ DOI: 10.1016/0004-3702(71)90010-5
    ×1
  7. Vapnik, Chervonenkis (1971). On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities. https://epubs.siam.org/doi/10.1137/1116025 ↗
    ×1
  8. Colmerauer, A., Kanoui, H., Pasero, R., & Roussel, P. (1972). Un système de communication homme-machine en français. Rapport préliminaire de fin de contrat IRIA, Groupe d'Intelligence Artificielle, Université d'Aix-Marseille II, Luminy. https://softwarepreservation.computerhistory.org/prolog/ ↗
    ×1
  9. Sauer (1972). On the density of families of sets. https://doi.org/10.1016/0097-3165(72)90019-2 ↗
    ×1
  10. Shelah (1972). A combinatorial problem; stability and order for models and theories in infinitary languages. https://shelah.logic.at/papers/16/ ↗
    ×1
  11. Goodhart, C. A. E. (1975). Problems of Monetary Management: The U.K. Experience. Papers in Monetary Economics, Volume I, Reserve Bank of Australia. https://www.semanticscholar.org/paper/Problems-of-Monetary-Management:-The-UK-Experience-Goodhart/0ae623749b30de53a39cf05813f5f3842e422c01 ↗
    ×1
  12. Rosenbaum, P. R., & Rubin, D. B. (1983). The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika, 70(1), 41–55. https://doi.org/10.1093/biomet/70.1.41 ↗
    ×3
  13. Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A Learning Algorithm for Boltzmann Machines. Cognitive Science, 9(1), 147–169. https://www.cs.toronto.edu/~fritz/absps/cogscibm.pdf ↗ DOI: 10.1207/s15516709cog0901_7
    ×1
  14. Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536. https://www.nature.com/articles/323533a0 ↗ DOI: 10.1038/323533a0
    ×3
  15. Robinson, P. M. (1988). Root-N-Consistent Semiparametric Regression. Econometrica, 56(4), 931–954. https://www.jstor.org/stable/1912705 ↗ DOI: 10.2307/1912705
    ×1
  16. Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. K. (1989). Learnability and the Vapnik-Chervonenkis Dimension. Journal of the ACM, 36(4), 929–965. https://dl.acm.org/doi/10.1145/76359.76371 ↗
    ×1
  17. Cybenko (1989). Approximation by superpositions of a sigmoidal function. https://doi.org/10.1007/BF02551274 ↗
    ×1
  18. Hornik et al. (1989). Multilayer feedforward networks are universal approximators. https://doi.org/10.1016/0893-6080(89)90020-8 ↗
    ×1
  19. LeCun et al. (1989). Backpropagation Applied to Handwritten Zip Code Recognition. https://direct.mit.edu/neco/article/1/4/541/5515/Backpropagation-Applied-to-Handwritten-Zip-Code ↗
    ×1
  20. Watkins (1989). Learning from Delayed Rewards. https://www.cs.rhul.ac.uk/~chrisw/thesis.html ↗
    ×2
  21. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic Local Alignment Search Tool. Journal of Molecular Biology, 215(3), 403–410. https://pubmed.ncbi.nlm.nih.gov/2231712/ ↗ DOI: 10.1016/S0022-2836(05)80360-2
    ×1
  22. Spirtes, P., & Glymour, C. (1991). An Algorithm for Fast Recovery of Sparse Causal Graphs. Social Science Computer Review, 9(1), 62–72. https://journals.sagepub.com/doi/10.1177/089443939100900106 ↗
    ×1
  23. Sutton, R. S. (1991). Dyna, an integrated architecture for learning, planning, and reacting. ACM SIGART Bulletin, 2(4), 160–163. https://dl.acm.org/doi/10.1145/122344.122377 ↗
    ×2
  24. Tesauro, G. (1992). Practical Issues in Temporal Difference Learning. Machine Learning, 8(3–4), 257–277. https://link.springer.com/article/10.1007/BF00992697 ↗
    ×1
  25. Tesauro, G. (1992). Practical Issues in Temporal Difference Learning. Machine Learning, 8(3–4), 257–277. https://link.springer.com/article/10.1007/BF00992697 ↗
    ×1
  26. Watkins and Dayan (1992). Q-Learning. https://link.springer.com/article/10.1007/BF00992698 ↗
    ×3
  27. Williams (1992). Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. https://doi.org/10.1007/BF00992696 ↗
    ×3
  28. Pearl, J. (1993). Comment: Graphical Models, Causality and Intervention. Statistical Science, 8(3), 266–269. https://projecteuclid.org/euclid.ss/1177010894 ↗ DOI: 10.1214/ss/1177010894
    ×1
  29. Gage (1994). A New Algorithm for Data Compression. https://dl.acm.org/doi/10.5555/177910.177914 ↗
    ×1
  30. Robertson, S. E., Walker, S., Jones, S., Hancock-Beaulieu, M. M., & Gatford, M. (1994). Okapi at TREC-3. Proceedings of the Third Text REtrieval Conference (TREC-3), NIST Special Publication 500-225, 109–126. https://trec.nist.gov/pubs/trec3/papers/city.ps.gz ↗
    ×1
  31. Robins, J. M., Rotnitzky, A., & Zhao, L. P. (1994). Estimation of Regression Coefficients When Some Regressors Are Not Always Observed. Journal of the American Statistical Association, 89(427), 846–866. https://www.jstor.org/stable/2290910 ↗ DOI: 10.1080/01621459.1994.10476818
    ×1
  32. Rummery and Niranjan (1994). On-line Q-learning using connectionist systems. https://www.semanticscholar.org/paper/On-line-Q-learning-using-connectionist-systems-Rummery-Niranjan/7a09464f26e18a25a948baaa736270bfb84b5e12 ↗
    ×1
  33. Cortes and Vapnik (1995). Support-Vector Networks. https://link.springer.com/article/10.1007/BF00994018 ↗
    ×1
  34. Pearl, J. (1995). Causal diagrams for empirical research. Biometrika, 82(4), 669–688. https://academic.oup.com/biomet/article-abstract/82/4/669/251647 ↗ DOI: 10.1093/biomet/82.4.669
    ×5
  35. Spirtes, P., Meek, C., & Richardson, T. (1995). Causal Inference in the Presence of Latent Variables and Selection Bias. Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (UAI 1995), 499–506. https://arxiv.org/abs/1302.4983 ↗
    ×1
  36. Kavraki, L. E., Švestka, P., Latombe, J.-C., & Overmars, M. H. (1996). Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Transactions on Robotics and Automation, 12(4), 566–580. https://ieeexplore.ieee.org/document/508439/ ↗ DOI: 10.1109/70.508439
    ×1
  37. Hochreiter and Schmidhuber (1997). Long Short-Term Memory. https://direct.mit.edu/neco/article/9/8/1735/6109/Long-Short-Term-Memory ↗
    ×5
  38. Tsitsiklis and Van Roy (1997). An Analysis of Temporal-Difference Learning with Function Approximation. https://ieeexplore.ieee.org/document/580874/ ↗
    ×1
  39. Bartlett (1998). The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network. https://ieeexplore.ieee.org/document/661502/ ↗
    ×3
  40. Brin, S., & Page, L. (1998). The Anatomy of a Large-Scale Hypertextual Web Search Engine. Computer Networks and ISDN Systems, 30(1–7), 107–117. http://infolab.stanford.edu/~backrub/google.html ↗ DOI: 10.1016/S0169-7552(98)00110-X
    ×1
  41. LaValle, S. M. (1998). Rapidly-Exploring Random Trees: A New Tool for Path Planning. Technical Report No. 98-11, Computer Science Department, Iowa State University. http://msl.cs.illinois.edu/~lavalle/papers/Lav98c.pdf ↗
    ×1
  42. LeCun, Bottou, Bengio, Haffner (1998). Gradient-based learning applied to document recognition. https://ieeexplore.ieee.org/document/726791 ↗
    ×1
  43. Schapire, Freund, Bartlett, Lee (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. https://projecteuclid.org/journals/annals-of-statistics/volume-26/issue-5/Boosting-the-margin--a-new-explanation-for-the-effectiveness/10.1214/aos/1024691352.full ↗
    ×2
  44. McAllester (1999). PAC-Bayesian Model Averaging. https://dl.acm.org/doi/10.1145/307400.307435 ↗
    ×2
  45. Rao and Ballard (1999). Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. https://www.nature.com/articles/nn0199_79 ↗
    ×2
  46. Sutton, McAllester, Singh, Mansour (1999). Policy Gradient Methods for Reinforcement Learning with Function Approximation. https://proceedings.neurips.cc/paper/1999/hash/464d828b85b0bed98e80ade0a5c43b0f-Abstract.html ↗
    ×3
  47. Tishby, Pereira, Bialek (1999). The Information Bottleneck Method. https://arxiv.org/abs/physics/0004057 ↗
    ×2
  48. Bicchi, A., & Kumar, V. (2000). Robotic grasping and contact: a review. Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation, 348–353. https://ieeexplore.ieee.org/document/844081/ ↗ DOI: 10.1109/ROBOT.2000.844081
    ×1
  49. Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press. https://doi.org/10.1017/CBO9780511803161 ↗
    ×1
  50. Kakade (2001). A Natural Policy Gradient. https://dblp.org/rec/conf/nips/Kakade01.html ↗
    ×2
  51. Mason, M. T. (2001). Mechanics of Robotic Manipulation. MIT Press (Intelligent Robotics and Autonomous Agents series). https://mitpress.mit.edu/9780262133968/mechanics-of-robotic-manipulation/ ↗ DOI: 10.7551/mitpress/4527.001.0001
    ×1
  52. Bousquet and Elisseeff (2002). Stability and Generalization. https://www.jmlr.org/papers/v2/bousquet02a.html ↗
    ×1
  53. Brafman, R. I., & Tennenholtz, M. (2002). R-MAX – A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning. Journal of Machine Learning Research, 3, 213–231. https://www.jmlr.org/papers/v3/brafman02a.html ↗ DOI: 10.1162/153244303765208377
    ×1
  54. Chickering, D. M. (2002). Optimal Structure Identification With Greedy Search. Journal of Machine Learning Research, 3, 507–554. https://jmlr.org/papers/v3/chickering02b.html ↗
    ×1
  55. Chickering, D. M. (2002). Optimal Structure Identification With Greedy Search. Journal of Machine Learning Research, 3, 507–554. https://jmlr.org/papers/v3/chickering02b.html ↗ DOI: 10.1162/153244303321897717
    ×1
  56. Kearns and Singh (2002). Near-Optimal Reinforcement Learning in Polynomial Time. https://link.springer.com/article/10.1023/A:1017984413808 ↗
    ×2
  57. Koltchinskii and Panchenko (2002). Empirical Margin Distributions and Bounding the Generalization Error of Combined Classifiers. https://projecteuclid.org/journals/annals-of-statistics/volume-30/issue-1/Empirical-Margin-Distributions-and-Bounding-the-Generalization--Error-of/10.1214/aos/1015362183.full ↗
    ×2
  58. Tian, J., & Pearl, J. (2002). A General Identification Condition for Causal Effects. Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI 2002), 567–573. https://cdn.aaai.org/AAAI/2002/AAAI02-085.pdf ↗
    ×2
  59. Bengio et al. (2003). A Neural Probabilistic Language Model. https://www.jmlr.org/papers/v3/bengio03a.html ↗
    ×1
  60. Thrun, S., Burgard, W., & Fox, D. (2005). Probabilistic Robotics. MIT Press (Intelligent Robotics and Autonomous Agents series). https://mitpress.mit.edu/9780262201629/probabilistic-robotics/ ↗
    ×3
  61. Coulom, R. (2006). Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. Computers and Games: 5th International Conference, CG 2006, Lecture Notes in Computer Science, vol. 4630, pp. 72–83. https://link.springer.com/chapter/10.1007/978-3-540-75538-8_7 ↗
    ×1
  62. Dwork et al. (2006). Calibrating Noise to Sensitivity in Private Data Analysis. https://link.springer.com/chapter/10.1007/11681878_14 ↗
    ×1
  63. Hinton and Salakhutdinov (2006). Reducing the Dimensionality of Data with Neural Networks. https://www.science.org/doi/10.1126/science.1127647 ↗
    ×4
  64. Shimizu, S., Hoyer, P. O., Hyvärinen, A., & Kerminen, A. (2006). A Linear Non-Gaussian Acyclic Model for Causal Discovery. Journal of Machine Learning Research, 7, 2003–2030. https://jmlr.org/papers/v7/shimizu06a.html ↗
    ×1
  65. Shpitser, I., & Pearl, J. (2006). Identification of Joint Interventional Distributions in Recursive Semi-Markovian Causal Models. Proceedings of the 21st National Conference on Artificial Intelligence (AAAI 2006). https://cdn.aaai.org/AAAI/2006/AAAI06-191.pdf ↗
    ×3
  66. Strehl, A. L., & Littman, M. L. (2008). An analysis of model-based Interval Estimation for Markov Decision Processes. Journal of Computer and System Sciences, 74(8), 1309–1331. https://doi.org/10.1016/j.jcss.2007.08.009 ↗
    ×1
  67. Villani, C. (2008). Optimal Transport: Old and New. Grundlehren der mathematischen Wissenschaften, Vol. 338, Springer-Verlag, Berlin. https://link.springer.com/book/10.1007/978-3-540-71050-9 ↗
    ×1
  68. Vincent et al. (2008). Extracting and composing robust features with denoising autoencoders. https://dl.acm.org/doi/10.1145/1390156.1390294 ↗
    ×4
  69. Hoyer, P. O., Janzing, D., Mooij, J. M., Peters, J., & Schölkopf, B. (2009). Nonlinear causal discovery with additive noise models. Advances in Neural Information Processing Systems 21 (NeurIPS 2008). https://papers.nips.cc/paper/3548-nonlinear-causal-discovery-with-additive-noise-models ↗
    ×1
  70. Pearl, J. (2009). Causality: Models, Reasoning, and Inference. Cambridge University Press (2nd ed.). https://doi.org/10.1017/CBO9780511803161 ↗
    ×1
  71. Schmidt, M., & Lipson, H. (2009). Distilling Free-Form Natural Laws from Experimental Data. Science, 324(5923), 81–85. https://www.science.org/doi/10.1126/science.1165893 ↗
    ×1
  72. Glorot and Bengio (2010). Understanding the difficulty of training deep feedforward neural networks. https://proceedings.mlr.press/v9/glorot10a.html ↗
    ×1
  73. Jaksch, Ortner, Auer (2010). Near-optimal Regret Bounds for Reinforcement Learning. https://jmlr.org/papers/v11/jaksch10a.html ↗
    ×1
  74. Mikolov, T., Karafiát, M., Burget, L., Černocký, J., & Khudanpur, S. (2010). Recurrent neural network based language model. Interspeech 2010. https://www.isca-archive.org/interspeech_2010/mikolov10_interspeech.html ↗ DOI: 10.21437/Interspeech.2010-343
    ×1
  75. Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep Sparse Rectifier Neural Networks. Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS 2011), PMLR 15, 315–323. https://proceedings.mlr.press/v15/glorot11a.html ↗
    ×1
  76. Karaman, S., & Frazzoli, E. (2011). Sampling-based Algorithms for Optimal Motion Planning. The International Journal of Robotics Research, 30(7), 846–894. https://arxiv.org/abs/1105.1186 ↗ DOI: 10.1177/0278364911406761
    ×1
  77. Ross, S., Gordon, G. J., & Bagnell, J. A. (2011). A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS 2011), PMLR 15, 627–635. https://arxiv.org/abs/1011.0686 ↗
    ×1
  78. Krizhevsky, Sutskever, Hinton (2012). ImageNet Classification with Deep Convolutional Neural Networks. https://papers.nips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html ↗
    ×3
  79. Merck & Co. (2012). Merck Molecular Activity Challenge. Kaggle Competition. https://www.kaggle.com/c/MerckActivity ↗
    ×1
  80. Dean, J., Corrado, G. S., Monga, R., Chen, K., Devin, M., Le, Q. V., Mao, M. Z., Ranzato, M. A., Senior, A., Tucker, P., Yang, K., & Ng, A. Y. (2012). Large Scale Distributed Deep Networks. Advances in Neural Information Processing Systems 25 (NeurIPS 2012). https://papers.nips.cc/paper/2012/hash/6aca97005c68f1206823815f66102863-Abstract.html ↗
    ×1
  81. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems 25 (NeurIPS 2012). https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks ↗
    ×2
  82. Krizhevsky, Sutskever, Hinton (2012). ImageNet Classification with Deep Convolutional Neural Networks. https://papers.nips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html ↗
    ×5
  83. Schölkopf, B., Janzing, D., Peters, J., Sgouritsa, E., Zhang, K., & Mooij, J. (2012). On Causal and Anticausal Learning. Proceedings of the 29th International Conference on Machine Learning (ICML 2012). https://arxiv.org/abs/1206.6471 ↗
    ×3
  84. Schuster, M., & Nakajima, K. (2012). Japanese and Korean voice search. 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5149–5152. https://research.google/pubs/japanese-and-korean-voice-search/ ↗ DOI: 10.1109/ICASSP.2012.6289079
    ×1
  85. Kingma, D. P., & Welling, M. (2013). Auto-Encoding Variational Bayes. arXiv:1312.6114. https://arxiv.org/abs/1312.6114 ↗
    ×3
  86. Mikolov et al. (2013). Efficient Estimation of Word Representations in Vector Space. https://arxiv.org/abs/1301.3781 ↗
    ×2
  87. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with Deep Reinforcement Learning. arXiv:1312.5602. https://arxiv.org/abs/1312.5602 ↗
    ×2
  88. Russo and Van Roy (2013). Eluder Dimension and the Sample Complexity of Optimistic Exploration. https://proceedings.neurips.cc/paper/2013/hash/41bfd20a38bb1b0bec75acf0845530a7-Abstract.html ↗
    ×1
  89. Bahdanau et al. (2014). Neural Machine Translation by Jointly Learning to Align and Translate. https://arxiv.org/abs/1409.0473 ↗
    ×1
  90. Bostrom, N. (2014). Superintelligence: Paths, Dangers, Strategies. Oxford University Press. https://global.oup.com/academic/product/superintelligence-9780199678112 ↗
    ×4
  91. Cho et al. (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. https://arxiv.org/abs/1406.1078 ↗
    ×2
  92. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative Adversarial Nets. Advances in Neural Information Processing Systems 27 (NeurIPS 2014). https://arxiv.org/abs/1406.2661 ↗
    ×2
  93. Kingma and Ba (2014). Adam: A Method for Stochastic Optimization. https://arxiv.org/abs/1412.6980 ↗
    ×2
  94. Pennington, Socher, Manning (2014). GloVe: Global Vectors for Word Representation. https://aclanthology.org/D14-1162/ ↗
    ×3
  95. Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., & Riedmiller, M. (2014). Deterministic Policy Gradient Algorithms. Proceedings of the 31st International Conference on Machine Learning (ICML 2014), PMLR 32, 387–395. https://proceedings.mlr.press/v32/silver14.html ↗
    ×2
  96. Simonyan and Zisserman (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. https://arxiv.org/abs/1409.1556 ↗
    ×2
  97. Srivastava et al. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. https://jmlr.org/papers/v15/srivastava14a.html ↗
    ×1
  98. Sutskever et al. (2014). Sequence to Sequence Learning with Neural Networks. https://arxiv.org/abs/1409.3215 ↗
    ×1
  99. Szegedy et al. (2014). Going Deeper with Convolutions. https://arxiv.org/abs/1409.4842 ↗
    ×2
  100. Zeiler, M. D., & Fergus, R. (2014). Visualizing and Understanding Convolutional Networks. Computer Vision – ECCV 2014, Lecture Notes in Computer Science, vol. 8689, pp. 818–833. https://arxiv.org/abs/1311.2901 ↗ DOI: 10.1007/978-3-319-10590-1_53
    ×1
  101. Alipanahi, B., Delong, A., Weirauch, M. T., & Frey, B. J. (2015). Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nature Biotechnology, 33(8), 831–838. https://www.nature.com/articles/nbt.3300 ↗ DOI: 10.1038/nbt.3300
    ×1
  102. Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Zitnick, C. L., & Parikh, D. (2015). VQA: Visual Question Answering. Proceedings of the IEEE International Conference on Computer Vision (ICCV 2015), 2425–2433. https://arxiv.org/abs/1505.00468 ↗ DOI: 10.1109/ICCV.2015.279
    ×1
  103. Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.-R., & Samek, W. (2015). On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation. PLOS ONE, 10(7), e0130140. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0130140 ↗
    ×1
  104. Caruana, R., Lou, Y., Gehrke, J., Koch, P., Sturm, M., & Elhadad, N. (2015). Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission. Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2015), 1721–1730. https://dl.acm.org/doi/10.1145/2783258.2788613 ↗
    ×3
  105. Mnih et al. (2015). Human-level control through deep reinforcement learning. https://www.nature.com/articles/nature14236 ↗
    ×1
  106. He et al. (2015). Deep Residual Learning for Image Recognition. https://arxiv.org/abs/1512.03385 ↗
    ×5
  107. Hinton et al. (2015). Distilling the Knowledge in a Neural Network. https://arxiv.org/abs/1503.02531 ↗
    ×1
  108. Imbens, G. W., & Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press. https://www.cambridge.org/core/books/causal-inference-for-statistics-social-and-biomedical-sciences/71126BE90C58F1A431FE9B2DD07938AB ↗ DOI: 10.1017/CBO9781139025751
    ×1
  109. Ioffe, S., & Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), PMLR 37, 448–456. https://arxiv.org/abs/1502.03167 ↗
    ×1
  110. Karpathy, A., & Fei-Fei, L. (2015). Deep Visual-Semantic Alignments for Generating Image Descriptions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), 3128–3137. https://arxiv.org/abs/1412.2306 ↗ DOI: 10.1109/CVPR.2015.7298932
    ×1
  111. Kiros et al. (2015). Skip-Thought Vectors. https://arxiv.org/abs/1506.06726 ↗
    ×1
  112. Lillicrap et al. (2015). Continuous control with deep reinforcement learning. https://arxiv.org/abs/1509.02971 ↗
    ×2
  113. He et al. (2015). Deep Residual Learning for Image Recognition. https://arxiv.org/abs/1512.03385 ↗
    ×1
  114. Rezende, D. J., & Mohamed, S. (2015). Variational Inference with Normalizing Flows. Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), PMLR 37, 1530–1538. https://arxiv.org/abs/1505.05770 ↗
    ×1
  115. Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Lecture Notes in Computer Science, vol. 9351, 234–241. https://arxiv.org/abs/1505.04597 ↗ DOI: 10.1007/978-3-319-24574-4_28
    ×1
  116. Russakovsky et al. (2015). ImageNet Large Scale Visual Recognition Challenge. https://arxiv.org/abs/1409.0575 ↗
    ×3
  117. Schulman, J., Levine, S., Moritz, P., Jordan, M. I., & Abbeel, P. (2015). Trust Region Policy Optimization. Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), PMLR 37, 1889–1897. https://arxiv.org/abs/1502.05477 ↗
    ×1
  118. Sohl-Dickstein, J., Weiss, E. A., Maheswaranathan, N., & Ganguli, S. (2015). Deep Unsupervised Learning using Nonequilibrium Thermodynamics. Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), PMLR 37, 2256–2265. https://arxiv.org/abs/1503.03585 ↗
    ×2
  119. Tishby and Zaslavsky (2015). Deep Learning and the Information Bottleneck Principle. https://arxiv.org/abs/1503.02406 ↗
    ×2
  120. VanderWeele, T. J. (2015). Explanation in Causal Inference: Methods for Mediation and Interaction. Oxford University Press. https://global.oup.com/academic/product/explanation-in-causal-inference-9780199325870 ↗
    ×1
  121. Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. (2015). Show and Tell: A Neural Image Caption Generator. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), 3156–3164. https://arxiv.org/abs/1411.4555 ↗ DOI: 10.1109/CVPR.2015.7298935
    ×1
  122. Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., & Mané, D. (2016). Concrete Problems in AI Safety. arXiv:1606.06565. https://arxiv.org/abs/1606.06565 ↗
    ×2
  123. Ba et al. (2016). Layer Normalization. https://arxiv.org/abs/1607.06450 ↗
    ×1
  124. Bellemare, M. G., Srinivasan, S., Ostrovski, G., Schaul, T., Saxton, D., & Munos, R. (2016). Unifying Count-Based Exploration and Intrinsic Motivation. Advances in Neural Information Processing Systems 29 (NeurIPS 2016). https://arxiv.org/abs/1606.01868 ↗
    ×1
  125. Chen, T., Xu, B., Zhang, C., & Guestrin, C. (2016). Training Deep Nets with Sublinear Memory Cost. arXiv:1604.06174. https://arxiv.org/abs/1604.06174 ↗
    ×1
  126. Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J. (2016). Double Machine Learning for Treatment and Causal Parameters. arXiv:1608.00060. https://arxiv.org/abs/1608.00060 ↗ DOI: 10.48550/arXiv.1608.00060
    ×1
  127. Goodfellow, Bengio, Courville (2016). Deep Learning. https://www.deeplearningbook.org/ ↗
    ×1
  128. Hardt, M., Price, E., & Srebro, N. (2016). Equality of Opportunity in Supervised Learning. Advances in Neural Information Processing Systems 29 (NeurIPS 2016), 3323–3331. https://arxiv.org/abs/1610.02413 ↗
    ×1
  129. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), 770–778. https://arxiv.org/abs/1512.03385 ↗ DOI: 10.1109/CVPR.2016.90
    ×1
  130. Hendrycks, D., & Gimpel, K. (2016). Gaussian Error Linear Units (GELUs). arXiv:1606.08415. https://arxiv.org/abs/1606.08415 ↗
    ×1
  131. Huang et al. (2016). Deep Networks with Stochastic Depth. https://arxiv.org/abs/1603.09382 ↗
    ×1
  132. Peters, J., Bühlmann, P., & Meinshausen, N. (2016). Causal inference by using invariant prediction: identification and confidence intervals. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(5), 947–1012. https://arxiv.org/abs/1501.01332 ↗ DOI: 10.1111/rssb.12167
    ×1
  133. Kelley, D. R., Snoek, J., & Rinn, J. L. (2016). Basset: Learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Research, 26(7), 990–999. https://genome.cshlp.org/content/26/7/990 ↗ DOI: 10.1101/gr.200535.115
    ×1
  134. Kembhavi, A., Salvato, M., Kolve, E., Seo, M., Hajishirzi, H., & Farhadi, A. (2016). A Diagram Is Worth A Dozen Images. European Conference on Computer Vision (ECCV 2016). https://arxiv.org/abs/1603.07396 ↗ DOI: 10.1007/978-3-319-46493-0_15
    ×1
  135. Kingma, D. P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., & Welling, M. (2016). Improving Variational Inference with Inverse Autoregressive Flow. Advances in Neural Information Processing Systems 29 (NeurIPS 2016). https://arxiv.org/abs/1606.04934 ↗
    ×1
  136. Levine, S., Finn, C., Darrell, T., & Abbeel, P. (2016). End-to-End Training of Deep Visuomotor Policies. Journal of Machine Learning Research, 17(39), 1–40. https://arxiv.org/abs/1504.00702 ↗
    ×2
  137. Peters, J., Bühlmann, P., & Meinshausen, N. (2016). Causal inference by using invariant prediction: identification and confidence intervals. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(5), 947–1012. https://arxiv.org/abs/1501.01332 ↗ DOI: 10.1111/rssb.12167
    ×1
  138. Radford, A., Metz, L., & Chintala, S. (2016). Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. 4th International Conference on Learning Representations (ICLR 2016); arXiv:1511.06434. https://arxiv.org/abs/1511.06434 ↗
    ×2
  139. Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why Should I Trust You?": Explaining the Predictions of Any Classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2016), 1135–1144. https://arxiv.org/abs/1602.04938 ↗ DOI: 10.1145/2939672.2939778
    ×1
  140. Russo and Zou (2016). Controlling Bias in Adaptive Data Analysis Using Information Theory. https://proceedings.mlr.press/v51/russo16.html ↗
    ×2
  141. Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X. (2016). Improved Techniques for Training GANs. Advances in Neural Information Processing Systems 29 (NeurIPS 2016). https://arxiv.org/abs/1606.03498 ↗
    ×1
  142. Schaul et al. (2016). Prioritized Experience Replay. https://arxiv.org/abs/1511.05952 ↗
    ×1
  143. Schulman, Moritz, Levine, Jordan, Abbeel (2016). High-Dimensional Continuous Control Using Generalized Advantage Estimation. https://arxiv.org/abs/1506.02438 ↗
    ×2
  144. Sennrich, Haddow, Birch (2016). Neural Machine Translation of Rare Words with Subword Units. https://arxiv.org/abs/1508.07909 ↗
    ×2
  145. Theis, L., van den Oord, A., & Bethge, M. (2016). A Note on the Evaluation of Generative Models. 4th International Conference on Learning Representations (ICLR 2016). https://arxiv.org/abs/1511.01844 ↗
    ×1
  146. Thomas, P. S., & Brunskill, E. (2016). Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning. Proceedings of the 33rd International Conference on Machine Learning (ICML 2016), PMLR 48, 2139–2148. https://proceedings.mlr.press/v48/thomasa16.html ↗
    ×1
  147. van den Oord, A., Kalchbrenner, N., & Kavukcuoglu, K. (2016). Pixel Recurrent Neural Networks. Proceedings of the 33rd International Conference on Machine Learning (ICML 2016), PMLR 48, 1747–1756. https://arxiv.org/abs/1601.06759 ↗
    ×4
  148. van Hasselt, Guez, Silver (2016). Deep Reinforcement Learning with Double Q-learning. https://arxiv.org/abs/1509.06461 ↗
    ×1
  149. Wang et al. (2016). Dueling Network Architectures for Deep Reinforcement Learning. https://arxiv.org/abs/1511.06581 ↗
    ×1
  150. Silver et al. (2017). Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. https://arxiv.org/abs/1712.01815 ↗
    ×1
  151. Arjovsky, M., Chintala, S., & Bottou, L. (2017). Wasserstein GAN. Proceedings of the 34th International Conference on Machine Learning (ICML 2017), PMLR 70, 214–223. https://arxiv.org/abs/1701.07875 ↗
    ×2
  152. Bartlett, Foster, Telgarsky (2017). Spectrally-normalized margin bounds for neural networks. https://arxiv.org/abs/1706.08498 ↗
    ×2
  153. Bellemare, Dabney, Munos (2017). A Distributional Perspective on Reinforcement Learning. https://arxiv.org/abs/1707.06887 ↗
    ×1
  154. Carleo, G., & Troyer, M. (2017). Solving the quantum many-body problem with artificial neural networks. Science, 355(6325), 602–606. https://arxiv.org/abs/1606.02318 ↗ DOI: 10.1126/science.aag2302
    ×1
  155. Chouldechova, A. (2017). Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments. Big Data, 5(2), 153–163. https://doi.org/10.1089/big.2016.0047 ↗
    ×2
  156. Christiano, Leike, Brown, Martic, Legg, Amodei (2017). Deep Reinforcement Learning from Human Preferences. https://arxiv.org/abs/1706.03741 ↗
    ×12
  157. Dinh, L., Sohl-Dickstein, J., & Bengio, S. (2017). Density estimation using Real NVP. 5th International Conference on Learning Representations (ICLR 2017). https://arxiv.org/abs/1605.08803 ↗
    ×1
  158. Dziugaite and Roy (2017). Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data. https://arxiv.org/abs/1703.11008 ↗
    ×5
  159. Elfwing et al. (2017). Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning. https://arxiv.org/abs/1702.03118 ↗
    ×1
  160. Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., & Dahl, G. E. (2017). Neural Message Passing for Quantum Chemistry. Proceedings of the 34th International Conference on Machine Learning (ICML 2017). https://arxiv.org/abs/1704.01212 ↗
    ×1
  161. Goyal et al. (2017). Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. https://arxiv.org/abs/1706.02677 ↗
    ×3
  162. Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., & Courville, A. (2017). Improved Training of Wasserstein GANs. Advances in Neural Information Processing Systems 30 (NeurIPS 2017). https://arxiv.org/abs/1704.00028 ↗
    ×1
  163. Gunasekar et al. (2017). Implicit Regularization in Matrix Factorization. https://arxiv.org/abs/1705.09280 ↗
    ×1
  164. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., & Hochreiter, S. (2017). GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. Advances in Neural Information Processing Systems 30 (NeurIPS 2017). https://arxiv.org/abs/1706.08500 ↗
    ×1
  165. Higgins, I., Matthey, L., Pal, A., Burgess, C., Glorot, X., Botvinick, M., Mohamed, S., & Lerchner, A. (2017). beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. 5th International Conference on Learning Representations (ICLR 2017). https://openreview.net/forum?id=Sy2fzU9gl ↗
    ×1
  166. Jiang et al. (2017). Contextual Decision Processes with Low Bellman Rank are PAC-Learnable. https://arxiv.org/abs/1610.09512 ↗
    ×1
  167. Johnson, J., Hariharan, B., van der Maaten, L., Fei-Fei, L., Zitnick, C. L., & Girshick, R. (2017). CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017). https://arxiv.org/abs/1612.06890 ↗ DOI: 10.1109/CVPR.2017.215
    ×1
  168. Keskar et al. (2017). On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. https://arxiv.org/abs/1609.04836 ↗
    ×2
  169. Kleinberg, J., Mullainathan, S., & Raghavan, M. (2017). Inherent Trade-Offs in the Fair Determination of Risk Scores. Proceedings of the 8th Innovations in Theoretical Computer Science Conference (ITCS 2017), 67, 43:1–43:23. https://arxiv.org/abs/1609.05807 ↗ DOI: 10.4230/LIPIcs.ITCS.2017.43
    ×3
  170. Kusner, M. J., Loftus, J. R., Russell, C., & Silva, R. (2017). Counterfactual Fairness. Advances in Neural Information Processing Systems 30 (NeurIPS 2017). https://arxiv.org/abs/1703.06856 ↗
    ×3
  171. Loshchilov and Hutter (2017). Decoupled Weight Decay Regularization. https://arxiv.org/abs/1711.05101 ↗
    ×2
  172. Lundberg, S. M., & Lee, S.-I. (2017). A Unified Approach to Interpreting Model Predictions. Advances in Neural Information Processing Systems 30 (NeurIPS 2017). https://arxiv.org/abs/1705.07874 ↗
    ×1
  173. McCann et al. (2017). Learned in Translation: Contextualized Word Vectors. https://arxiv.org/abs/1708.00107 ↗
    ×1
  174. Micikevicius, P., Narang, S., Alben, J., Diamos, G., Elsen, E., Garcia, D., Ginsburg, B., Houston, M., Kuchaiev, O., Venkatesh, G., & Wu, H. (2017). Mixed Precision Training. arXiv:1710.03740 (later published at ICLR 2018). https://arxiv.org/abs/1710.03740 ↗
    ×3
  175. Neyshabur, B., Bhojanapalli, S., & Srebro, N. (2017). A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks. International Conference on Learning Representations (ICLR 2018); arXiv:1707.09564. https://arxiv.org/abs/1707.09564 ↗
    ×2
  176. Olah, C., Mordvintsev, A., & Schubert, L. (2017). Feature Visualization. Distill, 2(11). https://distill.pub/2017/feature-visualization/ ↗ DOI: 10.23915/distill.00007
    ×1
  177. Papamakarios, G., Pavlakou, T., & Murray, I. (2017). Masked Autoregressive Flow for Density Estimation. Advances in Neural Information Processing Systems 30 (NeurIPS 2017). https://arxiv.org/abs/1705.07057 ↗
    ×1
  178. Pathak, D., Agrawal, P., Efros, A. A., & Darrell, T. (2017). Curiosity-driven Exploration by Self-supervised Prediction. Proceedings of the 34th International Conference on Machine Learning (ICML 2017), PMLR 70, 2778–2787. https://arxiv.org/abs/1705.05363 ↗
    ×1
  179. Qi, C. R., Su, H., Mo, K., & Guibas, L. J. (2017). PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017). https://arxiv.org/abs/1612.00593 ↗ DOI: 10.1109/CVPR.2017.16
    ×1
  180. Ramachandran et al. (2017). Searching for Activation Functions. https://arxiv.org/abs/1710.05941 ↗
    ×1
  181. Schulman, Wolski, Dhariwal, Radford, Klimov (2017). Proximal Policy Optimization Algorithms. https://arxiv.org/abs/1707.06347 ↗
    ×5
  182. Schütt, K. T., Kindermans, P.-J., Sauceda, H. E., Chmiela, S., Tkatchenko, A., & Müller, K.-R. (2017). SchNet: A continuous-filter convolutional neural network for modeling quantum interactions. Advances in Neural Information Processing Systems 30 (NeurIPS 2017). https://arxiv.org/abs/1706.08566 ↗
    ×1
  183. Shalit, U., Johansson, F. D., & Sontag, D. (2017). Estimating individual treatment effect: generalization bounds and algorithms. Proceedings of the 34th International Conference on Machine Learning (ICML 2017), PMLR 70, 3076–3085. https://arxiv.org/abs/1606.03976 ↗
    ×1
  184. Shazeer et al. (2017). Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. https://arxiv.org/abs/1701.06538 ↗
    ×2
  185. Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K., & Hassabis, D. (2017). Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv:1712.01815. https://arxiv.org/abs/1712.01815 ↗
    ×1
  186. Sundararajan, M., Taly, A., & Yan, Q. (2017). Axiomatic Attribution for Deep Networks. Proceedings of the 34th International Conference on Machine Learning (ICML 2017), PMLR 70, 3319–3328. https://arxiv.org/abs/1703.01365 ↗
    ×3
  187. Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., & Abbeel, P. (2017). Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World. 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017). https://arxiv.org/abs/1703.06907 ↗ DOI: 10.1109/IROS.2017.8202133
    ×3
  188. Transformer (2017). Attention Is All You Need. https://arxiv.org/abs/1706.03762 ↗
    ×1
  189. van den Oord, A., Vinyals, O., & Kavukcuoglu, K. (2017). Neural Discrete Representation Learning. Advances in Neural Information Processing Systems 30 (NeurIPS 2017). https://arxiv.org/abs/1711.00937 ↗
    ×1
  190. Vaswani et al. (2017). Attention Is All You Need. https://arxiv.org/abs/1706.03762 ↗
    ×8
  191. NVIDIA Corporation (2017). NVIDIA Tesla V100 GPU Architecture: The World's Most Advanced Data Center GPU. NVIDIA Whitepaper WP-08608-001_v1.1, August 2017. https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf ↗
    ×1
  192. Wachter, S., Mittelstadt, B., & Russell, C. (2017). Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR. Harvard Journal of Law & Technology, 31(2), 841–887. https://arxiv.org/abs/1711.00399 ↗ DOI: 10.2139/ssrn.3063289
    ×2
  193. Wager, S., & Athey, S. (2017). Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. Journal of the American Statistical Association, 113(523), 1228–1242. https://arxiv.org/abs/1510.04342 ↗ DOI: 10.1080/01621459.2017.1319839
    ×1
  194. Xu and Raginsky (2017). Information-theoretic analysis of generalization capability of learning algorithms. https://arxiv.org/abs/1705.07809 ↗
    ×2
  195. Zhang et al. (2017). Understanding deep learning requires rethinking generalization. https://arxiv.org/abs/1611.03530 ↗
    ×3
  196. Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., & Kim, B. (2018). Sanity Checks for Saliency Maps. Advances in Neural Information Processing Systems 31 (NeurIPS 2018). https://arxiv.org/abs/1810.03292 ↗
    ×2
  197. Arora et al. (2018). Stronger Generalization Bounds for Deep Nets via a Compression Approach. https://arxiv.org/abs/1802.05296 ↗
    ×1
  198. Sergeev, A., & Del Balso, M. (2018). Horovod: fast and easy distributed deep learning in TensorFlow. arXiv:1802.05799. https://arxiv.org/abs/1802.05799 ↗
    ×1
  199. Barratt, S., & Sharma, R. (2018). A Note on the Inception Score. arXiv:1801.01973 (ICML 2018 Workshop on Theoretical Foundations and Applications of Deep Generative Models). https://arxiv.org/abs/1801.01973 ↗
    ×1
  200. Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1), C1–C68. https://onlinelibrary.wiley.com/doi/abs/10.1111/ectj.12097 ↗
    ×2
  201. Chizat and Bach (2018). On Lazy Training in Differentiable Programming. https://arxiv.org/abs/1812.07956 ↗
    ×2
  202. Christiano, P., Shlegeris, B., & Amodei, D. (2018). Supervising Strong Learners by Amplifying Weak Experts. arXiv:1810.08575. https://arxiv.org/abs/1810.08575 ↗
    ×3
  203. Conneau, A., Kruszewski, G., Lample, G., Barrault, L., & Baroni, M. (2018). What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), Volume 1: Long Papers, 2126–2136. https://arxiv.org/abs/1805.01070 ↗ DOI: 10.18653/v1/P18-1198
    ×2
  204. Devlin et al. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://arxiv.org/abs/1810.04805 ↗
    ×7
  205. Frankle and Carbin (2018). The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. https://arxiv.org/abs/1803.03635 ↗
    ×1
  206. Garipov et al. (2018). Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs. https://arxiv.org/abs/1802.10026 ↗
    ×1
  207. Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., & Bowman, S. R. (2018). GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. https://arxiv.org/abs/1804.07461 ↗ DOI: 10.18653/v1/W18-5446
    ×1
  208. GPT (2018). Improving Language Understanding by Generative Pre-Training. https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf ↗
    ×1
  209. Ha and Schmidhuber (2018). World Models. https://arxiv.org/abs/1803.10122 ↗
    ×2
  210. Hessel et al. (2018). Rainbow: Combining Improvements in Deep Reinforcement Learning. https://arxiv.org/abs/1710.02298 ↗
    ×2
  211. Howard & Ruder (2018). Universal Language Model Fine-tuning for Text Classification. https://arxiv.org/abs/1801.06146 ↗
    ×1
  212. Irving, G., Christiano, P., & Amodei, D. (2018). AI safety via debate. arXiv:1805.00899. https://arxiv.org/abs/1805.00899 ↗
    ×3
  213. Jacot, A., Gabriel, F., & Hongler, C. (2018). Neural Tangent Kernel: Convergence and Generalization in Neural Networks. Advances in Neural Information Processing Systems 31 (NeurIPS 2018). https://arxiv.org/abs/1806.07572 ↗
    ×3
  214. Jin, W., Barzilay, R., & Jaakkola, T. (2018). Junction Tree Variational Autoencoder for Molecular Graph Generation. Proceedings of the 35th International Conference on Machine Learning (ICML 2018), PMLR 80, 2323–2332. https://arxiv.org/abs/1802.04364 ↗
    ×1
  215. Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2018). Progressive Growing of GANs for Improved Quality, Stability, and Variation. International Conference on Learning Representations (ICLR 2018). https://arxiv.org/abs/1710.10196 ↗
    ×2
  216. Kingma, D. P., & Dhariwal, P. (2018). Glow: Generative Flow with Invertible 1x1 Convolutions. Advances in Neural Information Processing Systems 31 (NeurIPS 2018). https://arxiv.org/abs/1807.03039 ↗
    ×1
  217. Kudo and Richardson (2018). Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates. https://arxiv.org/abs/1804.10959 ↗
    ×4
  218. Kurutach, T., Tamar, A., Yang, G., Russell, S., & Abbeel, P. (2018). Learning Plannable Representations with Causal InfoGAN. Advances in Neural Information Processing Systems 31 (NeurIPS 2018). https://arxiv.org/abs/1807.09341 ↗
    ×1
  219. Leike, J., Krueger, D., Everitt, T., Martic, M., Maini, V., & Legg, S. (2018). Scalable agent alignment via reward modeling: a research direction. arXiv:1811.07871. https://arxiv.org/abs/1811.07871 ↗ DOI: 10.48550/arXiv.1811.07871
    ×2
  220. Manheim, D., & Garrabrant, S. (2018). Categorizing Variants of Goodhart's Law. arXiv:1803.04585. https://arxiv.org/abs/1803.04585 ↗
    ×2
  221. NLP (2018).
    ×1
    Not found: 'NLP' is a field abbreviation, not an author; '(2018)' here is a year reference to BERT's release, not a formal author-year citation to a specific paper.
  222. Olah, C., Satyanarayan, A., Johnson, I., Carter, S., Schubert, L., Ye, K., & Mordvintsev, A. (2018). The Building Blocks of Interpretability. Distill, 3(3), e10. https://distill.pub/2018/building-blocks/ ↗ DOI: 10.23915/distill.00010
    ×1
  223. van den Oord, Li, Vinyals (2018). Representation Learning with Contrastive Predictive Coding. https://arxiv.org/abs/1807.03748 ↗
    ×4
  224. Achiam, J. (2018). Spinning Up in Deep RL. OpenAI. https://spinningup.openai.com/en/latest/ ↗
    ×1
  225. Pearl (2018). Theoretical Impediments to Machine Learning With Seven Sparks from the Causal Revolution. https://arxiv.org/abs/1801.04016 ↗
    ×2
  226. Peters et al. (2018). Deep contextualized word representations. https://arxiv.org/abs/1802.05365 ↗
    ×2
  227. Radford et al. (2018). Improving Language Understanding by Generative Pre-Training. https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf ↗
    ×6
  228. Roch et al. (2018). search Scholar ↗
    ×1
  229. Sajjadi et al. (2018). search Scholar ↗
    ×2
  230. Soudry et al. (2018). The Implicit Bias of Gradient Descent on Separable Data. https://arxiv.org/abs/1710.10345 ↗
    ×2
  231. Thomas et al. (2018). search Scholar ↗
    ×2
  232. Wager, S., & Athey, S. (2018). Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. Journal of the American Statistical Association, 113(523), 1228–1242. https://arxiv.org/abs/1510.04342 ↗ DOI: 10.1080/01621459.2017.1319839
    ×2
  233. Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., & Bowman, S. R. (2018). GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. https://arxiv.org/abs/1804.07461 ↗ DOI: 10.18653/v1/W18-5446
    ×1
  234. Wu, Z., Ramsundar, B., Feinberg, E. N., Gomes, J., Geniesse, C., Pappu, A. S., Leswing, K., & Pande, V. (2018). MoleculeNet: A Benchmark for Molecular Machine Learning. Chemical Science, 9(2), 513–530. https://arxiv.org/abs/1703.00564 ↗ DOI: 10.1039/C7SC02664A
    ×1
  235. Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., Kumar, V., Zhu, H., Gupta, A., Abbeel, P., & Levine, S. (2018b). Soft Actor-Critic Algorithms and Applications. arXiv:1812.05905. https://arxiv.org/abs/1812.05905 ↗
    ×1
  236. Arjovsky, M., Bottou, L., Gulrajani, I., & Lopez-Paz, D. (2019). Invariant Risk Minimization. arXiv:1907.02893. https://arxiv.org/abs/1907.02893 ↗
    ×3
  237. Arora et al. (2019). Implicit Regularization in Deep Matrix Factorization. https://arxiv.org/abs/1905.13655 ↗
    ×1
  238. Belkin, Hsu, Ma, Mandal (2019). Reconciling modern machine-learning practice and the classical bias–variance trade-off. https://www.pnas.org/doi/10.1073/pnas.1903070116 ↗
    ×3
  239. Borji, A. (2019). Pros and Cons of GAN Evaluation Measures. Computer Vision and Image Understanding, 179, 41–65. https://arxiv.org/abs/1802.03446 ↗ DOI: 10.1016/j.cviu.2018.10.009
    ×1
  240. Brock, A., Donahue, J., & Simonyan, K. (2019). Large Scale GAN Training for High Fidelity Natural Image Synthesis. International Conference on Learning Representations (ICLR 2019). https://arxiv.org/abs/1809.11096 ↗
    ×2
  241. Chollet, F. (2019). On the Measure of Intelligence. arXiv:1911.01547. https://arxiv.org/abs/1911.01547 ↗
    ×3
  242. Huang, Y., Cheng, Y., Bapna, A., Firat, O., Chen, M. X., Chen, D., Lee, H., Ngiam, J., Le, Q. V., Wu, Y., & Chen, Z. (2019). GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism. Advances in Neural Information Processing Systems 32 (NeurIPS 2019). https://arxiv.org/abs/1811.06965 ↗
    ×2
  243. Hafner, D., Lillicrap, T., Ba, J., & Norouzi, M. (2019). Dream to Control: Learning Behaviors by Latent Imagination. arXiv:1912.01603 (later ICLR 2020). https://arxiv.org/abs/1912.01603 ↗
    ×1
  244. Hewitt, J., & Liang, P. (2019). Designing and Interpreting Probes with Control Tasks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019). https://arxiv.org/abs/1909.03368 ↗ DOI: 10.18653/v1/D19-1275
    ×1
  245. Houlsby et al. (2019). Parameter-Efficient Transfer Learning for NLP. https://arxiv.org/abs/1902.00751 ↗
    ×3
  246. Huang, Y., Cheng, Y., Bapna, A., Firat, O., Chen, M. X., Chen, D., Lee, H., Ngiam, J., Le, Q. V., Wu, Y., & Chen, Z. (2019). GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism. Advances in Neural Information Processing Systems 32 (NeurIPS 2019). https://arxiv.org/abs/1811.06965 ↗
    ×1
  247. Hubinger, E., van Merwijk, C., Mikulik, V., Skalse, J., & Garrabrant, S. (2019). Risks from Learned Optimization in Advanced Machine Learning Systems. arXiv:1906.01820. https://arxiv.org/abs/1906.01820 ↗
    ×1
  248. Hudson, D. A., & Manning, C. D. (2019). GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019). https://arxiv.org/abs/1902.09506 ↗
    ×1
  249. Hyvärinen, A., Sasaki, H., & Turner, R. E. (2019). Nonlinear ICA Using Auxiliary Variables and Generalized Contrastive Learning. Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019), PMLR 89, 859–868. https://arxiv.org/abs/1805.08651 ↗
    ×2
  250. Jain, S., & Wallace, B. C. (2019). Attention is not Explanation. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019). https://arxiv.org/abs/1902.10186 ↗ DOI: 10.18653/v1/N19-1357
    ×2
  251. Karras, T., Laine, S., & Aila, T. (2019). A Style-Based Generator Architecture for Generative Adversarial Networks. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), 4401–4410. https://arxiv.org/abs/1812.04948 ↗ DOI: 10.1109/CVPR.2019.00453
    ×2
  252. Künzel, S. R., Sekhon, J. S., Bickel, P. J., & Yu, B. (2019). Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the National Academy of Sciences, 116(10), 4156–4165. https://www.pnas.org/doi/10.1073/pnas.1804597116 ↗
    ×1
  253. Kynkäänniemi et al. (2019). search Scholar ↗
    ×1
  254. Li et al. (2019). search Scholar ↗
    ×1
  255. Locatello et al. (2019). search Scholar ↗
    ×3
  256. Lu, J., Batra, D., Parikh, D., & Lee, S. (2019). ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. Advances in Neural Information Processing Systems 32 (NeurIPS 2019). https://arxiv.org/abs/1908.02265 ↗
    ×1
  257. Marcus and Davis (2019). Rebooting AI: Building Artificial Intelligence We Can Trust. https://www.penguinrandomhouse.com/books/603982/rebooting-ai-by-gary-marcus-and-ernest-davis/ ↗
    ×1
  258. Shoeybi, M., Patwary, M., Puri, R., LeGresley, P., Casper, J., & Catanzaro, B. (2019). Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. arXiv:1909.08053. https://arxiv.org/abs/1909.08053 ↗
    ×1
  259. Mei and Montanari (2019). The generalization error of random features regression: Precise asymptotics and double descent curve. https://arxiv.org/abs/1908.05355 ↗
    ×1
  260. Nagarajan and Kolter (2019). Uniform convergence may be unable to explain generalization in deep learning. https://arxiv.org/abs/1902.04742 ↗
    ×6
  261. Negrea et al. (2019). Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates. https://arxiv.org/abs/1911.02151 ↗
    ×2
  262. Peyré, G., & Cuturi, M. (2019). Computational Optimal Transport. Foundations and Trends in Machine Learning, 11(5–6), 355–607. https://arxiv.org/abs/1803.00567 ↗ DOI: 10.1561/2200000073
    ×1
  263. Radford et al. (2019). Language Models are Unsupervised Multitask Learners. https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf ↗
    ×5
  264. Raffel et al. (2019). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. https://arxiv.org/abs/1910.10683 ↗
    ×4
  265. Raissi, M., Perdikaris, P., & Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378, 686–707. https://www.sciencedirect.com/science/article/abs/pii/S0021999118307125 ↗ DOI: 10.1016/j.jcp.2018.10.045
    ×1
  266. Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019). https://arxiv.org/abs/1908.10084 ↗ DOI: 10.18653/v1/D19-1410
    ×1
  267. Saunshi et al. (2019). A Theoretical Analysis of Contrastive Unsupervised Representation Learning. https://arxiv.org/abs/1902.09229 ↗
    ×1
  268. Schreck, J. S., Coley, C. W., & Bishop, K. J. M. (2019). Learning Retrosynthetic Planning through Simulated Experience. ACS Central Science, 5(6), 970–981. https://pubs.acs.org/doi/10.1021/acscentsci.9b00055 ↗
    ×1
  269. Shi, C., Blei, D. M., & Veitch, V. (2019). Adapting Neural Networks for the Estimation of Treatment Effects. Advances in Neural Information Processing Systems 32 (NeurIPS 2019). https://arxiv.org/abs/1906.02120 ↗
    ×1
  270. Shoeybi, M., Patwary, M., Puri, R., LeGresley, P., Casper, J., & Catanzaro, B. (2019). Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. arXiv:1909.08053. https://arxiv.org/abs/1909.08053 ↗
    ×1
  271. Song, Y., & Ermon, S. (2019). Generative Modeling by Estimating Gradients of the Data Distribution. Advances in Neural Information Processing Systems 32 (NeurIPS 2019). https://arxiv.org/abs/1907.05600 ↗
    ×2
  272. Wang, A., Pruksachatkun, Y., Nangia, N., Singh, A., Michael, J., Hill, F., Levy, O., & Bowman, S. R. (2019). SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems. Advances in Neural Information Processing Systems 32 (NeurIPS 2019). https://arxiv.org/abs/1905.00537 ↗
    ×1
  273. Tan and Le (2019). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. https://arxiv.org/abs/1905.11946 ↗
    ×2
  274. Zhang and Sennrich (2019). Root Mean Square Layer Normalization. https://arxiv.org/abs/1910.07467 ↗
    ×5
  275. Abnar, S., & Zuidema, W. (2020). Quantifying Attention Flow in Transformers. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), 4190–4197. https://aclanthology.org/2020.acl-main.385/ ↗ DOI: 10.18653/v1/2020.acl-main.385
    ×2
  276. NVIDIA Corporation (2020). NVIDIA A100 Tensor Core GPU Architecture: Unprecedented Acceleration at Every Scale. NVIDIA Whitepaper, v1.0. https://images.nvidia.com/aem-dam/en-zz/Solutions/data-center/nvidia-ampere-architecture-whitepaper.pdf ↗
    ×1
  277. Bartlett, Long, Lugosi, Tsigler (2020). Benign Overfitting in Linear Regression. https://arxiv.org/abs/1906.11300 ↗
    ×2
  278. Belkin, Hsu, Xu (2020). Two Models of Double Descent for Weak Features. https://epubs.siam.org/doi/10.1137/20M1336072 ↗
    ×1
  279. Beltagy, Peters, Cohan (2020). Longformer: The Long-Document Transformer. https://arxiv.org/abs/2004.05150 ↗
    ×1
  280. Brown et al. (2020). Language Models are Few-Shot Learners. https://arxiv.org/abs/2005.14165 ↗
    ×7
  281. Bu, Zou, Veeravalli (2020). Tightening Mutual Information Based Bounds on Generalization Error. https://arxiv.org/abs/1901.04609 ↗
    ×2
  282. Burger, B., Maffettone, P. M., Gusev, V. V., Aitchison, C. M., Bai, Y., Wang, X., Li, X., Alston, B. M., Li, B., Clowes, R., Rankin, N., Harris, B., Sprick, R. S., & Cooper, A. I. (2020). A mobile robotic chemist. Nature, 583(7815), 237–241. https://www.nature.com/articles/s41586-020-2442-2 ↗ DOI: 10.1038/s41586-020-2442-2
    ×1
  283. Caron et al. (2020). Unsupervised Learning of Visual Features by Contrasting Cluster Assignments. https://arxiv.org/abs/2006.09882 ↗
    ×3
  284. Chen et al. (2020). A Simple Framework for Contrastive Learning of Visual Representations. https://arxiv.org/abs/2002.05709 ↗
    ×4
  285. Chithrananda, S., Grand, G., & Ramsundar, B. (2020). ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction. arXiv:2010.09885. https://arxiv.org/abs/2010.09885 ↗
    ×1
  286. Cranmer, M., Greydanus, S., Hoyer, S., Battaglia, P., Spergel, D., & Ho, S. (2020). Lagrangian Neural Networks. arXiv:2003.04630 (ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations). https://arxiv.org/abs/2003.04630 ↗
    ×1
  287. Rajbhandari, S., Rasley, J., Ruwase, O., & He, Y. (2020). ZeRO: Memory Optimizations Toward Training Trillion Parameter Models. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '20). https://arxiv.org/abs/1910.02054 ↗ DOI: 10.1109/SC41405.2020.00024
    ×1
  288. Dosovitskiy et al. (2020). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. https://arxiv.org/abs/2010.11929 ↗
    ×4
  289. Dunn, A., Wang, Q., Ganose, A., Dopp, D., & Jain, A. (2020). Benchmarking Materials Property Prediction Methods: The Matbench Test Set and Automatminer Reference Algorithm. npj Computational Materials, 6, 138. https://arxiv.org/abs/2005.00707 ↗ DOI: 10.1038/s41524-020-00406-3
    ×1
  290. Fang, H.-S., Wang, C., Gou, M., & Lu, C. (2020). GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 11444–11453. https://openaccess.thecvf.com/content_CVPR_2020/html/Fang_GraspNet-1Billion_A_Large-Scale_Benchmark_for_General_Object_Grasping_CVPR_2020_paper.html ↗ DOI: 10.1109/CVPR42600.2020.01146
    ×1
  291. Fu et al. (2020). search Scholar ↗
    ×1
  292. Gao et al. (2020). The Pile: An 800GB Dataset of Diverse Text for Language Modeling. https://arxiv.org/abs/2101.00027 ↗
    ×2
  293. Gehman et al. (2020). search Scholar ↗
    ×2
  294. GPUs (2020). search Scholar ↗
    ×1
  295. Grill et al. (2020). Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning. https://arxiv.org/abs/2006.07733 ↗
    ×3
  296. Gu et al. (2020). HiPPO: Recurrent Memory with Optimal Polynomial Projections. https://arxiv.org/abs/2008.07669 ↗
    ×2
  297. Hafner, Lillicrap, Norouzi, Ba (2020). search Scholar ↗
    ×2
  298. He et al. (2020). Momentum Contrast for Unsupervised Visual Representation Learning. https://arxiv.org/abs/1911.05722 ↗
    ×3
  299. Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D., & Steinhardt, J. (2020). Measuring Massive Multitask Language Understanding. International Conference on Learning Representations (ICLR 2021). https://arxiv.org/abs/2009.03300 ↗
    ×1
  300. Hermann, J., Schätzle, Z., & Noé, F. (2020). Deep-neural-network solution of the electronic Schrödinger equation. Nature Chemistry, 12(10), 891–897. https://arxiv.org/abs/1909.08423 ↗ DOI: 10.1038/s41557-020-0544-y
    ×1
  301. Hernán, M. A., & Robins, J. M. (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC. https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/ ↗
    ×2
  302. Ho, J., Jain, A., & Abbeel, P. (2020). Denoising Diffusion Probabilistic Models. Advances in Neural Information Processing Systems 33 (NeurIPS 2020). https://arxiv.org/abs/2006.11239 ↗
    ×1
  303. Holtzman et al. (2020). The Curious Case of Neural Text Degeneration. https://arxiv.org/abs/1904.09751 ↗
    ×2
  304. Jin et al. (2020). Provably Efficient Reinforcement Learning with Linear Function Approximation. https://arxiv.org/abs/1907.05388 ↗
    ×1
  305. June (2020).
    ×1
    Not found: 'June' is not an author surname; the parenthetical 'June 2020' in the context denotes GPT-3's release month, not an author-year citation.
  306. Kaplan et al. (2020). Scaling Laws for Neural Language Models. https://arxiv.org/abs/2001.08361 ↗
    ×13
  307. Karpukhin, V., Oğuz, B., Min, S., Lewis, P., Wu, L., Edunov, S., Chen, D., & Yih, W. (2020). Dense Passage Retrieval for Open-Domain Question Answering. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020). https://arxiv.org/abs/2004.04906 ↗ DOI: 10.18653/v1/2020.emnlp-main.550
    ×1
  308. Khattab, O., & Zaharia, M. (2020). ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2020). https://arxiv.org/abs/2004.12832 ↗ DOI: 10.1145/3397271.3401075
    ×2
  309. Khemakhem, I., Kingma, D. P., Monti, R. P., & Hyvärinen, A. (2020). Variational Autoencoders and Nonlinear ICA: A Unifying Framework. Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS 2020), PMLR 108. https://arxiv.org/abs/1907.04809 ↗
    ×1
  310. Krakovna, V., Uesato, J., Mikulik, V., Rahtz, M., Everitt, T., Kumar, R., Kenton, Z., Leike, J., & Legg, S. (2020). Specification gaming: the flip side of AI ingenuity. DeepMind Blog, 21 April 2020. https://deepmind.google/blog/specification-gaming-the-flip-side-of-ai-ingenuity/ ↗
    ×1
  311. Kumar, A., Zhou, A., Tucker, G., & Levine, S. (2020). Conservative Q-Learning for Offline Reinforcement Learning. Advances in Neural Information Processing Systems 33 (NeurIPS 2020). https://arxiv.org/abs/2006.04779 ↗
    ×2
  312. Lee, J., Hwangbo, J., Wellhausen, L., Koltun, V., & Hutter, M. (2020). Learning Quadrupedal Locomotion over Challenging Terrain. Science Robotics, 5(47), eabc5986. https://arxiv.org/abs/2010.11251 ↗ DOI: 10.1126/scirobotics.abc5986
    ×4
  313. Lepikhin, D., Lee, H., Xu, Y., Chen, D., Firat, O., Huang, Y., Krikun, M., Shazeer, N., & Chen, Z. (2020). GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding. arXiv:2006.16668. https://arxiv.org/abs/2006.16668 ↗
    ×1
  314. Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W., Rocktäschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems 33 (NeurIPS 2020). https://arxiv.org/abs/2005.11401 ↗
    ×1
  315. Chanussot, L., Das, A., Goyal, S., Lavril, T., Shuaibi, M., Riviere, M., Tran, K., Heras-Domingo, J., Ho, C., Hu, W., Palizhati, A., Sriram, A., Wood, B., Yoon, J., Parikh, D., Zitnick, C. L., & Ulissi, Z. (2020). The Open Catalyst 2020 (OC20) Dataset and Community Challenges. ACS Catalysis, 11(10), 6059–6072 (arXiv:2010.09990, 2020). https://arxiv.org/abs/2010.09990 ↗ DOI: 10.1021/acscatal.0c04525
    ×1
  316. Mildenhall, B., Srinivasan, P. P., Tancik, M., Barron, J. T., Ramamoorthi, R., & Ng, R. (2020). NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. European Conference on Computer Vision (ECCV 2020). https://arxiv.org/abs/2003.08934 ↗ DOI: 10.1007/978-3-030-58452-8_24
    ×2
  317. MuZero (2020). Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model. https://arxiv.org/abs/1911.08265 ↗
    ×1
  318. Nair, A., Gupta, A., Dalal, M., & Levine, S. (2020). AWAC: Accelerating Online Reinforcement Learning with Offline Datasets. arXiv:2006.09359. https://arxiv.org/abs/2006.09359 ↗
    ×1
  319. Nakkiran et al. (2020). Deep Double Descent: Where Bigger Models and More Data Hurt. https://arxiv.org/abs/1912.02292 ↗
    ×2
  320. November (2020).
    ×1
    Not found: 'November 2020' is a date reference to the CASP 14 event, not a bibliographic citation; there is no paper to verify.

  321. Olah, C., Cammarata, N., Schubert, L., Goh, G., Petrov, M., & Carter, S. (2020). Zoom In: An Introduction to Circuits. Distill, 5(3), e00024.001. https://distill.pub/2020/circuits/zoom-in/ ↗ DOI: 10.23915/distill.00024.001
    ×1
  322. Pfau, D., Spencer, J. S., Matthews, A. G. D. G., & Foulkes, W. M. C. (2020). Ab initio solution of the many-electron Schrödinger equation with deep neural networks. Physical Review Research, 2(3), 033429. https://arxiv.org/abs/1909.02487 ↗ DOI: 10.1103/PhysRevResearch.2.033429
    ×1
  323. Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., & Liu, P. J. (2020). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research, 21(140), 1–67. https://arxiv.org/abs/1910.10683 ↗
    ×1
  324. Rajbhandari, S., Rasley, J., Ruwase, O., & He, Y. (2020). ZeRO: Memory Optimizations Toward Training Trillion Parameter Models. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '20). https://arxiv.org/abs/1910.02054 ↗ DOI: 10.1109/SC41405.2020.00024
    ×1
  325. Rasp, S., Dueben, P. D., Scher, S., Weyn, J. A., Mouatadid, S., & Thuerey, N. (2020). WeatherBench: A Benchmark Data Set for Data-Driven Weather Forecasting. Journal of Advances in Modeling Earth Systems, 12(11), e2020MS002203. https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2020MS002203 ↗
    ×1
  326. Schrittwieser et al. (2020). Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model. https://arxiv.org/abs/1911.08265 ↗
    ×2
  327. Senior, A. W., Evans, R., Jumper, J., Kirkpatrick, J., Sifre, L., Green, T., Qin, C., Žídek, A., Nelson, A. W. R., Bridgland, A., Penedones, H., Petersen, S., Simonyan, K., Crossan, S., Kohli, P., Jones, D. T., Silver, D., Kavukcuoglu, K., & Hassabis, D. (2020). Improved protein structure prediction using potentials from deep learning. Nature, 577(7792), 706–710. https://www.nature.com/articles/s41586-019-1923-7 ↗ DOI: 10.1038/s41586-019-1923-7
    ×2
  328. Shazeer (2020). GLU Variants Improve Transformer. https://arxiv.org/abs/2002.05202 ↗
    ×5
  329. Song, Y., & Ermon, S. (2020). Improved Techniques for Training Score-Based Generative Models. Advances in Neural Information Processing Systems 33 (NeurIPS 2020). https://arxiv.org/abs/2006.09011 ↗
    ×2
  330. Steinke and Zakynthinou (2020). Reasoning About Generalization via Conditional Mutual Information. https://arxiv.org/abs/2001.09122 ↗
    ×2
  331. Stiennon et al. (2020). Learning to summarize from human feedback. https://arxiv.org/abs/2009.01325 ↗
    ×4
  332. Tschannen et al. (2020). On Mutual Information Maximization for Representation Learning. https://arxiv.org/abs/1907.13625 ↗
    ×5
  333. Udrescu, S.-M., & Tegmark, M. (2020). AI Feynman: A physics-inspired method for symbolic regression. Science Advances, 6(16), eaay2631. https://arxiv.org/abs/1905.11481 ↗ DOI: 10.1126/sciadv.aay2631
    ×1
  334. Vig, J., Gehrmann, S., Belinkov, Y., Qian, S., Nevo, D., Sakenis, S., Huang, J., Singer, Y., & Shieber, S. (2020). Investigating Gender Bias in Language Models Using Causal Mediation Analysis. Advances in Neural Information Processing Systems 33 (NeurIPS 2020). https://arxiv.org/abs/2004.12265 ↗
    ×1
  335. Xiong et al. (2020). On Layer Normalization in the Transformer Architecture. https://arxiv.org/abs/2002.04745 ↗
    ×1
  336. Yang and Wang (2020). Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound. https://arxiv.org/abs/1905.10389 ↗
    ×1
  337. ZeRO (2020). search Scholar ↗
    ×1
  338. Akbari et al. (2021). VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text. https://arxiv.org/abs/2104.11178 ↗
    ×1
  339. Anthropic (2021). search Scholar ↗
    ×1
  340. Austin et al. (2021). search Scholar ↗
    ×1
  341. Avsec et al. (2021). search Scholar ↗
    ×1
  342. Bender et al. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜. https://dl.acm.org/doi/10.1145/3442188.3445922 ↗
    ×5
  343. Bommasani et al. (2021). On the Opportunities and Risks of Foundation Models. https://arxiv.org/abs/2108.07258 ↗
    ×5
  344. Cao, S. (2021). Choose a Transformer: Fourier or Galerkin. Advances in Neural Information Processing Systems 34 (NeurIPS 2021). https://arxiv.org/abs/2105.14995 ↗
    ×1
  345. Caron et al. (2021). Emerging Properties in Self-Supervised Vision Transformers. https://arxiv.org/abs/2104.14294 ↗
    ×4
  346. Chanussot, L., Das, A., Goyal, S., Lavril, T., Shuaibi, M., Riviere, M., Tran, K., Heras-Domingo, J., Ho, C., Hu, W., Palizhati, A., Sriram, A., Wood, B., Yoon, J., Parikh, D., Zitnick, C. L., & Ulissi, Z. (2021). Open Catalyst 2020 (OC20) Dataset and Community Challenges. ACS Catalysis, 11(10), 6059–6072. https://arxiv.org/abs/2010.09990 ↗ DOI: 10.1021/acscatal.0c04525
    ×1
  347. Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H. P. de O., Kaplan, J., Edwards, H., Burda, Y., Joseph, N., Brockman, G., Ray, A., Puri, R., Krueger, G., Petrov, M., Khlaaf, H., Sastry, G., Mishkin, P., Chan, B., Gray, S., ... Zaremba, W. (2021). Evaluating Large Language Models Trained on Code. arXiv:2107.03374. https://arxiv.org/abs/2107.03374 ↗
    ×2
  348. Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., & Sutskever, I. (2021). Learning Transferable Visual Models From Natural Language Supervision. Proceedings of the 38th International Conference on Machine Learning (ICML 2021). https://arxiv.org/abs/2103.00020 ↗
    ×1
  349. Cobbe, K., Kosaraju, V., Bavarian, M., Chen, M., Jun, H., Kaiser, Ł., Plappert, M., Tworek, J., Hilton, J., Nakano, R., Hesse, C., & Schulman, J. (2021). Training Verifiers to Solve Math Word Problems. arXiv:2110.14168. https://arxiv.org/abs/2110.14168 ↗
    ×4
  350. Cohen et al. (2021). Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability. https://arxiv.org/abs/2103.00065 ↗
    ×1
  351. Cotra, A. (2021). The case for aligning narrowly superhuman models. AI Alignment Forum (blog post). https://www.alignmentforum.org/posts/PZtsoaoSLpKjjbMqM/the-case-for-aligning-narrowly-superhuman-models ↗
    ×2
  352. Davies et al. (2021). search Scholar ↗
    ×2
  353. Dhariwal and Nichol (2021). search Scholar ↗
    ×1
  354. Du et al. (2021). Bilinear Classes: A Structural Framework for Provable Generalization in RL. https://arxiv.org/abs/2103.10897 ↗
    ×1
  355. Elhage et al. (2021). A Mathematical Framework for Transformer Circuits. https://transformer-circuits.pub/2021/framework/index.html ↗
    ×4
  356. Esser, Rombach, Ommer (2021). search Scholar ↗
    ×2
  357. Geiger et al. (2021). search Scholar ↗
    ×1
  358. Geva, M., Schuster, R., Berant, J., & Levy, O. (2021). Transformer Feed-Forward Layers Are Key-Value Memories. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), 5484–5495. https://arxiv.org/abs/2012.14913 ↗ DOI: 10.18653/v1/2021.emnlp-main.446
    ×1
  359. Goh, G., Cammarata, N., Voss, C., Carter, S., Petrov, M., Schubert, L., Radford, A., & Olah, C. (2021). Multimodal Neurons in Artificial Neural Networks. Distill, 6(3). https://distill.pub/2021/multimodal-neurons/ ↗ DOI: 10.23915/distill.00030
    ×3
  360. Jia, C., Yang, Y., Xia, Y., Chen, Y.-T., Parekh, Z., Pham, H., Le, Q. V., Sung, Y., Li, Z., & Duerig, T. (2021). Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision. Proceedings of the 38th International Conference on Machine Learning (ICML 2021); arXiv:2102.05918. https://arxiv.org/abs/2102.05918 ↗
    ×2
  361. Gu, Goel, and Ré (2021). Efficiently Modeling Long Sequences with Structured State Spaces. https://arxiv.org/abs/2111.00396 ↗
    ×2
  362. HaoChen et al. (2021). Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss. https://arxiv.org/abs/2106.04156 ↗
    ×2
  363. Hendrycks, D., Burns, C., Kadavath, S., Arora, A., Basart, S., Tang, E., Song, D., & Steinhardt, J. (2021). Measuring Mathematical Problem Solving With the MATH Dataset. Advances in Neural Information Processing Systems 34, Datasets and Benchmarks Track (NeurIPS 2021). https://arxiv.org/abs/2103.03874 ↗
    ×7
  364. Hernandez et al. (2021). Scaling Laws for Transfer. https://arxiv.org/abs/2102.01293 ↗
    ×1
  365. Ho, J., & Salimans, T. (2021). Classifier-Free Diffusion Guidance. NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications (later arXiv:2207.12598). https://arxiv.org/abs/2207.12598 ↗ DOI: 10.48550/arXiv.2207.12598
    ×2
  366. Hu et al. (2021). LoRA: Low-Rank Adaptation of Large Language Models. https://arxiv.org/abs/2106.09685 ↗
    ×3
  367. HumanEval (2021). search Scholar ↗
    ×1
  368. Izacard, G., Caron, M., Hosseini, L., Riedel, S., Bojanowski, P., Joulin, A., & Grave, E. (2021). Unsupervised Dense Information Retrieval with Contrastive Learning. arXiv:2112.09118. https://arxiv.org/abs/2112.09118 ↗
    ×2
  369. Jia et al. (2021). Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision. https://arxiv.org/abs/2102.05918 ↗
    ×2
  370. Jin, Liu, Miryoosefi (2021). Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms. https://arxiv.org/abs/2102.00815 ↗
    ×3
  371. Jumper et al. (2021). Highly accurate protein structure prediction with AlphaFold. https://doi.org/10.1038/s41586-021-03819-2 ↗
    ×8
  372. Karimi, A.-H., Schölkopf, B., & Valera, I. (2021). Algorithmic Recourse: from Counterfactual Explanations to Interventions. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT '21). https://dl.acm.org/doi/10.1145/3442188.3445899 ↗
    ×2
  373. Khalifa, M., Elsahar, H., & Dymetman, M. (2021). A Distributional Approach to Controlled Text Generation. International Conference on Learning Representations (ICLR 2021). https://arxiv.org/abs/2012.11635 ↗
    ×1
  374. Lester et al. (2021). The Power of Scale for Parameter-Efficient Prompt Tuning. https://arxiv.org/abs/2104.08691 ↗
    ×1
  375. Li and Liang (2021). Prefix-Tuning: Optimizing Continuous Prompts for Generation. https://arxiv.org/abs/2101.00190 ↗
    ×3
  376. Liu, X., Ji, K., Fu, Y., Tam, W. L., Du, Z., Yang, Z., & Tang, J. (2021). P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks. arXiv:2110.07602. https://arxiv.org/abs/2110.07602 ↗
    ×1
  377. Lu, L., Jin, P., Pang, G., Zhang, Z., & Karniadakis, G. E. (2021). Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3), 218–229. https://www.nature.com/articles/s42256-021-00302-5 ↗ DOI: 10.1038/s42256-021-00302-5
    ×1
  378. Mathew, M., Karatzas, D., & Jawahar, C. V. (2021). DocVQA: A Dataset for VQA on Document Images. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2021, 2200–2209. https://arxiv.org/abs/2007.00398 ↗ DOI: 10.1109/WACV48630.2021.00225
    ×1
  379. Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D., & Steinhardt, J. (2021). Measuring Massive Multitask Language Understanding. International Conference on Learning Representations (ICLR 2021). https://arxiv.org/abs/2009.03300 ↗
    ×1
  380. Nakano, R., Hilton, J., Balaji, S., Wu, J., Ouyang, L., Kim, C., Hesse, C., Jain, S., Kosaraju, V., Saunders, W., Jiang, X., Cobbe, K., Eloundou, T., Krueger, G., Button, K., Knight, M., Chess, B., & Schulman, J. (2021). WebGPT: Browser-assisted question-answering with human feedback. arXiv:2112.09332. https://arxiv.org/abs/2112.09332 ↗
    ×1
  381. Narayanan, D., Shoeybi, M., Casper, J., LeGresley, P., Patwary, M., Korthikanti, V. A., Vainbrand, D., Kashinkunti, P., Bernauer, J., Catanzaro, B., Phanishayee, A., & Zaharia, M. (2021). Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '21). https://arxiv.org/abs/2104.04473 ↗ DOI: 10.1145/3458817.3476209
    ×1
  382. Nichol, A., Dhariwal, P., Ramesh, A., Shyam, P., Mishkin, P., McGrew, B., Sutskever, I., & Chen, M. (2021). GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models. arXiv:2112.10741. https://arxiv.org/abs/2112.10741 ↗
    ×1
  383. Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., & Sutskever, I. (2021). Learning Transferable Visual Models From Natural Language Supervision. Proceedings of the 38th International Conference on Machine Learning (ICML 2021), PMLR 139, 8748–8763. https://arxiv.org/abs/2103.00020 ↗
    ×2
  384. Patterson, D., Gonzalez, J., Le, Q., Liang, C., Munguia, L.-M., Rothchild, D., So, D., Texier, M., & Dean, J. (2021). Carbon Emissions and Large Neural Network Training. arXiv:2104.10350. https://arxiv.org/abs/2104.10350 ↗
    ×1
  385. Pérez-Ortiz, M., Rivasplata, O., Shawe-Taylor, J., & Szepesvári, C. (2021). Tighter Risk Certificates for Neural Networks. Journal of Machine Learning Research, 22(227), 1–40. https://www.jmlr.org/papers/v22/20-879.html ↗
    ×1
  386. Press, Smith, Lewis (2021). Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation. https://arxiv.org/abs/2108.12409 ↗
    ×3
  387. Radford et al. (2021). Learning Transferable Visual Models From Natural Language Supervision. https://arxiv.org/abs/2103.00020 ↗
    ×11
  388. Rajbhandari, S., Ruwase, O., Rasley, J., Smith, S., & He, Y. (2021). ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '21). https://arxiv.org/abs/2104.07857 ↗ DOI: 10.1145/3458817.3476205
    ×1
  389. Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., & Sutskever, I. (2021). Zero-Shot Text-to-Image Generation. Proceedings of the 38th International Conference on Machine Learning (ICML 2021), PMLR 139, 8821–8831. https://arxiv.org/abs/2102.12092 ↗
    ×1
  390. Rosenfeld, E., Ravikumar, P., & Risteski, A. (2021). The Risks of Invariant Risk Minimization. International Conference on Learning Representations (ICLR 2021). https://arxiv.org/abs/2010.05761 ↗
    ×2
  391. Schölkopf, B., Locatello, F., Bauer, S., Ke, N. R., Kalchbrenner, N., Goyal, A., & Bengio, Y. (2021). Toward Causal Representation Learning. Proceedings of the IEEE, 109(5), 612–634. https://ieeexplore.ieee.org/document/9363924/ ↗ DOI: 10.1109/JPROC.2021.3058954
    ×2
  392. Song, J., Meng, C., & Ermon, S. (2021). Denoising Diffusion Implicit Models. 9th International Conference on Learning Representations (ICLR 2021). https://arxiv.org/abs/2010.02502 ↗
    ×4
  393. Su et al. (2021). RoFormer: Enhanced Transformer with Rotary Position Embedding. https://arxiv.org/abs/2104.09864 ↗
    ×5
  394. Tian, Y., Chen, X., & Ganguli, S. (2021). Understanding self-supervised Learning Dynamics without Contrastive Pairs. Proceedings of the 38th International Conference on Machine Learning (ICML 2021), PMLR 139, 10268–10278. https://arxiv.org/abs/2102.06810 ↗
    ×1
  395. Wang et al. (2021). search Scholar ↗
    ×1
  396. Wei et al. (2021). Finetuned Language Models Are Zero-Shot Learners. https://arxiv.org/abs/2109.01652 ↗
    ×1
  397. Yang and Hu (2021). Feature Learning in Infinite-Width Neural Networks. https://arxiv.org/abs/2011.14522 ↗
    ×2
  398. Zaken et al. (2021). search Scholar ↗
    ×1
  399. Zimmermann et al. (2021). search Scholar ↗
    ×1
  400. Akyürek et al. (2022). What learning algorithm is in-context learning? Investigations with linear models. https://arxiv.org/abs/2211.15661 ↗
    ×1
  401. Alayrac et al. (2022). Flamingo: a Visual Language Model for Few-Shot Learning. https://arxiv.org/abs/2204.14198 ↗
    ×3
  402. Anthropic (2022). search Scholar ↗
    ×12
  403. April (2022). search Scholar ↗
    ×1
  404. Bai et al. (2022). Constitutional AI: Harmlessness from AI Feedback. https://arxiv.org/abs/2212.08073 ↗
    ×5
  405. Batatia, I., Kovács, D. P., Simm, G. N. C., Ortner, C., & Csányi, G. (2022). MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields. Advances in Neural Information Processing Systems 35 (NeurIPS 2022). https://arxiv.org/abs/2206.07697 ↗
    ×2
  406. Batzner, S., Musaelian, A., Sun, L., Geiger, M., Mailoa, J. P., Kornbluth, M., Molinari, N., Smidt, T. E., & Kozinsky, B. (2022). E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nature Communications, 13(1), 2453. https://www.nature.com/articles/s41467-022-29939-5 ↗ DOI: 10.1038/s41467-022-29939-5
    ×2
  407. Betker, J. (2022). Better speech synthesis through scaling. arXiv:2305.07243. https://arxiv.org/abs/2305.07243 ↗
    ×1
  408. Burns, C., Ye, H., Klein, D., & Steinhardt, J. (2022). Discovering Latent Knowledge in Language Models Without Supervision. arXiv:2212.03827. https://arxiv.org/abs/2212.03827 ↗
    ×6
  409. Chen et al. (2022). Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks. https://arxiv.org/abs/2211.12588 ↗
    ×3
  410. Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., Rutherford, E., de Las Casas, D., Hendricks, L. A., Welbl, J., Clark, A., Hennigan, T., Noland, E., Millican, K., van den Driessche, G., Damoc, B., Guy, A., Osindero, S., Simonyan, K., Elsen, E., ... Sifre, L. (2022). Training Compute-Optimal Large Language Models. Advances in Neural Information Processing Systems 35 (NeurIPS 2022). https://arxiv.org/abs/2203.15556 ↗
    ×2
  411. Clark et al. (2022). Canine: Pre-training an Efficient Tokenization-Free Encoder for Language Representation. https://aclanthology.org/2022.tacl-1.5/ ↗
    ×1
  412. Dai et al. (2022). search Scholar ↗
    ×1
  413. Dao et al. (2022). FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. https://arxiv.org/abs/2205.14135 ↗
    ×4
  414. Dauparas et al. (2022). search Scholar ↗
    ×1
  415. DeepMind (2022). search Scholar ↗
    ×1
  416. Dettmers et al. (2022). search Scholar ↗
    ×1
  417. Elhage et al. (2022). search Scholar ↗
    ×2
  418. Fedus et al. (2022). Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. https://arxiv.org/abs/2101.03961 ↗
    ×3
  419. Alayrac, J.-B., Donahue, J., Luc, P., Miech, A., Barr, I., Hasson, Y., Lenc, K., Mensch, A., Millican, K., Reynolds, M., Ring, R., Rutherford, E., Cabi, S., Han, T., Gong, Z., Samangooei, S., Monteiro, M., Menick, J., Borgeaud, S., ... Simonyan, K. (2022). Flamingo: a Visual Language Model for Few-Shot Learning. Advances in Neural Information Processing Systems 35 (NeurIPS 2022). https://arxiv.org/abs/2204.14198 ↗
    ×1
  420. Frantar, E., Ashkboos, S., Hoefler, T., & Alistarh, D. (2022). GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers. arXiv:2210.17323. https://arxiv.org/abs/2210.17323 ↗
    ×2
  421. Frohberg, J., & Binder, F. (2022). CRASS: A Novel Data Set and Benchmark to Test Counterfactual Reasoning of Large Language Models. Proceedings of the Thirteenth Language Resources and Evaluation Conference (LREC 2022), 2126–2140. https://aclanthology.org/2022.lrec-1.229/ ↗
    ×1
  422. Ganguli, D., Lovitt, L., Kernion, J., Askell, A., Bai, Y., Kadavath, S., Mann, B., Perez, E., Schiefer, N., Ndousse, K., Jones, A., Bowman, S., Chen, A., Conerly, T., DasSarma, N., Drain, D., Elhage, N., El-Showk, S., Fort, S., ... Clark, J. (2022). Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned. arXiv:2209.07858. https://arxiv.org/abs/2209.07858 ↗
    ×1
  423. Garg et al. (2022). What Can Transformers Learn In-Context? A Case Study of Simple Function Classes. https://arxiv.org/abs/2208.01066 ↗
    ×1
  424. Geiger and Smidt (2022). search Scholar ↗
    ×2
  425. Google (2022). search Scholar ↗
    ×3
  426. Hartvigsen et al. (2022). search Scholar ↗
    ×2
  427. He et al. (2022). Masked Autoencoders Are Scalable Vision Learners. https://arxiv.org/abs/2111.06377 ↗
    ×3
  428. HELM (2022). search Scholar ↗
    ×1
  429. Hoffmann et al. (2022). Training Compute-Optimal Large Language Models. https://arxiv.org/abs/2203.15556 ↗
    ×12
  430. Hoogeboom et al. (2022). search Scholar ↗
    ×1
  431. Hopper (2022). search Scholar ↗
    ×1
  432. Ilharco, G., Wortsman, M., Wightman, R., Gordon, C., Carlini, N., Taori, R., Dave, A., Shankar, V., Namkoong, H., Miller, J., Hajishirzi, H., Farhadi, A., & Schmidt, L. (2022). OpenCLIP. Zenodo (Software). https://zenodo.org/records/6496083 ↗ DOI: 10.5281/zenodo.5143773
    ×1
  433. Irwin, R., Dimitriadis, S., He, J., & Bjerrum, E. J. (2022). Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology, 3(1), 015022. https://iopscience.iop.org/article/10.1088/2632-2153/ac3ffb ↗
    ×1
  434. Kojima et al. (2022). Large Language Models are Zero-Shot Reasoners. https://arxiv.org/abs/2205.11916 ↗
    ×4
  435. Kostrikov, I., Nair, A., & Levine, S. (2022). Offline Reinforcement Learning with Implicit Q-Learning. International Conference on Learning Representations (ICLR 2022). https://arxiv.org/abs/2110.06169 ↗
    ×2
  436. Wu, Y., Chen, K., Zhang, T., Hui, Y., Nezhurina, M., Berg-Kirkpatrick, T., & Dubnov, S. (2022). Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation. arXiv:2211.06687. https://arxiv.org/abs/2211.06687 ↗
    ×1
  437. LeCun (2022). A Path Towards Autonomous Machine Intelligence. https://openreview.net/forum?id=BZ5a1r-kVsf ↗
    ×2
  438. Lee et al. (2022). Deduplicating Training Data Makes Language Models Better. https://arxiv.org/abs/2107.06499 ↗
    ×4
  439. Leviathan et al. (2022). search Scholar ↗
    ×1
  440. Li, Li, Xiong, Hoi (2022). BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. https://arxiv.org/abs/2201.12086 ↗
    ×3
  441. Liang et al. (2022). Holistic Evaluation of Language Models. https://arxiv.org/abs/2211.09110 ↗
    ×3
  442. Lin, Hilton, Evans (2022). search Scholar ↗
    ×4
  443. Lipman, Chen, Ben-Hamu, Nickel, Le (2022). search Scholar ↗
    ×2
  444. Lippe et al. (2022). search Scholar ↗
    ×1
  445. Liu et al. (2022). A ConvNet for the 2020s. https://arxiv.org/abs/2201.03545 ↗
    ×7
  446. Lotfi et al. (2022). PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization. https://arxiv.org/abs/2211.13609 ↗
    ×2
  447. Lu et al. (2022). Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity. https://arxiv.org/abs/2104.08786 ↗
    ×3
  448. Magar and Schwartz (2022). search Scholar ↗
    ×1
  449. Masry et al. (2022). search Scholar ↗
    ×1
  450. Mees, O., Hermann, L., Rosete-Beas, E., & Burgard, W. (2022). CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks. IEEE Robotics and Automation Letters, 7(3), 7327–7334. https://arxiv.org/abs/2112.03227 ↗ DOI: 10.1109/LRA.2022.3180108
    ×1
  451. Meng, K., Bau, D., Andonian, A., & Belinkov, Y. (2022). Locating and Editing Factual Associations in GPT. Advances in Neural Information Processing Systems 35 (NeurIPS 2022). https://arxiv.org/abs/2202.05262 ↗
    ×4
  452. Zhang, S., Roller, S., Goyal, N., Artetxe, M., Chen, M., Chen, S., Dewan, C., Diab, M., Li, X., Lin, X. V., Mihaylov, T., Ott, M., Shleifer, S., Shuster, K., Simig, D., Koura, P. S., Sridhar, A., Wang, T., & Zettlemoyer, L. (2022). OPT: Open Pre-trained Transformer Language Models. arXiv:2205.01068. https://arxiv.org/abs/2205.01068 ↗ DOI: 10.48550/arXiv.2205.01068
    ×1
  453. Min et al. (2022). Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? https://arxiv.org/abs/2202.12837 ↗
    ×2
  454. Margolis, G. B., & Agrawal, P. (2022). Walk These Ways: Tuning Robot Control for Generalization with Multiplicity of Behavior. Proceedings of the 6th Conference on Robot Learning (CoRL 2022). https://arxiv.org/abs/2212.03238 ↗
    ×2
  455. Nov (2022). search Scholar ↗
    ×1
  456. November (2022). search Scholar ↗
    ×2
  457. NVIDIA (2022). search Scholar ↗
    ×1
  458. October (2022). search Scholar ↗
    ×2
  459. Olsson et al. (2022). In-context Learning and Induction Heads. https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html ↗
    ×6
  460. OpenAI (2022). search Scholar ↗
    ×2
  461. Ouyang et al. (2022). Training language models to follow instructions with human feedback. https://arxiv.org/abs/2203.02155 ↗
    ×8
  462. Pan, Bhatia, Steinhardt (2022). search Scholar ↗
    ×1
  463. Pathak, J., Subramanian, S., Harrington, P., Raja, S., Chattopadhyay, A., Mardani, M., Kurth, T., Hall, D., Li, Z., Azizzadenesheli, K., Hassanzadeh, P., Kashinath, K., & Anandkumar, A. (2022). FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators. arXiv:2202.11214. https://arxiv.org/abs/2202.11214 ↗
    ×2
  464. Patterson, D., Gonzalez, J., Hölzle, U., Le, Q., Liang, C., Munguia, L.-M., Rothchild, D., So, D., Texier, M., & Dean, J. (2022). The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink. Computer, 55(7), 18–28. https://arxiv.org/abs/2204.05149 ↗ DOI: 10.1109/MC.2022.3148714
    ×1
  465. Perez, E., Huang, S., Song, F., Cai, T., Ring, R., Aslanides, J., Glaese, A., McAleese, N., & Irving, G. (2022). Red Teaming Language Models with Language Models. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022), 3419–3448. https://arxiv.org/abs/2202.03286 ↗ DOI: 10.18653/v1/2022.emnlp-main.225
    ×5
  466. Poole, B., Jain, A., Barron, J. T., & Mildenhall, B. (2022). DreamFusion: Text-to-3D using 2D Diffusion. arXiv:2209.14988. https://arxiv.org/abs/2209.14988 ↗
    ×1
  467. Radford, Kim, et al. (2022). Robust Speech Recognition via Large-Scale Weak Supervision. https://arxiv.org/abs/2212.04356 ↗
    ×4
  468. Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv:2204.06125. https://arxiv.org/abs/2204.06125 ↗ DOI: 10.48550/arXiv.2204.06125
    ×1
  469. Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2022). ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629 (later published at ICLR 2023). https://arxiv.org/abs/2210.03629 ↗
    ×1
  470. Reed et al. (2022). A Generalist Agent. https://arxiv.org/abs/2205.06175 ↗
    ×2
  471. Rombach, Blattmann, Lorenz, Esser, Ommer (2022). search Scholar ↗
    ×3
  472. Ross et al. (2022). search Scholar ↗
    ×1
  473. Saharia et al. (2022). search Scholar ↗
    ×1
  474. Salesforce (2022). search Scholar ↗
    ×1
  475. Saunshi et al. (2022). Understanding Contrastive Learning Requires Incorporating Inductive Biases. https://arxiv.org/abs/2202.14037 ↗
    ×2
  476. Srivastava, A., Rastogi, A., Rao, A., Shoeb, A. A. M., Abid, A., Fisch, A., Brown, A. R., Santoro, A., Gupta, A., Garriga-Alonso, A., Kluska, A., Lewkowycz, A., Agarwal, A., Power, A., Ray, A., Warstadt, A., Kocurek, A. W., Safaya, A., Tazarv, A., ... Wu, Z. (2022). Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models. arXiv:2206.04615. https://arxiv.org/abs/2206.04615 ↗ DOI: 10.48550/arXiv.2206.04615
    ×2
  477. Subramani, N., Suresh, N., & Peters, M. (2022). Extracting Latent Steering Vectors from Pretrained Language Models. Findings of the Association for Computational Linguistics: ACL 2022. https://aclanthology.org/2022.findings-acl.48/ ↗ DOI: 10.18653/v1/2022.findings-acl.48
    ×3
  478. Suzgun, M., Scales, N., Schärli, N., Gehrmann, S., Tay, Y., Chung, H. W., Chowdhery, A., Le, Q. V., Chi, E. H., Zhou, D., & Wei, J. (2022). Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them. arXiv:2210.09261. https://arxiv.org/abs/2210.09261 ↗ DOI: 10.48550/arXiv.2210.09261
    ×1
  479. Uesato, J., Kushman, N., Kumar, R., Song, F., Siegel, N., Wang, L., Creswell, A., Irving, G., & Higgins, I. (2022). Solving math word problems with process- and outcome-based feedback. arXiv:2211.14275. https://arxiv.org/abs/2211.14275 ↗
    ×1
  480. Wang et al. (2022). Self-Consistency Improves Chain of Thought Reasoning in Language Models. https://arxiv.org/abs/2203.11171 ↗
    ×14
  481. Wei et al. (2022). Emergent Abilities of Large Language Models. https://arxiv.org/abs/2206.07682 ↗
    ×10
  482. Wu, P., Escontrela, A., Hafner, D., Goldberg, K., & Abbeel, P. (2022). DayDreamer: World Models for Physical Robot Learning. arXiv:2206.14176 (also Proceedings of the 6th Conference on Robot Learning, PMLR 205, 2023). https://arxiv.org/abs/2206.14176 ↗
    ×1
  483. Xiao et al. (2022). search Scholar ↗
    ×1
  484. Xie et al. (2022). An Explanation of In-context Learning as Implicit Bayesian Inference. https://arxiv.org/abs/2111.02080 ↗
    ×1
  485. Xue et al. (2022). ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models. https://arxiv.org/abs/2105.13626 ↗
    ×1
  486. Yao, Zhao, Yu, Du, Shafran, Narasimhan, Cao (2022). search Scholar ↗
    ×4
  487. Yu et al. (2022). search Scholar ↗
    ×1
  488. Zheng, Han, Polu (2022). search Scholar ↗
    ×1
  489. Zhou et al. (2022). Mixture-of-Experts with Expert Choice Routing. https://arxiv.org/abs/2202.09368 ↗
    ×3
  490. Ainslie et al. (2023). GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints. https://arxiv.org/abs/2305.13245 ↗
    ×4
  491. Anthropic, October (2023). search Scholar ↗
    ×3
  492. Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E. P., Zhang, H., Gonzalez, J. E., & Stoica, I. (2023). Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), Datasets and Benchmarks Track. https://arxiv.org/abs/2306.05685 ↗
    ×1
  493. Asai, A., Wu, Z., Wang, Y., Sil, A., & Hajishirzi, H. (2023). Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. arXiv:2310.11511 (later published at ICLR 2024). https://arxiv.org/abs/2310.11511 ↗
    ×1
  494. Assran et al. (2023). Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture. https://arxiv.org/abs/2301.08243 ↗
    ×2
  495. Azar et al. (2023). A General Theoretical Paradigm to Understand Learning from Human Preferences. https://arxiv.org/abs/2310.12036 ↗
    ×1
  496. Bai, J., Bai, S., Yang, S., Wang, S., Tan, S., Wang, P., Lin, J., Zhou, C., & Zhou, J. (2023). Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond. arXiv:2308.12966. https://arxiv.org/abs/2308.12966 ↗
    ×1
  497. Szymanski, N. J., Rendy, B., Fei, Y., Kumar, R. E., He, T., Milsted, D., McDermott, M. J., Gallant, M., Cubuk, E. D., Merchant, A., Kim, H., Jain, A., Bartel, C. J., Persson, K., Zeng, Y., & Ceder, G. (2023). An autonomous laboratory for the accelerated synthesis of novel materials. Nature, 624(7990), 86–91. https://www.nature.com/articles/s41586-023-06734-w ↗ DOI: 10.1038/s41586-023-06734-w
    ×1
  498. Bi et al. (2023). search Scholar ↗
    ×2
  499. Biderman et al. (2023). Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling. https://arxiv.org/abs/2304.01373 ↗
    ×1
  500. Bills et al. (2023). search Scholar ↗
    ×1
  501. Boiko et al. (2023). search Scholar ↗
    ×2
  502. Bran et al. (2023). search Scholar ↗
    ×2
  503. Bricken et al. (2023). search Scholar ↗
    ×1
  504. Brohan et al. (2023). search Scholar ↗
    ×1
  505. Brooks, T., Holynski, A., & Efros, A. A. (2023). InstructPix2Pix: Learning to Follow Image Editing Instructions. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023). https://arxiv.org/abs/2211.09800 ↗ DOI: 10.1109/CVPR52729.2023.01764
    ×1
  506. Burns, C., Izmailov, P., Kirchner, J. H., Baker, B., Gao, L., Aschenbrenner, L., Chen, Y., Ecoffet, A., Joglekar, M., Leike, J., Sutskever, I., & Wu, J. (2023). Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision. arXiv:2312.09390. https://arxiv.org/abs/2312.09390 ↗
    ×1
  507. Carlini et al. (2023). Extracting Training Data from Diffusion Models. https://arxiv.org/abs/2301.13188 ↗
    ×4
  508. Casper, S., Davies, X., Shi, C., Gilbert, T. K., Scheurer, J., Rando, J., Freedman, R., Korbak, T., Lindner, D., Freire, P., Wang, T. T., Marks, S., Ségerie, C., Carroll, M., Peng, A., Christoffersen, P. J. K., Damani, M., Slocum, S., Anwar, U., ... Hadfield-Menell, D. (2023). Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback. Transactions on Machine Learning Research (TMLR), 2023; arXiv:2307.15217. https://arxiv.org/abs/2307.15217 ↗
    ×1
  509. Chen et al. (2023). Symbolic Discovery of Optimization Algorithms. https://arxiv.org/abs/2302.06675 ↗
    ×3
  510. Cheng, J., Novati, G., Pan, J., Bycroft, C., Žemgulytė, A., Applebaum, T., Pritzel, A., Wong, L. H., Zielinski, M., Sargeant, T., Schneider, R. G., Senior, A. W., Jumper, J., Hassabis, D., Kohli, P., & Avsec, Ž. (2023). Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science, 381(6664), eadg7492. https://www.science.org/doi/10.1126/science.adg7492 ↗
    ×2
  511. Chi, C., Feng, S., Du, Y., Xu, Z., Cousineau, E., Burchfiel, B., & Song, S. (2023). Diffusion Policy: Visuomotor Policy Learning via Action Diffusion. Proceedings of Robotics: Science and Systems (RSS 2023). https://arxiv.org/abs/2303.04137 ↗ DOI: 10.15607/RSS.2023.XIX.026
    ×2
  512. Open X-Embodiment Collaboration, O'Neill, A., Rehman, A., Gupta, A., Maddukuri, A., Gupta, A., Padalkar, A., Lee, A., Pooley, A., Gupta, A., Mandlekar, A., Jain, A., Tung, A., Bewley, A., Herzog, A., Irpan, A., Khazatsky, A., Rai, A., Gupta, A., ... Lin, Z. (2023). Open X-Embodiment: Robotic Learning Datasets and RT-X Models. 2024 IEEE International Conference on Robotics and Automation (ICRA 2024); arXiv:2310.08864. https://arxiv.org/abs/2310.08864 ↗ DOI: 10.48550/arXiv.2310.08864
    ×2
  513. Coqui (2023). XTTS-v2 (model release). Hugging Face model repository (coqui/XTTS-v2), released November 2023. https://huggingface.co/coqui/XTTS-v2 ↗
    ×1
  514. Cranmer, M. (2023). Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl. arXiv:2305.01582. https://arxiv.org/abs/2305.01582 ↗
    ×1
  515. Cui, H., Wang, C., Maan, H., Pang, K., Luo, F., Duan, N., & Wang, B. (2023). scGPT: Towards Building a Foundation Model for Single-Cell Multi-omics Using Generative AI. bioRxiv 2023.04.30.538439 (later published in Nature Methods, 21(8), 1470–1480, 2024). https://www.biorxiv.org/content/10.1101/2023.04.30.538439v2 ↗
    ×1
  516. Cunningham, H., Ewart, A., Riggs, L., Huben, R., & Sharkey, L. (2023). Sparse Autoencoders Find Highly Interpretable Features in Language Models. arXiv:2309.08600. https://arxiv.org/abs/2309.08600 ↗
    ×2
  517. Dao (2023). FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning. https://arxiv.org/abs/2307.08691 ↗
    ×4
  518. Gemini Team, Google (2023). Gemini: A Family of Highly Capable Multimodal Models. arXiv:2312.11805. https://arxiv.org/abs/2312.11805 ↗
    ×2
  519. Gemini Team, Google (2023). Gemini: A Family of Highly Capable Multimodal Models. arXiv:2312.11805. https://arxiv.org/abs/2312.11805 ↗
    ×2
  520. Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized LLMs. Advances in Neural Information Processing Systems 36 (NeurIPS 2023). https://arxiv.org/abs/2305.14314 ↗
    ×2
  521. Dhuliawala, S., Komeili, M., Xu, J., Raileanu, R., Li, X., Celikyilmaz, A., & Weston, J. (2023). Chain-of-Verification Reduces Hallucination in Large Language Models. arXiv:2309.11495. https://arxiv.org/abs/2309.11495 ↗
    ×2
  522. Driess, D., Xia, F., Sajjadi, M. S. M., Lynch, C., Chowdhery, A., Ichter, B., Wahid, A., Tompson, J., Vuong, Q., Yu, T., Huang, W., Chebotar, Y., Sermanet, P., Duckworth, D., Levine, S., Vanhoucke, V., Hausman, K., Toussaint, M., Greff, K., ... Florence, P. (2023). PaLM-E: An Embodied Multimodal Language Model. Proceedings of the 40th International Conference on Machine Learning (ICML 2023), PMLR 202. https://arxiv.org/abs/2303.03378 ↗
    ×1
  523. Du, Y., Li, S., Torralba, A., Tenenbaum, J. B., & Mordatch, I. (2023). Improving Factuality and Reasoning in Language Models through Multiagent Debate. arXiv:2305.14325. https://arxiv.org/abs/2305.14325 ↗
    ×2
  524. Frei et al. (2023). Benign Overfitting in Linear Classifiers and Leaky ReLU Networks from KKT Conditions for Margin Maximization. https://arxiv.org/abs/2303.01462 ↗
    ×1
  525. Gale, T., Narayanan, D., Young, C., & Zaharia, M. (2023). MegaBlocks: Efficient Sparse Training with Mixture-of-Experts. Proceedings of Machine Learning and Systems 5 (MLSys 2023). https://arxiv.org/abs/2211.15841 ↗
    ×2
  526. Gao, L., Schulman, J., & Hilton, J. (2023). Scaling Laws for Reward Model Overoptimization. Proceedings of the 40th International Conference on Machine Learning (ICML 2023), PMLR 202, 10835–10866. https://arxiv.org/abs/2210.10760 ↗
    ×2
  527. Goldowsky-Dill, N., MacLeod, C., Sato, L., & Arora, A. (2023). Localizing Model Behavior with Path Patching. arXiv:2304.05969. https://arxiv.org/abs/2304.05969 ↗
    ×1
  528. Goldstein, J. A., Sastry, G., Musser, M., DiResta, R., Gentzel, M., & Sedova, K. (2023). Generative Language Models and Automated Influence Operations: Emerging Threats and Potential Mitigations. arXiv:2301.04246. https://arxiv.org/abs/2301.04246 ↗
    ×2
  529. Agostinelli, A., Denk, T. I., Borsos, Z., Engel, J., Verzetti, M., Caillon, A., Huang, Q., Jansen, A., Roberts, A., Tagliasacchi, M., Sharifi, M., Zeghidour, N., & Frank, C. (2023). MusicLM: Generating Music From Text. arXiv:2301.11325. https://arxiv.org/abs/2301.11325 ↗
    ×3
  530. Gould et al. (2023). search Scholar ↗
    ×1
  531. Gu and Dao (2023). Mamba: Linear-Time Sequence Modeling with Selective State Spaces. https://arxiv.org/abs/2312.00752 ↗
    ×4
  532. Hafner, Pasukonis, Ba, Lillicrap (2023). search Scholar ↗
    ×2
  533. Hendel, Geva, Globerson (2023). In-Context Learning Creates Task Vectors. https://arxiv.org/abs/2310.15916 ↗
    ×2
  534. Hilton et al. (2023). Scaling laws for single-agent reinforcement learning. https://arxiv.org/abs/2301.13442 ↗
    ×1
  535. Huang and Chang (2023). search Scholar ↗
    ×1
  536. Ingraham et al. (2023). search Scholar ↗
    ×2
  537. Jin et al. (2023). search Scholar ↗
    ×1
  538. July (2023). search Scholar ↗
    ×1
  539. Kerbl et al. (2023). search Scholar ↗
    ×2
  540. Kiela (2023). search Scholar ↗
    ×1
  541. Kirillov et al. (2023). Segment Anything. https://arxiv.org/abs/2304.02643 ↗
    ×3
  542. Kwon et al. (2023). Efficient Memory Management for Large Language Model Serving with PagedAttention. https://arxiv.org/abs/2309.06180 ↗
    ×5
  543. Lam et al. (2023). search Scholar ↗
    ×2
  544. Lanham, T., Chen, A., Radhakrishnan, A., Steiner, B., Denison, C., Hernandez, D., Li, D., Durmus, E., Hubinger, E., Kernion, J., Lukošiūtė, K., Nguyen, K., Cheng, N., Joseph, N., Schiefer, N., Rausch, O., Larson, R., McCandlish, S., Kundu, S., ... Perez, E. (2023). Measuring Faithfulness in Chain-of-Thought Reasoning. arXiv:2307.13702. https://arxiv.org/abs/2307.13702 ↗
    ×4
  545. Leviathan, Kalman, Matias (2023). Fast Inference from Transformers via Speculative Decoding. https://arxiv.org/abs/2211.17192 ↗
    ×3
  546. Li, J., Li, D., Savarese, S., & Hoi, S. (2023). BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. Proceedings of the 40th International Conference on Machine Learning (ICML 2023). https://arxiv.org/abs/2301.12597 ↗
    ×6
  547. Liao, Y.-L., Wood, B., Das, A., & Smidt, T. (2023). EquiformerV2: Improved Equivariant Transformer for Scaling to Higher-Degree Representations. arXiv:2306.12059. https://arxiv.org/abs/2306.12059 ↗
    ×1
  548. Lightman, H., Kosaraju, V., Burda, Y., Edwards, H., Baker, B., Lee, T., Leike, J., Schulman, J., Sutskever, I., & Cobbe, K. (2023). Let's Verify Step by Step. arXiv:2305.20050. https://arxiv.org/abs/2305.20050 ↗
    ×5
  549. Lin, J., Tang, J., Tang, H., Yang, S., Chen, W.-M., Wang, W.-C., Xiao, G., Dang, X., Gan, C., & Han, S. (2023). AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration. Proceedings of Machine Learning and Systems 6 (MLSys 2024). https://arxiv.org/abs/2306.00978 ↗
    ×2
  550. Liu, Li, Wu, Lee (2023). Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training. https://arxiv.org/abs/2305.14342 ↗
    ×7
  551. Madaan et al. (2023). search Scholar ↗
    ×3
  552. Mamba, Gu and Dao (2023). search Scholar ↗
    ×1
  553. Mangalam et al. (2023). search Scholar ↗
    ×2
  554. March (2023). search Scholar ↗
    ×2
  555. Marks and Tegmark (2023). search Scholar ↗
    ×1
  556. Meng, K., Sharma, A. S., Andonian, A., Belinkov, Y., & Bau, D. (2023). Mass-Editing Memory in a Transformer. International Conference on Learning Representations (ICLR 2023). https://arxiv.org/abs/2210.07229 ↗
    ×2
  557. Merchant, A., Batzner, S., Schoenholz, S. S., Aykol, M., Cheon, G., & Cubuk, E. D. (2023). Scaling deep learning for materials discovery. Nature, 624(7990), 80–85. https://www.nature.com/articles/s41586-023-06735-9 ↗ DOI: 10.1038/s41586-023-06735-9
    ×4
  558. Copet, J., Kreuk, F., Gat, I., Remez, T., Kant, D., Synnaeve, G., Adi, Y., & Défossez, A. (2023). Simple and Controllable Music Generation. Advances in Neural Information Processing Systems 36 (NeurIPS 2023). https://arxiv.org/abs/2306.05284 ↗
    ×6
  559. Kinniment, M., Sato, L. J. K., Du, H., Goodrich, B., Hasin, M., Chan, L., Miles, L. H., Lin, T. R., Wijk, H., Burget, J., Ho, A., Barnes, E., & Christiano, P. (2023). Evaluating Language-Model Agents on Realistic Autonomous Tasks. arXiv:2312.11671. https://arxiv.org/abs/2312.11671 ↗
    ×4
  560. Mialon et al. (2023). search Scholar ↗
    ×2
  561. Michael-Mahdi-Rein et al. (2023). search Scholar ↗
    ×1
  562. Microsoft, August (2023). search Scholar ↗
    ×5
  563. Musaelian et al. (2023). search Scholar ↗
    ×1
  564. Nov (2023). search Scholar ↗
    ×1
  565. November (2023). search Scholar ↗
    ×3
  566. Biden, J. R. (2023). Executive Order 14110: Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. Federal Register, 88(210), 75191–75226 (Executive Order No. 14110, October 30, 2023). https://en.wikipedia.org/wiki/Executive_Order_14110 ↗
    ×1
  567. Biden, J. R. (2023). Executive Order 14110: Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. Federal Register, 88(210), 75191–75226 (Executive Order 14110, signed October 30, 2023). https://www.federalregister.gov/documents/2023/11/01/2023-24283/safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence ↗
    ×2
  568. OpenAI (2023). GPT-4 Technical Report. https://arxiv.org/abs/2303.08774 ↗
    ×7
  569. Padmakumar, V., & He, H. (2023). Does Writing with Language Models Reduce Content Diversity? arXiv:2309.05196 (later published at ICLR 2024). https://arxiv.org/abs/2309.05196 ↗
    ×1
  570. Park, Lan, Tran, Park (2023).
    ×2
    not_found: No paper by Park, Lan, Tran, & Park (2023) on systematic documentation of citation hallucination could be located via web search. Known 2023 studies on this topic (e.g., Walters & Wilder 2023; Bhattacharyya et al. 2023; MacDonald 2023) have different author lists. The cited combination appears to be itself a hallucinated citation.

  571. Peebles and Xie (2023). search Scholar ↗
    ×2
  572. Peng et al. (2023). YaRN: Efficient Context Window Extension of Large Language Models. https://arxiv.org/abs/2309.00071 ↗
    ×5
  573. Pope et al. (2023). search Scholar ↗
    ×2
  574. Rafailov, Sharma, Mitchell, Manning, Ermon, Finn (2023). Direct Preference Optimization: Your Language Model is Secretly a Reward Model. https://arxiv.org/abs/2305.18290 ↗
    ×6
  575. Rein et al. (2023). search Scholar ↗
    ×4
  576. Romera-Paredes et al. (2023). search Scholar ↗
    ×4
  577. Roohani et al. (2023). search Scholar ↗
    ×1
  578. Rosen et al. (2023). search Scholar ↗
    ×1
  579. Sainz, O., Campos, J. A., García-Ferrero, I., Etxaniz, J., Lopez de Lacalle, O., & Agirre, E. (2023). NLP Evaluation in trouble: On the Need to Measure LLM Data Contamination for each Benchmark. Findings of the Association for Computational Linguistics: EMNLP 2023. https://arxiv.org/abs/2310.18018 ↗ DOI: 10.18653/v1/2023.findings-emnlp.722
    ×3
  580. Li, J., Li, D., Savarese, S., & Hoi, S. (2023). BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. Proceedings of the 40th International Conference on Machine Learning (ICML 2023). https://arxiv.org/abs/2301.12597 ↗ DOI: 10.5555/3618408.3619222
    ×1
  581. Schaeffer et al. (2023). Are Emergent Abilities of Large Language Models a Mirage? https://arxiv.org/abs/2304.15004 ↗
    ×9
  582. Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Hambro, E., Zettlemoyer, L., Cancedda, N., & Scialom, T. (2023). Toolformer: Language Models Can Teach Themselves to Use Tools. Advances in Neural Information Processing Systems 36 (NeurIPS 2023). https://arxiv.org/abs/2302.04761 ↗
    ×3
  583. OpenAI (2023). GPT-4V(ision) System Card. OpenAI Technical Report (September 25, 2023). https://cdn.openai.com/papers/GPTV_System_Card.pdf ↗
    ×1
  584. Sharma, M., Tong, M., Korbak, T., Duvenaud, D., Askell, A., Bowman, S. R., Cheng, N., Durmus, E., Hatfield-Dodds, Z., Johnston, S. R., Kravec, S., Maxwell, T., McCandlish, S., Ndousse, K., Rausch, O., Schiefer, N., Yan, D., Zhang, M., & Perez, E. (2023). Towards Understanding Sycophancy in Language Models. arXiv:2310.13548. https://arxiv.org/abs/2310.13548 ↗
    ×9
  585. Shinn et al. (2023). search Scholar ↗
    ×3
  586. Sun et al. (2023). search Scholar ↗
    ×1
  587. Syed et al. (2023). search Scholar ↗
    ×1
  588. Theodoris et al. (2023). search Scholar ↗
    ×1
  589. Tran et al. (2023). search Scholar ↗
    ×1
  590. Tsigler and Bartlett (2023). Benign overfitting in ridge regression. https://jmlr.org/papers/v24/22-1398.html ↗
    ×1
  591. Turner, A. M., Thiergart, L., Leech, G., Udell, D., Vazquez, J. J., Mini, U., & MacDiarmid, M. (2023). Activation Addition: Steering Language Models Without Optimization. arXiv:2308.10248. https://arxiv.org/abs/2308.10248 ↗
    ×3
  592. von Oswald et al. (2023). Transformers Learn In-Context by Gradient Descent. https://arxiv.org/abs/2212.07677 ↗
    ×1
  593. Wang, S., Saharia, C., Montgomery, C., Pont-Tuset, J., Noy, S., Pellegrini, S., Onoe, Y., Laszlo, S., Fleet, D. J., Soricut, R., Baldridge, J., Norouzi, M., Anderson, P., & Chan, W. (2023). Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023), 18359–18369. https://arxiv.org/abs/2212.06909 ↗ DOI: 10.48550/arXiv.2212.06909
    ×3
  594. Watson, J. L., Juergens, D., Bennett, N. R., Trippe, B. L., Yim, J., Eisenach, H. E., Ahern, W., Borst, A. J., Ragotte, R. J., Milles, L. F., Wicky, B. I. M., Hanikel, N., Pellock, S. J., Courbet, A., Sheffler, W., Wang, J., Venkatesh, P., Sappington, I., Vázquez Torres, S., ... Baker, D. (2023). De novo design of protein structure and function with RFdiffusion. Nature, 620(7976), 1089–1100. https://www.nature.com/articles/s41586-023-06415-8 ↗ DOI: 10.1038/s41586-023-06415-8
    ×4
  595. Wei, A., Haghtalab, N., & Steinhardt, J. (2023). Jailbroken: How Does LLM Safety Training Fail? Advances in Neural Information Processing Systems 36 (NeurIPS 2023). https://arxiv.org/abs/2307.02483 ↗
    ×5
  596. Willard & Louf (2023). Efficient Guided Generation for Large Language Models. https://arxiv.org/abs/2307.09702 ↗
    ×1
  597. Wu, X., Sun, K., Zhu, F., Zhao, R., & Li, H. (2023). Human Preference Score: Better Aligning Text-to-Image Models with Human Preference. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2023). https://arxiv.org/abs/2303.14420 ↗
    ×1
  598. Yao et al. (2023). Tree of Thoughts: Deliberate Problem Solving with Large Language Models. https://arxiv.org/abs/2305.10601 ↗
    ×7
  599. Yu et al. (2023). MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers. https://arxiv.org/abs/2305.07185 ↗
    ×3
  600. Zhai et al. (2023). Sigmoid Loss for Language Image Pre-Training. https://arxiv.org/abs/2303.15343 ↗
    ×2
  601. Zhang, L., Rao, A., & Agrawala, M. (2023). Adding Conditional Control to Text-to-Image Diffusion Models. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2023). https://arxiv.org/abs/2302.05543 ↗ DOI: 10.1109/ICCV51070.2023.00355
    ×4
  602. Zhao, Y., Gu, A., Varma, R., Luo, L., Huang, C.-C., Xu, M., Wright, L., Shojanazeri, H., Ott, M., Shleifer, S., Desmaison, A., Balioglu, C., Damania, P., Nguyen, B., Chauhan, G., Hao, Y., Mathews, A., & Li, S. (2023). PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Proceedings of the VLDB Endowment, 16(12), 3848–3860. https://arxiv.org/abs/2304.11277 ↗ DOI: 10.14778/3611540.3611569
    ×1
  603. Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E. P., Zhang, H., Gonzalez, J. E., & Stoica, I. (2023). Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems 36 (NeurIPS 2023) Datasets and Benchmarks Track. https://arxiv.org/abs/2306.05685 ↗
    ×5
  604. Zhong et al. (2023). search Scholar ↗
    ×1
  605. Zhou et al. (2023). Least-to-Most Prompting Enables Complex Reasoning in Large Language Models. https://arxiv.org/abs/2205.10625 ↗
    ×4
  606. Zhu et al. (2023). search Scholar ↗
    ×1
  607. Zou, A., Phan, L., Chen, S., Campbell, J., Guo, P., Ren, R., Pan, A., Yin, X., Mazeika, M., Dombrowski, A.-K., Goel, S., Li, N., Byun, M. J., Wang, Z., Mallen, A., Basart, S., Koyejo, S., Song, D., Fredrikson, M., ... Hendrycks, D. (2023). Representation Engineering: A Top-Down Approach to AI Transparency. arXiv:2310.01405. https://arxiv.org/abs/2310.01405 ↗
    ×7
  608. Abramson, J., Adler, J., Dunger, J., Evans, R., Green, T., Pritzel, A., Ronneberger, O., Willmore, L., Ballard, A. J., Bambrick, J., Bodenstein, S. W., Evans, D. A., Hung, C.-C., O'Neill, M., Reiman, D., Tunyasuvunakool, K., Wu, Z., Žemgulytė, A., Arvaniti, E., ... Jumper, J. M. (2024). Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, 630(8016), 493–500. https://www.nature.com/articles/s41586-024-07487-w ↗ DOI: 10.1038/s41586-024-07487-w
    ×3
  609. Adobe (2024). Adobe Launches Firefly Video Model and Enhances Image, Vector and Design Models. Adobe Newsroom press release, October 14, 2024. https://news.adobe.com/news/2024/10/101424-adobe-launches-firefly-video-model ↗
    ×1
  610. Chameleon Team (2024). Chameleon: Mixed-Modal Early-Fusion Foundation Models. arXiv:2405.09818. https://arxiv.org/abs/2405.09818 ↗
    ×1
  611. AIME (2024). search Scholar ↗
    ×1
  612. Anthropic, March (2024). search Scholar ↗
    ×9
  613. Apple (2024). search Scholar ↗
    ×2
  614. Arditi, Obeso, Syed, Paleka, Panickssery, Gurnee, Nanda (2024). search Scholar ↗
    ×8
  615. Bardes et al. (2024). search Scholar ↗
    ×1
  616. Batatia et al. (2024). search Scholar ↗
    ×1
  617. Black, K., Brown, N., Driess, D., Esmail, A., Equi, M., Finn, C., Fusai, N., Groom, L., Hausman, K., Ichter, B., Jakubczak, S., Jones, T., Ke, L., Levine, S., Li-Bell, A., Mothukuri, M., Nair, S., Pertsch, K., Shi, L. X., ... Zhilinsky, U. (2024). π₀: A Vision-Language-Action Flow Model for General Robot Control. arXiv:2410.24164. https://arxiv.org/abs/2410.24164 ↗
    ×1
  618. NVIDIA Corporation (2024). NVIDIA Blackwell Architecture Technical Overview. NVIDIA Technical Brief / Whitepaper (2024). https://resources.nvidia.com/en-us-blackwell-architecture ↗
    ×2
  619. Brown, B., Juravsky, J., Ehrlich, R., Clark, R., Le, Q. V., Ré, C., & Mirhoseini, A. (2024). Large Language Monkeys: Scaling Inference Compute with Repeated Sampling. arXiv:2407.21787. https://arxiv.org/abs/2407.21787 ↗
    ×1
  620. Bussmann, B., Leask, P., & Nanda, N. (2024). Matryoshka Sparse Autoencoders. AI Alignment Forum (blog post, December 19, 2024). https://www.alignmentforum.org/posts/zbebxYCqsryPALh8C/matryoshka-sparse-autoencoders ↗
    ×1
  621. Cai et al. (2024). Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads. https://arxiv.org/abs/2401.10774 ↗
    ×1
  622. Chen et al. (2024). search Scholar ↗
    ×1
  623. Clymer et al. (2024). search Scholar ↗
    ×1
  624. Colossus (2024). search Scholar ↗
    ×1
  625. Dalla-Torre et al. (2024). search Scholar ↗
    ×1
  626. Dao and Gu (2024). Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality. https://arxiv.org/abs/2405.21060 ↗
    ×2
  627. Dec (2024). search Scholar ↗
    ×2
  628. DeepMind, July (2024). search Scholar ↗
    ×15
  629. Ethayarajh et al. (2024). KTO: Model Alignment as Prospect Theoretic Optimization. https://arxiv.org/abs/2402.01306 ↗
    ×1
  630. Phan, L., Gatti, A., Han, Z., Li, N., Hu, J., Zhang, H., ... Hendrycks, D. (2025). Humanity's Last Exam. arXiv:2501.14249. https://arxiv.org/abs/2501.14249 ↗ DOI: 10.48550/arXiv.2501.14249
    ×1
  631. Faysse, M., Sibille, H., Wu, T., Omrani, B., Viaud, G., Hudelot, C., & Colombo, P. (2024). ColPali: Efficient Document Retrieval with Vision Language Models. arXiv:2407.01449. https://arxiv.org/abs/2407.01449 ↗
    ×2
  632. Fu et al. (2024). Break the Sequential Dependency of LLM Inference Using Lookahead Decoding. https://arxiv.org/abs/2402.02057 ↗
    ×4
  633. Geiger, A., Wu, Z., Potts, C., Icard, T., & Goodman, N. (2024). Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations. Proceedings of the Third Conference on Causal Learning and Reasoning, PMLR 236, 160–187. https://proceedings.mlr.press/v236/geiger24a.html ↗
    ×1
  634. Genmo Team (2024). Mochi 1: A new SOTA in open text-to-video. Genmo Blog (model release, October 2024); weights at https://huggingface.co/genmo/mochi-1-preview. https://www.genmo.ai/blog/mochi-1-a-new-sota-in-open-text-to-video ↗
    ×1
  635. Glazer et al. (2024). search Scholar ↗
    ×2
  636. Google (2024). search Scholar ↗
    ×2
  637. GPUs (2024). search Scholar ↗
    ×1
  638. Greenblatt, Shlegeris, Sachan, Roger (2024). search Scholar ↗
    ×10
  639. Groeneveld et al. (2024). OLMo: Accelerating the Science of Language Models. https://arxiv.org/abs/2402.00838 ↗
    ×1
  640. Guan et al. (2024). search Scholar ↗
    ×1
  641. Hagemann et al. (2024). search Scholar ↗
    ×1
  642. Hao, M., Gong, J., Zeng, X., Liu, C., Guo, Y., Cheng, X., Wang, T., Ma, J., Zhang, X., & Song, L. (2024). Large-scale foundation model on single-cell transcriptomics. Nature Methods, 21(8), 1481–1491. https://www.nature.com/articles/s41592-024-02305-7 ↗ DOI: 10.1038/s41592-024-02305-7
    ×1
  643. Hayou, S., Ghosh, N., & Yu, B. (2024). LoRA+: Efficient Low Rank Adaptation of Large Models. Proceedings of the 41st International Conference on Machine Learning (ICML 2024). https://arxiv.org/abs/2402.12354 ↗
    ×1
  644. Hendrycks et al. (2024). AI Deception: A Survey of Examples, Risks, and Potential Solutions. https://arxiv.org/abs/2308.14752 ↗
    ×1
    ambiguous: No 2024 paper found with Hendrycks as first author on in-context/agentic LLM deception. Most plausible target given the context is Park, Goldstein, O'Gara, Chen, & Hendrycks (2024) 'AI Deception: A Survey...' in Patterns, but Hendrycks is the last author, so the in-text 'Hendrycks et al.' attribution appears to be a citation error. Cannot rule out that the author intended a different work (e.g., Scheurer et al. 2024 on strategic deception under pressure, or Hagendorff 2024 'Deception abilities emerged in LLMs').

  645. Hong et al. (2024). ORPO: Monolithic Preference Optimization without Reference Model. https://arxiv.org/abs/2403.07691 ↗
    ×3
  646. Hsieh et al. (2024). RULER: What's the Real Context Size of Your Long-Context Language Models? https://arxiv.org/abs/2404.06654 ↗
    ×2
  647. Hubinger et al. (2024). search Scholar ↗
    ×13
  648. Jain et al. (2024). search Scholar ↗
    ×2
  649. Jiang et al. (2024). search Scholar ↗
    ×1
  650. Jimenez, Yang, Wettig, Yao, Pei, Press, Narasimhan (2024). search Scholar ↗
    ×4
  651. July (2024). search Scholar ↗
    ×1
  652. June (2024). search Scholar ↗
    ×1
  653. Khan et al. (2024). search Scholar ↗
    ×1
  654. Kim, M. J., Pertsch, K., Karamcheti, S., Xiao, T., Balakrishna, A., Nair, S., Rafailov, R., Foster, E., Lam, G., Sanketi, P., Vuong, Q., Kollar, T., Burchfiel, B., Tedrake, R., Sadigh, D., Levine, S., Liang, P., & Finn, C. (2024). OpenVLA: An Open-Source Vision-Language-Action Model. arXiv:2406.09246. https://arxiv.org/abs/2406.09246 ↗
    ×2
  655. Korbak et al. (2024).
    ×1
    not_found: No Korbak-first-authored 2024 paper on game-theoretic analyses of AI safety/control protocols found. The clearly matching game-theoretic AI control paper is Griffin, Thomson, Shlegeris, & Abate (2024), 'Games for AI Control' (arXiv:2409.07985), not Korbak. Korbak's relevant AI-control work, 'A sketch of an AI control safety case' (Korbak, Clymer, Hilton, Shlegeris, & Irving), is arXiv:2501.17315 from January 2025, not 2024, and is about safety cases rather than equilibrium/game analysis. Likely a hallucinated or misattributed citation.

  656. Kuaishou, China (2024). search Scholar ↗
    ×1
  657. Liu et al. (2024). search Scholar ↗
    ×4
  658. Lu et al. (2024). search Scholar ↗
    ×2
  659. March (2024). search Scholar ↗
    ×2
  660. May (2024). search Scholar ↗
    ×6
  661. Meta (2024). search Scholar ↗
    ×3
  662. METR (2024). search Scholar ↗
    ×1
  663. Microsoft (2024). search Scholar ↗
    ×5
  664. Mirzadeh, I., Alizadeh, K., Shahrokhi, H., Tuzel, O., Bengio, S., & Farajtabar, M. (2024). GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models. arXiv:2410.05229. https://arxiv.org/abs/2410.05229 ↗
    ×2
  665. Mouton, C. A., Lucas, C., & Guest, E. (2024). The Operational Risks of AI in Large-Scale Biological Attacks: Results of a Red-Team Study. RAND Corporation Research Report RR-A2977-2. https://www.rand.org/pubs/research_reports/RRA2977-2.html ↗ DOI: 10.7249/RRA2977-2
    ×1
  666. Anthropic (2024). Introducing computer use, a new Claude 3.5 Sonnet, and a new Claude 3.5 Haiku. Anthropic News (blog post), October 22, 2024. https://www.anthropic.com/news/3-5-models-and-computer-use ↗
    ×6
  667. OpenAI (2024). Learning to Reason with LLMs. OpenAI Research Blog (September 12, 2024). https://openai.com/index/learning-to-reason-with-llms/ ↗
    ×14
  668. Panickssery, Bowman, Feng (2024). search Scholar ↗
    ×1
  669. Park et al. (2024). search Scholar ↗
    ×2
  670. Plaat et al. (2024). search Scholar ↗
    ×1
  671. Price et al. (2024). search Scholar ↗
    ×1
  672. RAND (2024). search Scholar ↗
    ×4
  673. Rasp et al. (2024). search Scholar ↗
    ×1
  674. Replit (2024). Introducing Replit Agent. Replit Blog. https://blog.replit.com/introducing-replit-agent ↗
    ×1
  675. Runway (2024). Introducing Gen-3 Alpha: A New Frontier for Video Generation. Runway Research (product/model announcement). https://runwayml.com/research/introducing-gen-3-alpha ↗
    ×1
  676. Lu, C., Lu, C., Lange, R. T., Foerster, J., Clune, J., & Ha, D. (2024). The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery. arXiv:2408.06292. https://arxiv.org/abs/2408.06292 ↗ DOI: 10.48550/arXiv.2408.06292
    ×2
  677. Salvi, F., Horta Ribeiro, M., Gallotti, R., & West, R. (2024). On the Conversational Persuasiveness of Large Language Models: A Randomized Controlled Trial. arXiv:2403.14380. https://arxiv.org/abs/2403.14380 ↗
    ×3
  678. OpenAI (2024). Introducing OpenAI o1-preview. OpenAI (blog announcement, September 12, 2024). https://openai.com/index/introducing-openai-o1-preview/ ↗
    ×2
  679. OpenAI (2024). Learning to Reason with LLMs. OpenAI (blog announcement), September 12, 2024. https://openai.com/index/learning-to-reason-with-llms/ ↗
    ×2
  680. Shah et al. (2024). FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision. https://arxiv.org/abs/2407.08608 ↗
    ×1
  681. Shao, Z., Wang, P., Zhu, Q., Xu, R., Song, J., Bi, X., Zhang, H., Zhang, M., Li, Y. K., Wu, Y., & Guo, D. (2024). DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. arXiv:2402.03300. https://arxiv.org/abs/2402.03300 ↗
    ×2
  682. Shumailov et al. (2024). AI models collapse when trained on recursively generated data. https://www.nature.com/articles/s41586-024-07566-y ↗
    ×5
  683. Snell et al. (2024). search Scholar ↗
    ×1
  684. Suno and Udio (2024). search Scholar ↗
    ×1
  685. Templeton et al. (2024). search Scholar ↗
    ×1
  686. Kong, W., Tian, Q., Zhang, Z., Min, R., Dai, Z., Zhou, J., Xiong, J., Li, X., Wu, B., Zhang, J., Wu, K., Lin, Q., Yuan, J., Long, Y., Wang, A., Wang, A., Li, C., Huang, D., Yang, F., ... Zhong, C. (2024). HunyuanVideo: A Systematic Framework For Large Video Generative Models. arXiv:2412.03603. https://arxiv.org/abs/2412.03603 ↗
    ×1
  687. Google Cloud (2024). Trillium TPU is GA (sixth-generation Tensor Processing Unit). Google Cloud Blog (announcement, 2024). https://blog.google/feed/trillium-tpus/ ↗
    ×1
  688. Trinh, T. H., Wu, Y., Le, Q. V., He, H., & Luong, T. (2024). Solving olympiad geometry without human demonstrations. Nature, 625(7995), 476–482. https://www.nature.com/articles/s41586-023-06747-5 ↗ DOI: 10.1038/s41586-023-06747-5
    ×3
  689. Wang et al. (2024). search Scholar ↗
    ×1
  690. Xie et al. (2024). search Scholar ↗
    ×2
  691. Xu et al. (2024). search Scholar ↗
    ×2
  692. Yue et al. (2024). search Scholar ↗
    ×2
  693. Zhou et al. (2024). search Scholar ↗
    ×2
  694. ×2
  695. AIME (2025). search Scholar ↗
    ×1
  696. OpenAI (2025). Introducing deep research. OpenAI (product announcement, February 2, 2025). https://openai.com/index/introducing-deep-research/ ↗
    ×1
  697. DeepSeek-AI (2025). DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv:2501.12948. https://arxiv.org/abs/2501.12948 ↗ DOI: 10.48550/arXiv.2501.12948
    ×4
  698. Anthropic (2025). Claude's extended thinking. Anthropic (company blog/news). https://www.anthropic.com/news/visible-extended-thinking ↗
    ×2
  699. Anthropic (2025). Claude 3.7 Sonnet and Claude Code. Anthropic News (announcement, February 24, 2025). https://www.anthropic.com/news/claude-3-7-sonnet ↗
    ×2
  700. Jan (2025). search Scholar ↗
    ×3
  701. January (2025). search Scholar ↗
    ×6
  702. METR (2025). search Scholar ↗
    ×3
  703. NVIDIA (2025). search Scholar ↗
    ×3
  704. OpenAI (2025). search Scholar ↗
    ×1
  705. ×1

AI: A Living Reference by Fuzue. Content licensed under CC BY-SA 4.0 - share, adapt, and build on it; keep the attribution and the open licence on derivatives.