Featured Resources
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks. This paper proposes the Transformer, a model architecture relying solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show the model achieves state-of-the-art results.
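As a concrete reference for the attention mechanism the abstract refers to, here is a minimal NumPy sketch of scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V: each output position is a weighted average of value vectors. The shapes and toy inputs are illustrative, not taken from the paper's code.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled so the softmax
    # stays well-behaved as d_k grows.
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over each row of scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted average of the value vectors.
    return weights @ V

# Toy usage: self-attention over 4 positions with 8-dimensional vectors.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```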
We train GPT-3, an autoregressive language model with 175 billion parameters, and show that scaling language models greatly improves task-agnostic few-shot performance. GPT-3 achieves strong performance on many NLP benchmarks without any gradient updates or fine-tuning.
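The "without any gradient updates" claim means the task is specified entirely in the prompt. The sketch below illustrates few-shot prompt construction for a toy English-to-French task; the commented-out `generate` call is a hypothetical stand-in for any autoregressive language model's sampling function, not a real API.

```python
# Few-shot "in-context learning": k demonstrations plus an unanswered
# query are concatenated into one prompt. The model's weights are never
# updated; it simply continues the pattern.

def build_few_shot_prompt(examples, query):
    """Format k demonstrations followed by the query to be completed."""
    lines = [f"English: {en}\nFrench: {fr}" for en, fr in examples]
    lines.append(f"English: {query}\nFrench:")
    return "\n\n".join(lines)

examples = [("cheese", "fromage"), ("dog", "chien"), ("house", "maison")]
prompt = build_few_shot_prompt(examples, "bread")
print(prompt)
# completion = generate(prompt)  # hypothetical LM call; expected continuation: "pain"
```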
We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. Our deep residual networks are easier to optimise and gain accuracy from greatly increased depth. On ImageNet we achieve a 3.57% error rate, winning 1st place in the ILSVRC 2015 classification task.
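To make the residual idea concrete, here is a fully-connected NumPy sketch (the paper's blocks are convolutional; the toy weights here are assumptions for illustration): each block computes y = relu(F(x) + x), so layers learn a residual correction to the identity rather than a full mapping, and gradients flow through the shortcut even in very deep stacks.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """y = relu(F(x) + x), where F(x) = W2 @ relu(W1 @ x).

    The skip connection adds the input back, so representing the
    identity mapping requires only driving F toward zero.
    """
    return relu(W2 @ relu(W1 @ x) + x)

# Toy usage: stack ten blocks; the identity shortcut keeps the
# signal (and gradient) path open regardless of depth.
rng = np.random.default_rng(0)
x = rng.normal(size=16)
for _ in range(10):
    W1 = rng.normal(size=(16, 16)) * 0.1
    W2 = rng.normal(size=(16, 16)) * 0.1
    x = residual_block(x, W1, W2)
print(x.shape)  # (16,)
```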
We introduce BERT, designed to pre-train deep bidirectional representations from unlabelled text by jointly conditioning on both left and right context. BERT obtains state-of-the-art results on eleven NLP tasks including question answering and language inference.
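The bidirectional pre-training objective is a masked language model: hide a random subset of input tokens and train the model to predict them from the surrounding context on both sides. Below is a simplified Python sketch of the masking step only (BERT's full recipe also sometimes keeps or randomly replaces the selected tokens; that detail is omitted here).

```python
import random

MASK, MASK_PROB = "[MASK]", 0.15

def mask_tokens(tokens, rng=random.Random(0)):
    """Replace ~15% of tokens with [MASK]; the model must recover them
    using both left and right context."""
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < MASK_PROB:
            inputs.append(MASK)
            targets.append(tok)   # loss is computed on this position
        else:
            inputs.append(tok)
            targets.append(None)  # no loss on unmasked positions
    return inputs, targets

inputs, targets = mask_tokens("the cat sat on the mat".split())
print(inputs)  # e.g. ['the', 'cat', '[MASK]', 'on', 'the', 'mat']
```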
We present the first Event Horizon Telescope (EHT) images of the supermassive black hole in M87. An asymmetric bright emission ring is consistent with the relativistic beaming and the shadow predicted by general relativity for a Kerr black hole.