AI Research Archive
About this Community
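AI Research Archive is a digital archive for exploring, from a multidisciplinary perspective, how artificial intelligence (AI) technology affects society, people, and policy. Going beyond technical achievements alone, the archive classifies and provides research materials around the central themes of ethics, fairness, safety, and social responsibility.
Collection Guide
- AI Ethics & Social Impact – AI ethics, legal frameworks, social responsibility, algorithmic bias, and regulatory issues
- Natural Language Processing – language models, generative AI, multilingual models, prompt design, and more
- Computer Vision – image classification, medical image analysis, GANs, vision transformers, and more
- Reinforcement Learning – reinforcement learning algorithms, policy optimization, safe autonomous systems, and more
- AI in Practice – applications of AI in fields such as healthcare, education, and the environment
Recommended For
- Researchers working on AI ethics and policy
- Scholars in social science, data science, law, and philosophy of technology
- Non-specialists interested in artificial intelligence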
News
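Update Log
- 2025-05-27 – Surveyed 5 papers for the AI Ethics collection
- 2025-05-25 – Completed the basic structure of the NLP and CV collections and surveyed papers
- 2025-05-20 – Opened the community and finalized the Collection structure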
Browse
Browsing AI Research Archive by Title
Now showing 1 - 20 of 30
Item A Style-Based Generator Architecture for Generative Adversarial Networks (arXiv, 2018-12-05)
Karras, Tero; Laine, Samuli; Lehtinen, Jaakko; Aila, Timo
We propose an alternative generator architecture for generative adversarial networks, borrowing from style transfer literature. The new architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific control of the synthesis. The new generator improves the state-of-the-art in terms of traditional distribution quality metrics, leads to demonstrably better interpolation properties, and also better disentangles the latent factors of variation. To quantify interpolation quality and disentanglement, we propose two new, automated methods that are applicable to any generator architecture. Finally, we introduce a new, highly varied and high-quality dataset of human faces.
Item A Tale of Two Identities: An Ethical Audit of Human and AI-Crafted Personas (AAAI, 2025-05-07)
Venkit, Pranav Narayanan; Li, Jiayi; Zhou, Yingfan; Rajtmajer, Sarah; Wilson, Shomir
This paper uses computational sociolinguistics and HCI frameworks to analyze how personas generated by LLMs represent specific social groups.
Item Algorithmic Accountability: A Primer (Data & Society Research Institute, 2018-04)
Fontaine, Claire; Caplan, Robyn; Hanson, Lauren
This primer addresses the ethical and social problems raised by the spread of algorithm-based decision systems.
When algorithms operate opaquely, bias, discrimination, and evasion of responsibility can arise, which negatively affects citizens' rights and trust.
Item An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale (arXiv, 2020-10-22)
Dosovitskiy, Alexey; Beyer, Lucas; Kolesnikov, Alexander; Weissenborn, Dirk; Zhai, Xiaohua; Unterthiner, Thomas; Dehghani, Mostafa; Minderer, Matthias; Heigold, Georg; Gelly, Sylvain; Uszkoreit, Jakob; Houlsby, Neil
While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
Item Artificial Intelligence — The Revolution Hasn’t Happened Yet (Harvard Data Science Review, MIT Press, 2019-11)
Jordan, Michael I.
The essay warns that excessive expectations of artificial intelligence and technology-centered language can actually hinder social progress. The author argues that a true AI revolution begins not with technology but with human-centered design, policy design, and the building of social trust, and emphasizes careful, ethical integration of AI.
Item Attention Is All You Need (arXiv, 2017-06-12)
Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Lukasz; Polosukhin, Illia
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism.
We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
Item Big Data’s Disparate Impact (California Law Review, 2016-06-01)
Barocas, Solon; Selbst, Andrew D.
A classic paper that problematizes the structural discrimination arising in machine-learning-based decision-making. It has been widely cited in AI ethics and law.
Item Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (arXiv, 2022-01-10)
Wei, Jason; Wang, Xuezhi; Schuurmans, Dale; Bosma, Maarten; Ichter, Brian; Xia, Fei; Chi, Ed; Le, Quoc; Zhou, Denny
We explore how generating a chain of thought, a series of intermediate reasoning steps, significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain of thought prompting, where a few chain of thought demonstrations are provided as exemplars in prompting. Experiments on three large language models show that chain of thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking.
For instance, prompting a 540B-parameter language model with just eight chain of thought exemplars achieves state of the art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.
Item Deep Residual Learning for Image Recognition (arXiv, 2015-12-10)
He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers (8x deeper than VGG nets) but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
Item DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (2025-01-22)
DeepSeek-AI
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1.
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.
Item Dermatologist-level classification of skin cancer with deep neural networks (Nature, 2017-01-25)
Esteva, Andre; Kuprel, Brett; Novoa, Roberto A; Ko, Justin; Swetter, Susan M; Blau, Helen M; Thrun, Sebastian
Skin cancer, the most common human malignancy, is primarily diagnosed visually, beginning with an initial clinical screening and followed potentially by dermoscopic analysis, a biopsy and histopathological examination. Automated classification of skin lesions using images is a challenging task owing to the fine-grained variability in the appearance of skin lesions. Deep convolutional neural networks (CNNs) show potential for general and highly variable tasks across many fine-grained object categories. Here we demonstrate classification of skin lesions using a single CNN, trained end-to-end from images directly, using only pixels and disease labels as inputs. We train a CNN using a dataset of 129,450 clinical images (two orders of magnitude larger than previous datasets) consisting of 2,032 different diseases.
We test its performance against 21 board-certified dermatologists on biopsy-proven clinical images with two critical binary classification use cases: keratinocyte carcinomas versus benign seborrheic keratoses; and malignant melanomas versus benign nevi. The first case represents the identification of the most common cancers, the second represents the identification of the deadliest skin cancer. The CNN achieves performance on par with all tested experts across both tasks, demonstrating an artificial intelligence capable of classifying skin cancer with a level of competence comparable to dermatologists. Outfitted with deep neural networks, mobile devices can potentially extend the reach of dermatologists outside of the clinic. It is projected that 6.3 billion smartphone subscriptions will exist by the year 2021 and can therefore potentially provide low-cost universal access to vital diagnostic care.
Item Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (arXiv, 2019-10-19)
Raffel, Colin; Shazeer, Noam; Roberts, Adam; Lee, Katherine; Narang, Sharan; Matena, Michael; Zhou, Yanqi; Li, Wei; Liu, Peter J.
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
Item FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence (arXiv, 2020-01-21)
Sohn, Kihyuk; Berthelot, David; Li, Chun-Liang; Zhang, Zizhao; Carlini, Nicholas; Cubuk, Ekin D; Kurakin, Alex; Zhang, Han; Raffel, Colin
Semi-supervised learning (SSL) provides an effective means of leveraging unlabeled data to improve a model's performance. In this paper, we demonstrate the power of a simple combination of two common SSL methods: consistency regularization and pseudo-labeling. Our algorithm, FixMatch, first generates pseudo-labels using the model's predictions on weakly-augmented unlabeled images. For a given image, the pseudo-label is only retained if the model produces a high-confidence prediction. The model is then trained to predict the pseudo-label when fed a strongly-augmented version of the same image. Despite its simplicity, we show that FixMatch achieves state-of-the-art performance across a variety of standard semi-supervised learning benchmarks, including 94.93% accuracy on CIFAR-10 with 250 labels and 88.61% accuracy with 40, just 4 labels per class. Since FixMatch bears many similarities to existing SSL methods that achieve worse performance, we carry out an extensive ablation study to tease apart the experimental factors that are most important to FixMatch's success. We make our code available at this https URL.
Item Fully Convolutional Networks for Semantic Segmentation (arXiv, 2015-11-14)
Long, Jonathan; Shelhamer, Evan; Darrell, Trevor
Convolutional networks are powerful visual models that yield hierarchies of features.
We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build "fully convolutional" networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet, the VGG net, and GoogLeNet) into fully convolutional networks and transfer their learned representations by fine-tuning to the segmentation task. We then define a novel architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes one third of a second for a typical image.
Item GPT-4 Technical Report (arXiv, 2023-03-15)
OpenAI
We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. A core component of this project was developing infrastructure and optimization methods that behave predictably across a wide range of scales.
This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4.
Item ImageNet Classification with Deep Convolutional Neural Networks (NeurIPS, 2012-12-03)
Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E.
We trained a large, deep convolutional neural network to classify the 1.3 million high-resolution images in the LSVRC-2010 ImageNet training set into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 39.7% and 18.9% which is considerably better than the previous state-of-the-art results. The neural network, which has 60 million parameters and 500,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and two globally connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of convolutional nets. To reduce overfitting in the globally connected layers we employed a new regularization method that proved to be very effective.
Item Korean Generative Pre-trained Transformer (arXiv, 2021-12-06)
Yang, Kichang
With the advent of Transformer, which was used in translation models in 2017, attention-based architectures began to attract attention. Furthermore, after the emergence of BERT, which strengthened the NLU-specific encoder part, which is a part of the Transformer, and the GPT architecture, which strengthened the NLG-specific decoder part, various methodologies, data, and models for learning the Pretrained Language Model began to appear. Furthermore, in the past three years, various Pretrained Language Models specialized for Korean have appeared.
In this paper, we intend to numerically and qualitatively compare and analyze various Korean PLMs released to the public.
Item LoRA: Low-Rank Adaptation of Large Language Models (arXiv, 2021-06-17)
Hu, Edward J.; Shen, Yelong; Wallis, Phillip; Allen-Zhu, Zeyuan; Li, Yuanzhi; Wang, Shean; Wang, Lu; Chen, Weizhu
An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example, deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA.
We release a package that facilitates the integration of LoRA with PyTorch models and provide our implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 at this https URL.
Item mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer (arXiv, 2020-10-22)
Xue, Linting; Constant, Noah; Roberts, Adam; Kale, Mihir; Al-Rfou, Rami; Siddhant, Aditya; Barua, Aditya; Raffel, Colin
The recent "Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. We also describe a simple technique to prevent "accidental translation" in the zero-shot setting, where a generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model checkpoints used in this work are publicly available.
Item Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv, 2018-10-11)
Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful.
It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).