AI Research Archive
About this Community
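AI Research Archive is a digital archive for exploring, from a multidisciplinary perspective, how artificial intelligence (AI) technology affects society, people, and policy. Going beyond technical achievements alone, the archive classifies and provides research materials around the central themes of ethics, fairness, safety, and social responsibility.
Collection Guide
- AI Ethics & Social Impact – AI ethics, legal frameworks, social responsibility, algorithmic bias, and regulatory issues
- Natural Language Processing – language models, generative AI, multilingual models, prompt design, and more
- Computer Vision – image classification, medical image analysis, GANs, vision transformers, and more
- Reinforcement Learning – reinforcement learning algorithms, policy optimization, safe autonomous systems, and more
- AI in Practice – applications of AI in fields such as healthcare, education, and the environment
Recommended For
- Researchers in AI ethics and policy
- Scholars in the social sciences, data science, law, and philosophy of technology
- Non-specialists with an interest in artificial intelligence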
News
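Update Log
- 2025-05-27 – Surveyed 5 papers for the AI Ethics collection
- 2025-05-25 – Completed the basic structure of the NLP and CV collections and surveyed papers
- 2025-05-20 – Opened the community and finalized the Collection structure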
Browse
Browsing AI Research Archive by Issue Date
Now showing 1 - 20 of 30
Item ImageNet Classification with Deep Convolutional Neural Networks (NeurIPS, 2012-12-03)
Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E.
We trained a large, deep convolutional neural network to classify the 1.3 million high-resolution images in the LSVRC-2010 ImageNet training set into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 39.7% and 18.9% which is considerably better than the previous state-of-the-art results. The neural network, which has 60 million parameters and 500,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and two globally connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of convolutional nets. To reduce overfitting in the globally connected layers we employed a new regularization method that proved to be very effective.
Item U-Net: Convolutional Networks for Biomedical Image Segmentation (arXiv, 2015-05-18)
Ronneberger, Olaf; Fischer, Philipp; Brox, Thomas
There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast.
Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net .
Item You Only Look Once: Unified, Real-Time Object Detection (arXiv, 2015-06-09)
Redmon, Joseph; Divvala, Santosh; Girshick, Ross; Farhadi, Ali
We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background. Finally, YOLO learns very general representations of objects. It outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
Item Fully Convolutional Networks for Semantic Segmentation (arXiv, 2015-11-14)
Long, Jonathan; Shelhamer, Evan; Darrell, Trevor
Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation.
Our key insight is to build "fully convolutional" networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet, the VGG net, and GoogLeNet) into fully convolutional networks and transfer their learned representations by fine-tuning to the segmentation task. We then define a novel architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes one third of a second for a typical image.
Item Deep Residual Learning for Image Recognition (arXiv, 2015-12-10)
He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers.
The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
Item Big Data’s Disparate Impact (California Law Review, 2016-06-01)
Barocas, Solon; Selbst, Andrew D.
A classic paper that problematizes the structural discrimination arising in machine-learning-based decision-making. It has been widely cited in the fields of AI ethics and law.
Item Dermatologist-level classification of skin cancer with deep neural networks (Nature, 2017-01-25)
Esteva, Andre; Kuprel, Brett; Novoa, Roberto A.; Ko, Justin; Swetter, Susan M.; Blau, Helen M.; Thrun, Sebastian
Skin cancer, the most common human malignancy [1,2,3], is primarily diagnosed visually, beginning with an initial clinical screening and followed potentially by dermoscopic analysis, a biopsy and histopathological examination. Automated classification of skin lesions using images is a challenging task owing to the fine-grained variability in the appearance of skin lesions. Deep convolutional neural networks (CNNs) [4,5] show potential for general and highly variable tasks across many fine-grained object categories [6,7,8,9,10,11]. Here we demonstrate classification of skin lesions using a single CNN, trained end-to-end from images directly, using only pixels and disease labels as inputs. We train a CNN using a dataset of 129,450 clinical images, two orders of magnitude larger than previous datasets [12], consisting of 2,032 different diseases. We test its performance against 21 board-certified dermatologists on biopsy-proven clinical images with two critical binary classification use cases: keratinocyte carcinomas versus benign seborrheic keratoses; and malignant melanomas versus benign nevi.
The first case represents the identification of the most common cancers, the second represents the identification of the deadliest skin cancer. The CNN achieves performance on par with all tested experts across both tasks, demonstrating an artificial intelligence capable of classifying skin cancer with a level of competence comparable to dermatologists. Outfitted with deep neural networks, mobile devices can potentially extend the reach of dermatologists outside of the clinic. It is projected that 6.3 billion smartphone subscriptions will exist by the year 2021 (ref. 13) and can therefore potentially provide low-cost universal access to vital diagnostic care.
Item Attention Is All You Need (arXiv, 2017-06-12)
Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Lukasz; Polosukhin, Illia
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature.
We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
Item The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation (arXiv, 2018-02-20)
Brundage, Miles; Avin, Shahar; Clark, Jack
One of the first comprehensive reports to lay out systematically how AI technology could cause social harm. It emphasizes the need for an ethical response and for international cooperation.
Item Algorithmic Accountability: A Primer (Data & Society Research Institute, 2018-04)
Fontaine, Claire; Caplan, Robyn; Hanson, Lauren
Addresses the ethical and social problems raised by the spread of algorithmic decision-making systems. When algorithms operate opaquely, bias, discrimination, and evasion of responsibility can result, which harms citizens' rights and trust.
Item Using Spatial Reinforcement Learning to Build Forest Wildfire Dynamics Models From Satellite Images (Frontiers in ICT, 2018-04-11)
Sullivan, Andrew L.
Machine learning algorithms have increased tremendously in power in recent years but have yet to be fully utilized in many ecology and sustainable resource management domains such as wildlife reserve design, forest fire management, and invasive species spread. One thing these domains have in common is that they contain dynamics that can be characterized as a spatially spreading process (SSP), which requires many parameters to be set precisely to model the dynamics, spread rates, and directional biases of the elements which are spreading. We present related work in artificial intelligence and machine learning for SSP sustainability domains including forest wildfire prediction. We then introduce a novel approach for learning in SSP domains using reinforcement learning (RL) where fire is the agent at any cell in the landscape and the set of actions the fire can take from a location at any point in time includes spreading north, south, east, or west or not spreading. This approach inverts the usual RL setup since the dynamics of the corresponding Markov Decision Process (MDP) is a known function for immediate wildfire spread.
Meanwhile, we learn an agent policy for a predictive model of the dynamics of a complex spatial process. Rewards are provided for correctly classifying which cells are on fire or not compared with satellite and other related data. We examine the behavior of five RL algorithms on this problem: value iteration, policy iteration, Q-learning, Monte Carlo Tree Search, and Asynchronous Advantage Actor-Critic (A3C). We compare to a Gaussian process-based supervised learning approach and also discuss the relation of our approach to manually constructed, state-of-the-art methods from forest wildfire modeling. We validate our approach with satellite image data of two massive wildfire events in Northern Alberta, Canada; the Fort McMurray fire of 2016 and the Richardson fire of 2011. The results show that we can learn predictive, agent-based policies as models of spatial dynamics using RL on readily available satellite images that other methods and have many additional advantages in terms of generalizability and interpretability.
Item BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv, 2018-10-11)
Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful.
It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
Item A Style-Based Generator Architecture for Generative Adversarial Networks (arXiv, 2018-12-05)
Karras, Tero; Laine, Samuli; Lehtinen, Jaakko; Aila, Timo
We propose an alternative generator architecture for generative adversarial networks, borrowing from style transfer literature. The new architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific control of the synthesis. The new generator improves the state-of-the-art in terms of traditional distribution quality metrics, leads to demonstrably better interpolation properties, and also better disentangles the latent factors of variation. To quantify interpolation quality and disentanglement, we propose two new, automated methods that are applicable to any generator architecture. Finally, we introduce a new, highly varied and high-quality dataset of human faces.
Item Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (arXiv, 2019-10-19)
Raffel, Colin; Shazeer, Noam; Roberts, Adam; Lee, Katherine; Narang, Sharan; Matena, Michael; Zhou, Yanqi; Li, Wei; Liu, Peter J.
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice.
In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
Item Artificial Intelligence — The Revolution Hasn’t Happened Yet (Harvard Data Science Review, MIT Press, 2019-11)
Jordan, Michael I.
Warns that inflated expectations of artificial intelligence and technology-centered language can actually hinder social progress. The author argues that a true AI revolution begins not with technology but with human-centered design, policy design, and the building of social trust, and calls for careful, ethical integration of AI.
Item Artificial Intelligence (AI) and the Legal Profession: Ethical and Regulatory Considerations (경제규제와 법, 2019-11)
Webley, Lisa; 권헌영 (trans.)
Legal and other professional services are entering the era of the fourth industrial revolution, and some predict that fundamental technological transformation will cause massive disruption to lawyers, legal services, and the legal system. Artificial intelligence (AI) has not yet seriously shaken the professional services market, but with the products and services of global law firms and accounting firms becoming increasingly commoditized, change in this industry appears to be not far off. This affects how domestic and cross-border financial systems operate and may ultimately have enormous repercussions for society as a whole.
AI raises fundamental questions about the scope, limits, and adequacy of the current legal and regulatory frameworks governing its use. Professionals, businesses, consumers, governments, and states face ethical challenges and risks. This powerful technology could be used to increase productivity and wealth while reducing social inequality and improving access to justice. However, it could also be used to invade privacy or to let states or corporations surveil individuals, and it could become a tool of authoritarian control and deepening social inequality. In this new situation, lawyers can, and should, take the lead in upholding the rule of law, the protection of rights, good governance, and legitimate decision-making.
This paper examines how AI systems are used in legal services and the justice system. It suggests how lawyers and others can use AI technology when providing legal advice to clients. It also discusses some regulatory and ethical challenges that have not yet been addressed, such as the powerlessness of jurisdiction in the virtual age and the great power held by those who access big data to develop AI in legal contexts.
The paper also briefly covers topics such as transparency of decision-making, data quality and inferences drawn from data, professional accountability and ethics, legal liability, social acceptance, and public trust. It does so by discussing the possible effects of these changes on our legal system and how the rule of law may be strengthened or weakened depending on the regulatory approach taken to AI technology and its use.
Item FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence (arXiv, 2020-01-21)
Sohn, Kihyuk; Berthelot, David; Li, Chun-Liang; Zhang, Zizhao; Carlini, Nicholas; Cubuk, Ekin D.; Kurakin, Alex; Zhang, Han; Raffel, Colin
Semi-supervised learning (SSL) provides an effective means of leveraging unlabeled data to improve a model's performance. In this paper, we demonstrate the power of a simple combination of two common SSL methods: consistency regularization and pseudo-labeling. Our algorithm, FixMatch, first generates pseudo-labels using the model's predictions on weakly-augmented unlabeled images. For a given image, the pseudo-label is only retained if the model produces a high-confidence prediction. The model is then trained to predict the pseudo-label when fed a strongly-augmented version of the same image. Despite its simplicity, we show that FixMatch achieves state-of-the-art performance across a variety of standard semi-supervised learning benchmarks, including 94.93% accuracy on CIFAR-10 with 250 labels and 88.61% accuracy with 40, just 4 labels per class. Since FixMatch bears many similarities to existing SSL methods that achieve worse performance, we carry out an extensive ablation study to tease apart the experimental factors that are most important to FixMatch's success. We make our code available at this https URL.
Item Scaling Laws for Neural Language Models (arXiv, 2020-01-23)
Kaplan, Jared; McCandlish, Sam; Henighan, Tom; Brown, Tom B.; Chess, Benjamin; Child, Rewon; Gray, Scott; Radford, Alec; Wu, Jeffrey; Amodei, Dario
We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude.
Other architectural details such as network width or depth have minimal effects within a wide range. Simple equations govern the dependence of overfitting on model/dataset size and the dependence of training speed on model size. These relationships allow us to determine the optimal allocation of a fixed compute budget. Larger models are significantly more sample-efficient, such that optimally compute-efficient training involves training very large models on a relatively modest amount of data and stopping significantly before convergence.
Item Towards a Human-like Open-Domain Chatbot (arXiv, 2020-01-27)
Adiwardana, Daniel; Luong, Minh-Thang; So, David R.; Hall, Jamie; Fiedel, Noah; Thoppilan, Romal; Yang, Zi; Kulshreshtha, Apoorv; Nemade, Gaurav; Lu, Yifeng; Le, Quoc V.
We present Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations. This 2.6B parameter neural network is simply trained to minimize perplexity of the next token. We also propose a human evaluation metric called Sensibleness and Specificity Average (SSA), which captures key elements of a human-like multi-turn conversation. Our experiments show strong correlation between perplexity and SSA. The fact that the best perplexity end-to-end trained Meena scores high on SSA (72% on multi-turn evaluation) suggests that a human-level SSA of 86% is potentially within reach if we can better optimize perplexity.
Additionally, the full version of Meena (with a filtering mechanism and tuned decoding) scores 79% SSA, 23% higher in absolute SSA than the existing chatbots we evaluated.
Item An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale (arXiv, 2020-10-22)
Dosovitskiy, Alexey; Beyer, Lucas; Kolesnikov, Alexander; Weissenborn, Dirk; Zhai, Xiaohua; Unterthiner, Thomas; Dehghani, Mostafa; Minderer, Matthias; Heigold, Georg; Gelly, Sylvain; Uszkoreit, Jakob; Houlsby, Neil
While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
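As a rough illustration of the "sequences of image patches" idea in the Vision Transformer abstract above, the following Python sketch splits a single-channel image into non-overlapping square patches and flattens each one into a vector. It is not the authors' implementation: the function name image_to_patches and the nested-list image format are assumptions made for this example, and everything that happens to the patches afterwards in a real model is left out.

```python
def image_to_patches(image, patch_size):
    """Split a 2D image (list of rows) into flattened square patches.

    Hypothetical helper for illustration only. Patches are returned in
    row-major order: left to right, then top to bottom.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size block into a single list.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# A 224x224 single-channel image cut into 16x16 patches gives a
# sequence of (224 / 16) ** 2 = 196 patches of 256 values each.
image = [[0] * 224 for _ in range(224)]
sequence = image_to_patches(image, 16)
print(len(sequence), len(sequence[0]))  # prints "196 256"
```

The resulting list plays the role that a sequence of word tokens plays in a text model, which is the analogy the title of the paper draws on.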