From beff9e877852eb947650cb7e02bb763e7fe734a8 Mon Sep 17 00:00:00 2001
From: Nemo
Date: Tue, 11 Jan 2022 13:24:05 +0530
Subject: [PATCH] new paper on Hanabi

Closes #9
---
 README.md              |   3 +-
 boardgame-research.rdf | 116 +++++++++++++++++++++++++++++++++++------
 2 files changed, 103 insertions(+), 16 deletions(-)

diff --git a/README.md b/README.md
index 2746e52..1daf14c 100644
--- a/README.md
+++ b/README.md
@@ -160,9 +160,10 @@ If you aren't able to access any paper on this list, please [try using Sci-Hub](
 - [Playing mini-Hanabi card game with Q-learning](http://id.nii.ac.jp/1001/00205046/) (conferencePaper)
 - [Hanabi Open Agent Dataset](https://github.com/aronsar/hoad) (computerProgram)
 - [Hanabi Open Agent Dataset](https://dl.acm.org/doi/10.5555/3463952.3464188) (conferencePaper)
-- [Evaluation of Human-AI Teams for Learned and Rule-Based Agents in Hanabi](http://arxiv.org/abs/2107.07630) (journalArticle)
+- [Evaluation of Human-AI Teams for Learned and Rule-Based Agents in Hanabi](https://arxiv.org/abs/2107.07630) (journalArticle)
 - [A Graphical User Interface For The Hanabi Challenge Benchmark](http://oru.diva-portal.org/smash/record.jsf?pid=diva2%3A1597503) (thesis)
 - [Emergence of Cooperative Impression With Self-Estimation, Thinking Time, and Concordance of Risk Sensitivity in Playing Hanabi](https://www.frontiersin.org/articles/10.3389/frobt.2021.658348/full) (journalArticle)
+- [K-level Reasoning for Zero-Shot Coordination in Hanabi](https://papers.neurips.cc/paper/2021/hash/4547dff5fd7604f18c8ee32cf3da41d7-Abstract.html) (conferencePaper)
 
 # Hearthstone
 - [Mapping Hearthstone Deck Spaces through MAP-Elites with Sliding Boundaries](http://arxiv.org/abs/1904.10656) (journalArticle)
diff --git a/boardgame-research.rdf b/boardgame-research.rdf
index fd02506..85c0f41 100644
--- a/boardgame-research.rdf
+++ b/boardgame-research.rdf
@@ -7000,7 +7000,7 @@ Publisher: Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik GmbH, Wadern/Saarb
 -
 +
journalArticle arXiv:2107.07630 [cs] @@ -7057,6 +7057,7 @@ Publisher: Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik GmbH, Wadern/Saarb + @@ -7075,12 +7076,23 @@ Publisher: Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik GmbH, Wadern/Saarb arXiv.org - http://arxiv.org/abs/2107.07630 + https://arxiv.org/abs/2107.07630 2021-07-24 06:30:44 arXiv: 2107.07630 + + attachment + 86e8f7ab32cfd12577bc2619bc635690-Paper.pdf + + + https://papers.neurips.cc/paper/2021/file/86e8f7ab32cfd12577bc2619bc635690-Paper.pdf + + + 2022-01-11 07:50:59 + 3 + attachment arXiv Fulltext PDF @@ -8179,8 +8191,8 @@ guaranteed decent high score. The algorithm got a lowest score of 79 and a - + Analysis of 'The Settlers of Catan' Using Markov Chains Markov chains are stochastic models characterized by the probability of future states depending solely on one's current state. Google's page ranking system, financial phenomena such as stock market crashes, and algorithms to predict a company's projected sales are a glimpse into the array of applications for Markov models. Board games such as Monopoly and Risk have also been studied under the lens of Markov decision processes. In this research, we analyzed the board game "The Settlers of Catan" using transition matrices. Transition matrices are composed of the current states which represent each row i and the proceeding states across the columns j with the entry (i,j) containing the probability the current state i will transition to the state j. Using these transition matrices, we delved into addressing the question of which starting positions are optimal. Furthermore, we worked on determining optimality in conjunction with a player's gameplay strategy. After building a simulation of the game in python, we tested the results of our theoretical research against the mock run throughs to observe how well our model prevailed under the limitations of time (number of turns before winner is reached). 
May 3, 2021 @@ -8192,17 +8204,6 @@ guaranteed decent high score. The algorithm got a lowest score of 79 and a 53 - - attachment - Nagel__Lauren-Honors_Project.pdf - - - https://repository.tcu.edu/bitstream/handle/116099117/49062/Nagel__Lauren-Honors_Project.pdf?sequence=1&isAllowed=y - - - 2021-12-19 11:15:50 - 3 - attachment Full Text @@ -8215,6 +8216,17 @@ guaranteed decent high score. The algorithm got a lowest score of 79 and a 1 application/pdf + + attachment + Nagel__Lauren-Honors_Project.pdf + + + https://repository.tcu.edu/bitstream/handle/116099117/49062/Nagel__Lauren-Honors_Project.pdf?sequence=1&isAllowed=y + + + 2021-12-19 11:15:50 + 3 + journalArticle @@ -8783,6 +8795,78 @@ guaranteed decent high score. The algorithm got a lowest score of 79 and a 1 text/html + + conferencePaper + + + + + + conferencePaper + + + + + + + + Brandon Cui + + + + + Hengyuan Hu + + + + + Luis Pineda + + + + + Jakob Foerster + + + + + + + K-level Reasoning for Zero-Shot Coordination in Hanabi + The standard problem setting in cooperative multi-agent settings is \emph{self-play} (SP), where the goal is to train a \emph{team} of agents that works well together. However, optimal SP policies commonly contain arbitrary conventions (``handshakes'') and are not compatible with other, independently trained agents or humans. This latter desiderata was recently formalized by \cite{Hu2020-OtherPlay} as the \emph{zero-shot coordination} (ZSC) setting and partially addressed with their \emph{Other-Play} (OP) algorithm, which showed improved ZSC and human-AI performance in the card game Hanabi. OP assumes access to the symmetries of the environment and prevents agents from breaking these in a mutually \emph{incompatible} way during training. However, as the authors point out, discovering symmetries for a given environment is a computationally hard problem. 
Instead, we show that through a simple adaption of k-level reasoning (KLR) \cite{Costa-Gomes2006-K-level}, synchronously training all levels, we can obtain competitive ZSC and ad-hoc teamplay performance in Hanabi, including when paired with a human-like proxy bot. We also introduce a new method, synchronous-k-level reasoning with a best response (SyKLRBR), which further improves performance on our synchronous KLR by co-training a best response. + + + https://papers.neurips.cc/paper/2021/hash/4547dff5fd7604f18c8ee32cf3da41d7-Abstract.html + + + + + Advances in Neural Information Processing Systems 34 pre-proceedings (NeurIPS 2021) + + + + + attachment + Paper + + + https://papers.neurips.cc/paper/2021/file/4547dff5fd7604f18c8ee32cf3da41d7-Paper.pdf + + + 2022-01-11 07:52:40 + 3 + + + attachment + Supplemental + + + https://papers.neurips.cc/paper/2021/file/4547dff5fd7604f18c8ee32cf3da41d7-Supplemental.pdf + + + 2022-01-11 07:52:49 + 3 + 2048 @@ -8886,9 +8970,11 @@ guaranteed decent high score. The algorithm got a lowest score of 79 and a - + + + Hearthstone
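The Catan abstract quoted in this patch describes its core tool, the transition matrix: entry (i, j) holds the probability that current state i transitions to state j. A minimal sketch of that idea, assuming a toy three-state chain (the states and probabilities below are illustrative only, not taken from the paper):

```python
# Toy transition matrix in the sense of the quoted Catan abstract:
# P[i][j] = probability that state i transitions to state j.
# A hypothetical 3-state chain; numbers are made up for illustration.
P = [
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
]

# Every row must be a probability distribution over successor states.
assert all(abs(sum(row) - 1.0) < 1e-9 for row in P)

def step(dist, P):
    """Advance a distribution over states by one transition."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

dist = [1.0, 0.0, 0.0]   # start with certainty in state 0
for _ in range(2):       # two transitions later
    dist = step(dist, P)
print([round(x, 4) for x in dist])  # -> [0.32, 0.37, 0.31]
```

Iterating `step` is the Markov-chain computation the abstract relies on: multiplying an initial distribution by the matrix repeatedly gives the state distribution after n turns, which is how one can compare starting positions over a fixed turn budget.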