New Paper on Hanabi

Behavioral Differences is the Key of Ad-hoc Team Cooperation in Multiplayer Games Hanabi https://arxiv.org/abs/2303.06775
New Hanabi PDF
2023-03-16 12:27:37 +05:30 · 2023-03-11 09:20:29 +05:30
2 changed files with 120 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -173,6 +173,8 @@ If you aren't able to access any paper on this list, please [try using Sci-Hub](
 - [The Hanabi challenge: From Artificial Teams to Mixed Human-Machine Teams](http://oru.diva-portal.org/smash/record.jsf?pid=diva2%3A1691114&dswid=-1981) (thesis)
 - [A Graphical User Interface For The Hanabi Challenge Benchmark](http://urn.kb.se/resolve?urn=urn:nbn:se:oru:diva-94615) (thesis)
 - [Analysis of Symmetry and Conventions in Off-Belief Learning (OBL) in Hanabi](https://fanpu.io/blog/2022/symmetry-and-conventions-in-obl-hanabi/) (blogPost)
+- [Using intuitive behavior models to adapt to and work with human teammates in Hanabi](http://reports-archive.adm.cs.cmu.edu/anon/anon/usr0/ftp/usr/ftp/2022/abstracts/22-119.html) (thesis)
+- [Behavioral Differences is the Key of Ad-hoc Team Cooperation in Multiplayer Games Hanabi](http://arxiv.org/abs/2303.06775) (preprint)

 # Hearthstone
 - [Mapping Hearthstone Deck Spaces through MAP-Elites with Sliding Boundaries](http://arxiv.org/abs/1904.10656) (journalArticle)
--- a/boardgame-research.rdf
+++ b/boardgame-research.rdf
@@ -9328,6 +9328,122 @@ guaranteed decent high score. The algorithm got a lowest score of 79 and a
        <dcterms:dateSubmitted>2023-03-01 11:40:51</dcterms:dateSubmitted>
        <z:linkMode>3</z:linkMode>
    </z:Attachment>
+    <bib:Thesis rdf:about="http://reports-archive.adm.cs.cmu.edu/anon/anon/usr0/ftp/usr/ftp/2022/abstracts/22-119.html">
+        <z:itemType>thesis</z:itemType>
+        <dc:publisher>
+            <foaf:Organization>
+                <foaf:name>Computer Science Department School of Computer Science, Carnegie Mellon University</foaf:name>
+            </foaf:Organization>
+        </dc:publisher>
+        <bib:authors>
+            <rdf:Seq>
+                <rdf:li>
+                    <foaf:Person>
+                       <foaf:surname>Arnav Mahajan</foaf:surname>
+                    </foaf:Person>
+                </rdf:li>
+            </rdf:Seq>
+        </bib:authors>
+        <link:link rdf:resource="#item_596"/>
+        <dc:title>Using intuitive behavior models to adapt to and work with human teammates in Hanabi</dc:title>
+        <dcterms:abstract>An agent that can rapidly and accurately model its teammate is a powerful tool in the field of Collaborative AI. Furthermore, if an approximation for this goal was possible in the field of Human-AI Collaboration, teams of people and machines could be more efficient and effective immediately after starting to work together. Using the cooperative card game Hanabi as a testbed, we developed the Chief agent, which models teammates using a pool of intuitive behavioral models. To achieve the goal of rapid learning, it uses Bayesian inference to quickly evaluate the different models relative to each other. To generate an accurate model, it uses historical data augmented by up-to-date knowledge and sampling methods to handle environmental noise and unknowns. We demonstrate that the Chief's mechanisms for modeling and understanding the teammate show promise, but the overall performance still can use improvement to reliably outperform a solution which skips inferring a best strategy and assumes all strategies in the pool are equally likely for the teammate.</dcterms:abstract>
+        <z:language>en</z:language>
+        <dc:identifier>
+            <dcterms:URI>
+                <rdf:value>http://reports-archive.adm.cs.cmu.edu/anon/anon/usr0/ftp/usr/ftp/2022/abstracts/22-119.html</rdf:value>
+            </dcterms:URI>
+        </dc:identifier>
+        <z:numPages>43</z:numPages>
+        <z:type>M.S. Thesis</z:type>
+    </bib:Thesis>
+    <z:Attachment rdf:about="#item_596">
+        <z:itemType>attachment</z:itemType>
+        <dc:title>CMU-CS-22-119.pdf</dc:title>
+        <dc:identifier>
+            <dcterms:URI>
+                <rdf:value>http://reports-archive.adm.cs.cmu.edu/anon/2022/CMU-CS-22-119.pdf</rdf:value>
+            </dcterms:URI>
+        </dc:identifier>
+        <dcterms:dateSubmitted>2023-03-11 03:49:56</dcterms:dateSubmitted>
+        <z:linkMode>3</z:linkMode>
+    </z:Attachment>
+    <rdf:Description rdf:about="http://arxiv.org/abs/2303.06775">
+        <z:itemType>preprint</z:itemType>
+        <dc:publisher>
+           <foaf:Organization><foaf:name>arXiv</foaf:name></foaf:Organization>
+        </dc:publisher>
+        <bib:authors>
+            <rdf:Seq>
+                <rdf:li>
+                    <foaf:Person>
+                        <foaf:surname>Jeon</foaf:surname>
+                        <foaf:givenName>Hyeonchang</foaf:givenName>
+                    </foaf:Person>
+                </rdf:li>
+                <rdf:li>
+                    <foaf:Person>
+                        <foaf:surname>Kim</foaf:surname>
+                        <foaf:givenName>Kyung-Joong</foaf:givenName>
+                    </foaf:Person>
+                </rdf:li>
+            </rdf:Seq>
+        </bib:authors>
+        <link:link rdf:resource="#item_600"/>
+        <link:link rdf:resource="#item_601"/>
+        <link:link rdf:resource="#item_602"/>
+        <dc:subject>
+            <z:AutomaticTag>
+               <rdf:value>Computer Science - Artificial Intelligence</rdf:value>
+            </z:AutomaticTag>
+        </dc:subject>
+        <dc:title>Behavioral Differences is the Key of Ad-hoc Team Cooperation in Multiplayer Games Hanabi</dc:title>
+        <dcterms:abstract>Ad-hoc team cooperation is the problem of cooperating with other players that have not been seen in the learning process. Recently, this problem has been considered in the context of Hanabi, which requires cooperation without explicit communication with the other players. While in self-play strategies cooperating on reinforcement learning (RL) process has shown success, there is the problem of failing to cooperate with other unseen agents after the initial learning is completed. In this paper, we categorize the results of ad-hoc team cooperation into Failure, Success, and Synergy and analyze the associated failures. First, we confirm that agents learning via RL converge to one strategy each, but not necessarily the same strategy and that these agents can deploy different strategies even though they utilize the same hyperparameters. Second, we confirm that the larger the behavioral difference, the more pronounced the failure of ad-hoc team cooperation, as demonstrated using hierarchical clustering and Pearson correlation. We confirm that such agents are grouped into distinctly different groups through hierarchical clustering, such that the correlation between behavioral differences and ad-hoc team performance is -0.978. Our results improve understanding of key factors to form successful ad-hoc team cooperation in multi-player games.</dcterms:abstract>
+        <dc:date>2023-03-12</dc:date>
+        <z:libraryCatalog>arXiv.org</z:libraryCatalog>
+        <dc:identifier>
+            <dcterms:URI>
+               <rdf:value>http://arxiv.org/abs/2303.06775</rdf:value>
+            </dcterms:URI>
+        </dc:identifier>
+        <dcterms:dateSubmitted>2023-03-16 06:56:29</dcterms:dateSubmitted>
+        <dc:description>arXiv:2303.06775 [cs]</dc:description>
+        <prism:number>arXiv:2303.06775</prism:number>
+    </rdf:Description>
+    <z:Attachment rdf:about="#item_600">
+        <z:itemType>attachment</z:itemType>
+        <dc:title>2303.06775.pdf</dc:title>
+        <dc:identifier>
+            <dcterms:URI>
+               <rdf:value>https://arxiv.org/pdf/2303.06775.pdf</rdf:value>
+            </dcterms:URI>
+        </dc:identifier>
+        <dcterms:dateSubmitted>2023-03-16 06:56:59</dcterms:dateSubmitted>
+        <z:linkMode>3</z:linkMode>
+    </z:Attachment>
+    <z:Attachment rdf:about="#item_601">
+        <z:itemType>attachment</z:itemType>
+        <dc:title>arXiv Fulltext PDF</dc:title>
+        <dc:identifier>
+            <dcterms:URI>
+               <rdf:value>https://arxiv.org/pdf/2303.06775.pdf</rdf:value>
+            </dcterms:URI>
+        </dc:identifier>
+        <dcterms:dateSubmitted>2023-03-16 06:57:01</dcterms:dateSubmitted>
+        <z:linkMode>1</z:linkMode>
+        <link:type>application/pdf</link:type>
+    </z:Attachment>
+    <z:Attachment rdf:about="#item_602">
+        <z:itemType>attachment</z:itemType>
+        <dc:title>arXiv.org Snapshot</dc:title>
+        <dc:identifier>
+            <dcterms:URI>
+               <rdf:value>https://arxiv.org/abs/2303.06775</rdf:value>
+            </dcterms:URI>
+        </dc:identifier>
+        <dcterms:dateSubmitted>2023-03-16 06:57:08</dcterms:dateSubmitted>
+        <z:linkMode>1</z:linkMode>
+        <link:type>text/html</link:type>
+    </z:Attachment>
    <z:Collection rdf:about="#collection_6">
        <dc:title>2048</dc:title>
        <dcterms:hasPart rdf:resource="https://doi.org/10.1007%2F978-3-319-50935-8_8"/>
@@ -9442,6 +9558,8 @@ guaranteed decent high score. The algorithm got a lowest score of 79 and a
        <dcterms:hasPart rdf:resource="http://oru.diva-portal.org/smash/record.jsf?pid=diva2%3A1691114&amp;dswid=-1981"/>
        <dcterms:hasPart rdf:resource="http://urn.kb.se/resolve?urn=urn:nbn:se:oru:diva-94615"/>
        <dcterms:hasPart rdf:resource="https://fanpu.io/blog/2022/symmetry-and-conventions-in-obl-hanabi/"/>
+        <dcterms:hasPart rdf:resource="http://reports-archive.adm.cs.cmu.edu/anon/anon/usr0/ftp/usr/ftp/2022/abstracts/22-119.html"/>
+        <dcterms:hasPart rdf:resource="http://arxiv.org/abs/2303.06775"/>
    </z:Collection>
    <z:Collection rdf:about="#collection_55">
        <dc:title>Hearthstone</dc:title>
Author	SHA1	Message	Date
Nemo	f5397c0648	New Paper on Hanabi Behavioral Differences is the Key of Ad-hoc Team Cooperation in Multiplayer Games Hanabi https://arxiv.org/abs/2303.06775	2023-03-16 12:27:37 +05:30
Nemo	f26a1e525e	New Hanabi PDF	2023-03-11 09:20:29 +05:30