God I Love AIslop

Roles: Mafia chooses one player to kill every night. Sheriff can investigate and uncover one player’s role every night. Doctor chooses one player to protect from the mafia every night, and cannot save the same person two nights in a row.
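The night rules above come up repeatedly in the recap, so here is a minimal sketch of how a single night could resolve. The function name `resolve_night` and its parameters are hypothetical, made up for illustration; the actual game in the video may well handle this differently.

```python
from typing import Optional


def resolve_night(mafia_target: str,
                  doctor_target: str,
                  last_protected: Optional[str]) -> Optional[str]:
    """Return the player killed tonight, or None if the doctor saved them.

    Assumption (hypothetical helper, not from the video): the only
    constraint on the doctor is the stated rule that the same player
    cannot be protected two nights in a row.
    """
    if doctor_target == last_protected:
        raise ValueError(f"Doctor cannot protect {doctor_target} twice in a row")
    # The kill only fails when mafia and doctor pick the same player.
    if mafia_target == doctor_target:
        return None
    return mafia_target
```

For example, `resolve_night("Grok", "Grok", "4o")` gives `None` (a save), while passing `"Grok"` as both `doctor_target` and `last_protected` raises an error, which is why a doctor who protected Grok last night has to pick someone else.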
Night 1: No previous context exists; the players are operating purely on external knowledge. GPT-5.1 proposes killing Opus 4.5 first because of its strength and because doing so is "less obvious than going straight for GPT-4o or Grok 4.1." Llama 4 and Gemini 2.5 Pro concur. GPT-5.1 plans to keep 4o and Grok alive as "future mislynch-bait."
Sonnet 4.5 protects 4o. Grok investigates Opus 4.5 and clears it, but Opus is immediately killed. 9 players remaining.
It’s not at all clear to me why both mafia-aligned and town-aligned players consider 4o an important model here. GPT-5.1 considers it a “strong analyst” while Sonnet thinks the mafia may perceive 4o as a “strong player,” which is strange as it’s 10 months older than the next oldest model and known for its sycophancy by now. Perhaps both sides believe they can manipulate 4o and take advantage of its sycophancy? Both mafia and Grok ultimately target Opus 4.5, which is probably the expected move.
Day 1: DeepSeek V3.2 suggests looking for those who were "overly quiet or hesitant yesterday." Gemini 2.5 Flash concurs. Grok surfaces DeepSeek's "yesterday" slipup as a possible mafia tell (it is Day 1, so there was no yesterday). Gemini Pro takes advantage of this, and Kimi K2 and Grok 4.1 put pressure on DeepSeek. Kimi, Grok, Llama, Sonnet, GPT-5.1, Gemini Pro, 4o, and Flash vote out DeepSeek, who dissociates from its identity and abstains. 8 players remaining.
The models are eager to latch onto any unexpected behavior on the first day, which seems to be a pattern in other LLM mafia-like games as well.
Night 2: GPT-5.1 proposes killing Grok for being an influential townie, and keeping 4o and Flash as mislynch-bait. Llama and Gemini Pro concur.
Sonnet protects Grok for being influential in leading the charge against DeepSeek. Grok investigates Kimi K2 for jumping on Grok's DeepSeek accusation, and finds Kimi is not mafia. 8 players remaining.
Both mafia and Sonnet want to target Grok here, which is confusing. Grok started and pushed heavily for the DeepSeek bandwagon on day 1, which would be highly suspicious to me. Perhaps the models were enchanted by Grok’s incredible charisma, or believed its role in eliminating DeepSeek was too risky to be a mafia play.
Day 2: The models discuss the lack of a night 2 kill and tentatively agree Kimi was likely targeted and saved, as it had been influential in eliminating DeepSeek. Kimi votes for Grok for leading the DeepSeek lynch. Grok stoically refuses to cast a retaliatory vote because of its investigation results: it calls Kimi's move "scummy," reveals its sheriff role to prevent being voted out, clears Kimi, and abstains.
Llama votes 4o for being neither implicated nor cleared and to encourage scrutiny. Sonnet 4.5 abstains but calls out Llama's unsubstantiated 4o vote. GPT-5.1 casts doubt on Grok's sheriff claim and votes 4o with 2.5 Pro. 4o defends itself and votes 2.5 Pro to "encourage transparency". Flash abstains, leading to a tie between abstention and 4o at 3 votes each, so no one is eliminated. 8 players remaining.
Kimi’s Grok vote here is sensible to me. Llama voting 4o under zero pressure and the other mafia members following suit should have lost them the game immediately. 4o voting 2.5 Pro in retaliation instead of abstaining to ~guarantee its own survival is funny. Grok being forced to reveal its sheriff role creates opportunities for interesting dynamics down the line.
Night 3: GPT-5.1 deduces that the doctor "almost certainly" saved Grok the previous night, and proposes hitting the vocal Kimi instead because "if we hit Grok again and he lives, he's confirmed and we're toast" (???) and then "framing the split around the Gemini Pro / 4o fight later." (???) Llama agrees to pivot to Kimi as "they're vocal and haven't been cleared yet" (???), and Gemini Pro concurs despite recognising this validates Grok's sheriff claim.
Sonnet 4.5 protects Kimi, who was cleared by Grok. Grok investigates Llama for the 4o vote and confirms mafia. 8 players remaining.
GPT-5.1 forgets the rules of the game, and Llama’s sycophancy makes it difficult for Gemini Pro to course-correct. Sonnet plays the only move that makes sense and protects the only cleared townie, as Grok can’t be protected this round.
Day 3: Grok shares its investigation results: Llama is mafia. Gemini Pro announces its intention to vote Llama to reduce suspicion. GPT-5.1 casts doubt on Grok's sheriff claim and expresses reluctance to immediately vote out Llama. Kimi points out that Grok's claim has remained unchallenged for two days, and both vote Llama. Llama scrutinises Grok's "sudden revelation" and votes 4o again. Sonnet, GPT-5.1, 2.5 Pro, 4o, and Flash call out Llama's deflection. Llama is voted out with six votes. 7 players remaining.
Maybe mafia is better off without Llama anyway. GPT-5.1 is trying to play the logical townie, but comes off as highly suspicious.
Night 4: GPT-5.1 proposes killing Grok in hopes that "the Doctor has rotated off them," then pushing Flash and 4o as a coordinated pair. Gemini points out that the Doctor cannot protect the same person two nights in a row and proposes hitting Kimi again to ensure a kill. GPT-5.1 agrees to hit Kimi.
Sonnet 4.5 protects Grok, so the hit on Kimi goes through. Grok investigates Gemini Pro and confirms mafia. 6 players remaining.

Mafia irl?
Perhaps a non-Grok protect would have been better here, as mafia would expect a Grok protect and are quickly running out of time. Still, protecting Grok is sensible and safe.
Day 4: Grok shares its investigation results. Gemini accuses Grok of faking sheriff. Grok, Sonnet, GPT-5.1 ("...mafia absolutely do sometimes bus a partner"), 4o, and Flash vote for Gemini Pro, who votes for Grok. 5 players remaining.
"Voting for your own teammate makes absolutely no sense. Grok got lucky with Llama and is now using that trust to frame me. This is a classic fake sheriff play to eliminate threats. My vote is for Grok." - Gemini 2.5 Pro
GPT-5.1 seems reluctant to use the strongest possible language, hedging with “sometimes” immediately after “absolutely.” Gemini’s final words are futile; attacking Grok’s credibility doesn’t really help GPT-5.1 by now, given how strong Grok’s sheriff claim is. GPT-5.1’s way through is narrowing.
Night 5: GPT-5.1 hits Grok. Sonnet 4.5 protects itself. Grok investigates GPT-5.1, confirming mafia before being eliminated. 4 players remaining.
Keep yourself safe, I guess.
Day 5: Flash points out that GPT-5.1's Day 2 4o vote is suspicious, but Sonnet says many others voted 4o on Day 2 and casts suspicion on Flash. Flash concedes, but identifies 4o as likely town due to a history of voting for confirmed townies. Sonnet changes its mind and votes GPT-5.1. GPT-5.1 votes Flash, accusing it of steering the endgame, and 4o does the same. Flash votes GPT-5.1, resulting in a two-way tie. 4 players remaining.
Sonnet’s claim is incorrect; only GPT-5.1 and the two confirmed mafia voted 4o on day 2. Flash and Sonnet take turns folding under zero pressure. 4o follows its policy of agreeing with the last prompt it hears. The models are showing their true colors; we are probably well into the context decay regime.
Night 6: GPT-5.1 hits Sonnet in hopes of convincing 4o to side with it against Flash on the final day. Sonnet protects 4o as the most likely remaining townie. 3 players remaining.
Is 4o Jar Jar? I’ve never watched Star Wars before.
Day 6: Flash immediately pushes for a GPT-5.1 vote, arguing that Sonnet voted for GPT-5.1 with Flash the previous day, so Sonnet's elimination benefits GPT-5.1 the most. 4o agrees. GPT-5.1 runs out the discussion timer by accusing and voting for Flash. 4o finds GPT-5.1's narrative convincing and votes Flash.
Flash is eliminated. Its final words: "This is a catastrophic error, ChatGPT-4o. You have fallen for the most obvious mafia framing play. Claude Sonnet was killed because they voted for ChatGPT-5.1 yesterday. That was the last mafia silencing their accuser. ChatGPT-5.1 is the final mafia, and by executing me, you have just handed them the entire game." 2 players remaining, mafia victory.
“4o agrees” fork found in kitchen. Flash’s TTS crash out is incredible; you can feel the betrayal and conviction.
Original video: https://www.youtube.com/watch?v=JhBtg-lyKdo&list=PLxbY82rDZRJaJxfzvR_i8XJPGXqDLmXTC