Frustrating statistics

My experience is that the statistics in this game can be frustrating, as outcomes can feel completely different from what’s listed.

  1. Rare Gold Egg – should be a 44% chance of getting an L or M, and a 56% chance of an SE. My hunch from years of use is that the chance of success is quite a bit lower than this. My recent try was ~15 hatches, where I received 1 M, 0 L, and 14 SE. This is statistically significant, in the most negative of ways! Here’s an example of part of the spin where I received 5 SEs in a row: Screen_Recording_20230831_081637_Neo Monsters.mp4 - Google Drive

  2. Gorgodrake’s passive Playing with Embers (13% chance of death) and the Devil’s Luck reduction to 8%. My hunch is that it is closer to a 30 to 60% chance of death, depending on the type of game. When fighting buffed monsters, it seems there is almost a 90% chance of death. My plan has been to record my findings to prove whether the listed stats are correct or not, but I haven’t had time yet. Either way, it is hugely frustrating when the odds of him surviving 3+ uses should be reasonably high (78%).
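That 78% figure is just compounded survival odds. A quick sketch (Python, assuming the advertised 8% per-use chance of death with Devil’s Luck active):

```python
# Chance Gorgodrake survives 3 consecutive uses, assuming the advertised
# 8% per-use probability of death (Playing with Embers with Devil's Luck).
p_death = 0.08
p_survive_3 = (1 - p_death) ** 3
print(f"{p_survive_3:.0%}")  # prints "78%"
```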

I’d personally classify these as bugs, but it isn’t as objective as I’d like it to be. Anyone else with similar experiences?

To be honest, we can’t trust the dev with numbers at all.
Too shady, and so many mistakes.

What shines through your post is your lack of understanding of probability. I suggest you go read about it and focus on the concept of sample size.


I have an awakened Gorgodrake and I’ve used him a lot, probably a couple thousand matches, and I’d say he kills himself off immediately in about 5% of matches. Most commonly he gets 3–4 rounds before expiring, and more often than he dies he just hangs around until someone kills him or the match is over.

From your post, you probably just drew the short end of the straw in a match and got salty.

Simmer, buddy. I’m bringing up my observations to see if it’s just an incorrect hunch or if others feel similarly. For what I have on video, there’s a 5.5% chance of that occurring. For 14/15, there’s a 0.03% chance. Fairly low odds of this scenario occurring – not unheard of, but it is recurring for me (i.e., with increased sample size).

For Gorgodrake, I specifically notice it in boss battles or against buffed monsters, not in PvP. I’m not salty about a single match – this is from ~2 years of observations with an awakened Gorgodrake. I will try to record observations and report back.

15 eggs is not enough to make any conclusions. You’ll have huge variation. In the next 15 hatches you could get 5 legendaries in a row etc…
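To illustrate the variation, here’s a quick simulation sketch (Python; the 44% per-hatch chance is from the posts above, and this is obviously not the game’s actual RNG):

```python
import random

# Simulate batches of 15 rare-egg hatches, each with an independent 44%
# chance of producing an L or M, to show how much batch-to-batch luck varies.
random.seed(1)
batches = [sum(random.random() < 0.44 for _ in range(15)) for _ in range(10)]
print(batches)  # number of L/M hits per 15-hatch batch
```

Running this a few times with different seeds shows batches swinging from very lucky to very unlucky, all from the same 44% rate.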

I don’t have Gorgodrake, but I think it’s negativity bias on your part.

Actually… (:point_up: :nerd_face:)

The chance of getting mythics and legendaries is applied to each rare egg, individually. “Fairly low odds of this scenario occurring”… Not really. I’d say the “luck” does its job here.

Not sure what maths you did to get 0.03%. I’ve tried different mistakes and can’t reproduce it.

If we calculate it with binomials it’s 0.56^14 * 0.44 * 15. The *15 is because there are 15 possible positions for this outcome (the 1st hatch being the non-SE, all the way to the 15th hatch being it).

That calculation gives 0.2% chance.
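For anyone who wants to check the arithmetic, a quick sketch (Python; the 56%/44% figures and 15 hatches are from the posts above):

```python
from math import comb

# Probability of exactly one non-SE (L or M) in 15 hatches, where each hatch
# is independently an SE with probability 0.56.
p_se = 0.56
n = 15
p = comb(n, 1) * (1 - p_se) * p_se ** (n - 1)
print(f"{p:.2%}")  # prints "0.20%"
```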

It’s very natural to question the numbers and feel that your own experience differs from them. However, that’s just because we find it hard to wrap our heads around percentages, and there’s a lot of bias we struggle to get past. When it comes to hatch rates in particular, it’s a legal requirement that they are exactly as displayed (to the best effort the company can make), so it shouldn’t be in question whether they are wrong… they are not.


Hello. I agree my previous findings were preliminary and sparked questioning given the low sample size. My apologies for being premature. I do not have more egg statistics but agree that it was very much a low-likelihood event getting what I did.

I believe Gorgodrake is different, though, based on multiple years of use. Below is a more detailed analysis of its statistics.

My primary objective was to test the advertised 13% probability of death (POD) for Playing with Embers and 8% POD for Playing with Sparks. Secondary objective was to assess the distribution of the number of turns before death to assess if it is linear or skewed.

My method was to create a team that protected Gorgodrake, allowing it to take turns that used seconds, and to record how many turns it had before automatic death from ‘Playing with Sparks’ or ‘Playing with Embers’. I played each 150 times, for over 1000 turns each, for a reasonable sample size. I primarily used Extreme L2 of the Super Challenge Battle, as my hunch is the stats are worse when fighting buffed monsters.

My findings suggest a POD of 13.6% for Playing with Sparks (5.6 percentage points higher than advertised – a substantial difference) and a POD of 14.6% for Playing with Embers (1.6 percentage points higher than advertised – a minimal difference). See all data below.

Also see the plot below for the distribution of all data collected. Dying within a few turns is more likely than surviving many turns. In most matches, a player is unlikely to use Gorgodrake more than ~5 times, so this distribution makes it feel like he dies far more often than he survives. It would feel better if the distribution were uniform, or perhaps binomial with the most likely value around 10 turns, but that might also make Gorgodrake feel too strong. This is ultimately up to the developers, of course.
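For anyone repeating the estimate: if each turn kills with a fixed probability p, turns-until-death is Geometric, and the natural estimate of p is deaths divided by total turns. A quick sketch (Python, using the first ten Playing with Sparks runs from the data table posted below):

```python
# Estimate the per-turn probability of death (POD) from turn counts.
# If each turn kills independently with probability p, the maximum-likelihood
# estimate of p is (number of runs) / (total turns across all runs).
turns_until_death = [10, 8, 16, 6, 5, 2, 1, 2, 5, 1]  # first ten Sparks runs
pod_estimate = len(turns_until_death) / sum(turns_until_death)
print(f"estimated POD: {pod_estimate:.1%}")  # prints "estimated POD: 17.9%"
```

With only ten runs the estimate bounces around a lot; over all 150 runs it settles near the 13.6% reported above.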

I hope the developers see this assessment and test it themselves. I trust there is no malintent in the listed percentages, but I would appreciate their looking into it based on these findings. I expect they have auto-play capability that could drastically increase the sample size for such an analysis. Thank you.


Nice analysis! If you still have the dataset with all test results, could you please share it?
Since the random variable here is “number of turns until death”, it should follow a Geometric distribution.
I’d like to run a χ² test on the data to estimate the goodness of fit of each variable (Sparks, Embers) with the respective Geometric distribution (one with p=.08 and the other with p=.13), to quantify how much each deviates from the theoretical distribution.
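For context, the χ² statistic I have in mind can be sketched like this (Python; the observed counts are hypothetical placeholders, and the right tail is folded into the last bin so the model probabilities sum to 1):

```python
# Chi-squared goodness-of-fit statistic against Geometric(p) on {1, 2, ...},
# with P(X >= n_bins) folded into the final bin.
def geometric_bin_probs(p, n_bins):
    probs = [p * (1 - p) ** (k - 1) for k in range(1, n_bins)]
    probs.append((1 - p) ** (n_bins - 1))  # right tail
    return probs

observed = [30, 22, 18, 15, 12, 10, 8, 35]  # hypothetical counts, 8 bins
total = sum(observed)
expected = [q * total for q in geometric_bin_probs(0.13, len(observed))]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 2))
```

The statistic would then be compared against a χ² distribution with (number of bins − 1) degrees of freedom to get a p-value.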

I’m keeping an eye on mine after seeing this post. It’s been 5 Rare Eggs so far, all SE.

Test number Playing with Sparks Playing with Embers
1 10 2
2 8 1
3 16 3
4 6 1
5 5 6
6 2 3
7 1 1
8 2 6
9 5 16
10 1 3
11 9 3
12 8 3
13 1 7
14 1 1
15 2 1
16 5 7
17 3 7
18 7 3
19 3 5
20 9 6
21 10 5
22 8 10
23 1 18
24 4 6
25 6 3
26 2 2
27 19 1
28 3 7
29 8 4
30 2 8
31 10 9
32 14 1
33 14 6
34 1 5
35 14 9
36 12 1
37 18 5
38 6 8
39 10 4
40 7 1
41 1 9
42 18 11
43 7 8
44 6 7
45 4 7
46 12 17
47 17 14
48 18 12
49 1 15
50 3 5
51 6 6
52 5 1
53 6 2
54 4 8
55 1 1
56 16 6
57 5 14
58 1 1
59 19 5
60 14 6
61 8 7
62 7 6
63 7 1
64 4 4
65 12 1
66 1 16
67 1 3
68 2 13
69 1 4
70 3 13
71 6 9
72 2 10
73 5 20
74 2 18
75 6 15
76 6 11
77 10 18
78 18 3
79 1 4
80 9 15
81 2 6
82 20 3
83 4 8
84 19 7
85 4 1
86 3 3
87 4 18
88 16 9
89 10 8
90 14 17
91 8 19
92 11 7
93 3 13
94 9 14
95 7 1
96 2 1
97 4 1
98 2 1
99 18 4
100 9 3
101 2 11
102 13 5
103 5 14
104 1 6
105 4 4
106 4 7
107 6 7
108 3 12
109 6 5
110 1 14
111 6 6
112 16 4
113 12 3
114 3 3
115 6 1
116 10 5
117 15 1
118 18 21
119 1 1
120 16 3
121 1 15
122 2 5
123 17 13
124 5 1
125 2 14
126 19 1
127 10 2
128 3 15
129 8 12
130 8 2
131 14 6
132 9 3
133 7 1
134 6 1
135 19 4
136 2 3
137 14 17
138 4 6
139 2 5
140 2 13
141 2 2
142 14 11
143 8 2
144 9 9
145 9 11
146 4 16
147 18 1
148 5 1
149 5 8
150 9 1

Ok so, I did some research in advance and found an article stating that the power of χ² GOF tests is lower when the distribution isn’t normal/binomial/Poisson, so I opted for a one-sided Kolmogorov–Smirnov test to check whether the empirical distribution lies above the theoretical Geometric distribution or not.

The results show that, as you stated, the activation rate of Playing With Embers shown by the data is not significantly above the advertised one: we cannot refute the hypothesis that our sample was drawn from a Geometric distribution with probability of success = 0.13.
However, the same could be said for Playing With Sparks’ activation chance.

Below the code used:

# import the dataset as game_data

n <- 0:19

geom.embers <- dgeom(n, 0.13)
geom.embers <- append(geom.embers, pgeom(19, 0.13, lower.tail = FALSE)) # the last 
#element is the right tail of the probability distribution
ref.embers <- table(game_data$`Playing with Embers`) # create a frequency table
ref.embers <- ref.embers/sum(ref.embers) # turn frequencies into probability values

geom.sparks <- dgeom(0:18, 0.08)
geom.sparks <- append(geom.sparks, pgeom(18, 0.08, lower.tail = FALSE)) # the
# Playing with Sparks variable has 20 manifestations, so its tail bin starts one step earlier
ref.sparks <- table(game_data$`Playing with Sparks`)
ref.sparks <- ref.sparks/sum(ref.sparks)

ks.test(ref.embers, geom.embers, alternative = "greater")
# p-value = 0.6405
ks.test(ref.sparks, geom.sparks, alternative = "greater")
# p-value = 0.268

Mathematician here, as you know, but I don’t quite follow the explanation of what you did there. Could you rephrase and explain again?

Sure! Are you familiar with the workings of statistical tests (null/alternative hypotheses, significance levels etc.)? If not, I’ll start with that since it’s the most important part

I think this can help you :

It’s not exactly maths but more of statistics.

@DMGInterference I think you meant to say the same could NOT be said for Playing with Sparks, as its p-value is low. Maybe KD got confused there.

Are there any other tests? I guess most of the other tests, like the Shapiro–Wilk test, compare the sample with a normal distribution.


I meant that actually! A p-value of 0.27 is still not sufficient evidence against the null hypothesis, I’d say. The typical significance level for a test doesn’t exceed 10%.

I found an interesting study while choosing what test to run:
https://dergipark.org.tr/tr/download/article-file/83619


Yeah, true. Generally the p-value has to be below 0.05 or 0.1 to reject the null hypothesis.

Thanks guys, a few more words and I realised what we were looking at. I couldn’t quite link the words to the code, and I was a little confused about whether “p = 0.13” and “p-value” were separate things (I don’t know if this is standard, but I’d always recommend using different names to make it clear).

Also, for non-mathematicians it’s good to point out what those p-values actually mean, i.e. the probability of seeing a result at least this extreme if the advertised rates are correct. So the Embers result is 64.05% (very likely, so clearly nothing wrong) and the Sparks result is 26.8% (still a reasonably high chance; it would need to be below 10% or 5% to be “statistically significant”).

Also, it’s good to note that it takes a lot for results to actually prove statistically significant. So it’s perfectly normal to get the Sparks result shown above and feel it shows the passive is bugged, even though the results haven’t reached the point of proving it mathematically. What the results show is on the path to proving a bug, but more data needs to be gathered to see if it gets there.

My bad!! Force of habit on my part… I’ve always referred to the parameter of the Geometric distribution as “p”. I didn’t realize this would (naturally) be a naming issue because of the p-values! I’ll clarify in the original post.

Exactly! p-values are an indicator of how likely you are to be wrong in rejecting the null hypothesis with the data you have. In our case, our p-value of 26.8% means:
“If we extracted 100 samples from a population that is Geometric-distributed with p = 0.08, we’d expect 26–27 of them to be further away from the theoretical distribution than the one we have.”

It’s undoubtedly evidence in favour of the alternative hypothesis, but ≈25% chance means it’s still not strong enough evidence.
