[Mutator] Even Match (Onslaught team balancer)

Wormbo
Posts: 383
Joined: Sun 28. Aug 2011, 11:52
Description: Coding Dude

Re: [Mutator] Even Match (Onslaught team balancer)

Postby Wormbo » Tue 4. Oct 2016, 19:41

Technically, matches are already more balanced with EvenMatch than without it. (The Omni crowd had to admit that after EvenMatch was taken down for some time.) Too high of an average time makes matches exhausting, while getting people to stop complaining will never happen. Anyway, I don't really plan on any additional changes for EvenMatch beyond this point. The only thing I'm still considering is an option to turn off map-specific stats so you aren't forced to have your stats file grow if you don't see any relevant improvements.

That said, what exactly do you mean by "win/lose rate"? Being on the winning or losing team? That probably only really works for players who stay until the round actually ends.

Cat1981England
Administrator
Posts: 2232
Joined: Mon 23. Aug 2010, 15:35

Re: [Mutator] Even Match (Onslaught team balancer)

Postby Cat1981England » Tue 4. Oct 2016, 20:58

We keep a careful eye on things such as average match time and are very much aware and appreciative of the balance EvenMatch achieves. When switching players during the match was enabled, things did become "too" balanced and virtually every match with 10+ players went into overtime.

The reason why i ask though, is that it seems fairly balanced as it is. Yes there are occasional matches where a team is stacked, but from what i've observed, it's normally down to people not playing very well or a poor map design choice. As an example, when i added 5 link setups to Minus compared to 1, the average match time went down from 18.6 to 16.7. That's not down to the balancer but my poor link setups and/or player gameplay.

Putting to one side any performance issues this update may cause, i wonder if this would make much of a difference to OMNI as their popularly played maps are fairly similar and could also make us "too" balanced once again. No reason to waste your free time.

----------

What i mean by win/lose rate is if you start on Blue and they win, you get a point regardless of what team you finish on - unless TitanTeamFix or EvenMatch switches you. This would give OMNI truly well rounded teams while also getting around CEONSS's spamming/DM vs vehicle point discrepancy. One thing you really notice when looking through Epic stats and the player avg PPH is how they have very little to do with each other :thumbup:
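A rough Python sketch of that win/lose rate, for anyone who wants to see the bookkeeping spelled out. The record layout and names are made up for illustration, and skipping rounds where a balancer forced a switch is only one possible reading of the "unless TitanTeamFix or EvenMatch switches you" exception:

```python
from dataclasses import dataclass


@dataclass
class RoundRecord:
    start_team: str      # team the player was on when the round began
    winning_team: str    # team that won the round
    forced_switch: bool  # True if a balancer moved the player mid-round


def win_rate(records):
    """Fraction of counted rounds won by the player's starting team.

    Assumption: rounds with a forced switch are left out entirely.
    Returns None when no round counts yet.
    """
    counted = [r for r in records if not r.forced_switch]
    if not counted:
        return None
    wins = sum(1 for r in counted if r.start_team == r.winning_team)
    return wins / len(counted)


history = [
    RoundRecord("Blue", "Blue", False),  # started on the winning team
    RoundRecord("Blue", "Red", False),   # started on the losing team
    RoundRecord("Red", "Blue", True),    # balancer switched the player: not counted
]
rate = win_rate(history)
```

Voluntary team changes do not appear in the record at all, which is the point of the idea: only the starting team decides who gets the credit.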
The Universal Declaration of Human Rights, Article 1:

All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.

Wormbo
Posts: 383
Joined: Sun 28. Aug 2011, 11:52
Description: Coding Dude

Re: [Mutator] Even Match (Onslaught team balancer)

Postby Wormbo » Wed 5. Oct 2016, 04:39

That win/loss rate idea actually seems reasonable. The only issue with it could be the amount of time (games) required to "calibrate" itself. I wouldn't worry about "overbalancing" too much here, as there's still a fair amount of in-game fluctuation going on. The Omni server has one important difference, though: They don't play just a single round. I suspect their balancer/server settings are a bit suboptimal for what they want to achieve.

Cat1981England
Administrator
Posts: 2232
Joined: Mon 23. Aug 2010, 15:35

Re: [Mutator] Even Match (Onslaught team balancer)

Postby Cat1981England » Sun 9. Oct 2016, 19:18

I'm just going to leave these here if that's ok because i seem to be having problems trying to save files with the correct encoding :crazy:

Russian:

Code:

DeathString="%o был вынужден автоматически сменить команды."
descDeletePlayerPPHAfterDaysNotSeen="Чтобы предотвратить бесконечное накопление данных PPH (и влияния на эффективность), удалите игроков PPH, которых не видели в течение данного числа дней."
lblPlayerGameSecondsBeforeStoringPPH="Игрок в игре за секунды до сохранения PPH"
descPlayerGameSecondsBeforeStoringPPH="Игрок должен был набрать по крайней мере это число секунд игры в текущем матче, прежде чем его или ее PPH будет считаться достаточно значимым для сохранения в базе данных."
lblPlayerMinScoreBeforeStoringPPH="Минимальный балл игрока перед сохранением PPH"
descPlayerMinScoreBeforeStoringPPH="Игрок должен набрать по крайней мере эту сумму очков в текущем матче, прежде чем его или ее PPH будет считаться достаточно значимым для сохранения в базе данных."


Spanish:

Code:

DeathString="%o se vio forzado a cambiar automáticamente de equipo."
descDeletePlayerPPHAfterDaysNotSeen="Para evitar que los datos del PPH se amontonen indefinidamente (y afecten al rendimiento), borra el PPH de los jugadores que no han aparecido en este número de días."
lblPlayerGameSecondsBeforeStoringPPH="El jugador en el juego segundos antes de guardar el PPH"
descPlayerGameSecondsBeforeStoringPPH="Un jugador debe haber acumulado al menos este número de segundos de tiempo de juego en el partido actual antes de que su PPH se considere lo bastante significativo como para guardarlo en la base de datos."
lblPlayerMinScoreBeforeStoringPPH="Puntuación mínima del jugador antes de guardar el PPH"
descPlayerMinScoreBeforeStoringPPH="Un jugador debe haber obtenido al menos todos estos puntos en el partido actual antes de que su PPH se considere significativo como para guardar en la base de datos."


Italian:

Code:

DeathString="%o fu obbligato a cambiare le squadre automaticamente."
descDeletePlayerPPHAfterDaysNotSeen="Per evitare che i dati dei punti PPH si accumulino per un tempo indeterminato (con ripercussioni sulle prestazioni), cancellare i PPH dei giocatori che non si sono visti in questo numero di giorni."
lblPlayerGameSecondsBeforeStoringPPH="Giocatore in gioco alcuni secondi, prima di memorizzare PPH"
descPlayerGameSecondsBeforeStoringPPH="Un giocatore deve aver accumulato un tempo di gioco di almeno questo numero di secondi, nella partita corrente, affinché i suoi PPH siano considerati sufficientemente significativi per essere salvati nel database."
lblPlayerMinScoreBeforeStoringPPH="Punteggio minimo del giocatore per memorizzare PPH"
descPlayerMinScoreBeforeStoringPPH="Un giocatore deve aver totalizzato almeno questi tanti punti, nella partita corrente, affinché i suoi PPH siano considerati sufficientemente significativi per essere salvati nel database."
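
Read together, the strings above describe three settings: a minimum play time, a minimum score, and an expiry in days. A minimal Python sketch of how such thresholds could gate what gets stored follows. The function names, the example values and the exact comparison rules (inclusive thresholds, entries older than the limit dropped) are assumptions for illustration, not EvenMatch's actual code:

```python
def should_store_pph(game_seconds, score, min_game_seconds, min_score):
    """True once a player has both enough play time and enough score.

    Both thresholds are treated as inclusive, which is an assumption.
    """
    return game_seconds >= min_game_seconds and score >= min_score


def prune_not_seen(days_since_seen, max_days_not_seen):
    """Drop entries for players not seen within the given number of days.

    days_since_seen maps a player ID to the days since that player last appeared.
    """
    return {pid: days for pid, days in days_since_seen.items()
            if days <= max_days_not_seen}


# Hypothetical values: 300 seconds and 20 points as thresholds, 30 days as expiry.
store_regular = should_store_pph(600, 25, 300, 20)
store_latecomer = should_store_pph(120, 25, 300, 20)
remaining = prune_not_seen({"player_a": 3, "player_b": 90}, 30)
```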

Wormbo
Posts: 383
Joined: Sun 28. Aug 2011, 11:52
Description: Coding Dude

Re: [Mutator] Even Match (Onslaught team balancer)

Postby Wormbo » Mon 24. Oct 2016, 19:23

So, the new version isn't even out yet, and it already affects player experience negatively on the Omni server. I'm equally confused and amused.

Miauz55555
Posts: 777
Joined: Sun 7. Jun 2015, 22:12
Location: Germany

Re: [Mutator] Even Match (Onslaught team balancer)

Postby Miauz55555 » Mon 24. Oct 2016, 22:34

Wormbo wrote:[...] I'm equally confused and amused.

Me too. But more the second one. :D

***Edit: They may have lost all their data and started from 0... or it's the thing Lagzilla posted.
We are happy with your work. Thanks for that. :thumbup: ***

Pegasus
Posts: 962
Joined: Wed 4. Nov 2009, 23:37
Description: ONSWordFactory
Location: Greece

[Mutator] Even Match (Onslaught team balancer)

Postby Pegasus » Mon 31. Oct 2016, 02:27

Only managed to free up my customary, evening "web-related stuff" writing window in recent weeks, so I figured I might as well try n' put down some of the (better formed) thoughts milling about in my head over the past year or so on this mod and balancing in general. Considering the tardiness in addressing recent concerns n' developments, this post could be argued to work more to my benefit, by finally silencing 'em through flushing out and allowing me to move on with other concerns, rather than offering timely aid to anyone else (still) invested in or involved with EvenMatch's development at this point, so with apologies for that as well as thanks for indulging, as per usual, let's get into it.


Wormbo wrote:[...]Keep in mind that map-specific lists only ever include players EvenMatch saw playing that map for long enough and with high enough score. As a result they will pretty much always be a lot smaller than the overall list.[...]
Just to make sure I'm not getting this wrong, since both "map-specific lists" as well as an "overall list" are mentioned here, will the updated EM with map-specific PPH tracking enabled be regularly retrieving data just from a number of (dozens of) different map name-specific .ini files for the most part (which would indeed make reading and writing to them a trivial overhead in terms of server resource consumption), or will a massive, possibly multi-thousand-entry, "overall" list still be coming into play at some points, thus necessitating a correspondingly massive Uscript data structure to reside in RAM and reflect such an EvenMatchPPH.ini's two-dimensional contents (previous playerID-PPH pair versus map) for reading/searching, storing and/or sorting purposes? If a server would have to deal with the latter scenario only on rare occasions, that's a sensible design choice; if not, it could be worrisome. From your post I'm getting a sense that the data will mostly be handled along a fractured [multi-dozen] number of map-specific .inis, but it's the use frequency of the alternative that involves the "overall" list I'm trying to get a better grasp on. Whatever direction the implementation of this updated EM goes, adding an option for server admins to opt out of multi-map PPH tracking seems like a wise choice indeed and one that can also help glean useful info post-deployment via differential testing n' analysis without having to regress to previous versions, which might render conclusions from direct comparisons between EM versions on the same server/context less compelling.

Wormbo wrote:[...]UnrealScript itself can handle the data amount just fine. It only keeps the total list and the current map's list in memory, so all the other maps don't matter at all. Each list is a simple dynamic array of structs, sorted by the players' "EvenMatch ID" string, which allows efficient O(log N) lookup and approximately O(1) insert. (I know, it's really O(N), but the constant overhead of inserting outweighs the linear part by so much, it won't really matter.)[...]
The dozens of per-map, individual lists are indeed not a scary arrangement, exactly for the reasons you mention here - a few hundred structs in a dynamic array is something that should still be both manageable and searchable enough. What might "keep[ing] [...] the total list [...] in memory" entail in terms of memory size, though, if said list now incorporates a whole new dimension of data, meaning one might be facing a dynamic array "skyscraper" of five or six thousand entries or more? Could something like that still be handled elegantly in the background, even when not doing any specific (say, searching) operations on it? Has any other mod in the game's past, UTAN or otherwise, reached similar limits in data structure size residing in memory, for that matter? As far as stock game Uscript code goes, the only kind of case that comes to mind where arrays of such size are encountered is per-vertex, manually configured or procedurally composed matrices, like TerrainInfos, FluidSurfaceInfos, xProcMeshes and such, but even those, I believe, receive substantial native support, i.e. they are not exclusively powered by Uscript in terms of performance.
Anywho, the nitty-gritty of Uscript resources management n' optimization isn't a field I'd consider myself any kinda expert in yet, so I'm mostly probing here because this project is exceptional/unique enough in this regard that it could help further inquiring parties' understanding. In other words, this is more about trying to grok how far Uscript's limits can be stretched memory-wise, rather than being a stubborn, doubting Thomas - just in case you're wondering why I've stuck with this as far as I did :). Seeing as the discussion is also developing in another direction that's more my speed (so... more like velocity then?), plus considering your generally prudent n' resource-frugal developing attitude, I think it's safe for me to drop the performance talk here and focus on the other facets mentioned.
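
For readers who want to picture the arrangement Wormbo describes (a list kept sorted by ID string, binary search for lookups, positional insert for new entries), here is a minimal Python sketch. It is not EvenMatch's UnrealScript; the class and method names are invented for illustration, and two parallel lists stand in for the dynamic array of structs:

```python
import bisect


class SortedPPHList:
    """PPH entries kept sorted by player ID (a sketch, not EvenMatch's code)."""

    def __init__(self):
        self._ids = []   # player ID strings, always in sorted order
        self._pph = []   # PPH value at the same index as its ID

    def lookup(self, player_id):
        """Binary search, O(log N). Returns None for unknown players."""
        i = bisect.bisect_left(self._ids, player_id)
        if i < len(self._ids) and self._ids[i] == player_id:
            return self._pph[i]
        return None

    def upsert(self, player_id, pph):
        """Update an existing entry or insert a new one at its sorted position.

        Finding the slot is O(log N); the insert shifts later entries, so it is O(N).
        """
        i = bisect.bisect_left(self._ids, player_id)
        if i < len(self._ids) and self._ids[i] == player_id:
            self._pph[i] = pph
        else:
            self._ids.insert(i, player_id)
            self._pph.insert(i, pph)

    def __len__(self):
        return len(self._ids)


total = SortedPPHList()
for pid, value in [("m", 310.0), ("a", 120.5), ("z", 95.0), ("c", 240.0)]:
    total.upsert(pid, value)
total.upsert("c", 255.0)  # an existing player gets updated, not duplicated
```

Even at several thousand entries, a lookup in a list like this takes roughly a dozen comparisons, which is why the size of the total list matters far less for searching than it might first appear.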


Cat1981England wrote:[...]I have to ask though, at what point are we going to consider matches balanced? Are we looking for an average match time or until people stop complaining?[...]
Wormbo wrote:[...]Technically, matches are already more balanced with EvenMatch than without it. (The Omni crowd had to admit that after EvenMatch was taken down for some time.) Too high of an average time makes matches exhausting, while getting people to stop complaining will never happen.[...]
Ah. Now we're starting to get serious, and, specifically, to the rarely discussed core of the balancing issue: what we fundamentally understand the concept to be, what its presence in a match is understood to look like - and, just as importantly, what should not be construed as balance - or what the most accurate way to assess team balance is.

To start from the part that seems the easiest to address first, and because there have recently also been legitimate points raised about it elsewhere, I'd hope we can at least agree that the function of a mod aiming to assist in delivering more balanced matches, such as EvenMatch intends here, shouldn't resemble that of, say, a corrupt referee on the take from a sports TV network, who'll make whatever arbitrary calls they need to prolong the match as long as possible for the sake of ad revenue. One reason for this assertion stems from our collective knowledge that lengthy matches aren't necessarily the product of closely balanced teams, since this can sometimes be attributed to flawed map editing in terms of objective chokepoint design (say, due to single primary Node Link Setups {NLSes} or lengthy, "pearls-on-a-string" arrangements of nodes) or, more typically, geometric bottlenecks near cores (think Panalesh at late game) that can have a much stronger team stymied right outside the defenders' base even from early on in a round right until the end of OT. Basically, "round went on for as long as possible, so it must've been a balanced fight" is a fallacious assumption without additional info is what I'm driving at here.
Beyond just constituting an insufficient yardstick by itself, match length is also not the only manner through which balance is expressed during a round, since other events that also tend to be observed when closely matched teams fight each other include the changing hands of nodes across both midsection sides with increased frequency, and/or both cores becoming vulnerable to, or even receiving, damage during Regulation Time (RT), occasionally with rounds even coming down to "base trade" strategies in the end due to sparsely intersecting flows or well-buffered NLSes in some maps - your TripleSlaps, Tanks-a-Lots, Maelstroms etc. It follows then, that metrics such as the flow of territory possession and pre-OT core damage also need to be taken into account before one can safely conclude that a long-drawn round, or match, was the product of proper balancing, bad map design or something else.
What this boils down to, to my mind anyway, is that all matches must (and, indeed, need to) eventually end, and if the only consistent result a balancer mod that attempts to address stacked teams (by switching people around) ends up producing is longer, on average, matches, that on its own should by no means constitute sufficient evidence the mod is actually serving its mission - before even factoring in the cost to user experience this will inevitably also carry.

So, if an EvenMatch that merely acts as an artificial hurdle to "natural" round/match resolution, by switching people around whenever things look likely to take such a course and so running out the clock, isn't what we want, what do we want from it? Well, we want what its name suggests, presumably: even matches. Or, at least, what we perceive that to be through our own perspective n' awareness while playing ingame, which pours another can of, ahem, worms right in the middle of the ongoing debate of whether "we had it better before or after the balancer" and when matches were truly the most balanced. The crucial, introductory distinction here is between measurable fact on a comprehensive scale and shifting, emotionally-affected personal impression. While we should care that the former is designed to do its task as accurately and verifiably as our theoretical understanding of balance assessment currently allows it to be, the latter should also not be overlooked or neglected - say, on grounds of subjectivity or unsubstantiated complaints - because its buildup over time can have adverse effects on the system, possibly even resulting in it being torn down, as the Omni example has aptly demonstrated.
As already stressed, a system allowing for post-match reviewable and verifiable performance can go a long way towards allaying concerns or dispelling impressions of bad judgement calls on the mod's behalf: "Why did it choose to switch me when someone else was obviously the better choice?", "Why did the balancer think things were even and wasn't kicking in when we were behind for this long?", "How was this balanced?" and so on. This requires both that such a feature exist - and, IIRC, it does in EvenMatch, in the debug/verbose mode - as well as that it's impressed upon server staff administering communities where such concerns are ongoing that transparency via the enabling of EvenMatch's debug output can be helpful to spotting problems in the code's logic, if indeed there are any. To move past the skepticism rut though, it's necessary not just to enable a switch and sit back, but to also start looking into any such sentiments whenever they arise in matches, and following through with them systematically (and on a venue better suited to dispassionate, levelheaded examination, such as a msg. board) by producing the accompanying debug output, showing exactly what kind of calculations the mod was doing at any balance-affecting juncture in any match rife with commentary of the aforementioned type, and thus reaching a community-wide verdict on the validity of each such claim rather than allowing 'em to build up to a climate of pervasive doubt or perceived indifference.
If proactive (and reactive) server staff is the more straightforward component of the human perception factor in the whole match balance issue, players and their ingame sense of what an even match looks, or should look, like is certainly its far messier complement. Even though EvenMatch cannot be expected to understand all of the finer, qualitative subtleties that might affect some matches' outcome (say, in specific team members' playstyles on specific maps), generally speaking, the mod tries to tackle the concept of balance through the Progress indicator, a compound measurement that relies on facts based on "hard" metrics, such as node possession and core exposure, all of which it has direct access to and can keep constant, comprehensive track of. That is fundamentally not how a typical player will approach the question of whether teams are evenly matched in any given instance, however. Sure, they'll occasionally glance at the scoreboard and some might even hash out a ballpark estimate of total PPH difference, or (more usually) how many among the overall top 6-10 scorers belong to their team or how many idlers they have, but, overall, it's safe to say that crunching down four-figure/1st-decimal accurate numbers for territory control percentages or PPH aggregates isn't how most people go about deciding whether they're happy or not with the current state of match balance.
Their approach is a bit more vague, a bit more fuzzy, and the perception of an even match tends to involve things like success frequencies for personally devised n' phrased tests to the tune of "number of times it took to successfully raid an enemy node OR number of times I've repeatedly died in the attempt", "number of times I was assisted at said raid OR number of times I remember trying alone", "number of times I've noticed being outnumbered in a row out in the open OR general team presence near me around the map", "number of times the enemy successfully assumed control of key midsection/neutral resources OR took down (one of) our primary/-ies versus our own attempts", "number of times I've noticed key vehicles remaining unused at our base", to name just a few of the ways we've all tried to assess on the fly how well our team was doing against the enemy.
When it comes to boiling down to something usable and useful what "balance" actually is to an individual player, I gotta sympathize with any developer who might try to tackle the task, because it seems there's a certain "I'll know it when I see it" quality to it - kinda like how intelligence was framed as a concept in introductory courses in Artificial Intelligence and Expert Systems, if memory serves. This epistemological deficit doesn't just mean any such dev would be starting from a theoretically shifty foundation as far as processing/analyzing the problem comprehensively goes, or being able to compare it to other familiar ones for additional insight, but that, in practice, it could also require resorting to statistical methodology (talking about surveys n' polls here) to suss out which of the various practical, trackable metrics they might try associating "balance" with would best align with what people felt were balanced matches when asked after the fact, thus sorting out the (combinations of) variables that are truly relevant from the rest. What I'm getting at here is, since the scale of this problem, if we cared to dissect, map it out and attack it as thoroughly as possible, is pretty damn big, whereas EvenMatch has always had more sensible and practical intentions than rivaling the scope of scientific research projects, the judicious cutting of some corners and playing it by ear (regarding the extent of efforts to identify the best balance-related metrics or the amount of statistical verification those would require to confirm) might be a more prudent (read sane) course to stick to here. 
This would typically include shortcuts, such as employing our own experience as UT players to get a sense of how balanced test matches ended up being instead of conducting laborious, large sample polling each time, or narrowing down the number of candidate variables to be used for balance calculating to the ones that seem more probable rather than exhaustively running through the gamut of player performance-related metrics that could influence balance - stuff like that. Basically, this is just a pragmatic admission that nobody involved with EvenMatch's improvement can leverage the technical, financial, scientific, playerbase telemetry-powered or Big Data-sized resources that large, professional game dev studios with dozens of released projects under their belts, or universities' research departments can typically muster in order to comprehensively "solve" match balancing, and that this is the reality we need to adjust ourselves, and EvenMatch's scope, to.
Now, understandably, "we might need to go lo-fi rather than hi-fi in some places" isn't exactly the most groundbreaking takeaway to end the theoretical part of the issue on, I'll admit, but this section was less meant to offer something like that from the outset, while being more concerned with sufficiently bolstering n' clarifying EvenMatch's conceptual framework, through which communication around the UT community about the project could possibly be improved so as to prevent false impressions from forming (or, worse, compounding into pervasive skepticism) about how [magically] effective EM might ever end up being with regard to "solving balance". And since, unfortunately, managing impressions or expectations has already become a (messy) part of the mod's development, complicating the remaining work or possibly having an even more adverse impact to its potential future form (affecting all servers that employ it, it bears reminding), perhaps part of the ...miscommunication problem originating from parties, whose theoretical understanding of the subject matter EM attempts to handle has remained persistently lacking, could be lifted from Wormbo's shoulders just by pointing them to this wordwall. After all, as hundreds of my retired advisors tell me, this is a great wall, folks - THE BEST wall, in fact - one that's sure to strike fear in the heart of even the baddest of UT hombres, believe me, you'll be SO satisfied! Oh, and we'll also get Omni to pay for it too. Ahem. Just hoping that last section might help improve the overall signal to noise ratio of ongoing EM discussions, is all.

As for the remaining matter of which metrics are likely better suited to gauge personal performance or match balance by, practically speaking, there's a couple of suggestions already thrown out there that come across as a bit hasty upon closer inspection, and which I'd like to get into briefly before moving on to what other options might have more reliable chances of fulfilling that role.
One such recent idea, then, proposed retrieving players' win/loss rates from Epic's ut2004stats tracking server(s) and, presumably after focusing only on the ONS-relevant fields, using those as a reliable determinant for individual player performance, through which participants would subsequently be sorted n' assigned to either team during pre-match as the balancer deems necessary. Beyond the technical issue that immediately springs to mind from the suggestion that a notoriously slow to respond, third party resource become an integral part of a game server's per-match balancing loop (and pre-match to boot, in terms of retrieval window urgency!), with all the unreliability and co-dependence complexities this would naturally introduce, there's also the issue of data irrelevance to consider. While some people may have had numerous different profiles tracked on those servers along their UT "career", which could make their stats more recent and representative of their current playing abilities, for others those win/loss figures can represent aggregate achievements (or failures) that might date back to more than a decade ago and possibly paint an entirely different picture to the kind of player they might've evolved into in later years. Not to mention the near immutability of those grand figures compared to the added day-to-day trickle, meaning that if they still represent a player's abilities, that's cool, but if not, that needle's unlikely to ever move significantly enough, so by incorporating those stats, the balancer would be thrown off track in some players' case no matter how much it retrieved, flushed or retracked ingame stats. Edit: Also forgot about the numerous times when entire matches' worth of stats just don't get recorded due to an as yet unsolved, data-eating bug (outlined in the last paragraph of this post), which would distort derived performance accuracy even further. [/edit]
Seeing as we're talking about just a few total figures here, with no informative breakdown time-wise of how they developed over the years for the better or the worse, the question of their reliability and relevance to their owner's current performance can neither be overlooked nor adequately addressed - not without starting one's own ut2004stats scraping project that would store beyond a month's worth of data and could produce more recency-adjusted figures ...which would be an interesting undertaking for an entirely different set of reasons, but best not go down that path right now. The same prohibitive "staleness problem" would apply to similar considerations of using the nodes' and cores' ONS-specific totals for purposes of player performance, and thus overall team balance, assessment, it should be noted here too, btw. Point here is, beyond using 'em for general, historical appreciation/revisiting of one's past, Epic's server-stored, per-player stat totals are by far not the kind of figure one should be basing any inference of current player ability on, if accuracy of the metric matters at all to their goals.
Besides, the recorded victories or losses any player might've been a party to have likely been influenced by dozens of other factors affecting that outcome beyond said player's skills; hell, a favourable win/loss ratio total might even have little to do with actual fighting skill at all, and, instead, could just represent the track record of an efficient winning team switcher, for all anyone really knows. All told, it seems preferable to focus on personal metrics that are much more directly affected by a player's current/recent performance than anything else as a sound first step towards building up to a match balance measurement, and on that basis, ut2004stats figures would be a pretty unreliable place to start pulling data from in order to accurately construct such a bigger picture.
Onward to a suggestion that's more closely related to what EM already uses to assess balance, the notion has been put forth that perhaps PPH figures should be considered less relevant to estimating a player's skill, while shifting emphasis to nodes-related figures, as in total number of (operational or constructing) nodes a person has torn down or built during a match [on a certain map]; why not expect more fair balancing results by switching EM's player performance tracking to this metric, this theory suggests? Well, for one thing, and it feels exceedingly obvious stating this in an ONS-playing community, constructing and destroying nodes or cores - i.e., the game's objectives - all award points to players, namely up to 4 for building a node, up to 5 for destroying one, and up to 10 for damaging the core, in gradations of 1 per 10% of core health knocked off. Those are added to players' scores and, as such, affect their PPH calculation, so this suggestion doesn't really involve using a metric independent of the currently tracked one.
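
To make the overlap concrete, here is a minimal Python sketch of a points-per-hour figure built from a score that already contains objective points. The point caps are the ones quoted above; the function names, the rounding down of core damage to whole 10% steps, and the plain score-over-hours formula are assumptions for illustration, not EvenMatch's or ONSPlus's actual code:

```python
MAX_NODE_BUILD_POINTS = 4     # cap quoted above for building a node
MAX_NODE_DESTROY_POINTS = 5   # cap quoted above for destroying a node
MAX_CORE_POINTS = 10          # cap quoted above for core damage


def core_damage_points(percent_removed):
    """1 point per full 10% of core health knocked off, capped at 10.

    Rounding down to whole 10% steps is an assumption.
    """
    return min(MAX_CORE_POINTS, max(0, percent_removed) // 10)


def points_per_hour(score, seconds_played):
    """Score scaled to one hour of play time (assumed PPH definition)."""
    if seconds_played <= 0:
        return 0.0
    return score * 3600.0 / seconds_played


# Hypothetical 20-minute stint: one node built, one destroyed, 35% core damage,
# plus 28 points from kills, linking and other support play.
objective_points = (MAX_NODE_BUILD_POINTS + MAX_NODE_DESTROY_POINTS
                    + core_damage_points(35))
other_points = 28
pph = points_per_hour(objective_points + other_points, 20 * 60)
```

In this made-up case 12 of the 40 points come from objectives, so the node work is already inside the PPH figure rather than something a separate metric would have to add.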
It should also be noted here that this is an expression/outcrop of a specific n' oft-repeated (and perhaps seen by some as counter-cultural) mentality within the ONS community, especially during more recent years, which rejects automatically regarding a team's top scorers as the players deserving of the greater plaudits, and substitutes this with heaping praise on the "true heroes" that destroy, build or heal the nodes, per what might be regarded as the team's "proper" agenda. "Fair enough" might be an instinctual, first response to that, and there's certainly some valid points one could raise to support the notion - e.g. the impact of ONSPlus point inflation to scores, particularly in conjunction with certain types of overpowered custom gear. Also, you kinda need the nodes to win, so there's that too. Upon further consideration, however, one might find that embracing this maxim on an institutional level, such as it might be perceived if it were embedded in EvenMatch's tracking, can allow for more structural flaws n' misconceptions to creep into the ONS-playing community's collective conscience, forming an even worse divide in attitudes between players that could replace certain unsportsmanlike or self-serving habits (say, spawn killing or near-core vec sabotage) with others (point leeching from locked nodes comes to mind as an actual, if not rare, phenomenon from the past).
Since it's beyond the scope of this post to delve too deeply into this debate or the type of circumstances (or player archetypes) it usually tends to spark around, what's relevant to highlight in this discussion is that the ultimate goal of ONS is to successfully control a series of territories through skill and teamwork so as to reach n' destroy the enemy core. Given this, any player action that ethically (always important to remind this) assists in a team's chosen strategy towards claiming this series of territories, directly or indirectly, can, and should, legitimately be considered as contributing to the team - manifested in EM's design, too, by way of a corresponding performance tracking metric. These actions can often include a range of tactics that have little to do with nodes themselves, such as manning stationary turrets, linking damaged vehicles, proactive enemy interception, [neutral] resource denial, spotting n' advising on enemy movements, and so on. To hastily proclaim the dedicated node runners/tenders as the only worthwhile bunch while devaluing others' support to the team agenda wouldn't just be unwise and illogical (guess what happens when all 3 raiders link and nobody's on the lookout for incoming threats), but it would also foment pointless ideological division and, therefore, be wholly unproductive in the pursuit of better quality gameplay.
Nodes n' cores matter a good deal to ONS, and just so that nobody forgets it, they award points for successfully pursuing them pretty generously too, especially when one cuts off an entire enemy chain of 'em, thanks to ONSPlus. So does, however, taking out incoming enemies (or nukes), patching up friendly defending vehicles or shielding the node builders, to name just a few other good, supportive plays. The bottom line in all these cases is, one way or another they'll usually all translate to the same, measurable common denominator, namely points added to one's score. As much as this ...schismatic movement might try to dress it up in moralist terms, essentially its proclamation boils down to "our kind of playstyle's points (should) matter more than theirs because we do the Real Work™ and they don't". This conveniently also disregards the fact that there's several unsavoury ways for people to pad their score within any kinda playstyle, as long as they've set their mind to it. Rejiggering point tracking with a tighter focus on node activity won't change the fundamental reality of how skilled any specific player is or that some people will pursue their own agenda just to see themselves at the top of a scoreboard no matter what consequences that might have to their team. We might as well save ourselves yet another round of [cross-]community-wide values bickering, recognize this for what it is, which is an antagonistic exercise in playstyle preferentialism, and steer towards more constructive solutions, hopefully sooner than later.

Okay, so the previous part about unsuitable metrics might've veered slightly into principled rant territory, despite contrary initial intentions, so I'm gonna try to wrap up this final section on a more affirmative note regarding relevant or promising stats that EM can use when tracking player performance and assessing match balance.
To my mind, in terms of measuring n' recording personal contribution, PPH still remains the most informative metric to go with, for the simple enough reason that it covers most actions that tend to have some relevance to the team agenda, since they'll usually yield some kinda point reward, whether by ONS' stock rules or through ONSPlus' extended provisions. I should also note that, for all the other stats I considered besides PPH, aforementioned and more exotic alike (amount of movement around the map without dying and average time between deaths per team were even two of those, partly inspired by Wormbo's unreleased analyzer tool), I couldn't find any that was completely resilient to data value contamination/interference, i.e. to being affected by egotistical or otherwise pointless/misguided playstyles that have little to do with team goals.
Going by a match's node/core build/wreck figures as an alternative EM-tracked stat seems to rival PPH in only the regard that it rates slightly higher at the "tolerance to irrelevant data" test, because in this case stats would be affected mostly just by crowd size, instead of half a dozen additional factors; one can rack up sizeable node build/takedown figures even on Torlan, for example, as long as there's only few defenders around. Moreover, a nodes/cores-derived stat is likelier to be more map-agnostic in terms of per-match variance, since personal PPH performance can vary wildly between maps, depending on size, playing crowd, type of included gear, etc., so it could be argued to be more reliable an indicator of player performance even across fewer measurements than PPH can. As a whole though, that's about where the advantages run out, as the more strict, objective-focused stat leaves a whole host of actions unaccounted for (i.e. what it can't track, which is far from an insignificant set), as mentioned earlier, so in total it cannot be argued to constitute a more comprehensive, personal efficiency assessing alternative to PPH.
Lastly, as far as evaluating a match's balance goes, a recent EM code examination reminds me that the mod determines Progress stats using the following formula, to which I've made some crude interpretive adaptations here so that the following bit makes sense in fewer lines and without having to paste the entire GetTeamProgress function's code:

Code:

Progress[Team] = TeamCoreHealth{%age-wise} * Sqrt(MinOwnCoreDist){hops} + (NumNodes + 0.5 * NodesHealth{%age-wise,sum}) / MinEnemyCoreDist{hops}
{During OT, the losing team's Progress is scaled by the exposed core's health (percentage-wise); the inverse is used for the dominating team.}

Overall Progress = Progress[Blue] / (Progress[Red] + Progress[Blue])
I wanna re-emphasize the "crudely adapted" part here once again before moving on with further commentary, so if something still doesn't make sense (and I wouldn't blame ya, the math is kinda dense here) or has been misrepresented here by accident, and your confusion's first inclination is to point a finger at someone, point it at me rather than Wormbo. Also, any corrections on the formula's substance are welcome.
Okay, so essentially what EM decides makes up the team Progress figure is partly how unharmed said team's core is in conjunction with enemy proximity to it (in hops), and that's added to the other part where the number of nodes it holds is added to the halved sum of their health's state (exposed ones only fractionally contributing to the count, so, say, 6 locked and 2 barely holding on might give a ~6.1) and the pair is weighed against minimum proximity (in hops) to the enemy core. All put together, teams' own Progress values can obviously go above 1.0, but the value of a match's total Progress was decided to range between 0.0 (when red's got it all) and 1.0 (when blue completely dominates), so the stat is derived from Progress[Blue] over both team's progress sum.
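To make the adapted formula a bit more tangible, here's a minimal Python sketch of it (Python rather than UnrealScript only so it can be run outside the game). It follows the crude adaptation quoted above, not the actual GetTeamProgress code: the overtime scaling is left out, health values are assumed to be fractions between 0 and 1, and every function and parameter name is made up for illustration.

```python
import math


def team_progress(core_health, min_own_core_dist, num_nodes,
                  nodes_health_sum, min_enemy_core_dist):
    """Per-team Progress, following the adapted formula (no overtime scaling).

    core_health         -- own core health as a fraction (0.0 to 1.0)
    min_own_core_dist   -- hops between the enemy and the own core
    num_nodes           -- number of nodes the team holds
    nodes_health_sum    -- sum of those nodes' health fractions
    min_enemy_core_dist -- hops between the team and the enemy core (assumed > 0)
    """
    defence = core_health * math.sqrt(min_own_core_dist)
    offence = (num_nodes + 0.5 * nodes_health_sum) / min_enemy_core_dist
    return defence + offence


def overall_progress(red_progress, blue_progress):
    """0.0 means red has it all, 1.0 means blue completely dominates."""
    return blue_progress / (red_progress + blue_progress)


# Hypothetical mid-round snapshot: red holds 4 healthy nodes and is 2 hops
# from blue's core; blue holds 2 damaged nodes and has lost half its core.
red = team_progress(1.0, 3, 4, 4.0, 2)
blue = team_progress(0.5, 2, 2, 1.5, 3)
print(round(overall_progress(red, blue), 2))
```

With those made-up numbers, red holds more nodes and sits closer to blue's core, so overall Progress lands at roughly 0.26, i.e. well on red's side of the 0.0 to 1.0 scale.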
While I'd certainly be interested to learn more about what thoughts or experience derived from elsewhere led to this particular configuration of the formula, the key takeaway here is that the raw metrics it's based on are node count for each team, respective cores' and nodes' health, and core exposure (measured in hops). When the time comes for EM to consider rebalancing - whether by player join/leave/spec events, someone calling "teams", or in fixed, admin-set intervals - a weighted combination of those four metrics will be taken into account, as Progress, to determine how many swaps need to happen, based on the numeric threshold, in terms of who's winning and how many more players they have, that Wormbo laid out in this post, on the 2nd page of this thread. Again, not exactly an obvious arrangement of figures that one could look at and instinctively determine as well-calibrated or otherwise (suppose that's to be expected when one includes square roots and exponents in formulas :/), but, coarsely speaking, one can still get a sense for what logic the code is working off of.
While clarity in terms of what measurements are used in the code and what match aspects EM's looking at when assessing its state of balance is important, it's equally important to consider what the mod is choosing not to look at, and what the consequences of that design choice may be. In order to more effectively comment on that (hopefully), however, I'll have to ask that you indulge me in one final inclusion of abbreviated code snippets from EvenMatch's V2b2 MutTeamBalance class:

Code:

[...]
function BalanceTeams()
[...]
   SizeDiff = TeamSizes[0] - TeamSizes[1];

   BiggerTeam = byte(SizeDiff < 0);
   LeadingTeam = byte(Level.GRI.Teams[0].Score <= Level.GRI.Teams[1].Score);
   
   if (SizeDiff == 0 || Abs(SizeDiff) == 1 && BiggerTeam != LeadingTeam)
      return; // no need to balance teams
[...]
{Label Red as "bigger team" if red has more players or the player count is equal, label Blue as "leading team" if blue is ahead in points or team score is equal.}

function bool RebalanceStillNeeded(int SizeOffset, float Progress, optional out byte BiggerTeam)
[...]
   SizeDiff = TeamSizes[0] - TeamSizes[1];
[...]
   // > 0 if red is larger, < 0 if blue is larger
   if (SizeDiff == 0) {
      BiggerTeam = 255;
      return false; // same size, don't rebalance
   }
   BiggerTeam = byte(SizeDiff < 0); // 0 if red is larger, 1 if blue is larger

   return Abs(BiggerTeam - Progress) < SmallTeamProgressThreshold ** Sqrt(1 / Abs(SizeDiff));
[...]
{In effect, only return a "yes" rebalancing recommendation if one team has more players and the current Progress has gotten closer to BiggerTeam's end of the scale (0 for red, 1 for blue) than the admin-set threshold raised to the power of Sqrt(1/Abs(SizeDiff)).}
Please note that the text in curly brackets is not actually present in those code parts, but constitutes my own explanatory comments on it.
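For anyone who'd rather poke at the threshold check with numbers, here's a rough Python rendition of the RebalanceStillNeeded excerpt. It's a sketch under assumptions, not the mod's code: the elided parts (including whatever SizeOffset does) are ignored, team sizes are passed in directly, and the 0.3 threshold used below is a made-up value, not a known default.

```python
import math

RED, BLUE, NO_TEAM = 0, 1, 255


def rebalance_still_needed(red_size, blue_size, progress, threshold):
    """Return (needed, bigger_team), mirroring the quoted excerpt.

    progress  -- overall Progress, 0.0 (red dominates) to 1.0 (blue dominates)
    threshold -- stands in for SmallTeamProgressThreshold (value is assumed)
    """
    size_diff = red_size - blue_size  # > 0 if red is larger, < 0 if blue is
    if size_diff == 0:
        return False, NO_TEAM  # same size, don't rebalance
    bigger_team = BLUE if size_diff < 0 else RED
    # The limit loosens as the size gap grows: threshold ** sqrt(1 / gap)
    limit = threshold ** math.sqrt(1 / abs(size_diff))
    return abs(bigger_team - progress) < limit, bigger_team


# Red has one extra player and is clearly ahead (Progress near 0).
print(rebalance_still_needed(6, 5, 0.2, 0.3))
# Same one-player gap, but the match is dead even.
print(rebalance_still_needed(6, 5, 0.5, 0.3))
# Four-player gap: the limit widens, so even a dead-even match qualifies.
print(rebalance_still_needed(9, 5, 0.5, 0.3))
```

The point to take away is the exponent: with a one-player gap the limit is just the threshold itself, while a four-player gap takes its square root, which for any threshold below 1.0 makes the check easier to trigger.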

While it's sensible in most cases, one immediate, but less favourable, takeaway from the above code is that, with an equal number of players in the first round (i.e. at 0-0), no rebalancing will occur, even if it turns out to be what's usually characterized as a "blowout round", i.e. a quick, one-sided stomp.
Worse, in cases (read servers) where inter-round team reshuffling (bBalanceTeamsBetweenRounds) hasn't been enabled or MinDesiredFirstRoundDuration is set to 0 (no quick round reset mulligan), even when blue is ahead by 2pts (say, after a blowout round), if there's an equal amount of players or red has one more, no balancing will be deemed necessary because EvenMatch will consider red to still be in a better position to recover, and no other recourse will be available to make a repeat blowout outcome less likely. Even if a player join/leave/spec event takes place in this case, btw, the mod's options are limited while the round is underway because of "valuable" and "key" player considerations, so red will just have to hope it gets a PPH edge from it somehow.
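The early exit described here can be replayed with a small Python sketch of the quoted BalanceTeams condition. Only the three quoted assignments and the `if` are modelled; the function name and the idea of passing sizes and scores as plain arguments are assumptions made for illustration.

```python
def balance_skipped(red_size, blue_size, red_score, blue_score):
    """True when the quoted BalanceTeams excerpt would return early."""
    size_diff = red_size - blue_size
    bigger_team = int(size_diff < 0)            # 0 = red (also on equal sizes)
    leading_team = int(red_score <= blue_score)  # 1 = blue (also on a tie)
    return size_diff == 0 or (abs(size_diff) == 1 and bigger_team != leading_team)


# Blue leads 2-0 after a blowout round:
print(balance_skipped(6, 6, 0, 2))  # equal sizes
print(balance_skipped(7, 6, 0, 2))  # red has one more player
print(balance_skipped(6, 7, 0, 2))  # blue has one more player
```

The first two calls report a skip, the third doesn't. The tie handling is also visible here: at 0-0, `balance_skipped(7, 6, 0, 0)` skips while `balance_skipped(6, 7, 0, 0)` doesn't, because a tied score is lumped in with "blue is leading".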
What's at the core of EM's balancing logic is that teams are considered [un]evenly matched mainly with regard to player count, and then via current Progress; candidates for switching during rounds aren't considered based on any team PPH disparity that might need bridging ("crooked referee syndrome" aversion?). This means the trend of a team's rate of territory expansion or its Progress (a derivative metric thereof) within a round isn't considered actionable info by which to determine a need for rebalancing, even if it's spiking multiple times, and neither will be its total PPH advantage, unless bBalanceTeamsBetweenRounds is set to true. Unfortunate conclusion here is that EM's design in conjunction with bad admin setting decisions can leave sufficient room for balancing failure and subsequent erosion of trust in the mod - but never the admins' choices, curiously enough :/ - in the eyes of players.
To offer a recommendation after finally managing to grasp the bigger EvenMatch picture more clearly for the first time ( :)), I think there's room for the balancing assessment part of ActuallyCheckBalance to be strengthened against edge cases or unforeseen situations that players might consider obviously or flagrantly imbalanced post-match, but against which current EM versions may take little redressing action. The best way I could see that happening would be through including into the match balancing calculation (i.e. before switching players is decided) a couple more metrics, such as team PPH disparity, the situationality of which team is dominating or overwhelmed in the evaluating moment (a constituent part of Progress), but also the trend of territory possession (or Progress) change (i.e. a first derivative of either metric, which will carry a contextual difference between a defending team suddenly making big territory/Progress gains vs. a dominating one continuing to do so with a second or third push within regulation time), at different weights and/or combinations; territory being a bit more relevant to TPPH, and DominantTeam slightly more important than SizeDiff to my mind, perhaps even to LeadingTeam. As a minor note here, the pace of territory gains as a metric was something that came to mind when I was following to its logical conclusion what tended to be my most used manner of assessing match balance, which is the question of whether both teams have had a fair shot at each other's objectives beyond the midsection at least once in a round. This could translate into both teams having had at least one territory gains trend spike after most (all) nodes have been claimed (whether through a strong, defensive rebound or a successful surge), but I'm still not entirely sure this would be a necessary condition, so I'm only putting it out there as food for thought, even if it's still a bit raw.
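As a very rough illustration of the "trend" idea, here's a Python sketch that turns periodic overall-Progress samples into a rate of change and counts spikes in either direction. None of this exists in EvenMatch as far as this thread shows; the sampling scheme, the function names and the 0.2-per-minute spike threshold are all hypothetical.

```python
def progress_rates(samples):
    """Finite-difference rate of overall Progress, per minute.

    samples -- list of (time_in_seconds, overall_progress) pairs with strictly
               increasing times; positive rates favour blue, negative ones red.
    """
    return [
        (p1 - p0) / ((t1 - t0) / 60.0)
        for (t0, p0), (t1, p1) in zip(samples, samples[1:])
    ]


def count_spikes(rates, min_rate):
    """Return (red_spikes, blue_spikes): rates at or beyond +/- min_rate."""
    blue_spikes = sum(1 for r in rates if r >= min_rate)
    red_spikes = sum(1 for r in rates if r <= -min_rate)
    return red_spikes, blue_spikes


# Hypothetical round sampled once a minute: blue surges, then red rebounds.
samples = [(0, 0.5), (60, 0.8), (120, 0.4)]
rates = progress_rates(samples)
print(count_spikes(rates, 0.2))
```

In this made-up round each team gets one spike, which would loosely match the "both teams have had a fair shot" test; a round where only one side ever spikes would be the kind of case worth flagging.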
As one final proposal here, casting as byte, but effectively treating as binary, the various team advantage-representing variables - which are supposed to accurately depict three distinct situations, let's not forget - might need some rethinking, as it introduces an unpredictable bias and allows for relevant context to be lost when equal player sizes or team scores are grouped together with cases in which one team's ahead; red outnumbering blue by one player and both teams having an equal amount of players are not the same thing. RebalanceStillNeeded already does it right when it branches out for SizeDiff == 0, maybe a few more forks could afford a similar degree of nuance to BalanceTeams? Just a thought.
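To show what that extra fork could look like, here's a small Python comparison between the two-way byte cast and a three-way result. The 255 "nobody" value simply mirrors the sentinel RebalanceStillNeeded already uses for equal sizes; the function names are invented for this sketch.

```python
RED, BLUE, NEITHER = 0, 1, 255


def bigger_team_two_way(red_size, blue_size):
    """Mirrors byte(SizeDiff < 0): equal sizes collapse into 'red is bigger'."""
    return int(red_size - blue_size < 0)


def advantaged_team(red_value, blue_value):
    """Three-way version: keeps 'neither' as its own outcome."""
    if red_value > blue_value:
        return RED
    if blue_value > red_value:
        return BLUE
    return NEITHER


for sizes in [(7, 6), (6, 6), (6, 7)]:
    print(sizes, bigger_team_two_way(*sizes), advantaged_team(*sizes))
```

The same helper works for team scores, so "red leads", "blue leads" and "tied" stay distinguishable all the way to the decision.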
As a concluding thought on the issue, one might ask here what the point even is in bothering to make further refinements to the mod's code just to cover for underperformance owed mostly to administrative oversight in configuring it effectively enough on a server, especially in light of the blame for unbalanced matches usually being assigned disproportionately to EvenMatch rather than said server staff anyway. A valid point perhaps, but I'd still argue that, if there's room to meaningfully improve key aspects of the project and the ways to do so have been highlighted and confirmed to be of value and substance, one should consider that a compelling enough reason to go ahead and include them in a newer (final?) version even just for the sake of delivering a well-rounded project that can measure up to its original, conceptual aspirations n' serve as a guide for other developers pursuing similar goals in the future, regardless of current practical impact or surrounding politics. Also, it might improve players' ingame experience just a bit more, despite opposing factors, so there's that too.

Okay, after about three weeks' worth of slow [re]writing, heavy editing n' stitching parts together, I think it's finally time to zap this beast alive and send it shambling out the door - how season-fittingly spooky! It's possible it'll be too slow, or big (or dumb?) to catch up to recent developments, but, who knows, there's always the chance it might help with something down the road. Either way, as I'm sure I must've said somewhere amidst this sea of words, thanks for your support and ongoing work in improving player experience within this gametype, Wormbo. It's always, and highly, appreciated.


PS1:
Wormbo wrote:[...]Anyway, I don't really plan on any additional changes for EvenMatch beyond this point.[...]
Kinda understandable at this point. Still, if you're ever short on ideas, or just entertaining suggestions, for other, small-scale ONS mods of a gameplay-improving nature, don't hesitate to say so; I might even have a few that could pique your interest :).

PS2:
Wormbo wrote:So, the new version isn't even out yet, and it already affects player experience negatively on the Omni server. I'm equally confused and amused.
Finally worked out all the bugs in EvenMatch's causality-violating code n' managed to make the mod retroactively affect the past, huh? Bet it was some struct replication bug in the ArcaneAlienObject abstract class. Either way, great news! Now to check out the source once it's out n' see about embedding that bad boy in future map edits; so many hectic CEONSS matches back in 2010 or so I could be turning around with this :p!

Gaffer
Posts: 158
Joined: Tue 7. Jan 2014, 19:43
Description: Total arse
Location: Germany

Re: [Mutator] Even Match (Onslaught team balancer)

Postby Gaffer » Mon 31. Oct 2016, 10:18

Ahh, a vintage Pegasus post! Let me fetch my week's rations and a thesaurus before I settle down for a good read.

PS: Sorry for not contributing to the thread in any way. These few (questionable) jokes I write are all I'm qualified to share :(

Cat1981England
Administrator
Posts: 2232
Joined: Mon 23. Aug 2010, 15:35

Re: [Mutator] Even Match (Onslaught team balancer)

Postby Cat1981England » Mon 31. Oct 2016, 20:31

@ Peg

Win/loss: The idea was to use our own independent stats and not those from epic, just like we currently do with PPH. Everyone starts on 0, so any new players would be considered average to begin with. If you win you simply get a point, and if you lose - you lose one. Your team would be assigned at the start of the match and ignore manual moving (unless switched by TitanTeamFix or EM) so people wouldn't be able to manipulate the stats.

This seems like an undemanding and practical way of balancing which doesn't have the potential performance issues of map-by-map pph stats, or discrepancies produced by pph or node building/destroying both of which get skewed by the way (and maps) people play.
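A minimal Python sketch of that win/loss bookkeeping, just to show how little state it needs. The data layout is an assumption: a mapping from player to the team they were credited with at match start (or after a TitanTeamFix/EM switch), with manual team changes simply never written into it.

```python
from collections import defaultdict


def record_result(ratings, credited_teams, winning_team):
    """Add 1 for players credited to the winning team, subtract 1 otherwise.

    ratings        -- defaultdict(int), so unseen players start at 0
    credited_teams -- {player_name: team_index} as assigned by the balancer
    winning_team   -- team index of the match winner
    """
    for player, team in credited_teams.items():
        ratings[player] += 1 if team == winning_team else -1


ratings = defaultdict(int)
# Two hypothetical matches; team 0 is red, team 1 is blue.
record_result(ratings, {"alice": 0, "bob": 1, "carol": 1}, winning_team=1)
record_result(ratings, {"alice": 1, "bob": 0}, winning_team=1)
print(dict(ratings))
```

A player who was never seen before reads as 0, i.e. average, which matches the "everyone starts on 0" rule described above.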

PPH and node work both ignore the contribution of intelligent and team oriented players who will hold a node (GunShop corners) or attack the correct node at the right time (Dria).
The Universal Declaration of Human Rights, Article 1:

All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.

Zon3r
Posts: 536
Joined: Thu 7. Apr 2011, 07:46
Description: Don't shoot at me!

[Mutator] Even Match (Onslaught team balancer)

Postby Zon3r » Sun 19. Feb 2017, 19:14

Not sure if this is the right thread for this, but for the last few days the balancer seems to have stopped working and keeps putting together unbalanced teams. i know you may argue that some players leave at the end of the match and new ones join for the next one, but even in the case of a quick round it's making stupid decisions (one example is today's matches on bridge), and devastating defeats make people leave. It was fine till now, but lately it's acting up. Am i the only one who feels like this?

