[Nel] TCP vs. UDP
Zane
zane@supernova.org
Thu, 5 Jul 2001 12:05:07 -0700
----- Original Message -----
From: "Vincent Archer" <archer@frmug.org>
Sent: Thursday, July 05, 2001 9:04 AM
> This post (and the rest of the discussion) do highlight the problems of
> AO. However, only one post, in the whole thread, seems to be close to the
> real "problem".
>
> I have two friends who are playing AO together. They often experience
> "bad lag" (i.e. 20-30s delays between a command and it's execution).
> However, there's one strange thing during these periods of bad lag.
I also play AO, but every time I've experienced real "bad lag" I can't sit
or do any action that requires server-side confirmation, including ALL chat
channels. In fact, I frequently say or shout something so that as soon as it
shows up I know the lag is over, and this works like a charm. What they're
experiencing is a different type of lag than the one discussed in the above
post: that's when a particular zone they're in lags (or the server hosting
that zone) but other servers are unaffected. I haven't experienced that type
of lag since beta.
As a side note, I've noticed that when I get the "bad lag", others around me
appear to get it too (at least sometimes), which lends weight to the packet
storm theory.
> My guess is that their architecture is based on a front-end/zone service
> model. Clients connect to a front end, and said front-end connects to a
> zone service, depending on the zone you are in. This is further supported
> by various analysis points during beta, notably when the zone service
> crashed while I was in it (and the whole mission dungeon got reset and
> randomly re-rolled), and the numerous problems people have when zoning
> (zone... after a strangely fixed 45s - the default TCP connection
> timeout - you get "Area Change not initiated on server").
>
> So you have:
>
> Client ---- TCP ----> Front End ---- TCP ----> Zone server
>                                       ^              /
>                                       |             /
>                                       V            /
> Client ---- TCP ----> Front End ---- TCP -/
>
> which is probably the worst architecture I can imagine, especially as
> there appears to be one front-end per client, and front ends close and
> open connections to zone servers. :(
There doesn't need to be one front-end per client. There can be several
load-balanced front-ends that each handle multiple clients. The major
problem with this is that if one front-end crashes, all of its clients get
dropped (although under UNIX - I haven't been able to get win32 to do
this - you can pull some funky sockets tricks and recover from a crash
without dropping most players, just majorly lagging them & losing some
updates).
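For the curious, here's a minimal sketch of the kind of UNIX sockets trick
I mean (the holder/respawn plumbing around these two calls is assumed, not
shown): the front end hands each live client descriptor to a holder process
over an AF_UNIX socket with SCM_RIGHTS, and a respawned front end reclaims
them the same way. Clients just see a pause, which matches the "majorly
lagging them" part.

/* Sketch: pass a live client socket across an AF_UNIX channel with
 * SCM_RIGHTS so a respawned front end can reclaim its clients.
 * Error handling trimmed for brevity. */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send the open descriptor `fd` over the AF_UNIX socket `chan`. */
int send_fd(int chan, int fd)
{
    struct msghdr msg;
    struct iovec iov;
    struct cmsghdr *cmsg;
    char dummy = 'F';                    /* must carry at least 1 byte */
    char ctrl[CMSG_SPACE(sizeof(int))];

    memset(&msg, 0, sizeof(msg));
    memset(ctrl, 0, sizeof(ctrl));
    iov.iov_base = &dummy;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = sizeof(ctrl);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;        /* kernel duplicates the fd */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(chan, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent by send_fd(); returns it, or -1. */
int recv_fd(int chan)
{
    struct msghdr msg;
    struct iovec iov;
    struct cmsghdr *cmsg;
    char dummy;
    char ctrl[CMSG_SPACE(sizeof(int))];
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    iov.iov_base = &dummy;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = sizeof(ctrl);

    if (recvmsg(chan, &msg, 0) <= 0)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg && cmsg->cmsg_level == SOL_SOCKET
             && cmsg->cmsg_type == SCM_RIGHTS)
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}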
The good side to this is you only need one connection per client per
protocol (so 2 connections if using both TCP and UDP). Unfortunately, with
TCP that's both a pro and a con: with one TCP connection, a dropped packet
on a chat message delays all other TCP traffic, but a single connection
also costs less bandwidth and fewer server resources than multiple
connections would (larger, more efficient packets). Also, with a single
front end you can have as many separate services as you want without
needing a ton of different connections to the client.
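To illustrate that last point (the channel numbers and frame layout here
are made up for illustration, not AO's actual wire format), multiplexing
services over one TCP stream only takes a small frame header:

/* Sketch: multiplex several logical services over one TCP stream by
 * prefixing each message with [channel:2][length:2]. */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <arpa/inet.h>

enum { CHAN_CHAT = 1, CHAN_INVENTORY = 2, CHAN_WORLD = 3, CHAN_COMBAT = 4 };

/* Write one framed message.  A real implementation would loop on
 * short writes and coalesce frames into larger packets. */
int send_msg(int sock, uint16_t chan, const void *payload, uint16_t len)
{
    unsigned char hdr[4];
    uint16_t nchan = htons(chan), nlen = htons(len);

    memcpy(hdr, &nchan, 2);
    memcpy(hdr + 2, &nlen, 2);
    if (send(sock, hdr, sizeof(hdr), 0) != sizeof(hdr))
        return -1;
    return send(sock, payload, len, 0) == len ? 0 : -1;
}

The receiver just reads the 4-byte header, then `length` bytes, and
dispatches on the channel number.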
Regardless, we have no data as to whether or not AO is doing it that way.
Maybe tonight I'll run it in windowed mode and check netstat. If we've got
more than one active TCP connection to Funcom servers, then that model
probably isn't what they're using.
On a side note, using multiple TCP connections would eliminate some of the
packet-loss latency issues at the cost of increased bandwidth. Say you have
one connection for chat channels, one for inventory & stat handling, one
for world actions, and one for combat. If connection 1 drops a packet, its
lag won't affect the other connections as much. But of course, if they all
drop packets at the same time, we get the packet storm problem again. :)
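A rough sketch of that split (the four services are my example above, and
the sockets are assumed to be already connected, one per service): draining
them with select() means a retransmission stall on one socket never blocks
reads on the others.

/* Sketch: one TCP connection per service.  A dropped packet stalls
 * only its own socket; select() keeps draining the rest. */
#include <sys/select.h>
#include <sys/socket.h>

#define NCHAN 4  /* chat, inventory/stats, world actions, combat */

void pump(int socks[NCHAN])
{
    char buf[4096];
    fd_set rd;
    int i, maxfd;

    for (;;) {
        FD_ZERO(&rd);
        maxfd = -1;
        for (i = 0; i < NCHAN; i++) {
            FD_SET(socks[i], &rd);
            if (socks[i] > maxfd)
                maxfd = socks[i];
        }
        if (select(maxfd + 1, &rd, NULL, NULL, NULL) <= 0)
            break;
        for (i = 0; i < NCHAN; i++)
            if (FD_ISSET(socks[i], &rd))
                recv(socks[i], buf, sizeof(buf), 0); /* handle service i */
    }
}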
> Packet loss is a non-sequitur under TCP. You *cannot* lose packets under
> TCP :) (you lose connection first)
Yes, but TCP has latency issues and UDP has packet-loss issues. Why can't
we have the uber protocol that has neither??? :)
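For what it's worth, the usual compromise is selective reliability over
UDP: sequence and ack every datagram, but hold only packets flagged
reliable for retransmit, and let lost unreliable ones (movement deltas,
etc.) be superseded by newer state. A sketch, with an invented header
layout and sequence wraparound ignored:

/* Sketch: selective reliability over UDP.  Header layout invented. */
#include <stdint.h>

#define PKT_RELIABLE 0x01

struct pkt_hdr {
    uint16_t seq;    /* sender's sequence number            */
    uint16_t ack;    /* highest sequence seen from the peer */
    uint8_t flags;   /* PKT_RELIABLE: hold for retransmit   */
};

struct sent_pkt {
    uint16_t seq;
    uint8_t flags;
    /* payload and send timestamp omitted */
};

/* Drop acked entries from a retransmit queue of `n` packets and
 * return the new count.  Unacked RELIABLE entries older than the
 * retransmit timeout would be resent. */
int process_ack(struct sent_pkt q[], int n, uint16_t ack)
{
    int i, kept = 0;

    for (i = 0; i < n; i++)
        if ((q[i].flags & PKT_RELIABLE) && q[i].seq > ack)
            q[kept++] = q[i];            /* still in flight */
    return kept;
}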
BTW, does anyone know if IPv6 has addressed this issue? I'm aware of QoS
but not sure to what degree they've taken it. Personally, I think the only
way we could get guaranteed delivery with low latency is to have each
router along the way guarantee that a packet is delivered (if, of course,
its load allows the packet to be accepted in the first place). That way, if
a packet is dropped by a router due to load (or some other issue), the
previous router expects a timely response, and when it doesn't get one it
resends or sends via a different route. (Of course, I would expect a
per-packet ack, probably a CRC-ack for a certain amount of traffic.) The
point being that the original sender should never have to resend as long as
the first router gets all the packets.
-E.J. Wilburn
zane@supernova.org