I finished the original May 2011 version of this article and its linguistic metaphor a few days before coming across an article that described research showing the feasibility of identifying language patterns over encrypted channels.
One goal of an encryption algorithm is to create diffusion of the original content in order to camouflage that content’s structure. For example, diffusion applied to a long English text, say one of Iain M. Bank’s novels would reduce the frequency of the letter
e from the most common one to (ideally) an equally common frequency within the encrypted output (aka ciphertext).
The confusion property of an encryption algorithm would obscure meaning with something like replacing every letter
e with the letter
z, but that wouldn’t affect how frequently the letter appears – hence the need for diffusion.
Check out the Communication Theory of Secrecy Systems by Claude Shannon for the original (and far superior) explanation of these concepts.
There have been analyses of SSL and SSH that demonstrated how it was possible to infer the length of encrypted content (which therefore might reveal the length of a password even if the content is not known) or guess whether content was HTML or an image. The Skype analysis is a fantastic example of looking for structure within an encrypted stream and making inferences from those observations.