I have a new article [1] on Mashable regarding the importance of having https:// in front of the web sites you visit.
I finished that article and its linguistic metaphor a few days before coming across an article [2] on El Reg that describes research [3] showing the feasibility of identifying language patterns over encrypted channels.
One goal of an encryption algorithm is to create diffusion of the original content in order to camouflage the content’s structure. For example, diffusion applied to a long English text, say one of Iain M. Banks’s novels, would flatten the frequency of the letter ‘e’, taking it from the most common letter to (ideally) a frequency no different from that of any other symbol in the ciphertext. The confusion property of an encryption algorithm would do something like replace every letter ‘e’ with the letter ‘z’, but on its own that wouldn’t affect how frequently the letter appears, hence the need for diffusion.
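To make the ‘e’ to ‘z’ point concrete, here is a minimal Python sketch. The sample sentence, the `SWAP_E_Z` mapping, and the helper names are all made up for illustration; this is a toy substitution, not a real cipher. It shows that swapping letters changes which letter is most common but leaves the shape of the frequency distribution untouched.

```python
from collections import Counter
import string

def substitute(text, mapping):
    """Replace each character using a fixed mapping (confusion only, no diffusion)."""
    return "".join(mapping.get(ch, ch) for ch in text)

def sorted_frequencies(text):
    """Return letter counts sorted high to low, ignoring which letter is which."""
    counts = Counter(ch for ch in text if ch in string.ascii_lowercase)
    return sorted(counts.values(), reverse=True)

def most_common_letter(text):
    """Return the single most frequent lowercase letter in the text."""
    counts = Counter(ch for ch in text if ch in string.ascii_lowercase)
    return counts.most_common(1)[0][0]

# Hypothetical mapping for illustration: swap 'e' and 'z', leave the rest alone.
SWAP_E_Z = {"e": "z", "z": "e"}

plaintext = "the quick brown fox jumps over the lazy dog while the eels sleep"
ciphertext = substitute(plaintext, SWAP_E_Z)

print(most_common_letter(plaintext), most_common_letter(ciphertext))
print(sorted_frequencies(plaintext) == sorted_frequencies(ciphertext))
```

An observer who knows that ‘e’ dominates English text can simply guess that the dominant ciphertext letter stands in for it, which is exactly the structure that diffusion is supposed to hide.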
There have been similar analyses of SSL and SSH in the past showing that it was possible to infer the length of the encrypted content (which might reveal the length of a password even if the content is not known) or to guess whether the content was HTML or an image. The Skype analysis is a fantastic example of looking for structure within an encrypted stream and making inferences from those observations.
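The length leak is easy to see with a toy example. The sketch below is not SSL or SSH, and `toy_stream_encrypt` is a made-up helper that XORs the message with a SHA-256 based keystream purely for illustration (it reuses its keystream across messages, so never use it for anything real). It only assumes what is true of stream-style encryption without padding: the ciphertext is exactly as long as the plaintext.

```python
import hashlib

def toy_stream_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """XOR the plaintext with a SHA-256 counter keystream. Toy cipher, illustration only."""
    out = bytearray()
    for offset in range(0, len(plaintext), 32):
        # Derive 32 keystream bytes from the key and the current offset.
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = plaintext[offset:offset + 32]
        out.extend(p ^ k for p, k in zip(chunk, block))
    return bytes(out)

key = b"hypothetical-demo-key"  # placeholder value for the demo

for password in (b"hunter2", b"correct horse battery staple"):
    ciphertext = toy_stream_encrypt(key, password)
    # The bytes are scrambled, but the length is visible to anyone watching.
    print(len(password), len(ciphertext), ciphertext.hex())
```

Someone watching the wire learns nothing about the characters, but can still tell a 7-character password from a 28-character one. Padding messages to a fixed size is the usual way to blunt this kind of inference.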