[RUS] Overview of data clustering algorithms / Data Mining / Habrahabr

http://habrahabr.ru/blogs/data_mining/101338/#habracut short article [in Russian] that covers the basics of data clustering (sorting vectors into groups, a.k.a. clusters)
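The article surveys several clustering algorithms. As one common example (not necessarily presented this way in the article), here is a minimal k-means sketch: each vector is assigned to its nearest centroid, and each centroid then moves to the mean of its members, until the centroids stop changing. The function names and toy data are hypothetical, chosen only for illustration.

```python
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance; enough for nearest-centroid comparisons.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vector]) -> Vector:
    # Component-wise mean of a non-empty list of vectors.
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def kmeans(points: List[Vector], k: int, iterations: int = 100,
           seed: int = 0) -> Tuple[List[Vector], List[int]]:
    rng = random.Random(seed)
    centroids: List[Vector] = rng.sample(points, k)  # random initial centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each vector joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its old centroid).
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            new_centroids.append(mean(members) if members else centroids[c])
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, labels


# Usage: two well-separated groups of 2-D vectors.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The result depends on the random initial centroids, so in practice k-means is often run several times with different seeds and the best clustering is kept.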

Official Google Research Blog: All Our N-gram are Belong to You

http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html Google's word-frequency dataset (n-gram counts) for building n-gram models. Available as a download or on DVDs from the University of Pennsylvania (~24 GB).
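To show what "n-gram counts" means, here is a minimal sketch that counts word n-grams in a tiny text and derives an unsmoothed maximum-likelihood bigram estimate P(w2 | w1) = count(w1 w2) / count(w1). This is not how the Google corpus was built: the lowercase whitespace tokenization and the helper names are illustrative assumptions.

```python
from collections import Counter
from typing import Iterator, List, Tuple


def ngrams(tokens: List[str], n: int) -> Iterator[Tuple[str, ...]]:
    # Slide a window of width n over the tokens: zip shifted copies of the list.
    return zip(*(tokens[i:] for i in range(n)))


def count_ngrams(text: str, n: int) -> Counter:
    # Assumed tokenization: lowercase, split on whitespace.
    tokens = text.lower().split()
    return Counter(ngrams(tokens, n))


def bigram_probability(unigrams: Counter, bigrams: Counter,
                       w1: str, w2: str) -> float:
    # Unsmoothed maximum-likelihood estimate; unseen pairs get probability 0.
    if unigrams[(w1,)] == 0:
        return 0.0
    return bigrams[(w1, w2)] / unigrams[(w1,)]


text = "the cat sat on the mat the cat ran"
unigrams = count_ngrams(text, 1)
bigrams = count_ngrams(text, 2)
p = bigram_probability(unigrams, bigrams, "the", "cat")
print(bigrams[("the", "cat")], unigrams[("the",)], p)
```

With web-scale counts like those in the dataset, the same ratios become much more reliable, although real language models still add smoothing for n-grams that were never seen.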

The different attitudes of computer scientists and economists - A Computer Scientist in a Business School

http://behind-the-enemy-lines.blogspot.com/2009/09/different-attitudes-of-computer.html interesting article (and comments; yes, read the comments) about what data models actually mean

Datawocky: More data usually beats better algorithms

http://anand.typepad.com/datawocky/2008/03/more-data-usual.html interesting post (it made it to Slashdot, /.) about data mining and statistics. Make sure you read the comments as well.

Evolution and Wisdom of Crowds

http://karmatics.com/docs/evolution-and-wisdom-of-crowds.html excellent essay! Uses the “algorithm of evolution” to describe prediction markets and statistical prediction.