Monday, July 24, 2006

Python metablog, newsgroup, mail list, digest, survey, aggregator, portal, manual summary, link dumps

Python has a bunch (ruby has more: monstrous blogroll
loosely mirrored at

3 areas to become proficient in:
- metaprogramming (the links below),
- Design Patterns / O-O design (no recommendations yet), and
- unit testing / Test-driven Development (NRY also/either)
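
A tiny sketch of how the first and third items can meet, assuming a current Python 3: a class decorator (a small piece of metaprogramming) records classes in a registry, and a unittest case checks it. The names `REGISTRY`, `register`, and `Stemmer` are made up for illustration and don't come from any of the links.

```python
import unittest

# Hypothetical registry: maps a class name to the class object.
REGISTRY = {}


def register(cls):
    """Class decorator: record the class in REGISTRY and return it unchanged."""
    REGISTRY[cls.__name__] = cls
    return cls


@register
class Stemmer:
    """Toy class used only to show the decorator at work."""

    def stem(self, word):
        # Naive rule: strip a trailing "s" (not a real stemming algorithm).
        return word[:-1] if word.endswith("s") else word


class RegistryTest(unittest.TestCase):
    def test_class_is_registered(self):
        # The decorator ran at class-definition time, so the entry exists.
        self.assertIs(REGISTRY["Stemmer"], Stemmer)

    def test_lookup_by_name_and_call(self):
        # Look the class up by name, instantiate it, and use it.
        stemmer = REGISTRY["Stemmer"]()
        self.assertEqual(stemmer.stem("books"), "book")
```

Saved to a file, the tests should run with `python -m unittest <file>`.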

python-dev read this 1st

- Py wiki table of contents
- daily URL: 90's style link dump
- Assoc. Francophone Python: lots of information on Python; probably more ruby/rails mindshare now. French guys work on Zope, etc.
- YABR
- Dr. Dobb's Python-URL: no archive, but good to peruse

Sunday, July 09, 2006

NLP, IR, Data Mining books

- Jurafsky/Martin are updating SLP. Half the chapter drafts are up.

- Manning and Schütze, Foundations Statistical NLP. Get the 6th printing (2003), with most of the critical errata folded in (still have to look at the errata). <OT>Why is it so hard to find at the Stanford bookstore?</OT>

- Manning, Prabhakar Raghavan, Schütze: IR book out in Oct.

- Jackson and Moulinier, NLP Online Apps (2002). Recommended: a 200-page survey of retrieval, info extraction, categorization/clustering, and text mining. Footnotes on implementing: "labor intensive", "regular maintenance", etc.

- Oxford Handbook Comp Linguistics (review here)
- Charniak, Statistical Language Learning, 1993
- Norvig and Russell, AI

- Bod, Hay, Jannedy, eds. Probabilistic Linguistics

- Hal Daumé's excellent blog also suggests Allen's text
- Alias-I's Bob Carpenter's 20-book Amazon list. Excellent
Dissertations I like: Klein, Finn;
List of dissertations from this UMass student, again via Hal Daumé's blog
- Witten and Frank, Data Mining
- Han and Kamber, Data Mining
- Hand, Mannila, Smyth, Data Mining

- Kumar, Intro Data Mining
- Chakrabarti, Mining the Web. Excellent. Fairly rigorous math; covers conditional probability modelling, supervised, semi-supervised, and unsupervised inference, how to build crawlers, and graph algorithms (HITS, PageRank)

, Survey Text Mining
- Springer is doing a series of Web Intelligence books.

- Hatcher, Lucene in Action: thorough coverage of the Java open source search engine (how to roll it out, not architectural/algorithmic detail)

- Hemenway/Calishain: Spidering Hacks. Pragmatics, so your corpus gets collected in finite time. "Perl examples easily translated to ruby/python" is all I'll say about that ;-|

- Google Hacks
- Berry and Browne, Understanding Search Engines (2nd ed.)
  Excellent 100-page intro to the mechanics: stemming/stopwords, tf/idf, QR & SVD in C.
  Not discussed:
  - other matrix decompositions;
  - all the wrappers around SVDPACKC:
    - by Doug Rohde
    - by Stanford Computational Semantics lab
    - by UT-Austin
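
For a feel of the stopwords and tf/idf mechanics mentioned above, here is a minimal sketch in plain Python. It is not code from Berry and Browne: the stopword list, the whitespace tokenizer, the function names, and the particular weighting (raw count times log(N/df)) are all assumptions chosen to keep it short, and it skips stemming and the QR/SVD steps entirely.

```python
import math
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the book).
STOPWORDS = {"the", "a", "of", "and", "in", "to"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Weight = raw term count * log(N / document frequency),
    one common tf-idf variant among several.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


docs = [
    "the mining of the web",
    "data mining and text mining",
    "search engines in action",
]
for doc_weights in tf_idf(docs):
    # Print each document's terms, heaviest first.
    print(sorted(doc_weights.items(), key=lambda kv: -kv[1]))
```

Terms that show up in fewer documents ("web") get more weight than terms that show up in several ("mining"); the resulting per-document vectors are the kind of term-document matrix that SVD then gets applied to.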
Data Mining has hit the mass market: Borders misshelves Data Mining books with books for Oracle DBAs. A few bookstores stock a non-trivial number of the above books: Barnes & Noble NYC, Seminary Book Co-op Chicago, Powell's Portland, Stanford Bookstore. Cody's on Telegraph (Berkeley) *was* a wonderful bookstore. There must be others in LA/Pasadena and Seattle or Vancouver as well.

Browse Amazon, click on "Customers who bought this also bought:". They're pretty good at clustering ;-}