PyData on Vimeo

15 11 2012

PyData on Vimeo

There was a time when the go-to machine learning library was Weka, a behemoth of a Java library.





Learn how to jam

15 11 2012

LTE can be disabled with a $650 jammer, but not really | ExtremeTech

the MIT Technology Review boldly stated that LTE networks can be easily shut down with a “simple jamming trick” utilizing a cheap software-defined radio. However, they are incorrect in believing that the trick shuts down the network, or can shut it down city-wide.





Sarah Jessica Parker

14 11 2012

OK, this post actually has very little to do with the actress. It’s just that someone on YouTube replied to my comment on a Doctor Who clip, which was meant to refer to the Sarah Jessica Parker horse joke. He/she said that “Sarah” or “Jessica” was not a common name in the Wild West era. Well, I don’t want to start a fight with him/her; it just so happens that I’d been playing around with historical American baby name data, so I’m really tempted to figure out how popular those names were.

First, some background: the Doctor Who episode in question is called “A Town Called Mercy”, which aired in September this year. The episode features the Doctor going back to the Wild West era, with cowboys and stuff. In one scene, the Doctor, claiming that he speaks horse, contradicts a preacher, saying that the horse he’s about to ride is called Susan, not Joshua as the preacher had claimed.

There was no mention of the year in which the story takes place, so I can only infer it from a dialogue between the Doctor and Rory:

The Doctor: That’s not right.

Rory: It’s a street lamp.

The Doctor: An electric street lamp about ten years too early.

Rory: That’s only a few years out.

The Doctor: That’s what you said when you left your phone charger in Henry VIII’s own suite.

Given that Thomas Edison invented the electric light bulb in 1879, the events must have taken place in the 1860s or 1870s. My dataset starts in 1880, so it’s actually not so far off.

So I went ahead and processed the data with pandas to get the percentage, over time, of the name “Sarah” in the whole American population:

Hmm, in 1880 the name “Sarah” accounted for about 1.3 percent of the overall population. As we will see later, this is actually quite high. Maybe that is not so surprising, because Sarah is sort of a biblical name (well, I guess; at least it has some religious flavour in Islam, and I suppose it’s pretty much the same in the Bible). Extrapolating back in time from the graph, it could well have been even higher before 1880. So that kid’s comment is invalid! Sarah is a popular name…

By the way, I don’t distinguish between boys’ and girls’ names (the dataset actually does). I just sum the counts for both, as that is the only interesting number for my analysis.
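For the curious, here is a minimal sketch of how such a percentage could be computed with pandas. It is not the code behind the plots: the tiny table, its column names (year, name, sex, births) and all the counts are made up for illustration, and the real dataset may be laid out differently.

```python
import pandas as pd

# Toy stand-in for the baby-name data. Column names and counts are
# hypothetical, chosen only so the arithmetic is easy to follow.
names = pd.DataFrame({
    "year":   [1880, 1880, 1880, 1880, 1881, 1881, 1881],
    "name":   ["Sarah", "Sarah", "Susan", "John", "Sarah", "Susan", "John"],
    "sex":    ["F", "M", "F", "M", "F", "F", "M"],
    "births": [120, 5, 75, 800, 110, 90, 800],
})


def name_share(df, name):
    """Percentage of each year's births carrying `name`, both sexes combined."""
    # Total births per year, over all names and both sexes
    total = df.groupby("year")["births"].sum()
    # Births per year for the requested name, boys and girls summed
    hits = df[df["name"] == name].groupby("year")["births"].sum()
    # Years in which the name never appears get a share of 0
    hits = hits.reindex(total.index, fill_value=0)
    return (100 * hits / total).rename(name)


sarah = name_share(names, "Sarah")
print(sarah)
```

Calling sarah.plot() on the result would draw the share over time, provided matplotlib is installed.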

Then another plot for “Jessica”:

All right, “Jessica” seems to be a modern-world phenomenon. It gained some popularity in the 1960s, peaked in the late 1980s, and has been losing popularity ever since. Now, “Parker”:

Hmmm…. the name “Parker” is an even more recent phenomenon. What I find really interesting is the glitch slightly after the year 2000 and the continuing popularity of this name. Go ahead, let your imagination run free and relate this phenomenon to the release year of Spider-Man the movie… (Peter Parker, that is. You’re welcome.)

Now, about the relative proportion of the name “Sarah”; the following plot is a segment of the first plot, between 1880 and 1890, overlaid on the average proportion of all names in each year:

Here’s what it means: any (boy or girl) name between 1880 and 1890 accounts on average for only about 0.09 percent of the population. With 1.3 percent, “Sarah” is actually quite popular…
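A rough sketch of how that yearly average could be computed, again on a made-up table (the column names and counts are hypothetical, not the real data). Since the shares of all names in a year add up to 100 percent, the average share is simply 100 divided by the number of distinct names in that year.

```python
import pandas as pd

# Hypothetical toy data, same layout as assumed for the name-share sketch
names = pd.DataFrame({
    "year":   [1880, 1880, 1880, 1880, 1881, 1881, 1881],
    "name":   ["Sarah", "Sarah", "Susan", "John", "Sarah", "Susan", "John"],
    "sex":    ["F", "M", "F", "M", "F", "F", "M"],
    "births": [120, 5, 75, 800, 110, 90, 800],
})

# Births per (year, name), boys and girls summed
per_name = names.groupby(["year", "name"])["births"].sum()

# Share of each name within its year, in percent
year_total = per_name.groupby(level="year").transform("sum")
share = 100 * per_name / year_total

# Average share of a name in each year
avg_share = share.groupby(level="year").mean()

print(share.loc[(1880, "Sarah")], avg_share.loc[1880])
```

With only three distinct names per year in the toy table, the average comes out at a third of the births; a real dataset with thousands of names per year gives a much smaller number.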

What can we learn from this? If a horse in 1870 claims that her name is Sarah, we really should believe it. If a horse today claims that her name is Sarah Jessica Parker, I think that’s quite possible as well.

UPDATE. The name “Susan” is actually less popular than “Sarah” in 1880.





Clean code is…

13 11 2012

Andrey Paramonov : Weblog

Ward Cunningham: You know you are working on clean code when each routine you read turns out to be pretty much what you expected. You can call it beautiful code when the code also makes it look like the language was made for the problem.





Weekend bookmark

12 11 2012

…full of data analysis and machine learning stuff….

  • Google Refine, a power tool for working with messy data. Nice, but kinda slow when I loaded the US presidential candidate donation data from Wes McKinney’s pandas tutorial (ca. 500000 lines; I use an MBA with 4GB of RAM, and ca. 1 gig was available when loading the data into Refine….)
  • scikit-learn, machine learning in Python. A MUST TRY.
  • mrjob, Yelp’s open-source MapReduce package for Python.
  • dumbo, another Python mapreduce package. Not sure why Yelp created a new library (mrjob, that is) for the same purpose…
  • Nominatim, a kinda nice tool to get (latitude, longitude) coordinates from addresses or vice versa.
  • Seven Python libraries you should know about….
  • and, what seems to be the most exciting so far: Ramp, rapid machine learning prototyping, essentially a pandas wrapper around Python’s various machine learning and statistics libraries (scikit-learn, rpy2, etc.). 

A time full of excitement awaits us, folks…





Stepping through interactive commands in Python

9 11 2012

This week’s gem: Python’s pexpect module (http://www.noah.org/wiki/pexpect).

Annoyed by some test software that forces me to step through commands in interactive mode (telnet-like) I decided to look for something in Python to make my life easier. I started looking in subprocess but didn’t find anything useful. Someone in a forum suggested pexpect instead and I was thrilled….

So, for example, instead of typing “telnet localhost 7777” manually again and again (see here), I could have written the code below once and reused it as many times as I want, with very little typing:

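import pexpect as pe

# Start the telnet session; encoding='utf-8' makes pexpect return str
# instead of bytes on Python 3
ch = pe.spawn('telnet localhost 7777', encoding='utf-8')
ch.expect('mip6d')            # block until 'mip6d' shows up in the output
ch.sendline('verbose yes')
ch.expect('mip6d')
ch.sendline('bul')
ch.expect('mps')

# Everything received before the last matched pattern
out = ch.before

ch.sendline('quit')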

“out” now contains a string that can be parsed to get whatever information I want.

Me 1:0 Computer





Efficient market hypothesis

28 10 2012

The efficient market hypothesis roughly says that the “market” is an efficient information processor, in the sense that it processes all news that may affect the performance, and hence the value, of a company, and instantaneously reflects it in the share price (as the result of the buy and sell actions of the market’s actors).

If this hypothesis is true, then trying to find an “edge” in the market through news gathering and forecasting/modelling is an effort in vain. Such an effort means estimating the actual value of a company from its book value, news, and intrinsic value (the present value of the company’s future dividends), comparing it with the market value, and then deciding to buy or sell the company’s shares (i.e. hoping that the market will eventually reveal the true value of the company).
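To make the “intrinsic value” part concrete, here is a toy sketch of that valuation exercise. Everything in it is an assumption made for illustration: the dividend forecast, the constant 10 percent discount rate, the three-year horizon and the market price are all invented, and real valuation models are far more involved.

```python
def present_value(dividends, rate):
    """Discount a list of yearly dividends back to today.

    The first dividend is assumed to be paid one year from now and the
    discount rate is assumed constant (both are simplifications).
    """
    return sum(d / (1 + rate) ** t for t, d in enumerate(dividends, start=1))


# Hypothetical forecast: 5.0 per share for each of the next three years
dividends = [5.0, 5.0, 5.0]
rate = 0.10                      # assumed discount rate
market_price = 11.0              # assumed current share price

intrinsic = present_value(dividends, rate)

# The naive rule described above: buy if the share looks underpriced
signal = "buy" if intrinsic > market_price else "sell"
print(round(intrinsic, 2), signal)
```

If the efficient market hypothesis holds, the market price already reflects whatever went into the forecast, so a rule like this should not give a systematic edge.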