Exploring US Healthcare data

A few days ago, the Centers for Medicare and Medicaid Services (CMS) released some unprecedented data on the US healthcare system. The data consists of 9 million rows showing how much each doctor in the US charged Medicare, for what, and how much Medicare paid out. It doesn't quite cover everything (for example, services with fewer than 11 beneficiaries were removed for privacy reasons), but it's the best thing we've got. Immediately after the release,... »
Vik Paruchuri on r, python, data, visualization, healthcare, and us

Simple speech recognition in Python

Sometime today, I got the idea to try to do automatic speech recognition. Speech recognition, even though it is widely used (and is on our phones), still seems kind of sci-fi-ish to me. The thought of running it on your own computer is still pretty exciting. I looked for open source libraries, and was pleasantly surprised to find Sphinx, a CMU project. It has Python bindings, and even lets you train your own language models... »
Vik Paruchuri on python, speech, and scribe

An easy way to get started with automated essay scoring

Wow, it's been way too long since I have updated this blog! I am going to start making more frequent updates, and I have some cool things in the pipeline, so bear with me. Last year, I wrote this post on automated essay scoring. This was essentially distilling my experience with automated essay scoring and trying to introduce it to people unfamiliar with it. I got a lot of great reactions, and based on this... »
Vik Paruchuri on aes, essay, scoring, and python

What makes us happy? Let's look at data to find out.

I've had a lot of different jobs over the past 4 years, and I've had some incredible experiences along the way. Lately, I've been struggling with what to do next. Or perhaps more accurately, I've been struggling with how to decide what to do next. Decisions that seem obvious in hindsight are tough to come to grips with beforehand, and it's led me to think about what metric I am trying to maximize. I admit... »
Vik Paruchuri on happiness, happsee, somerville, r, leaflet, kaggle, and causation

Open sourcing movide, a student-centric learning platform

I haven't blogged in a while, mostly because I have been trying to figure out what I should do next. One thing that I have been working on lately that I am very passionate about is Movide. Movide is a student-centric learning platform. You might yawn at this point and wonder why Movide matters. It's a natural reaction, given the crowded learning tools marketplace. Movide matters, I think, because it is an open source attempt... »
Vik Paruchuri on lms, movide, learning, education, edtech, open, and source

The power, and danger, of visualizations

I recently posted about visualizing the voting patterns of senators. In the post, I scraped voting data for each senator on every vote in the 113th Congress from the Senate website, and then assigned a code of 0 for a no vote on a particular issue, 1 for a yes vote, 2 for abstention, and 3 if the senator was not in office at the time of the vote (i.e., a senator was replaced mid-term).... »
Vik Paruchuri on r, senate, visualization, mistakes, and svd
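
The vote coding described in this teaser can be sketched in a few lines of Python. This is only an illustration of the 0/1/2/3 scheme, not the code from the original post; the label strings and the names `VOTE_CODES` and `encode_votes` are hypothetical.

```python
# Hypothetical labels for the four cases named in the teaser.
VOTE_CODES = {
    "no": 0,             # voted no on the issue
    "yes": 1,            # voted yes on the issue
    "abstain": 2,        # abstained
    "not_in_office": 3,  # senator was not in office at the time of the vote
}


def encode_votes(votes):
    """Turn a list of vote labels into the numeric codes described above."""
    return [VOTE_CODES[vote] for vote in votes]


# One senator's (made-up) record across four votes.
example_record = ["yes", "no", "abstain", "not_in_office"]
encoded = encode_votes(example_record)
```

Note that the codes are arbitrary category labels, so their numeric order carries no meaning of its own.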

On the automated scoring of essays and the lessons learned along the way

We've all written essays, primarily while we were in school. The sometimes enjoyable process of researching the topic and composing the paper can take hours and hours of careful work. Given this, people react badly to the notion that their essays may be scored not by a human teacher, but by a machine. A piece of software coldly judging the quality of our carefully constructed phrases and metaphors based on unknown criteria is more than most... »
Vik Paruchuri on aes, asap, kaggle, edx, essay, scoring, discern, ease, and python

How divided is the Senate?

I very seldom pay attention to politics directly, because politics have always seemed a bit circular and cyclical to me. Most of the political news that I take in ends up worming its way into the news sources that I do consume, like the excellent longform.org. Even given my limited intake of political news, one trend that I have noticed lately is the increasing number of references to the Senate as "polarized" or "divided." Here... »
Vik Paruchuri on r, politics, senate, democrats, republicans, congress, and python

Programming instrumental music from scratch

I recently posted about automatically making music. The algorithm that I made pulled out interesting sequences of music from existing songs and remixed them. While this worked reasonably well, it also didn't have full control over the basics of the music; it wasn't actually specifying which instruments to use, or what notes to play. Maybe I'm being a control freak, but it would be nice to have complete control over exactly what is being played... »
Vik Paruchuri on r, python, music, markov chains, genetic algorithms, machine learning, ml, and instruments

Evolve your own beats -- automatically generating music via algorithms

Update: you can find the next post in this series here. I recently went to an excellent music meetup where people spoke about the intersection of music and technology. One speaker in particular talked about how music is now being generated by computer. Music has always fascinated me. It can make us feel emotions in a way few media can. Sadly, I have always been unable to play an instrument well. Generating music by computer... »
Vik Paruchuri on music, audio, sound, r, python, and remix

Making infographics using R and Inkscape

I have been making charts with R for almost as long as I have been using R, and with good reason: R is an amazing tool for filtering and visualizing data. With R, and particularly if we use the excellent ggplot2 library, we can go from raw data to compelling visualization in minutes. But what if we want to give our visualizations an extra kick? What if we want to do some manual retouching? I... »
Vik Paruchuri on r, infographics, inkscape, plotting, ggplot2, and visualization

Do the Simpsons characters like each other?

One day, while I was walking around Cambridge, I had a random thought -- how do the characters on the Simpsons feel about each other? It doesn't take long to figure out how Homer feels about Flanders (hint: he doesn't always like him), or how Burns feels about everyone, but how does Marge feel about Bart? How does Flanders feel about Homer? I then realized that I work with algorithms -- maybe I would be... »
Vik Paruchuri on r, python, machine learning, simpsons, sentiment analysis, nlp, and audio

Using the power of sound to figure out which Simpsons character is speaking

Update: you can find the next post in this series here. In a previous post, I looked at transcripts of Simpsons episodes and tried to figure out which character was speaking which line. This worked decently, but it wasn't great. It gave us memorable scenes like this one: Homer : D'oh! A deer! A female deer. Marge : Son, you're okay! Bart : Dad, I can't let you sell him. Stampy and I are friends.... »
Vik Paruchuri on nlp, simpsons, r, python, ml, and machine learning

Figuring out which Simpsons character is speaking

Update: you can find the next post in this series here. You probably have a favorite Simpsons character. Maybe you hope to someday block out the sun, Mr. Burns style, maybe you enjoy Homer's skill in averting meltdowns, or maybe you identify with Lisa's struggles for acceptance. Through its characters, the Simpsons made a huge impact on a generation, and although the show is still running, my best memories will be of the early seasons.... »
Vik Paruchuri on simpsons, percept, machine learning, ml, clustering, python, and r

Find the determinant of a matrix

The determinant of a matrix is a number associated with a square (n x n) matrix. The determinant can tell us if columns are linearly dependent, if a system has any nonzero solutions, and if a matrix is invertible. See the Wikipedia entry for more details on this. Computing a determinant is key to a lot of linear algebra, and by extension, to a lot of machine learning. It is easy to calculate the determinant for a... »
Vik Paruchuri on math, ml, machine learning, python, and matrix
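
As a rough illustration of what this post is about, here is a minimal pure-Python sketch that computes a determinant by cofactor expansion along the first row. It is not necessarily the method the full post uses; the function name `determinant` and the list-of-rows representation are assumptions made for this example.

```python
def determinant(matrix):
    """Determinant of a square matrix, given as a list of equal-length rows.

    Uses cofactor (Laplace) expansion along the first row. This is fine
    for small matrices but far too slow for large ones.
    """
    n = len(matrix)
    if n == 0 or any(len(row) != n for row in matrix):
        raise ValueError("determinant needs a non-empty square matrix")
    if n == 1:
        return matrix[0][0]
    if n == 2:
        # The familiar ad - bc formula for a 2 x 2 matrix.
        return matrix[0][0] * matrix[1][1] - matrix[0][1] * matrix[1][0]
    total = 0
    for col in range(n):
        # Minor: drop the first row and the current column.
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        sign = -1 if col % 2 else 1
        total += sign * matrix[0][col] * determinant(minor)
    return total


# The second column is twice the first, so the columns are linearly
# dependent and the matrix is not invertible.
singular = determinant([[1, 2], [2, 4]])
regular = determinant([[1, 2], [3, 4]])
```

A determinant of zero signals linearly dependent columns and a non-invertible matrix, which is the property the teaser describes.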