Back to basics – project coding

Going back to basics to sharpen my skills with some coding. Today I’m starting a project using Bureau of Labor Statistics (BLS) data. The dataset is massive:

BLS Mnemonic Structure
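To make the mnemonic structure concrete, here is a minimal Python sketch that splits a national Current Employment Statistics (CES) series ID into its fields. It assumes the national CES layout:

- a 2-letter survey prefix (`CE`);
- a seasonal-adjustment code (`S` or `U`);
- an 8-digit supersector/industry code;
- a 2-digit data-type code.

Other BLS surveys, and the state and metro CES series, use different layouts. `parse_ces_id` and `CesSeriesId` are hypothetical names.

```python
from typing import NamedTuple


class CesSeriesId(NamedTuple):
    survey: str     # "CE" for national Current Employment Statistics
    seasonal: str   # "S" = seasonally adjusted, "U" = not seasonally adjusted
    industry: str   # 8-digit supersector + industry code
    data_type: str  # 2-digit data-type code


def parse_ces_id(series_id: str) -> CesSeriesId:
    """Split a national CES series ID such as 'CES0000000001' into its fields.

    Assumes the 13-character national CES layout; raises ValueError otherwise.
    """
    sid = series_id.strip().upper()
    if len(sid) != 13 or not sid.startswith("CE") or sid[2] not in "SU" or not sid[3:].isdigit():
        raise ValueError(f"not a national CES series ID: {series_id!r}")
    return CesSeriesId(survey=sid[:2], seasonal=sid[2], industry=sid[3:11], data_type=sid[11:])
```

For example, `CES0000000001` (total nonfarm employment, seasonally adjusted) splits into industry code `00000000` and data type `01`.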

My primary objective is to get comfortable with this dataset. I want to automate the data pull and experiment with its various features. I plan to implement a hierarchical dynamic factor model using the national and regional data on leisure, employment, and wages. The factor model I’ll be implementing comes from the following paper:

Banbura, Giannone, Modugno, Reichlin (BGMR)

I chose this model because its code is written in Matlab and I eventually want to build on its nowcasting capabilities. I also happen to be fairly familiar with its mechanics from my previous work experience.
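BGMR estimate their dynamic factor model by maximum likelihood with the Kalman filter. Before getting there, a much simpler principal-components version of the national/regional hierarchy makes a useful sanity check. The sketch below is that swapped-in approximation, not BGMR's estimator. It extracts one national factor from all series, then one factor per region from what the national factor leaves unexplained. It assumes a balanced T x N panel (numpy array, no missing values), a list `regions` giving each column's region, and hypothetical function names.

```python
import numpy as np


def standardize(x):
    """Demean and scale each column of a T x N panel to unit variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


def first_pc(z):
    """First principal-component score series of a standardized T x N panel."""
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, 0] * s[0]


def hierarchical_factors(x, regions):
    """Two-level principal-components sketch (NOT the BGMR estimator).

    Returns a standardized national factor (length T) and a dict mapping each
    region label to a regional factor estimated from national-factor residuals.
    Factor signs are arbitrary, as usual with principal components.
    """
    z = standardize(x)
    national = standardize(first_pc(z)[:, None])[:, 0]
    # OLS loading of each (demeaned) series on the national factor, then residuals
    loadings = national @ z / (national @ national)
    resid = z - np.outer(national, loadings)
    regional = {}
    for name in sorted(set(regions)):
        cols = [i for i, r in enumerate(regions) if r == name]
        regional[name] = first_pc(standardize(resid[:, cols]))
    return national, regional
```

The two levels are estimated one after the other, so some regional variation leaks into the national factor. A proper hierarchical model would estimate both levels jointly.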

With the dataset and model in mind, I can then work on automating the content that appears on this WordPress blog and start thinking about how to archive the results. Any model built on this dataset should run on quarterly and monthly data, and I should aim to produce output every month according to the BLS release calendar. The BLS has pretty crummy availability of vintage data, unlike Federal Reserve Economic Data (FRED) with its archival version (ALFRED). For this reason I’ll mainly concern myself with getting results for the current data release up in an automated way.
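Running one model on both monthly and quarterly series needs a rule linking the two frequencies. Now-casting models in the BGMR tradition commonly use the Mariano and Murasawa (2003) approximation. If the quarterly log level is the average of the three monthly log levels, then quarterly growth is a weighted sum of the last five monthly log-growth rates: y_t^Q = (1/3)(y_t + 2y_{t-1} + 3y_{t-2} + 2y_{t-3} + y_{t-4}). A minimal sketch follows; the function name is hypothetical.

```python
def quarterly_growth_from_monthly(monthly_growth):
    """Mariano-Murasawa mapping from monthly log-growth to quarterly log-growth.

    Element t is (1/3) * (y_t + 2 y_{t-1} + 3 y_{t-2} + 2 y_{t-3} + y_{t-4});
    the first four entries are None because five monthly values are needed.
    """
    weights = (1, 2, 3, 2, 1)  # weights on y_t, y_{t-1}, ..., y_{t-4}
    out = []
    for t in range(len(monthly_growth)):
        if t < len(weights) - 1:
            out.append(None)
        else:
            out.append(sum(w * monthly_growth[t - k] for k, w in enumerate(weights)) / 3)
    return out
```

The weights sum to 9, so a constant monthly growth rate g maps to a quarterly rate of 3g, as it should.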

The next steps for this project are storing the generated output (charts and CSV files) somewhere public so I don’t have to worry about storage costs. After that, I’d like to experiment with Google Trends to add higher-frequency data.

For data collection I plan to use bls-matlab, written by my friend Micah Smith. I want to focus on the spatial aggregation of the data, and on how a tiered (hierarchical) version of the BGMR DFM might yield interesting results about the cross-section of employment and activity dynamics across US industries. This is loosely similar to the concept I submitted to the NSF.
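bls-matlab's own interface isn't reproduced here. For comparison, here is a minimal standard-library Python sketch of a direct pull from the BLS Public Data API v2. It assumes the v2 endpoint accepts a JSON POST with:

- `seriesid`, `startyear`, and `endyear`;
- an optional `registrationkey`.

It also assumes observations come back under `Results` → `series` → `data`. Check the BLS API documentation for current request limits. The function names are hypothetical.

```python
import json
import urllib.request

BLS_API_URL = "https://api.bls.gov/publicAPI/v2/timeseries/data/"


def build_request(series_ids, start_year, end_year, api_key=None):
    """Build a JSON POST request for the BLS Public Data API v2."""
    payload = {"seriesid": list(series_ids),
               "startyear": str(start_year),
               "endyear": str(end_year)}
    if api_key:
        payload["registrationkey"] = api_key  # optional registration key
    return urllib.request.Request(
        BLS_API_URL,
        data=json.dumps(payload).encode("utf-8"),  # data present -> POST
        headers={"Content-Type": "application/json"},
    )


def parse_monthly(response):
    """Return {series_id: [(year, month, value), ...]} sorted by date,
    skipping annual averages (period 'M13') and non-numeric values."""
    out = {}
    for series in response["Results"]["series"]:
        rows = []
        for obs in series["data"]:
            period = obs["period"]
            if not period.startswith("M") or period == "M13":
                continue
            try:
                value = float(obs["value"])
            except ValueError:
                continue  # e.g. "-" for an unavailable observation
            rows.append((int(obs["year"]), int(period[1:]), value))
        out[series["seriesID"]] = sorted(rows)
    return out


def fetch_monthly(series_ids, start_year, end_year, api_key=None):
    """Send the request and parse the JSON reply (requires network access)."""
    request = build_request(series_ids, start_year, end_year, api_key)
    with urllib.request.urlopen(request) as resp:
        reply = json.load(resp)
    if reply.get("status") != "REQUEST_SUCCEEDED":
        raise RuntimeError(f"BLS API error: {reply.get('message')}")
    return parse_monthly(reply)
```

Separating `parse_monthly` from the network call makes it easy to test against a saved reply and to rerun the pull on the BLS release schedule.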



Basketball – Nothing but Net

I recently had a horrible morning playing hoops. I had a particularly unlucky set of shots that rimmed out. Anyone who plays basketball knows how that feels. We also know how great it feels to hit nothing but net, sinking the shot with a rather inspiring sound…“swish…water baby.” Although a “nothing but net” shot happens in almost every pickup game (even a novice can get lucky), the majority of shots hit the rim or the backboard. It is rather rare for a shot to go straight in. I was curious what effect an errant angle of, say, 1% could have on my shot from inside the free throw line. I thought you could calculate the likelihood of hitting a shot from various distances on the floor, assuming a given level of error in the shot attempt. Turns out there is a deep literature on shot accuracy…duh, no-brainer, it’s a professional sport. I was only considering left/right error in shot accuracy, but there is of course up/down error and rotation, and one paper I found even discusses the following:

“Nonlinear ordinary differential equations describe three components of ball angular velocity and contact point position on the toroidal rim.  The model includes radial ball compliance and dissipation and contains three sub-models describing slipping contact, nonslipping contact and purely gravitational flight.”

Purely gravitational flight…are we simplifying the model too much by ignoring relativity and the Coriolis force? Damn, these guys got pretty serious about the explanatory variables for making a shot in basketball.
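For contrast, here is the back-of-the-envelope version I originally had in mind, as a minimal sketch under loudly stated assumptions:

- The only error is a normally distributed left/right aim angle with standard deviation `sigma_deg` degrees.
- The ball travels in a straight line when viewed from above.
- A shot is “nothing but net” when the ball's center passes within (rim diameter - ball diameter)/2 of the rim's center.
- The rim is the regulation 18 inches; 9.4 inches is my approximation of a men's ball diameter.
- Reading the “1%” above as an error of about 1 degree is my assumption.

It ignores up/down error, rotation, and everything else the literature models. Under these assumptions the probability is erf(atan(c/d) / (sigma * sqrt(2))), where c is the clearance and d the distance.

```python
import math

RIM_DIAMETER_IN = 18.0   # regulation rim diameter
BALL_DIAMETER_IN = 9.4   # approximate men's ball diameter (assumption)
CLEARANCE_IN = (RIM_DIAMETER_IN - BALL_DIAMETER_IN) / 2  # lateral slack for a swish


def swish_probability(distance_ft, sigma_deg):
    """P(|lateral miss| < clearance) when the aim angle ~ Normal(0, sigma_deg).

    Left/right error only, straight-line flight seen from above.
    """
    max_angle = math.atan(CLEARANCE_IN / (12.0 * distance_ft))  # radians
    sigma = math.radians(sigma_deg)
    return math.erf(max_angle / (sigma * math.sqrt(2.0)))


# Distances in feet to the rim center: 13.75 ft is roughly the free throw line,
# 22 ft and 23.75 ft are the NBA corner and top-of-arc 3-point distances.
for feet in (5, 10, 13.75, 22, 23.75):
    print(f"{feet:>6} ft: {swish_probability(feet, sigma_deg=1.0):.3f}")
```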

I’ve always been more of the empirical type, and I’m definitely not interested in differential equations for theorizing about basketball performance. This leads me to the real questions I haven’t been able to answer:

  • The percentage of “nothing but net” shots in a professional league like the NBA, perhaps compared with European leagues to see whether there is any significant difference.
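If swish counts existed, the league comparison could be a standard two-sided two-proportion z-test. Here is a minimal standard-library sketch; the counts in the example call are made-up placeholders, not real NBA or European data.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(swish_a, shots_a, swish_b, shots_b):
    """Two-sided z-test of H0: equal 'nothing but net' rates in leagues A and B.

    Uses the pooled proportion under H0 for the standard error.
    Returns (z statistic, p-value).
    """
    p_a, p_b = swish_a / shots_a, swish_b / shots_b
    pooled = (swish_a + swish_b) / (shots_a + shots_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / shots_a + 1 / shots_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# hypothetical placeholder counts, for illustration only
z, p = two_proportion_z_test(swish_a=420, shots_a=2000, swish_b=380, shots_b=2000)
print(f"z = {z:.2f}, p = {p:.3f}")
```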

Outside the purely analytical, a friend once tried explaining to me that American football has a statistically perfect allocation of points: 7 for a touchdown (assuming the extra point is made) and 3 for a field goal. My friend was trying to convince me that the 3/7 ratio of field-goal to touchdown points exactly reflects how much harder a touchdown is to score than a field goal. I remembered this discussion as I left the court on this miserable day. My next question concerns the theoretical model of shot accuracy at a given distance.

  • Given the vast literature on the subject, what is the statistically appropriate distance for the 3-point line? Take as given the probability of a player’s errant shot, a uniform distribution of the angles from which shots originate on the court, and the function describing how distance affects player error. The literature should then yield a 3-point line distance, under a given amount of player error, such that the 2/3 ratio (2-pointer to 3-pointer) is a statistically appropriate reward for the probability of a shot’s success under those accuracy assumptions.
    • One could then take the model to the data and check the empirical level of accuracy by looking at shots that touch the rim.
    • With the empirical level of accuracy, one could calculate the statistically appropriate distance for the 3-point line. A 4-point line is being discussed for the NBA… how far back should it be?
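The steps above can be sketched with a toy left/right-only model. The assumptions are normal aim error, an 18-inch rim, and a ball of roughly 9.4 inches, with a clean swish counted as the only make (which understates real make rates). The procedure has two steps:

1. Back out the aim error sigma from an observed share of shots that never touch the rim at a reference distance d_2.
2. Search for the distance d_k at which a k-point shot has the same expected value as a 2-point shot from d_2, i.e. k·P(d_k) = 2·P(d_2).

Equal expected points is my reading of “statistically appropriate reward,” not an established rule. The 30% swish share and 15 ft reference distance in the example are placeholders, since I haven’t found rim-contact data. All function names are hypothetical.

```python
import math
from statistics import NormalDist

CLEARANCE_IN = (18.0 - 9.4) / 2  # (rim diameter - approx. ball diameter) / 2, inches


def swish_probability(distance_ft, sigma_rad):
    """Toy model: P(|aim error| < atan(clearance / distance)), aim error ~ N(0, sigma)."""
    max_angle = math.atan(CLEARANCE_IN / (12.0 * distance_ft))
    return math.erf(max_angle / (sigma_rad * math.sqrt(2.0)))


def calibrate_sigma(observed_swish_rate, distance_ft):
    """Invert the toy model: the sigma (radians) that reproduces an observed swish rate."""
    max_angle = math.atan(CLEARANCE_IN / (12.0 * distance_ft))
    z = NormalDist().inv_cdf((1.0 + observed_swish_rate) / 2.0)  # P(|Z| < z) = rate
    return max_angle / z


def break_even_distance(points, sigma_rad, ref_distance_ft, ref_points=2, max_ft=200.0):
    """Bisection for d such that points * P(d) = ref_points * P(ref_distance_ft).

    Relies on P(d) decreasing in d; requires points > ref_points.
    """
    if points <= ref_points:
        raise ValueError("points must exceed ref_points")
    target = ref_points * swish_probability(ref_distance_ft, sigma_rad) / points
    if swish_probability(max_ft, sigma_rad) > target:
        raise ValueError("no break-even distance below max_ft")
    lo, hi = ref_distance_ft, max_ft
    for _ in range(100):
        mid = (lo + hi) / 2
        if swish_probability(mid, sigma_rad) > target:
            lo = mid  # still too easy: move the line farther out
        else:
            hi = mid
    return (lo + hi) / 2


# placeholder inputs: suppose 30% of 15 ft shots touched nothing but net
sigma = calibrate_sigma(0.30, distance_ft=15.0)
print(f"sigma ~ {math.degrees(sigma):.2f} degrees")
for pts in (3, 4):
    print(f"{pts}-point line: {break_even_distance(pts, sigma, ref_distance_ft=15.0):.1f} ft")
```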


With these questions in mind, I looked for shot-accuracy data, with accuracy defined by rim contact. It doesn’t appear that such a dataset exists. It could be a useful one to collect given the NBA 4-point line discussion. This rim-contact (or nothing-but-net) data would be needed to begin this type of empirical analysis.

This is exactly the type of analysis I was looking for:

A few of my introductory econometrics students developed an NBA scraping tool. They were able to collect data on dunks and other miscellaneous statistics.

Looking through the NBA play-by-play data, there is no record of “nothing but net.” Very sad. If there were, it could help describe shot accuracy. Maybe it would just be a tool for describing Steph Curry’s anomalous ability, or maybe just another statistic in a world of big data, but… this statistic seems like it could be more informative for player scouting and league development than dunks. Gathering data is not always easy…