Due to a setback in how my nfl project was progressing in the handling of game statistics, I decided to move back to my nba project. The nba project was my first attempt at the scraping idea, but grew stagnant as it moved towards MVP. Though I had successfully built (and then gone back and tested) the code to download the game data and player data (and then rebuilt it when the original connection method failed), I had never actually gotten skilled enough to write Ruby or Rails code to query (analyze) data in any useful way, so I decided that now was the time.

I enjoy building SQL queries. Be it with my own database that I built from scratch or being given an older database with tables and relations built (sometimes very poorly documented), I really enjoy figuring out how to extract the data that people want to see. It’s fun. In fact one of the more fun things I’ve ever done was help out someone determine the daily standings of a major league baseball season given an excel sheet with the records of every team every day (oh yeah, it had to be by division too). It was a long time ago, but I remember that it required two aliases of the same table and took me about three hours to figure out.

Rails does allow you to use raw SQL to query your data, but it’s not ideal, if you don’t have to. Not only is it a bit clunky, it isn’t portable to other databases systems all the time, and it can create security holes if you aren’t really careful. Rails provides a variety of builtin methods to be able to write your queries through the Rails model and this not only creates a layer of protection against attack (if you write them properly) but also makes the query writing database independent. Rails will translate the code into the proper query for whatever database system you are connected to which is pretty cool.

A big part of the motivation for this attention change was that the process has started yielding excellent fruit. I still remember the moment (to me) that the process started. I was driving home in my car (I’m in California as I write this) listening to the NBA draft. I whooped in my car, for I knew that the the trade that the Sixers had just made, for a player who would have probably been number one if not injured indicated a tectonic shift in how the Philadelphia Seventy-Sixers were going to approach team building, and it was a shift I had been hoping for since The Answer left town. I realize it’s not really relevant to this specifically, but motivation for something, I find, is always helpful.

I have discussed in previous articles how I would obtain a useful data set, use it to populate the test database, and clean up the test database. The fact that I had set this up previously meant I had a solid amount of data (one month of games from 2015) with which I could run tests against, but first I had to know what numbers I was actually looking for, but fortunately, I realized, I could do that as well, in a way that wouldn’t (I hoped) circumvent the learning and testing process.

Finding the Proper Result to Test

The proper way to write your test driven code as stated before is red-green-refactor. At first your test must fail. Then you write the code to make it pass. To be able to write a failing test properly, I would need to know not only what statistics I wanted to work with, but what the result should be before I wrote the code to extract it through and Active Record Query.

Lucky for me, I realized that the articles linked above, which focused purely on the testing environment, provided me with a way to solve this conundrum. Making sure I was in the right folder on the command line, I used the same command I had used in an earlier article, but just changed the target database: psql -d nba_development -f spec/dumps/query_data_source.sql

So, now I had my data to work with, and was ready to start the testing process by writing my failing test, but to know what the test result should be, I would need to access the development database via the Postgres interface and write the raw SQL query that I wanted to convert into Ruby on Rails code, and I decided to start with an easy one first.

Select SUM(twosmade) * 1.0 / SUM(twostaken) 
FROM statistics;

Note: The 1.0 is necessary as this data is stored as whole integers in Postgres, so without converting one of the values to a decimal, you’d end up with zero as a result

This query in postgres yielded a very long decimal, which is fine, but I wanted my method to yield a percentage to two decimal places, so converting 0.479280297 from Postgres, I could write my test and expect the answer to be 47.93.

For those wondering, I store player game statistics as twos, threes, and frees, made and taken, as opposed to the standard field goals (twos and threes together) storage in box scores. I always felt it should be more like this, plus the consistency of the naming helped in building one class method when I got to it

I ran a similar SQL query on both the frees and threes option and determined what those results would be as well, and then I was ready to work on the tests and build the methods in Ruby to get the same results.

Translating the SQL into Ruby

The first step, as usual, is to write the tests that fail, so I can write the code that makes them pass. The key line is:

expect(Statistic.averages("twos").to eq 47.93

The averages would be a method I would build that would take one parameter (in the example above, twos), and then it would process the made and taken information for the set of data to yield a percentage like the 47.93 above in the ruby code.

To get the answer I would want, I would have to use the built-in to_f method. When dividing two integers, the default Ruby response is to return the whole number portion only and ignore any kind of decimal/remainder option. In this case if i just divided the two quantities, I would end up with zero. A second issue that would have to be dealt with is rounding. If you use the to_f without any rounding you’ll get a very long answer like in the SQL query above. I realize I could clean it up in the view, but it makes sense to me to clean it up in the method in the model so the data is clean from the jump.

Remembering what I had learned earlier in my NFL project, I came up with a method on the statistic model that did the trick:

  def self.averages(stat)
    made = pluck("#{stat}made".to_sym)
    taken = pluck("#{stat}taken".to_sym)
    (made.sum/taken.sum.to_f).round(4) * 100
  end 

As opposed to the previous example in the NFL project, the pluck working on only one attribute returns a simple array that can be easily summed. A little tweaking on exactly how to format the to_f within the parentheses was required, but that code above yielded the 47.93 properly and the first test passed, so now I just had to write the tests for threes and frees and they’d pass with flying colors as well, or so I thought.

The Trouble with Floats

The method I wrote that worked on the twos rounds a float, and since the first test I wrote worked, I didn’t put a lot of thought into what a float is or that rounding floats within Ruby and Rails isn’t always precise. (I can’t point to a specific link but if you search rounding floats on any variety of search engines you’ll find a lot of information about it). Thus, it turns out the success of my first test working as written was a bit of a fluke that I was unaware of, until the second test.

The second test, on threes was written similarly to the first test on twos:

  expect(Statistic.averages("threes").to eq 34.91

Since the test on twos passed, this should have just been confirmation that the method worked on another similarly set up group of attributes, but, the test failed, with this message:

expected: 34.91
     got: 34.910000000000004

This was a rather surprising outcome as the result that was got seems to violate the .round(4) portion of the written method. This issue to be dealt with and since the research I did online did indicate that this was an issue inherent working with floats, I had to hope that RSpec would have a solution that I could work with.

Using eq tests equality. That means that your result has to match exactly what you said you were looking for. From looking at it above, the answer mostly matches 34.91, but it’s not exactly 34.91. Therefore, I had to find a way with RSpec to say that my result should be close to a given value. A little research via google queries led me to to the be_within matcher functionality in RSpec.

This seemed like it would solve the issue I was having, so I rewrote my test expectation to use the newly discovered be_within option:

expect(Statistic.averages("threes").to be_within(.001).of(34.91)

I ran my test expecting success, and sadly, it failed. It failed not because I had used be_within improperly but an examination of the error text found this tidbit no . floating literal anymore; put 0 before dot (SyntaxError). So it seems at some point you could write .001 in Rails, but now you had to use the leading 0. .001 bad, 0.001 good. So one more small tweak to the test expectation:

expect(Statistic.averages("threes").to be_within(0.001).of(34.91)

and voila, that test had passed. So, then I moved on to testing the frees in a similar method, and that test passed on the first try. A quick refactoring of my first test (since the eq working was more of a fluke it seems), and I had conquered my first, albeit low, hurdle in trying to convert what I knew I could create in SQL into code within Ruby on Rails.

So there you have it. One small step for John, one giant step for, um John as well? It’s not much, I know, a simple SQL query turned into a basic method in Rails, but it is a first step towards working with more complicated ideas (for instance parsing the same information based on whether the game was home or away), and realistically there isn’t a lot of Active Record Query work in this, but maybe that shows me that my thinking on how I want to convert my SQL queries into useful Rails code being purely a function of the query methods has been wrong from the get go. What I do know is that solving this rather simple query has given me the confidence to move forward on converting even more complicated SQL queries into clean Ruby on Rails code. So that’s what we’re going to start doing.