Greetings true believers, or you know, folk who happened upon this article somehow.

As you may (or may not depending on if you read Step One of Testing Models) remember, due to a variety of external and internal motivations, I had started facing my fears regarding testing the Models of my NBA statistics application. A small reminder that this application populates numerous models by making multiple calls to an external web site to gather raw data that is then parsed and populated into the correct database tables.

In Step One, I described the journey that led me to successfully test the first two Models I was working with, the Game Model and the Participant Model. Since then I have done one more test to determine that when the information regarding a player is downloaded, it is inserted into the Player Model successfully and that the proper data is populated into the proper fields.

In this article (creatively named Step Two), I will do what I consider the hardest part of this beginning testing of Models. The Statistic Model is where all player box-score information is stored. This results in a series of web calls for each game on a given date, after the Game Model and Participant Model have the relevant information added. In addition, when populating the Statistic Model, it must be verified that the player currently exists in my database (via the Player Model), which I have already tested.

So, remember, I’m going about this backwards. That means the code to successfully do this already exists, but I need to isolate a specific game and then stub out the right methods so that:

  1. The process I want to run runs cleanly without errors.
  2. The process runs properly as well.
  3. I can test the results of the process so that I can back my way into the passing tests that show my methods work.

So without further ado, on with the show, shall we?

Day One - April 4, 2016 - 1.5 hours, after work.

As stated previously, I had downloaded JSON files to use as source data for running my tests. The first thing I needed to do (since it was a while ago) was review the JSON to identify which game I had gotten the information for. (You didn’t think it was going to be as easy as just running the tests, did you? Sadly, no. The test databases ‘empty themselves’ after each set of tests is run, which makes sense. However, that means that to properly test this, I will have to populate the Game and Participant models before running my tests so that I can check that all my relationships will yield the proper set of records.)

After examining my creatively called JSON file statistics_test.json, I verified that the game we were dealing with had an NBA-assigned id of 0021100001. (That’s right, the NBA makes it difficult by using leading zeroes. Computers drop leading zeroes when a value is treated as a number, so the identifiers in our various databases may need to be stored as strings.)
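To make the leading-zero problem concrete, here is a minimal sketch in plain Ruby. The variable names are illustrative only, and the padding width of 10 is an assumption based on this single id.

```ruby
# The NBA game id from the JSON file, kept as a string.
nba_game_id = "0021100001"

# Converting to an integer silently drops the leading zeroes.
as_integer = nba_game_id.to_i
round_trip = as_integer.to_s

# Padding back only works if the width is known in advance
# (10 characters is assumed here, based on this one id).
padded = as_integer.to_s.rjust(10, "0")

puts round_trip == nba_game_id   # false: the zeroes are gone
puts padded == nba_game_id       # true, but only because we guessed the width
```

Storing the id as a string avoids having to guess the width at all.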

Since I had already successfully tested the first two parts of the ‘getting the stats’ process, I could copy over the information from my Participant testing file to my Statistic testing file within the before do clause to get the needed ‘starter data’ going. So that’s what I did, and just to make sure, I ran the test file to see if anything broke. Thankfully, nothing did.

Now, because I thought I was being slick at the time, the player statistics for a given game are actually populated from the Player model, using a method I had to stub out to get the preliminary tests set up, which could present a problem.

This took me back to a previous meet up where another mentor talked about some rules of programming and the concept of a seam (as it turns out, after some research that’s what he meant, a seam).

Now, what’s a seam in regards to programming? After some research online, I discovered this simple definition:
A seam is a place where you can alter behavior in your program without editing in that place.
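One way to picture that definition is the minimal sketch below. The BoxscoreReader class and its fetcher argument are hypothetical and are not part of the application described here; the JSON shape mirrors the `["resultSets"][0]["rowSet"]` structure used in this article.

```ruby
require "json"

# Hypothetical class for illustration only.
class BoxscoreReader
  # The fetcher argument is the seam: production code would pass something
  # that downloads the box score, while a test can pass any object that
  # responds to call(game_id) and returns a JSON string.
  def initialize(fetcher)
    @fetcher = fetcher
  end

  # Returns the array of per-player rows for a game.
  def player_rows(game_id)
    boxscore = JSON.parse(@fetcher.call(game_id))
    boxscore["resultSets"][0]["rowSet"]
  end
end

# A canned response stands in for the web call, so player_rows itself
# never has to be edited to be tested.
canned = ->(_game_id) { '{"resultSets":[{"rowSet":[["0021100001",1,"X","Y",42]]}]}' }
rows = BoxscoreReader.new(canned).player_rows("0021100001")
puts rows.length
```

The behavior of player_rows changes depending on what is passed in, without touching the method body, which is the sense of "altering behavior without editing in that place".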

I’m still not 100% sure what that means, but looking at the code I had written, I could see that I could break it up into two parts. That might make more sense programmatically, but it would also make the code easier to test and tweak, and if I did it right, I wouldn’t break my previously existing steps. As it currently exists, the code looks like this:

def self.get_playerstats(game)
  search_string = "http://stats.nba.com/stats/boxscoretraditionalv2?EndPeriod=10&EndRange=28800&GameID=#{game}&RangeType=2&StartPeriod=1&StartRange=0"
  boxscore_link = URI(search_string)
  boxscore = JSON.parse(Net::HTTP.get(boxscore_link))
  player_game_info = boxscore["resultSets"][0]["rowSet"]
  player_game_info.each do |player|
    player_id = player[4]
    checkplayer(player_id)
    Statistic.insert_player_stats(player)
  end
end

So, the best way I saw to do this was to use player_game_info to check all the players first while looping through them as above, but instead of adding the statistics of each player individually, just send the entire player_game_info array to the Statistic Model and alter the insert_player_stats method to loop through it a second time. This not only maintains my primary functionality but also means only calling the Statistic Model once, which has to be a good thing, I would think.

So over to the Statistic Model I headed, and altered that insert_player_stats method to take an array and loop through it as opposed to the original code (above) sending each index of said array individually after checking for the player’s existence within the database representing the Player Model.

The method above was now broken into two methods, one in the Player Model:

def self.get_playerstats(game)
  search_string ="http://stats.nba.com/stats/boxscoretraditionalv2?EndPeriod=10&EndRange=28800&GameID=#{game}&RangeType=2&StartPeriod=1&StartRange=0"
  boxscore_link = URI(search_string)
  boxscore = JSON.parse(Net::HTTP.get(boxscore_link))
  player_game_info = boxscore["resultSets"][0]["rowSet"]
  player_game_info.each do |player|
    checkplayer(player[4])
  end
  Statistic.insert_player_stats(player_game_info)
end

and one in the Statistic Model:

def self.insert_player_stats(statistics_array)
  statistics_array.each do |statline|
    player_stats = Statistic.new
    game_id = Game.find_by(nbacomid: statline[0])
    team_id = Team.find_by(nbacomid: statline[1])
    player_stats.player = Player.find_by(nbacomid: statline[4])
    player_stats.participant = Participant.find_by(game_id: game_id, team_id: team_id)
    player_stats.starter = is_starter?(statline[6])
    player_stats.time_played = get_seconds(statline[8])
    player_stats.twosmade = statline[9] - statline[12]
    player_stats.twostaken = statline[10] - statline[13]
    player_stats.threesmade = statline[12]
    player_stats.threestaken= statline[13]
    player_stats.freesmade = statline[15]
    player_stats.freestaken = statline[16]
    player_stats.oreb = statline[18]
    player_stats.dreb = statline[19]
    player_stats.assists = statline[21]
    player_stats.steals = statline[22]
    player_stats.blocks = statline[23]
    player_stats.turnovers = statline[24]
    player_stats.fouls = statline[25]
    player_stats.plusminus = statline[27]
    player_stats.save
  end
end

So now I had successfully separated those two methods (that should have been separated in the first place) and altered the code to maintain the integrity of the original method (not to mention maintaining arguments and such so that I didn’t have to rewrite too much code in insert_player_stats) while making the method in the Statistic Model more independent. This makes for (hopefully) cleaner code, and definitely code that is easier to test.

Next, I set up a variable that parsed my aforementioned JSON file and accessed the proper index to replicate player_game_info above. I set up my before statement to pass that variable to the newly altered insert_player_stats method, and while no new tests had been written yet, the code ran without error and without failing any of the existing (and basic) shoulda-matchers tests.
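The setup described above might look roughly like the following sketch. It is written as plain Ruby rather than as an RSpec before block so it can run on its own, and a small temporary file with made-up rows stands in for the real statistics_test.json.

```ruby
require "json"
require "tempfile"

# Stand-in for statistics_test.json; the real file holds a full box score.
# The two rows here are invented purely for illustration.
fixture = Tempfile.new(["statistics_test", ".json"])
fixture.write('{"resultSets":[{"rowSet":[["0021100001",1,"A","B",101],["0021100001",2,"C","D",202]]}]}')
fixture.close

# Parse the file and drill down to the same structure that
# get_playerstats calls player_game_info.
parsed = JSON.parse(File.read(fixture.path))
player_game_info = parsed["resultSets"][0]["rowSet"]

# Every row should belong to the same game.
game_ids = player_game_info.map { |row| row[0] }.uniq
fixture.unlink

puts player_game_info.length
```

In the actual spec, player_game_info would then be handed to Statistic.insert_player_stats inside the before block.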

And that was it for the first day of this process.

Day Two - April 10, 2016 - 1.25 hours

(Busy week, working on other things on this site, reviewing JavaScript, dealing with real life intrusions and a need to decompress a bit, so it took a few days to get back to it)

So the next step after successfully ‘decoupling’ my two methods, and setting up the testing file to populate from my example file was to make sure that it had populated properly. That verification meant running queries against my test data to see that it matched the source data of the game.

The first and easiest test is a record count. My setup only takes into account players who actually play in a game, so the first test was written to determine that the correct number of records (21) were inserted into the table. This test passed easily without any tweaking.

The next tests I would write would require slightly more complicated code. I wanted to test that each team had the right number of records assigned to it (11 for Boston and 10 for New York).

This was a twofold test in that I had set up my models so that the Team Model could access the Statistic Model through the Participant Model. Since this was a freshly populated test database, the only data in the statistics table should be for the two teams (Boston and New York) participating in the game used to run these tests. Thus, I should be able to access the statistics of any team directly by accessing the team itself, which I can do easily by finding the team by the abbreviation assigned to it (BOS for Boston, NYK for New York). These tests were written and successfully run. So I had successfully verified that the right number of records had been created and assigned to the right team, but now I had to verify that the right values had been entered.

Entering a player’s statistics for a given game, once you have access to the right data, is a relatively simple process of accessing the right indices and assigning them to the right attributes of your newly created Model object. There are a couple of quirks in this setup based on how I wanted to look at data compared to how the source data provided it.

  1. A player’s time played is given as “MM:SS”, which for a variety of reasons I didn’t really like, so I set up a small method within the Statistic Model to convert this data into the total number of seconds played. It seems to me that this would make it easier to deal with ‘time played’ by a given player over x number of games (or an average) later on, as it’s a simple matter of using divide and modulo on the seconds to convert back to the original format.
  2. Players take 2 point shots, 3 point shots, and 1 point shots (referred to as free throws) but when you look at a box score, that’s not how the numbers are broken down. The FGM/FGA are actually a combination of 3 point shot and 2 point shot information. I’ve never really liked that myself for a variety of reasons (primarily because a 3 point shot is more valuable than a 2 point shot, and you break them out so why not break out two point shots), so my design stores the information based on the 1/2/3 point shot model. This requires a little subtraction for ‘two point shots’ taken and made in a given game.
  3. It occurred to me while trying to think of ways to make this data useful, that tracking daily fantasy sports (DFS) scores for given players in given games might provide some insight in the future. The three major DFS sites I looked at (Draft Kings, Fan Duel, and Yahoo) each have a slightly different scoring model (that they provide to the public), so I wrote some code to determine the game score for each of the three DFS sites, once the stat line had been added. This did require a determination of ‘doubles’ as Draft Kings provides one bonus for a ‘double double’ and another bonus for a ‘triple double’. I decided to store the ‘double’ value directly in the data permanently, as tracking the number of ‘significant doubles’ by a player in the season is often seen as a helpful piece of information.
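The first two quirks come down to a little arithmetic. The sketch below is only an illustration: the helper names seconds_from_clock and two_pointers are hypothetical (the actual get_seconds method is not shown in this article), and the sample numbers are made up apart from the 37:03 time that appears later on.

```ruby
# Hypothetical helper: convert an "MM:SS" string into total seconds.
def seconds_from_clock(clock)
  minutes, seconds = clock.split(":").map(&:to_i)
  minutes * 60 + seconds
end

# Hypothetical helper: the box score lumps twos and threes together as
# field goals, so the two-point number is field goals minus threes.
# The same subtraction works for both made and attempted shots.
def two_pointers(field_goals, threes)
  field_goals - threes
end

puts seconds_from_clock("37:03")   # 37 * 60 + 3
puts two_pointers(9, 2)            # 9 field goals, 2 of them threes
```

Storing seconds as a plain integer means totals and averages over many games are ordinary sums and divisions.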

So, in addition to testing that the right information got put into the right place, I also needed to test that these three ‘additions’ beyond just storing the source data had been inserted properly.

A standard NBA game (no overtime) is 4 x 12 minute quarters, and each team plays five players at a time. Therefore the total time played by a team should be 240 minutes (48 * 5), or 14,400 seconds. It would seem the easiest way to test that the time played for each stat line had been entered properly would be to take the successful tests previously done and tweak them to total the time_played field for all the records found.
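As a quick sanity check on that arithmetic, here is a small sketch. The constant names are illustrative, and the time_played values are invented for the example rather than taken from the actual box score.

```ruby
QUARTERS = 4
MINUTES_PER_QUARTER = 12
PLAYERS_ON_COURT = 5

# 4 quarters * 12 minutes * 5 players on the floor.
team_minutes = QUARTERS * MINUTES_PER_QUARTER * PLAYERS_ON_COURT
team_seconds = team_minutes * 60

# Hypothetical time_played values (in seconds) for one team's stat lines.
time_played = [2880, 2880, 2880, 2880, 1440, 1440]
total = time_played.sum

puts team_minutes           # 240
puts team_seconds           # 14400
puts total == team_seconds  # true for this made-up data
```

The real test does the same comparison, with the sum coming from the time_played column of the team's records.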

Realizing that I would be running numerous tests on the same basic find_by, I reorganized my rspec into two contexts, one for testing the numbers for the New York Knicks and one for testing the numbers for the Boston Celtics, and I started with the Celtics first.

After a misfire (I forgot to use pluck to isolate the data I wanted) on the first test, I got to compare the total time played against my expected time, and sadly, it was off by 2 seconds (I expected 14,400 seconds and got 14,402 seconds), so I had to investigate what the issue was. A quick check of the source box score in my spreadsheet program verified that while the box score indicates ‘240 minutes’, it actually works out to 240 minutes and 2 seconds, so 14,402 is the right outcome, and the test was altered to reflect that.

For clarity’s sake, I ran the same spreadsheet count on the numbers for New York and it worked out to a clean 240 minutes (or 14,400 seconds).

After tweaking for that unfortunate time miscalculation by the NBA, I proceeded to test that all the relevant box score data in my table matched what was on the box score page itself. Following that, I tested all box score totals (including my own ‘two points made’/‘two points taken’ calculations) to ensure that the populated data matched the box scores for the totals. The next step is to figure out a way to isolate individual records and match them up with a specific player’s numbers from the same box score. This would include testing the DFS calculations I have set up and the various built-in methods to calculate a few other things.

Alas, due to numerous other obligations, that is it for the second day of this.

Day Three - April 11, 2016 - 1 hour after work

I suppose I could presume that since the totals for the teams that I tested on Day 2 calculate properly that each individual entry calculates properly as well. However, there are some custom methods I built (determining points scored, converting the time played back into “MM:SS”, adding up rebounds) that I still want to test, plus the neurotic in me wants to make sure that individual player information is being added properly.

Now, there’s one little fly in the ointment in that to test only the Statistic Model, no player link is created when the test is being run. This is not a huge problem as I know the order the data is added and Rails provides us with built in methods to obtain the first through fifth (but not beyond) and last record in a table by appending first through fifth, or last, to the model name. Using this, I plan to test that the first and last records are inserted as expected.

Again, using contexts allows me to create the variable for the first or last record and then write the proper tests to determine proper data insertion and functionality of my basic custom built methods.

The first failed test was that my method for determining minutes played was not returning a leading zero for seconds less than 10. Therefore I had to tweak the method with a conditional based on the total seconds played.

Originally, the method looked like this:

def minutes_played
  "#{time_played/60}:#{time_played%60}"
end

Now, I could have written an if/else statement that would have output one string if the %60 portion was less than 10 and another if it was 10 or greater. However, I ran a quick web search and happened upon this gist, which I tweaked a bit in my method so that it was still one line, taking advantage of yet another wonderful built-in functionality of Ruby and Rails:

def minutes_played
  Time.at(time_played).strftime("%M:%S")
end

The failing test now passed as the expected minutes played format (37:03 by the way) was returned.
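One caveat worth knowing about the strftime approach: %M is the minute within the hour, so it wraps once a player passes 60 minutes (possible in a game with several overtimes), and Time.at without .utc uses the local time zone, which can shift the minutes where the offset is not a whole number of hours. As a hedged alternative, not the method used in this app, the divide and modulo idea can be written in one line too. The method below takes the seconds as an argument so it can run outside the model.

```ruby
# Hypothetical alternative to the strftime one-liner.
# divmod(60) returns [minutes, seconds], and %02d pads the seconds.
def minutes_played(time_played)
  format("%d:%02d", *time_played.divmod(60))
end

regulation = minutes_played(2223)   # 37 minutes, 3 seconds
long_game  = minutes_played(3723)   # 62 minutes, 3 seconds

# The strftime version wraps past the hour mark.
wrapped = Time.at(3723).utc.strftime("%M:%S")

puts regulation  # 37:03
puts long_game   # 62:03
puts wrapped     # 02:03
```

For regulation games both versions give the same answer, so the existing test would pass either way.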

The next test covered the total points scored method I had written. This test would have the twofold impact of also confirming that I had inserted the correct number of each shot type made, since otherwise the points calculation wouldn’t work out properly.

That test passed on the first try indicating that I had properly set up the math on the raw data so I could score 1, 2, & 3 point baskets individually as indicated earlier.
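The points method itself is not shown here, but under the 1/2/3 point storage model it presumably reduces to something like the sketch below. StatLine is a hypothetical stand-in for a Statistic record, the attribute names follow insert_player_stats, and the numbers are invented.

```ruby
# Hypothetical stand-in for a Statistic record, using the same
# attribute names that insert_player_stats assigns.
StatLine = Struct.new(:freesmade, :twosmade, :threesmade) do
  # Each shot type is weighted by its point value.
  def points
    freesmade + 2 * twosmade + 3 * threesmade
  end
end

# Invented line: 4 free throws, 6 twos, 2 threes.
line = StatLine.new(4, 6, 2)
puts line.points   # 4 + 12 + 6
```

If any of the three counts had been stored incorrectly, the total would not match the box score, which is why this one test checks the insertion as well as the math.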

The next test was to check that one of my custom calculations was working properly. The ‘doubles’ method determines how many of 5 categories (points, rebounds, assists, steals, blocks) total 10 or more in a given game for a given player. This is needed because one DFS company (Draft Kings) gives a bonus for 2 doubles (called a double double, long before In-n-Out I hope) or 3 doubles (a triple double). Quadruple doubles happen, but are very rare and Draft Kings doesn’t indicate any bonus for such a thing, but based on rarity perhaps they should? Anyway, this scoring mechanism means that I must determine the total doubles a player had in a given game before I can attempt to calculate their Draft Kings game score.
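A minimal sketch of such a count is below, assuming the usual convention that a category counts once it reaches double digits (10 or more). The method signature is hypothetical; the 31 points and 13 assists match the stat line discussed later in this article, while the other values are made up.

```ruby
# Hypothetical version of a doubles counter: how many of the five
# categories reached double digits in one game.
def doubles(points:, rebounds:, assists:, steals:, blocks:)
  [points, rebounds, assists, steals, blocks].count { |total| total >= 10 }
end

# 31 points and 13 assists; rebounds, steals and blocks are invented.
double_double = doubles(points: 31, rebounds: 5, assists: 13, steals: 5, blocks: 0)

# Exactly 10 still counts as a double.
boundary = doubles(points: 10, rebounds: 9, assists: 0, steals: 0, blocks: 0)

puts double_double   # 2
puts boundary        # 1
```

A result of 2 would trigger the double double bonus and 3 the triple double bonus.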

That test passed simply as well, and now I wanted to test the DFS score calculators.

Of the three ‘major’ DFS sites I know (Draft Kings, Fan Duel, Yahoo), they each have a slightly different method for calculating the game score for a given player, so I wrote three methods that calculate the corresponding game score after the raw data has been added (and the aforementioned doubles calculated). This game was played in 2011, and there are no historical records I know of to determine game scores like this, so to do it properly, I went in to Excel with the raw data and formulas to calculate the expected game scores for each of the DFS sites I was using.

After properly determining the respective DFS scores through Excel, I wrote the tests to make sure that’s what my data would give, and all three passed easily. In the end, that should have been enough to tell me that the data was inserting properly for the first record and, by extension, all the others. However, I still had one small concern.

The first player only had 1 ‘double’ and thus I couldn’t be 100% sure my Draft Kings score calculation method worked. I needed to find at least a ‘double double’ to test for sure. As I scrolled through the box scores, I found that the fifth record (remember, the last of the ‘beginning group’ that has a built-in call) scored 31 points and had 13 assists, and thus was a stat line I could use to test if my Draft Kings method truly worked properly. (Remember how good Rajon Rondo was in Boston with good players around him, NBA fans?)

So, I got the stat line for the fifth player, manually calculated the DFS scores again and tested them against my app, and again the tests all came out passing. The Yahoo and Fan Duel scoring systems, though written differently, came up with the same total points both times. I should look into that more closely to see if they always yield the same score for every player’s performance. If that is the case, then I could make things simpler (and DRYer) by only having two DFS attributes: one shared by Yahoo and Fan Duel, and a separate one for Draft Kings, which worked out differently each time (with or without the ‘double’ bonus).

So, after 43 tests (and that’s probably too many, but I would rather overdo it than underdo it) on my Statistic Model that tested data insertion, the linking of the individual statistics for a game to a specific team, and some of my own custom-built methods to store game data (and some extra) the way I wanted it, I have successfully tested the existing Models and data population functionality I had previously written without tests. It wasn’t without hiccups (like having to rebuild some code on the first day as my source site had changed its setup), but in the end, I think it was instructive. I feel like I was pretty thorough in testing what I had already written (which, while not as good as properly using TDD, is better than never testing at all) and covered as much as I could think of in said tests. So, after a little GACP processing, I feel like my NBA app is ‘up to date’ and that it’s ready for the next step (and as soon as I figure out what that is, I will pursue it and write about it here).

Thanks for reading.

Comments always welcome if you made it this far.