If you have read the articles I wrote about my NBA Project, you might recall the trials and tribulations I went through writing the code not only to download the data (and then rewriting it once it stopped functioning), but also to populate all the data into the right models. At the time I wrote that code, I wasn’t focused on proper test-driven development (TDD). That changed when I spoke to a hiring exec at a local meetup who said the first thing he looks at is people’s tests. So I had to work backwards and write tests for code that was already working, and sadly, that wasn’t as easy as you’d think, for a couple of reasons:

  1. When you are testing, your application should not connect to the web, and in my setup any command that tried to do so threw an error. I researched ways to deal with this (I messed around with a gem called VCR that I couldn’t get to work, but perhaps should try again soon), and my solution was to isolate the connections to the internet in separate methods that then pass their results to the methods that populate the data. That way I could create example files to represent the downloaded data and pass them directly to the populating methods themselves. This way I’m testing JUST the processing of the raw data and not the grabbing of the raw data itself.
  2. When getting the game data in the NBA (or the NFL, for that matter), things are interconnected: game information, participant information, player information, and on and on. These pieces of information are obviously related within your database, or you’d have no easy way to find information for specific games, teams, or players. When I originally wrote the code that processed the downloads, one line of code would run multiple methods across multiple models, so that a game was processed properly from beginning to end all at once. Many of these methods also required calls to the web, so I had to deal with problem #1 again, but I also needed a way to test the single method I was working on while ignoring calls to any others. That issue was solved when I learned how to stub out methods using expect on a class method and assigning it a return value (in RSpec terms, expect(SomeClass).to receive(:some_method).and_return(value)). (There was some talk of doubles, but I was never able to get a double to fully work.)
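To make the two ideas above concrete, here is a minimal plain-Ruby sketch, not the project's actual code: the class Schedule and its methods download, parse, and import are hypothetical names, the URL is a placeholder, and the href is a made-up sample. The network call lives in its own method, the processing method only ever sees raw text, and a hand-rolled stub built with Ruby's define_singleton_method stands in for what RSpec's stubbing does for you.

```ruby
# Hypothetical class and method names, for illustration only.
class Schedule
  # The only method that would touch the network. In this sketch it refuses,
  # which mimics a test environment that blocks outside connections.
  def self.download(url)
    raise "network access is not allowed in tests: #{url}"
  end

  # Pure processing step: takes raw text and returns the digit runs found
  # between two slashes (the same pattern add_games uses for game ids).
  def self.parse(raw)
    raw.scan(%r{/(\d+)/}).flatten
  end

  # Integration method: fetch, then hand the raw data to the processing step.
  def self.import(url)
    parse(download(url))
  end
end

# Hand-rolled stub: replace Schedule.download with a version that returns
# canned fixture text instead of going to the web.
html = '<a class="gc-btn" href="/games/2016090800/">Game</a>'  # made-up sample
Schedule.define_singleton_method(:download) { |_url| html }

ids = Schedule.import("https://example.com/week1")  # placeholder URL
puts ids.inspect  # => ["2016090800"]
```

Inside an RSpec example, the stub line would more typically be allow(Schedule).to receive(:download).and_return(html), which has the advantage that RSpec restores the original method after each example.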

Solving these two issues in my NBA project did not come quickly or easily for me. It took me a while, perhaps longer than it should have, to extract the code that separated the different functionalities to deal with issue #1. Writing the tests to run the methods with the test files was no picnic either, and arriving at the stubbing solution was downright aggravating: even when I thought I was writing the code properly, errors would still be thrown, or tests that should have been passing would still fail. Anyway, the end of the story is that I finally worked it all out without letting time or aggravation lead me to give up. I now have a project that, while not finished, works (with tests) up to the steps I’ve completed. And having gone through that hardship with the NBA application, I knew going in that I had to take the same testing approach with my NFL application.

Starting this NFL project with TDD in mind from the beginning, I will code, test, and refactor the data processing one step at a time. Then, when each individual step is written, tested, and working properly, I will write the code that integrates them so that one command will download and process all the information from any given weekend of competition I want. Because of issue #2 above, that code will undoubtedly break the passing tests, but I can then refactor those tests with stubbing so they pass again. I feel this is a better way to do things, as I’m taking smaller bites at a time, so on to the first bite.

Getting the Game Information

As previously written, the basics of the game information (teams, winners, losers, home, away, etc.) are actually broken down into two separate models (database tables) to (I believe) make future querying of information easier and more efficient. Therefore, the model that houses the game information is pretty simple. It only has two pieces of information:

  • The date of the game
  • The unique identifier NFL.com uses for the game, which the application uses later to get more detailed information from NFL.com.

In an earlier article, I identified that pages that look like this seem to be the best way to identify the two pieces of information needed to complete the process I am working on here. That same article talked about how the Nokogiri gem provided me the best and fastest way to isolate the information I needed out of the gobbledygook that is the source code of the page I’m working on. (I’m not posting the source code; you can look at it if you want, but I will say that there are 16,263 lines in the file, and I need, at most, 16 entries for any given week of the season.) The first step, as it always should be, is writing the tests and making sure they fail properly, rather than throwing errors due to my own programming mistakes.

The initial testing I set up was pretty simple: one test to see that running the method I would write for parsing the gobbledygook would change the record count of the games by 16. As I was writing this article, I realized that this one test does not determine whether the right game dates have been included. So I refactored the original test into the set below, which checks for the proper number of games on each game date (this took a second to get right because I forgot that the NFL plays two Monday night games on opening weekend). The relevant RSpec tests look like this:

  context "Nokogiri will successfully populate the game and date information" do
    # Parse the saved HTML fixture instead of downloading the page from NFL.com
    before { Game.add_games('spec/fixtures/nokogiritest.html') }

    it "inserts 16 new records" do
      expect(Game.count).to eq(16)
    end

    it "inserts one game with the game date 9/8/2016" do
      expect(Game.where(gamedate: "2016-09-08").count).to eq(1)
    end

    it "inserts two games with the game date 9/12/2016" do
      expect(Game.where(gamedate: "2016-09-12").count).to eq(2)
    end

    it "inserts thirteen games with the game date 9/11/2016" do
      expect(Game.where(gamedate: "2016-09-11").count).to eq(13)
    end
  end

The reference to spec/fixtures/nokogiritest.html is the file I manually created. In doing the research on the NBA testing, it seems that storing files like this in a fixtures folder was SOP for TDD so that’s what I’ve done.

Now, had I written all the tests at the same time, they all would have failed, as expected, because Game.add_games hadn’t been written yet, so it couldn’t be called. So I had the red, and now I had to write the code to get to the green.

Writing the add_games method took me a little longer than it should have, as some of the key aspects of what I needed to do slipped my mind even after reviewing my notes and previous article. Some of the details of precisely what I needed to isolate from that large mass of code took me a moment to get right, but in the end, after a bit more reviewing and a few failed tests, I came up with this solution, which turned the original test, and the new ones added while writing this, everyone’s favorite color: green.

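  def self.add_games(rawdata)
    # rawdata is the path to a saved schedule page (the fixture, in tests);
    # File.read loads it and closes the file afterwards
    source = Nokogiri::HTML(File.read(rawdata))
    # Each game-center button links to a URL containing the NFL.com game id
    games = source.css("a.gc-btn")
    games.each do |game|
      href = game["href"]
      # The game id is the first run of digits between two slashes
      gameid = href.scan(/\/(\d+)\//)[0][0]
      # The first eight digits of the id encode the date as YYYYMMDD
      year = gameid[0..3].to_i
      month = gameid[4..5].to_i
      day = gameid[6..7].to_i
      gamedate = Date.new(year, month, day)
      Game.create(gamedate: gamedate, nflcomid: gameid)
    end
  end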

One of the things that hung me up for a bit was the href = game["href"] line. I had forgotten that I had to go one step deeper into each button element, reading its href attribute, to really get what I needed. Additionally, converting the individual date pieces into a proper date took me longer, unfortunately, than it should have. A few other articles I’ve written have dealt with dates, sometimes creating them like this, but I couldn’t remember the exact way to do it, so I had to do some research on the web again to find the correct Date.new arguments. However, in the end, I got where I needed to be. The above code will take the raw HTML file for any given week of NFL play and get me the basic identifying information for each game, so after a local git commit and a push to the GitHub repository, I was ready to move on to the next step in the data processing: populating the game participants.
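Since the date conversion was the fiddly part, here is a small sketch of two ways to do it in plain Ruby. It assumes, as add_games does, that the first eight characters of the game id are YYYYMMDD; the method names game_date and game_date_strptime are hypothetical. Date.strptime with a "%Y%m%d" format does the slicing for you.

```ruby
require "date"

# Build the Date by slicing the id by hand, the same way add_games does.
# Assumption: the first eight characters of the id are YYYYMMDD.
def game_date(gameid)
  Date.new(gameid[0..3].to_i, gameid[4..5].to_i, gameid[6..7].to_i)
end

# Equivalent version that lets a format string do the slicing.
def game_date_strptime(gameid)
  Date.strptime(gameid[0, 8], "%Y%m%d")
end

puts game_date("2016090800")           # => 2016-09-08
puts game_date_strptime("2016091200")  # => 2016-09-12
```

Either way, an impossible date (a month of 13, say) raises an ArgumentError, so a malformed id surfaces immediately instead of being stored.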

Populating the Participants

Ideally, after the game information is populated, the nflcomid for each game is used to download the game-specific information, which is then processed by a variety of methods and models to get the rest of the data. (Most of the models that would be populated haven’t been built yet.) Unfortunately, as noted above, you can’t do that when you’re testing, so you have to go about it a different way.

The next step is quite similar in concept to getting the game information. All the way back in part 1 of this project, I started out by finding the correct JSON information I would need to isolate and process the bulk of the game data. I worked deep into the JSON object to identify not only what was available but also the various keys and/or indices I would need to access the information later on, when I really wanted to make use of it. However, there are two key differences from the game information above:

  1. JSON data has to be processed differently. Thankfully, Ruby’s standard library includes JSON handling (and Rails loads it), so no extra gems would be required.
  2. The complexity of the data being accessed is much deeper and thicker than the basic game information and as such requires more code to extract what you need (and a few more tests to be written)
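As a rough illustration of the first difference, here is how a trimmed-down version of the game JSON can be parsed and walked with Ruby's standard json library. The structure (game id as the top-level key, then home and away hashes holding abbr and score) mirrors what add_participants reads further down; the quarter scores are made up for illustration, and everything else in the real NFL.com object is omitted.

```ruby
require "json"

# A trimmed, made-up stand-in for the NFL.com game JSON. Only the keys that
# add_participants reads are included, and the quarter scores are invented.
raw = <<~RAW
  {"2016090800": {
    "home": {"abbr": "DEN", "score": {"1": 0, "2": 7, "3": 7, "4": 7, "5": 0, "T": 21}},
    "away": {"abbr": "CAR", "score": {"1": 7, "2": 3, "3": 0, "4": 10, "5": 0, "T": 20}}
  }}
RAW

gameinfo = JSON.parse(raw)       # every key comes back as a String
nflcomid = gameinfo.keys.first   # the game id is the single top-level key
home_total = gameinfo.dig(nflcomid, "home", "score", "T")

puts nflcomid    # => 2016090800
puts home_total  # => 21
```

Hash#dig returns nil instead of raising if any key along the path is missing, which is handy while still exploring a large object like this one.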

There is a third difference that isn’t relevant to this article but is important. As stated above, the file used for the game data is used with only one model. This JSON object, by contrast, provides the information needed to extract all the rest of the game-specific data for the variety of models that will exist by the time this project is ready to launch, though those models have not been built yet. So this JSON fixture will be referenced again and again in future articles and tests, but for now, I’m focusing on the easiest part of the processing.

So let’s get to the first part of processing the JSON, getting the participants. First off, as always, I started with the tests:

  context "The JSON captured by the gameID will populate participants correctly" do

    before do
      # Setup not shown: the game must already exist, and then the JSON
      # fixture is passed to add_participants.
    end

    it "should create two records" do
      expect(Participant.count).to eq 2
    end

    it "should have one winner and one loser" do
      expect(Participant.where(winlosstie: "W").count).to eq 1
      expect(Participant.where(winlosstie: "L").count).to eq 1
    end

    it "should have a home team equal to the Broncos" do
      expect(Participant.find_by(homeaway: "H").team.nickname).to eq "Broncos"
    end

    it "should have an away team equal to the Panthers" do
      expect(Participant.find_by(homeaway: "A").team.nickname).to eq "Panthers"
    end
  end

As you can see, these tests were a bit more involved, and they required that the basic game information already exist (which is why the game tests were done first) before the participant information could be added. This makes sense if you think about it. You don’t want to be adding random information that doesn’t belong to a game the system already knows about, and that is why each participant entry belongs_to :game. Thus, the game must exist before the participants can be added. True, I’m only working with the participants of one game, but this was the quick and easy way to make sure the right game information was available.

As the basic tests are more involved, so is the code that I ended up writing to process the information for a game’s participants. It’s not as easy as just isolating the data from the JSON and inserting it into the database. Much of the isolated data must be further processed before the participants can be properly added to the database:

  1. When using models to manage databases in Rails, Rails creates its own unique identifier column for each row, the primary key, which it uses to relate members of different models when you use belongs_to, has_many, or any of the other options for Rails’ built-in association functionality. As such, I can’t use the game identifier provided by the JSON directly; I must find the identifier Rails uses for the same game. That is why the nflcomid column exists in the games model: I can find the Rails identifier based on that value in the JSON.
  2. Similarly, the teams model has the same issue when linking to the participants model. That’s one benefit of storing the team abbreviation in the abbr attribute of my team model. The downloaded JSON identifies teams by that abbreviation, so by isolating it from the JSON I can find the correct Rails identifier for the teams participating in the game I’m working on.
  3. The JSON breaks the scoring down by quarter, numbered 1-5 (which my q1-q5 columns mirror). The 5 represents the possibility of one overtime period in a game. You may recall that when I was building my participant model, I named that column OT instead of q5. Let’s just say that was a mistake on my part that I didn’t identify until working on this code. A quick Rails migration using the newly discovered rename_column, plus updating the shoulda-matchers tests, integrated that change. That allowed me to access all the scoring information easily with a loop in the code below.
  4. Much like in my NBA project, the JSON object doesn’t identify the winner and the loser, so the total points scored by each competitor must be read (NFL.com provides the total points scored by a team in the same hash in which it provides the scoring by quarter) and then compared to determine the winner and loser. The way I solved this in my NBA application was a quick if statement, because there are only two possible outcomes. The code below is slightly more complex because of the possibility of a tie in the NFL.
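Points 3 and 4 can be sketched in plain Ruby, outside of ActiveRecord. The helpers quarter_columns and results below are hypothetical names, not the project's code, and the score hash is made up: quarter_columns does the q-prefixing from point 3, and results uses Ruby's <=> operator so that a single comparison of the two totals yields a win, a loss, or a tie.

```ruby
# Hypothetical helper: converts one side's score hash into participant columns.
# Quarter keys "1".."5" become "q1".."q5"; the "T" (total) entry is dropped.
def quarter_columns(score)
  score.reject { |key, _value| key == "T" }.to_h { |key, value| ["q#{key}", value] }
end

# Maps the result of <=> (1, 0 or -1) onto the winlosstie codes.
RESULT_CODES = { 1 => "W", 0 => "T", -1 => "L" }.freeze

# Returns [home_code, away_code]; equal totals give 0, so ties need no special case.
def results(home_total, away_total)
  [RESULT_CODES.fetch(home_total <=> away_total),
   RESULT_CODES.fetch(away_total <=> home_total)]
end

home_score = { "1" => 0, "2" => 7, "3" => 7, "4" => 7, "5" => 0, "T" => 21 }  # made up
p quarter_columns(home_score).keys  # => ["q1", "q2", "q3", "q4", "q5"]
p results(21, 20)                   # => ["W", "L"]
p results(17, 17)                   # => ["T", "T"]
```

Because fetch raises on an unexpected key, a missing total (nil <=> 20 returns nil) fails loudly instead of silently recording a wrong result.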

Using my previous research on the JSON to get the raw data and my existing knowledge of Ruby and Rails, I was able to construct the code below which made all the tests above turn green:

  def self.add_participants(gameinfo)
    # The single top-level key of the JSON is the NFL.com game id
    rawgame = gameinfo.keys.first
    # Look up the Rails Game record that matches that NFL.com id
    game = Game.find_by(nflcomid: rawgame.to_i)
    homehash = gameinfo[rawgame]["home"]
    awayhash = gameinfo[rawgame]["away"]
    homescore = homehash["score"]
    awayscore = awayhash["score"]
    # "T" holds each team's total points
    hometotal = homescore["T"]
    awaytotal = awayscore["T"]

    hometeam = Participant.new(homeaway: "H", game: game)
    hometeam.team = Team.find_by(abbr: homehash["abbr"])
    # Quarter keys "1".."5" map onto the q1..q5 columns; skip the total
    homescore.each do |key, value|
      hometeam["q#{key}"] = value unless key == "T"
    end

    awayteam = Participant.new(homeaway: "A", game: game)
    awayteam.team = Team.find_by(abbr: awayhash["abbr"])
    awayscore.each do |key, value|
      awayteam["q#{key}"] = value unless key == "T"
    end

    # Compare the totals (not the score hashes); NFL games can end in a tie
    if awaytotal == hometotal
      hometeam.winlosstie = "T"
      awayteam.winlosstie = "T"
    elsif awaytotal > hometotal
      hometeam.winlosstie = "L"
      awayteam.winlosstie = "W"
    else
      hometeam.winlosstie = "W"
      awayteam.winlosstie = "L"
    end

    # Persist both records; the Participant.count test depends on this
    hometeam.save
    awayteam.save
  end

So, that was it. I had successfully built, and tested, code that would take downloaded information and insert it into two of my database tables. As always though, even when progressing, questions/issues can be raised:

  1. How will the third possible outcome (a tie) affect functionality in the future? At some point I will want to query the data to find winners and losers, and when there is a third option, it has to be accounted for properly.
  2. Since I’m not storing the point totals for each participant (mistake?), how will I determine how many points each one scored in future queries, and how complicated will it become when I attempt to calculate the points starting from different models (like the game or team ones)? It turns out that it is very complicated and requires me to learn a few things I didn’t know when I wrote the code in this article.
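On the second question, one option (a sketch, not what the project does) is to derive a participant's total from the stored quarter columns whenever it is needed. ParticipantRow below is a hypothetical Struct standing in for a Participant record, and the quarter numbers are made up; in Rails, a method like points could live on the model itself.

```ruby
# Hypothetical stand-in for a Participant record: only the quarter columns.
ParticipantRow = Struct.new(:q1, :q2, :q3, :q4, :q5, keyword_init: true) do
  # Derive the total from the stored quarters; a nil column counts as 0.
  def points
    %i[q1 q2 q3 q4 q5].sum { |q| self[q].to_i }
  end
end

broncos = ParticipantRow.new(q1: 0, q2: 7, q3: 7, q4: 7, q5: 0)
puts broncos.points  # => 21
```

This keeps the quarters as the single source of truth, at the cost of recomputing the total on every query rather than reading a stored column.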

So, steps forward are good, but there are many more to go, and new ones appear even as I cover more ground, which I think is to be expected. Hopefully, this was helpful, or encouraging, to you. Thanks for stopping by.

Note about this article: I’m currently participating in the 100 Days of Code, which you can also follow on Twitter. It is an enjoyable experience, but it has taken some getting used to, balancing the needs of the challenge with my desire to document here as well. As such, this article covers only some of the work I’ve done so far in the challenge. If you wish to follow my progress, you can read my git log. As I’m a little verbose, the extra writing for that has caused me to fall behind here (as I’m not sure I’d consider writing these posts part of the 100 days’ work), but I’ll get the hang of it soon, I’m sure.