I Scraped the Analytics Pros Blog Using Ruby
Updated: Nov 27, 2018
Lately, I've been learning a lot of Ruby as part of the full-stack web development bootcamp I'm doing at Flatiron School. My very first solo project has been to create a CLI (command-line interface) application from scratch that scrapes a website and presents the user with a list of options or other data in a Bash-style environment. To get a better idea of what a command line is, type Terminal into the search bar in the upper right of your screen (if you're using a Mac) and click on the result. You should see something like the image below.
For the project I decided to scrape the Analytics Pros blog because it is one of my main sources for learning legit analytics. I thought it would be a fun exercise, so I got excited and went for it. The very first thing I did was write my plan. What were the exact steps for each component of my program going to be? What did I want users playing around with my application to do? Once I had all of the steps in a Google doc, it was a lot easier to start setting up my environment to code.
The plan for my program was the following:
A user would clone my program's repo from GitHub to their own machine.
Once the program runs, the application would scrape AP's blog categories.
The user is greeted and shown the previously scraped categories as a list.
The user is prompted to type a category number to get the category URL.
The user can copy and paste the link into their browser.
If the user would like to see the categories again, they can type 'list' for another list view.
If the user types an incorrect number, they receive an error message.
If the user would like to end the program, they can type 'exit' and a goodbye message is displayed.
Once I had my plan in place I was ready to rock n roll this project. I started by running bundle gem, which scaffolds a new gem, and gave the project the name analytics-pros-blog-scrape. The very first lesson I learned with this project is to keep file and folder names concise. From now on, I'm keeping them short and sweet. Going back to bundle gem, running it was perfect because it gave me a file structure to start with. For example, I had bin and lib folders, a README, a Gemfile, a Code of Conduct, a License, etc., so it saved me a bunch of time I would otherwise have spent creating them manually.
Next, I gave my bin file executable permission using the chmod +x command so the program runs when a user types ./bin/blog-scrape. (It is the shebang line at the top of the file, not chmod, that tells the operating system to interpret the file using Ruby.) After this, I created a file named cli.rb for my CLI class, where I put the code my program runs in a method named call, which I then used inside my blog-scrape file. There I create a new instance of the CLI with .new and run it with call. I know it may sound like a lot, but the syntax is simply BlogScrape::CLI.new.call, where BlogScrape is the namespace for the program, CLI is the main class responsible for user interaction, .new creates a new instance of the class, and .call runs all of the program's methods.
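To make that one-liner more concrete, here is a minimal sketch of what a bin/blog-scrape file could look like. It is not the actual code from my repo: the shebang line is the usual Ruby one, the greeting text is made up, and the CLI class is defined inline (instead of being required from the lib folder) only so the sketch runs on its own.

```ruby
#!/usr/bin/env ruby
# Sketch of a bin/blog-scrape file. After `chmod +x bin/blog-scrape`,
# the shebang line above lets the shell run this file with Ruby.

# In a real project this would live in lib/ and be loaded with a require.
# It is defined inline here only to keep the example self-contained.
module BlogScrape
  class CLI
    # call is the single entry point that runs the program.
    def call
      # Placeholder greeting (an assumption, not the real program's text).
      puts "Welcome to the blog scraper!"
    end
  end
end

# .new creates an instance of the CLI and .call runs it.
BlogScrape::CLI.new.call
```

Keeping the bin file this thin means all of the real behavior lives in the classes, which makes them easier to work on separately.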
That was just the beginning of the project. I then kept working on all of my classes in parallel. I created files for my CLI class, my scrape class, and my category class. Let's now dig deeper into each of them.
I coded all of the user interaction methods within the cli class. Here, I created the following 4 methods:
present_categories <-- responsible for calling the Category class, where I create the actual categories I'm scraping from the AP blog, using an each loop to get their names and URLs.
list_categories <-- I greet the user and present a list of the blog categories.
menu <-- I prompt the user to type a category number. After typing a number, if they want to see the list of categories again they can type list.
goodbye <-- If the user would like to end the program, they can type exit at any time to be shown a goodbye message.
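The four methods above could be wired together roughly like this. Treat it as a sketch under assumptions rather than the code in my repo: the method names are written in lowercase per Ruby convention, the Category stand-in and its two example.com entries are made-up placeholders, the messages are invented, and the input/output arguments are something I added here so the class can be exercised without a real terminal.

```ruby
module BlogScrape
  # Stand-in for the real Category class; names and URLs are placeholders.
  Category = Struct.new(:name, :url)

  class CLI
    # input and output are injectable (an assumption for this sketch).
    def initialize(input: $stdin, output: $stdout)
      @input = input
      @output = output
      @categories = []
    end

    # Runs the whole program in order.
    def call
      present_categories
      list_categories
      menu
      goodbye
    end

    # The real program would build these from scraped data.
    def present_categories
      @categories = [
        Category.new("Example Category A", "https://example.com/category/a"),
        Category.new("Example Category B", "https://example.com/category/b")
      ]
    end

    # Greets the user and prints a numbered list of categories.
    def list_categories
      @output.puts "Welcome! Here are the blog categories:"
      @categories.each.with_index(1) do |category, number|
        @output.puts "#{number}. #{category.name}"
      end
    end

    # Keeps prompting until the user types exit (or input ends).
    def menu
      loop do
        @output.puts "Type a category number, 'list', or 'exit':"
        line = @input.gets
        break if line.nil?
        choice = line.strip.downcase
        break if choice == "exit"

        if choice == "list"
          list_categories
        elsif choice.match?(/\A\d+\z/) && choice.to_i.between?(1, @categories.size)
          @output.puts @categories[choice.to_i - 1].url
        else
          @output.puts "Sorry, that is not a valid option."
        end
      end
    end

    def goodbye
      @output.puts "Goodbye!"
    end
  end
end
```

Checking that the input is made only of digits before calling to_i matters because "abc".to_i returns 0 in Ruby, which would otherwise be hard to tell apart from a real number.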
Coding my scraping class inside the scrape.rb file was definitely the part of the project where I had the most fun. I used Nokogiri, a Ruby gem that helps us parse HTML and collect data from it, and Open URI, a module in Ruby that allows us to make HTTP requests programmatically. I only built two methods in this class: one for the base Nokogiri document, where I opened the Analytics Pros blog page I scraped, and another for scraping the actual categories. To do this, I went to the landing page, used the Chrome dev console to find the right CSS selectors for the categories, and then used those selectors within the method. That was it.
Finally, I coded my category class within the category.rb file. This one is responsible for initializing the name and URL of each category I scraped. It also has a method that creates the actual categories by looping over what I scraped with each. Every new category is then passed to a save method, which in turn references another method to add that instance of the class to a class variable named all.
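A minimal sketch of that pattern might look like the following. The method name create_from_scraped and the hash keys :name and :url are assumptions made for this example, not necessarily what the repo uses.

```ruby
module BlogScrape
  class Category
    attr_reader :name, :url

    # Class variable holding every saved category.
    @@all = []

    def initialize(name, url)
      @name = name
      @url = url
    end

    # Class-level reader for the saved categories.
    def self.all
      @@all
    end

    # Adds this instance to the shared list through the all method.
    def save
      self.class.all << self
      self
    end

    # Loops over the scraped data with each, creating and saving
    # one Category per entry. The name of this method is hypothetical.
    def self.create_from_scraped(scraped)
      scraped.each do |attributes|
        new(attributes[:name], attributes[:url]).save
      end
      all
    end
  end
end
```

Having save return self is a small design choice that lets you write Category.new(name, url).save and still get the category object back.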
That's it. I finished my very first program. It was a challenge, but I embraced it and hope to keep getting better and better. If you want to take a look at the program, head to my GitHub page and clone the repo to your machine. Feel free to let me know what you think by sending me an email or connecting through Twitter, LinkedIn, or @Rafa at MeasureSlack. I want to continue practicing scraping because it is a big part of what I do with digital analytics, so let's see what happens.