In April 2014, I participated in the Nerdery Overnight Website Challenge, a 24-hour hackathon, where I joined a team of friends to build a Rails web application. That was my first exposure to Ruby on Rails, and I wasn't much help when it came to writing form helpers or link helpers, or applying the conventions that almost every Rails developer picks up on their first app.
Nearly a year later, I'd call myself an amateur Rails dev. I can slap a blog together in two days. My front-end skills are pretty decent, and at work I write QUnit tests and refactor JavaScript like nobody's business. Over the past year, I've worked with great ardor on I want to be a VC, formerly billed as "a blog about tech startups, software development, and scale." Those three topics tie together quite well, especially at this moment in history, but sometime in the past few months the description became "tech, startups, software development, and scale."
To build I want to be a VC, I used Nokogiri along with two related gems, Mechanize and Readability for Ruby. Feedjira makes an appearance for some RSS feed reading, too, but most of what eventually became a robust news aggregator comes from web scraping.
Why, though?
I want to be a VC grew out of a problem I was having. I love reading tech news on Twitter, but even with rich media, I wanted a better peek at the articles I was reading. The app scrapes the full text of each article but posts only the first 300 words; clicking the article's title or the (more...) link takes you to the original. This gives an endless scrolling experience on a UI that's easy on the eyes, and it keeps me out of hot water with the content creators.
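The 300-word cutoff is easy to picture in code. Below is a minimal sketch, not the app's actual code: the `excerpt` method name, the `WORD_LIMIT` constant, and appending the "(more...)" marker as plain text are all assumptions for illustration.

```ruby
# Hypothetical helper (not the app's real code): keep only the first
# `limit` words of scraped article text and mark the cut with "(more...)".
WORD_LIMIT = 300 # mirrors the 300-word limit described in the post

def excerpt(text, limit: WORD_LIMIT)
  words = text.split # splits on any run of whitespace
  # Short articles are returned whole, just trimmed of edge whitespace.
  return text.strip if words.length <= limit

  # Longer articles are cut to `limit` words, with the marker appended.
  words.first(limit).join(" ") + " (more...)"
end

puts excerpt("one two three four", limit: 2) # => one two (more...)
```

In a Rails view, the marker would more likely be a `link_to` pointing at the original article, and ActiveSupport's `String#truncate_words` covers much of the same ground if you'd rather not roll your own.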
In November 2014, I finished aggregating the 42 news sources I wanted to read, but I was running into problems. Errors and 404s popped up seemingly at random on some articles, my parsing code was so bulky that it kept failing, and my CSS was a disorganized mess. I spent time refactoring, adding error handling, and rescuing exceptions that were slipping through the digital cracks, all while delaying deployment. Then life got in the way: I started my Coursera courses, the holiday season hit, and I kept coming up with excuses to avoid the painful task of making I want to be a VC perfect. This past weekend, though, I spent about 8 hours refactoring that CSS, and at 2 a.m. on Saturday night my assets wouldn't precompile. In fact, as of this post, the subheading sits directly against the main heading. I decided to say 'screw it' to perfection, aggregate my first batch of articles in production, and deploy. As I see it, that beats inaction or inertia.
What's my vision for my news aggregator?
First, for the record, I think it sounds cooler to just say 'blog.' One word, one syllable. Is there a similar shorthand for 'news aggregator'? Nagg? Newsagg? Naggregator? As long as I don't get sued for copyright infringement (yes, I even asked my attorney), I'd like to apply machine learning techniques to outdo even Embedly, the API I originally used to aggregate articles before I switched to parsing them myself. I'd also like to understand what makes articles go viral, and to analyze digital media trends and how they intersect with our daily lives.
Eventually, I'd also like to make good on the promise of the domain name and aggregate the output of venture capitalists and VC firms, potentially gaining Mattermark- or Quid-like insight into industry trends and topics. For now, though, I've got some refactoring to do.