Archive for category Data
My major project of 2012, TheKnot.com Search BETA, launched earlier this month, the morning of December 14th, and it was a journey worth documenting. It began back in May when the company decided to start our own labs or internal incubator groups. One began right away in New York City and the other started a couple weeks later in Austin. I had the opportunity to join the Austin group with a team of developers and product people that I highly respected. I was grinning the whole time with excitement. With the direction that we were independent from the company but had access to the data we sequestered ourselves in a room for a few weeks. Over pots of coffee and pints of beer after work we brainstormed our some good ideas for our brides to find inspiration for their weddings. Our group started out as Judy Galani, Product Strategist, Forrest Andrews and Matt Oehlers, Senior Software Engineers, Erin Bender, Freelance Art Director, and myself as Interactive Media Developer. We “sold” the idea and working prototype to our “investors” by October and it was then time to go from prototype to production.
We had to take the design in house from this point on, but after a fond farewell and thank you to Erin we started to add in more management to the group. It was a greenfield project so our back-end developers Matt and Forrest were excited to push forward into new territories. Within a day or two we were able to deploy projects straight from Visual Studio to the AWS servers, I was blown away. Forrest focused on our MongoDB NoSQL data store and the API that I would later pull all the front-end content from, ASP.NET MVC. Matt worked on massaging the data into Apache Solr and in a matter of days we had an impressive result set coming back for queries.
The front-end technology was built to be a responsive two page website that used a combination ofjQuery and Underscore.js along with HTML and CSS to create a multi-device interface that would be familiar and easy to use. The main grid view uses Masonry to create the Pinterest-esque layout for search results and Underscore.js templates to load in a different layout for different data types. The single page view focuses on a low browser width touch interface to give a very image focused search experience. To combine and minify the page scripts down I used the BundleCollection library in ASP.NET MVC which worked great. There are still a large number of extra scripts on the page but I’ve tried to load them asynchronously and not let them interfere with using the interface immediately.
The site has been up and working great for a few weeks now and launched along side a new look and feel for TheKnot.com. Go take a look, and of course let me know if you find any bugs here or on the feedback tab on the bottom!
So as one of my sites grew I quickly found out that my zip code database was horribly incomplete with only 33234 records and it even had duplicates for some zip codes. I did some searching and came across a post about a zip code csv file but the post goes to what looks like an empty page with no file. A little digging lead me to the zip code file download page that gives you a csv with entries like this:
Awesome! This csv file has 43191 unique records and has entries for each zip code my users have reported as missing. I started by looking up how to import a csv into MSSQL and followed an article called Using BULK INSERT to Load a Text File from which I came up with my query.
BULK INSERT ZipCodes FROM 'c:\zipcodes.csv' WITH (FIELDTERMINATOR = '"",""', ROWTERMINATOR = '\n')
It didn’t work and after a couple hours of trial and error I threw my hands up in the air and abandoned the bulk insert and fell back on another option of using phpMyAdmin. It sounds crazy but I setup the database in MySQL and imported the csv on the first attempt! Now to get it back to to MSSQL I exported the database as single row imports that looked like this:
INSERT INTO `ZipCodes` VALUES('78701', 'Austin', 'TX', '30.27127', '-97.74103', -6, 1);
A few quick find and replaces to change the database name and remove the ` characters and I was able to run them as queries on my MSSQL database in blocks of 5500 to prevent timeouts.
INSERT INTO [databaseName].[dbo].[ZipCodes] VALUES('78701', 'Austin', 'TX', '30.27127', '-97.74103', -6, 1);
It’s crucial to have a fairly complete zip code database for distance searching on my free online dating site and I’m glad I got this sorted out and updated.