In my previous post, I analyzed some NYC traffic data using Pandas. In this post, I would like to perform the same analysis using q. As far as graphing is concerned, I won’t be showing that. q is not really used for graphing. You can use GUIs like qPad or qStudio to chart the data on your own.
This post will help you see how one can achieve same results using different methods. It’s nice to have these kind of options. Earlier a major disadvantage for not using q used to be its cost but now you can just use the 32 bit version that is now available for free!
One thing to note is that the analysis done in this post is very straight foreword. Both Pandas and q are capable of handing much more complex analysis. The point of these posts is to show you the different tools available to you.
For my analysis, I will beĀ using the same data set that I used for the earlier post.
Let’s begin!
First, we need to prepare our data before we can start analyzing it.
/ Load the file and define the formats for the columns especially date and time. a:("DTSIFF****IIIIIIII*****I*****";enlist ",")0:`:NYPD_Motor_Vehicle_Collisions.csv / Replace spaces in column names with underscores a:(`${ssr[x;" ";"_"]} each string cols a) xcol a / Rolf has kindly informed me that there is an existing function called .Q.id that removes spaces. / i.e. ZIP CODE will become ZIPCODE / Here is how to do it / .Q.id a / cols .Q.id a will return / q)cols .Q.id a /`DATE`TIME`BOROUGH`ZIPCODE`LATITUDE`LONGITUDE`LOCATION`ONSTREETNAME`CROSSSTREETNAME... / Delete unwanted columns delete LATITUDE,LONGITUDE,LOCATION,UNIQUE_KEY,ON_STREET_NAME,CROSS_STREET_NAME,OFF_STREET_NAME from `a
With Pandas, I grouped the data by year and summed all the columns. For the sake of better presentation, I will only sum two columns this time:
q)select sum NUMBER_OF_PERSONS_INJURED,sum NUMBER_OF_PERSONS_KILLED by DATE.year from a year| NUMBER_OF_PERSONS_INJURED NUMBER_OF_PERSONS_KILLED ----| -------------------------------------------------- 2012| 27445 137 2013| 55112 297 2014| 51198 262 2015| 20931 103
Summing the data by month but only for the year 2014 gives:
q)select sum NUMBER_OF_PERSONS_INJURED,sum NUMBER_OF_PERSONS_KILLED by DATE.month from a where DATE.year = 2014 month | NUMBER_OF_PERSONS_INJURED NUMBER_OF_PERSONS_KILLED -------| -------------------------------------------------- 2014.01| 3920 23 2014.02| 3106 13 2014.03| 3853 18 2014.04| 4048 22 2014.05| 4658 26 2014.06| 4985 22 2014.07| 4603 29 2014.08| 4476 22 2014.09| 4464 28 2014.10| 4718 20 2014.11| 4242 26 2014.12| 4125 13
Again, for the sake of presentation, I am not going to sum all the columns. Here is the data sorted in descending order by the sum of NUMBER_OF_PERSONS_KILLED and grouped by ZIP_CODE
q)`NUMBER_OF_PERSONS_KILLED xdesc select sum NUMBER_OF_PERSONS_KILLED,sum NUMBER_OF_PERSONS_INJURED by ZIP_CODE from a ZIP_CODE| NUMBER_OF_PERSONS_KILLED NUMBER_OF_PERSONS_INJURED --------| -------------------------------------------------- | 215 37707 11236 | 18 1945 11434 | 12 1448 11203 | 11 2311 10025 | 10 602 <span id="t87880cb3">If you buy Kamagra, you can be certain that cialis 100mg pills <a href="http://robertrobb.com/throwing-more-money-at-a-shutdown-economy-wont-accomplish-much/">robertrobb.com</a> these years hold a lot of promise. Patients with poor memory focus are advised to consume just two capsules a day with milk or water for three to four months <a href="http://robertrobb.com/duceys-rhetoric-on-taxes-and-spending-doesnt-match-his-actions/">http://robertrobb.com/duceys-rhetoric-on-taxes-and-spending-doesnt-match-his-actions/</a> order cialis online as one of the hot topic that is been discussed everywhere. ordering viagra from canada <a href="http://robertrobb.com/prop-127s-big-gamble-with-the-states-economy/">check out address</a> If you suffer from erectile dysfunction and would like to get him or her in your bed, or, while masturbating, every day thoughts on class mates or co-workers are haunting you, hormones or ovulation may be to blame. The capsules are excellent to allow you to achieve <a href="http://robertrobb.com/2015/01/">cheap cipla tadalafil</a> an erection when sexually aroused. 3. </span>11206 | 10 1432 10022 | 9 834 11207 | 9 3265 10029 | 8 919 11101 | 8 1414 11216 | 8 1231 11223 | 8 1034 11230 | 8 1198 11234 | 8 1359 11373 | 8 900 10456 | 7 900 10460 | 7 891 10466 | 7 904 10468 | 7 1008 11208 | 7 1724 11365 | 7 324 11418 | 7 431 10001 | 6 856 10461 | 6 819 11105 | 6 271 ..
Now, we get to graphing.
q)select sum NUMBER_OF_PERSONS_KILLED,sum NUMBER_OF_MOTORIST_KILLED by DATE.year from a year| NUMBER_OF_PERSONS_KILLED NUMBER_OF_MOTORIST_KILLED ----| -------------------------------------------------- 2012| 137 59 2013| 297 110 2014| 262 109 2015| 103 46
Here is a graph of the above data in qPad.
Grouping the data by BOROUGH for the year 2014
q)select sum NUMBER_OF_PERSONS_KILLED,sum NUMBER_OF_MOTORIST_KILLED by BOROUGH from a where DATE.year=2014 BOROUGH | NUMBER_OF_PERSONS_KILLED NUMBER_OF_MOTORIST_KILLED -------------| -------------------------------------------------- | 79 50 BRONX | 19 8 BROOKLYN | 67 21 MANHATTAN | 35 5 QUEENS | 52 20 STATEN ISLAND| 10 5
Lets look at the total NUMBER_OF_PERSONS_KILLED in Manhattan over the different years
q)select sum NUMBER_OF_PERSONS_KILLED by DATE.year from a where BOROUGH=`MANHATTAN year| NUMBER_OF_PERSONS_KILLED ----| ------------------------ 2012| 21 2013| 39 2014| 35 2015| 14
Finally, we look the sum of NUMBER_OF_PERSONS_KILLED in different boroughs where the VEHICLE_TYPE_CODE_1 was “Passenger Vehicle”
q)select sum NUMBER_OF_PERSONS_KILLED by BOROUGH from a where VEHICLE_TYPE_CODE_1 like "PASSENGER VEHICLE" BOROUGH | NUMBER_OF_PERSONS_KILLED -------------| ------------------------ | 92 BRONX | 35 BROOKLYN | 79 MANHATTAN | 32 QUEENS | 79 STATEN ISLAND| 13
If you are interested, you can find some more advanced qsql queries here.