State of Iowa Liquor Sales

GA project term suggested to choose one of two options:

to forecast sales for the state budgeting purposes
to give recommendations on where and why opening a news liquor store

I decided to go with the first option.

A short introduction to the project and data

The state of Iowa controls wholesales of alcoholic beverages with content of alcohol higher than 10%. That excludes wines and beers. Thus all stronger alcoholic beverages either imported or purchased by the state owned Iowa Alcoholic Beverages Division that adds universal markup of 50%, that increases the price for alcoholic beverages 1.5 times when sold to stores and wholesale distributors.

The dataset consists of about 2.7mln observations (records) of each individual purchase from the state by retailers or wholesalers like COSCO or Walmart. The data is stored at my Google Drive and you can read walking through this Jupyter Notebook. Each observation has 18 features describing the date, store number, store location town, zip code, county number, alcoholic beverage category, vendor number, item number, item name, volume of bottle, state cost per bottle, bottle cost, bottle retail price, number of bottles sold, revenue from yeach transaction and total volume sold.

For the purposes of this presentation, I used the extract of 10% of the initial data that has aboutu 270 thousand observations. You can try and re-run the project with the original set of data store here - https://drive.google.com/file/d/0B8xhteSGeVx6QWpXMUcxWGxJdk0/view?usp=sharing

The make takeaways are:

forecasting of sales per each store is not necessary for the taxation purposes,
as reality has shown, the revenue is connected to the way the states manages sales of liquors,
there is an strong ethical issue of increasing sales of alcohol and decreasing the risks associated with its consumption (health, crime, road accidents).
the state has one of the most strict liquor sales regulations, that pushes those living closer to the borders of the state to go to other states to purchase ../data/2_project_Iowa_ABD.pdf)
two small cities with largest and smalles consuption of alcohol are located at the borders with the states with stricter and more liberal legislations.

Don't hesitate to check the presentation (../data/2_project_Iowa_ABD.pdf)

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = 10, 10
import seaborn as sns
plt.style.use('ggplot')
% matplotlib inline
%load_ext giphy_magic
% matplotlib inline
from scipy import stats
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split

## Load the data into a DataFrame
df = pd.read_csv('../data/Iowa_Liquor_sales_sample_10pct.csv',na_filter = True, low_memory=False)#, infer_datetime_format=True, dtype={‘County Number’:np.int64, ‘Category’: np.int64, 'State Bottle Cost' : np.float64, 'State Bottle Retail' : np.float64, 'Sale (Dollars)' : np.float64}, low_memory = False)
print df.shape
df.head()

(270955, 18)

Exploratory Data Analysis

Let's see if the data reflects the true types, if not let's bring some of the data into the workable format.

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 270955 entries, 0 to 270954
Data columns (total 18 columns):
Date                     270955 non-null object
Store Number             270955 non-null int64
City                     270955 non-null object
Zip Code                 270955 non-null object
County Number            269878 non-null float64
County                   269878 non-null object
Category                 270887 non-null float64
Category Name            270323 non-null object
Vendor Number            270955 non-null int64
Item Number              270955 non-null int64
Item Description         270955 non-null object
Bottle Volume (ml)       270955 non-null int64
State Bottle Cost        270955 non-null object
State Bottle Retail      270955 non-null object
Bottles Sold             270955 non-null int64
Sale (Dollars)           270955 non-null object
Volume Sold (Liters)     270955 non-null float64
Volume Sold (Gallons)    270955 non-null float64
dtypes: float64(4), int64(5), object(9)
memory usage: 37.2+ MB

## Transform the dates if needed, e.g.
df.Date = pd.to_datetime(df["Date"], format = "%m/%d/%Y")
#I tried to convert all the dollar amounts into numbers. But was not succesfull to turn the County and Category numbers into integerts
#df['County Number'] = df['County Number'].astype(int)
#df['Category'] = df['Category'].astype(int)
df['State Bottle Cost'] = df['State Bottle Cost'].map(lambda x: str(x)[1:]).astype(float)
df['State Bottle Retail'] = df['State Bottle Retail'].map(lambda x: str(x)[1:]).astype(float)
df['Sale (Dollars)'] = df['Sale (Dollars)'].map(lambda x: str(x)[1:]).astype(float)
df['month'] = df.Date.dt.month
df['year'] = df.Date.dt.year
df.head()

def eda(dataframe):
    print "missing values \n", dataframe.isnull().sum()
    print "dataframe index \n", dataframe.index
    print "dataframe types \n", dataframe.dtypes
    print "dataframe shape \n", dataframe.shape
    print "dataframe describe \n", dataframe.describe()
    for item in dataframe:
        print item
        print dataframe[item].nunique()

eda(df)

missing values 
Date                        0
Store Number                0
City                        0
Zip Code                    0
County Number            1077
County                   1077
Category                   68
Category Name             632
Vendor Number               0
Item Number                 0
Item Description            0
Bottle Volume (ml)          0
State Bottle Cost           0
State Bottle Retail         0
Bottles Sold                0
Sale (Dollars)              0
Volume Sold (Liters)        0
Volume Sold (Gallons)       0
month                       0
year                        0
dtype: int64
dataframe index 
RangeIndex(start=0, stop=270955, step=1)
dataframe types 
Date                     datetime64[ns]
Store Number                      int64
City                             object
Zip Code                         object
County Number                   float64
County                           object
Category                        float64
Category Name                    object
Vendor Number                     int64
Item Number                       int64
Item Description                 object
Bottle Volume (ml)                int64
State Bottle Cost               float64
State Bottle Retail             float64
Bottles Sold                      int64
Sale (Dollars)                  float64
Volume Sold (Liters)            float64
Volume Sold (Gallons)           float64
month                             int64
year                              int64
dtype: object
dataframe shape 
(270955, 20)
dataframe describe 
        Store Number  County Number      Category  Vendor Number  \
count  270955.000000  269878.000000  2.708870e+05   270955.00000   
mean     3590.263701      57.231642  1.043888e+06      256.43443   
std       947.662050      27.341205  5.018211e+04      141.01489   
min      2106.000000       1.000000  1.011100e+06       10.00000   
25%      2604.000000      31.000000  1.012200e+06      115.00000   
50%      3722.000000      62.000000  1.031200e+06      260.00000   
75%      4378.000000      77.000000  1.062310e+06      380.00000   
max      9023.000000      99.000000  1.701100e+06      978.00000   

         Item Number  Bottle Volume (ml)  State Bottle Cost  \
count  270955.000000       270955.000000      270955.000000   
mean    45974.963300          924.830341           9.773281   
std     52757.043086          493.088489           7.047119   
min       168.000000           50.000000           0.890000   
25%     26827.000000          750.000000           5.500000   
50%     38176.000000          750.000000           8.000000   
75%     64573.000000         1000.000000          11.920000   
max    995507.000000         6000.000000         425.000000   

       State Bottle Retail   Bottles Sold  Sale (Dollars)  \
count        270955.000000  270955.000000   270955.000000   
mean             14.675760       9.871285      128.902375   
std              10.570145      24.040912      383.027369   
min               1.340000       1.000000        1.340000   
25%               8.270000       2.000000       30.450000   
50%              12.270000       6.000000       70.560000   
75%              17.880000      12.000000      135.000000   
max             637.500000    2508.000000    36392.400000   

       Volume Sold (Liters)  Volume Sold (Gallons)          month  \
count         270955.000000          270955.000000  270955.000000   
mean               8.981351               2.372830       5.821103   
std               28.913690               7.638182       3.621291   
min                0.100000               0.030000       1.000000   
25%                1.500000               0.400000       3.000000   
50%                5.250000               1.390000       5.000000   
75%               10.500000               2.770000       9.000000   
max             2508.000000             662.540000      12.000000   

                year  
count  270955.000000  
mean     2015.193246  
std         0.394845  
min      2015.000000  
25%      2015.000000  
50%      2015.000000  
75%      2015.000000  
max      2016.000000  
Date
274
Store Number
1400
City
385
Zip Code
415
County Number
99
County
99
Category
83
Category Name
71
Vendor Number
116
Item Number
2696
Item Description
2173
Bottle Volume (ml)
29
State Bottle Cost
1086
State Bottle Retail
1112
Bottles Sold
137
Sale (Dollars)
6580
Volume Sold (Liters)
265
Volume Sold (Gallons)
261
month
12
year
2

Let me see, what effect of dropping the lines would beÂ¶

print 'Total sum of sales with missing category is ---- $', df['Sale (Dollars)'][pd.isnull(df['Category']) == True].sum(), ', that is ', df['Sale (Dollars)'][pd.isnull(df['Category']) == True].sum()/(df['Sale (Dollars)'].sum()), 'of total sales'
print 'Total sum of sales with missing category name is $', df['Sale (Dollars)'][pd.isnull(df['Category Name']) == True].sum(),', that is ', df['Sale (Dollars)'][pd.isnull(df['Category Name']) == True].sum()/(df['Sale (Dollars)'].sum()), 'of total sales'
print 'Total sum of sales with missing County is ------ $', df['Sale (Dollars)'][pd.isnull(df['County']) == True].sum(),', that is ', df['Sale (Dollars)'][pd.isnull(df['Category Name']) == True].sum()/(df['Sale (Dollars)'].sum()), 'of total sales'
print 'While total sales are -------------------------- $', df['Sale (Dollars)'].sum()
dropped_sum = (df['Sale (Dollars)'][pd.isnull(df['Category']) == True].sum() + df['Sale (Dollars)'][pd.isnull(df['Category Name']) == True].sum() + df['Sale (Dollars)'][pd.isnull(df['County']) == True].sum())
print 'Total loss of sales numbers from missing data -- $',dropped_sum, 'That is ', dropped_sum/df['Sale (Dollars)'].sum()

Total sum of sales with missing category is ---- $ 13789.92 , that is  0.000394824104261 of total sales
Total sum of sales with missing category name is $ 124113.99 , that is  0.00355355179204 of total sales
Total sum of sales with missing County is ------ $ 116833.68 , that is  0.00355355179204 of total sales
While total sales are -------------------------- $ 34926742.95
Total loss of sales numbers from missing data -- $ 254737.59 That is  0.0072934825433

# Here I gave another thought and decided to finally drop the missing lines and duplicates
df.dropna(inplace = True)
df.drop_duplicates(inplace = True)
df.shape

(269223, 20)

There is another dataset with population per county that I wanted to incorporate into the datasetÂ¶

population = pd.read_csv('../data/Total_County_Population_by_Year.csv',index_col = 'County', na_filter = True)
population.head()

Below to run once only! - County population is merged with the rest of dataframeÂ¶

I know I do it after the presentation and after the submission date. I was not happy with everthing I did. So I am redoing itÂ¶

df = df.join(population, on = "County", how = "outer")
df.head()

Now let's play with pivot tables.Â¶

dfpivot = pd.pivot_table(df, values = ['County','Sale (Dollars)','Bottles Sold', 'Volume Sold (Liters)'], index = ['Store Number'], aggfunc=np.sum)
dfpivot['av_price_unit'] = dfpivot['Sale (Dollars)']/dfpivot['Bottles Sold']
dfpivot['av_vol_unit'] = dfpivot['Volume Sold (Liters)'] / dfpivot['Bottles Sold']
dfpivot['Population'] = df['Population']
dfpivot['County'] = df['County']
dfpivot['liquor_capita'] = dfpivot['Volume Sold (Liters)'] / dfpivot['Population']


dfpivot = pd.DataFrame(dfpivot)
#dfpivot = dfpivot.merge(population, on = 'County', how = 'inner')

dfpivot.head()

df[(df.Date.dt.quarter == 1) & (df.Date.dt.year == 2015)].groupby('Store Number')

<pandas.core.groupby.DataFrameGroupBy object at 0x10487b790>

#print '5 counties with top average price per bottle bought \n'
#dfpivot.sort_values('av_price_unit', ascending = False).head()

# print '5 counties with least average price per bottle bought \n'
# dfpivot.sort_values('av_price_unit', ascending = False).tail()

# print '5 counties with top average volume per bottle bought \n'
# dfpivot.sort_values('av_vol_unit', ascending = False).head()

# print '5 counties with least average volume per bottle bought \n'
# dfpivot.sort_values('av_vol_unit', ascending = False).tail()

Calculate the yearly liquor sales for each store using the provided data. You can add up the transactions for eachÂ¶

Let's see what summaries we can produce from this set:Â¶

df.head()

I grouped the sales by each store for the year 2015

VolumeL = pd.DataFrame({'VolumeL':df.groupby(['Store Number','County','Population'])['Volume Sold (Liters)'].sum()})
Bottles = pd.DataFrame({'Bottles':df.groupby(['Store Number','County','Population'])['Bottles Sold'].sum()})
Sales_Y15Q1 = pd.DataFrame({'Q1Y15':df[(df.Date >= '2015-01-01') & (df.Date < '2015-04-01')].groupby(['Store Number','County', 'Population'])['Sale (Dollars)'].sum()})
Sales_Y15Q2 = pd.DataFrame({'Q2Y15':df[(df.Date >= '2015-04-01') & (df.Date < '2015-07-01')].groupby(['Store Number','County', 'Population'])['Sale (Dollars)'].sum()})
Sales_Y15Q3 = pd.DataFrame({'Q3Y15':df[(df.Date >= '2015-07-01') & (df.Date < '2015-10-01')].groupby(['Store Number','County', 'Population'])['Sale (Dollars)'].sum()})
Sales_Y15Q4 = pd.DataFrame({'Q4Y15':df[(df.Date >= '2015-10-01') & (df.Date < '2016-01-01')].groupby(['Store Number','County', 'Population'])['Sale (Dollars)'].sum()})
Sales_Y16Q1 = pd.DataFrame({'Q1Y16':df[(df.Date >= '2016-01-01') & (df.Date < '2016-04-01')].groupby(['Store Number','County', 'Population'])['Sale (Dollars)'].sum()})
Total_Sales = pd.concat([Bottles, VolumeL ,Sales_Y15Q1, Sales_Y15Q2, Sales_Y15Q3, Sales_Y15Q4, Sales_Y16Q1], axis =1, join = 'outer')

Total_Sales.fillna(0,inplace = True)

Total_Sales.head()
Total_Sales.tail()

# print Total_Sales.Q1Y16.sort_values(ascending = False).head()
# print 'Total Sales in Q1Y16 $', Total_Sales.Q1Y16.sort_values(ascending = False).sum()

REVISED:Â¶

I will:

Group the Total_Sales by Counties now,
Add average bottle volume
Average vol / capita And build model based on 4 variables: 3 above and Q1Y15 sales. Q1Y16 is predicted value.

VolumeL = pd.DataFrame({'VolumeL':df.groupby(['County','Population'])['Volume Sold (Liters)'].sum()})
Bottles = pd.DataFrame({'Bottles':df.groupby(['County','Population'])['Bottles Sold'].sum()})
Sales_Y15Q1 = pd.DataFrame({'Q1Y15':df[(df.Date >= '2015-01-01') & (df.Date < '2015-04-01')].groupby(['County', 'Population'])['Sale (Dollars)'].sum()})
Sales_Y15Q2 = pd.DataFrame({'Q2Y15':df[(df.Date >= '2015-04-01') & (df.Date < '2015-07-01')].groupby(['County', 'Population'])['Sale (Dollars)'].sum()})
Sales_Y15Q3 = pd.DataFrame({'Q3Y15':df[(df.Date >= '2015-07-01') & (df.Date < '2015-10-01')].groupby(['County', 'Population'])['Sale (Dollars)'].sum()})
Sales_Y15Q4 = pd.DataFrame({'Q4Y15':df[(df.Date >= '2015-10-01') & (df.Date < '2016-01-01')].groupby(['County', 'Population'])['Sale (Dollars)'].sum()})
Sales_Y16Q1 = pd.DataFrame({'Q1Y16':df[(df.Date >= '2016-01-01') & (df.Date < '2016-04-01')].groupby(['County', 'Population'])['Sale (Dollars)'].sum()})
County_Sales = pd.concat([Bottles, VolumeL ,Sales_Y15Q1, Sales_Y15Q2, Sales_Y15Q3, Sales_Y15Q4, Sales_Y16Q1], axis =1, join = 'outer')
County_Sales.reset_index(level = 1, inplace = True)
County_Sales['AvgBotVol'] = County_Sales.VolumeL / County_Sales.Bottles
County_Sales['AvgVolCap'] = County_Sales.VolumeL / County_Sales['Population']

County_Sales.head()

print Total_Sales.sum()
k = Total_Sales.Q1Y16.sum()/Total_Sales.Q1Y15.sum()
print 'YoY increment from 2015 to 2016:', k
Q2Y16 = Total_Sales.Q2Y15.sum()*Total_Sales.Q1Y16.sum()/Total_Sales.Q1Y15.sum()
print 'Forecast of sales for Q2 Y16: $', Q2Y16
FY16 = Total_Sales.Q3Y15.sum() + Total_Sales.Q4Y15.sum() + Total_Sales.Q1Y16.sum() + Q2Y16
print 'Thus the forecast fot the fiscal year 2016: $', FY16

Bottles    2658010.00
VolumeL    2420108.31
Q1Y15      6147480.47
Q2Y15      7308799.87
Q3Y15      6718786.91
Q4Y15      8295874.82
Q1Y16      6208995.80
dtype: float64
YoY increment from 2015 to 2016: 1.01000659218
Forecast of sales for Q2 Y16: $ 7381936.04963
Thus the forecast fot the fiscal year 2016: $ 28605593.5796

Preliminary result: The forecast amount of sales in FY2016 - USD289,790,002 vs real USD288,908,790

Difference of $881,211, that's 1.2 percent error

For annual reports please click here - https://abd.iowa.gov/annual-reports

Now let me try to run the linear regression on the dataframe with just quarterly sales to predict sales of each store in Q2Y16Â¶

# For forecasting I will be using the per store sales 
# Let's see the shape of the Working_Stores
print County_Sales.shape
County_Sales.head()

(99, 10)

y = County_Sales.Q1Y16
X = County_Sales[['Q1Y15', 'AvgBotVol', 'AvgVolCap']]
# create training and testing vars
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
print X_train.shape, y_train.shape
print X_test.shape, y_test.shape

(69, 3) (69,)
(30, 3) (30,)

# fit a model
lm = linear_model.LinearRegression()
print lm
model = lm.fit(X_train, y_train)
print model
predictions = lm.predict(X_test)
print predictions
print X_train.shape
print y_train.shape

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
[ 26831.94669702  26253.57062783   9798.8960754    7440.79262964
  77241.48878451  30212.24138306  56470.75342467  99048.95006273
  12534.46829682   5823.97060365  13365.85443148  14754.82192909
  74619.23305879  11477.90883066  31153.10972357  30316.68594541
  14418.48062607  11902.39930987  15568.40906036   5363.22662336
  39507.44802931  20207.30998635  28881.96106603  34560.39940023
  20555.32738915   9774.03297333   3946.93126442  26269.6456897
  14161.76473531  17092.16946972]
(69, 3)
(69,)

#validate the model on the cv = k-1
## The line / model
plt.scatter(y_test, predictions)
plt.xlabel("True Values")
plt.ylabel("Predictions")

<matplotlib.text.Text at 0x1048a41d0>

So here are the results and the model is:Â¶

print "Score:", model.score(X_test, y_test)  
print 'Model coefficient is:', lm.coef_
print lm.intercept_
print 'Based on multilinear regression, the forecast for fiscal year 2016 is $'

Score: 0.874231431656
Model coefficient is: [  1.00834934e+00  -1.62269828e+04   1.99972543e+03]
15133.7585652
Based on multilinear regression, the forecast for fiscal year 2016 is $

Thus, the MLR looks like this:Â¶

Sales2016 = (Sales2015)1.042768 + (AvgBotVol)1,123,697 - (AvgVolCap)*6,917.840 - 90,023.Â¶

Let's try to apply for each countyÂ¶

County_Sales['FY2016'] = (County_Sales.Q1Y15+County_Sales.Q2Y15+County_Sales.Q3Y15+County_Sales.Q4Y15)*lm.coef_[0]+County_Sales.AvgBotVol*lm.coef_[1]+County_Sales.AvgVolCap*lm.coef_[2]+lm.intercept_
County_Sales.head()
County_Sales.FY2016.sum()

28733016.270820655

The Final Part - RegularizationÂ¶

from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn import metrics

# Perform 6-fold cross validation
print County_Sales.shape
scores = cross_val_score(lm, X_train, y_train, cv=6)
print "Cross-validated scores:", scores
print "Average: ", scores.mean()

(99, 11)
Cross-validated scores: [ 0.99866802  0.99118501  0.97591599  0.9975724   0.98556389  0.95918441]
Average:  0.984681619309

# LASSO regression

# Let me play with alpha = 0.01

# try alpha=0.01 and examine coefficients
from sklearn.linear_model import Lasso
lassoreg = Lasso(alpha=0.01, normalize=True)
lassoreg.fit(X_train, y_train)
print lassoreg.coef_

[  1.00835052e+00  -1.62254089e+04   1.99899112e+03]

# calculate MSE (for alpha=0.01)
y_pred = lassoreg.predict(X_test)
print np.sqrt(metrics.mean_squared_error(y_test, y_pred))

7344.77250749

# Not bad. Now I am happier.

I hope I am done. I am happier with the current results. I would present this model, if I had to.Â¶

	Date	Store Number	City	Zip Code	County Number	County	Category	Category Name	Vendor Number	Item Number	Item Description	Bottle Volume (ml)	State Bottle Cost	State Bottle Retail	Bottles Sold	Sale (Dollars)	Volume Sold (Liters)	Volume Sold (Gallons)
0	11/04/2015	3717	SUMNER	50674	9.0	Bremer	1051100.0	APRICOT BRANDIES	55	54436	Mr. Boston Apricot Brandy	750	$4.50	$6.75	12	$81.00	9.0	2.38
1	03/02/2016	2614	DAVENPORT	52807	82.0	Scott	1011100.0	BLENDED WHISKIES	395	27605	Tin Cup	750	$13.75	$20.63	2	$41.26	1.5	0.40
2	02/11/2016	2106	CEDAR FALLS	50613	7.0	Black Hawk	1011200.0	STRAIGHT BOURBON WHISKIES	65	19067	Jim Beam	1000	$12.59	$18.89	24	$453.36	24.0	6.34
3	02/03/2016	2501	AMES	50010	85.0	Story	1071100.0	AMERICAN COCKTAILS	395	59154	1800 Ultimate Margarita	1750	$9.50	$14.25	6	$85.50	10.5	2.77
4	08/18/2015	3654	BELMOND	50421	99.0	Wright	1031080.0	VODKA 80 PROOF	297	35918	Five O'clock Vodka	1750	$7.20	$10.80	12	$129.60	21.0	5.55

	Date	Store Number	City	Zip Code	County Number	County	Category	Category Name	Vendor Number	Item Number	Item Description	Bottle Volume (ml)	State Bottle Cost	State Bottle Retail	Bottles Sold	Sale (Dollars)	Volume Sold (Liters)	Volume Sold (Gallons)	month	year
0	2015-11-04	3717	SUMNER	50674	9.0	Bremer	1051100.0	APRICOT BRANDIES	55	54436	Mr. Boston Apricot Brandy	750	4.50	6.75	12	81.00	9.0	2.38	11	2015
1	2016-03-02	2614	DAVENPORT	52807	82.0	Scott	1011100.0	BLENDED WHISKIES	395	27605	Tin Cup	750	13.75	20.63	2	41.26	1.5	0.40	3	2016
2	2016-02-11	2106	CEDAR FALLS	50613	7.0	Black Hawk	1011200.0	STRAIGHT BOURBON WHISKIES	65	19067	Jim Beam	1000	12.59	18.89	24	453.36	24.0	6.34	2	2016
3	2016-02-03	2501	AMES	50010	85.0	Story	1071100.0	AMERICAN COCKTAILS	395	59154	1800 Ultimate Margarita	1750	9.50	14.25	6	85.50	10.5	2.77	2	2016
4	2015-08-18	3654	BELMOND	50421	99.0	Wright	1031080.0	VODKA 80 PROOF	297	35918	Five O'clock Vodka	1750	7.20	10.80	12	129.60	21.0	5.55	8	2015

	Population
County
Adair	7228
Adams	3796
Allamakee	13886
Appanoose	12529
Audubon	5773

	Date	Store Number	City	Zip Code	County Number	County	Category	Category Name	Vendor Number	Item Number	...	Bottle Volume (ml)	State Bottle Cost	State Bottle Retail	Bottles Sold	Sale (Dollars)	Volume Sold (Liters)	Volume Sold (Gallons)	month	year	Population
0	2015-11-04	3717	SUMNER	50674	9.0	Bremer	1051100.0	APRICOT BRANDIES	55	54436	...	750	4.50	6.75	12	81.00	9.0	2.38	11	2015	24722
15	2015-06-10	3717	SUMNER	50674	9.0	Bremer	1051100.0	APRICOT BRANDIES	55	54436	...	750	4.50	6.75	4	27.00	3.0	0.79	6	2015	24722
226	2016-02-24	4670	TRIPOLI	50676	9.0	Bremer	1011300.0	TENNESSEE WHISKIES	85	26827	...	1000	18.38	27.57	6	165.42	6.0	1.59	2	2016	24722
546	2016-02-02	2651	WAVERLY	50677	9.0	Bremer	1071100.0	AMERICAN COCKTAILS	395	58838	...	1750	8.20	12.30	6	73.80	10.5	2.77	2	2016	24722
579	2015-03-10	2651	WAVERLY	50677	9.0	Bremer	1022100.0	TEQUILA	434	89387	...	1000	6.92	10.38	12	124.56	12.0	3.17	3	2015	24722

	Bottles Sold	Sale (Dollars)	Volume Sold (Liters)	av_price_unit	av_vol_unit	Population	County	liquor_capita
Store Number
2106	12573	176517.45	11836.10	14.039406	0.941390	80133.0	Dallas	0.147706
2113	830	11376.12	836.85	13.706169	1.008253	26643.0	Boone	0.031410
2130	9144	139440.02	8436.27	15.249346	0.922602	102782.0	Woodbury	0.082079
2152	670	8625.74	720.87	12.874239	1.075925	97125.0	Dubuque	0.007422
2178	2408	29912.68	2437.92	12.422209	1.012425	10676.0	Guthrie	0.228355

			Bottles	VolumeL	Q1Y15	Q2Y15	Q3Y15	Q4Y15	Q1Y16
Store Number	County	Population
9002	Scott	172126	2286	1291.5	4865.16	10206.36	10180.68	12448.32	1382.64
9010	Warren	48626	360	221.4	3861.60	1155.60	740.76	0.00	177.48
9013	Warren	48626	264	201.0	0.00	677.16	165.12	1375.20	361.56
9018	Dubuque	97125	300	225.0	0.00	0.00	0.00	6435.00	0.00
9023	Carroll	20498	24	18.0	0.00	0.00	0.00	0.00	633.36

	Population	Bottles	VolumeL	Q1Y15	Q2Y15	Q3Y15	Q4Y15	Q1Y16	AvgBotVol	AvgVolCap
County
Adair	7228	4638	4350.84	10222.00	10820.66	11879.83	10730.49	11797.02	0.938085	0.601942
Adams	3796	955	981.38	2253.60	2015.54	3504.52	2471.48	2196.57	1.027623	0.258530
Allamakee	13886	7450	7868.04	14907.98	23686.20	19409.71	21914.58	19081.53	1.056113	0.566617
Appanoose	12529	8171	8038.16	15228.09	25218.66	20258.66	16328.23	21396.23	0.983743	0.641564
Audubon	5773	1694	1717.80	5510.75	2783.12	3239.32	4896.07	3226.28	1.014050	0.297558

	Population	Bottles	VolumeL	Q1Y15	Q2Y15	Q3Y15	Q4Y15	Q1Y16	AvgBotVol	AvgVolCap
County
Adair	7228	4638	4350.84	10222.00	10820.66	11879.83	10730.49	11797.02	0.938085	0.601942
Adams	3796	955	981.38	2253.60	2015.54	3504.52	2471.48	2196.57	1.027623	0.258530
Allamakee	13886	7450	7868.04	14907.98	23686.20	19409.71	21914.58	19081.53	1.056113	0.566617
Appanoose	12529	8171	8038.16	15228.09	25218.66	20258.66	16328.23	21396.23	0.983743	0.641564
Audubon	5773	1694	1717.80	5510.75	2783.12	3239.32	4896.07	3226.28	1.014050	0.297558