This practice code comes from Google's Colab (click here for the original link).
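1. Use multiple features instead of a single feature to make the model more effective.
2. Debug outliers in the input data.
3. Use a test set to check whether the model is overfitting the validation set.

Preparing the environment: if you are unsure how to do this, see my first blog post (click here).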
Feature preprocessing:
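The post does not show the preprocessing code. The sketch below is a rough illustration that assumes rows are plain dicts keyed by the California housing column names. It selects the input columns, adds a derived ratio feature, and rescales the target. The helper names, the `rooms_per_person` feature and the division by 1000 are illustrative assumptions, not the Colab's code. The Colab does this work on pandas DataFrames.

```python
# Hypothetical pure-Python sketch of feature preprocessing (names are assumptions).
FEATURE_COLUMNS = [
    "latitude", "longitude", "housing_median_age", "total_rooms",
    "total_bedrooms", "population", "households", "median_income",
]


def preprocess_features(rows):
    """Select the input columns and add a synthetic rooms_per_person feature."""
    processed = []
    for row in rows:
        features = {name: row[name] for name in FEATURE_COLUMNS}
        # Assumed derived feature: rooms per resident in the block.
        features["rooms_per_person"] = row["total_rooms"] / row["population"]
        processed.append(features)
    return processed


def preprocess_targets(rows):
    """Return median_house_value in thousands of dollars (assumed scaling)."""
    return [row["median_house_value"] / 1000.0 for row in rows]


# A made-up row, only to show the shapes involved.
example_row = {
    "latitude": 34.2, "longitude": -118.3, "housing_median_age": 20.0,
    "total_rooms": 1500.0, "total_bedrooms": 300.0, "population": 750.0,
    "households": 280.0, "median_income": 3.5, "median_house_value": 180000.0,
}
features = preprocess_features([example_row])
targets = preprocess_targets([example_row])
print(features[0]["rooms_per_person"], targets[0])  # prints 2.0 180.0
```

A ratio such as rooms per person can carry more signal than the raw totals, because block sizes in the dataset vary widely.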
Inspecting the data:
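With pandas, a quick per-column overview usually comes from `DataFrame.describe()`. The pure-Python sketch below assumes the same row-of-dicts layout. It computes a similar count/min/mean/max summary so that suspicious extremes stand out, such as a max far above the mean. It then clips such values at a chosen cap. The function names, the sample values and the cap of 5.0 are hypothetical.

```python
def describe(rows, columns):
    """Return count, min, mean and max per column, similar in spirit to DataFrame.describe()."""
    summary = {}
    for name in columns:
        values = [row[name] for row in rows]
        summary[name] = {
            "count": len(values),
            "min": min(values),
            "mean": sum(values) / len(values),
            "max": max(values),
        }
    return summary


def clip_column(rows, column, upper):
    """Return copies of rows with `column` capped at `upper`, limiting the pull of outliers."""
    return [{**row, column: min(row[column], upper)} for row in rows]


rows = [{"rooms_per_person": v} for v in (1.5, 2.0, 2.5, 55.0)]
print(describe(rows, ["rooms_per_person"])["rooms_per_person"])
# {'count': 4, 'min': 1.5, 'mean': 15.25, 'max': 55.0}
clipped = clip_column(rows, "rooms_per_person", upper=5.0)
print([row["rooms_per_person"] for row in clipped])  # [1.5, 2.0, 2.5, 5.0]
```

Clipping keeps every row and only limits extreme values. Dropping rows is the alternative when an outlier is clearly a data error.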
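Shuffling the data:

```python
import numpy as np

# Randomly permute the row order so that the later training/validation split is random.
california_housing_dataframe = california_housing_dataframe.reindex(
    np.random.permutation(california_housing_dataframe.index))
```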
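The `reindex` call above applies a random permutation of the index to the whole DataFrame. A common next step is to take the first N rows for training and the remaining rows for validation. The sketch below shows the same shuffle-then-split idea in pure Python. `shuffle_and_split` is a hypothetical helper with an optional seed for reproducibility, and the sizes are illustrative.

```python
import random


def shuffle_and_split(rows, train_size, seed=None):
    """Shuffle a copy of rows, then split it into (training, validation)."""
    rng = random.Random(seed)  # seeded generator so the split can be reproduced
    shuffled = list(rows)      # copy: the caller's list keeps its original order
    rng.shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:]


data = list(range(10))  # stand-in for 10 rows stored in sorted order
train, validation = shuffle_and_split(data, train_size=7, seed=0)
print(len(train), len(validation))  # 7 3
```

Without the shuffle, a file whose rows are stored in some sorted order would give the training and validation sets different slices of the data. They would then no longer come from the same distribution.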
Model code:

```python
import math

import numpy as np
import tensorflow as tf  # TensorFlow 1.x: this code relies on tf.train and tf.contrib
from matplotlib import pyplot as plt
from sklearn import metrics

# my_input_fn and construct_feature_columns are defined elsewhere in the original
# notebook; they are not shown in this post.


def train_model(learning_rate, steps, batch_size,
                training_examples, training_targets,
                validation_examples, validation_targets):
    """Trains a linear regression model of multiple features.

    In addition to training, this function also prints training progress
    information, as well as a plot of the training and validation loss over time.

    Args:
      learning_rate: A float, the learning rate.
      steps: A non-zero int, the total number of training steps. A training step
        consists of a forward and backward pass using a single batch.
      batch_size: A non-zero int, the batch size.
      training_examples: A DataFrame containing one or more columns from
        california_housing_dataframe to use as input features for training.
      training_targets: A DataFrame containing exactly one column from
        california_housing_dataframe to use as a target for training.
      validation_examples: A DataFrame containing one or more columns from
        california_housing_dataframe to use as input features for validation.
      validation_targets: A DataFrame containing exactly one column from
        california_housing_dataframe to use as a target for validation.

    Returns:
      The trained LinearRegressor.
    """
    # Step 1: split training into periods so the loss can be checked along the way.
    periods = 10
    # Integer division: Estimator.train expects an integer number of steps.
    steps_per_period = steps // periods

    # Create a linear regressor object with gradient clipping.
    my_optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
    my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer, 5.0)
    linear_regressor = tf.estimator.LinearRegressor(
        feature_columns=construct_feature_columns(training_examples),
        optimizer=my_optimizer)

    # Create input functions. The prediction functions make one unshuffled pass,
    # so each prediction lines up with its target.
    training_input_fn = lambda: my_input_fn(
        features=training_examples,
        targets=training_targets["median_house_value"],
        batch_size=batch_size)
    predict_training_input_fn = lambda: my_input_fn(
        features=training_examples,
        targets=training_targets["median_house_value"],
        batch_size=batch_size, num_epochs=1, shuffle=False)
    predict_validation_input_fn = lambda: my_input_fn(
        features=validation_examples,
        targets=validation_targets["median_house_value"],
        batch_size=batch_size, num_epochs=1, shuffle=False)

    print('Training model...')
    print('RMSE (on training data):')
    training_rmse = []
    validation_rmse = []
    for period in range(0, periods):
        # Train the model, continuing from its previous state.
        linear_regressor.train(input_fn=training_input_fn, steps=steps_per_period)

        # Step 2: take a break and compute predictions.
        training_predictions = linear_regressor.predict(input_fn=predict_training_input_fn)
        training_predictions = np.array([item['predictions'][0] for item in training_predictions])
        validation_predictions = linear_regressor.predict(input_fn=predict_validation_input_fn)
        validation_predictions = np.array([item['predictions'][0] for item in validation_predictions])

        # Compute the training and validation loss (mean_squared_error takes y_true first).
        training_root_mean_squared_error = math.sqrt(
            metrics.mean_squared_error(training_targets, training_predictions))
        validation_root_mean_squared_error = math.sqrt(
            metrics.mean_squared_error(validation_targets, validation_predictions))

        # Print the current training loss once per period.
        print('  period %02d : %.2f' % (period, training_root_mean_squared_error))
        # Add the loss metrics from this period to our lists.
        training_rmse.append(training_root_mean_squared_error)
        validation_rmse.append(validation_root_mean_squared_error)
    print('Model training finished.')

    # Output a graph of loss metrics over periods.
    plt.ylabel('RMSE')
    plt.xlabel('Periods')
    plt.title('Root Mean Squared Error vs. Periods')
    plt.tight_layout()
    plt.plot(training_rmse, label='training')
    plt.plot(validation_rmse, label='validation')
    plt.legend()
    return linear_regressor
```
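In each period, `train_model` predicts over the full training and validation sets with `num_epochs=1, shuffle=False` and turns the result into an RMSE. The pure-Python stand-in below mimics that evaluation step. `iterate_batches` is a hypothetical, simplified illustration of what those two arguments control; it is not the Colab's `my_input_fn`. `rmse` computes the same quantity as `math.sqrt(metrics.mean_squared_error(...))`. The data and the "model" y = 2x are made up.

```python
import math
import random


def iterate_batches(values, batch_size, num_epochs=None, shuffle=True, seed=None):
    """Yield lists of at most batch_size items.

    num_epochs=None repeats forever; shuffle=True reorders the items in every epoch.
    """
    if not values:
        return  # nothing to yield; avoids an endless loop when num_epochs is None
    rng = random.Random(seed)
    epoch = 0
    while num_epochs is None or epoch < num_epochs:
        order = list(range(len(values)))
        if shuffle:
            rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            yield [values[i] for i in order[start:start + batch_size]]
        epoch += 1


def rmse(targets, predictions):
    """Root mean squared error between two equal-length sequences."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return math.sqrt(sum(squared) / len(squared))


features = [1.0, 2.0, 3.0, 4.0, 5.0]
targets = [2.0, 4.0, 6.0, 8.0, 11.0]

# Evaluation pass: one epoch in the original order, so predictions align with targets.
predictions = []
for batch in iterate_batches(features, batch_size=2, num_epochs=1, shuffle=False):
    predictions.extend(2.0 * x for x in batch)  # stand-in model: y = 2x
print(predictions)                 # [2.0, 4.0, 6.0, 8.0, 10.0]
print(rmse(targets, predictions))  # about 0.447, i.e. sqrt(1 / 5)
```

With `shuffle=True` in the prediction pass, the predictions would come back in a different order from the targets, and the resulting RMSE would be meaningless. With `num_epochs=None`, the prediction pass would never end.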
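1. Key steps in training and analysis: basic inspection → feature preprocessing → outlier removal → writing the input function → building the TF feature columns → model training → model tuning → evaluation on the test set → model tuning.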
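2. When calling the input function to evaluate the model, pay attention to two parameters: num_epochs=1 and shuffle=False. Prediction only needs a single pass over the data, and keeping the original order makes each prediction line up with its target.
3. Always shuffle the original data before splitting it into training and validation sets, so that both sets come from the same distribution.
4. When calling my_input_fn, the targets must select the column explicitly, i.e. targets['median_house_value'], so that a pandas Series is passed in.
5. Google's published code, whether tutorial examples or open-source projects, is of high quality. That is why I took the time to rewrite it myself.

Reposted from: http://rekai.baihongyu.com/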