[TensorFlow] Training a Multi-Feature Model
Published: 2019-05-26


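Notes:

The code in this exercise comes from a Google Colab notebook (see the original link).

Learning objectives:

1. Use multiple features instead of a single feature to make the model more effective.

2. Debug outliers in the input data.
3. Use a test set to check whether the model has overfit the validation set.

Set up your environment first. If you are unsure how, see my first blog post.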

Feature preprocessing:

Inspecting the data:

Shuffling code:

california_housing_dataframe = california_housing_dataframe.reindex(np.random.permutation(california_housing_dataframe.index))
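
To see what the shuffle does before a split, here is a minimal sketch on a toy DataFrame. The column values, the fixed seed, and the 8/2 split are assumptions made for illustration; they are not taken from the original Colab.

```python
import numpy as np
import pandas as pd

# Toy stand-in for california_housing_dataframe, sorted by longitude
# (hypothetical values, for illustration only).
toy_dataframe = pd.DataFrame({
    "longitude": np.linspace(-124.0, -114.0, 10),
    "median_house_value": np.arange(10) * 1000.0,
})

np.random.seed(0)  # fixed seed so the sketch is repeatable (an assumption)

# Same idiom as above: reorder the rows by a random permutation of the index.
shuffled = toy_dataframe.reindex(np.random.permutation(toy_dataframe.index))

# Split only after shuffling: first 8 rows for training, last 2 for validation.
training_part = shuffled.head(8)
validation_part = shuffled.tail(2)
```

If the rows happen to be sorted (for example by longitude), splitting without the shuffle would put one narrow slice of the data entirely in the validation part.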

Model code:

import math

import numpy as np
import tensorflow as tf  # TF 1.x API (tf.train optimizers, tf.contrib)
from matplotlib import pyplot as plt
from sklearn import metrics

# Note: my_input_fn and construct_feature_columns are defined elsewhere.


def train_model(learning_rate, steps, batch_size, training_examples,
                training_targets, validation_examples, validation_targets):
    """Trains a linear regression model of multiple features.

    In addition to training, this function also prints training progress
    information, as well as a plot of the training and validation loss over time.

    Args:
        learning_rate: A float, the learning rate.
        steps: A non-zero int, the total number of training steps. A training
            step consists of a forward and backward pass using a single batch.
        batch_size: A non-zero int, the batch size.
        training_examples: A dataframe containing one or more columns from
            california_housing_dataframe to use as input features for training.
        training_targets: A dataframe containing exactly one column from
            california_housing_dataframe to use as a target for training.
        validation_examples: A dataframe containing one or more columns from
            california_housing_dataframe to use as input features for validation.
        validation_targets: A dataframe containing exactly one column from
            california_housing_dataframe to use as a target for validation.

    Returns:
        The trained LinearRegressor object.
    """
    # Step 1: initialize settings and prepare the input functions.
    periods = 10
    steps_per_period = steps // periods  # integer number of steps per period

    # Create a linear regressor object.
    my_optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
    my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer, 5.0)
    linear_regressor = tf.estimator.LinearRegressor(
        feature_columns=construct_feature_columns(training_examples),
        optimizer=my_optimizer
    )

    # Create input functions.
    training_input_fn = lambda: my_input_fn(
        features=training_examples,
        targets=training_targets["median_house_value"],
        batch_size=batch_size)
    predict_training_input_fn = lambda: my_input_fn(
        features=training_examples,
        targets=training_targets["median_house_value"],
        batch_size=batch_size,
        num_epochs=1,
        shuffle=False)
    predict_validation_input_fn = lambda: my_input_fn(
        features=validation_examples,
        targets=validation_targets["median_house_value"],
        batch_size=batch_size,
        num_epochs=1,
        shuffle=False)

    # Train the model in a loop so the loss can be assessed after each period.
    print('Training model...')
    print('RMSE (on the training data):')

    training_rmse = []
    validation_rmse = []
    for period in range(0, periods):
        linear_regressor.train(
            input_fn=training_input_fn,
            steps=steps_per_period,
        )

        # 2. Take a break and compute predictions.
        training_predictions = linear_regressor.predict(input_fn=predict_training_input_fn)
        training_predictions = np.array([item['predictions'][0] for item in training_predictions])

        validation_predictions = linear_regressor.predict(input_fn=predict_validation_input_fn)
        validation_predictions = np.array([item['predictions'][0] for item in validation_predictions])

        # Compute the training and validation loss.
        training_root_mean_squared_error = math.sqrt(
            metrics.mean_squared_error(training_predictions, training_targets))
        validation_root_mean_squared_error = math.sqrt(
            metrics.mean_squared_error(validation_predictions, validation_targets))

        # Print the current training loss for this period.
        print('Period %02d : %.02f' % (period, training_root_mean_squared_error))

        # Add the loss metrics from this period to our lists.
        training_rmse.append(training_root_mean_squared_error)
        validation_rmse.append(validation_root_mean_squared_error)

    print('Model training finished')

    # Output a graph of loss metrics over periods.
    plt.ylabel('RMSE')
    plt.xlabel('Periods')
    plt.title('Root Mean Squared Error vs Periods')
    plt.tight_layout()
    plt.plot(training_rmse, label="training")
    plt.plot(validation_rmse, label="validation")
    plt.legend()

    return linear_regressor
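
The loss recorded for each period is the root mean squared error. As a minimal sketch of that calculation, assuming plain Python lists in place of the prediction arrays, the hypothetical helper `rmse` below performs the same arithmetic as taking `math.sqrt` of a mean squared error. The input values are made up for illustration.

```python
import math


def rmse(predictions, targets):
    """Root mean squared error between two equal-length sequences."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values (made up): the errors are 3 and 4, so the mean squared
# error is (9 + 16) / 2 = 12.5 and the RMSE is sqrt(12.5).
example_rmse = rmse([3.0, 4.0], [0.0, 0.0])
```

Because the error is squared before averaging, a few large misses raise the RMSE more than many small ones, which is one reason outliers in the input data are worth debugging.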

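A few personal takeaways:

1. Key steps in training and analysis: basic checks, feature preprocessing, outlier removal, writing the input function, building the tf feature columns, model training, model tuning, evaluation on the test set, and further tuning.

2. When using the input function to evaluate the model, pay attention to two parameters, num_epochs=1 and shuffle=False, because the model only needs to run over the test set once.
3. When splitting the data into training and validation sets, always shuffle the original data first.
4. When passing targets to my_input_fn, select the column by name, i.e. targets['median_house_value'], so that the value is a pandas Series.
5. The code Google publishes, whether samples or open-source projects, is of very high quality, which is why I took the time to rewrite it myself.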
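
The effect of num_epochs=1 and shuffle=False can be illustrated without TensorFlow. The sketch below is a hypothetical, simplified stand-in for `my_input_fn` (the real function is not shown in this post); `iterate_batches` and its `seed` parameter are names invented here, and it only mimics the meaning of `batch_size`, `num_epochs`, and `shuffle`.

```python
import random


def iterate_batches(examples, batch_size=1, num_epochs=None, shuffle=True, seed=None):
    """Yield lists of examples, one batch at a time.

    num_epochs=None repeats forever (as in training);
    num_epochs=1 visits every example exactly once.
    """
    rng = random.Random(seed)
    epoch = 0
    while num_epochs is None or epoch < num_epochs:
        order = list(examples)
        if shuffle:
            rng.shuffle(order)  # new random order for each pass
        for start in range(0, len(order), batch_size):
            yield order[start:start + batch_size]
        epoch += 1


# Prediction-style settings: a single pass in the original order.
prediction_batches = list(iterate_batches([10, 20, 30, 40, 50], batch_size=2,
                                          num_epochs=1, shuffle=False))
# prediction_batches is [[10, 20], [30, 40], [50]]
```

Because every example appears exactly once and in its original order, the predictions line up row by row with the targets, which is what the RMSE calculation relies on.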

Reposted from: http://rekai.baihongyu.com/
