Python – List comprehension

This time I'll write about list comprehensions, which I use frequently at work, especially in Jupyter Notebook.

0. Preparation

I'll use a problem from a certain ML course as the sample.

In this exercise you’ll try to build a neural network that predicts the price of a house according to a simple formula.

Imagine that house pricing is as easy as:

A house has a base cost of 50k, and every additional bedroom adds a cost of 50k. This will make a 1 bedroom house cost 100k, a 2 bedroom house cost 150k etc.

How would you create a neural network that learns this relationship so that it would predict a 7 bedroom house as costing close to 400k etc.?
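The pricing rule quoted above can be written down as a tiny function, which is handy for checking the training targets later. This is only a sketch: the name `house_price` is hypothetical, and prices are expressed in thousands (so 100 means 100k).

```python
def house_price(bedrooms: int) -> int:
    """Price in thousands: a base cost of 50k plus 50k per bedroom."""
    return 50 + 50 * bedrooms


# The three cases mentioned in the exercise text.
for n in (1, 2, 7):
    print(n, house_price(n))
```

The 7-bedroom case is the value the trained model is expected to land close to.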

1. A simple for loop

First, let's create the training data with a simple for loop.

In [1]: import numpy as np

In [2]: list_x1 = []
   ...: for i in range(1, 11):
   ...:     if i < 7 or i == 10:
   ...:         list_x1.append(i)
   ...:
   ...: xs = np.array(list_x1, dtype=float)
   ...: xs
Out[2]: array([ 1.,  2.,  3.,  4.,  5.,  6., 10.])

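The exercise asks for a prediction of the price of a 7-bedroom house, so 7 is left out of the training data (the condition also drops 8 and 9), and a 10-bedroom house is added to improve accuracy. Since the prediction will be done with TensorFlow, the list is converted to a NumPy array.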

2. List comprehension – prioritizing readability

Now for the main topic, list comprehensions. To keep things a little more readable, I first assign the list to a variable before passing it to NumPy.

In [3]: list_x2 = [i for i in range(1, 11) if i < 7 or i == 10]
   ...: list_x2
Out[3]: [1, 2, 3, 4, 5, 6, 10]

Next, pass it to NumPy.

In [4]: xs = np.array(list_x2, dtype=float)
   ...: xs
Out[4]: array([ 1.,  2.,  3.,  4.,  5.,  6., 10.])
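For reference, the comprehension in In [3] follows the general shape `[expression for item in iterable if condition]`, and it builds the same list as the loop in section 1. A minimal sketch to confirm this (the names `from_loop` and `from_comprehension` are made up for illustration):

```python
# Loop version: append each value that passes the filter.
from_loop = []
for i in range(1, 11):
    if i < 7 or i == 10:
        from_loop.append(i)

# Comprehension version: same iterable, same condition, one expression.
from_comprehension = [i for i in range(1, 11) if i < 7 or i == 10]

print(from_loop == from_comprehension)
print(from_comprehension)
```

Reading the comprehension left to right as "value, loop, filter" maps directly onto the three indented lines of the loop.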

That gives us the feature data. Next, let's create the target data.

3. List comprehension

I think the reason to use a list comprehension is to cut down the amount of code and keep it simple, so this is probably the pattern I use most at work.

In [5]: ys = np.array([y * 50 + 50 for y in range(1, 11) if y < 7 or y == 10], dtype=float)
   ...: ys
Out[5]: array([100., 150., 200., 250., 300., 350., 550.])

The target data was created with very little code.
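One thing to watch in In [5] is that the filter `y < 7 or y == 10` is written a second time, so the features and targets could drift apart if only one copy is edited. A possible alternative, sketched here under the assumption that the bedroom list is kept in a variable (the names `bedrooms` and `prices` are illustrative), is to derive the targets from the features:

```python
import numpy as np

# Filter once, then reuse the filtered list for both arrays.
bedrooms = [i for i in range(1, 11) if i < 7 or i == 10]

# Map each bedroom count to its price in thousands (base 50 + 50 per bedroom).
prices = [b * 50 + 50 for b in bedrooms]

xs = np.array(bedrooms, dtype=float)
ys = np.array(prices, dtype=float)
print(xs)
print(ys)
```

This costs one extra line but keeps the filter condition in a single place.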

4. Bonus (creating a TensorFlow model)

As a bonus, let's build a model with TensorFlow. First import the library, then create the model.

In [6]: import tensorflow as tf

In [7]: model = tf.keras.Sequential([tf.keras.layers.Dense(units=1, input_shape=[1])])
2022-xx-xx xx:xx:xx.531173: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

Compile the model.

In [8]: # Compile your model
   ...: # Set the optimizer to Stochastic Gradient Descent
   ...: # and use Mean Squared Error as the loss function
   ...: model.compile(optimizer='sgd', loss='mean_squared_error')
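A single `Dense` unit with one input computes `w * x + b`, so compiling with `'sgd'` and `'mean_squared_error'` amounts to fitting a straight line by gradient descent. The following NumPy sketch illustrates that idea only; it is not Keras's implementation. It assumes a learning rate of 0.01 (which I believe is the Keras default for `'sgd'`), zero-initialized parameters (Keras draws the weight randomly), and one full-batch update per epoch.

```python
import numpy as np

xs = np.array([1, 2, 3, 4, 5, 6, 10], dtype=float)
ys = 50.0 * xs + 50.0  # targets in thousands

w, b = 0.0, 0.0        # assumption: zero init for reproducibility
learning_rate = 0.01   # assumption: matches the 'sgd' default

for epoch in range(1000):
    error = w * xs + b - ys
    # Gradients of mean((w*x + b - y)**2) with respect to w and b.
    grad_w = 2.0 * np.mean(error * xs)
    grad_b = 2.0 * np.mean(error)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

loss = np.mean((w * xs + b - ys) ** 2)
prediction = w * 7.0 + b
print(w, b, loss, prediction)
```

Under these assumptions both `w` and `b` should end up near 50, which puts the 7-bedroom prediction near 400.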

Now train it.

In [9]: # Train your model for 1000 epochs by feeding the i/o tensors
   ...: model.fit(xs, ys, epochs=1000)
Epoch 995/1000
1/1 [==============================] - 0s 1ms/step - loss: 0.0083
Epoch 996/1000
1/1 [==============================] - 0s 1ms/step - loss: 0.0082
Epoch 997/1000
1/1 [==============================] - 0s 1ms/step - loss: 0.0082
Epoch 998/1000
1/1 [==============================] - 0s 989us/step - loss: 0.0081
Epoch 999/1000
1/1 [==============================] - 0s 1ms/step - loss: 0.0080
Epoch 1000/1000
1/1 [==============================] - 0s 994us/step - loss: 0.0079
Out[9]: <keras.callbacks.History at 0x7fcd4d72a940>

After 1000 epochs the loss has become small, which shows the fit has improved. I have never seen numbers this good at work, though...

Finally, predict with the trained model.

In [10]: new_y = 7.0
    ...: prediction = model.predict([new_y])[0]
    ...: prediction
1/1 [==============================] - 0s 82ms/step
Out[10]: array([400.0246], dtype=float32)

Accuracy like this is great.
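As a sanity check on that number, the same seven points can be fitted in closed form. This swaps in an ordinary least-squares line fit via `np.polyfit` rather than a neural network, so it is only a cross-check and not what Keras does internally. The variable names are illustrative.

```python
import numpy as np

xs = np.array([1, 2, 3, 4, 5, 6, 10], dtype=float)
ys = np.array([100, 150, 200, 250, 300, 350, 550], dtype=float)

# A degree-1 fit returns coefficients highest power first: [slope, intercept].
slope, intercept = np.polyfit(xs, ys, 1)

predicted = slope * 7.0 + intercept
print(slope, intercept, predicted)
```

Because the training data lies exactly on a line, the fitted slope and intercept should both come out at about 50, giving roughly 400 for 7 bedrooms.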

Summary

So this time I wrote about Python list comprehensions. It turned into a TensorFlow "Hello World" along the way, but that's all for now.
