Classifying illustrations as KanColle or Azur Lane

Agenda

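Based on nijiflow, I used machine learning to build a classifier that sorts images into KanColle (艦これ) and Azur Lane, and published the trained graph and related files.
I have also built and published a Twitter bot for it.
The dataset was built from tagged image information on Pixiv.
6,500 images were used for each of KanColle and Azur Lane.
Training ran for about 48 hours on a GPU (Google Colaboratory), for about 10,000 epochs.
Accuracy on the test data is above 80% (86.2%).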



Results

```
INFO:tensorflow:Restoring parameters from drive/fleetclassify/checkpoints/fleetclassify_v1_1.0_224_8/model.ckpt-110969

eval/Precision[0.855130792]
eval/Recall[0.86558044]
eval/Accuracy[0.862]
INFO:tensorflow:Finished evaluation at 2018-05-07-12:22:21
```
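The checkpoint path above ends in the global step (`model.ckpt-110969`), which counts training steps, not epochs. A minimal sketch for converting one to the other, assuming one batch per step (a single clone, the slim default). The batch size and the number of examples in the train split must be filled in with the values actually used; neither is stated in this post. The function names are hypothetical.

```python
import re


def step_from_checkpoint(path):
    """Extract the global step from a slim checkpoint path such as
    '.../model.ckpt-110969' (the trailing '-<step>' suffix)."""
    match = re.search(r"model\.ckpt-(\d+)", path)
    if match is None:
        raise ValueError(f"no step suffix in {path!r}")
    return int(match.group(1))


def steps_to_epochs(global_step, batch_size, num_train_examples):
    """Each step consumes one batch (single clone assumed), so
    epochs = steps * batch_size / number of training examples."""
    return global_step * batch_size / num_train_examples
```

For example, `steps_to_epochs(step_from_checkpoint(path), batch_size, n_train)`, where `n_train` is the size of the train split rather than the whole dataset.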



TIPS:

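- Don't overuse Colaboratory: doing so violates its terms of use and gets your GPU access suspended
- On Colaboratory, training can stop after about 90 minutes when large training data is kept on Google Drive and read from there
- Training with MobileNet/nijiflow can be resumed simply by dropping one line from the training command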
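Resuming relies on the train directory already holding a checkpoint. To see which checkpoint a resumed run (or an evaluation) would pick up, here is a stdlib-only sketch that finds the newest one, assuming the `model.ckpt-<step>.index` naming visible in the logs in this post. The helper name is hypothetical; inside TensorFlow, `tf.train.latest_checkpoint` does the equivalent by reading the `checkpoint` state file.

```python
import re
from pathlib import Path

# slim saves checkpoints as model.ckpt-<global step>.{index,meta,data-*}
_CKPT_INDEX = re.compile(r"^model\.ckpt-(\d+)\.index$")


def latest_checkpoint_prefix(train_dir):
    """Return the prefix (e.g. '<dir>/model.ckpt-110969') of the checkpoint
    with the highest step in train_dir, or None if there is none."""
    best_step, best_prefix = -1, None
    for entry in Path(train_dir).iterdir():
        match = _CKPT_INDEX.match(entry.name)
        if match and int(match.group(1)) > best_step:
            best_step = int(match.group(1))
            # the checkpoint prefix is the .index file name minus '.index'
            best_prefix = str(entry.with_suffix(""))
    return best_prefix
```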

TODO: 

- Compared with nijinet's 98% (?), the accuracy still looks low
- I'd like a script that plots the training progress of the trained graph
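For the plotting TODO, a minimal sketch that reads the console output of `train_image_classifier.py` (saved to a file, e.g. with `tee`) and plots loss against global step. It assumes the `global step N: loss = X` line format that appears in the logs in this post, and needs matplotlib only for the plot itself; the script and function names are hypothetical. TensorBoard pointed at the `--train_dir` is the other option, since slim also records summaries there.

```python
import re
import sys

# Matches slim training log lines such as
# "INFO:tensorflow:global step 380: loss = 0.6584 (1.684 sec/step)"
_STEP_LOSS = re.compile(r"global step (\d+): loss = ([0-9.]+)")


def parse_losses(lines):
    """Return a list of (step, loss) pairs found in the log lines."""
    points = []
    for line in lines:
        match = _STEP_LOSS.search(line)
        if match:
            points.append((int(match.group(1)), float(match.group(2))))
    return points


def plot_losses(points, out_png):
    """Plot loss against global step into a PNG (matplotlib imported lazily)."""
    if not points:
        raise ValueError("no 'global step N: loss = X' lines found")
    import matplotlib
    matplotlib.use("Agg")  # render without a display, e.g. on Colaboratory
    import matplotlib.pyplot as plt
    steps, losses = zip(*points)
    plt.plot(steps, losses)
    plt.xlabel("global step")
    plt.ylabel("loss")
    plt.savefig(out_png)


if __name__ == "__main__":
    # usage: python3 plot_loss.py train.log loss.png
    with open(sys.argv[1], encoding="utf-8") as f:
        plot_losses(parse_losses(f), sys.argv[2])
```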


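I built a classifier that sorts KanColle and Azur Lane images.
The scripts and the trained graph are published.
(https://github.com/MichinariNukazawa/daisy_fleetclassify)
So that you can try the classifier casually, I have also published a Twitter bot (@DFleetclassify).
(However, it is a simple trial bot without a dedicated server, so it is unstable and goes down at irregular times.)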

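fleetclassify is based on nijiflow. nijiflow itself is a 2D/3D illustration classifier built by transfer learning on MobileNet. See the linked page for details.
nijiflow is also described in detail in SIG2D Letter1, which SIG2D distributed and will make public at a later date.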

Partly because that is not public yet, I'll leave a rough record of the work I did.

fleetclassify's accuracy is a little over 80% (86.2% on the test data), which is lower than nijiflow's. (nijiflow reaches 98% even though it uses MobileNet.)


# Steps up to training

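Since SIG2D Letter1 is not public yet, I describe the steps up to and including preprocessing.
The main tasks are setting up a TensorFlow GPU runtime environment and creating the training data.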

# Environment setup

```
sudo apt install python3-pip -y
pip3 install tensorflow-gpu
```


# Getting nijiflow

```
git clone --depth=1 -b niji https://github.com/fallthrough/models
```


# Creating the training data

## Collecting training images

I collected the images used as training data from pixiv.
https://github.com/MichinariNukazawa/pixivpy_wrapper

In the pixivpy_wrapper repository,
`kancolle.sh` creates `${HOME}/pixiv_data/image__艦これ` and
`azure.sh` creates `${HOME}/pixiv_data/image__アズールレーン`.
Each directory contains the image files and their metadata file `data.json`.
A minor nuisance: `data.json` is not strict JSON, so reading it needs a small workaround.
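The post does not say exactly how `data.json` deviates from JSON, so the following is only a sketch under two assumptions: the file holds several JSON values written back to back, or it is a Python-literal dump (single quotes, `True`/`None`). It tries the first interpretation with `json.JSONDecoder.raw_decode` and falls back to `ast.literal_eval`. `load_loose_json` is a hypothetical name.

```python
import ast
import json
from pathlib import Path


def load_loose_json(path):
    """Return the list of top-level values stored in a JSON-ish file.

    Assumption 1: several JSON values written back to back.
    Assumption 2 (fallback): a Python literal dump (single quotes, True/None).
    """
    text = Path(path).read_text(encoding="utf-8")
    decoder = json.JSONDecoder()
    values, pos = [], 0
    try:
        while True:
            # raw_decode does not skip leading whitespace, so do it here
            while pos < len(text) and text[pos].isspace():
                pos += 1
            if pos == len(text):
                return values
            value, pos = decoder.raw_decode(text, pos)
            values.append(value)
    except json.JSONDecodeError:
        # not (concatenated) JSON: try Python-literal syntax instead
        return [ast.literal_eval(text)]
```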


## Data preprocessing

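nijiflow generates the TensorFlow training data binaries with
`models/research/slim/create_niji_dataset.py`.
This script reads an image-list text file in a simple format that appears to be specific to nijiflow.
I convert the data downloaded by pixivpy_wrapper into this nijiflow list file,
which lets me reuse nijiflow's preprocessing unchanged.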

I wrote `util/nijiflow_source_from_path.py` to do the conversion.
It reads `data.json` in a pixivpy_wrapper download directory and creates `nijiflow.list`.

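```
# arguments: <class ID> <download directory> <tags of the other series, comma-separated (posts carrying them are excluded)>
python3 nijiflow_source_from_path.py \
    0 ${HOME}/pixiv_data/image__艦これ         アズールレーン,アズレン
python3 nijiflow_source_from_path.py \
    1 ${HOME}/pixiv_data/image__アズールレーン 艦これ,艦隊これくしょん
```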
A `nijiflow.list` is written into each specified directory.

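The main filtering rules are:
- Exclude non-illustrations (ugoira and manga)
- Exclude images tagged with both Azur Lane and KanColle
  (to keep crossover fan art, where KanColle and Azur Lane characters appear in the same picture, out of training)
- Exclude non-JPEG images
  (pixiv images include PNGs; nijiflow used only JPEGs, so I simply dropped them)
- (TODO) Exclude images whose image.mode is not RGB
  (about 20 files with mode "L" seem to be mixed in)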
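The filtering rules above, plus the removal of multi-image posts mentioned in the error notes near the end, can be sketched as below. This is not the actual `util/nijiflow_source_from_path.py`: the record keys `filename`, `type`, `tags` and `page_count` are hypothetical stand-ins for whatever `data.json` really contains, and the `type` values `illust`/`manga`/`ugoira` are assumed. The RGB check (the TODO item) is left out; with Pillow it could test `Image.open(path).mode == "RGB"`.

```python
from pathlib import Path


def select_records(records, exclude_tags):
    """Yield the file names that survive the filters.

    Each record is assumed (hypothetically) to look like
    {"filename": "68375709_p0.jpg", "type": "illust",
     "tags": ["艦これ", ...], "page_count": 1}.
    """
    exclude = set(exclude_tags)
    for rec in records:
        if rec.get("type") != "illust":          # drop ugoira and manga
            continue
        if rec.get("page_count", 1) > 1:          # drop multi-image posts
            continue
        if exclude & set(rec.get("tags", [])):    # drop crossovers with the other series
            continue
        name = rec["filename"]
        if Path(name).suffix.lower() not in (".jpg", ".jpeg"):  # JPEG only
            continue
        yield name


def write_nijiflow_list(records, class_id, exclude_tags, out_path):
    """Write '<relative path> <class id>' lines; return how many were written."""
    names = list(select_records(records, exclude_tags))
    text = "".join(f"{name} {class_id}\n" for name in names)
    Path(out_path).write_text(text, encoding="utf-8")
    return len(names)
```

For the KanColle directory this would be called as `write_nijiflow_list(records, 0, ["アズールレーン", "アズレン"], out_path)`, mirroring the command above.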

The file format is as follows.
One entry per line: the source file's relative path and its class ID, separated by a half-width space (?), in a TSV-like format.
```tsv:nijiflow.list
68375709_p0.jpg 0
68375681_p0.jpg 0
68375623_p0.jpg 0
68375609_p0.jpg 0
68375495_p0.jpg 0
68375432_p0.jpg 0
```
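Since the separator is uncertain (space or tab?), a small reader for sanity-checking a generated list can split on the last run of whitespace, which accepts either. This is a hypothetical helper for checking the list (for example, that both classes have about 6,500 entries), not nijiflow's own parser.

```python
from collections import Counter


def read_nijiflow_list(text):
    """Parse nijiflow.list content into (relative_path, class_id) pairs."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        # split on the last run of whitespace, so a space or a tab both work
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected '<path> <class id>'")
        entries.append((parts[0], int(parts[1])))
    return entries


def class_counts(entries):
    """Count entries per class ID, e.g. to check that the classes are balanced."""
    return Counter(class_id for _, class_id in entries)
```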


## Converting to a TensorFlow dataset

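Create the TensorFlow training dataset with the nijiflow script.
The dataset is a directory containing multiple files (judging from the error logs below, `metadata.json` and TFRecord shards such as `niji_train_00000-of-00100.tfrecord`).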
```
python3 models/research/slim/create_niji_dataset.py \
    --output_dir=drive/fleetclassify/fleetclassify_dataset \
    ${HOME}/pixiv_data/image__艦これ/nijiflow.list \
    ${HOME}/pixiv_data/image__アズールレーン/nijiflow.list
```
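Two of the errors later in this post come from an incomplete dataset directory (a missing `metadata.json`, and a missing `.tfrecord` shard on Google Drive). A stdlib sketch that checks for both before training; the layout it expects (`metadata.json` plus shards named like `niji_train_00000-of-00100.tfrecord`) is taken from those error messages, and `check_dataset_dir` is a hypothetical name.

```python
from pathlib import Path


def check_dataset_dir(dataset_dir, split="train", num_shards=100):
    """Return a list of problems found in a nijiflow dataset directory.

    Expected layout (taken from the error messages in this post):
    metadata.json plus shards niji_<split>_00000-of-00100.tfrecord ...
    An empty list means nothing obviously missing was found.
    """
    root = Path(dataset_dir)
    problems = []
    if not (root / "metadata.json").is_file():
        problems.append("missing metadata.json")
    for i in range(num_shards):
        name = f"niji_{split}_{i:05d}-of-{num_shards:05d}.tfrecord"
        shard = root / name
        if not shard.is_file():
            problems.append(f"missing {name}")
        elif shard.stat().st_size == 0:
            problems.append(f"empty {name}")  # e.g. a failed upload
    return problems
```

For example, `print(check_dataset_dir("drive/fleetclassify/fleetclassify_dataset"))`.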


# Running training

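TensorFlow-gpu requires CUDA 9.0. I was working on Ubuntu 18.04, but its default CUDA 9.1 reportedly does not work.
(And since I broke my Ubuntu environment while swapping the CUDA installation,) everything from here on was done on Google Colaboratory.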

Apparently the problem can be avoided by installing everything except the driver.
https://medium.com/@taylordenouden/installing-tensorflow-gpu-on-ubuntu-18-04-89a142325138


## Getting the checkpoint (the pretrained base model)

Following nijinet's instructions, download and extract the checkpoint.
```
mkdir -p drive/fleetclassify/checkpoints/pretrained
pushd drive/fleetclassify/checkpoints/pretrained
wget http://download.tensorflow.org/models/mobilenet_v1_1.0_224_2017_06_14.tar.gz
tar xvzf mobilenet_v1_1.0_224_2017_06_14.tar.gz
popd
```


That's it. The next step is training.



# Errors I ran into along the way, and how I solved them



# 9:

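Remove multi-image posts. Manga aside, compilation posts also contain characters from other series. Pre-compressing the image files made no difference, at least to the size of the resulting dataset.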




####

```
WARNING:tensorflow:From /home/nuka/.local/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
WARNING:tensorflow:From train_image_classifier.py:400: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
Traceback (most recent call last):
  File "train_image_classifier.py", line 576, in <module>
    tf.app.run()
  File "/home/nuka/.local/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "train_image_classifier.py", line 406, in main
    FLAGS.dataset_name, FLAGS.dataset_split_name, FLAGS.dataset_dir)
  File "/home/nuka/flow/models/research/slim/datasets/dataset_factory.py", line 59, in get_dataset
    reader)
  File "/home/nuka/flow/models/research/slim/datasets/niji.py", line 85, in get_split
    with open(os.path.join(dataset_dir, 'metadata.json')) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/tmp/datasets/niji/metadata.json'
```
It was simply a mistake in the dataset argument. The correct option was:
`--dataset_dir=${HOME}/flow/nijiflow_data/fleetclassify \`




####

```
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'gradients/MobilenetV1/Logits/Dropout_1b/dropout/div_grad/BroadcastGradientArgs': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0 ]. Make sure the device specification refers to a valid device.
     [[Node: gradients/MobilenetV1/Logits/Dropout_1b/dropout/div_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/device:GPU:0"](gradients/MobilenetV1/Logits/Dropout_1b/dropout/div_grad/Shape, gradients/MobilenetV1/Logits/Dropout_1b/dropout/div_grad/Shape_1)]]

```
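The training graph is pinned to the GPU (`/device:GPU:0`) from the start, but only the CPU build of TensorFlow was installed. Installing the GPU build solved it.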
`pip3 install --upgrade tensorflow-gpu`

####

```
  File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory


Failed to load the native TensorFlow runtime.
```
This seems to have been caused by my local Ubuntu 18.04 environment.
CUDA 9.1 does not work; CUDA 9.0 has to be installed instead.
https://developer.nvidia.com/cuda-90-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1704&target_type=debnetwork
`sudo apt-get install cuda-9-0`
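A quick way to check, before a long run, whether the libraries the GPU build tries to load are visible to the dynamic loader. `libcublas.so.9.0` is the name from the error above; `libcudnn.so.7` is an assumption (cuDNN 7 being the version commonly paired with CUDA 9.0 builds of TensorFlow 1.x), relevant because a missing cuDNN caused the segfault noted further down. The helper is hypothetical and only tries `dlopen` through `ctypes`.

```python
import ctypes


def check_shared_libs(names):
    """Try to dlopen each shared library; map name -> True (loaded) / False."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = True
        except OSError:
            results[name] = False
    return results


if __name__ == "__main__":
    # libcudnn.so.7 is an assumed name; adjust to the cuDNN version you installed
    for name, ok in check_shared_libs(["libcublas.so.9.0", "libcudnn.so.7"]).items():
        print(f"{name}: {'found' if ok else 'NOT found'}")
```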


####



```
INFO:tensorflow:global step 380: loss = 0.6584 (1.684 sec/step)
INFO:tensorflow:global step 390: loss = 0.6717 (1.288 sec/step)
INFO:tensorflow:global step 400: loss = 0.6621 (2.100 sec/step)
INFO:tensorflow:global step 410: loss = 0.6558 (1.993 sec/step)
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.DataLossError'>, corrupted record at 78840124
     [[Node: parallel_read/ReaderReadV2_1 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/device:CPU:0"](parallel_read/TFRecordReaderV2_1, parallel_read/filenames)]]
INFO:tensorflow:Finished training! Saving model to disk.
Traceback (most recent call last):
  File "models/research/slim/train_image_classifier.py", line 576, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "models/research/slim/train_image_classifier.py", line 572, in main
    sync_optimizer=optimizer if FLAGS.sync_replicas else None)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 784, in train
    ignore_live_threads=ignore_live_threads)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/supervisor.py", line 828, in stop
    ignore_live_threads=ignore_live_threads)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/queue_runner_impl.py", line 252, in _run
    enqueue_callable()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1249, in _single_operation_run
    self._call_tf_sessionrun(None, {}, [], target_list, None)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1420, in _call_tf_sessionrun
    status, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.DataLossError: corrupted record at 78840124
     [[Node: parallel_read/ReaderReadV2_1 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/device:CPU:0"](parallel_read/TFRecordReaderV2_1, parallel_read/filenames)]]
```

```

INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.DataLossError'>, corrupted record at 99555679
     [[Node: parallel_read/ReaderReadV2_1 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/device:CPU:0"](parallel_read/TFRecordReaderV2_1, parallel_read/filenames)]]
INFO:tensorflow:Caught OutOfRangeError. Stopping Training. FIFOQueue '_3_prefetch_queue/fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
     [[Node: fifo_queue_Dequeue = QueueDequeueV2[component_types=[DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](prefetch_queue/fifo_queue)]]

Caused by op 'fifo_queue_Dequeue', defined at:
  File "models/research/slim/train_image_classifier.py", line 576, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "models/research/slim/train_image_classifier.py", line 474, in main
    clones = model_deploy.create_clones(deploy_config, clone_fn, [batch_queue])
  File "/content/models/research/slim/deployment/model_deploy.py", line 193, in create_clones
    outputs = model_fn(*args, **kwargs)
  File "models/research/slim/train_image_classifier.py", line 456, in clone_fn
    images, labels = batch_queue.dequeue()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/data_flow_ops.py", line 440, in dequeue
    self._queue_ref, self._dtypes, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 3730, in queue_dequeue_v2
    timeout_ms=timeout_ms, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3290, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1654, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

OutOfRangeError (see above for traceback): FIFOQueue '_3_prefetch_queue/fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
     [[Node: fifo_queue_Dequeue = QueueDequeueV2[component_types=[DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](prefetch_queue/fifo_queue)]]

INFO:tensorflow:Finished training! Saving model to disk.
Traceback (most recent call last):
  File "models/research/slim/train_image_classifier.py", line 576, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "models/research/slim/train_image_classifier.py", line 572, in main
    sync_optimizer=optimizer if FLAGS.sync_replicas else None)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 784, in train
    ignore_live_threads=ignore_live_threads)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/supervisor.py", line 828, in stop
    ignore_live_threads=ignore_live_threads)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/queue_runner_impl.py", line 252, in _run
    enqueue_callable()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1249, in _single_operation_run
    self._call_tf_sessionrun(None, {}, [], target_list, None)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1420, in _call_tf_sessionrun
    status, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.DataLossError: corrupted record at 99555679
     [[Node: parallel_read/ReaderReadV2_1 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/device:CPU:0"](parallel_read/TFRecordReaderV2_1, parallel_read/filenames)]]

```
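I removed files with corrupted EXIF data from the test data; that seems to have been the cause.
(The data generation step does not check for broken EXIF. I don't know whether EXIF is actually used during training.)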
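To find broken records before training hits them, a TFRecord file can be walked record by record and each stored checksum verified. The sketch below is pure Python (slow on large shards) and assumes the documented TFRecord layout: a little-endian uint64 length, a masked CRC-32C of the length bytes, the data, and a masked CRC-32C of the data. It is a diagnostic aid, not part of nijiflow; with TensorFlow 1.x installed, iterating `tf.python_io.tf_record_iterator(path)` should raise the same kind of DataLossError on a bad shard.

```python
import struct
import sys


def _make_crc32c_table():
    # CRC-32C (Castagnoli), reflected polynomial 0x82F63B78
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_crc32c_table()


def crc32c(data):
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data):
    """TFRecord stores CRCs in this rotated-and-offset ('masked') form."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def check_tfrecord(path):
    """Return (number of good records, error message or None)."""
    good = 0
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            header = f.read(8)
            if not header:
                return good, None  # clean end of file
            len_crc = f.read(4)
            if len(header) < 8 or len(len_crc) < 4:
                return good, f"truncated header at offset {offset}"
            if struct.unpack("<I", len_crc)[0] != masked_crc(header):
                return good, f"bad length CRC at offset {offset}"
            (length,) = struct.unpack("<Q", header)
            data = f.read(length)
            data_crc = f.read(4)
            if len(data) < length or len(data_crc) < 4:
                return good, f"truncated record at offset {offset}"
            if struct.unpack("<I", data_crc)[0] != masked_crc(data):
                return good, f"bad data CRC at offset {offset}"
            good += 1


if __name__ == "__main__":
    # usage: python3 check_tfrecord.py dataset_dir/*.tfrecord
    for path in sys.argv[1:]:
        count, error = check_tfrecord(path)
        print(path, count, error or "OK")
```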

####

```
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.NotFoundError'>, drive/fleetflow/fleetclassify_dataset/niji_train_00000-of-00100.tfrecord; No such file or directory
     [[Node: parallel_read/ReaderReadV2 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/device:CPU:0"](parallel_read/TFRecordReaderV2, parallel_read/filenames)]]
```
The cause was a corrupted file on Google Drive. Re-uploading it solved the problem.




####


```
INFO:tensorflow:Restoring parameters from drive/fleetflow/checkpoints/pretrained/mobilenet_v1_1.0_224.ckpt
2018-04-30 02:46:43.464872: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Out of range: Read less bytes than requested
Traceback (most recent call last):
  File "models/research/slim/train_image_classifier.py", line 576, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "models/research/slim/train_image_classifier.py", line 572, in main
    sync_optimizer=optimizer if FLAGS.sync_replicas else None)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 747, in train
    master, start_standard_services=False, config=session_config) as sess:
  File "/usr/lib/python3.6/contextlib.py", line 83, in __enter__
    raise RuntimeError("generator didn't yield") from None
RuntimeError: generator didn't yield
```
Cause unknown. Re-running it for now seems to have solved it (?).



#### When the Google Colaboratory session disconnects

Firefox on Ubuntu did not show a dialog, so I didn't notice.
```
INFO:tensorflow:global step 130: loss = 0.6706 (7.589 sec/step)
INFO:tensorflow:global step 140: loss = 0.5755 (2.013 sec/step)
INFO:tensorflow:Saving checkpoint to path drive/fleetflow/checkpoints/fleetclassify_v1_1.0_224_1/model.ckpt
INFO:tensorflow:global_step/sec: 0.233577
INFO:tensorflow:Recording summary at step 141.
2018-04-30 04:03:20.286157: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_v2_ops.cc:134 : Unknown: drive/fleetflow/checkpoints/fleetclassify_v1_1.0_224_1/model.ckpt-140.data-00000-of-00001.tempstate1704126194757243596; Input/output error
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.UnknownError'>, drive/fleetflow/fleetclassify_dataset/niji_train_00017-of-00100.tfrecord; Input/output error
     [[Node: parallel_read/ReaderReadV2_3 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/device:CPU:0"](parallel_read/TFRecordReaderV2_3, parallel_read/filenames)]]
INFO:tensorflow:global_step/sec: 0.000113881
INFO:tensorflow:Caught OutOfRangeError. Stopping Training. FIFOQueue '_2_prefetch_queue/fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
     [[Node: fifo_queue_Dequeue = QueueDequeueV2[component_types=[DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](prefetch_queue/fifo_queue)]]

Caused by op 'fifo_queue_Dequeue', defined at:
  File "models/research/slim/train_image_classifier.py", line 576, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "models/research/slim/train_image_classifier.py", line 474, in main
    clones = model_deploy.create_clones(deploy_config, clone_fn, [batch_queue])
  File "/content/models/research/slim/deployment/model_deploy.py", line 193, in create_clones
    outputs = model_fn(*args, **kwargs)
  File "models/research/slim/train_image_classifier.py", line 456, in clone_fn
    images, labels = batch_queue.dequeue()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/data_flow_ops.py", line 440, in dequeue
    self._queue_ref, self._dtypes, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 3730, in queue_dequeue_v2
    timeout_ms=timeout_ms, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3290, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1654, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

OutOfRangeError (see above for traceback): FIFOQueue '_2_prefetch_queue/fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
     [[Node: fifo_queue_Dequeue = QueueDequeueV2[component_types=[DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](prefetch_queue/fifo_queue)]]

INFO:tensorflow:Finished training! Saving model to disk.
2018-04-30 06:24:16.213965: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_v2_ops.cc:109 : Unknown: drive/fleetflow/checkpoints/fleetclassify_v1_1.0_224_1; Input/output error
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1327, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1312, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1420, in _call_tf_sessionrun
    status, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.UnknownError: drive/fleetflow/checkpoints/fleetclassify_v1_1.0_224_1; Input/output error
     [[Node: save/SaveV2 = SaveV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, MobilenetV1/Conv2d_0/BatchNorm/beta, MobilenetV1/Conv2d_0/BatchNorm/beta/RMSProp, MobilenetV1/Conv2d_0/BatchNorm/beta/RMSProp_1, MobilenetV1/Conv2d_0/BatchNorm/gamma, MobilenetV1/Conv2d_0/BatchNorm/gamma/RMSProp, MobilenetV1/Conv2d_0/BatchNorm/gamma/RMSProp_1, MobilenetV1/Conv2d_0/BatchNorm/moving_mean, MobilenetV1/Conv2d_0/BatchNorm/moving_variance, MobilenetV1/Conv2d_0/weights, MobilenetV1/Conv2d_0/weights/RMSProp, MobilenetV1/Conv2d_0/weights/RMSProp_1, MobilenetV1/Conv2d_10_depthwise/BatchNorm/beta, MobilenetV1/Conv2d_10_depthwise/BatchNorm/beta/RMSProp, MobilenetV1/Conv2d_10_depthwise/BatchNorm/beta/RMSProp_1,

MobilenetV1/Conv2d_9_pointwise/BatchNorm/gamma/RMSProp, MobilenetV1/Conv2d_9_pointwise/BatchNorm/gamma/RMSProp_1, MobilenetV1/Conv2d_9_pointwise/BatchNorm/moving_mean, MobilenetV1/Conv2d_9_pointwise/BatchNorm/moving_variance, MobilenetV1/Conv2d_9_pointwise/weights, MobilenetV1/Conv2d_9_pointwise/weights/RMSProp, MobilenetV1/Conv2d_9_pointwise/weights/RMSProp_1, MobilenetV1/Logits/Conv2d_1c_1x1/biases, MobilenetV1/Logits/Conv2d_1c_1x1/biases/RMSProp, MobilenetV1/Logits/Conv2d_1c_1x1/biases/RMSProp_1, MobilenetV1/Logits/Conv2d_1c_1x1/weights, MobilenetV1/Logits/Conv2d_1c_1x1/weights/RMSProp, MobilenetV1/Logits/Conv2d_1c_1x1/weights/RMSProp_1, global_step)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/supervisor.py", line 990, in managed_session
    yield sess
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 780, in train
    sv.saver.save(sess, sv.save_path, global_step=sv.global_step)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1676, in save
    {self.saver_def.filename_tensor_name: checkpoint_file})
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1140, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: drive/fleetflow/checkpoints/fleetclassify_v1_1.0_224_1; Input/output error
     [[Node: save/SaveV2 = SaveV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, MobilenetV1/Conv2d_0/BatchNorm/beta, MobilenetV1/Conv2d_0/BatchNorm/beta/RMSProp, MobilenetV1/Conv2d_0/BatchNorm/beta/RMSProp_1, MobilenetV1/Conv2d_0/BatchNorm/gamma, MobilenetV1/Conv2d_0/BatchNorm/gamma/RMSProp, MobilenetV1/Conv2d_0/BatchNorm/gamma/RMSProp_1, MobilenetV1/Conv2d_0/BatchNorm/moving_mean, MobilenetV1/Conv2d_0/BatchNorm/moving_variance, MobilenetV1/Conv2d_0/weights, MobilenetV1/Conv2d_0/weights/RMSProp, MobilenetV1/Conv2d_0/weights/RMSProp_1, MobilenetV1/Conv2d_10_depthwise/BatchNorm/beta, MobilenetV1/Conv2d_10_depthwise/BatchNorm/beta/RMSProp, MobilenetV1/Conv2d_10_depthwise/BatchNorm/beta/RMSProp_1, MobilenetV1/Conv2d_10_depthwise/BatchNorm/gamma,

MobilenetV1/Conv2d_9_pointwise/BatchNorm/gamma/RMSProp_1, MobilenetV1/Conv2d_9_pointwise/BatchNorm/moving_mean, MobilenetV1/Conv2d_9_pointwise/BatchNorm/moving_variance, MobilenetV1/Conv2d_9_pointwise/weights, MobilenetV1/Conv2d_9_pointwise/weights/RMSProp, MobilenetV1/Conv2d_9_pointwise/weights/RMSProp_1, MobilenetV1/Logits/Conv2d_1c_1x1/biases, MobilenetV1/Logits/Conv2d_1c_1x1/biases/RMSProp, MobilenetV1/Logits/Conv2d_1c_1x1/biases/RMSProp_1, MobilenetV1/Logits/Conv2d_1c_1x1/weights, MobilenetV1/Logits/Conv2d_1c_1x1/weights/RMSProp, MobilenetV1/Logits/Conv2d_1c_1x1/weights/RMSProp_1, global_step)]]

Caused by op 'save/SaveV2', defined at:
  File "models/research/slim/train_image_classifier.py", line 576, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "models/research/slim/train_image_classifier.py", line 563, in main
    saver=tf.train.Saver(max_to_keep=1000000),
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1311, in __init__
    self.build()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1320, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1357, in _build
    build_save=build_save, build_restore=build_restore)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 806, in _build_internal
    save_tensor = self._AddSaveOps(filename_tensor, saveables)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 326, in _AddSaveOps
    save = self.save_op(filename_tensor, saveables)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 242, in save_op
    tensors)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1680, in save_v2
    shape_and_slices=shape_and_slices, tensors=tensors, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3290, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1654, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

UnknownError (see above for traceback): drive/fleetflow/checkpoints/fleetclassify_v1_1.0_224_1; Input/output error
     [[Node: save/SaveV2 = SaveV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, MobilenetV1/Conv2d_0/BatchNorm/beta, MobilenetV1/Conv2d_0/BatchNorm/beta/RMSProp, MobilenetV1/Conv2d_0/BatchNorm/beta/RMSProp_1, MobilenetV1/Conv2d_0/BatchNorm/gamma, MobilenetV1/Conv2d_0/BatchNorm/gamma/RMSProp, MobilenetV1/Conv2d_0/BatchNorm/gamma/RMSProp_1, MobilenetV1/Conv2d_0/BatchNorm/moving_mean, MobilenetV1/Conv2d_0/BatchNorm/moving_variance, MobilenetV1/Conv2d_0/weights, MobilenetV1/Conv2d_0/weights/RMSProp, MobilenetV1/Conv2d_0/weights/RMSProp_1, MobilenetV1/Conv2d_10_depthwise/BatchNorm/beta, MobilenetV1/Conv2d_10_depthwise/BatchNorm/beta/RMSProp, MobilenetV1/Conv2d_10_depthwise/BatchNorm/beta/RMSProp_1, MobilenetV1/Conv2d_10_depthwise/BatchNorm/gamma, MobilenetV1/Conv2d_10_depthwise/BatchNorm/gamma/RMSProp, MobilenetV1/Conv2d_10_depthwise/BatchNorm/gamma/RMSProp_1,

MobilenetV1/Conv2d_9_pointwise/BatchNorm/gamma/RMSProp, MobilenetV1/Conv2d_9_pointwise/BatchNorm/gamma/RMSProp_1, MobilenetV1/Conv2d_9_pointwise/BatchNorm/moving_mean, MobilenetV1/Conv2d_9_pointwise/BatchNorm/moving_variance, MobilenetV1/Conv2d_9_pointwise/weights, MobilenetV1/Conv2d_9_pointwise/weights/RMSProp, MobilenetV1/Conv2d_9_pointwise/weights/RMSProp_1, MobilenetV1/Logits/Conv2d_1c_1x1/biases, MobilenetV1/Logits/Conv2d_1c_1x1/biases/RMSProp, MobilenetV1/Logits/Conv2d_1c_1x1/biases/RMSProp_1, MobilenetV1/Logits/Conv2d_1c_1x1/weights, MobilenetV1/Logits/Conv2d_1c_1x1/weights/RMSProp, MobilenetV1/Logits/Conv2d_1c_1x1/weights/RMSProp_1, global_step)]]


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "models/research/slim/train_image_classifier.py", line 576, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "models/research/slim/train_image_classifier.py", line 572, in main
    sync_optimizer=optimizer if FLAGS.sync_replicas else None)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 784, in train
    ignore_live_threads=ignore_live_threads)
  File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/supervisor.py", line 1000, in managed_session
    self.stop(close_summary_writer=close_summary_writer)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/supervisor.py", line 828, in stop
    ignore_live_threads=ignore_live_threads)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise

    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/queue_runner_impl.py", line 252, in _run
    enqueue_callable()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1249, in _single_operation_run
    self._call_tf_sessionrun(None, {}, [], target_list, None)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1420, in _call_tf_sessionrun
    status, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.UnknownError: drive/fleetflow/fleetclassify_dataset/niji_train_00017-of-00100.tfrecord; Input/output error
     [[Node: parallel_read/ReaderReadV2_3 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/device:CPU:0"](parallel_read/TFRecordReaderV2_3, parallel_read/filenames)]]

```


####

```
python3 models/research/slim/train_image_classifier.py --train_dir=drive/fleetclassify/checkpoints/fleetclassify_v1_1.0_224_$(date +'%Y%m%d_%H%M') --dataset_dir=drive/fleetclassify/fleetclassify_dataset --dataset_name=niji --dataset_split_name=train --model_name=mobilenet_v1 --preprocessing_name=mobilenet_v1 --save_interval_secs=600 --save_summaries_secs=600 --checkpoint_path=drive/fleetclassify/checkpoints/pretrained/mobilenet_v1_1.0_224.ckpt --checkpoint_exclude_scopes=MobilenetV1/Logits
python3: Relink `/lib/x86_64-linux-gnu/libudev.so.1' with `/lib/x86_64-linux-gnu/librt.so.1' for IFUNC symbol `clock_gettime'
Segmentation fault (コアダンプ)

```
The cause was that cuDNN was not installed.
https://medium.com/@taylordenouden/installing-tensorflow-gpu-on-ubuntu-18-04-89a142325138


####


```
Traceback (most recent call last):
  File "models/research/slim/export_inference_graph.py", line 59, in <module>
    import tensorflow as tf
ImportError: No module named tensorflow
```

I had typed `python` instead of `python3` (I forgot the 3).

That's all.
