Updates and whatnot #3: Tensorflow woes


I recently decided to go back and investigate the problems I had with the CNN model mentioned in the last update post. Grappling with Tensorflow for the past few days hasn’t been particularly enjoyable, but I definitely learned several new things from it. Or rather, I discovered my mistakes, chief among them not consulting the proper documentation.

But before I go into that, I’ll note down a couple of other things I’ve been working on as well, including the curious new minor release of Blender: 2.93.


I’ve wanted to try developing in Blender for quite a while now, especially because it’s a pretty nifty 3D package that I’ve grown rather attached to and also because its Python API is exposed for addon development. Initially, I tried to dive straight into the C++ API but I figured I wasn’t quite ready for it yet after scanning through the source code (available here!). The Python API, on the other hand, was much more beginner-friendly, and also proved to be a comfortable diving board for understanding the inner workings of Blender. And that would transition well into actually figuring out the structure of the source.

Then again, about a week after I began studying the Python API, Blender released a new minor version, 2.93, so now there are new bits of API to look into (although it seems not too much has changed). The next release is supposed to be a major stable version (3.0, presumably), so I should probably master this before then. Anywho, I figured I’d just write down roughly the useful bits I’ve gleaned from messing around.

First off is starting Blender from the command line, which is also a tip the documentation gives you. This is a big help in debugging, since error messages appear in the system shell rather than the built-in Blender console. Basically, either navigate to the directory containing the Blender executable and run blender or ./blender (depending on OS), or add the path to the executable to the PATH variable. Presumably though, you’d have already gone through the Blender Python API Quickstart guide, so you should know this already. In particular, the tips given in the Before Starting section are of tremendous help: Developer Extras and Python Tooltips are of special note when you need to know the name of the function to call.

Anyways, with regard to manipulating objects, you can operate on an object as a whole (by applying built-in Blender functions, like transforms and modifiers) as well as perform primitive manipulations of its vertices (like simple transforms). All of this is done with the Blender module bpy, while of secondary importance is the mathutils module, which contains the vector and matrix classes and their operations. The most prominent (and useful) way to access an object, from what I understand so far, is through the current context, bpy.context. From there you can get at certain objects (and other elements) in the scene, which you can then work with using other functions.

import bpy

context = bpy.context

"""
Active object is the object that was last selected
Generally indicated in the viewport by a yellow dot in the object's center
e.g.
> bpy.data.objects['Cube']
"""
obj1 = context.active_object

"""
Selected objects are returned as a list
These are outlined in yellow in the viewport, so the list is empty if no objects are selected
e.g.
> [bpy.data.objects['Cube'], bpy.data.objects['Light'], bpy.data.objects['Camera']]
> []
"""
objs = context.selected_objects

"""
Scene contains basically every object and element in the current setting (even hidden ones)
e.g.
> bpy.data.scenes['Scene']
"""
scene = context.scene

"""
Then you can access a list of the objects in the scene
.items() gives the name and object pairs
.keys() gives the names of the objects
.values() gives the objects
"""
items = scene.objects.items()           # > [('Cube', bpy.data.objects['Cube']), ('Light', bpy.data.objects['Light']), ('Camera', bpy.data.objects['Camera'])]
objs_2 = scene.objects.values()         # > [bpy.data.objects['Cube'], bpy.data.objects['Light'], bpy.data.objects['Camera']]

So far, I’ve figured out messing around with the locations of objects and their meshes (vertices). Basically, each object is a collection of vertices. The object has a set of coordinates (its “center”, or origin) that determines its location, and each vertex has its own coordinates as well.

import mathutils

coord1 = obj1.location                      # Vector((0.0, 0.0, 0.0))
# note that the Vector class comes from the mathutils module

"""
You can do vector manipulation with it as well
or access each vector component individually (e.g. using XYZ coordinates)
"""
coord1.x += 1                               # moves the object by 1 unit along the x-axis
coord1 += mathutils.Vector((1., 1., 1.))    # moves the object by 1 unit along each axis

"""
You can also access the mesh information of the object
This includes the vertices, among other things
"""
verts = obj1.data.vertices                  # a list-like structure of vertices, can be indexed like an array
verts[0].co                                 # each vertex's coordinates are a Vector, accessed via .co

"""
Then you can perform any vector math on it like the coordinates above
Except this time it only affects the selected vertex
"""
verts[0].co += 3
verts[0].co += mathutils.Vector(1., 0., -1.)
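
Putting those pieces together, here’s a rough sketch of my own (not from any official example) that shifts every vertex of the active object upward while leaving the object’s origin where it is; the Mesh.update() call at the end is just to nudge the viewport to refresh:

import bpy
import mathutils

obj = bpy.context.active_object
offset = mathutils.Vector((0., 0., 1.))     # move everything up by 1 unit

for v in obj.data.vertices:                 # loop over every vertex in the mesh
    v.co += offset                          # each vertex moves, the object's origin stays put

obj.data.update()                           # refresh the mesh so the viewport picks up the change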

And finally, the things I haven’t really touched yet are the built-in Blender operators in bpy.ops and the data access related to the Blender file in bpy.data. bpy.ops contains pretty much every operation you can think of in Blender, from selecting and hiding objects to modifier operations (like the Displace or Array modifier) and Edit Mode operators as well (such as extrude or loop cuts). That alone already makes scripting in Blender massively powerful, but it still seems rather limiting to me. I had thought it would be somewhat lower level than this (as in accessing the bare bones of Blender), but as it is, it seems more like a one-stop shortcut creator for combining operations into a sequence you’d use frequently, such as procedurally generating some trees or a landscape, as in the sketch below.
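
As a rough sketch of what I mean (a toy example of my own, not anything from a real project), a couple of operator calls already chain into that kind of canned sequence:

import bpy

# add a cube at the origin; it becomes the active object
bpy.ops.mesh.primitive_cube_add(size=1.0, location=(0., 0., 0.))

# add an Array modifier through the operator, then tweak it
# via the active object's modifier list
bpy.ops.object.modifier_add(type='ARRAY')
mod = bpy.context.active_object.modifiers[0]    # the only modifier on the fresh cube
mod.count = 5                                   # repeat the cube 5 times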

Then again, I suppose it is what it is. There is still the standalone bmesh module for working directly with the BMesh data in Blender, so I’ve been looking into the documentation for that as well. As far as I can tell, BMesh is the current way of interacting with Blender at a lower level than the standard high-level API provides.
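
A minimal sketch of that pattern (assuming Object Mode; in Edit Mode you’d use bmesh.from_edit_mesh instead) looks something like this:

import bpy
import bmesh

mesh = bpy.context.active_object.data

bm = bmesh.new()            # create a standalone BMesh
bm.from_mesh(mesh)          # fill it with the object's mesh data

for v in bm.verts:          # BMesh exposes verts, edges and faces directly
    v.co.z += 0.1           # e.g. nudge every vertex up slightly

bm.to_mesh(mesh)            # write the result back into the mesh
bm.free()                   # free the BMesh explicitly when done
mesh.update()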


So, as written above, I went back to investigate my code for training the CNN. And as it turns out, there was a minor mistake that affected the entire training. I attribute half of this mistake to the vagueness of the Tensorflow documentation and half to my carelessness in not having checked the final data after transformation. The problem basically boiled down to the fact that, in the preprocessing stage after loading in the images, I was using the built-in Keras preprocessing functions. One of those was the random zoom function tf.keras.preprocessing.image.random_zoom, which (to put it reductively) accepts two main arguments. This is extracted from the documentation.

"""
Arguments:
x: Input tensor. Must be 3D.
zoom_range: Tuple of floats; zoom range for width and height.
"""
tf.keras.preprocessing.image.random_zoom(
    x, zoom_range, row_axis=1, col_axis=2, channel_axis=0, fill_mode='nearest',
    cval=0.0, interpolation_order=1
)

The mistake I made was with the zoom_range argument. I had understood it to mean the amount of the image to be cropped out, whereas it actually means the amount of the image that remains. And so it was easily spotted when I plotted the final images with matplotlib, only to find massively zoomed-in pictures of hands (mostly palms). It’s times like this when I wish the documentation were a bit better, all the more so since, right after fixing that, I tried to use the tf.keras.preprocessing.image.random_brightness function for preprocessing as well, which only resulted in more messed-up final images.
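
To illustrate the difference (with a made-up noise array standing in for an actual hand image), zoom_range=(0.2, 0.2) keeps only about 20% of the image, which is the kind of extreme zoom-in I was accidentally producing, whereas something like (0.8, 1.0) is closer to the mild augmentation I had intended:

import numpy as np
import tensorflow as tf

img = np.random.rand(128, 128, 3)   # stand-in for a channels-last image

# zoom_range is the fraction of the image that remains, not the fraction cropped out
zoomed_in = tf.keras.preprocessing.image.random_zoom(
    img, (0.2, 0.2), row_axis=0, col_axis=1, channel_axis=2)    # keeps ~20%: huge zoom-in
mild_zoom = tf.keras.preprocessing.image.random_zoom(
    img, (0.8, 1.0), row_axis=0, col_axis=1, channel_axis=2)    # keeps 80-100%: gentle zoom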

I also found out that I had not been using Tensorflow on the GPU all this time, instead running everything on the CPU, which explained the training times. As it turns out, since I was primarily using version 2.0.0, it was entirely possible to miss the warnings saying that the GPU wasn’t being used, as that version never raised an error in that case; that behaviour was only added in versions 2.1.0+. Fortunately, I had somehow decided to try switching over to the newly released Tensorflow 2.5.0, and it turns out I had been missing out on quite a lot by not using the GPU. Training time was cut down by approximately 25% switching from a quad-core CPU to an Nvidia GTX 960. All that had to be done was install the correct version of CUDA and the CUDA toolkit. Now it all seems so straightforward.
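
For anyone in the same boat, a quick sanity check on 2.1+ is to list the devices Tensorflow can actually see; an empty list means training is silently falling back to the CPU:

import tensorflow as tf

print(tf.__version__)
print(tf.config.list_physical_devices('GPU'))
# e.g. [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')] if the GPU is visible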

That being said, with all those problems fixed, the model still seemed to be overfitting the training and validation sets. I was only able to achieve a best of about 46% accuracy on the test set, which I suspect still comes down to too small a dataset. I’ve been thinking of trying out the ImageDataGenerator class again, especially now that I’m on the latest version of Tensorflow. What I do have to check is whether the data is randomly augmented anew for every epoch.
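
From what I recall of ImageDataGenerator, the random transforms are applied every time a batch is drawn from flow(), so each epoch should indeed see freshly augmented images. A quick sketch of the sort of setup I have in mind (with dummy data and made-up parameters, not my actual pipeline):

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# made-up augmentation settings, not the ones from the project
datagen = ImageDataGenerator(
    zoom_range=0.2,         # note: here 0.2 means zoom factors sampled from [0.8, 1.2]
    horizontal_flip=True,
)

# dummy stand-in data just to show the shape of the calls
x_train = np.random.rand(8, 64, 64, 3)
y_train = np.random.randint(0, 2, size=(8,))

train_gen = datagen.flow(x_train, y_train, batch_size=4)    # re-augments on every batch
batch_x, batch_y = next(train_gen)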

I’ve reuploaded (not forked) parts of the project onto my personal GitHub, containing only my code. The GT GitHub server doesn’t seem to allow forks from outside, and I don’t want to make that repo public. Aside from trying out another method of preparing the data, the project files also need to be restructured: it would be better to separate the training code from the model and data preparation, and generally make things more modular.

You can find the project here.
