Just a little background on what I'm trying to create:
I am aiming to create a model that converts sign language to text, using the WLASL dataset. From the get-go, after downloading the dataset from Kaggle, it seemed quite comprehensive, but the number of videos per class ranges from about 5 to 13, which is obviously very little to train on. I decided to try Apple's Create ML instead of something like TensorFlow or an even more complex deep learning framework, as it is much simpler. Since the dataset is so limited in videos per class, I enabled all 6 data augmentations in the "Hand Action Classifier" template (Horizontally Flip, Rotate, Translate, Scale, Interpolate Frames, Drop Frames). While I knew this couldn't fully make up for the lack of data, it should still boost accuracy considerably. Note that I am not using all 2000 classes (words) from the dataset; I used a subset of 300. With all augmentations I got 16% validation accuracy and 90% training accuracy, so my model was clearly overfitting. I then tried the same with 25 classes, and this time got 42% validation accuracy with 100% training accuracy: overfitting again. I went over to the live preview, and almost every sign I tried was predicted wrong.
Next, I decided to try the "model sources" feature in the sidebar. I am not really sure what it is for, but here's what I tried:
I split the subset of the data into 2 separate model sources (16 classes each, which is still a fair number) and got 83% and 90% validation accuracy respectively. Both model sources use all the data augmentations. The models are still clearly overfitting, with 100% training accuracy in both sources, but splitting into two models clearly increased validation accuracy. When I tested them in the live preview, they guessed EVERY SINGLE ASL sign I performed myself accurately, with over 90% confidence.
So my question is: even with my limited data (augmentations do increase it a lot, but the performance difference obviously should not be this large), how have my models performed so well? Moreover, is splitting one model into separate model sources even a viable approach? I am not sure what "model sources" are actually for; I just tried this and somehow got better results. If it is viable, how can I combine the two models into one Swift app? I am just a little confused right now, so hopefully someone can tell me what is going on. If this is not a viable solution, could somebody suggest another way to use this dataset? Prior experience with it would be incredibly helpful, but even if you don't have any, could you please help me?
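For what it's worth, here is a minimal sketch of how two exported model sources might be combined in a Swift app by running both on the same hand-pose window and keeping the more confident prediction. The class names (`SignsModelA`, `SignsModelB`) and the `poses` input / `label` / `labelProbabilities` outputs are assumptions based on the interface Xcode typically generates for a Create ML action classifier, so check them against your own generated model classes:

```swift
import CoreML

// A sketch, not a definitive implementation: feed the same pose window
// to both model sources and keep whichever top label is more confident.
struct SignPrediction {
    let label: String
    let confidence: Double
}

// `SignsModelA` / `SignsModelB` are hypothetical names for the classes
// Xcode generates from the two exported .mlmodel files.
func classifySign(poses: MLMultiArray,
                  modelA: SignsModelA,
                  modelB: SignsModelB) throws -> SignPrediction {
    let outA = try modelA.prediction(poses: poses)
    let outB = try modelB.prediction(poses: poses)

    // Look up each model's confidence in its own top label.
    let confA = outA.labelProbabilities[outA.label] ?? 0
    let confB = outB.labelProbabilities[outB.label] ?? 0

    // Each model only knows its own ~16 classes, so compare the two
    // top confidences and return the stronger prediction.
    return confA >= confB
        ? SignPrediction(label: outA.label, confidence: confA)
        : SignPrediction(label: outB.label, confidence: confB)
}
```

Comparing top confidences like this is only a rough heuristic, since the two models' probability distributions are over disjoint label sets and aren't directly calibrated against each other.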
Thanks so much guys 🙂
PS: Here are the links:
Kaggle Link: https://www.kaggle.com/datasets/risangbaskoro/wlasl-processed
Original paper github page: https://github.com/dxli94/WLASL
Sorry for such a long message. If you need any images for more insight, I will happily provide them.
Once again, thank you so much