Respected Prof. Kulkarni,

Here are the statistics on the data used.

1. All two-member compounds from http://sanskrit.uohyd.ernet.in/Corpus/SHMT/Samaas-Tagging/ were scraped.

2. A total of 19378 such compounds were extracted.

3. The set was randomly shuffled to make it homogeneous, because the data comes from prose, poetry, and other genres of literature.

4. 80 percent of this data was used for training.

5. The remaining 20 percent of this data was used for evaluation. (The training and evaluation sets are separate, with no overlap.)
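
For reference, the shuffle and 80/20 split described in the steps above could be sketched roughly as follows. This is only an illustrative sketch, not the actual script used: the function name shuffle_and_split, the fixed seed, and the placeholder compound strings are all assumptions made for the example.

```python
import random


def shuffle_and_split(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of items and cut it into two disjoint parts.

    The seed is a hypothetical choice, used here only so that the
    split is reproducible.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder data standing in for the 19378 scraped compounds.
compounds = [f"compound_{i}" for i in range(19378)]

train, evaluation = shuffle_and_split(compounds)
print(len(train), len(evaluation))  # 15502 3876
```

With this rounding, 19378 compounds would give 15502 for training and 3876 for evaluation; the exact counts depend on how the 80 percent cut is rounded.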

Classification performance on the training data is around 1-2% higher than on the evaluation data.