Machine Learning in production

Application areas and open data records

Many Machine Learning projects fail because of a lack of experience in the field, even when sufficient data is available. When data is incomplete, of low quality, or unstructured, training models becomes even more difficult, which further slows the build-up of experience with Machine Learning algorithms.

Here, openly and freely available data sets can help in gaining initial experience and in testing one's own Machine Learning approaches. Because of confidentiality obligations, only a limited number of data sets from production are currently publicly accessible, and they are scattered without a clear structure across multiple platforms such as Kaggle, the UC Irvine Machine Learning Repository, or OpenML.

In a joint effort between Fraunhofer IPT and Fraunhofer FFB, a table of publicly available data sets for production has been compiled based on extensive research. It currently lists 135 data sets. The table is continuously maintained and updated to incorporate newly released data sets.

If you want to refer to this overview of publicly available data sets, please cite the related paper.