Publication: Creating Own AI Datasets from Different Language Sources Efficiently: Advantages, Disadvantages and Best Practices to Make Own Datasets

Zoltan Balogh, Eniko Nagy, Jozsef Cserko, Gyorgy Molnar

Publication Name: CANDO-EPE 2023 Proceedings: IEEE 6th International Conference and Workshop Obuda on Electrical and Power Engineering

Publication Date: 2023-01-01

Volume: Unknown

Issue: Unknown

Page Range: 155-158

Description:

Artificial Intelligence (AI) has become a ubiquitous technology with the potential to revolutionize many industries. However, AI requires access to vast amounts of data to achieve that potential. Both structured and unstructured data sources can be used to train AI algorithms, but finding relevant datasets is challenging. By understanding these data sources and the best practices for working with them, we can create solutions that benefit society as a whole. The main problem lies with international sources: most datasets are available only in English or other widely used languages. Although these are good sources, translating them is expensive and time-consuming. This paper shows how to generate one's own datasets from open-source data using translation.

Open Access: Yes

DOI: 10.1109/CANDO-EPE60507.2023.10417973

Authors - 4

Zoltan Balogh

55148441300

Eniko Nagy

57239381300

Jozsef Cserko

57746677000

Gyorgy Molnar

7102876405