Creating a Custom Dataset on Hugging Face

Last Updated : 30 Mar, 2026

Creating a custom dataset is useful when existing datasets do not meet your requirements. Hugging Face provides the datasets library and the Hugging Face Hub to create, manage and share datasets for machine learning tasks. The library can load data from formats such as CSV, JSON and plain text.

Common use cases include:

  • Building chatbots with personalised responses
  • Image classification using custom images
  • Recommendation systems based on user data

Implementation

Step 1: Importing Libraries for dataset creation and data handling.

  • pandas is used to structure data
  • datasets is used to convert the data into the Hugging Face Dataset format
Python
from datasets import Dataset     
import pandas as pd             

Step 2: Creating a sample dataset with ten text samples and one label per sample.

Python
data = {
    "text": [
        "I love machine learning",
        "Hugging Face makes AI easy",
        "Natural language processing is interesting",
        "Deep learning models are powerful",
        "AI is transforming industries",
        "Data science is exciting",
        "Python is widely used in AI",
        "Models require good datasets",
        "Learning AI step by step is helpful",
        "Custom datasets improve performance"
    ],
    "label": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
}
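Before converting the dictionary, it can help to confirm that every column has the same number of entries and to look at the label distribution. The check below is a minimal sketch using an abbreviated three-row copy of the data. Note that in the sample above every label is 1, so a real classification dataset would also need examples of the other classes.

```python
from collections import Counter

# Abbreviated copy of the Step 2 dictionary (three rows instead of ten).
data = {
    "text": [
        "I love machine learning",
        "Hugging Face makes AI easy",
        "Deep learning models are powerful",
    ],
    "label": [1, 1, 1],
}

# Every column must have the same number of entries,
# otherwise the later conversion steps will fail.
lengths = {column: len(values) for column, values in data.items()}
assert len(set(lengths.values())) == 1, f"Column lengths differ: {lengths}"

# Count how often each label occurs to spot a one-sided dataset early.
label_counts = Counter(data["label"])
print(lengths)
print(label_counts)
```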

Step 3: Converting the data into a DataFrame to give it a structured, tabular format that is easier to process.

Python
df = pd.DataFrame(data)  
Figure: DataFrame
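A quick inspection of the DataFrame can catch missing values or duplicate rows before the conversion. The following is a minimal sketch on an abbreviated three-row copy of the data; which checks matter depends on your own dataset.

```python
import pandas as pd

# Abbreviated copy of the Step 2 dictionary (three rows instead of ten).
data = {
    "text": [
        "I love machine learning",
        "Hugging Face makes AI easy",
        "Deep learning models are powerful",
    ],
    "label": [1, 1, 1],
}
df = pd.DataFrame(data)

print(df.head())   # first rows of the table
print(df.shape)    # (number of rows, number of columns)

# Count missing cells and fully duplicated rows.
missing_values = int(df.isna().sum().sum())
duplicate_rows = int(df.duplicated().sum())
print(missing_values, duplicate_rows)
```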

Step 4: Converting the DataFrame into a Hugging Face dataset so it can be used in ML tasks.

Python
dataset = Dataset.from_pandas(df) 

Step 5: Viewing the dataset structure and verifying the data.

Python
print(dataset) 

Step 6: Saving the dataset locally so it can be reused later.

Python
dataset.save_to_disk("my_dataset")  

Step 7: Uploading the dataset to Hugging Face so it can be shared and accessed online.

  • Use login() to sign in to your Hugging Face account
  • Enter your access token (generated from your account settings; it needs write access to upload)
  • Upload the dataset to your profile using push_to_hub()
Python
from huggingface_hub import login 

login()   
dataset.push_to_hub("your-username/my_dataset")   

The complete source code can be accessed here.
