kaggledatasets Documentation

Installation

kaggledatasets requires Python 3.5 or newer.
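Before installing, it can help to confirm that the interpreter meets this requirement. The following is a minimal sketch using only the standard library; the helper name `meets_minimum_python` is hypothetical and is not part of kaggledatasets.

```python
import sys

# Minimum version stated in this documentation.
MINIMUM_PYTHON = (3, 5)


def meets_minimum_python(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if the given (or current) interpreter is at least `minimum`.

    Hypothetical helper for illustration; not part of kaggledatasets.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); tuple comparison is lexicographic.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if meets_minimum_python():
        print("Python version is supported.")
    else:
        print("kaggledatasets requires Python 3.5 or newer.")
```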

kaggledatasets is available from the Python Package Index (PyPI):

$ pip install kaggledatasets

The easiest way to get started on most systems is to create a virtual environment:

$ python3 -m venv kd_venv
$ source kd_venv/bin/activate
(kd_venv) $ pip install kaggledatasets

This installs versions of the TensorFlow (TF) and PyTorch dependencies that depend on your system. See kaggledatasets for more information.
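To check that the install landed in the active environment, a short standard-library script is usually enough. This is a sketch, not a kaggledatasets utility: `in_virtualenv` and `is_installed` are hypothetical helper names, and the only thing assumed about the package is its import name, `kaggledatasets`.

```python
import importlib.util
import sys


def in_virtualenv():
    """Return True if the interpreter is running inside a venv.

    Inside a venv, sys.prefix points at the environment while
    sys.base_prefix points at the base installation.
    """
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def is_installed(package_name):
    """Return True if a top-level package can be found without importing it."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtualenv())
    print("kaggledatasets importable:", is_installed("kaggledatasets"))
```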

If you need a different version of kaggledatasets, follow the instructions on the kaggledatasets website (kaggledatasets.github.io) to install the appropriate version.

Install From Source

$ git clone git@github.com:kaggledatasets/kaggledatasets.git
$ cd kaggledatasets
$ python3 -m venv kd_venv
$ source kd_venv/bin/activate
(kd_venv) $ python3 setup.py install

Once that is installed, you can run the unit tests. We recommend using X as a runner.

To resume development in an already checked-out repo:

$ cd kaggledatasets
$ source kd_venv/bin/activate

To exit the virtual environment:

(kd_venv) $ deactivate

Coming Soon

Cloud VM Setup

This guide will cover the setup work needed to install kaggledatasets on a cloud VM. Note that while these instructions worked when they were written, they may become incorrect or out of date. If they do, please send us a Pull Request!

After following these instructions, you should be ready to follow either the Installation instructions or the Install From Source instructions.

Amazon Web Services

Coming Soon

Google Cloud Engine

Coming Soon

Microsoft Azure

Coming Soon

Configuration

Local System (Windows/Linux/macOS)

This guide will cover the setup work needed to install and configure kaggledatasets on your local machine.

Google Colab Setup

This guide will cover the setup work needed to install and configure kaggledatasets on Google Colab.
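Until that guide is written, a notebook can at least detect whether it is running on Colab and build the install command for the current kernel. This is a sketch under assumptions: `running_in_colab` and `pip_install_command` are hypothetical helpers, and detection relies on the `google.colab` module being importable only inside Colab runtimes.

```python
import sys


def running_in_colab():
    """Return True when the google.colab module is importable.

    That module ships with Colab runtimes, so its presence is used
    here as the detection signal.
    """
    try:
        import google.colab  # noqa: F401
    except ImportError:
        return False
    return True


def pip_install_command(package="kaggledatasets"):
    """Build a pip command that targets the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", package]
```

Passing the result to `subprocess.check_call(pip_install_command())` installs into the same interpreter the notebook kernel uses, which avoids installing into a different Python than the one running the notebook.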

Frequently Asked Questions

Contributing Guide

👍🎉 First off, thanks for taking the time to contribute! 🎉👍

Before you start

  • Comment on the issue that you plan to work on so we can assign it to you and there isn’t unnecessary duplication of work.
  • When you plan to work on something larger (for example, adding new features to Dataset Class), please respond on the issue (or create one if there isn’t one) to explain your plan and give others a chance to discuss.
  • If you’re fixing a smaller issue, please check the list of pending Pull Requests to avoid unnecessary duplication.

How can you help

You can help in multiple ways:

  • Reproducing bugs, finding their root causes, and providing fixes. This is greatly appreciated (see the issues with label: bug).
  • Sending Pull Requests for new Kaggle datasets and/or requested features (see the issues with label: dataset request or enhancement).
  • Doing Code Reviews on Pull Requests from the developers of this community and verifying whether the PRs work correctly.

Datasets

Adding a Kaggle Dataset is a great way of making it more accessible to the various communities. An "Add a new Kaggle Dataset" guide will be available soon.

Tests

We use pylint to keep kaggledatasets easy to use and to work on over the long term. All modules should have clear tests for their public members.
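One way to act on this guideline is to list a module's public functions and classes and compare that list against the existing tests. The sketch below is illustrative only: `public_members` is a hypothetical helper, and the standard library's `json` module stands in for a kaggledatasets module.

```python
import inspect
import json  # Stand-in module; replace with the module under review.


def public_members(module):
    """Return sorted names of the public functions and classes of `module`.

    Uses `__all__` when the module defines it, otherwise every name
    that does not start with an underscore.
    """
    names = getattr(module, "__all__", None)
    if names is None:
        names = [name for name in dir(module) if not name.startswith("_")]
    members = []
    for name in names:
        obj = getattr(module, name, None)
        if inspect.isfunction(obj) or inspect.isclass(obj):
            members.append(name)
    return sorted(members)


if __name__ == "__main__":
    for name in public_members(json):
        print(name)
```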

Pull Requests

All contributions are done through Pull Requests here on GitHub.

Code Reviews

All submissions, including submissions by project members, require review. We use GitHub pull requests for this purpose. Consult GitHub Help for more information on using pull requests.

Persons of Interest

We are looking for contributors who can be our Persons of Interest.

General Maintainers

Module-level maintainers

Core

Structured

Tutorials

kaggledatasets

kaggledatasets.core

kaggledatasets.core.dataset

kaggledatasets.core.config

kaggledatasets.core.downloader

kaggledatasets.core.fileops

kaggledatasets.structured

The kaggledatasets.structured subpackage consists of popular datasets and common functions for structured datasets such as CSV, JSON, and SQLite.
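For orientation, the three formats named above can all be read with the Python standard library. The sketch below is not the kaggledatasets.structured API, which is not documented here. `load_structured` is a hypothetical helper, and the file extensions it accepts are assumptions.

```python
import csv
import json
import sqlite3
from pathlib import Path


def load_structured(path):
    """Load a CSV, JSON, or SQLite file into plain Python objects.

    Hypothetical helper using only the standard library; it is not
    part of kaggledatasets.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        # Each row becomes a dict keyed by the header line.
        with path.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    if suffix == ".json":
        with path.open(encoding="utf-8") as handle:
            return json.load(handle)
    if suffix in (".sqlite", ".db"):
        connection = sqlite3.connect(str(path))
        try:
            cursor = connection.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
            tables = [row[0] for row in cursor.fetchall()]
            # Table names are quoted as identifiers before being interpolated.
            return {
                table: connection.execute(
                    'SELECT * FROM "{}"'.format(table.replace('"', '""'))
                ).fetchall()
                for table in tables
            }
        finally:
            connection.close()
    raise ValueError("Unsupported file type: {}".format(suffix))
```

CSV files come back as a list of row dictionaries, JSON as whatever the document contains, and SQLite as a dictionary mapping each table name to its rows.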

kaggledatasets.image

The kaggledatasets.image subpackage consists of popular datasets and common image transformations for image datasets.

kaggledatasets.audio

The kaggledatasets.audio subpackage consists of popular datasets and common audio transformations.

kaggledatasets.text

The kaggledatasets.text subpackage consists of data processing utilities and popular datasets for natural language.
