AI, ML & Big Data

1. Machine Learning – Part I

Reproduced from GitHub https://github.com/

A curated list of awesome machine learning frameworks, libraries and software (by language). Inspired by awesome-php.

Further resources:

  • For a list of free machine learning books available for download, go here.
  • For a list of professional machine learning events, go here.
  • For a list of (mostly) free machine learning courses available online, go here.
  • For a list of blogs and newsletters on data science and machine learning, go here.
  • For a list of free-to-attend meetups and local events, go here.

Table of Contents

Frameworks and Libraries

Tools

Credits

APL

General-Purpose Machine Learning

  • naive-apl – Naive Bayesian Classifier implementation in APL. [Deprecated]

C

General-Purpose Machine Learning

  • Darknet – Darknet is an open source neural network framework written in C and CUDA. It is fast, easy to install, and supports CPU and GPU computation.
  • Recommender – A C library for product recommendations/suggestions using collaborative filtering (CF).
  • Hybrid Recommender System – A hybrid recommender system based upon scikit-learn algorithms. [Deprecated]
  • neonrvm – neonrvm is an open source machine learning library based on the RVM technique. It is written in C and comes with Python bindings.
  • cONNXr – An ONNX runtime written in pure C99 with zero dependencies, focused on small embedded devices. Run inference on your machine learning models no matter which framework you trained them with. Easy to install and compiles everywhere, even on very old devices.
  • libonnx – A lightweight, portable pure C99 ONNX inference engine for embedded devices with hardware acceleration support.

Computer Vision

  • CCV – C-based/Cached/Core Computer Vision Library, A Modern Computer Vision Library.
  • VLFeat – VLFeat is an open and portable library of computer vision algorithms, which has a MATLAB toolbox.

C++

Computer Vision

  • DLib – DLib has C++ and Python interfaces for face detection and training general object detectors.
  • EBLearn – Eblearn is an object-oriented C++ library that implements various machine learning models. [Deprecated]
  • OpenCV – OpenCV has C++, C, Python, Java and MATLAB interfaces and supports Windows, Linux, Android and Mac OS.
  • VIGRA – VIGRA is a generic cross-platform C++ computer vision and machine learning library for volumes of arbitrary dimensionality with Python bindings.
  • Openpose – A real-time multi-person keypoint detection library for body, face, hands, and foot estimation

General-Purpose Machine Learning

  • BanditLib – A simple Multi-armed Bandit library. [Deprecated]
  • Caffe – A deep learning framework developed with cleanliness, readability, and speed in mind. [DEEP LEARNING]
  • CatBoost – General purpose gradient boosting on decision trees library with categorical features support out of the box. It is easy to install, contains fast inference implementation and supports CPU and GPU (even multi-GPU) computation.
  • CNTK – The Computational Network Toolkit (CNTK) by Microsoft Research, is a unified deep-learning toolkit that describes neural networks as a series of computational steps via a directed graph.
  • CUDA – This is a fast C++/CUDA implementation of convolutional neural networks. [DEEP LEARNING]
  • DeepDetect – A machine learning API and server written in C++11. It makes state of the art machine learning easy to work with and integrate into existing applications.
  • Distributed Machine learning Tool Kit (DMTK) – A distributed machine learning (parameter server) framework by Microsoft. Enables training models on large data sets across multiple machines. Current tools bundled with it include: LightLDA and Distributed (Multisense) Word Embedding.
  • DLib – A suite of ML tools designed to be easy to embed in other applications.
  • DSSTNE – A software library created by Amazon for training and deploying deep neural networks using GPUs which emphasizes speed and scale over experimental flexibility.
  • DyNet – A dynamic neural network library working well with networks that have dynamic structures that change for every training instance. Written in C++ with bindings in Python.
  • Fido – A highly-modular C++ machine learning library for embedded electronics and robotics.
  • igraph – General purpose graph library.
  • Intel(R) DAAL – A high performance software library developed by Intel and optimized for Intel’s architectures. The library provides algorithmic building blocks for all stages of data analytics and allows data to be processed in batch, online and distributed modes.
  • LightGBM – Microsoft’s fast, distributed, high performance gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
  • libfm – A generic approach that makes it possible to mimic most factorization models by feature engineering.
  • MLDB – The Machine Learning Database is a database designed for machine learning. Send it commands over a RESTful API to store data, explore it using SQL, then train machine learning models and expose them as APIs.
  • mlpack – A scalable C++ machine learning library.
  • MXNet – Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Go, Javascript and more.
  • ParaMonte – A general-purpose library with C/C++ interface for Bayesian data analysis and visualization via serial/parallel Monte Carlo and MCMC simulations. Documentation can be found here.
  • proNet-core – A general-purpose network embedding framework: pair-wise representations optimization Network Edit.
  • PyCUDA – Python interface to CUDA
  • ROOT – A modular scientific software framework. It provides all the functionalities needed to deal with big data processing, statistical analysis, visualization and storage.
  • shark – A fast, modular, feature-rich open-source C++ machine learning library.
  • Shogun – The Shogun Machine Learning Toolbox.
  • sofia-ml – Suite of fast incremental algorithms.
  • Stan – A probabilistic programming language implementing full Bayesian statistical inference with Hamiltonian Monte Carlo sampling.
  • Timbl – A software package/C++ library implementing several memory-based learning algorithms, among which IB1-IG, an implementation of k-nearest neighbor classification, and IGTree, a decision-tree approximation of IB1-IG. Commonly used for NLP.
  • Vowpal Wabbit (VW) – A fast out-of-core learning system.
  • Warp-CTC – A fast parallel implementation of Connectionist Temporal Classification (CTC), on both CPU and GPU.
  • XGBoost – A parallelized optimized general purpose gradient boosting library.
  • ThunderGBM – A fast library for GBDTs and Random Forests on GPUs.
  • ThunderSVM – A fast SVM library on GPUs and CPUs.
  • LKYDeepNN – A header-only C++11 Neural Network library. Few dependencies; documentation in native Traditional Chinese.
  • xLearn – A high performance, easy-to-use, and scalable machine learning package, which can be used to solve large-scale machine learning problems. xLearn is especially useful for solving machine learning problems on large-scale sparse data, which is very common in Internet services such as online advertising and recommender systems.
  • Featuretools – A library for automated feature engineering. It excels at transforming transactional and relational datasets into feature matrices for machine learning using reusable feature engineering “primitives”.
  • skynet – A neural network learning library with a C interface; networks are defined in JSON. Written in C++ with bindings in Python, C++ and C#.
  • Feast – A feature store for the management, discovery, and access of machine learning features. Feast provides a consistent view of feature data for both model training and model serving.
  • Hopsworks – A data-intensive platform for AI with the industry’s first open-source feature store. The Hopsworks Feature Store provides both a feature warehouse for training and batch based on Apache Hive and a feature serving database, based on MySQL Cluster, for online applications.
  • Polyaxon – A platform for reproducible and scalable machine learning and deep learning.

Natural Language Processing

  • BLLIP Parser – BLLIP Natural Language Parser (also known as the Charniak-Johnson parser).
  • colibri-core – C++ library, command line tools, and Python binding for extracting and working with basic linguistic constructions such as n-grams and skipgrams in a quick and memory-efficient way.
  • CRF++ – Open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data & other Natural Language Processing tasks. [Deprecated]
  • CRFsuite – CRFsuite is an implementation of Conditional Random Fields (CRFs) for labeling sequential data. [Deprecated]
  • frog – Memory-based NLP suite developed for Dutch: PoS tagger, lemmatiser, dependency parser, NER, shallow parser, morphological analyzer.
  • libfolia – C++ library for the FoLiA format
  • MeTA – MeTA: ModErn Text Analysis is a C++ Data Sciences Toolkit that facilitates mining big text data.
  • MIT Information Extraction Toolkit – C, C++, and Python tools for named entity recognition and relation extraction
  • ucto – Unicode-aware regular-expression based tokenizer for various languages. Tool and C++ library. Supports FoLiA format.

Speech Recognition

  • Kaldi – Kaldi is a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. Kaldi is intended for use by speech recognition researchers.

Sequence Analysis

  • ToPS – This is an object-oriented framework that facilitates the integration of probabilistic models for sequences over a user defined alphabet. [Deprecated]

Gesture Detection

  • grt – The Gesture Recognition Toolkit (GRT) is a cross-platform, open-source, C++ machine learning library designed for real-time gesture recognition.

Common Lisp

General-Purpose Machine Learning

  • mgl – Neural networks (Boltzmann machines, feed-forward and recurrent nets), Gaussian Processes.
  • mgl-gpr – Evolutionary algorithms. [Deprecated]
  • cl-libsvm – Wrapper for the libsvm support vector machine library. [Deprecated]
  • cl-online-learning – Online learning algorithms (Perceptron, AROW, SCW, Logistic Regression).
  • cl-random-forest – Implementation of Random Forest in Common Lisp.

Clojure

Natural Language Processing

  • Clojure-openNLP – Natural Language Processing in Clojure (opennlp).
  • Inflections-clj – Rails-like inflection library for Clojure and ClojureScript.

General-Purpose Machine Learning

  • tech.ml – A machine learning platform based on tech.ml.dataset, supporting not just ML algorithms but also relevant ETL processing; wraps multiple machine learning libraries.
  • clj-ml – A machine learning library for Clojure built on top of Weka and friends.
  • clj-boost – Wrapper for XGBoost
  • Touchstone – Clojure A/B testing library.
  • Clojush – The Push programming language and the PushGP genetic programming system implemented in Clojure.
  • lambda-ml – Simple, concise implementations of machine learning techniques and utilities in Clojure.
  • Infer – Inference and machine learning in Clojure. [Deprecated]
  • Encog – Clojure wrapper for Encog (v3) (Machine-Learning framework that specializes in neural-nets). [Deprecated]
  • Fungp – A genetic programming library for Clojure. [Deprecated]
  • Statistiker – Basic Machine Learning algorithms in Clojure. [Deprecated]
  • clortex – General Machine Learning library using Numenta’s Cortical Learning Algorithm. [Deprecated]
  • comportex – Functionally composable Machine Learning library using Numenta’s Cortical Learning Algorithm. [Deprecated]

Deep Learning

  • MXNet – Bindings to Apache MXNet – part of the MXNet project
  • Deep Diamond – A fast Clojure Tensor & Deep Learning library
  • jutsu.ai – Clojure wrapper for deeplearning4j with some added syntactic sugar.
  • cortex – Neural networks, regression and feature learning in Clojure.
  • Flare – Dynamic Tensor Graph library in Clojure (think PyTorch, DyNet, etc.)
  • dl4clj – Clojure wrapper for Deeplearning4j.

Data Analysis

  • tech.ml.dataset – Clojure dataframe library and pipeline for data processing and machine learning
  • Tablecloth – A dataframe grammar wrapping tech.ml.dataset, inspired by several R libraries
  • Panthera – Clojure API wrapping Python’s Pandas library
  • Incanter – Incanter is a Clojure-based, R-like platform for statistical computing and graphics.
  • PigPen – Map-Reduce for Clojure.
  • Geni – a Clojure dataframe library that runs on Apache Spark

Data Visualization

  • Hanami – Clojure(Script) library and framework for creating interactive visualization applications based on Vega-Lite (VGL) and/or Vega (VG) specifications. Automatic framing and layouts along with a powerful templating system for abstracting visualization specs.
  • Saite – Clojure(Script) client/server application for dynamic interactive explorations and the creation of live shareable documents capturing them using Vega/Vega-Lite, CodeMirror, markdown, and LaTeX
  • Oz – Data visualisation using Vega/Vega-Lite and Hiccup, and a live-reload platform for literate-programming
  • Envision – Clojure Data Visualisation library, based on Statistiker and D3.
  • Pink Gorilla Notebook – A Clojure/ClojureScript notebook application/library based on Gorilla-REPL
  • clojupyter – A Jupyter kernel for Clojure – run Clojure code in Jupyter Lab, Notebook and Console.
  • notespace – Notebook experience in your Clojure namespace
  • Delight – A listener that streams your Spark event logs to Delight, a free and improved Spark UI

Interop

  • Java Interop – Clojure has Native Java Interop from which Java’s ML ecosystem can be accessed
  • JavaScript Interop – ClojureScript has Native JavaScript Interop from which JavaScript’s ML ecosystem can be accessed
  • Libpython-clj – Interop with Python
  • ClojisR – Interop with R and Renjin (R on the JVM)

Misc

  • Neanderthal – Fast Clojure Matrix Library (native CPU, GPU, OpenCL, CUDA)
  • kixistats – A library of statistical distribution sampling and transducing functions
  • fastmath – A collection of functions for mathematical and statistical computing, machine learning, etc., wrapping several JVM libraries
  • matlib – a Clojure library of optimisation and control theory tools and convenience functions based on Neanderthal.

Extra

  • Scicloj – Curated list of ML related resources for Clojure.

Crystal

General-Purpose Machine Learning

  • machine – Simple machine learning algorithm.
  • crystal-fann – FANN (Fast Artificial Neural Network) binding.

Elixir

General-Purpose Machine Learning

  • Simple Bayes – A Simple Bayes / Naive Bayes implementation in Elixir.
  • emel – A simple and functional machine learning library written in Elixir.
  • Tensorflex – Tensorflow bindings for the Elixir programming language.

Natural Language Processing

  • Stemmer – An English (Porter2) stemming implementation in Elixir.

Erlang

General-Purpose Machine Learning

  • Disco – Map Reduce in Erlang. [Deprecated]

Fortran

General-Purpose Machine Learning

Data Analysis / Data Visualization

  • ParaMonte – A general-purpose Fortran library for Bayesian data analysis and visualization via serial/parallel Monte Carlo and MCMC simulations. Documentation can be found here.

Go

Natural Language Processing

  • snowball – Snowball Stemmer for Go.
  • word-embedding – Word Embeddings: the full implementation of word2vec, GloVe in Go.
  • sentences – Golang implementation of Punkt sentence tokenizer.
  • go-ngram – In-memory n-gram index with compression. [Deprecated]
  • paicehusk – Golang implementation of the Paice/Husk Stemming Algorithm. [Deprecated]
  • go-porterstemmer – A native Go clean room implementation of the Porter Stemming algorithm. [Deprecated]

General-Purpose Machine Learning

  • birdland – A recommendation library in Go.
  • eaopt – An evolutionary optimization library.
  • leaves – A pure Go implementation of the prediction part of GBRTs, including XGBoost and LightGBM.
  • gobrain – Neural Networks written in Go.
  • go-featureprocessing – Fast and convenient feature processing for low latency machine learning in Go.
  • go-mxnet-predictor – Go binding for MXNet c_predict_api to do inference with a pre-trained model.
  • go-ml-benchmarks – Benchmarks of machine learning inference for Go.
  • go-ml-transpiler – An open source Go transpiler for machine learning models.
  • golearn – Machine learning for Go.
  • goml – Machine learning library written in pure Go.
  • gorgonia – Deep learning in Go.
  • goro – A high-level machine learning library in the vein of Keras.
  • gorse – An offline recommender system backend based on collaborative filtering written in Go.
  • therfoo – An embedded deep learning library for Go.
  • neat – Plug-and-play, parallel Go framework for NeuroEvolution of Augmenting Topologies (NEAT). [Deprecated]
  • go-pr – Pattern recognition package in Go. [Deprecated]
  • go-ml – Linear / Logistic regression, Neural Networks, Collaborative Filtering and Gaussian Multivariate Distribution. [Deprecated]
  • GoNN – GoNN is an implementation of Neural Network in Go Language, which includes BPNN, RBF, PCN. [Deprecated]
  • bayesian – Naive Bayesian Classification for Golang. [Deprecated]
  • go-galib – Genetic Algorithms library written in Go / Golang. [Deprecated]
  • Cloudforest – Ensembles of decision trees in Go/Golang. [Deprecated]
  • go-dnn – Deep Neural Networks for Golang (powered by MXNet)

Spatial analysis and geometry

  • go-geom – Go library to handle geometries.
  • gogeo – Spherical geometry in Go.

Data Analysis / Data Visualization

  • dataframe-go – Dataframes for machine-learning and statistics (similar to pandas).
  • gota – Dataframes.
  • gonum/mat – A linear algebra package for Go.
  • gonum/optimize – Implementations of optimization algorithms.
  • gonum/plot – A plotting library.
  • gonum/stat – A statistics library.
  • SVGo – The Go Language library for SVG generation.
  • glot – Glot is a plotting library for Golang built on top of gnuplot.
  • globe – Globe wireframe visualization.
  • gonum/graph – General-purpose graph library.
  • go-graph – Graph library for Go/Golang language. [Deprecated]
  • RF – Random forests implementation in Go. [Deprecated]

Computer vision

  • GoCV – Package for computer vision using OpenCV 4 and beyond.

Reinforcement learning

  • gold – A reinforcement learning library.

Haskell

General-Purpose Machine Learning

  • haskell-ml – Haskell implementations of various ML algorithms. [Deprecated]
  • HLearn – a suite of libraries for interpreting machine learning models according to their algebraic structure. [Deprecated]
  • hnn – Haskell Neural Network library.
  • hopfield-networks – Hopfield Networks for unsupervised learning in Haskell. [Deprecated]
  • DNNGraph – A DSL for deep neural networks. [Deprecated]
  • LambdaNet – Configurable Neural Networks in Haskell. [Deprecated]

Java

Natural Language Processing

  • Cortical.io – Retina: an API performing complex NLP operations (disambiguation, classification, streaming text filtering, etc…) as quickly and intuitively as the brain.
  • IRIS – Cortical.io’s FREE NLP, Retina API Analysis Tool (written in JavaFX!) – See the Tutorial Video.
  • CoreNLP – Stanford CoreNLP provides a set of natural language analysis tools which can take raw English language text input and give the base forms of words.
  • Stanford Parser – A natural language parser is a program that works out the grammatical structure of sentences.
  • Stanford POS Tagger – A Part-Of-Speech Tagger (POS Tagger).
  • Stanford Named Entity Recognizer – Stanford NER is a Java implementation of a Named Entity Recognizer.
  • Stanford Word Segmenter – Tokenization of raw text is a standard pre-processing step for many NLP tasks.
  • Tregex, Tsurgeon and Semgrex – Tregex is a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes (the name is short for “tree regular expressions”).
  • Stanford Phrasal: A Phrase-Based Translation System – Stanford Phrasal is a state-of-the-art statistical phrase-based machine translation system, written in Java.
  • Stanford English Tokenizer – A tokenizer divides text into a sequence of tokens, which roughly correspond to “words”.
  • Stanford Tokens Regex – A framework for defining regular-expression patterns over sequences of tokens.
  • Stanford Temporal Tagger – SUTime is a library for recognizing and normalizing time expressions.
  • Stanford SPIED – Learning entities from unlabeled text starting with seed sets using patterns in an iterative fashion.
  • Twitter Text Java – A Java implementation of Twitter’s text processing library.
  • MALLET – A Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
  • OpenNLP – a machine learning based toolkit for the processing of natural language text.
  • LingPipe – A tool kit for processing text using computational linguistics.
  • ClearTK – ClearTK provides a framework for developing statistical natural language processing (NLP) components in Java and is built on top of Apache UIMA. [Deprecated]
  • Apache cTAKES – Apache Clinical Text Analysis and Knowledge Extraction System (cTAKES) is an open-source natural language processing system for information extraction from electronic medical record clinical free-text.
  • NLP4J – The NLP4J project provides software and resources for natural language processing. The project started at the Center for Computational Language and EducAtion Research, and is currently developed by the Center for Language and Information Research at Emory University. [Deprecated]
  • CogcompNLP – This project collects a number of core libraries for Natural Language Processing (NLP) developed in the University of Illinois’ Cognitive Computation Group. Examples include illinois-core-utilities, which provides a set of NLP-friendly data structures and a number of NLP-related utilities that support writing NLP applications, running experiments, etc., and illinois-edison, a library for feature extraction from illinois-core-utilities data structures, along with many other packages.

General-Purpose Machine Learning

  • aerosolve – A machine learning library by Airbnb designed from the ground up to be human friendly.
  • AMIDST Toolbox – A Java Toolbox for Scalable Probabilistic Machine Learning.
  • Datumbox – Machine Learning framework for rapid development of Machine Learning and Statistical applications.
  • ELKI – Java toolkit for data mining. (unsupervised: clustering, outlier detection etc.)
  • Encog – An advanced neural network and machine learning framework. Encog contains classes to create a wide variety of networks, as well as support classes to normalize and process data for these neural networks. Encog trains using multithreaded resilient propagation. Encog can also make use of a GPU to further speed processing time. A GUI based workbench is also provided to help model and train neural networks.
  • FlinkML in Apache Flink – Distributed machine learning library in Flink.
  • H2O – ML engine that supports distributed learning on Hadoop, Spark or your laptop via APIs in R, Python, Scala, REST/JSON.
  • htm.java – General Machine Learning library using Numenta’s Cortical Learning Algorithm.
  • liblinear-java – Java version of liblinear.
  • Mahout – Distributed machine learning.
  • Meka – An open source implementation of methods for multi-label classification and evaluation (extension to Weka).
  • MLlib in Apache Spark – Distributed machine learning library in Spark
  • Hydrosphere Mist – A service for deploying Apache Spark MLlib machine learning models as real-time, batch or reactive web services.
  • Neuroph – Neuroph is a lightweight Java neural network framework.
  • ORYX – Lambda Architecture Framework using Apache Spark and Apache Kafka with a specialization for real-time large-scale machine learning.
  • Samoa – SAMOA is a framework that includes distributed machine learning for data streams with an interface to plug in different stream processing platforms.
  • RankLib – RankLib is a library of learning to rank algorithms. [Deprecated]
  • rapaio – statistics, data mining and machine learning toolbox in Java.
  • RapidMiner – RapidMiner integration into Java code.
  • Stanford Classifier – A classifier is a machine learning tool that will take data items and place them into one of k classes.
  • Smile – Statistical Machine Intelligence & Learning Engine.
  • SystemML – flexible, scalable machine learning (ML) language.
  • Weka – Weka is a collection of machine learning algorithms for data mining tasks.
  • LBJava – Learning Based Java is a modeling language for the rapid development of software systems. It offers a convenient, declarative syntax for classifier and constraint definition directly in terms of the objects in the programmer’s application.

Speech Recognition

  • CMU Sphinx – Open source toolkit for speech recognition, based purely on a Java speech recognition library.

Data Analysis / Data Visualization

  • Flink – Open source platform for distributed stream and batch data processing.
  • Hadoop – Hadoop/HDFS.
  • Onyx – Distributed, masterless, high performance, fault tolerant data processing. Written entirely in Clojure.
  • Spark – Spark is a fast and general engine for large-scale data processing.
  • Storm – Storm is a distributed realtime computation system.
  • Impala – Real-time Query for Hadoop.
  • DataMelt – Mathematics software for numeric computation, statistics, symbolic calculations, data analysis and data visualization.
  • Dr. Michael Thomas Flanagan’s Java Scientific Library [Deprecated]

Deep Learning

JavaScript

Natural Language Processing

  • Twitter-text – A JavaScript implementation of Twitter’s text processing library.
  • natural – General natural language facilities for node.
  • Knwl.js – A Natural Language Processor in JS.
  • Retext – Extensible system for analyzing and manipulating natural language.
  • NLP Compromise – Natural Language processing in the browser.
  • nlp.js – An NLP library built in node over Natural, with entity extraction, sentiment analysis, automatic language identification, and more.

Data Analysis / Data Visualization

  • D3.js
  • High Charts
  • NVD3.js
  • dc.js
  • chartjs
  • dimple
  • amCharts
  • D3xter – Straightforward plotting built on D3. [Deprecated]
  • statkit – Statistics kit for JavaScript. [Deprecated]
  • datakit – A lightweight framework for data analysis in JavaScript
  • science.js – Scientific and statistical computing in JavaScript. [Deprecated]
  • Z3d – Easily make interactive 3d plots built on Three.js [Deprecated]
  • Sigma.js – JavaScript library dedicated to graph drawing.
  • C3.js – customizable library based on D3.js for easy chart drawing.
  • Datamaps – Customizable SVG map/geo visualizations using D3.js. [Deprecated]
  • ZingChart – library written on Vanilla JS for big data visualization.
  • cheminfo – Platform for data visualization and analysis, using the visualizer project.
  • Learn JS Data
  • AnyChart
  • FusionCharts
  • Nivo – built on top of the awesome d3 and Reactjs libraries

General-Purpose Machine Learning

  • Auto ML – Automated machine learning, data formatting, ensembling, and hyperparameter optimization for competitions and exploration; just give it a .csv file!
  • Convnet.js – ConvNetJS is a JavaScript library for training Deep Learning models. [DEEP LEARNING] [Deprecated]
  • Clusterfck – Agglomerative hierarchical clustering implemented in Javascript for Node.js and the browser. [Deprecated]
  • Clustering.js – Clustering algorithms implemented in Javascript for Node.js and the browser. [Deprecated]
  • Decision Trees – NodeJS Implementation of Decision Tree using ID3 Algorithm. [Deprecated]
  • DN2A – Digital Neural Networks Architecture. [Deprecated]
  • figue – K-means, fuzzy c-means and agglomerative clustering.
  • Gaussian Mixture Model – Unsupervised machine learning with multivariate Gaussian mixture model.
  • Node-fann – FANN (Fast Artificial Neural Network Library) bindings for Node.js [Deprecated]
  • Keras.js – Run Keras models in the browser, with GPU support provided by WebGL 2.
  • Kmeans.js – Simple Javascript implementation of the k-means algorithm, for node.js and the browser. [Deprecated]
  • LDA.js – LDA topic modeling for Node.js
  • Learning.js – Javascript implementation of logistic regression/c4.5 decision tree [Deprecated]
  • machinelearn.js – Machine Learning library for the web, Node.js and developers
  • mil-tokyo – List of several machine learning libraries.
  • Node-SVM – Support Vector Machine for Node.js
  • Brain – Neural networks in JavaScript [Deprecated]
  • Brain.js – Neural networks in JavaScript – continued community fork of Brain.
  • Bayesian-Bandit – Bayesian bandit implementation for Node and the browser. [Deprecated]
  • Synaptic – Architecture-free neural network library for Node.js and the browser.
  • kNear – JavaScript implementation of the k nearest neighbors algorithm for supervised learning.
  • NeuralN – C++ Neural Network library for Node.js. It has an advantage on large datasets and with multi-threaded training. [Deprecated]
  • kalman – Kalman filter for Javascript. [Deprecated]
  • shaman – Node.js library with support for both simple and multiple linear regression. [Deprecated]
  • ml.js – Machine learning and numerical analysis tools for Node.js and the Browser!
  • ml5 – Friendly machine learning for the web!
  • Pavlov.js – Reinforcement learning using Markov Decision Processes.
  • MXNet – Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Go, Javascript and more.
  • TensorFlow.js – A WebGL accelerated, browser based JavaScript library for training and deploying ML models.
  • JSMLT – Machine learning toolkit with classification and clustering for Node.js; supports visualization (see visualml.io).
  • xgboost-node – Run XGBoost model and make predictions in Node.js.
  • Netron – Visualizer for machine learning models.
  • WebDNN – Fast Deep Neural Network JavaScript Framework. WebDNN uses the next-generation JavaScript API WebGPU for GPU execution and WebAssembly for CPU execution.

Misc

  • stdlib – A standard library for JavaScript and Node.js, with an emphasis on numeric computing. The library provides a collection of robust, high performance libraries for mathematics, statistics, streams, utilities, and more.
  • sylvester – Vector and Matrix math for JavaScript. [Deprecated]
  • simple-statistics – A JavaScript implementation of descriptive, regression, and inference statistics. Implemented in literate JavaScript with no dependencies, designed to work in all modern browsers (including IE) as well as in Node.js.
  • regression-js – A JavaScript library containing a collection of least squares fitting methods for finding a trend in a set of data.
  • Lyric – Linear Regression library. [Deprecated]
  • GreatCircle – Library for calculating great circle distance.
  • MLPleaseHelp – MLPleaseHelp is a simple ML resource search engine. You can use this search engine right now at https://jgreenemi.github.io/MLPleaseHelp/, provided via Github Pages.
  • Pipcook – A JavaScript application framework for machine learning and its engineering.

Demos and Scripts

  • The Bot – Example of how the neural network learns to predict the angle between two points created with Synaptic.
  • Half Beer – Beer glass classifier created with Synaptic.
  • NSFWJS – Indecent content checker with TensorFlow.js
  • Rock Paper Scissors – Rock Paper Scissors trained in the browser with TensorFlow.js

Julia

General-Purpose Machine Learning

  • MachineLearning – Julia Machine Learning library. [Deprecated]
  • MLBase – A set of functions to support the development of machine learning algorithms.
  • PGM – A Julia framework for probabilistic graphical models.
  • DA – Julia package for Regularized Discriminant Analysis.
  • Regression – Algorithms for regression analysis (e.g. linear regression and logistic regression). [Deprecated]
  • Local Regression – Local regression, so smooooth!
  • Naive Bayes – Simple Naive Bayes implementation in Julia. [Deprecated]
  • Mixed Models – A Julia package for fitting (statistical) mixed-effects models.
  • Simple MCMC – Basic MCMC sampler implemented in Julia. [Deprecated]
  • Distances – Julia module for Distance evaluation.
  • Decision Tree – Decision Tree Classifier and Regressor.
  • Neural – A neural network in Julia.
  • MCMC – MCMC tools for Julia. [Deprecated]
  • Mamba – Markov chain Monte Carlo (MCMC) for Bayesian analysis in Julia.
  • GLM – Generalized linear models in Julia.
  • Gaussian Processes – Julia package for Gaussian processes.
  • Online Learning [Deprecated]
  • GLMNet – Julia wrapper for fitting Lasso/ElasticNet GLM models using glmnet.
  • Clustering – Basic functions for clustering data: k-means, dp-means, etc.
  • SVM – SVM for Julia. [Deprecated]
  • Kernel Density – Kernel density estimators for Julia.
  • MultivariateStats – Methods for dimensionality reduction.
  • NMF – A Julia package for non-negative matrix factorization.
  • ANN – Julia artificial neural networks. [Deprecated]
  • Mocha – Deep Learning framework for Julia inspired by Caffe. [Deprecated]
  • XGBoost – eXtreme Gradient Boosting Package in Julia.
  • ManifoldLearning – A Julia package for manifold learning and nonlinear dimensionality reduction.
  • MXNet – Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Go, Javascript and more.
  • Merlin – Flexible Deep Learning Framework in Julia.
  • ROCAnalysis – Receiver Operating Characteristics and functions for evaluating probabilistic binary classifiers.
  • GaussianMixtures – Large scale Gaussian Mixture Models.
  • ScikitLearn – Julia implementation of the scikit-learn API.
  • Knet – Koç University Deep Learning Framework.
  • Flux – Relax! Flux is the ML library that doesn’t make you tensor
  • MLJ – A Julia machine learning framework

Natural Language Processing

  • Topic Models – TopicModels for Julia. [Deprecated]
  • Text Analysis – Julia package for text analysis.
  • Word Tokenizers – Tokenizers for Natural Language Processing in Julia
  • Corpus Loaders – A Julia package providing a variety of loaders for various NLP corpora.
  • Embeddings – Functions and data dependencies for loading various word embeddings
  • Languages – Julia package for working with various human languages
  • WordNet – A Julia package for Princeton’s WordNet

Data Analysis / Data Visualization

  • Graph Layout – Graph layout algorithms in pure Julia.
  • LightGraphs – Graph modeling and analysis.
  • Data Frames Meta – Metaprogramming tools for DataFrames.
  • Julia Data – library for working with tabular data in Julia. [Deprecated]
  • Data Read – Read files from Stata, SAS, and SPSS.
  • Hypothesis Tests – Hypothesis tests for Julia.
  • Gadfly – Crafty statistical graphics for Julia.
  • Stats – Statistical tests for Julia.
  • RDataSets – Julia package for loading many of the data sets available in R.
  • DataFrames – library for working with tabular data in Julia.
  • Distributions – A Julia package for probability distributions and associated functions.
  • Data Arrays – Data structures that allow missing values. [Deprecated]
  • Time Series – Time series toolkit for Julia.
  • Sampling – Basic sampling algorithms for Julia.

Misc Stuff / Presentations

  • DSP – Digital Signal Processing (filtering, periodograms, spectrograms, window functions).
  • JuliaCon Presentations – Presentations for JuliaCon.
  • SignalProcessing – Signal Processing tools for Julia.
  • Images – An image library for Julia.
  • DataDeps – Reproducible data setup for reproducible science.

Lua

General-Purpose Machine Learning

  • Torch7
    • cephes – Cephes mathematical functions library, wrapped for Torch. Provides and wraps the 180+ special mathematical functions from the Cephes mathematical library, developed by Stephen L. Moshier. It is used, among many other places, at the heart of SciPy. [Deprecated]
    • autograd – Autograd automatically differentiates native Torch code. Inspired by the original Python version.
    • graph – Graph package for Torch. [Deprecated]
    • randomkit – Numpy’s randomkit, wrapped for Torch. [Deprecated]
    • signal – A signal processing toolbox for Torch-7. FFT, DCT, Hilbert, cepstrums, stft.
    • nn – Neural Network package for Torch.
    • torchnet – framework for torch which provides a set of abstractions aiming at encouraging code re-use as well as encouraging modular programming.
    • nngraph – This package provides graphical computation for nn library in Torch7.
    • nnx – A completely unstable and experimental package that extends Torch’s builtin nn library.
    • rnn – A Recurrent Neural Network library that extends Torch’s nn. RNNs, LSTMs, GRUs, BRNNs, BLSTMs, etc.
    • dpnn – Many useful features that aren’t part of the main nn package.
    • dp – A deep learning library designed for streamlining research and development using the Torch7 distribution. It emphasizes flexibility through the elegant use of object-oriented design patterns. [Deprecated]
    • optim – An optimization library for Torch. SGD, Adagrad, Conjugate-Gradient, LBFGS, RProp and more.
    • unsup – A package for unsupervised learning in Torch. Provides modules that are compatible with nn (LinearPsd, ConvPsd, AutoEncoder, …), and self-contained algorithms (k-means, PCA). [Deprecated]
    • manifold – A package to manipulate manifolds.
    • svm – Torch-SVM library. [Deprecated]
    • lbfgs – FFI Wrapper for liblbfgs. [Deprecated]
    • vowpalwabbit – An old vowpalwabbit interface to torch. [Deprecated]
    • OpenGM – OpenGM is a C++ library for graphical modeling, and inference. The Lua bindings provide a simple way of describing graphs, from Lua, and then optimizing them with OpenGM. [Deprecated]
    • spaghetti – Spaghetti (sparse linear) module for torch7 by @MichaelMathieu [Deprecated]
    • LuaSHKit – A Lua wrapper around the locality-sensitive hashing library SHKit. [Deprecated]
    • kernel smoothing – KNN, kernel-weighted average, local linear regression smoothers. [Deprecated]
    • cutorch – Torch CUDA Implementation.
    • cunn – Torch CUDA Neural Network Implementation.
    • imgraph – An image/graph library for Torch. This package provides routines to construct graphs on images, segment them, build trees out of them, and convert them back to images. [Deprecated]
    • videograph – A video/graph library for Torch. This package provides routines to construct graphs on videos, segment them, build trees out of them, and convert them back to videos. [Deprecated]
    • saliency – code and tools around integral images. A library for finding interest points based on fast integral histograms. [Deprecated]
    • stitch – allows us to use hugin to stitch images and apply same stitching to a video sequence. [Deprecated]
    • sfm – A bundle adjustment/structure from motion package. [Deprecated]
    • fex – A package for feature extraction in Torch. Provides SIFT and dSIFT modules. [Deprecated]
    • OverFeat – A state-of-the-art generic dense feature extractor. [Deprecated]
    • wav2letter – a simple and efficient end-to-end Automatic Speech Recognition (ASR) system from Facebook AI Research.
  • Numeric Lua
  • Lunatic Python
  • SciLua
  • Lua – Numerical Algorithms [Deprecated]
  • Lunum [Deprecated]

Demos and Scripts

  • Core torch7 demos repository.
    • linear-regression, logistic-regression
    • face detector (training and detection as separate demos)
    • mst-based-segmenter
    • train-a-digit-classifier
    • train-autoencoder
    • optical flow demo
    • train-on-housenumbers
    • train-on-cifar
    • tracking with deep nets
    • kinect demo
    • filter-bank visualization
    • saliency-networks
  • Training a Convnet for the Galaxy-Zoo Kaggle challenge (CUDA demo)
  • Music Tagging – Music Tagging scripts for torch7.
  • torch-datasets – Scripts to load several popular datasets including:
    • BSR 500
    • CIFAR-10
    • COIL
    • Street View House Numbers
    • MNIST
    • NORB
  • Atari2600 – Scripts to generate a dataset with static frames from the Arcade Learning Environment.

Matlab

Computer Vision

  • Contourlets – MATLAB source code that implements the contourlet transform and its utility functions.
  • Shearlets – MATLAB code for shearlet transform.
  • Curvelets – The Curvelet transform is a higher dimensional generalization of the Wavelet transform designed to represent images at different scales and different angles.
  • Bandlets – MATLAB code for bandlet transform.
  • mexopencv – Collection and a development kit of MATLAB mex functions for OpenCV library.

Natural Language Processing

  • NLP – An NLP library for Matlab.

General-Purpose Machine Learning

  • Training a deep autoencoder or a classifier on MNIST digits – Training a deep autoencoder or a classifier on MNIST digits. [DEEP LEARNING]
  • Convolutional-Recursive Deep Learning for 3D Object Classification – Convolutional-Recursive Deep Learning for 3D Object Classification. [DEEP LEARNING]
  • Spider – The spider is intended to be a complete object-oriented environment for machine learning in Matlab.
  • LibSVM – A Library for Support Vector Machines.
  • ThunderSVM – An Open-Source SVM Library on GPUs and CPUs
  • LibLinear – A Library for Large Linear Classification.
  • Machine Learning Module – Class on machine learning with PDFs, lectures, and code.
  • Caffe – A deep learning framework developed with cleanliness, readability, and speed in mind.
  • Pattern Recognition Toolbox – A complete object-oriented environment for machine learning in Matlab.
  • Pattern Recognition and Machine Learning – This package contains the matlab implementation of the algorithms described in the book Pattern Recognition and Machine Learning by C. Bishop.
  • Optunity – A library dedicated to automated hyperparameter optimization with a simple, lightweight API to facilitate drop-in replacement of grid search. Optunity is written in Python but interfaces seamlessly with MATLAB.
  • MXNet – Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Go, Javascript and more.
  • Machine Learning in MatLab/Octave – examples of popular machine learning algorithms (neural networks, linear/logistic regressions, K-Means, etc.) with code examples and mathematics behind them being explained.

Data Analysis / Data Visualization

  • ParaMonte – A general-purpose MATLAB library for Bayesian data analysis and visualization via serial/parallel Monte Carlo and MCMC simulations. Documentation can be found here.
  • matlab_bgl – MatlabBGL is a Matlab package for working with graphs.
  • gaimc – Efficient pure-Matlab implementations of graph algorithms to complement MatlabBGL’s mex functions.

.NET

Computer Vision

  • OpenCVDotNet – A wrapper for the OpenCV project to be used with .NET applications.
  • Emgu CV – Cross platform wrapper of OpenCV which can be compiled in Mono to be run on Windows, Linux, Mac OS X, iOS, and Android.
  • AForge.NET – Open source C# framework for developers and researchers in the fields of Computer Vision and Artificial Intelligence. Development has now shifted to GitHub.
  • Accord.NET – Together with AForge.NET, this library can provide image processing and computer vision algorithms to Windows, Windows RT and Windows Phone. Some components are also available for Java and Android.

Natural Language Processing

  • Stanford.NLP for .NET – A full port of Stanford NLP packages to .NET and also available precompiled as a NuGet package.

General-Purpose Machine Learning

  • Accord-Framework – The Accord.NET Framework is a complete framework for building machine learning, computer vision, computer audition, signal processing and statistical applications.
  • Accord.MachineLearning – Support Vector Machines, Decision Trees, Naive Bayesian models, K-means, Gaussian Mixture models and general algorithms such as Ransac, Cross-validation and Grid-Search for machine-learning applications. This package is part of the Accord.NET Framework.
  • DiffSharp – An automatic differentiation (AD) library providing exact and efficient derivatives (gradients, Hessians, Jacobians, directional derivatives, and matrix-free Hessian- and Jacobian-vector products) for machine learning and optimization applications. Operations can be nested to any level, meaning that you can compute exact higher-order derivatives and differentiate functions that are internally making use of differentiation, for applications such as hyperparameter optimization.
  • Encog – An advanced neural network and machine learning framework. Encog contains classes to create a wide variety of networks, as well as support classes to normalize and process data for these neural networks. Encog trains using multithreaded resilient propagation. Encog can also make use of a GPU to further speed processing time. A GUI based workbench is also provided to help model and train neural networks.
  • GeneticSharp – Multi-platform genetic algorithm library for .NET Core and .NET Framework. The library has several implementations of GA operators, like: selection, crossover, mutation, reinsertion and termination.
  • Infer.NET – Infer.NET is a framework for running Bayesian inference in graphical models. One can use Infer.NET to solve many different kinds of machine learning problems, from standard problems like classification, recommendation or clustering through to customized solutions to domain-specific problems. Infer.NET has been used in a wide variety of domains including information retrieval, bioinformatics, epidemiology, vision, and many others.
  • ML.NET – ML.NET is a cross-platform open-source machine learning framework which makes machine learning accessible to .NET developers. ML.NET was originally developed in Microsoft Research and evolved into a significant framework over the last decade and is used across many product groups in Microsoft like Windows, Bing, PowerPoint, Excel and more.
  • Neural Network Designer – DBMS management system and designer for neural networks. The designer application is developed using WPF, and is a user interface which allows you to design your neural network, query the network, create and configure chat bots that are capable of asking questions and learning from your feedback. The chat bots can even scrape the internet for information to return in their output as well as to use for learning.
  • Synapses – Neural network library in F#.
  • Vulpes – Deep belief and deep learning implementation written in F# and leverages CUDA GPU execution with Alea.cuBase.
  • MxNet.Sharp – .NET Standard bindings for Apache MxNet with Imperative, Symbolic and Gluon Interface for developing, training and deploying Machine Learning models in C#. https://mxnet.tech-quantum.com/

Data Analysis / Data Visualization

  • numl – numl is a machine learning library intended to ease the use of standard modeling techniques for both prediction and clustering.
  • Math.NET Numerics – Numerical foundation of the Math.NET project, aiming to provide methods and algorithms for numerical computations in science, engineering and everyday use. Supports .Net 4.0, .Net 3.5 and Mono on Windows, Linux and Mac; Silverlight 5, WindowsPhone/SL 8, WindowsPhone 8.1 and Windows 8 with PCL Portable Profiles 47 and 344; Android/iOS with Xamarin.
  • Sho – Sho is an interactive environment for data analysis and scientific computing that lets you seamlessly connect scripts (in IronPython) with compiled code (in .NET) to enable fast and flexible prototyping. The environment includes powerful and efficient libraries for linear algebra as well as data visualization that can be used from any .NET language, as well as a feature-rich interactive shell for rapid development.

Objective C

General-Purpose Machine Learning

  • YCML – A Machine Learning framework for Objective-C and Swift (OS X / iOS).
  • MLPNeuralNet – Fast multilayer perceptron neural network library for iOS and Mac OS X. MLPNeuralNet predicts new examples using trained neural networks. It is built on top of Apple’s Accelerate Framework, using vectorized operations and hardware acceleration if available. [Deprecated]
  • MAChineLearning – An Objective-C multilayer perceptron library, with full support for training through backpropagation. Implemented using vDSP and vecLib, it’s 20 times faster than its Java equivalent. Includes sample code for use from Swift.
  • BPN-NeuralNetwork – Implements a three-layer neural network (input layer, hidden layer and output layer), known as a Back Propagation Neural Network (BPN). The network can be used for product recommendation, user behavior analysis, data mining and data analysis. [Deprecated]
  • Multi-Perceptron-NeuralNetwork – Implements a multi-perceptron neural network based on Back Propagation Neural Networks (BPN), designed to support an unlimited number of hidden layers.
  • KRHebbian-Algorithm – An unsupervised, self-learning algorithm (it adjusts the weights) for neural networks in machine learning. [Deprecated]
  • KRKmeans-Algorithm – Implements the K-Means clustering and classification algorithm. It can be used in data mining and image compression. [Deprecated]
  • KRFuzzyCMeans-Algorithm – Implements Fuzzy C-Means (FCM), a fuzzy clustering / classification algorithm. It can be used in data mining and image compression. [Deprecated]

OCaml

General-Purpose Machine Learning

  • Oml – A general statistics and machine learning library.
  • GPR – Efficient Gaussian Process Regression in OCaml.
  • Libra-Tk – Algorithms for learning and inference with discrete probabilistic models.
  • TensorFlow – OCaml bindings for TensorFlow.

Perl

Data Analysis / Data Visualization

General-Purpose Machine Learning

Perl 6

Data Analysis / Data Visualization

General-Purpose Machine Learning

PHP

Natural Language Processing

  • jieba-php – Chinese Words Segmentation Utilities.

General-Purpose Machine Learning

  • PHP-ML – Machine Learning library for PHP. Algorithms, Cross Validation, Neural Network, Preprocessing, Feature Extraction and much more in one library.
  • PredictionBuilder – A library for machine learning that builds predictions using a linear regression.
  • Rubix ML – A high-level machine learning (ML) library that lets you build programs that learn from data using the PHP language.
  • 19 Questions – A machine learning / Bayesian inference engine for assigning attributes to objects.

Python

Computer Vision

  • Scikit-Image – A collection of algorithms for image processing in Python.
  • Jobtensor – A powerful tool for learning Python
  • Scikit-Opt – Swarm Intelligence in Python (Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Algorithm, Immune Algorithm, Artificial Fish Swarm Algorithm).
  • SimpleCV – An open source computer vision framework that gives access to several high-powered computer vision libraries, such as OpenCV. Written in Python and runs on Mac, Windows, and Ubuntu Linux.
  • Vigranumpy – Python bindings for the VIGRA C++ computer vision library.
  • OpenFace – Free and open source face recognition with deep neural networks.
  • PCV – Open source Python module for computer vision. [Deprecated]
  • face_recognition – Face recognition library that recognizes and manipulates faces from Python or from the command line.
  • dockerface – Easy to install and use deep learning Faster R-CNN face detection for images and video in a docker container.
  • Detectron – FAIR’s software system that implements state-of-the-art object detection algorithms, including Mask R-CNN. It is written in Python and powered by the Caffe2 deep learning framework. [Deprecated]
  • detectron2 – FAIR’s next-generation research platform for object detection and segmentation. It is a ground-up rewrite of the previous version, Detectron, and is powered by the PyTorch deep learning framework.
  • albumentations – A fast and framework agnostic image augmentation library that implements a diverse set of augmentation techniques. Supports classification, segmentation, detection out of the box. Was used to win a number of Deep Learning competitions at Kaggle, Topcoder and those that were a part of the CVPR workshops.
  • pytesseract – Python-tesseract is an optical character recognition (OCR) tool for Python. That is, it will recognize and “read” the text embedded in images. Python-tesseract is a wrapper for Google’s Tesseract-OCR Engine.
  • imutils – A library containing convenience functions to make basic image processing operations such as translation, rotation, resizing, skeletonization, and displaying Matplotlib images easier with OpenCV and Python.
  • PyTorchCV – A PyTorch-Based Framework for Deep Learning in Computer Vision.
  • Self-supervised learning
  • neural-style-pt – A PyTorch implementation of Justin Johnson’s neural-style (neural style transfer).
  • Detecto – Train and run a computer vision model with 5-10 lines of code.
  • neural-dream – A PyTorch implementation of DeepDream.
  • Openpose – A real-time multi-person keypoint detection library for body, face, hands, and foot estimation
  • Deep High-Resolution-Net – A PyTorch implementation of CVPR2019 paper “Deep High-Resolution Representation Learning for Human Pose Estimation”
  • dream-creator – A PyTorch implementation of DeepDream. Allows individuals to quickly and easily train their own custom GoogleNet models with custom datasets for DeepDream.
  • Lucent – TensorFlow and OpenAI Clarity’s Lucid adapted for PyTorch.
  • lightly – Lightly is a computer vision framework for self-supervised learning.
  • Learnergy – Energy-based machine learning models built upon PyTorch.

Natural Language Processing

  • pkuseg-python – A better version of Jieba, developed by Peking University.
  • NLTK – A leading platform for building Python programs to work with human language data.
  • Pattern – A web mining module for the Python programming language. It has tools for natural language processing, machine learning, among others.
  • Quepy – A Python framework to transform natural language questions to queries in a database query language.
  • TextBlob – Providing a consistent API for diving into common natural language processing (NLP) tasks. Stands on the giant shoulders of NLTK and Pattern, and plays nicely with both.
  • YAlign – A sentence aligner, a friendly tool for extracting parallel sentences from comparable corpora. [Deprecated]
  • jieba – Chinese Words Segmentation Utilities.
  • SnowNLP – A library for processing Chinese text.
  • spammy – A library for email spam filtering built on top of NLTK.
  • loso – Another Chinese segmentation library. [Deprecated]
  • genius – A Chinese segmentation library based on Conditional Random Fields.
  • KoNLPy – A Python package for Korean natural language processing.
  • nut – Natural language Understanding Toolkit. [Deprecated]
  • Rosetta – Text processing tools and wrappers (e.g. Vowpal Wabbit)
  • BLLIP Parser – Python bindings for the BLLIP Natural Language Parser (also known as the Charniak-Johnson parser). [Deprecated]
  • PyNLPl – Python Natural Language Processing Library. General purpose NLP library for Python. Also contains some specific modules for parsing common NLP formats, most notably for FoLiA, but also ARPA language models, Moses phrasetables, GIZA++ alignments.
  • PySS3 – Python package that implements a novel white-box machine learning model for text classification, called SS3. Since SS3 has the ability to visually explain its rationale, this package also comes with easy-to-use interactive visualization tools (online demos).
  • python-ucto – Python binding to ucto (a unicode-aware rule-based tokenizer for various languages).
  • python-frog – Python binding to Frog, an NLP suite for Dutch. (pos tagging, lemmatisation, dependency parsing, NER)
  • python-zpar – Python bindings for ZPar, a statistical part-of-speech-tagger, constituency parser, and dependency parser for English.
  • colibri-core – Python binding to C++ library for extracting and working with basic linguistic constructions such as n-grams and skipgrams in a quick and memory-efficient way.
  • spaCy – Industrial strength NLP with Python and Cython.
  • PyStanfordDependencies – Python interface for converting Penn Treebank trees to Stanford Dependencies.
  • Distance – Levenshtein and Hamming distance computation. [Deprecated]
  • Fuzzy Wuzzy – Fuzzy String Matching in Python.
  • jellyfish – A Python library for doing approximate and phonetic matching of strings.
  • editdistance – fast implementation of edit distance.
  • textacy – Higher-level NLP built on spaCy.
  • stanford-corenlp-python – Python wrapper for Stanford CoreNLP [Deprecated]
  • CLTK – The Classical Language Toolkit.
  • Rasa – A “machine learning framework to automate text- and voice-based conversations.”
  • yase – Transcode a sentence (or other sequence) to a list of word vectors.
  • Polyglot – Multilingual text (NLP) processing toolkit.
  • DrQA – Reading Wikipedia to answer open-domain questions.
  • Dedupe – A Python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.
  • Snips NLU – Natural Language Understanding library for intent classification and entity extraction
  • NeuroNER – Named-entity recognition using neural networks providing state-of-the-art-results
  • DeepPavlov – conversational AI library with many pre-trained Russian NLP models.
  • BigARTM – topic modelling platform.
  • NALP – A Natural Adversarial Language Processing framework built over Tensorflow.
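
Several entries above (Distance, editdistance, jellyfish, Fuzzy Wuzzy) centre on string distance measures. As a rough illustration of what such libraries compute, here is a minimal pure-Python sketch of Hamming and Levenshtein distance. The function names are illustrative only and are not taken from any of the listed packages, which typically offer faster, more complete implementations.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (row-by-row dynamic programming)."""
    # Keep the shorter string in b so each row is as small as possible.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(hamming("karolin", "kathrin"))
    print(levenshtein("kitten", "sitting"))
```

The sketch keeps only two rows of the dynamic-programming table, so memory grows with the shorter string rather than with the product of both lengths.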

General-Purpose Machine Learning

  • Shapley – A data-driven framework to quantify the value of classifiers in a machine learning ensemble.
  • igel – A delightful machine learning tool that allows you to train/fit, test and use models without writing code.
  • ML Model building – A repository containing classification, clustering, regression and recommender notebooks, with illustrations showing how to build them.
  • ML/DL project template
  • PyTorch Geometric Temporal – A temporal extension of PyTorch Geometric for dynamic graph representation learning.
  • Little Ball of Fur – A graph sampling extension library for NetworkX with a Scikit-Learn like API.
  • Karate Club – An unsupervised machine learning extension library for NetworkX with a Scikit-Learn like API.
  • Auto_ViML – Automatically Build Variant Interpretable ML models fast! Auto_ViML (pronounced “auto vimal”) is a comprehensive and scalable Python AutoML toolkit with imbalanced handling, ensembling, stacking and built-in feature selection. Featured in a Medium article.
  • PyOD – Python Outlier Detection, a comprehensive and scalable Python toolkit for detecting outlying objects in multivariate data. Featured for advanced models, including Neural Networks/Deep Learning and Outlier Ensembles.
  • steppy – Lightweight Python library for fast and reproducible machine learning experimentation. Introduces a very simple interface that enables clean machine learning pipeline design.
  • steppy-toolkit – Curated collection of the neural networks, transformers and models that make your machine learning work faster and more effective.
  • CNTK – Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit. Documentation can be found here.
  • Couler – Unified interface for constructing and managing machine learning workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow.
  • auto_ml – Automated machine learning for production and analytics. Lets you focus on the fun parts of ML, while outputting production-ready code, and detailed analytics of your dataset and results. Includes support for NLP, XGBoost, CatBoost, LightGBM, and soon, deep learning.
  • machine learning – Automated build consisting of a web interface and a set of programmatic-interface APIs for support vector machines. Corresponding dataset(s) are stored in a SQL database; the generated model(s) used for prediction(s) are then stored in a NoSQL datastore.
  • XGBoost – Python bindings for eXtreme Gradient Boosting (Tree) Library.
  • Apache SINGA – An Apache Incubating project for developing an open source machine learning library.
  • Bayesian Methods for Hackers – Book/IPython notebooks on Probabilistic Programming in Python.
  • Featureforge – A set of tools for creating and testing machine learning features, with a scikit-learn compatible API.
  • MLlib in Apache Spark – Distributed machine learning library in Spark
  • Hydrosphere Mist – A service for deploying Apache Spark MLlib machine learning models as realtime, batch or reactive web services.
  • scikit-learn – A Python module for machine learning built on top of SciPy.
  • metric-learn – A Python module for metric learning.
  • SimpleAI – Python implementation of many of the artificial intelligence algorithms described in the book “Artificial Intelligence, a Modern Approach”. It focuses on providing an easy to use, well documented and tested library.
  • astroML – Machine Learning and Data Mining for Astronomy.
  • graphlab-create – A library with various machine learning models (regression, clustering, recommender systems, graph analytics, etc.) implemented on top of a disk-backed DataFrame.
  • BigML – A library that contacts external servers.
  • pattern – Web mining module for Python.
  • NuPIC – Numenta Platform for Intelligent Computing.
  • Pylearn2 – A Machine Learning library based on Theano. [Deprecated]
  • keras – High-level neural networks frontend for TensorFlow, CNTK and Theano.
  • Lasagne – Lightweight library to build and train neural networks in Theano.
  • hebel – GPU-Accelerated Deep Learning Library in Python. [Deprecated]
  • Chainer – Flexible neural network framework.
  • prophet – Fast and automated time series forecasting framework by Facebook.
  • gensim – Topic Modelling for Humans.
  • topik – Topic modelling toolkit. [Deprecated]
  • PyBrain – Another Python Machine Learning Library.
  • Brainstorm – Fast, flexible and fun neural networks. This is the successor of PyBrain.
  • Surprise – A scikit for building and analyzing recommender systems.
  • implicit – Fast Python Collaborative Filtering for Implicit Datasets.
  • LightFM – A Python implementation of a number of popular recommendation algorithms for both implicit and explicit feedback.
  • Crab – A flexible, fast recommender engine. [Deprecated]
  • python-recsys – A Python library for implementing a Recommender System.
  • thinking bayes – Book on Bayesian Analysis.
  • Image-to-Image Translation with Conditional Adversarial Networks – Implementation of image to image (pix2pix) translation from the paper by Isola et al. [DEEP LEARNING]
  • Restricted Boltzmann Machines – Restricted Boltzmann Machines in Python. [DEEP LEARNING]
  • Bolt – Bolt Online Learning Toolbox. [Deprecated]
  • CoverTree – Python implementation of cover trees, near-drop-in replacement for scipy.spatial.kdtree [Deprecated]
  • nilearn – Machine learning for NeuroImaging in Python.
  • neuropredict – Aimed at novice machine learners and non-expert programmers, this package offers easy (no coding needed) and comprehensive machine learning (evaluation and full report of predictive performance WITHOUT requiring you to code) in Python for NeuroImaging and any other type of features. This is aimed at absorbing much of the ML workflow, unlike other packages like nilearn and pymvpa, which require you to learn their API and code to produce anything useful.
  • imbalanced-learn – Python module to perform undersampling and oversampling with various techniques.
  • Shogun – The Shogun Machine Learning Toolbox.
  • Pyevolve – Genetic algorithm framework. [Deprecated]
  • Caffe – A deep learning framework developed with cleanliness, readability, and speed in mind.
  • breze – Theano based library for deep and recurrent neural networks.
  • Cortex – Open source platform for deploying machine learning models in production.
  • pyhsmm – library for approximate unsupervised inference in Bayesian Hidden Markov Models (HMMs) and explicit-duration Hidden semi-Markov Models (HSMMs), focusing on the Bayesian Nonparametric extensions, the HDP-HMM and HDP-HSMM, mostly with weak-limit approximations.
  • SKLL – A wrapper around scikit-learn that makes it simpler to conduct experiments.
  • neurolab
  • Spearmint – Spearmint is a package to perform Bayesian optimization according to the algorithms outlined in the paper: Practical Bayesian Optimization of Machine Learning Algorithms. Jasper Snoek, Hugo Larochelle and Ryan P. Adams. Advances in Neural Information Processing Systems, 2012. [Deprecated]
  • Pebl – Python Environment for Bayesian Learning. [Deprecated]
  • Theano – Optimizing, GPU-meta-programming, code-generating, array-oriented math compiler in Python.
  • TensorFlow – Open source software library for numerical computation using data flow graphs.
  • pomegranate – Hidden Markov Models for Python, implemented in Cython for speed and efficiency.
  • python-timbl – A Python extension module wrapping the full TiMBL C++ programming interface. Timbl is an elaborate k-Nearest Neighbours machine learning toolkit.
  • deap – Evolutionary algorithm framework.
  • pydeep – Deep Learning In Python. [Deprecated]
  • mlxtend – A library consisting of useful tools for data science and machine learning tasks.
  • neon – Nervana’s high-performance Python-based Deep Learning framework [DEEP LEARNING]. [Deprecated]
  • Optunity – A library dedicated to automated hyperparameter optimization with a simple, lightweight API to facilitate drop-in replacement of grid search.
  • Neural Networks and Deep Learning – Code samples for my book “Neural Networks and Deep Learning” [DEEP LEARNING].
  • Annoy – Approximate nearest neighbours implementation.
  • TPOT – Tool that automatically creates and optimizes machine learning pipelines using genetic programming. Consider it your personal data science assistant, automating a tedious part of machine learning.
  • pgmpy – A Python library for working with Probabilistic Graphical Models.
  • DIGITS – The Deep Learning GPU Training System (DIGITS) is a web application for training deep learning models.
  • Orange – Open source data visualization and data analysis for novices and experts.
  • MXNet – Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Go, Javascript and more.
  • milk – Machine learning toolkit focused on supervised classification. [Deprecated]
  • TFLearn – Deep learning library featuring a higher-level API for TensorFlow.
  • REP – an IPython-based environment for conducting data-driven research in a consistent and reproducible way. REP is not trying to substitute scikit-learn, but extends it and provides better user experience. [Deprecated]
  • rgf_python – Python bindings for Regularized Greedy Forest (Tree) Library.
  • skbayes – Python package for Bayesian Machine Learning with scikit-learn API.
  • fuku-ml – Simple machine learning library, including Perceptron, Regression, Support Vector Machine, Decision Tree and more. It is easy to use and easy to learn for beginners.
  • Xcessiv – A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling.
  • PyTorch – Tensors and Dynamic neural networks in Python with strong GPU acceleration
  • PyTorch Lightning – The lightweight PyTorch wrapper for high-performance AI research.
  • PyTorch Lightning Bolts – Toolbox of models, callbacks, and datasets for AI/ML researchers.
  • skorch – A scikit-learn compatible neural network library that wraps PyTorch.
  • ML-From-Scratch – Implementations of Machine Learning models from scratch in Python with a focus on transparency. Aims to showcase the nuts and bolts of ML in an accessible way.
  • Edward – A library for probabilistic modeling, inference, and criticism. Built on top of TensorFlow.
  • xRBM – A library for Restricted Boltzmann Machine (RBM) and its conditional variants in TensorFlow.
  • CatBoost – General purpose gradient boosting on decision trees library with categorical features support out of the box. It is easy to install, well documented and supports CPU and GPU (even multi-GPU) computation.
  • stacked_generalization – Implementation of machine learning stacking technique as a handy library in Python.
  • modAL – A modular active learning framework for Python, built on top of scikit-learn.
  • Cogitare – A modern, fast, and modular deep learning and machine learning framework for Python.
  • Parris – Parris, the automated infrastructure setup tool for machine learning algorithms.
  • neonrvm – neonrvm is an open source machine learning library based on RVM technique. It’s written in C programming language and comes with Python programming language bindings.
  • Turi Create – Machine learning from Apple. Turi Create simplifies the development of custom machine learning models. You don’t have to be a machine learning expert to add recommendations, object detection, image classification, image similarity or activity classification to your app.
  • xLearn – A high performance, easy-to-use, and scalable machine learning package, which can be used to solve large-scale machine learning problems. xLearn is especially useful for solving machine learning problems on large-scale sparse data, which is very common in Internet services such as online advertisement and recommender systems.
  • mlens – A high performance, memory efficient, maximally parallelized ensemble learning, integrated with scikit-learn.
  • Netron – Visualizer for machine learning models.
  • Thampi – Machine Learning Prediction System on AWS Lambda
  • MindsDB – Open Source framework to streamline use of neural networks.
  • Microsoft Recommenders – Examples and best practices for building recommendation systems, provided as Jupyter notebooks. The repo contains some of the latest state of the art algorithms from Microsoft Research as well as from other companies and institutions.
  • StellarGraph – Machine Learning on Graphs, a Python library for machine learning on graph-structured (network-structured) data.
  • BentoML – Toolkit for packaging and deploying machine learning models for serving in production.
  • MiraiML – An asynchronous engine for continuous & autonomous machine learning, built for real-time usage.
  • numpy-ML – Reference implementations of ML models written in numpy.
  • creme – A framework for online machine learning.
  • Neuraxle – A framework providing the right abstractions to ease research, development, and deployment of your ML pipelines.
  • Cornac – A comparative framework for multimodal recommender systems with a focus on models leveraging auxiliary data.
  • JAX – JAX is Autograd and XLA, brought together for high-performance machine learning research.
  • Catalyst – High-level utils for PyTorch DL & RL research. It was developed with a focus on reproducibility, fast experimentation, and code/idea reuse, so that you can research and develop something new rather than write another regular train loop.
  • Fastai – High-level wrapper built on top of PyTorch which supports vision, text, tabular data and collaborative filtering.
  • scikit-multiflow – A machine learning framework for multi-output/multi-label and stream data.
  • Lightwood – A PyTorch-based framework that breaks down machine learning problems into smaller blocks that can be glued together seamlessly, with the objective of building predictive models with one line of code.
  • bayeso – A simple, but essential Bayesian optimization package, written in Python.
  • mljar-supervised – An Automated Machine Learning (AutoML) python package for tabular data. It can handle: Binary Classification, MultiClass Classification and Regression. It provides explanations and markdown reports.
  • evostra – A fast Evolution Strategy implementation in Python.
  • Determined – Scalable deep learning training platform, including integrated support for distributed training, hyperparameter tuning, experiment tracking, and model management.
  • PySyft – A Python library for secure and private Deep Learning built on PyTorch and TensorFlow.
  • PyGrid – Peer-to-peer network of data owners and data scientists who can collectively train AI models using PySyft
  • sktime – A unified framework for machine learning with time series
  • OPFython – A Python-inspired implementation of the Optimum-Path Forest classifier.
  • Opytimizer – Python-based meta-heuristic optimization techniques.
  • Gradio – A Python library for quickly creating and sharing demos of models. Debug models interactively in your browser, get feedback from collaborators, and generate public links without deploying anything.
  • Hub – Fastest unstructured dataset management for TensorFlow/PyTorch. Stream & version-control data. Store even petabyte-scale data in a single numpy-like array on the cloud accessible on any machine. Visit activeloop.ai for more info.
  • Synthia – Multidimensional synthetic data generation in Python.
  • ByteHub – An easy-to-use, Python-based feature store. Optimized for time-series data.

Data Analysis / Data Visualization

  • DataVisualization – A GitHub repository where you can learn data visualization, from basics to intermediate level.
  • Cartopy – Cartopy is a Python package designed for geospatial data processing in order to produce maps and other geospatial data analyses.
  • SciPy – A Python-based ecosystem of open-source software for mathematics, science, and engineering.
  • NumPy – A fundamental package for scientific computing with Python.
  • AutoViz – AutoViz performs automatic visualization of any dataset with a single line of Python code. Give it any input file (CSV, txt or json) of any size and AutoViz will visualize it. See Medium article.
  • Numba – Python JIT (just in time) compiler to LLVM aimed at scientific Python by the developers of Cython and NumPy.
  • Mars – A tensor-based framework for large-scale data computation which is often regarded as a parallel and distributed version of NumPy.
  • NetworkX – A high-productivity software for complex networks.
  • igraph – binding to igraph library – General purpose graph library.
  • Pandas – A library providing high-performance, easy-to-use data structures and data analysis tools.
  • ParaMonte – A general-purpose Python library for Bayesian data analysis and visualization via serial/parallel Monte Carlo and MCMC simulations. Documentation can be found here.
  • Open Mining – Business Intelligence (BI) in Python (Pandas web interface) [Deprecated]
  • PyMC – Markov Chain Monte Carlo sampling toolkit.
  • zipline – A Pythonic algorithmic trading library.
  • PyDy – Short for Python Dynamics, used to assist with workflow in the modeling of dynamic motion based around NumPy, SciPy, IPython, and matplotlib.
  • SymPy – A Python library for symbolic mathematics.
  • statsmodels – Statistical modeling and econometrics in Python.
  • astropy – A community Python library for Astronomy.
  • matplotlib – A Python 2D plotting library.
  • bokeh – Interactive Web Plotting for Python.
  • plotly – Collaborative web plotting for Python and matplotlib.
  • altair – A Python to Vega translator.
  • d3py – A plotting library for Python, based on D3.js.
  • PyDexter – Simple plotting for Python. Wrapper for D3xterjs; easily render charts in-browser.
  • ggplot – Same API as ggplot2 for R. [Deprecated]
  • ggfortify – Unified interface to ggplot2 popular R packages.
  • Kartograph.py – Rendering beautiful SVG maps in Python.
  • pygal – A Python SVG Charts Creator.
  • PyQtGraph – A pure-python graphics and GUI library built on PyQt4 / PySide and NumPy.
  • pycascading [Deprecated]
  • Petrel – Tools for writing, submitting, debugging, and monitoring Storm topologies in pure Python.
  • Blaze – NumPy and Pandas interface to Big Data.
  • emcee – The Python ensemble sampling toolkit for affine-invariant MCMC.
  • windML – A Python Framework for Wind Energy Analysis and Prediction.
  • vispy – GPU-based high-performance interactive OpenGL 2D/3D data visualization library.
  • cerebro2 – A web-based visualization and debugging platform for NuPIC. [Deprecated]
  • NuPIC Studio – An all-in-one NuPIC Hierarchical Temporal Memory visualization and debugging super-tool! [Deprecated]
  • SparklingPandas – Pandas on PySpark (POPS).
  • Seaborn – A python visualization library based on matplotlib.
  • bqplot – An API for plotting in Jupyter (IPython).
  • pastalog – Simple, realtime visualization of neural network training performance.
  • Superset – A data exploration platform designed to be visual, intuitive, and interactive.
  • Dora – Tools for exploratory data analysis in Python.
  • Ruffus – Computation Pipeline library for python.
  • SOMPY – Self Organizing Map written in Python (Uses neural networks for data analysis).
  • somoclu – Massively parallel self-organizing maps: accelerates training on multicore CPUs, GPUs, and clusters; has a Python API.
  • HDBScan – Implementation of the HDBSCAN algorithm in Python, used for clustering.
  • visualize_ML – A python package for data exploration and data analysis. [Deprecated]
  • scikit-plot – A visualization library for quick and easy generation of common plots in data analysis and machine learning.
  • Bowtie – A dashboard library for interactive visualizations using flask socketio and react.
  • lime – Lime is about explaining what machine learning classifiers (or models) are doing. It is able to explain any black box classifier, with two or more classes.
  • PyCM – PyCM is a multi-class confusion matrix library written in Python that supports both input data vectors and direct matrix input. It is a tool for post-classification model evaluation that supports most class-level and overall statistics parameters.
  • Dash – A framework for creating analytical web applications built on top of Plotly.js, React, and Flask
  • Lambdo – A workflow engine for solving machine learning problems by combining in one analysis pipeline (i) feature engineering and machine learning (ii) model training and prediction (iii) table population and column evaluation via user-defined (Python) functions.
  • TensorWatch – Debugging and visualization tool for machine learning and data science. It extensively leverages Jupyter Notebook to show real-time visualizations of data in running processes such as machine learning training.
  • dowel – A little logger for machine learning research. Output any object to the terminal, CSV, TensorBoard, text logs on disk, and more with just one call to logger.log().
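
As a rough illustration of what a multi-class confusion matrix built from two label vectors looks like (the structure that a library such as PyCM, listed above, is organized around), here is a minimal standard-library sketch. The function names `confusion_matrix` and `overall_accuracy` and the sample labels are hypothetical, chosen for this example only; they are not PyCM's API.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Return {actual_label: {predicted_label: count}} for two label vectors."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Use every label seen in either vector so the matrix is square.
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    return {a: {p: counts[(a, p)] for p in labels} for a in labels}


def overall_accuracy(matrix):
    """Fraction of samples on the diagonal (predicted label == actual label)."""
    correct = sum(matrix[label][label] for label in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total if total else 0.0


# Hypothetical sample data, for illustration only.
actual = ["cat", "dog", "cat", "bird", "dog", "cat"]
predicted = ["cat", "cat", "cat", "bird", "dog", "dog"]

matrix = confusion_matrix(actual, predicted)
for label, row in matrix.items():
    print(label, row)
print("accuracy:", overall_accuracy(matrix))
```

Rows correspond to actual classes and columns to predicted classes. Per-class statistics such as precision and recall can be derived from the same nested dictionary.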

Misc Scripts / iPython Notebooks / Codebases

Neural Networks

  • nn_builder – nn_builder is a python package that lets you build neural networks in 1 line
  • NeuralTalk – NeuralTalk is a Python+numpy project for learning Multimodal Recurrent Neural Networks that describe images with sentences.
  • Neuron – Neuron is a simple class for time series predictions. It utilizes LNU (Linear Neural Unit), QNU (Quadratic Neural Unit), RBF (Radial Basis Function), MLP (Multi Layer Perceptron), and MLP-ELM (Multi Layer Perceptron – Extreme Learning Machine) neural networks trained with gradient descent or the Levenberg–Marquardt algorithm.
  • NeuralTalk – NeuralTalk is a Python+numpy project for learning Multimodal Recurrent Neural Networks that describe images with sentences. [Deprecated]
  • Neuron – Neuron is a simple class for time series predictions. It utilizes LNU (Linear Neural Unit), QNU (Quadratic Neural Unit), RBF (Radial Basis Function), MLP (Multi Layer Perceptron), and MLP-ELM (Multi Layer Perceptron – Extreme Learning Machine) neural networks trained with gradient descent or the Levenberg–Marquardt algorithm. [Deprecated]
  • Data Driven Code – Very simple implementation of neural networks for dummies in python without using any libraries, with detailed comments.
  • Machine Learning, Data Science and Deep Learning with Python – LiveVideo course that covers machine learning, TensorFlow, artificial intelligence, and neural networks.
  • TResNet: High Performance GPU-Dedicated Architecture – TResNet models were designed and optimized to give the best speed-accuracy tradeoff out there on GPUs.
  • TResNet: Simple and powerful neural network library for python – Variety of supported types of Artificial Neural Network and learning algorithms.
  • Jina AI – An easier way to build neural search in the cloud. Compatible with Jupyter Notebooks.
  • sequitur – PyTorch library for creating and training sequence autoencoders in just two lines of code.

Kaggle Competition Source Code

Reinforcement Learning

  • DeepMind Lab – DeepMind Lab is a 3D learning environment based on id Software’s Quake III Arena via ioquake3 and other open source software. Its primary purpose is to act as a testbed for research in artificial intelligence, especially deep reinforcement learning.
  • Gym – OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms.
  • Serpent.AI – Serpent.AI is a game agent framework that allows you to turn any video game you own into a sandbox to develop AI and machine learning experiments. For both researchers and hobbyists.
  • ViZDoom – ViZDoom allows developing AI bots that play Doom using only the visual information (the screen buffer). It is primarily intended for research in machine visual learning, and deep reinforcement learning, in particular.
  • Roboschool – Open-source software for robot simulation, integrated with OpenAI Gym.
  • Retro – Retro Games in Gym
  • SLM Lab – Modular Deep Reinforcement Learning framework in PyTorch.
  • Coach – Reinforcement Learning Coach by Intel® AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms
  • garage – A toolkit for reproducible reinforcement learning research
  • metaworld – An open source robotics benchmark for meta- and multi-task reinforcement learning
  • acme – An open source distributed framework for reinforcement learning that makes it easy to build and train your agents.
  • Spinning Up – An educational resource designed to let anyone learn to become a skilled practitioner in deep reinforcement learning

Ruby

Natural Language Processing

  • Awesome NLP with Ruby – Curated link list for practical natural language processing in Ruby.
  • Treat – Text REtrieval and Annotation Toolkit, definitely the most comprehensive toolkit I’ve encountered so far for Ruby.
  • Stemmer – Expose libstemmer_c to Ruby. [Deprecated]
  • Raspell – raspell is an interface binding for ruby. [Deprecated]
  • UEA Stemmer – Ruby port of UEALite Stemmer – a conservative stemmer for search and indexing.
  • Twitter-text-rb – A library that does auto linking and extraction of usernames, lists and hashtags in tweets.

General-Purpose Machine Learning

  • Awesome Machine Learning with Ruby – Curated list of ML related resources for Ruby.
  • Ruby Machine Learning – Some Machine Learning algorithms, implemented in Ruby. [Deprecated]
  • Machine Learning Ruby [Deprecated]
  • jRuby Mahout – JRuby Mahout is a gem that unleashes the power of Apache Mahout in the world of JRuby. [Deprecated]
  • CardMagic-Classifier – A general classifier module to allow Bayesian and other types of classifications.
  • rb-libsvm – Ruby language bindings for LIBSVM which is a Library for Support Vector Machines.
  • Scoruby – Creates Random Forest classifiers from PMML files.
  • rumale – Rumale is a machine learning library in Ruby

Data Analysis / Data Visualization

  • rsruby – Ruby – R bridge.
  • data-visualization-ruby – Source code and supporting content for my Ruby Manor presentation on Data Visualisation with Ruby. [Deprecated]
  • ruby-plot – gnuplot wrapper for Ruby, especially for plotting ROC curves into SVG files. [Deprecated]
  • plot-rb – A plotting library in Ruby built on top of Vega and D3. [Deprecated]
  • scruffy – A beautiful graphing toolkit for Ruby.
  • SciRuby
  • Glean – A data management tool for humans. [Deprecated]
  • Bioruby
  • Arel [Deprecated]

Misc

Rust

General-Purpose Machine Learning

  • deeplearn-rs – deeplearn-rs provides simple networks that use matrix multiplication, addition, and ReLU under the MIT license.
  • rustlearn – a machine learning framework featuring logistic regression, support vector machines, decision trees and random forests.
  • rusty-machine – a pure-rust machine learning library.
  • leaf – open source framework for machine intelligence, sharing concepts from TensorFlow and Caffe. Available under the MIT license. [Deprecated]
  • RustNN – RustNN is a feedforward neural network library. [Deprecated]
  • RusticSOM – A Rust library for Self Organising Maps (SOM).

R

General-Purpose Machine Learning

  • ahaz – ahaz: Regularization for semiparametric additive hazards regression. [Deprecated]
  • arules – arules: Mining Association Rules and Frequent Itemsets
  • biglasso – biglasso: Extending Lasso Model Fitting to Big Data in R.
  • bmrm – bmrm: Bundle Methods for Regularized Risk Minimization Package.
  • Boruta – Boruta: A wrapper algorithm for all-relevant feature selection.
  • bst – bst: Gradient Boosting.
  • C50 – C50: C5.0 Decision Trees and Rule-Based Models.
  • caret – Classification and Regression Training: Unified interface to ~150 ML algorithms in R.
  • caretEnsemble – caretEnsemble: Framework for fitting multiple caret models as well as creating ensembles of such models. [Deprecated]
  • CatBoost – General purpose gradient boosting on decision trees library with categorical features support out of the box for R.
  • Clever Algorithms For Machine Learning
  • CORElearn – CORElearn: Classification, regression, feature evaluation and ordinal evaluation.
  • CoxBoost – CoxBoost: Cox models by likelihood based boosting for a single survival endpoint or competing risks [Deprecated]
  • Cubist – Cubist: Rule- and Instance-Based Regression Modeling.
  • e1071 – e1071: Misc Functions of the Department of Statistics (e1071), TU Wien
  • earth – earth: Multivariate Adaptive Regression Spline Models
  • elasticnet – elasticnet: Elastic-Net for Sparse Estimation and Sparse PCA.
  • ElemStatLearn – ElemStatLearn: Data sets, functions and examples from the book: “The Elements of Statistical Learning, Data Mining, Inference, and Prediction” by Trevor Hastie, Robert Tibshirani and Jerome Friedman.
  • evtree – evtree: Evolutionary Learning of Globally Optimal Trees.
  • forecast – forecast: Timeseries forecasting using ARIMA, ETS, STLM, TBATS, and neural network models.
  • forecastHybrid – forecastHybrid: Automatic ensemble and cross validation of ARIMA, ETS, STLM, TBATS, and neural network models from the “forecast” package.
  • fpc – fpc: Flexible procedures for clustering.
  • frbs – frbs: Fuzzy Rule-based Systems for Classification and Regression Tasks. [Deprecated]
  • GAMBoost – GAMBoost: Generalized linear and additive models by likelihood based boosting. [Deprecated]
  • gamboostLSS – gamboostLSS: Boosting Methods for GAMLSS.
  • gbm – gbm: Generalized Boosted Regression Models.
  • glmnet – glmnet: Lasso and elastic-net regularized generalized linear models.
  • glmpath – glmpath: L1 Regularization Path for Generalized Linear Models and Cox Proportional Hazards Model.
  • GMMBoost – GMMBoost: Likelihood-based Boosting for Generalized mixed models. [Deprecated]
  • grplasso – grplasso: Fitting user specified models with Group Lasso penalty.
  • grpreg – grpreg: Regularization paths for regression models with grouped covariates.
  • h2o – A framework for fast, parallel, and distributed machine learning algorithms at scale — Deeplearning, Random forests, GBM, KMeans, PCA, GLM.
  • hda – hda: Heteroscedastic Discriminant Analysis. [Deprecated]
  • Introduction to Statistical Learning
  • ipred – ipred: Improved Predictors.
  • kernlab – kernlab: Kernel-based Machine Learning Lab.
  • klaR – klaR: Classification and visualization.
  • L0Learn – L0Learn: Fast algorithms for best subset selection.
  • lars – lars: Least Angle Regression, Lasso and Forward Stagewise. [Deprecated]
  • lasso2 – lasso2: L1 constrained estimation aka ‘lasso’.
  • LiblineaR – LiblineaR: Linear Predictive Models Based On The Liblinear C/C++ Library.
  • LogicReg – LogicReg: Logic Regression.
  • Machine Learning For Hackers
  • maptree – maptree: Mapping, pruning, and graphing tree models. [Deprecated]
  • mboost – mboost: Model-Based Boosting.
  • medley – medley: Blending regression models, using a greedy stepwise approach.
  • mlr – mlr: Machine Learning in R.
  • ncvreg – ncvreg: Regularization paths for SCAD- and MCP-penalized regression models.
  • nnet – nnet: Feed-forward Neural Networks and Multinomial Log-Linear Models. [Deprecated]
  • pamr – pamr: Pam: prediction analysis for microarrays. [Deprecated]
  • party – party: A Laboratory for Recursive Partitioning
  • partykit – partykit: A Toolkit for Recursive Partitioning.
  • penalized – penalized: L1 (lasso and fused lasso) and L2 (ridge) penalized estimation in GLMs and in the Cox model.
  • penalizedLDA – penalizedLDA: Penalized classification using Fisher’s linear discriminant. [Deprecated]
  • penalizedSVM – penalizedSVM: Feature Selection SVM using penalty functions.
  • quantregForest – quantregForest: Quantile Regression Forests.
  • randomForest – randomForest: Breiman and Cutler’s random forests for classification and regression.
  • randomForestSRC – randomForestSRC: Random Forests for Survival, Regression and Classification (RF-SRC).
  • rattle – rattle: Graphical user interface for data mining in R.
  • rda – rda: Shrunken Centroids Regularized Discriminant Analysis.
  • rdetools – rdetools: Relevant Dimension Estimation (RDE) in Feature Spaces. [Deprecated]
  • REEMtree – REEMtree: Regression Trees with Random Effects for Longitudinal (Panel) Data. [Deprecated]
  • relaxo – relaxo: Relaxed Lasso. [Deprecated]
  • rgenoud – rgenoud: R version of GENetic Optimization Using Derivatives
  • Rmalschains – Rmalschains: Continuous Optimization using Memetic Algorithms with Local Search Chains (MA-LS-Chains) in R.
  • rminer – rminer: Simpler use of data mining methods (e.g. NN and SVM) in classification and regression. [Deprecated]
  • ROCR – ROCR: Visualizing the performance of scoring classifiers. [Deprecated]
  • RoughSets – RoughSets: Data Analysis Using Rough Set and Fuzzy Rough Set Theories. [Deprecated]
  • rpart – rpart: Recursive Partitioning and Regression Trees.
  • RPMM – RPMM: Recursively Partitioned Mixture Model.
  • RSNNS – RSNNS: Neural Networks in R using the Stuttgart Neural Network Simulator (SNNS).
  • RWeka – RWeka: R/Weka interface.
  • RXshrink – RXshrink: Maximum Likelihood Shrinkage via Generalized Ridge or Least Angle Regression.
  • sda – sda: Shrinkage Discriminant Analysis and CAT Score Variable Selection. [Deprecated]
  • spectralGraphTopology – spectralGraphTopology: Learning Graphs from Data via Spectral Constraints.
  • SuperLearner – Multi-algorithm ensemble learning packages.
  • svmpath – svmpath: The SVM Path algorithm. [Deprecated]
  • tgp – tgp: Bayesian treed Gaussian process models. [Deprecated]
  • tree – tree: Classification and regression trees.
  • varSelRF – varSelRF: Variable selection using random forests.
  • XGBoost.R – R binding for eXtreme Gradient Boosting (Tree) Library.
  • Optunity – A library dedicated to automated hyperparameter optimization with a simple, lightweight API to facilitate drop-in replacement of grid search. Optunity is written in Python but interfaces seamlessly to R.
  • igraph – binding to igraph library – General purpose graph library.
  • MXNet – Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Go, Javascript and more.
  • TDSP-Utilities – Two data science utilities in R from Microsoft: 1) Interactive Data Exploration, Analysis, and Reporting (IDEAR) ; 2) Automated Modeling and Reporting (AMR).

Data Manipulation | Data Analysis | Data Visualization

  • dplyr – A data manipulation package that helps to solve the most common data manipulation problems.
  • ggplot2 – A data visualization package based on the grammar of graphics.
  • tmap for visualizing geospatial data with static maps and leaflet for interactive maps
  • tm and quanteda are the main packages for managing, analyzing, and visualizing textual data.
  • shiny is the basis for truly interactive displays and dashboards in R. However, some measure of interactivity can be achieved with htmlwidgets bringing javascript libraries to R. These include plotly, dygraphs, highcharter, and several others.

SAS

General-Purpose Machine Learning

  • Visual Data Mining and Machine Learning – Interactive, automated, and programmatic modeling with the latest machine learning algorithms in an end-to-end analytics environment, from data prep to deployment. Free trial available.
  • Enterprise Miner – Data mining and machine learning that creates deployable models using a GUI or code.
  • Factory Miner – Automatically creates deployable machine learning models across numerous market or customer segments using a GUI.

Data Analysis / Data Visualization

  • SAS/STAT – For conducting advanced statistical analysis.
  • University Edition – FREE! Includes all SAS packages necessary for data analysis and visualization, and includes online SAS courses.

Natural Language Processing

Demos and Scripts

  • ML_Tables – Concise cheat sheets containing machine learning best practices.
  • enlighten-apply – Example code and materials that illustrate applications of SAS machine learning techniques.
  • enlighten-integration – Example code and materials that illustrate techniques for integrating SAS with other analytics technologies in Java, PMML, Python and R.
  • enlighten-deep – Example code and materials that illustrate using neural networks with several hidden layers in SAS.
  • dm-flow – Library of SAS Enterprise Miner process flow diagrams to help you learn by example about specific data mining topics.

Scala

Natural Language Processing

  • ScalaNLP – ScalaNLP is a suite of machine learning and numerical computing libraries.
  • Breeze – Breeze is a numerical processing library for Scala.
  • Chalk – Chalk is a natural language processing library. [Deprecated]
  • FACTORIE – FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference.
  • Montague – Montague is a semantic parsing library for Scala with an easy-to-use DSL.
  • Spark NLP – Natural language processing library built on top of Apache Spark ML to provide simple, performant, and accurate NLP annotations for machine learning pipelines, that scale easily in a distributed environment.

Data Analysis / Data Visualization

  • MLlib in Apache Spark – Distributed machine learning library in Spark
  • Hydrosphere Mist – A service for deploying Apache Spark MLlib machine learning models as real-time, batch, or reactive web services.
  • Scalding – A Scala API for Cascading.
  • Summing Bird – Streaming MapReduce with Scalding and Storm.
  • Algebird – Abstract Algebra for Scala.
  • xerial – Data management utilities for Scala. [Deprecated]
  • PredictionIO – PredictionIO, a machine learning server for software developers and data engineers.
  • BIDMat – CPU and GPU-accelerated matrix library intended to support large-scale exploratory data analysis.
  • Flink – Open source platform for distributed stream and batch data processing.
  • Spark Notebook – Interactive and Reactive Data Science using Scala and Spark.

General-Purpose Machine Learning

  • DeepLearning.scala – Creating statically typed dynamic neural networks from object-oriented & functional programming constructs.
  • Conjecture – Scalable Machine Learning in Scalding.
  • brushfire – Distributed decision tree ensemble learning in Scala.
  • ganitha – Scalding powered machine learning. [Deprecated]
  • adam – A genomics processing engine and specialized file format built using Apache Avro, Apache Spark and Parquet. Apache 2 licensed.
  • bioscala – Bioinformatics for the Scala programming language
  • BIDMach – CPU and GPU-accelerated Machine Learning Library.
  • Figaro – a Scala library for constructing probabilistic models.
  • H2O Sparkling Water – H2O and Spark interoperability.
  • FlinkML in Apache Flink – Distributed machine learning library in Flink.
  • DynaML – Scala Library/REPL for Machine Learning Research.
  • Saul – Flexible Declarative Learning-Based Programming.
  • SwiftLearner – Simply written algorithms to help study ML or write your own implementations.
  • Smile – Statistical Machine Intelligence and Learning Engine.
  • doddle-model – An in-memory machine learning library built on top of Breeze. It provides immutable objects and exposes its functionality through a scikit-learn-like API.
  • TensorFlow Scala – Strongly-typed Scala API for TensorFlow.

Scheme

Neural Networks

Swift

General-Purpose Machine Learning

  • Bender – Fast Neural Networks framework built on top of Metal. Supports TensorFlow models.
  • Swift AI – Highly optimized artificial intelligence and machine learning library written in Swift.
  • Swift for Tensorflow – a next-generation platform for machine learning, incorporating the latest research across machine learning, compilers, differentiable programming, systems design, and beyond.
  • BrainCore – The iOS and OS X neural network framework.
  • swix – A bare bones library that includes a general matrix language and wraps some OpenCV for iOS development. [Deprecated]
  • AIToolbox – A toolbox framework of AI modules written in Swift: Graphs/Trees, Linear Regression, Support Vector Machines, Neural Networks, PCA, KMeans, Genetic Algorithms, MDP, Mixture of Gaussians.
  • MLKit – A simple Machine Learning Framework written in Swift. Currently features Simple Linear Regression, Polynomial Regression, and Ridge Regression.
  • Swift Brain – The first neural network / machine learning library written in Swift. This is a project for AI algorithms in Swift for iOS and OS X development. This project includes algorithms focused on Bayes theorem, neural networks, SVMs, Matrices, etc…
  • Perfect TensorFlow – Swift Language Bindings of TensorFlow. Using native TensorFlow models on both macOS / Linux.
  • PredictionBuilder – A library for machine learning that builds predictions using a linear regression.
  • Awesome CoreML – A curated list of pretrained CoreML models.
  • Awesome Core ML Models – A curated list of machine learning models in CoreML format.

TensorFlow

General-Purpose Machine Learning

  • Awesome TensorFlow – A list of all things related to TensorFlow.
  • Golden TensorFlow – A page of content on TensorFlow, including academic papers and links to related topics.

Tools

Neural Networks

  • layer – Neural network inference from the command line

Misc

  • Pinecone – Vector database for applications that require real-time, scalable vector embedding and similarity search.
  • CatalyzeX – Browser extension (Chrome and Firefox) that automatically finds and shows code implementations for machine learning papers anywhere: Google, Twitter, Arxiv, Scholar, etc.
  • ML Workspace – All-in-one web-based IDE for machine learning and data science. The workspace is deployed as a docker container and is preloaded with a variety of popular data science libraries (e.g., Tensorflow, PyTorch) and dev tools (e.g., Jupyter, VS Code).
  • Notebooks – A starter kit for Jupyter notebooks and machine learning. Companion docker images consist of all combinations of python versions, machine learning frameworks (Keras, PyTorch and Tensorflow) and CPU/CUDA versions.
  • DVC – Data Science Version Control is an open-source version control system for machine learning projects with pipelines support. It makes ML projects reproducible and shareable.
  • Kedro – Kedro is a data and development workflow framework that implements best practices for data pipelines with an eye towards productionizing machine learning models.
  • guild.ai – Tool to log, analyze, compare and “optimize” experiments. It’s cross-platform and framework independent, and provides integrated visualizers such as TensorBoard.
  • Sacred – Python tool to help you configure, organize, log and reproduce experiments. Like a notebook lab in the context of Chemistry/Biology. The community has built multiple add-ons leveraging the proposed standard.
  • MLFlow – Platform to manage the ML lifecycle, including experimentation, reproducibility and deployment. It is framework and language agnostic; take a look at all the built-in integrations.
  • Weights & Biases – Machine learning experiment tracking, dataset versioning, hyperparameter search, visualization, and collaboration
  • More tools to improve the ML lifecycle: Catalyst, PachydermIO. The following are GitHub-like and target teams: Weights & Biases, Neptune.Ml, Comet.ml, Valohai.ai, DAGsHub.
  • MachineLearningWithTensorFlow2ed – A book on general-purpose machine learning techniques: regression, classification, unsupervised clustering, reinforcement learning, auto encoders, convolutional neural networks, RNNs, and LSTMs, using TensorFlow 1.14.1.
  • m2cgen – A tool that allows the conversion of ML models into native code (Java, C, Python, Go, JavaScript, Visual Basic, C#, R, PowerShell, PHP, Dart) with zero dependencies.
  • CML – A library for doing continuous integration with ML projects. Use GitHub Actions & GitLab CI to train and evaluate models in production like environments and automatically generate visual reports with metrics and graphs in pull/merge requests. Framework & language agnostic.
  • Pythonizr – An online tool to generate boilerplate machine learning code that uses scikit-learn.

Credits

  • Some of the python libraries were cut-and-pasted from vinta
  • References for Go were mostly cut-and-pasted from gopherdata

2. Machine Learning with Python – Part II

This curated list contains 840 awesome open-source projects with a total of 2.8M stars grouped into 32 categories. All projects are ranked by a project-quality score, which is calculated from various metrics automatically collected from GitHub and different package managers. If you would like to add or update projects, feel free to open an issue, submit a pull request, or directly edit the projects.yaml. Contributions are very welcome!

Contents

Explanation

  • 1st_place_medal 2nd_place_medal 3rd_place_medal  Combined project-quality score
  • star  Star count from GitHub
  • hatching_chick  New project (less than 6 months old)
  • zzz  Inactive project (6 months no activity)
  • skull  Dead project (12 months no activity)
  • chart_with_upwards_trend chart_with_downwards_trend  Project is trending up or down
  • heavy_plus_sign  Project was recently added
  • exclamation  Warning (e.g. missing/risky license)
  • man_technologist  Contributors count from GitHub
  • twisted_rightwards_arrows  Fork count from GitHub
  • clipboard  Issue count from GitHub
  • stopwatch  Last update timestamp on package manager
  • inbox_tray  Download count from package manager
  • package  Number of dependent projects
  •   Tensorflow related project
  •   Sklearn related project
  •   PyTorch related project
  •   MxNet related project
  •   Apache Spark related project
  •   Jupyter related project
  •   PaddlePaddle related project
  •   Pandas related project
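The activity markers in the legend above follow simple age thresholds. A minimal sketch of those rules, assuming a month is approximated as 30 days and that a dead project is not also marked inactive (the function name `status_markers` and both assumptions are illustrative, not taken from the list's own tooling):

```python
from datetime import date
from typing import List

# Thresholds from the legend; treating a month as 30 days is an assumption.
NEW_PROJECT_DAYS = 6 * 30
INACTIVE_DAYS = 6 * 30
DEAD_DAYS = 12 * 30


def status_markers(created: date, last_activity: date, today: date) -> List[str]:
    """Return the legend markers that apply to a project (hypothetical helper)."""
    markers = []
    # hatching_chick: project is less than 6 months old
    if (today - created).days < NEW_PROJECT_DAYS:
        markers.append("hatching_chick")
    idle_days = (today - last_activity).days
    if idle_days >= DEAD_DAYS:
        # skull: 12 months without activity
        markers.append("skull")
    elif idle_days >= INACTIVE_DAYS:
        # zzz: 6 months without activity
        markers.append("zzz")
    return markers
```

For example, a project created in 2019 whose last activity was eight months before the evaluation date would receive only the `zzz` marker under these assumptions.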

Machine Learning Frameworks

General-purpose machine learning and deep learning frameworks.

  • Tensorflow (1st_place_medal 44 · star 160K) – An Open Source Machine Learning Framework for Everyone. Apache-2
  • PyTorch (1st_place_medal 39 · star 47K) – Tensors and Dynamic neural networks in Python with strong GPU.. BSD-3
  • PySpark (1st_place_medal 38 · star 29K) – Apache Spark Python API. Apache-2
  • scikit-learn (1st_place_medal 37 · star 45K) – scikit-learn: machine learning in Python. BSD-3
  • StatsModels (1st_place_medal 36 · star 6.1K) – Statsmodels: statistical modeling and econometrics in Python. BSD-3
  • Keras (1st_place_medal 35 · star 51K) – Deep Learning for humans. MIT
  • XGBoost (1st_place_medal 35 · star 21K) – Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or.. Apache-2
  • LightGBM (1st_place_medal 35 · star 12K) – A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT,.. MIT
  • MXNet (2nd_place_medal 34 · star 19K) – Lightweight, Portable, Flexible Distributed/Mobile Deep Learning.. Apache-2
  • Theano (2nd_place_medal 34 · star 9.4K) – Theano is a Python library that allows you to define, optimize, and.. BSD-3
  • PyFlink (2nd_place_medal 33 · star 16K) – Apache Flink Python API. Apache-2
  • pytorch-lightning (2nd_place_medal 33 · star 12K) – The lightweight PyTorch wrapper for high-performance.. Apache-2
  • Fastai (2nd_place_medal 32 · star 21K) – The fastai deep learning library. Apache-2
  • jax (2nd_place_medal 32 · star 12K) – Composable transformations of Python+NumPy programs: differentiate,.. Apache-2
  • Thinc (2nd_place_medal 32 · star 2.2K) – A refreshing functional take on deep learning, compatible with your favorite.. MIT
  • Catboost (2nd_place_medal 31 · star 5.8K) – A fast, scalable, high performance Gradient Boosting on Decision.. Apache-2
  • Chainer (2nd_place_medal 31 · star 5.5K) – A flexible framework of neural networks for deep learning. MIT
  • PaddlePaddle (2nd_place_medal 30 · star 15K) – PArallel Distributed Deep LEarning: Machine Learning.. Apache-2
  • TFlearn (2nd_place_medal 30 · star 9.5K) – Deep learning library featuring a higher-level API for TensorFlow. MIT
  • Vowpal Wabbit (2nd_place_medal 30 · star 7.5K) – Vowpal Wabbit is a machine learning system which pushes the.. BSD-3
  • Turi Create (2nd_place_medal 28 · star 10K) – Turi Create simplifies the development of custom machine learning.. BSD-3
  • Sonnet (2nd_place_medal 28 · star 8.8K) – TensorFlow-based neural network library. Apache-2
  • dyNET (2nd_place_medal 28 · star 3.2K) – DyNet: The Dynamic Neural Network Toolkit. Apache-2
  • tensorpack (3rd_place_medal 27 · star 6K · chart_with_downwards_trend) – A Neural Net Training Interface on TensorFlow, with focus.. Apache-2
  • Ignite (3rd_place_medal 27 · star 3.5K) – High-level library to help with training and evaluating neural.. BSD-3
  • Jina (3rd_place_medal 27 · star 2.5K) – An easier way to build neural search on the cloud. Apache-2
  • Flax (3rd_place_medal 27 · star 1.5K) – Flax is a neural network ecosystem for JAX that is designed for.. Apache-2 jax
  • CNTK (3rd_place_medal 26 · star 17K · zzz) – Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit. MIT
  • skorch (3rd_place_medal 26 · star 3.8K) – A scikit-learn compatible neural network library that wraps.. BSD-3
  • mlpack (3rd_place_medal 26 · star 3.6K) – mlpack: a scalable C++ machine learning library –. BSD-3
  • Ludwig (3rd_place_medal 25 · star 7.6K) – Ludwig is a toolbox that allows to train and evaluate deep.. Apache-2
  • xLearn (3rd_place_medal 25 · star 2.9K · zzz) – High performance, easy-to-use, and scalable machine learning (ML).. Apache-2
  • Neural Network Libraries (3rd_place_medal 24 · star 2.4K) – Neural Network Libraries. Apache-2
  • ktrain (3rd_place_medal 24 · star 760) – ktrain is a Python library that makes deep learning and AI more.. Apache-2
  • tensorflow-upstream (3rd_place_medal 24 · star 550) – TensorFlow ROCm port. Apache-2
  • SHOGUN (3rd_place_medal 23 · star 2.8K) – Unified and efficient Machine Learning. BSD-3
  • einops (3rd_place_medal 23 · star 2.6K) – Deep learning operations reinvented (for pytorch, tensorflow, jax and.. MIT
  • fklearn (3rd_place_medal 23 · star 1.3K) – fklearn: Functional Machine Learning. Apache-2
  • mace (3rd_place_medal 21 · star 4.3K) – MACE is a deep learning inference framework optimized for mobile.. Apache-2
  • Neural Tangents (3rd_place_medal 21 · star 1.3K) – Fast and Easy Infinite Neural Networks in Python. Apache-2
  • ThunderSVM (3rd_place_medal 20 · star 1.3K) – ThunderSVM: A Fast SVM Library on GPUs and CPUs. Apache-2
  • Haiku (3rd_place_medal 20 · star 1K) – JAX-based neural network library. Apache-2
  • Torchbearer (3rd_place_medal 20 · star 590) – torchbearer: A model fitting library for PyTorch. MIT
  • Objax (3rd_place_medal 19 · star 580) – Objax is a machine learning framework that provides an Object.. Apache-2 jax
  • elegy (3rd_place_medal 17 · star 180) – Elegy is a framework-agnostic Trainer interface for the Jax.. Apache-2 jax
  • ThunderGBM (3rd_place_medal 16 · star 580) – ThunderGBM: Fast GBDTs and Random Forests on GPUs. Apache-2
  • NeoML (3rd_place_medal 13 · star 570) – Machine learning framework for both deep learning and traditional.. Apache-2

Data Visualization

General-purpose and task-specific data visualization libraries.

  • Matplotlib (1st_place_medal 41 · star 13K) – matplotlib: plotting with Python. Python-2.0
  • Seaborn (1st_place_medal 37 · star 8.2K) – Statistical data visualization using matplotlib. BSD-3
  • Plotly (1st_place_medal 35 · star 9.1K) – The interactive graphing library for Python (includes Plotly Express). MIT
  • dash (1st_place_medal 34 · star 14K) – Analytical Web Apps for Python, R, Julia, and Jupyter. No JavaScript Required. MIT
  • Bokeh (1st_place_medal 33 · star 15K) – Interactive Data Visualization in the browser, from Python. BSD-3
  • pyecharts (2nd_place_medal 31 · star 11K) – Python Echarts Plotting Library. MIT
  • wordcloud (2nd_place_medal 31 · star 7.9K) – A little word cloud generator in Python. MIT
  • Altair (2nd_place_medal 31 · star 6.5K) – Declarative statistical visualization library for Python. BSD-3
  • UMAP (2nd_place_medal 30 · star 4.6K) – Uniform Manifold Approximation and Projection. BSD-3
  • bqplot (2nd_place_medal 30 · star 3K) – Plotting library for IPython/Jupyter notebooks. Apache-2
  • PyQtGraph (2nd_place_medal 30 · star 2.3K) – Fast data visualization and GUI tools for scientific / engineering.. MIT
  • pandas-profiling (2nd_place_medal 29 · star 6.9K) – Create HTML profiling reports from pandas DataFrame.. MIT
  • VisPy (2nd_place_medal 29 · star 2.6K) – High-performance interactive 2D/3D data visualization library. BSD-3
  • Graphviz (2nd_place_medal 29 · star 940) – Simple Python interface for Graphviz. MIT
  • datashader (2nd_place_medal 28 · star 2.4K) – Quickly and accurately render even the largest data. BSD-3
  • HoloViews (2nd_place_medal 28 · star 1.8K) – With Holoviews, your data visualizes itself. BSD-3
  • Cufflinks (2nd_place_medal 27 · star 2.1K) – Productivity Tools for Plotly + Pandas. MIT
  • PyVista (2nd_place_medal 27 · star 720) – 3D plotting and mesh analysis through a streamlined interface for the.. MIT
  • data-validation (2nd_place_medal 27 · star 530) – Library for exploring and validating machine learning.. Apache-2
  • Perspective (3rd_place_medal 26 · star 3.3K) – Streaming pivot visualization via WebAssembly. Apache-2
  • missingno (3rd_place_medal 26 · star 2.7K) – Missing data visualization module for Python. MIT
  • pythreejs (3rd_place_medal 26 · star 710) – A Jupyter – Three.js bridge. BSD-3
  • Facets Overview (3rd_place_medal 25 · star 6.5K) – Visualizations for machine learning datasets. Apache-2
  • Chartify (3rd_place_medal 25 · star 2.8K) – Python library that makes it easy for data scientists to create.. Apache-2
  • HyperTools (3rd_place_medal 25 · star 1.6K) – A Python toolbox for gaining geometric insights into high-dimensional.. MIT
  • hvPlot (3rd_place_medal 25 · star 360) – A high-level plotting API for pandas, dask, xarray, and networkx built on.. BSD-3
  • openTSNE (3rd_place_medal 24 · star 760) – Extensible, parallel implementations of t-SNE. BSD-3
  • PandasGUI (3rd_place_medal 23 · star 2.1K) – A GUI for Pandas DataFrames. MIT
  • python-ternary (3rd_place_medal 23 · star 400) – Ternary plotting library for python with matplotlib. MIT
  • D-Tale (3rd_place_medal 22 · star 2.1K) – Visualizer for pandas data structures. ❗️LGPL-2.1
  • Multicore-TSNE (3rd_place_medal 22 · star 1.5K · zzz) – Parallel t-SNE implementation with Python and Torch.. BSD-3
  • Pandas-Bokeh (3rd_place_medal 22 · star 630) – Bokeh Plotting Backend for Pandas and GeoPandas. MIT
  • vega (3rd_place_medal 22 · star 300) – IPython/Jupyter notebook module for Vega and Vega-Lite. BSD-3
  • Sweetviz (3rd_place_medal 20 · star 1.4K) – Visualize and compare datasets, target values and associations, with one.. MIT
  • lets-plot (3rd_place_medal 20 · star 520) – An open-source plotting library for statistical data. MIT
  • joypy (3rd_place_medal 20 · star 320) – Joyplots in Python with matplotlib & pandas. MIT
  • HiPlot (3rd_place_medal 19 · star 2K) – HiPlot makes understanding high dimensional data easy. MIT
  • animatplot (3rd_place_medal 19 · star 360) – A python package for animating plots build on matplotlib. MIT
  • PyWaffle (3rd_place_medal 18 · star 400 · zzz) – Make Waffle Charts in Python. MIT
  • AutoViz (3rd_place_medal 18 · star 310) – Automatically Visualize any dataset, any size with a single line of.. Apache-2
  • FiftyOne (3rd_place_medal 18 · star 220) – Visualize, create, and debug image and video datasets.. Apache-2
  • data-describe (3rd_place_medal 14 · star 270) – datadescribe: Pythonic EDA Accelerator for Data Science. Apache-2
  • nx-altair (3rd_place_medal 14 · star 160 · zzz) – Draw interactive NetworkX graphs with Altair. MIT

Text Data & NLP

Libraries for processing, cleaning, manipulating, and analyzing text data as well as libraries for NLP tasks such as language detection, fuzzy matching, classification, seq2seq learning, conversational AI, keyword extraction, and translation.

  • spaCy (1st_place_medal 37 · star 20K) – Industrial-strength Natural Language Processing (NLP) in Python. MIT
  • transformers (1st_place_medal 36 · star 42K) – Transformers: State-of-the-art Natural Language.. Apache-2
  • gensim (1st_place_medal 35 · star 12K) – Topic Modelling for Humans. ❗️LGPL-2.1
  • nltk (1st_place_medal 34 · star 9.7K) – Suite of libraries and programs for symbolic and statistical natural.. Apache-2
  • AllenNLP (1st_place_medal 32 · star 9.8K) – An open-source NLP research library, built on PyTorch. Apache-2
  • fairseq (1st_place_medal 31 · star 11K) – Facebook AI Research Sequence-to-Sequence Toolkit written in Python. MIT
  • ChatterBot (1st_place_medal 31 · star 11K · zzz) – ChatterBot is a machine learning, conversational dialog engine.. BSD-3
  • sentencepiece (1st_place_medal 31 · star 4.9K) – Unsupervised text tokenizer for Neural Network-based text.. Apache-2
  • fastText (1st_place_medal 30 · star 22K · zzz) – Library for fast text representation and classification. MIT
  • flair (1st_place_medal 30 · star 10K) – A very simple framework for state-of-the-art Natural Language Processing.. MIT
  • snowballstemmer (1st_place_medal 30 · star 480) – Snowball compiler and stemming algorithms. BSD-3
  • TextBlob (2nd_place_medal 29 · star 7.6K) – Simple, Pythonic, text processing–Sentiment analysis, part-of-speech.. MIT
  • torchtext (2nd_place_medal 29 · star 2.7K · chart_with_downwards_trend) – Data loaders and abstractions for text and NLP. BSD-3
  • Rasa (2nd_place_medal 28 · star 11K) – Open source machine learning framework to automate text- and voice-.. Apache-2
  • OpenNMT (2nd_place_medal 28 · star 4.9K) – Open Source Neural Machine Translation in PyTorch. MIT
  • sentence-transformers (2nd_place_medal 28 · star 4.4K) – Sentence Embeddings with BERT & XLNet. Apache-2
  • Tokenizers (2nd_place_medal 28 · star 4.3K) – Fast State-of-the-Art Tokenizers optimized for Research and.. Apache-2
  • Dedupe (2nd_place_medal 28 · star 2.9K) – A python library for accurate and scalable fuzzy matching, record.. MIT
  • phonenumbers (2nd_place_medal 28 · star 2.6K) – Python port of Google’s libphonenumber. Apache-2
  • DeepPavlov (2nd_place_medal 26 · star 5.1K) – An open source library for deep learning end-to-end dialog.. Apache-2
  • ftfy (2nd_place_medal 26 · star 2.9K) – Fixes mojibake and other glitches in Unicode text, after the fact. MIT
  • GluonNLP (2nd_place_medal 26 · star 2.2K) – Toolkit that enables easy text preprocessing, datasets loading.. Apache-2
  • TextDistance (2nd_place_medal 26 · star 1.9K) – Compute distance between sequences. 30+ algorithms, pure python.. MIT
  • textacy (2nd_place_medal 26 · star 1.6K) – NLP, before and after spaCy. Apache-2
  • jellyfish (2nd_place_medal 26 · star 1.4K) – a python library for doing approximate and phonetic matching of.. BSD-2
  • TensorFlow Text (2nd_place_medal 26 · star 700) – Making text a first-class citizen in TensorFlow. Apache-2
  • CLTK (2nd_place_medal 26 · star 650) – The Classical Language Toolkit. MIT
  • inflect (2nd_place_medal 26 · star 490) – Correctly generate plurals, ordinals, indefinite articles; convert numbers.. MIT
  • ParlAI (2nd_place_medal 25 · star 7K) – A framework for training and evaluating AI models on a variety of.. MIT
  • PyText (2nd_place_medal 25 · star 6.1K) – A natural language modeling framework based on PyTorch. BSD-3
  • stanza (2nd_place_medal 25 · star 5.3K · chart_with_downwards_trend) – Official Stanford NLP Python Library for Many Human Languages. Apache-2
  • vaderSentiment (2nd_place_medal 25 · star 2.9K · zzz) – VADER Sentiment Analysis. VADER (Valence Aware Dictionary.. MIT
  • spark-nlp (2nd_place_medal 25 · star 2K) – State of the Art Natural Language Processing. Apache-2
  • haystack (2nd_place_medal 25 · star 1.5K) – End-to-end Python framework for building natural language search.. Apache-2
  • pyahocorasick (2nd_place_medal 25 · star 590) – Python module (C extension and plain python) implementing Aho-.. BSD-3
  • T5 (3rd_place_medal 24 · star 3.2K) – Code for the paper Exploring the Limits of Transfer Learning with a.. Apache-2
  • Sumy (3rd_place_medal 24 · star 2.5K) – Module for automatic summarization of text documents and HTML pages. Apache-2
  • fastNLP (3rd_place_medal 24 · star 2K) – fastNLP: A Modularized and Extensible NLP Framework. Currently still.. Apache-2
  • pytorch-nlp (3rd_place_medal 24 · star 1.9K) – Basic Utilities for PyTorch Natural Language Processing (NLP). BSD-3
  • scattertext (3rd_place_medal 24 · star 1.5K · chart_with_upwards_trend) – Beautiful visualizations of how language differs among.. Apache-2
  • sense2vec (3rd_place_medal 24 · star 1.2K) – Contextually-keyed word vectors. MIT
  • spacy-transformers (3rd_place_medal 24 · star 920) – Use pretrained transformers like BERT, XLNet and GPT-2.. MIT spacy
  • SciSpacy (3rd_place_medal 24 · star 850) – A full spaCy pipeline and models for scientific/biomedical documents. Apache-2
  • Ciphey (3rd_place_medal 23 · star 6.5K) – Automatically decrypt encryptions without knowing the key or cipher,.. MIT
  • flashtext (3rd_place_medal 23 · star 4.7K · zzz) – Extract Keywords from sentence or Replace keywords in sentences. MIT
  • neuralcoref (3rd_place_medal 23 · star 2.2K) – Fast Coreference Resolution in spaCy with Neural Networks. MIT
  • pySBD (3rd_place_medal 23 · star 290) – pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence.. MIT
  • textgenrnn (3rd_place_medal 22 · star 4.3K · zzz) – Easily train your own text-generating neural network of any.. MIT
  • fast-bert (3rd_place_medal 22 · star 1.5K) – Super easy library for BERT based NLP models. Apache-2
  • PyTextRank (3rd_place_medal 22 · star 1.5K · chart_with_downwards_trend) – Python implementation of TextRank for phrase extraction and.. MIT
  • FARM (3rd_place_medal 22 · star 1.1K) – Fast & easy transfer learning for NLP. Harvesting language models.. Apache-2
  • DeepMatcher (3rd_place_medal 21 · star 3.5K · zzz) – Python package for performing Entity and Text Matching using.. BSD-3
  • gpt-2-simple (3rd_place_medal 21 · star 2.5K) – Python package to easily retrain OpenAI’s GPT-2 text-.. MIT
  • Texar (3rd_place_medal 21 · star 2.1K · zzz) – Toolkit for Machine Learning, Natural Language Processing, and.. Apache-2
  • NLP Architect (3rd_place_medal 20 · star 2.6K) – A model library for exploring state-of-the-art deep learning.. Apache-2
  • NeMo (3rd_place_medal 20 · star 2.5K) – NeMo: a toolkit for conversational AI. Apache-2
  • DELTA (3rd_place_medal 20 · star 1.4K) – DELTA is a deep learning based natural language and speech.. Apache-2
  • Sockeye (3rd_place_medal 20 · star 990) – Sequence-to-sequence framework with a focus on Neural Machine.. Apache-2
  • YouTokenToMe (3rd_place_medal 20 · star 720) – Unsupervised text tokenizer focused on computational efficiency. MIT
  • finetune (3rd_place_medal 20 · star 630) – Scikit-learn style model finetuning for NLP. MPL-2.0
  • Texthero (3rd_place_medal 19 · star 2.1K) – Text preprocessing, representation and visualization from zero to hero. MIT
  • textpipe (3rd_place_medal 19 · star 280) – Textpipe: clean and extract metadata from text. MIT
  • Kashgari (3rd_place_medal 18 · star 2K) – Kashgari is a production-level NLP Transfer learning framework.. Apache-2
  • Camphr (3rd_place_medal 18 · star 330) – spaCy plugin for Transformers, Udify, ELmo, etc. Apache-2 spacy
  • skift (3rd_place_medal 18 · star 210) – scikit-learn wrappers for Python fastText. MIT
  • Translate (3rd_place_medal 15 · star 680) – Translate – a PyTorch Language Library. BSD-3
  • VizSeq (3rd_place_medal 15 · star 310) – An Analysis Toolkit for Natural Language Generation (Translation,.. MIT
  • OpenNRE (3rd_place_medal 14 · star 3K) – An Open-Source Package for Neural Relation Extraction (NRE). MIT
  • TransferNLP (3rd_place_medal 14 · star 290 · zzz) – NLP library designed for reproducible experimentation.. MIT
  • NeuralQA (3rd_place_medal 14 · star 180) – NeuralQA: A Usable Library for Question Answering on Large Datasets with.. MIT
  • textvec (3rd_place_medal 13 · star 170) – Text vectorization tool to outperform TFIDF for classification tasks. MIT

Image Data

Libraries for image & video processing, manipulation, and augmentation as well as libraries for computer vision tasks such as facial recognition, object detection, and classification.

  • Pillow (1st_place_medal 39 · star 8.3K) – The friendly PIL fork (Python Imaging Library). ❗️PIL
  • torchvision (1st_place_medal 36 · star 8.6K) – Datasets, Transforms and Models specific to Computer Vision. BSD-3
  • scikit-image (1st_place_medal 33 · star 4.2K) – Image processing in Python. BSD-2
  • imgaug (1st_place_medal 31 · star 11K · zzz) – Image augmentation for machine learning experiments. MIT
  • imageio (1st_place_medal 31 · star 840) – Python library for reading and writing image data. BSD-2
  • opencv-python (2nd_place_medal 30 · star 1.8K) – Automated CI toolchain to produce precompiled opencv-python,.. MIT
  • Wand (2nd_place_medal 30 · star 1.1K) – The ctypes-based simple ImageMagick binding for Python. MIT
  • Face Recognition (2nd_place_medal 29 · star 39K) – The world’s simplest facial recognition api for Python.. MIT
  • MoviePy (2nd_place_medal 29 · star 7.3K) – Video editing with Python. MIT
  • PyTorch Image Models (2nd_place_medal 28 · star 7.9K · chart_with_upwards_trend) – PyTorch image models, scripts, pretrained weights –.. Apache-2
  • Albumentations (2nd_place_medal 28 · star 7.5K) – Fast image augmentation library and easy to use wrapper.. MIT
  • Kornia (2nd_place_medal 28 · star 3.7K) – Open Source Differentiable Computer Vision Library for PyTorch. Apache-2
  • imutils (2nd_place_medal 28 · star 3.6K) – A series of convenience functions to make basic image processing.. MIT
  • ImageHash (2nd_place_medal 28 · star 1.9K) – A Python Perceptual Image Hashing Module. BSD-2
  • imageai (2nd_place_medal 27 · star 6K) – A python library built to empower developers to build applications and.. MIT
  • GluonCV (2nd_place_medal 27 · star 4.6K) – Gluon CV Toolkit. Apache-2
  • detectron2 (2nd_place_medal 26 · star 15K) – Detectron2 is FAIR’s next-generation platform for object.. Apache-2
  • InsightFace (2nd_place_medal 26 · star 8.7K) – Face Analysis Project on MXNet. MIT
  • MMDetection (2nd_place_medal 25 · star 14K) – OpenMMLab Detection Toolbox and Benchmark. Apache-2
  • PyTorch3D (2nd_place_medal 25 · star 4.6K) – PyTorch3D is FAIR’s library of reusable components for deep.. MIT
  • facenet-pytorch (2nd_place_medal 25 · star 1.9K) – Pretrained Pytorch face detection (MTCNN) and recognition.. MIT
  • mahotas (2nd_place_medal 25 · star 670) – Computer Vision in Python. MIT
  • Augmentor (3rd_place_medal 24 · star 4.3K · zzz) – Image augmentation library in Python for machine learning. MIT
  • mtcnn (3rd_place_medal 24 · star 1.4K) – MTCNN face detection implementation for TensorFlow, as a PIP package. MIT
  • Face Alignment (3rd_place_medal 23 · star 4.7K) – 2D and 3D Face alignment library build using pytorch. BSD-3
  • CellProfiler (3rd_place_medal 23 · star 550) – An open-source application for biological image analysis. BSD-3
  • segmentation_models (3rd_place_medal 22 · star 3K · zzz) – Segmentation models with pretrained backbones. Keras.. MIT
  • vidgear (3rd_place_medal 22 · star 1.7K) – High-performance cross-platform Video Processing Python framework.. Apache-2
  • pyvips (3rd_place_medal 22 · star 300) – python binding for libvips using cffi. MIT
  • Image Deduplicator (3rd_place_medal 21 · star 3.4K) – Finding duplicate images made easy!. Apache-2
  • Image Super-Resolution (3rd_place_medal 21 · star 2.6K) – Super-scale your images and run experiments with.. Apache-2
  • tensorflow-graphics (3rd_place_medal 21 · star 2.4K) – TensorFlow Graphics: Differentiable Graphics Layers.. Apache-2
  • Classy Vision (3rd_place_medal 21 · star 1.2K) – An end-to-end PyTorch framework for image and video.. MIT
  • Torch Points 3D (3rd_place_medal 21 · star 1.1K) – Pytorch framework for doing deep learning on point clouds. BSD-3
  • MMF (3rd_place_medal 20 · star 4.2K) – A modular framework for vision & language multimodal research from.. BSD-3
  • image-match (3rd_place_medal 20 · star 2.5K) – Quickly search over billions of images. Apache-2
  • nude.py (3rd_place_medal 20 · star 790) – Nudity detection with Python. MIT
  • Caer (3rd_place_medal 20 · star 450) – A lightweight Computer Vision library. Scale your models, not boilerplate. MIT
  • vit-pytorch (3rd_place_medal 18 · star 2.9K · hatching_chick) – Implementation of Vision Transformer, a simple way to.. MIT
  • Norfair (3rd_place_medal 18 · star 920) – Lightweight Python library for adding real-time 2D object tracking to.. BSD-3
  • PaddleDetection (3rd_place_medal 17 · star 2.3K) – Object detection and instance segmentation toolkit.. Apache-2
  • lightly (3rd_place_medal 17 · star 430 · hatching_chick) – A python library for self-supervised learning on images. MIT
  • pycls (3rd_place_medal 15 · star 1.5K) – Codebase for Image Classification Research, written in PyTorch. MIT
  • DE⫶TR (3rd_place_medal 14 · star 6.4K) – End-to-End Object Detection with Transformers. Apache-2
  • PySlowFast (3rd_place_medal 14 · star 3.4K) – PySlowFast: video understanding codebase from FAIR for.. Apache-2

Graph Data

Libraries for graph processing, clustering, embedding, and machine learning tasks.

  • networkx (1st_place_medal 33 · star 8.8K · chart_with_downwards_trend) – Network Analysis in Python. BSD-3
  • PyTorch Geometric (1st_place_medal 29 · star 10K · chart_with_upwards_trend) – Geometric Deep Learning Extension Library for PyTorch. MIT
  • dgl (2nd_place_medal 26 · star 6.8K) – Python package built to ease deep learning on graph, on top of existing.. Apache-2
  • StellarGraph (2nd_place_medal 25 · star 1.8K) – StellarGraph – Machine Learning on Graphs. Apache-2
  • Spektral (2nd_place_medal 23 · star 1.7K) – Graph Neural Networks with Keras and Tensorflow 2. MIT
  • ogb (2nd_place_medal 22 · star 770) – Benchmark datasets, data loaders, and evaluators for graph machine learning. MIT
  • Node2Vec (2nd_place_medal 22 · star 650) – Implementation of the node2vec algorithm. MIT
  • torch-cluster (2nd_place_medal 21 · star 340) – PyTorch Extension Library of Optimized Graph Cluster.. MIT
  • AmpliGraph (2nd_place_medal 20 · star 1.4K · zzz) – Python library for Representation Learning on Knowledge.. Apache-2
  • PyTorch-BigGraph (3rd_place_medal 19 · star 2.7K) – Generate embeddings from large-scale graph-structured.. BSD-3
  • PyKEEN (3rd_place_medal 19 · star 330) – A Python library for learning and evaluating knowledge graph embeddings. MIT
  • graph-nets (3rd_place_medal 18 · star 4.8K) – Build Graph Nets in Tensorflow. Apache-2
  • DeepGraph (3rd_place_medal 18 · star 230) – Analyze Data with Pandas-based Networks. Documentation:. BSD-3
  • Paddle Graph Learning (3rd_place_medal 17 · star 920) – Paddle Graph Learning (PGL) is an efficient and.. Apache-2
  • kglib (3rd_place_medal 16 · star 400) – Grakn Knowledge Graph Library (ML R&D). Apache-2
  • pytorch_geometric_temporal (3rd_place_medal 16 · star 370) – A Temporal Extension Library for PyTorch Geometric. MIT
  • GraphEmbedding (3rd_place_medal 15 · star 1.8K) – Implementation and experiments of graph embedding algorithms. MIT
  • Euler (3rd_place_medal 14 · star 2.5K · zzz) – A distributed graph deep learning framework. Apache-2
  • AutoGL (3rd_place_medal 14 · star 590 · hatching_chick) – An autoML framework & toolkit for machine learning on graphs. MIT
  • OpenKE (3rd_place_medal 13 · star 2.4K · zzz) – An Open-Source Package for Knowledge Embedding (KE). MIT
  • GraphVite (3rd_place_medal 13 · star 860) – GraphVite: A General and High-performance Graph Embedding System. Apache-2

Audio Data

Libraries for audio analysis, manipulation, transformation, and extraction, as well as speech recognition and music generation tasks.

  • DeepSpeech (🥇31 · ⭐ 17K) – DeepSpeech is an open source embedded (offline, on-device).. MPL-2.0
  • Pydub (🥇30 · ⭐ 5.2K · 📈) – Manipulate audio with a simple and easy high level interface. MIT
  • Magenta (🥈29 · ⭐ 16K) – Magenta: Music and Art Generation with Machine Intelligence. Apache-2
  • torchaudio (🥈29 · ⭐ 1.3K · 📈) – Data manipulation and transformation for audio signal.. BSD-2
  • librosa (🥈27 · ⭐ 4.3K) – Python library for audio and music analysis. ISC
  • audioread (🥈26 · ⭐ 360) – cross-library (GStreamer + Core Audio + MAD + FFmpeg) audio decoding.. MIT
  • spleeter (🥈25 · ⭐ 16K) – Deezer source separation library including pretrained models. MIT
  • pyAudioAnalysis (🥈25 · ⭐ 3.8K) – Python Audio Analysis Library: Feature Extraction,.. Apache-2
  • python-soundfile (🥈25 · ⭐ 370) – SoundFile is an audio library based on libsndfile, CFFI, and.. BSD-3
  • espnet (🥉24 · ⭐ 3.5K) – End-to-End Speech Processing Toolkit. Apache-2
  • python_speech_features (🥉23 · ⭐ 1.9K) – This library provides common speech features for ASR.. MIT
  • tinytag (🥉23 · ⭐ 440) – Read music meta data and length of MP3, OGG, OPUS, MP4, M4A, FLAC, WMA and.. MIT
  • Porcupine (🥉22 · ⭐ 2.4K) – On-device wake word detection powered by deep learning. Apache-2
  • DDSP (🥉22 · ⭐ 1.8K) – DDSP: Differentiable Digital Signal Processing. Apache-2
  • kapre (🥉21 · ⭐ 720) – kapre: Keras Audio Preprocessors. MIT
  • Dejavu (🥉20 · ⭐ 5.3K · 💤) – Audio fingerprinting and recognition in Python. MIT
  • TTS (🥉20 · ⭐ 3.3K) – Deep learning for Text to Speech (Discussion forum:.. MPL-2.0
  • Muda (🥉17 · ⭐ 180 · 💤) – A library for augmenting annotated audio data. ISC
  • Julius (🥉14 · ⭐ 180 · 🐣) – Fast PyTorch based DSP for audio and 1D signals. MIT

Geospatial Data

Libraries to load, process, analyze, and write geographic data as well as libraries for spatial analysis, map visualization, and geocoding.

  • pydeck (🥇33 · ⭐ 8.5K) – WebGL2 powered geospatial visualization layers. MIT
  • folium (🥇32 · ⭐ 5.2K) – Python Data. Leaflet.js Maps. MIT
  • geopy (🥇32 · ⭐ 3.2K) – Geocoding library for Python. MIT
  • Shapely (🥇32 · ⭐ 2.2K) – Manipulation and analysis of geometric objects. BSD-3
  • GeoPandas (🥈31 · ⭐ 2.5K) – Python tools for geographic data. BSD-3
  • pyproj (🥈31 · ⭐ 580 · 📈) – Python interface to PROJ (cartographic projections and coordinate.. MIT
  • Rasterio (🥈30 · ⭐ 1.4K) – Rasterio reads and writes geospatial raster datasets. BSD-3
  • Fiona (🥈30 · ⭐ 780) – Fiona reads and writes geographic data files. BSD-3
  • ipyleaflet (🥉28 · ⭐ 1.1K · 📈) – A Jupyter – Leaflet.js bridge. MIT
  • geojson (🥉26 · ⭐ 600) – Python bindings and utilities for GeoJSON. BSD-3
  • ArcGIS API (🥉25 · ⭐ 980) – Documentation and samples for ArcGIS API for Python. Apache-2
  • PySAL (🥉25 · ⭐ 830) – PySAL: Python Spatial Analysis Library Meta-Package. BSD-3
  • GeoViews (🥉22 · ⭐ 330) – Simple, concise geographical visualization in Python. BSD-3
  • EarthPy (🥉20 · ⭐ 230) – A package built to support working with spatial data using open source.. BSD-3
  • pymap3d (🥉19 · ⭐ 180) – pure-Python (Numpy optional) 3D coordinate conversions for geospace ecef.. BSD-2
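
To give a feel for the kind of spatial computation these libraries package up, here is a minimal standard-library sketch of a great-circle (haversine) distance. It is not the API of any library listed above; the function name `haversine_km` and the spherical-Earth radius constant are assumptions made for illustration, and real projects should prefer a geodesic implementation from a maintained library.

```python
import math

# Assumption: a spherical Earth with the mean radius in kilometres.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp against floating-point overshoot before taking asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

One degree of longitude along the equator comes out at roughly 111.2 km under this model, which is a handy sanity check.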

Financial Data

Libraries for algorithmic stock/crypto trading, risk analytics, backtesting, technical analysis, and other tasks on financial data.

  • zipline (🥇30 · ⭐ 14K) – Zipline, a Pythonic Algorithmic Trading Library. Apache-2
  • yfinance (🥇30 · ⭐ 4.5K) – Yahoo! Finance market data downloader (+faster Pandas Datareader). Apache-2
  • Alpha Vantage (🥇27 · ⭐ 3.2K) – A python wrapper for Alpha Vantage API for financial data. MIT
  • ta (🥇27 · ⭐ 1.9K) – Technical Analysis Library using Pandas and Numpy. MIT
  • pyfolio (🥈26 · ⭐ 3.6K · 💤) – Portfolio and risk analytics in Python. Apache-2
  • empyrical (🥈25 · ⭐ 740) – Common financial risk and performance metrics. Used by zipline and.. Apache-2
  • Alphalens (🥈24 · ⭐ 1.8K · 💤) – Performance analysis of predictive (alpha) stock factors. Apache-2
  • IB-insync (🥈24 · ⭐ 1.3K) – Python sync/async framework for Interactive Brokers API. BSD-2
  • bt (🥈24 · ⭐ 980) – bt – flexible backtesting for Python. MIT
  • ffn (🥈24 · ⭐ 800) – ffn – a financial function library for Python. MIT
  • Enigma Catalyst (🥉23 · ⭐ 2K) – An Algorithmic Trading Library for Crypto-Assets in Python. Apache-2
  • stockstats (🥉23 · ⭐ 730) – Supply a wrapper “StockDataFrame” based on the.. BSD-3
  • TensorTrade (🥉21 · ⭐ 3K) – An open source reinforcement learning framework for training,.. Apache-2
  • finmarketpy (🥉20 · ⭐ 2.5K) – Python library for backtesting trading strategies & analyzing.. Apache-2
  • Qlib (🥉19 · ⭐ 4.6K) – Qlib is an AI-oriented quantitative investment platform, which aims to.. MIT
  • tf-quant-finance (🥉19 · ⭐ 2.5K) – High-performance TensorFlow library for quantitative.. Apache-2
  • Crypto Signals (🥉18 · ⭐ 2.7K) – Github.com/CryptoSignal – #1 Quant Trading & Technical Analysis.. MIT

Time Series Data

Libraries for forecasting, anomaly detection, feature extraction, and machine learning on time-series and sequential data.

  • Prophet (🥇28 · ⭐ 12K) – Tool for producing high quality forecasts for time series data that has.. MIT
  • tsfresh (🥇27 · ⭐ 5.5K) – Automatic extraction of relevant features from time series:. MIT
  • sktime (🥇27 · ⭐ 3.7K) – A unified framework for machine learning with time series. BSD-3
  • pmdarima (🥈26 · ⭐ 830) – A statistical library designed to fill the void in Python’s time series.. MIT
  • tslearn (🥈25 · ⭐ 1.5K) – A machine learning toolkit dedicated to time-series data. BSD-2
  • Streamz (🥈24 · ⭐ 920) – Real-time stream processing for python. BSD-3
  • GluonTS (🥈23 · ⭐ 1.8K) – Probabilistic time series modeling in Python. Apache-2
  • Darts (🥈22 · ⭐ 750) – A python library for easy manipulation and forecasting of time series. Apache-2
  • STUMPY (🥉20 · ⭐ 1.7K) – STUMPY is a powerful and scalable Python library for computing a Matrix.. BSD-3
  • pyts (🥉20 · ⭐ 890 · 💤) – A Python package for time series classification. BSD-3
  • pytorch-forecasting (🥉19 · ⭐ 830) – Time series forecasting with PyTorch. MIT
  • seglearn (🥉19 · ⭐ 430) – Python module for machine learning time series:. BSD-3
  • matrixprofile-ts (🥉18 · ⭐ 620 · 💤) – A Python library for detecting patterns and anomalies.. Apache-2
  • Auto TS (🥉18 · ⭐ 190) – Automatically build ARIMA, SARIMAX, VAR, FB Prophet and XGBoost.. Apache-2
  • ADTK (🥉17 · ⭐ 610 · 💤) – A Python toolkit for rule-based/unsupervised anomaly detection in time.. MPL-2.0
  • tick (🥉17 · ⭐ 320 · 💤) – Module for statistical learning, with a particular emphasis on time-.. BSD-3
  • atspy (🥉16 · ⭐ 340) – AtsPy: Automated Time Series Models in Python (by @firmai). MIT
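
As a rough illustration of the rule-based anomaly detection mentioned in this category, the sketch below flags points that deviate strongly from a trailing window. It uses only the standard library and does not reproduce any listed library's API; the function name, the default window of 5, and the threshold of 3 standard deviations are all assumptions chosen for the example.

```python
import statistics


def rolling_zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` values."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.fmean(history)
        std = statistics.pstdev(history)
        if std == 0:
            # Flat history: treat any change at all as an anomaly.
            if values[i] != mean:
                anomalies.append(i)
            continue
        if abs(values[i] - mean) / std > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical sensor readings with one obvious spike at index 7.
readings = [10, 11, 10, 12, 11, 10, 11, 50, 11, 10, 12, 11]
spikes = rolling_zscore_anomalies(readings)
```

A trailing window keeps the detector causal (it never looks at future values), at the cost of the spike itself inflating the statistics for the next few points.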

Medical Data

Libraries for processing and analyzing medical data such as MRIs, EEGs, genomic data, and other medical imaging formats.

  • Lifelines (🥇29 · ⭐ 1.6K) – Survival analysis in Python. MIT
  • Nilearn (🥇29 · ⭐ 710) – Machine learning for NeuroImaging in Python. BSD-3
  • NIPYPE (🥇29 · ⭐ 560) – Workflows and interfaces for neuroimaging packages. Apache-2
  • NiBabel (🥇29 · ⭐ 390) – Python package to access a cacophony of neuro-imaging file formats. MIT
  • MNE (🥈27 · ⭐ 1.5K) – MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python. BSD-3
  • DIPY (🥈27 · ⭐ 390) – DIPY is the paragon 3D/4D+ imaging library in Python. Contains generic.. BSD-3
  • Hail (🥈24 · ⭐ 700) – Scalable genomic data analysis. MIT
  • NIPY (🥈23 · ⭐ 290) – Neuroimaging in Python FMRI analysis package. BSD-3
  • MONAI (🥉22 · ⭐ 1.8K) – AI Toolkit for Healthcare Imaging. Apache-2
  • DeepVariant (🥉21 · ⭐ 2.2K) – DeepVariant is an analysis pipeline that uses a deep neural.. BSD-3
  • NiftyNet (🥉21 · ⭐ 1.3K · 💤) – [unmaintained] An open-source convolutional neural.. Apache-2
  • Brainiak (🥉19 · ⭐ 230) – Brain Imaging Analysis Kit. Apache-2
  • Glow (🥉19 · ⭐ 160) – An open-source toolkit for large-scale genomic analysis. Apache-2
  • Medical Detection Toolkit (🥉12 · ⭐ 910 · 💤) – The Medical Detection Toolkit contains 2D + 3D.. Apache-2
  • MedicalNet (🥉11 · ⭐ 1.1K · 💤) – Many studies have shown that the performance on deep learning is.. MIT

Optical Character Recognition

Libraries for optical character recognition (OCR) and text extraction from images or videos.

  • Tesseract (🥇30 · ⭐ 3.5K) – Python-tesseract is an optical character recognition (OCR) tool.. Apache-2
  • EasyOCR (🥇28 · ⭐ 11K) – Ready-to-use OCR with 80+ supported languages and all popular writing.. Apache-2
  • OCRmyPDF (🥈27 · ⭐ 4K) – OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to.. MPL-2.0
  • tesserocr (🥈26 · ⭐ 1.4K) – A Python wrapper for the tesseract-ocr API. MIT
  • PaddleOCR (🥈24 · ⭐ 11K) – Awesome multilingual OCR toolkits based on PaddlePaddle.. Apache-2
  • attention-ocr (🥉21 · ⭐ 840) – A Tensorflow model for text recognition (CNN + seq2seq with.. MIT
  • keras-ocr (🥉20 · ⭐ 780) – A packaged and flexible version of the CRAFT text detector and.. MIT
  • calamari (🥉19 · ⭐ 790) – Line based ATR Engine based on OCRopy. Apache-2
  • doc2text (🥉18 · ⭐ 1.2K) – Detect text blocks and OCR poorly scanned PDFs in bulk. Python module.. MIT
  • Mozart (🥉10 · ⭐ 240 · 🐣) – An optical music recognition (OMR) system. Converts sheet.. Apache-2

Data Containers & Structures

General-purpose data containers & structures as well as utilities & extensions for pandas.

  • pandas (🥇40 · ⭐ 29K) – Flexible and powerful data analysis / manipulation library for.. BSD-3
  • numpy (🥇38 · ⭐ 17K) – The fundamental package for scientific computing with Python. BSD-3
  • h5py (🥇36 · ⭐ 1.5K) – HDF5 for Python – The h5py package is a Pythonic interface to the HDF5.. BSD-3
  • Arrow (🥈35 · ⭐ 7.5K) – Apache Arrow is a cross-language development platform for in-memory.. Apache-2
  • xarray (🥈32 · ⭐ 2K) – N-D labeled arrays and datasets in Python. Apache-2
  • numexpr (🥈31 · ⭐ 1.6K) – Fast numerical array expression evaluator for Python, NumPy, PyTables,.. MIT
  • TinyDB (🥈29 · ⭐ 4.1K) – TinyDB is a lightweight document oriented database optimized for your.. MIT
  • Koalas (🥈29 · ⭐ 2.7K) – Koalas: pandas API on Apache Spark. Apache-2
  • Bottleneck (🥈29 · ⭐ 580) – Fast NumPy array functions written in C. BSD-2
  • Modin (🥈28 · ⭐ 5.8K) – Modin: Speed up your Pandas workflows by changing a single line of.. Apache-2
  • PyTables (🥈28 · ⭐ 1K) – A Python package to manage extremely large amounts of data. BSD-3
  • datasketch (🥉27 · ⭐ 1.4K) – MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog,.. MIT
  • zarr (🥉26 · ⭐ 660) – An implementation of chunked, compressed, N-dimensional arrays for Python. MIT
  • bcolz (🥉25 · ⭐ 910) – A columnar data container that can be compressed. BSD-3
  • Arctic (🥉24 · ⭐ 2.2K) – Arctic is a high performance datastore for numeric data. ❗️LGPL-2.1
  • swifter (🥉24 · ⭐ 1.6K) – A package which efficiently applies any function to a pandas.. MIT
  • Pandaral·lel (🥉24 · ⭐ 1.4K) – A simple and efficient tool to parallelize Pandas.. BSD-3
  • Vaex (🥉23 · ⭐ 5.9K) – Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualize and.. MIT
  • datatable (🥉21 · ⭐ 1.2K) – A Python package for manipulating 2-dimensional tabular data.. MPL-2.0
  • StaticFrame (🥉21 · ⭐ 220) – Immutable and grow-only Pandas-like DataFrames with a more explicit.. MIT
  • fletcher (🥉20 · ⭐ 210) – Pandas ExtensionDType/Array backed by Apache Arrow. MIT
  • Bounter (🥉17 · ⭐ 900 · 💤) – Efficient Counter that uses a limited (bounded) amount of memory.. MIT
  • PandaPy (🥉14 · ⭐ 470) – PandaPy has the speed of NumPy and the usability of Pandas 10x to 50x.. MIT

Data Loading & Extraction

Libraries for loading, collecting, and extracting data from a variety of data sources and formats.

  • Faker (🥇36 · ⭐ 12K) – Faker is a Python package that generates fake data for you. MIT
  • xlrd (🥇34 · ⭐ 1.9K) – Please use openpyxl where you can… BSD-3
  • xmltodict (🥇32 · ⭐ 4.3K · 💤) – Python module that makes working with XML feel like you are.. MIT
  • TensorFlow Datasets (🥇32 · ⭐ 2.7K) – TFDS is a collection of datasets ready to use with.. Apache-2
  • python-magic (🥇32 · ⭐ 1.8K) – A python wrapper for libmagic. MIT
  • Tablib (🥈31 · ⭐ 3.9K) – Python Module for Tabular Datasets in XLS, CSV, JSON, YAML, &c. MIT
  • smart-open (🥈30 · ⭐ 2K) – Utils for streaming large files (S3, HDFS, gzip, bz2…). MIT
  • Datasets (🥈29 · ⭐ 6.9K) – The largest hub of ready-to-use NLP datasets for ML models with.. Apache-2
  • pandas-datareader (🥈29 · ⭐ 1.9K) – Extract data from a wide range of Internet sources.. BSD-3
  • snorkel (🥉28 · ⭐ 4.5K · 📈) – A system for quickly generating training data with weak.. Apache-2
  • csvkit (🥉28 · ⭐ 4.5K) – A suite of utilities for converting to and working with CSV, the king of.. MIT
  • tabulator-py (🥉26 · ⭐ 200) – Python library for reading and writing tabular data via streams. MIT
  • Intake (🥉25 · ⭐ 530) – Intake is a lightweight package for finding, investigating, loading and.. BSD-2
  • SDV (🥉21 · ⭐ 360) – Synthetic Data Generation for tabular, relational and time series data. MIT
  • datatest (🥉21 · ⭐ 240) – Tools for test driven data-wrangling and data validation. Apache-2

Web Scraping & Crawling

Libraries for web scraping, crawling, downloading, and mining.

  • 🔗 best-of-web-python – Web Scraping (⭐ 1.1K · 🐣) – Collection of web-scraping and crawling libraries.

Data Pipelines & Streaming

Libraries for data batch- and stream-processing, workflow automation, job scheduling, and other data pipeline tasks.

  • Celery (🥇39 · ⭐ 17K · 📈) – Asynchronous task queue/job queue based on distributed message passing. BSD-3
  • Airflow (🥇36 · ⭐ 21K · 📈) – Platform to programmatically author, schedule, and monitor.. Apache-2
  • joblib (🥇35 · ⭐ 2.4K) – Computing with Python functions. BSD-3
  • rq (🥇33 · ⭐ 7.6K) – Simple job queues for Python. BSD-3
  • luigi (🥈32 · ⭐ 14K) – Luigi is a Python module that helps you build complex pipelines of batch.. Apache-2
  • Beam (🥈32 · ⭐ 4.6K) – Unified programming model to define and execute data processing.. Apache-2
  • Prefect (🥈30 · ⭐ 6K) – The easiest way to automate your data. Apache-2
  • dbt (🥈29 · ⭐ 2.7K) – dbt (data build tool) enables data analysts and engineers to transform.. Apache-2
  • faust (🥈28 · ⭐ 5.4K) – Python Stream Processing. BSD-3
  • Kedro (🥈28 · ⭐ 3.6K) – A Python framework for creating reproducible, maintainable and modular.. Apache-2
  • Dagster (🥈27 · ⭐ 3K) – A data orchestrator for machine learning, analytics, and ETL. Apache-2
  • mrjob (🥈27 · ⭐ 2.5K) – Run MapReduce jobs on Hadoop or Amazon Web Services. Apache-2
  • petl (🥈27 · ⭐ 860) – Python Extract Transform and Load Tables of Data. MIT
  • PyFunctional (🥈26 · ⭐ 1.8K) – Python library for creating data pipelines with chain functional.. MIT
  • Hub (🥉25 · ⭐ 2.7K) – Fastest unstructured dataset management for TensorFlow/PyTorch… MPL-2.0
  • TFX (🥉25 · ⭐ 1.4K) – TFX is an end-to-end platform for deploying production ML pipelines. Apache-2
  • Great Expectations (🥉24 · ⭐ 3.9K) – Always know what to expect from your data. Apache-2
  • streamparse (🥉23 · ⭐ 1.4K) – Run Python in Apache Storm topologies. Pythonic API, CLI.. Apache-2
  • bonobo (🥉23 · ⭐ 1.4K) – Extract Transform Load for Python 3.5+. Apache-2
  • Optimus (🥉23 · ⭐ 980) – Agile Data Preparation Workflows made easy with dask, cudf,.. Apache-2
  • pysparkling (🥉23 · ⭐ 230) – A pure Python implementation of Apache Spark’s RDD and DStream.. MIT
  • Pypeline (🥉22 · ⭐ 1.2K) – Concurrent data pipelines in Python. MIT
  • dpark (🥉20 · ⭐ 2.6K) – Python clone of Spark, a MapReduce alike framework in Python. BSD-3
  • mrq (🥉20 · ⭐ 840) – Mr. Queue – A distributed worker task queue in Python using Redis & gevent. MIT
  • pdpipe (🥉20 · ⭐ 590) – Easy pipelines for pandas DataFrames. MIT
  • ploomber (🥉20 · ⭐ 210) – A convention over configuration workflow orchestrator. Develop.. Apache-2
  • spark-deep-learning (🥉18 · ⭐ 1.8K) – Deep Learning Pipelines for Apache Spark. Apache-2
  • Mara Pipelines (🥉18 · ⭐ 1.6K) – A lightweight opinionated ETL framework, halfway between plain.. MIT
  • TaskTiger (🥉18 · ⭐ 1K) – Python task queue using Redis. MIT
  • Databolt Flow (🥉18 · ⭐ 900) – Python library for building highly effective data science workflows. MIT
  • BatchFlow (🥉18 · ⭐ 160) – BatchFlow helps you conveniently work with random or sequential.. Apache-2
  • flupy (🥉18 · ⭐ 150) – Fluent data pipelines for python and your shell. MIT
  • riko (🥉17 · ⭐ 1.6K · 💤) – A Python stream processing engine modeled after Yahoo! Pipes. MIT
  • zenml (🥉14 · ⭐ 900 · 🐣) – ZenML: Bring Zen to your ML with reproducible pipelines. Apache-2
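
The extract-transform-load pattern that many of these tools orchestrate can be sketched with plain Python generators. The example below is a toy under stated assumptions: the stage names `extract`, `transform`, and `load`, the `user`/`amount` columns, and the inline CSV text are all hypothetical, and none of it reflects the API of a listed library.

```python
import csv
import io


def extract(csv_text):
    """Yield one dict per CSV row; lazy, so large inputs are streamed."""
    yield from csv.DictReader(io.StringIO(csv_text))


def transform(rows):
    """Skip rows with a missing amount and convert the amount to float."""
    for row in rows:
        if row["amount"]:
            yield {"user": row["user"], "amount": float(row["amount"])}


def load(rows):
    """Sum amounts per user; stands in for writing to a real sink."""
    totals = {}
    for row in rows:
        totals[row["user"]] = totals.get(row["user"], 0.0) + row["amount"]
    return totals


# Hypothetical input: bob's first row has no amount and is dropped.
RAW = "user,amount\nann,10.5\nbob,\nann,4.5\nbob,2\n"
totals = load(transform(extract(RAW)))
```

Because each stage is a generator, rows flow through one at a time; schedulers and stream processors add retries, parallelism, and persistence on top of this basic shape.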

Distributed Machine Learning

Libraries that provide capabilities to distribute and parallelize machine learning tasks across large-scale compute infrastructure.

  • Ray (🥇32 · ⭐ 15K) – An open source framework that provides a simple, universal API for.. Apache-2
  • dask (🥇32 · ⭐ 8K · 📉) – Parallel computing with task scheduling. BSD-3
  • dask.distributed (🥇31 · ⭐ 1.2K · 📉) – A distributed task scheduler for Dask. BSD-3
  • horovod (🥈29 · ⭐ 11K) – Distributed training framework for TensorFlow, Keras, PyTorch, and.. Apache-2
  • ipyparallel (🥈28 · ⭐ 1.9K) – Interactive Parallel Computing in Python. BSD-3
  • Mesh (🥈26 · ⭐ 910) – Mesh TensorFlow: Model Parallelism Made Easier. Apache-2
  • BigDL (🥈25 · ⭐ 3.7K) – BigDL: Distributed Deep Learning Framework for Apache Spark. Apache-2
  • Elephas (🥈25 · ⭐ 1.5K) – Distributed Deep learning with Keras & Spark. MIT
  • petastorm (🥈25 · ⭐ 1.1K) – Petastorm library enables single machine or distributed training.. Apache-2
  • mpi4py (🥈25 · ⭐ 390) – Python bindings for MPI. BSD-3
  • DeepSpeed (🥉24 · ⭐ 4.5K) – DeepSpeed is a deep learning optimization library that makes.. MIT
  • TensorFlowOnSpark (🥉24 · ⭐ 3.6K) – TensorFlowOnSpark brings TensorFlow programs to.. Apache-2
  • dask-ml (🥉24 · ⭐ 690) – Scalable Machine Learning with Dask. BSD-3
  • MMLSpark (🥉23 · ⭐ 2.3K) – Microsoft Machine Learning for Apache Spark. MIT
  • analytics-zoo (🥉22 · ⭐ 2.2K) – Distributed Tensorflow, Keras and PyTorch on Apache.. Apache-2
  • FairScale (🥉21 · ⭐ 850) – PyTorch extensions for high performance and large scale training. BSD-3
  • Submit it (🥉21 · ⭐ 310) – Python 3.6+ toolbox for submitting jobs to Slurm. MIT
  • Apache Singa (🥉19 · ⭐ 2.2K) – a distributed deep learning platform. Apache-2
  • BytePS (🥉18 · ⭐ 2.7K) – A high performance and generic framework for distributed DNN training. Apache-2
  • Fiber (🥉18 · ⭐ 860) – Distributed Computing for AI Made Simple. Apache-2
  • Hivemind (🥉18 · ⭐ 660) – Decentralized deep learning in PyTorch. Built to train models on.. MIT
  • sk-dist (🥉18 · ⭐ 260) – Distributed scikit-learn meta-estimators in PySpark. Apache-2
  • somoclu (🥉18 · ⭐ 220 · 💤) – Massively parallel self-organizing maps: accelerate training on.. MIT

Hyperparameter Optimization & AutoML

Libraries for hyperparameter optimization, AutoML, and neural architecture search.

  • Optuna (🥇31 · ⭐ 4.2K) – A hyperparameter optimization framework. MIT
  • Hyperopt (🥇30 · ⭐ 5.5K) – Distributed Asynchronous Hyperparameter Optimization in Python. BSD-3
  • scikit-optimize (🥇29 · ⭐ 2.1K) – Sequential model-based optimization with a `scipy.optimize`.. BSD-3
  • Keras Tuner (🥇28 · ⭐ 2.3K) – Hyperparameter tuning for humans. Apache-2
  • AutoKeras (🥈27 · ⭐ 7.8K) – AutoML library for deep learning. Apache-2
  • Bayesian Optimization (🥈27 · ⭐ 4.9K) – A Python implementation of global optimization with.. MIT
  • NNI (🥈26 · ⭐ 9.3K) – An open source AutoML toolkit for automate machine learning lifecycle,.. MIT
  • auto-sklearn (🥈26 · ⭐ 5.3K) – Automated Machine Learning with scikit-learn. BSD-3
  • AutoGluon (🥈26 · ⭐ 3K) – AutoGluon: AutoML for Text, Image, and Tabular Data. Apache-2
  • nevergrad (🥈26 · ⭐ 2.8K) – A Python toolbox for performing gradient-free optimization. MIT
  • BoTorch (🥈26 · ⭐ 1.9K) – Bayesian optimization in PyTorch. MIT
  • SMAC3 (🥈26 · ⭐ 560) – Sequential Model-based Algorithm Configuration. BSD-3
  • featuretools (🥈25 · ⭐ 5.4K) – An open source python library for automated feature engineering. BSD-3
  • Ax (🥈25 · ⭐ 1.4K) – Adaptive Experimentation Platform. MIT
  • Hyperas (🥈23 · ⭐ 2.1K) – Keras + Hyperopt: A very simple wrapper for convenient.. MIT
  • GPyOpt (🥈23 · ⭐ 720) – Gaussian Process Optimization using GPy. BSD-3
  • Talos (🥈22 · ⭐ 1.4K) – Hyperparameter Optimization for TensorFlow, Keras and PyTorch. MIT
  • Orion (🥈22 · ⭐ 180) – Asynchronous Distributed Hyperparameter Optimization. BSD-3
  • AdaNet (🥉21 · ⭐ 3.2K · 💤) – Fast and flexible AutoML with learning guarantees. Apache-2
  • mljar-supervised (🥉21 · ⭐ 950) – Automates Machine Learning Pipeline with Feature Engineering.. MIT
  • Neuraxle (🥉21 · ⭐ 380) – A Sklearn-like Framework for Hyperparameter Tuning and AutoML in.. Apache-2
  • lazypredict (🥉20 · ⭐ 400) – Lazy Predict help build a lot of basic models without much code.. MIT
  • optunity (🥉20 · ⭐ 360 · 💤) – optimization routines for hyperparameter tuning. BSD-3
  • Auto ViML (🥉20 · ⭐ 220) – Automatically Build Multiple ML Models with a Single Line of Code… Apache-2
  • Test Tube (🥉19 · ⭐ 660 · 💤) – Python library to easily log experiments and parallelize.. MIT
  • Dragonfly (🥉17 · ⭐ 570 · 💤) – An open source python library for scalable Bayesian optimisation. MIT
  • HyperparameterHunter (🥉16 · ⭐ 650) – Easy hyperparameter optimization and automatic result.. MIT
  • AlphaPy (🥉16 · ⭐ 560) – Automated Machine Learning [AutoML] with Python, scikit-learn, Keras,.. Apache-2
  • Parfit (🥉15 · ⭐ 200 · 💤) – A package for parallelizing the fit and flexibly scoring of.. MIT
  • ENAS (🥉13 · ⭐ 2.4K · 💤) – PyTorch implementation of Efficient Neural Architecture Search via.. Apache-2
  • Devol (🥉11 · ⭐ 920 · 💤) – Genetic neural architecture search with Keras. MIT
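
The simplest baseline that these optimizers improve upon is random search: sample configurations, score each, keep the best. The sketch below shows that loop with the standard library only. The names `random_search` and `toy_objective`, the `lr` and `reg` parameters, and the quadratic stand-in for a validation loss are assumptions for illustration, not the interface of any library above.

```python
import random


def random_search(objective, space, n_trials=200, seed=0):
    """Minimize `objective` by sampling each parameter uniformly
    from its (low, high) range and keeping the best sample."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    # Hypothetical stand-in for a validation loss; minimum at lr=0.1, reg=0.01.
    return (params["lr"] - 0.1) ** 2 + (params["reg"] - 0.01) ** 2


SPACE = {"lr": (0.0, 1.0), "reg": (0.0, 0.1)}
best, best_score = random_search(toy_objective, SPACE, n_trials=500)
```

Bayesian and model-based methods replace the uniform sampling step with a proposal informed by earlier trials, which usually needs far fewer evaluations when each one is an expensive training run.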

Reinforcement Learning

Libraries for building and evaluating reinforcement learning & agent-based systems.

  • OpenAI Gym (🥇35 · ⭐ 24K) – A toolkit for developing and comparing reinforcement learning.. MIT
  • Dopamine (🥇27 · ⭐ 9.3K) – Dopamine is a research framework for fast prototyping of.. Apache-2
  • TensorLayer (🥇27 · ⭐ 6.5K) – Deep Learning and Reinforcement Learning Library for.. Apache-2
  • TF-Agents (🥇27 · ⭐ 1.8K) – TF-Agents: A reliable, scalable and easy to use TensorFlow.. Apache-2
  • TensorForce (🥈25 · ⭐ 2.9K) – Tensorforce: a TensorFlow library for applied.. Apache-2
  • ViZDoom (🥈25 · ⭐ 1.2K) – Doom-based AI Research Platform for Reinforcement Learning from Raw.. MIT
  • Stable Baselines (🥈24 · ⭐ 3K) – A fork of OpenAI Baselines, implementations of reinforcement.. MIT
  • Acme (🥉23 · ⭐ 2K) – A library of reinforcement learning components and agents. Apache-2
  • garage (🥉22 · ⭐ 1.1K) – A toolkit for reproducible reinforcement learning research. MIT
  • ChainerRL (🥉22 · ⭐ 930) – ChainerRL is a deep reinforcement learning library built on top of.. MIT
  • PARL (🥉21 · ⭐ 1.9K) – A high-performance distributed training framework for Reinforcement.. Apache-2
  • TRFL (🥉19 · ⭐ 3.1K · 💤) – TensorFlow Reinforcement Learning. Apache-2
  • Coach (🥉19 · ⭐ 1.9K) – Reinforcement Learning Coach by Intel AI Lab enables easy.. Apache-2
  • PFRL (🥉19 · ⭐ 530) – PFRL: a PyTorch-based deep reinforcement learning library. MIT
  • ReAgent (🥉17 · ⭐ 2.8K) – A platform for Reasoning systems (Reinforcement Learning,.. BSD-3
  • RLax (🥉17 · ⭐ 570) – A library of reinforcement learning building blocks in JAX. Apache-2

Recommender Systems

Libraries for building and evaluating recommendation systems.

  • lightfm (🥇27 · ⭐ 3.5K) – A Python implementation of LightFM, a hybrid recommendation algorithm. Apache-2
  • implicit (🥇27 · ⭐ 2.3K) – Fast Python Collaborative Filtering for Implicit Feedback Datasets. MIT
  • scikit-surprise (🥈26 · ⭐ 4.7K · 💤) – A Python scikit for building and analyzing recommender.. BSD-3
  • TF Ranking (🥈22 · ⭐ 2.1K) – Learning to Rank in TensorFlow. Apache-2
  • Cornac (🥈22 · ⭐ 310) – A Comparative Framework for Multimodal Recommender Systems. Apache-2
  • Recommenders (🥈21 · ⭐ 9.3K) – Best Practices on Recommendation Systems. MIT
  • fastFM (🥉20 · ⭐ 910 · 💤) – fastFM: A Library for Factorization Machines. BSD-3
  • RecBole (🥉20 · ⭐ 770) – A unified, comprehensive and efficient recommendation library. MIT
  • TF Recommenders (🥉19 · ⭐ 750) – TensorFlow Recommenders is a library for building.. Apache-2
  • recmetrics (🥉18 · ⭐ 240) – A library of metrics for evaluating recommender systems. MIT
  • Case Recommender (🥉16 · ⭐ 320 · 💤) – Case Recommender: A Flexible and Extensible Python.. MIT
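
Several entries above are built around collaborative filtering. A minimal item-based variant can be sketched in pure Python: represent each item by the ratings users gave it, compare items with cosine similarity, and score unseen items by similarity to what the user already rated. The `RATINGS` data and all function names here are hypothetical, and the listed libraries use far more scalable algorithms than this toy.

```python
import math

# Hypothetical toy ratings: user -> {item: rating}
RATINGS = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5},
    "cat": {"A": 1, "C": 5, "D": 4},
    "dan": {"C": 4, "D": 5},
}


def item_vectors(ratings):
    """Invert user->item ratings into item->user rating vectors."""
    vectors = {}
    for user, items in ratings.items():
        for item, rating in items.items():
            vectors.setdefault(item, {})[user] = rating
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def recommend(user, ratings, top_n=1):
    """Rank unseen items by similarity-weighted ratings of the user's seen items."""
    vectors = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate in vectors.keys() - seen.keys():
        scores[candidate] = sum(
            cosine(vectors[candidate], vectors[item]) * rating for item, rating in seen.items()
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

With this data, `recommend("bob", RATINGS)` favours item C over D because C shares raters with both items bob has rated, while D shares a rater with only one of them.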

Privacy Machine Learning

Libraries for encrypted and privacy-preserving machine learning using methods like federated learning & differential privacy.

  • PySyft (🥇26 · ⭐ 6.9K) – A library for answering questions using data you cannot see. Apache-2
  • Opacus (🥈22 · ⭐ 760) – Training PyTorch models with differential privacy. Apache-2
  • FATE (🥈20 · ⭐ 2.8K) – An Industrial Grade Federated Learning Framework. Apache-2
  • TensorFlow Privacy (🥈20 · ⭐ 1.4K) – Library for training machine learning models with.. Apache-2
  • TFEncrypted (🥈20 · ⭐ 830 · 💤) – A Framework for Encrypted Machine Learning in TensorFlow. Apache-2
  • CrypTen (🥉16 · ⭐ 730) – A framework for Privacy Preserving Machine Learning. MIT
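
Differential privacy, named in the category description, is commonly introduced through the Laplace mechanism: clip each value to a known range, compute the statistic, and add noise scaled to how much one record could change it. The sketch below is a teaching example under stated assumptions: the function names are hypothetical, it is unrelated to the APIs of the libraries above, and Python's `random` module with floating-point noise is not suitable for real privacy guarantees.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_mean(values, lower, upper, epsilon, rng=None):
    """Release the mean of `values` with the Laplace mechanism.

    Values are clipped to [lower, upper], so replacing one record moves
    the mean by at most (upper - lower) / n; that bound is the sensitivity.
    """
    if rng is None:
        rng = random.Random()
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / len(clipped)
    true_mean = sum(clipped) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical ages; a smaller epsilon means more noise and stronger privacy.
ages = [23, 35, 41, 29, 52]
noisy_mean = private_mean(ages, lower=0, upper=100, epsilon=1.0, rng=random.Random(7))
```

The clipping step matters as much as the noise: without a bounded range the sensitivity is unbounded and no finite noise scale gives a guarantee.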

Workflow & Experiment Tracking

Libraries to organize, track, and visualize machine learning experiments.

  • Tensorboard (🥇36 · ⭐ 5.2K) – TensorFlow’s Visualization Toolkit. Apache-2
  • mlflow (🥇32 · ⭐ 8.6K) – Open source platform for the machine learning lifecycle. Apache-2
  • DVC (🥇30 · ⭐ 7.5K) – Data Version Control | Git for Data & Models. Apache-2
  • wandb client (🥇30 · ⭐ 2.8K) – A tool for visualizing and tracking your machine learning.. MIT
  • SageMaker SDK (🥇30 · ⭐ 1.3K) – A library for training and deploying machine learning.. Apache-2
  • kaggle (🥈29 · ⭐ 3.9K) – Official Kaggle API. Apache-2
  • AzureML SDK (🥈29 · ⭐ 2.2K) – Python notebooks with ML and deep learning examples with Azure.. MIT
  • snakemake (🥈29 · ⭐ 880) – This is the development home of the workflow management system.. MIT
  • tensorboardX (🥈28 · ⭐ 6.8K) – tensorboard for pytorch (and chainer, mxnet, numpy, …). MIT
  • sacred (🥈28 · ⭐ 3.3K) – Sacred is a tool to help you configure, organize, log and reproduce.. MIT
  • PyCaret (🥈28 · ⭐ 3K) – An open-source, low-code machine learning library in Python. MIT
  • Metaflow (🥈26 · ⭐ 4.2K) – Build and manage real-life data science projects with ease. Apache-2
  • Catalyst (🥈26 · ⭐ 2.5K) – Accelerated deep learning R&D. Apache-2
  • VisualDL (🥈24 · ⭐ 3.9K) – Deep Learning Visualization Toolkit. Apache-2
  • ClearML (🥈24 · ⭐ 2.2K) – ClearML – Auto-Magical Suite of tools to streamline your ML.. Apache-2
  • TNT (🥈24 · ⭐ 1.3K) – Simple tools for logging and visualizing, loading and training. BSD-3
  • livelossplot (🥈24 · ⭐ 1K) – Live training loss plot in Jupyter Notebook for Keras, PyTorch.. MIT
  • ml-metadata (🥈24 · ⭐ 290) – For recording and retrieving metadata associated with ML.. Apache-2
  • TensorWatch (🥉22 · ⭐ 3K) – Debugging, monitoring and visualization for Python Machine Learning.. MIT
  • knockknock (🥉22 · ⭐ 2K · 💤) – Knock Knock: Get notified when your training ends with only two.. MIT
  • lore (🥉21 · ⭐ 1.5K · 💤) – Lore makes machine learning approachable for Software Engineers and.. MIT
  • Guild AI (🥉21 · ⭐ 550) – Experiment tracking, ML developer tools. Apache-2
  • Studio.ml (🥉21 · ⭐ 370) – Studio: Simplify and expedite model building process. Apache-2
  • quinn (🥉21 · ⭐ 220) – pyspark methods to enhance developer productivity. Apache-2
  • hiddenlayer (🥉20 · ⭐ 1.4K · 💤) – Neural network graphs and training metrics for.. MIT
  • Labml (🥉20 · ⭐ 500) – Monitor deep learning model training and hardware usage from your mobile.. MIT
  • gokart (🥉19 · ⭐ 170) – A wrapper of the data pipeline library luigi. MIT
  • aim (🥉15 · ⭐ 880) – Aim a super-easy way to record, search and compare 1000s of ML training.. Apache-2

Model Serialization & Conversion

Libraries to serialize models to files, convert between a variety of model formats, and optimize models for deployment.

  • onnx (🥇33 · ⭐ 9.9K) – Open standard for machine learning interoperability. Apache-2
  • Core ML Tools (🥇26 · ⭐ 2.1K) – Core ML tools contain supporting tools for Core ML model.. BSD-3
  • TorchServe (🥈24 · ⭐ 1.6K) – Model Serving on PyTorch. Apache-2
  • mmdnn (🥈23 · ⭐ 5.3K · 💤) – MMdnn is a set of tools to help users inter-operate among different deep.. MIT
  • cortex (🥈21 · ⭐ 7.4K) – Model serving at scale. Apache-2
  • m2cgen (🥈21 · ⭐ 1.8K) – Transform ML models into a native code (Java, C, Python, Go, JavaScript,.. MIT
  • Hummingbird (🥉20 · ⭐ 2.3K) – Hummingbird compiles trained ML models into tensor computation for.. MIT
  • pytorch2keras (🥉18 · ⭐ 670 · 💤) – PyTorch to Keras model convertor. MIT
  • tfdeploy (🥉16 · ⭐ 350) – Deploy tensorflow graphs for fast evaluation and export to.. BSD-3

Model Interpretability

Libraries to visualize, explain, debug, evaluate, and interpret machine learning models.

  • shap (🥇34 · ⭐ 12K) – A game theoretic approach to explain the output of any machine learning model. MIT
  • Lime (🥇29 · ⭐ 8.5K) – Lime: Explaining the predictions of any machine learning classifier. BSD-2
  • pyLDAvis (🥇28 · ⭐ 1.4K) – Python library for interactive topic model visualization. Port of.. BSD-3
  • InterpretML (🥇27 · ⭐ 3.5K) – Fit interpretable models. Explain blackbox machine learning. MIT
  • Model Analysis (🥇27 · ⭐ 1K) – Model analysis tools for TensorFlow. Apache-2
  • yellowbrick (🥈25 · ⭐ 3.1K) – Visual analysis and diagnostic tools to facilitate machine.. Apache-2
  • Captum (🥈25 · ⭐ 2.2K) – Model interpretability and understanding for PyTorch. BSD-3
  • dtreeviz (🥈25 · ⭐ 1.4K) – A python library for decision tree visualization and model interpretation. MIT
  • Fairness 360 (🥈25 · ⭐ 1.2K) – A comprehensive set of fairness metrics for datasets and.. Apache-2
  • arviz (🥈25 · ⭐ 960) – Exploratory analysis of Bayesian models with Python. Apache-2
  • Lucid (🥈24 · ⭐ 4.1K) – A collection of infrastructure and tools for research in neural.. Apache-2
  • DoWhy (🥈24 · ⭐ 2.7K) – DoWhy is a Python library for causal inference that supports explicit.. MIT
  • keras-vis (🥈23 · ⭐ 2.8K · 💤) – Neural network visualization toolkit for keras. MIT
  • TreeInterpreter (🥈23 · ⭐ 650) – Package for interpreting scikit-learn’s decision tree.. BSD-3
  • Alibi (🥈22 · ⭐ 910) – Algorithms for monitoring and explaining machine learning models. Apache-2
  • keract (🥈22 · ⭐ 860) – Activation Maps (Layers Outputs) and Gradients in Keras. MIT
  • random-forest-importances (🥈22 · ⭐ 420) – Code to compute permutation and drop-column.. MIT
  • Explainability 360 (🥉21 · ⭐ 780) – Interpretability and explainability of data and machine.. Apache-2
  • iNNvestigate (🥉21 · ⭐ 780) – A toolbox to iNNvestigate neural networks’ predictions!. BSD-2
  • tf-explain (🥉21 · ⭐ 780) – Interpretability Methods for tf.keras models with Tensorflow 2.x. MIT
  • fairlearn (🥉21 · ⭐ 710) – A Python package to assess and improve fairness of machine.. MIT
  • aequitas (🥉21 · ⭐ 360) – Bias and Fairness Audit Toolkit. MIT
  • explainerdashboard (🥉20 · ⭐ 370) – Quickly build Explainable AI dashboards that show the inner.. MIT
  • checklist (🥉19 · ⭐ 1.3K) – Beyond Accuracy: Behavioral Testing of NLP models with CheckList. MIT
  • CausalNex (🥉19 · ⭐ 1K) – A Python library that helps data scientists to infer.. Apache-2
  • deeplift (🥉19 · ⭐ 510) – Public facing deeplift repo. MIT
  • What-If Tool (🥉19 · ⭐ 460) – Source code/webpage/demos for the What-If Tool. Apache-2
  • sklearn-evaluation (🥉19 · ⭐ 290) – Machine learning model evaluation made easy: plots,.. MIT
  • tcav (🥉18 · ⭐ 440) – Code for the TCAV ML interpretability project. Apache-2
  • fairness-indicators (🥉18 · ⭐ 180) – Tensorflow’s Fairness Evaluation and Visualization.. Apache-2
  • LIT (🥉17 · ⭐ 2.4K) – The Language Interpretability Tool: Interactively analyze NLP models for.. Apache-2
  • ExplainX.ai (🥉17 · ⭐ 190) – Explainable AI framework for data scientists. Explain & debug any.. MIT
  • imodels (🥉17 · ⭐ 190) – Interpretable ML package for concise, transparent, and accurate predictive.. MIT
  • DiCE (🥉16 · ⭐ 480) – Generate Diverse Counterfactual Explanations for any machine.. MIT
  • LOFO (🥉16 · ⭐ 310 · 💤) – Leave One Feature Out Importance. MIT
  • model-card-toolkit (🥉16 · ⭐ 180) – a tool that leverages rich metadata and lineage.. Apache-2
  • FlashTorch (🥉15 · ⭐ 560 · 💤) – Visualization toolkit for neural networks in PyTorch! Demo –. MIT
  • Anchor (🥉14 · ⭐ 630) – Code for High-Precision Model-Agnostic Explanations paper. BSD-2
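
Several entries in this list mention permutation-based feature importance. As a rough illustration of the idea only (not the API of any library listed here), the sketch below shuffles one feature column at a time and measures how much a model's accuracy drops. The `toy_model`, the data, and all function names are hypothetical and made up for this example.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    correct = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return correct / len(rows)


def permutation_importance(model, rows, labels, seed=0):
    """Accuracy drop observed when each feature column is shuffled in turn."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    baseline = accuracy(model, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        shuffled = [row[col] for row in rows]
        rng.shuffle(shuffled)
        # Rebuild the rows with only this one column permuted.
        permuted = [row[:col] + [value] + row[col + 1:]
                    for row, value in zip(rows, shuffled)]
        importances.append(baseline - accuracy(model, permuted, labels))
    return importances


def toy_model(row):
    """Hypothetical classifier that looks only at the first feature."""
    return 1 if row[0] > 0.5 else 0


# Made-up data: the label depends on feature 0; feature 1 is noise.
rows = [[0.1, 5.0], [0.9, 3.0], [0.2, 8.0], [0.8, 1.0], [0.3, 7.0], [0.7, 2.0]]
labels = [0, 1, 0, 1, 0, 1]
importances = permutation_importance(toy_model, rows, labels)
print(importances)
```

Because the toy model ignores the second feature, shuffling it cannot change any prediction, so its importance is exactly zero. The libraries above add repeated shuffles, other metrics, and model-specific handling that this sketch leaves out.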

Vector Similarity Search (ANN)

Libraries for Approximate Nearest Neighbor Search and Vector Indexing/Similarity Search.

🔗 ANN Benchmarks (⭐ 2.1K) – Benchmarks of approximate nearest neighbor libraries in Python.

  • Faiss (🥇29 · ⭐ 13K) – A library for efficient similarity search and clustering of dense vectors. MIT
  • Annoy (🥇29 · ⭐ 8.2K) – Approximate Nearest Neighbors in C++/Python optimized for memory usage.. Apache-2
  • NMSLIB (🥈28 · ⭐ 2.3K) – Non-Metric Space Library (NMSLIB): An efficient similarity search.. Apache-2
  • hnswlib (🥈26 · ⭐ 1.4K) – Header-only C++/python library for fast approximate nearest neighbors. Apache-2
  • Milvus (🥈25 · ⭐ 5.3K) – An open source embedding vector similarity search engine powered by.. Apache-2
  • PyNNDescent (🥈25 · ⭐ 380) – A Python nearest neighbor descent for approximate nearest neighbors. BSD-2
  • Magnitude (🥉23 · ⭐ 1.4K · 💤) – A fast, efficient universal vector embedding utility package. MIT
  • NGT (🥉19 · ⭐ 630) – Nearest Neighbor Search with Neighborhood Graph and Tree for High-.. Apache-2
  • N2 (🥉19 · ⭐ 460) – TOROS N2 – lightweight approximate Nearest Neighbor library which runs fast.. Apache-2
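
For orientation, the exact search that these libraries approximate can be sketched in a few lines of plain Python. This is a brute-force baseline under assumed toy data, not the interface of any library above; the vectors and the function names `cosine_similarity` and `exact_top_k` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k=2):
    """Score every stored vector and return the k best (index, score) pairs."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings, made up for illustration.
vectors = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [-1.0, 0.0],
]
top = exact_top_k([1.0, 0.1], vectors, k=2)
print(top)
```

The exhaustive scan costs time proportional to the number of stored vectors for every query. Approximate nearest neighbor indexes trade a small loss in recall for much faster lookups on large collections.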

Probabilistics & Statistics

Libraries providing capabilities for probabilistic programming/reasoning, Bayesian inference, Gaussian processes, or statistics.

  • PyMC3 (🥇32 · ⭐ 5.6K) – Probabilistic Programming in Python: Bayesian Modeling and.. Apache-2
  • tensorflow-probability (🥇31 · ⭐ 3.3K) – Probabilistic reasoning and statistical analysis in.. Apache-2
  • hmmlearn (🥇29 · ⭐ 2.2K) – Hidden Markov Models in Python, with scikit-learn like API. BSD-3
  • Pyro (🥈28 · ⭐ 6.8K) – Deep universal probabilistic programming with Python and PyTorch. Apache-2
  • GPyTorch (🥈28 · ⭐ 2.3K) – A highly efficient and modular implementation of Gaussian Processes.. MIT
  • pomegranate (🥈27 · ⭐ 2.6K) – Fast, flexible and easy to use probabilistic modelling in Python. MIT
  • filterpy (🥈27 · ⭐ 1.7K) – Python Kalman filtering and optimal estimation library. Implements.. MIT
  • GPflow (🥈27 · ⭐ 1.4K) – Gaussian processes in TensorFlow. Apache-2
  • pgmpy (🥉25 · ⭐ 1.7K) – Python Library for learning (Structure and Parameter) and inference.. MIT
  • SALib (🥉24 · ⭐ 440) – Sensitivity Analysis Library in Python (Numpy). Contains Sobol, Morris,.. MIT
  • bambi (🥉20 · ⭐ 580) – BAyesian Model-Building Interface (Bambi) in Python. MIT
  • scikit-posthocs (🥉20 · ⭐ 190) – Multiple Pairwise Comparisons (Post Hoc) Tests in Python. MIT
  • Funsor (🥉19 · ⭐ 160) – Functional tensors for probabilistic programming. Apache-2
  • pyhsmm (🥉18 · ⭐ 480 · 💤) – Bayesian inference in HSMMs and HMMs. MIT
  • Orbit (🥉18 · ⭐ 340) – A Python package for Bayesian forecasting with object-oriented design.. Apache-2
  • Baal (🥉17 · ⭐ 320) – Using approximate bayesian posteriors in deep nets for active learning. Apache-2

Adversarial Robustness

Libraries for testing the robustness of machine learning models against attacks with adversarial/malicious examples.

  • CleverHans (🥇27 · ⭐ 5K) – An adversarial example library for constructing attacks, building.. MIT
  • Foolbox (🥇27 · ⭐ 1.8K) – A Python toolbox to create adversarial examples that fool neural networks.. MIT
  • ART (🥈23 · ⭐ 2.1K) – Adversarial Robustness Toolbox (ART) – Python Library for Machine Learning.. MIT
  • TextAttack (🥈23 · ⭐ 1.3K) – TextAttack is a Python framework for adversarial attacks, data.. MIT
  • robustness (🥉18 · ⭐ 490) – A library for experimenting with, training and evaluating neural.. MIT
  • AdvBox (🥉16 · ⭐ 1.1K · 💤) – Advbox is a toolbox to generate adversarial examples that fool.. Apache-2

GPU Utilities

Libraries that require and make use of CUDA/GPU system capabilities to optimize data handling and machine learning tasks.

  • CuPy (🥇31 · ⭐ 4.9K) – A NumPy-compatible array library accelerated by CUDA. MIT
  • gpustat (🥇26 · ⭐ 2.3K) – A simple command-line utility for querying and monitoring GPU status. MIT
  • PyCUDA (🥈25 · ⭐ 1.1K · 📉) – CUDA integration for Python, plus shiny features. MIT
  • Apex (🥈23 · ⭐ 5.1K) – A PyTorch Extension: Tools for easy mixed precision and distributed.. BSD-3
  • ArrayFire (🥈23 · ⭐ 3.3K) – ArrayFire: a general purpose GPU library. BSD-3
  • scikit-cuda (🥈23 · ⭐ 800) – Python interface to GPU-powered libraries. BSD-3
  • cuDF (🥉21 · ⭐ 3.7K) – cuDF – GPU DataFrame Library. Apache-2
  • py3nvml (🥉21 · ⭐ 170 · 💤) – Python 3 Bindings for NVML library. Get NVIDIA GPU status inside.. BSD-3
  • DALI (🥉20 · ⭐ 3.1K) – A library containing both highly optimized building blocks and an.. Apache-2
  • cuML (🥉19 · ⭐ 2K) – cuML – RAPIDS Machine Learning Library. Apache-2
  • BlazingSQL (🥉17 · ⭐ 1.4K) – BlazingSQL is a lightweight, GPU accelerated, SQL engine for.. Apache-2
  • Vulkan Kompute (🥉17 · ⭐ 350) – General purpose GPU compute framework for cross vendor.. Apache-2
  • cuGraph (🥉16 · ⭐ 670) – cuGraph – RAPIDS Graph Analytics Library. Apache-2
  • cuSignal (🥉15 · ⭐ 460) – GPU accelerated signal processing. Apache-2

Tensorflow Utilities

Libraries that extend TensorFlow with additional capabilities.

  • tensorflow-hub (🥇32 · ⭐ 2.8K) – A library for transfer learning by reusing parts of.. Apache-2
  • tensor2tensor (🥇31 · ⭐ 11K) – Library of deep learning models and datasets designed to.. Apache-2
  • TF Addons (🥇31 · ⭐ 1.2K) – Useful extra functionality for TensorFlow 2.x maintained by.. Apache-2
  • TensorFlow Transform (🥈29 · ⭐ 860) – Input pipeline framework. Apache-2
  • TensorFlow I/O (🥈26 · ⭐ 420) – Dataset, streaming, and file system extensions.. Apache-2
  • TF Model Optimization (🥉25 · ⭐ 980) – A toolkit to optimize ML models for deployment for.. Apache-2
  • efficientnet (🥉23 · ⭐ 1.7K) – Implementation of EfficientNet model. Keras and.. Apache-2
  • TensorFlow Cloud (🥉22 · ⭐ 230) – The TensorFlow Cloud repository provides APIs that.. Apache-2
  • Neural Structured Learning (🥉21 · ⭐ 790) – Training neural models with structured signals. Apache-2
  • TensorNets (🥉19 · ⭐ 980) – High level network definitions with pre-trained weights in.. MIT
  • tffm (🥉18 · ⭐ 760 · 💤) – TensorFlow implementation of an arbitrary order Factorization Machine. MIT
  • TF Compression (🥉18 · ⭐ 450) – Data compression in TensorFlow. Apache-2
  • Saliency (🥉17 · ⭐ 640) – TensorFlow implementation for SmoothGrad, Grad-CAM, Guided.. Apache-2

Sklearn Utilities

Libraries that extend scikit-learn with additional capabilities.

  • imbalanced-learn (🥇31 · ⭐ 5.1K) – A Python Package to Tackle the Curse of Imbalanced.. MIT
  • MLxtend (🥇30 · ⭐ 3.4K) – A library of extension and helper modules for Python’s data.. BSD-3
  • category_encoders (🥈24 · ⭐ 1.6K · 💤) – A library of sklearn compatible categorical variable.. BSD-3
  • sklearn-contrib-lightning (🥈24 · ⭐ 1.4K) – Large-scale linear classification, regression and.. BSD-3
  • scikit-opt (🥈22 · ⭐ 2K) – Genetic Algorithm, Particle Swarm Optimization, Simulated.. MIT
  • fancyimpute (🥈22 · ⭐ 940) – Multivariate imputation and matrix completion algorithms.. Apache-2
  • combo (🥈22 · ⭐ 480) – (AAAI’ 20) A Python Toolbox for Machine Learning Model.. BSD-2
  • scikit-lego (🥉20 · ⭐ 440) – Extra blocks for scikit-learn pipelines. MIT
  • DESlib (🥉20 · ⭐ 320) – A Python library for dynamic classifier and ensemble selection. BSD-3
  • iterative-stratification (🥉19 · ⭐ 530) – scikit-learn cross validators for iterative.. BSD-3
  • scikit-tda (🥉19 · ⭐ 270) – Topological Data Analysis for Python. MIT
  • skggm (🥉16 · ⭐ 180) – Scikit-learn compatible estimation of general graphical models. MIT

Pytorch Utilities

Libraries that extend PyTorch with additional capabilities.

  • pretrainedmodels (🥇27 · ⭐ 7.8K · 💤) – Pretrained ConvNets for pytorch: NASNet, ResNeXt,.. BSD-3
  • pytorch-summary (🥇25 · ⭐ 3K · 💤) – Model summary in PyTorch similar to `model.summary()` in.. MIT
  • pytorch-optimizer (🥇25 · ⭐ 1.7K) – torch-optimizer – collection of optimizers for.. Apache-2
  • EfficientNet-PyTorch (🥈24 · ⭐ 5.5K) – A PyTorch implementation of EfficientNet. Apache-2
  • torchdiffeq (🥈24 · ⭐ 3.4K) – Differentiable ODE solvers with full GPU support and.. MIT
  • PML (🥈24 · ⭐ 2.8K) – The easiest way to use deep metric learning in your application. Modular,.. MIT
  • SRU (🥈23 · ⭐ 1.9K) – Training RNNs as Fast as CNNs (https://arxiv.org/abs/1709.02755). MIT
  • Torchmeta (🥈21 · ⭐ 1.2K) – A collection of extensions and data-loaders for few-shot learning.. MIT
  • torch-scatter (🥈21 · ⭐ 610) – PyTorch Extension Library of Optimized Scatter Operations. MIT
  • PyTorch Sparse (🥈21 · ⭐ 360) – PyTorch Extension Library of Optimized Autograd Sparse.. MIT
  • reformer-pytorch (🥈20 · ⭐ 1.4K) – Reformer, the efficient Transformer, in Pytorch. MIT
  • EfficientNets (🥈20 · ⭐ 1.3K) – Pretrained EfficientNet, EfficientNet-Lite, MixNet,.. Apache-2
  • Higher (🥈20 · ⭐ 1.1K) – higher is a pytorch library allowing users to obtain higher.. Apache-2
  • TabNet (🥈20 · ⭐ 860) – PyTorch implementation of TabNet paper :.. MIT
  • Pytorch Toolbelt (🥉19 · ⭐ 940) – PyTorch extensions for fast R&D prototyping and Kaggle.. MIT
  • Performer Pytorch (🥉17 · ⭐ 540 · 🐣) – An implementation of Performer, a linear attention-.. MIT
  • Tensor Sensor (🥉17 · ⭐ 530) – The goal of this library is to generate more helpful.. MIT
  • tinygrad (🥉15 · ⭐ 4.1K · 🐣) – You like pytorch? You like micrograd? You love tinygrad!. MIT
  • Lambda Networks (🥉15 · ⭐ 1.4K · 🐣) – Implementation of LambdaNetworks, a new approach to.. MIT
  • Torch-Struct (🥉15 · ⭐ 910) – Fast, general, and tested differentiable structured prediction.. MIT
  • torchsde (🥉15 · ⭐ 680) – Differentiable SDE solvers with GPU support and efficient.. Apache-2
  • Pywick (🥉15 · ⭐ 320) – High-level batteries-included neural network training library for.. MIT
  • Tez (🥉14 · ⭐ 580 · 🐣) – Tez is a super-simple and lightweight Trainer for PyTorch. It.. Apache-2
  • micrograd (🥉12 · ⭐ 1.6K · 💤) – A tiny scalar-valued autograd engine and a neural net library.. MIT

Database Clients

Libraries for connecting to, operating, and querying databases.

🔗 best-of-python – DB Clients (⭐ 1.5K · 🐣) – Collection of database clients for python.

Others

  • scipy (🥇40 · ⭐ 8K) – Ecosystem of open-source software for mathematics, science, and engineering. BSD-3
  • SymPy (🥇36 · ⭐ 7.9K) – A computer algebra system written in pure Python. BSD-3
  • Autograd (🥇30 · ⭐ 5.2K) – Efficiently computes derivatives of numpy code. MIT
  • hdbscan (🥇29 · ⭐ 1.8K) – A high performance implementation of HDBSCAN clustering. BSD-3
  • PyOD (🥇28 · ⭐ 4.2K) – (JMLR’19) A Python Toolbox for Scalable Outlier Detection (Anomaly.. BSD-2
  • Keras-Preprocessing (🥇28 · ⭐ 920) – Utilities for working with image data, text data, and.. MIT
  • Cython BLIS (🥇28 · ⭐ 160) – Fast matrix-multiplication as a self-contained Python library no.. BSD-3
  • Streamlit (🥈27 · ⭐ 14K) – Streamlit The fastest way to build data apps in Python. Apache-2
  • carla (🥈26 · ⭐ 5.7K) – Open-source simulator for autonomous driving research. MIT
  • Datasette (🥈26 · ⭐ 4.8K) – An open source multi-tool for exploring and publishing data. Apache-2
  • DeepChem (🥈26 · ⭐ 2.8K) – Democratizing Deep-Learning for Drug Discovery, Quantum Chemistry,.. MIT
  • agate (🥈26 · ⭐ 1K) – A Python data analysis library that is optimized for humans instead of machines. MIT
  • pyclustering (🥈26 · ⭐ 800) – pyclustering is a Python, C++ data mining library. BSD-3
  • Trax (🥈25 · ⭐ 5.9K) – Trax Deep Learning with Clear Code and Speed. Apache-2
  • causalml (🥈25 · ⭐ 1.8K) – Uplift modeling and causal inference with machine learning.. Apache-2
  • Pythran (🥈25 · ⭐ 1.5K) – Ahead of Time compiler for numeric kernels. BSD-3
  • TabPy (🥈25 · ⭐ 1K) – Execute Python code on the fly and display results in Tableau visualizations:. MIT
  • kmodes (🥈25 · ⭐ 820) – Python implementations of the k-modes and k-prototypes clustering.. MIT
  • metric-learn (🥈24 · ⭐ 1.1K · 💤) – Metric learning algorithms in Python. MIT
  • PennyLane (🥈24 · ⭐ 800) – PennyLane is a cross-platform Python library for differentiable.. Apache-2
  • pyopencl (🥈24 · ⭐ 790 · 📉) – OpenCL integration for Python, plus shiny features. MIT
  • PySwarms (🥈24 · ⭐ 740) – A research toolkit for particle swarm optimization in Python. MIT
  • pyjanitor (🥈24 · ⭐ 640) – Clean APIs for data cleaning. Python implementation of R package Janitor. MIT
  • findspark (🥈24 · ⭐ 390 · 💤) – Find pyspark to make it importable. BSD-3
  • datalad (🥈24 · ⭐ 230) – Keep code, data, containers under control with git and git-annex. MIT
  • Gradio (🥉23 · ⭐ 2.1K) – Wrap UIs around any model, share with anyone. Apache-2
  • modAL (🥉23 · ⭐ 1.1K) – A modular active learning framework for Python. MIT
  • PaddleHub (🥉22 · ⭐ 4.7K) – Awesome pre-trained models toolkit based on.. Apache-2
  • pycm (🥉22 · ⭐ 1.1K) – Multi-class confusion matrix library in Python. MIT
  • Prince (🥉22 · ⭐ 590) – Python factor analysis library (PCA, CA, MCA, MFA, FAMD). MIT
  • SUOD (🥉22 · ⭐ 240) – (MLSys’ 21) An Acceleration System for Large-scale Unsupervised.. BSD-2
  • Mars (🥉21 · ⭐ 2.1K) – Mars is a tensor-based unified framework for large-scale data.. Apache-2
  • tensorly (🥉21 · ⭐ 970) – TensorLy: Tensor Learning in Python. BSD-2
  • StreamAlert (🥉20 · ⭐ 2.5K) – StreamAlert is a serverless, realtime data analysis framework.. Apache-2
  • AstroML (🥉20 · ⭐ 730) – Machine learning, statistics, and data mining for astronomy and.. BSD-2
  • alibi-detect (🥉20 · ⭐ 600) – Algorithms for outlier and adversarial instance detection,.. Apache-2
  • baikal (🥉20 · ⭐ 570) – A graph-based functional API for building complex scikit-learn pipelines. BSD-3
  • BioPandas (🥉20 · ⭐ 330) – Working with molecular structures in pandas DataFrames. BSD-3
  • scikit-rebate (🥉20 · ⭐ 310) – A scikit-learn-compatible Python implementation of ReBATE, a.. MIT
  • rrcf (🥉20 · ⭐ 290 · 💤) – Implementation of the Robust Random Cut Forest algorithm for anomaly.. MIT
  • Feature Engine (🥉19 · ⭐ 470) – Feature engineering package with sklearn like functionality. BSD-3
  • apricot (🥉18 · ⭐ 310) – apricot implements submodular optimization for the purpose of selecting.. MIT
  • River (🥉17 · ⭐ 1.4K) – Online machine learning in Python. BSD-3
  • traingenerator (🥉10 · ⭐ 940 · 🐣) – A web app to generate template code for machine learning. MIT

Related Resources

3. Machine Learning – TensorFlow – Part III

https://www.tensorflow.org/  

 

 

4. Artificial Intelligence (AI) – Part I

Reproduced from GitHub https://github.com/

A curated list of Artificial Intelligence (AI) courses, books, video lectures and papers.

Contents

  1. Courses
  2. Books
  3. Programming
  4. Philosophy
  5. Free Content
  6. Code
  7. Videos
  8. Learning
  9. Organizations
  10. Journals
  11. Competitions
  12. Newsletters
  13. Misc

Courses

  • MIT: Intro to Deep Learning – A seven-day bootcamp designed at MIT to introduce deep learning methods and applications
  • Deep Blueberry: Deep Learning book – A free five-weekend plan for self-learners covering the basics of deep-learning architectures like CNNs, LSTMs, RNNs, VAEs, GANs, DQN, A3C and more
  • Spinning Up in Deep Reinforcement Learning – A free deep reinforcement learning course by OpenAI
  • MIT Artificial Intelligence Videos – MIT AI Course
  • Grokking Deep Learning in Motion – Beginner’s course to learn deep learning and neural networks without frameworks.
  • Intro to Artificial Intelligence – Learn the Fundamentals of AI. Course run by Peter Norvig
  • EdX Artificial Intelligence – The course will introduce the basic ideas and techniques underlying the design of intelligent computer systems
  • Artificial Intelligence For Robotics – This class will teach you basic methods in Artificial Intelligence, including: probabilistic inference, planning and search, localization, tracking and control, all with a focus on robotics
  • Machine Learning – Basic machine learning algorithms for supervised and unsupervised learning
  • Neural Networks For Machine Learning – Algorithmic and practical tricks for artificial neural networks.
  • Deep Learning – An Introductory course to the world of Deep Learning.
  • Stanford Statistical Learning – Introductory course on machine learning focusing on: linear and polynomial regression, logistic regression and linear discriminant analysis; cross-validation and the bootstrap, model selection and regularization methods (ridge and lasso); nonlinear models, splines and generalized additive models; tree-based methods, random forests and boosting; support-vector machines.
  • Knowledge Based Artificial Intelligence – Georgia Tech’s course on Artificial Intelligence focussing on Symbolic AI.
  • Deep RL Bootcamp Lectures – Deep Reinforcement Bootcamp Lectures – August 2017
  • Machine Learning Crash Course By Google – Machine Learning Crash Course features a series of lessons with video lectures, real-world case studies, and hands-on practice exercises.
  • Python Class By Google – This is a free class for people with a little bit of programming experience who want to learn Python. The class includes written materials, lecture videos, and lots of code exercises to practice Python coding.
  • Deep Learning Crash Course – In this liveVideo course, machine learning expert Oliver Zeigermann teaches you the basics of deep learning.
  • Artificial Intelligence: A Modern Approach – Stuart Russell & Peter Norvig
    • Also consider browsing the list of recommended reading, divided by each chapter in “Artificial Intelligence: A Modern Approach”.
  • Paradigms Of Artificial Intelligence Programming: Case Studies in Common Lisp – Paradigms of AI Programming is the first text to teach advanced Common Lisp techniques in the context of building major AI systems
  • Reinforcement Learning: An Introduction – This introductory textbook on reinforcement learning is targeted toward engineers and scientists in artificial intelligence, operations research, neural networks, and control systems, and we hope it will also be of interest to psychologists and neuroscientists.
  • The Cambridge Handbook Of Artificial Intelligence – Written for non-specialists, it covers the discipline’s foundations, major theories, and principal research areas, plus related topics such as artificial life
  • The Emotion Machine: Commonsense Thinking, Artificial Intelligence, and the Future of the Human Mind – In this mind-expanding book, scientific pioneer Marvin Minsky continues his groundbreaking research, offering a fascinating new model for how our minds work
  • Artificial Intelligence: A New Synthesis – Beginning with elementary reactive agents, Nilsson gradually increases their cognitive horsepower to illustrate the most important and lasting ideas in AI
  • On Intelligence – Hawkins develops a powerful theory of how the human brain works, explaining why computers are not intelligent and how, based on this new theory, we can finally build intelligent machines. An audio version is also available from audible.com
  • How To Create A Mind – Kurzweil discusses how the brain works, how the mind emerges, brain-computer interfaces, and the implications of vastly increasing the powers of our intelligence to address the world’s problems
  • Deep Learning – Goodfellow, Bengio and Courville’s introduction to a broad range of topics in deep learning, covering mathematical and conceptual background, deep learning techniques used in industry, and research perspectives.
  • The Elements of Statistical Learning: Data Mining, Inference, and Prediction – Hastie, Tibshirani and Friedman cover a broad range of topics, from supervised learning (prediction) to unsupervised learning, including neural networks, support vector machines, classification trees and boosting, the first comprehensive treatment of this topic in any book.
  • Deep Learning and the Game of Go – Deep Learning and the Game of Go teaches you how to apply the power of deep learning to complex human-flavored reasoning tasks by building a Go-playing AI. After exposing you to the foundations of machine and deep learning, you’ll use Python to build a bot and then teach it the rules of the game.
  • Deep Learning for Search – Deep Learning for Search teaches you how to leverage neural networks, NLP, and deep learning techniques to improve search performance.
  • Deep Learning with PyTorch – PyTorch puts these superpowers in your hands, providing a comfortable Python experience that gets you started quickly and then grows with you as you—and your deep learning skills—become more sophisticated. Deep Learning with PyTorch will make that journey engaging and fun.
  • Deep Reinforcement Learning in Action – Deep Reinforcement Learning in Action teaches you the fundamental concepts and terminology of deep reinforcement learning, along with the practical skills and techniques you’ll need to implement it into your own projects.
  • Grokking Deep Reinforcement Learning – Grokking Deep Reinforcement Learning introduces this powerful machine learning approach, using examples, illustrations, exercises, and crystal-clear teaching.
  • Fusion in Action – Fusion in Action teaches you to build a full-featured data analytics pipeline, including document and data search and distributed data clustering.
  • Real-World Natural Language Processing – Early access book on how to create practical NLP applications using Python.
  • Grokking Machine Learning – Early access book that introduces the most valuable machine learning techniques.
  • Succeeding with AI – An introduction to managing successful AI projects and applying AI to real-life situations.
  • Elements of AI (Part 1) – Reaktor/University of Helsinki – An Introduction to AI is a free online course for everyone interested in learning what AI is, what is possible (and not possible) with AI, and how it affects our lives – with no complicated math or programming required.
  • Essential Natural Language Processing – A hands-on guide to NLP with practical techniques, numerous Python-based examples and real-world case studies.
  • Kaggle’s micro courses – A series of micro courses offering practical, hands-on knowledge ranging from Python to Deep Learning.
  • Transfer Learning for Natural Language Processing – A book that gets you up to speed with the relevant ML concepts and then dives into transfer learning for NLP.
  • Stanford Deep Learning Series – https://www.youtube.com/playlist?list=PLoROMvodv4rOABXSygHTsbvUz4G_YQhOb
  • Amazon Machine Learning Developer Guide – A book for ML developers that introduces ML concepts and strategies with many practical use cases.
  • Machine Learning for Humans – A series of simple, plain-English explanations accompanied by math, code, and real-world examples.

Books

  • Machine Learning for Mortals (Mere and Otherwise) – Early access book that covers the basics of machine learning using the R programming language.
  • How Machine Learning Works – Mostafa Samir. Early access book that introduces machine learning from both practical and theoretical aspects in a non-threatening way.
  • MachineLearningWithTensorFlow2ed – a book on general-purpose machine learning techniques: regression, classification, unsupervised clustering, reinforcement learning, autoencoders, convolutional neural networks, RNNs and LSTMs, using TensorFlow 1.14.1.
  • Serverless Machine Learning – a book for machine learning engineers on how to train and deploy machine learning systems on public clouds like AWS, Azure, and GCP, using a code-oriented approach.
  • The Hundred-Page Machine Learning Book – all you need to know about Machine Learning in a hundred pages: supervised and unsupervised learning, SVM, neural networks, ensemble methods, gradient descent, cluster analysis and dimensionality reduction, autoencoders and transfer learning, feature engineering and hyperparameter tuning.

Programming

Philosophy

  • Super Intelligence – Superintelligence asks the question: what happens when machines surpass humans in general intelligence? A really great book.
  • Our Final Invention: Artificial Intelligence And The End Of The Human Era – Our Final Invention explores the perils of the heedless pursuit of advanced AI. Until now, human intelligence has had no rival. Can we coexist with beings whose intelligence dwarfs our own? And will they allow us to?
  • How to Create a Mind: The Secret of Human Thought Revealed – Ray Kurzweil, director of engineering at Google, explores the process of reverse-engineering the brain to understand precisely how it works, then applies that knowledge to create vastly intelligent machines.
  • Minds, Brains, And Programs – The 1980 paper by philosopher John Searle that contains the famous ‘Chinese Room’ thought experiment. Probably the most famous attack on the notion of a Strong AI possessing a ‘mind’ or a ‘consciousness’, and interesting reading for those interested in the intersection of AI and philosophy of mind.
  • Gödel, Escher, Bach: An Eternal Golden Braid – Written by Douglas Hofstadter and taglined “a metaphorical fugue on minds and machines in the spirit of Lewis Carroll”, this wonderful journey into the fundamental concepts of mathematics, symmetry and intelligence was published in 1979 and won the Pulitzer Prize for General Non-Fiction in 1980. A major theme throughout is the emergence of meaning from seemingly ‘meaningless’ elements, like 1’s and 0’s, arranged in special patterns.
  • Life 3.0: Being Human in the Age of Artificial Intelligence – Max Tegmark, professor of Physics at MIT, discusses how Artificial Intelligence may affect crime, war, justice, jobs, society and our very sense of being human both in the near and far future.

Free Content

  • Foundations Of Computational Agents – Published by Cambridge University Press, 2010.
  • The Quest For Artificial Intelligence – This book traces the history of the subject, from the early dreams of eighteenth-century (and earlier) pioneers to the more successful work of today’s AI engineers.
  • Stanford CS229 – Machine Learning – This course provides a broad introduction to machine learning and statistical pattern recognition.
  • Computers and Thought: A practical Introduction to Artificial Intelligence – The book covers computer simulation of human activities, such as problem solving and natural language understanding; computer vision; AI tools and techniques; an introduction to AI programming; symbolic and neural network models of cognition; the nature of mind and intelligence; and the social implications of AI and cognitive science.
  • Society of Mind – Marvin Minsky’s seminal work on how our mind works. Many Symbolic AI concepts have been derived from it.
  • Artificial Intelligence and Molecular Biology – The current volume is an effort to bridge that range of exploration, from nucleotide to abstract concept, in contemporary AI/MB research.
  • Brief Introduction To Educational Implications Of Artificial Intelligence – This book is designed to help preservice and inservice teachers learn about some of the educational implications of current uses of Artificial Intelligence as an aid to solving problems and accomplishing tasks.
  • Encyclopedia: Computational intelligence – Scholarpedia is a peer-reviewed open-access encyclopedia written and maintained by scholarly experts from around the world.
  • Ethical Artificial Intelligence – a book by Bill Hibbard that combines several peer-reviewed papers and new material to analyze the issues of ethical artificial intelligence.
  • Golden Artificial Intelligence – a cluster of pages on artificial intelligence and machine learning.
  • R2D3 – A website with explanations of topics from machine learning to statistics, illustrated with animated infographics and real-life examples. Available in various languages.

Code

  • ExplainX – ExplainX is a fast, light-weight, and scalable explainable AI framework for data scientists to explain any black-box model to business stakeholders.
  • AIMACode – Source code for “Artificial Intelligence: A Modern Approach” in Common Lisp, Java, Python. More to come.
  • FANN – Fast Artificial Neural Network Library, native for C.
  • FARGonautica – Source code of Douglas Hofstadter’s Fluid Concepts and Creative Analogies Ph.D. projects.

Videos

Learning

Organizations

Journals

Competitions

Newsletters

  • AI Digest – A weekly newsletter to keep up to date with AI, machine learning, and data science. Archive.

Misc

5. Artificial Intelligence (AI) – Part II

A curated list of awesome resources about artificial intelligence (AI).

Table of Contents

Artificial Intelligence(AI)

Machine Learning(ML)

Deep Learning(DL)

Computer Vision(CV)

Natural Language Processing(NLP)

Speech Recognition

Other Research Topics

Programming Languages

Framework

Datasets

AI Career

 

 

6. Artificial Intelligence (AI) – Part III

A curated list of artificial intelligence resources (courses, tools, apps, open source projects).

Contents

  1. Courses & Articles