AI, ML & Big Data

1. Machine Learning – Part I

Reproduced from GitHub https://github.com/

A curated list of awesome machine learning frameworks, libraries and software (by language). Inspired by awesome-php

Further resources:

For a list of free machine learning books available for download, go here.
For a list of professional machine learning events, go here.
For a list of (mostly) free machine learning courses available online, go here.
For a list of blogs and newsletters on data science and machine learning, go here.
For a list of free-to-attend meetups and local events, go here.

Frameworks and Libraries

Awesome Machine Learning

Tools

Credits

APL

General-Purpose Machine Learning

naive-apl – Naive Bayesian Classifier implementation in APL. [Deprecated]

C

General-Purpose Machine Learning

Darknet – Darknet is an open source neural network framework written in C and CUDA. It is fast, easy to install, and supports CPU and GPU computation.
Recommender – A C library for product recommendations/suggestions using collaborative filtering (CF).
Hybrid Recommender System – A hybrid recommender system based upon scikit-learn algorithms. [Deprecated]
neonrvm – neonrvm is an open source machine learning library based on the RVM (relevance vector machine) technique. It is written in C and comes with Python bindings.
cONNXr – An ONNX runtime written in pure C99 with zero dependencies, focused on small embedded devices. Run inference on your machine learning models no matter which framework you trained them with. Easy to install and compiles everywhere, even on very old devices.
libonnx – A lightweight, portable pure C99 onnx inference engine for embedded devices with hardware acceleration support.

Computer Vision

CCV – C-based/Cached/Core Computer Vision Library, A Modern Computer Vision Library.
VLFeat – VLFeat is an open and portable library of computer vision algorithms, which has a Matlab toolbox.

C++

Computer Vision

DLib – DLib has C++ and Python interfaces for face detection and training general object detectors.
EBLearn – Eblearn is an object-oriented C++ library that implements various machine learning models [Deprecated]
OpenCV – OpenCV has C++, C, Python, Java and MATLAB interfaces and supports Windows, Linux, Android and Mac OS.
VIGRA – VIGRA is a generic cross-platform C++ computer vision and machine learning library for volumes of arbitrary dimensionality with Python bindings.
Openpose – A real-time multi-person keypoint detection library for body, face, hands, and foot estimation

General-Purpose Machine Learning

BanditLib – A simple Multi-armed Bandit library. [Deprecated]
Caffe – A deep learning framework developed with cleanliness, readability, and speed in mind. [DEEP LEARNING]
CatBoost – General purpose gradient boosting on decision trees library with categorical features support out of the box. It is easy to install, contains fast inference implementation and supports CPU and GPU (even multi-GPU) computation.
CNTK – The Computational Network Toolkit (CNTK) by Microsoft Research, is a unified deep-learning toolkit that describes neural networks as a series of computational steps via a directed graph.
CUDA – This is a fast C++/CUDA implementation of convolutional neural networks. [DEEP LEARNING]
DeepDetect – A machine learning API and server written in C++11. It makes state of the art machine learning easy to work with and integrate into existing applications.
Distributed Machine learning Tool Kit (DMTK) – A distributed machine learning (parameter server) framework by Microsoft. Enables training models on large data sets across multiple machines. Current tools bundled with it include: LightLDA and Distributed (Multisense) Word Embedding.
DLib – A suite of ML tools designed to be easy to embed in other applications.
DSSTNE – A software library created by Amazon for training and deploying deep neural networks using GPUs which emphasizes speed and scale over experimental flexibility.
DyNet – A dynamic neural network library working well with networks that have dynamic structures that change for every training instance. Written in C++ with bindings in Python.
Fido – A highly-modular C++ machine learning library for embedded electronics and robotics.
igraph – General purpose graph library.
Intel(R) DAAL – A high performance software library developed by Intel and optimized for Intel’s architectures. The library provides algorithmic building blocks for all stages of data analytics and can process data in batch, online and distributed modes.
LightGBM – Microsoft’s fast, distributed, high performance gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
libfm – A generic approach that can mimic most factorization models through feature engineering.
MLDB – The Machine Learning Database is a database designed for machine learning. Send it commands over a RESTful API to store data, explore it using SQL, then train machine learning models and expose them as APIs.
mlpack – A scalable C++ machine learning library.
MXNet – Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Go, Javascript and more.
ParaMonte – A general-purpose library with C/C++ interface for Bayesian data analysis and visualization via serial/parallel Monte Carlo and MCMC simulations. Documentation can be found here.
proNet-core – A general-purpose network embedding framework: pair-wise Representations Optimization Network.
PyCUDA – Python interface to CUDA
ROOT – A modular scientific software framework. It provides all the functionalities needed to deal with big data processing, statistical analysis, visualization and storage.
shark – A fast, modular, feature-rich open-source C++ machine learning library.
Shogun – The Shogun Machine Learning Toolbox.
sofia-ml – Suite of fast incremental algorithms.
Stan – A probabilistic programming language implementing full Bayesian statistical inference with Hamiltonian Monte Carlo sampling.
Timbl – A software package/C++ library implementing several memory-based learning algorithms, among which IB1-IG, an implementation of k-nearest neighbor classification, and IGTree, a decision-tree approximation of IB1-IG. Commonly used for NLP.
Vowpal Wabbit (VW) – A fast out-of-core learning system.
Warp-CTC – A fast parallel implementation of Connectionist Temporal Classification (CTC), on both CPU and GPU.
XGBoost – A parallelized optimized general purpose gradient boosting library.
ThunderGBM – A fast library for GBDTs and Random Forests on GPUs.
ThunderSVM – A fast SVM library on GPUs and CPUs.
LKYDeepNN – A header-only C++11 Neural Network library. Few dependencies, with native documentation in Traditional Chinese.
xLearn – A high performance, easy-to-use, and scalable machine learning package, which can be used to solve large-scale machine learning problems. xLearn is especially useful for solving machine learning problems on large-scale sparse data, which is very common in Internet services such as online advertising and recommender systems.
Featuretools – A library for automated feature engineering. It excels at transforming transactional and relational datasets into feature matrices for machine learning using reusable feature engineering “primitives”.
skynet – A library for training neural networks, with a C interface and networks defined in JSON. Written in C++ with bindings in Python, C++ and C#.
Feast – A feature store for the management, discovery, and access of machine learning features. Feast provides a consistent view of feature data for both model training and model serving.
Hopsworks – A data-intensive platform for AI with the industry’s first open-source feature store. The Hopsworks Feature Store provides both a feature warehouse for training and batch based on Apache Hive and a feature serving database, based on MySQL Cluster, for online applications.
Polyaxon – A platform for reproducible and scalable machine learning and deep learning.

Natural Language Processing

BLLIP Parser – BLLIP Natural Language Parser (also known as the Charniak-Johnson parser).
colibri-core – C++ library, command line tools, and Python binding for extracting and working with basic linguistic constructions such as n-grams and skipgrams in a quick and memory-efficient way.
CRF++ – Open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data & other Natural Language Processing tasks. [Deprecated]
CRFsuite – CRFsuite is an implementation of Conditional Random Fields (CRFs) for labeling sequential data. [Deprecated]
frog – Memory-based NLP suite developed for Dutch: PoS tagger, lemmatiser, dependency parser, NER, shallow parser, morphological analyzer.
libfolia – C++ library for the FoLiA format
MeTA – MeTA: ModErn Text Analysis is a C++ Data Sciences Toolkit that facilitates mining big text data.
MIT Information Extraction Toolkit – C, C++, and Python tools for named entity recognition and relation extraction
ucto – Unicode-aware regular-expression based tokenizer for various languages. Tool and C++ library. Supports FoLiA format.

Speech Recognition

Kaldi – Kaldi is a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. Kaldi is intended for use by speech recognition researchers.

Sequence Analysis

ToPS – This is an object-oriented framework that facilitates the integration of probabilistic models for sequences over a user defined alphabet. [Deprecated]

Gesture Detection

grt – The Gesture Recognition Toolkit (GRT) is a cross-platform, open-source, C++ machine learning library designed for real-time gesture recognition.

Common Lisp

General-Purpose Machine Learning

mgl – Neural networks (Boltzmann machines, feed-forward and recurrent nets), Gaussian Processes.
mgl-gpr – Evolutionary algorithms. [Deprecated]
cl-libsvm – Wrapper for the libsvm support vector machine library. [Deprecated]
cl-online-learning – Online learning algorithms (Perceptron, AROW, SCW, Logistic Regression).
cl-random-forest – Implementation of Random Forest in Common Lisp.

Clojure

Natural Language Processing

Clojure-openNLP – Natural Language Processing in Clojure (opennlp).
Inflections-clj – Rails-like inflection library for Clojure and ClojureScript.

General-Purpose Machine Learning

tech.ml – A machine learning platform based on tech.ml.dataset, supporting not just ml algorithms, but also relevant ETL processing; wraps multiple machine learning libraries
clj-ml – A machine learning library for Clojure built on top of Weka and friends.
clj-boost – Wrapper for XGBoost
Touchstone – Clojure A/B testing library.
Clojush – The Push programming language and the PushGP genetic programming system implemented in Clojure.
lambda-ml – Simple, concise implementations of machine learning techniques and utilities in Clojure.
Infer – Inference and machine learning in Clojure. [Deprecated]
Encog – Clojure wrapper for Encog (v3) (Machine-Learning framework that specializes in neural-nets). [Deprecated]
Fungp – A genetic programming library for Clojure. [Deprecated]
Statistiker – Basic Machine Learning algorithms in Clojure. [Deprecated]
clortex – General Machine Learning library using Numenta’s Cortical Learning Algorithm. [Deprecated]
comportex – Functionally composable Machine Learning library using Numenta’s Cortical Learning Algorithm. [Deprecated]

Deep Learning

MXNet – Bindings to Apache MXNet – part of the MXNet project
Deep Diamond – A fast Clojure Tensor & Deep Learning library
jutsu.ai – Clojure wrapper for deeplearning4j with some added syntactic sugar.
cortex – Neural networks, regression and feature learning in Clojure.
Flare – Dynamic Tensor Graph library in Clojure (think PyTorch, DyNet, etc.)
dl4clj – Clojure wrapper for Deeplearning4j.

Data Analysis

tech.ml.dataset – Clojure dataframe library and pipeline for data processing and machine learning
Tablecloth – A dataframe grammar wrapping tech.ml.dataset, inspired by several R libraries
Panthera – Clojure API wrapping Python’s Pandas library
Incanter – Incanter is a Clojure-based, R-like platform for statistical computing and graphics.
PigPen – Map-Reduce for Clojure.
Geni – a Clojure dataframe library that runs on Apache Spark

Data Visualization

Hanami – Clojure(Script) library and framework for creating interactive visualization applications based on Vega-Lite (VGL) and/or Vega (VG) specifications. Automatic framing and layouts along with a powerful templating system for abstracting visualization specs.
Saite – Clojure(Script) client/server application for dynamic interactive explorations and the creation of live shareable documents capturing them using Vega/Vega-Lite, CodeMirror, markdown, and LaTeX
Oz – Data visualisation using Vega/Vega-Lite and Hiccup, and a live-reload platform for literate-programming
Envision – Clojure Data Visualisation library, based on Statistiker and D3.
Pink Gorilla Notebook – A Clojure/ClojureScript notebook application/library based on Gorilla-REPL
clojupyter – A Jupyter kernel for Clojure – run Clojure code in Jupyter Lab, Notebook and Console.
notespace – Notebook experience in your Clojure namespace
Delight – A listener that streams your Spark event logs to Delight, a free and improved Spark UI.

Interop

Java Interop – Clojure has Native Java Interop from which Java’s ML ecosystem can be accessed
JavaScript Interop – ClojureScript has Native JavaScript Interop from which JavaScript’s ML ecosystem can be accessed
Libpython-clj – Interop with Python
ClojisR – Interop with R and Renjin (R on the JVM)

Misc

Neanderthal – Fast Clojure Matrix Library (native CPU, GPU, OpenCL, CUDA)
kixistats – A library of statistical distribution sampling and transducing functions
fastmath – A collection of functions for mathematical and statistical computing, machine learning, etc., wrapping several JVM libraries
matlib – A Clojure library of optimisation and control theory tools and convenience functions based on Neanderthal.

Extra

Scicloj – Curated list of ML related resources for Clojure.

Crystal

General-Purpose Machine Learning

machine – Simple machine learning algorithm.
crystal-fann – FANN (Fast Artificial Neural Network) binding.

Elixir

General-Purpose Machine Learning

Simple Bayes – A Simple Bayes / Naive Bayes implementation in Elixir.
emel – A simple and functional machine learning library written in Elixir.
Tensorflex – Tensorflow bindings for the Elixir programming language.

Natural Language Processing

Stemmer – An English (Porter2) stemming implementation in Elixir.

Erlang

General-Purpose Machine Learning

Disco – Map Reduce in Erlang. [Deprecated]

Fortran

General-Purpose Machine Learning

neural-fortran – A parallel neural net microframework. Read the paper here.

Data Analysis / Data Visualization

ParaMonte – A general-purpose Fortran library for Bayesian data analysis and visualization via serial/parallel Monte Carlo and MCMC simulations. Documentation can be found here.

Go

Natural Language Processing

snowball – Snowball Stemmer for Go.
word-embedding – Word Embeddings: the full implementation of word2vec, GloVe in Go.
sentences – Golang implementation of Punkt sentence tokenizer.
go-ngram – In-memory n-gram index with compression. [Deprecated]
paicehusk – Golang implementation of the Paice/Husk Stemming Algorithm. [Deprecated]
go-porterstemmer – A native Go clean room implementation of the Porter Stemming algorithm. [Deprecated]

General-Purpose Machine Learning

birdland – A recommendation library in Go.
eaopt – An evolutionary optimization library.
leaves – A pure Go implementation of the prediction part of GBRTs, including XGBoost and LightGBM.
gobrain – Neural Networks written in Go.
go-featureprocessing – Fast and convenient feature processing for low latency machine learning in Go.
go-mxnet-predictor – Go binding for MXNet c_predict_api to do inference with a pre-trained model.
go-ml-benchmarks – Benchmarks of machine learning inference for Go.
go-ml-transpiler – An open source Go transpiler for machine learning models.
golearn – Machine learning for Go.
goml – Machine learning library written in pure Go.
gorgonia – Deep learning in Go.
goro – A high-level machine learning library in the vein of Keras.
gorse – An offline recommender system backend based on collaborative filtering written in Go.
therfoo – An embedded deep learning library for Go.
neat – Plug-and-play, parallel Go framework for NeuroEvolution of Augmenting Topologies (NEAT). [Deprecated]
go-pr – Pattern recognition package in Go lang. [Deprecated]
go-ml – Linear / Logistic regression, Neural Networks, Collaborative Filtering and Gaussian Multivariate Distribution. [Deprecated]
GoNN – GoNN is an implementation of Neural Network in Go Language, which includes BPNN, RBF, PCN. [Deprecated]
bayesian – Naive Bayesian Classification for Golang. [Deprecated]
go-galib – Genetic Algorithms library written in Go / Golang. [Deprecated]
Cloudforest – Ensembles of decision trees in Go/Golang. [Deprecated]
go-dnn – Deep Neural Networks for Golang (powered by MXNet)

Spatial analysis and geometry

go-geom – Go library to handle geometries.
gogeo – Spherical geometry in Go.

Data Analysis / Data Visualization

dataframe-go – Dataframes for machine-learning and statistics (similar to pandas).
gota – Dataframes.
gonum/mat – A linear algebra package for Go.
gonum/optimize – Implementations of optimization algorithms.
gonum/plot – A plotting library.
gonum/stat – A statistics library.
SVGo – The Go Language library for SVG generation.
glot – Glot is a plotting library for Golang built on top of gnuplot.
globe – Globe wireframe visualization.
gonum/graph – General-purpose graph library.
go-graph – Graph library for Go/Golang language. [Deprecated]
RF – Random forests implementation in Go. [Deprecated]

Computer vision

GoCV – Package for computer vision using OpenCV 4 and beyond.

Reinforcement learning

gold – A reinforcement learning library.

Haskell

General-Purpose Machine Learning

haskell-ml – Haskell implementations of various ML algorithms. [Deprecated]
HLearn – a suite of libraries for interpreting machine learning models according to their algebraic structure. [Deprecated]
hnn – Haskell Neural Network library.
hopfield-networks – Hopfield Networks for unsupervised learning in Haskell. [Deprecated]
DNNGraph – A DSL for deep neural networks. [Deprecated]
LambdaNet – Configurable Neural Networks in Haskell. [Deprecated]

Java

Natural Language Processing

Cortical.io – Retina: an API performing complex NLP operations (disambiguation, classification, streaming text filtering, etc…) as quickly and intuitively as the brain.
IRIS – Cortical.io’s FREE NLP, Retina API Analysis Tool (written in JavaFX!) – See the Tutorial Video.
CoreNLP – Stanford CoreNLP provides a set of natural language analysis tools which can take raw English language text input and give the base forms of words.
Stanford Parser – A natural language parser is a program that works out the grammatical structure of sentences.
Stanford POS Tagger – A Part-Of-Speech Tagger (POS Tagger).
Stanford Named Entity Recognizer – Stanford NER is a Java implementation of a Named Entity Recognizer.
Stanford Word Segmenter – Tokenization of raw text is a standard pre-processing step for many NLP tasks.
Tregex, Tsurgeon and Semgrex – Tregex is a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes (the name is short for “tree regular expressions”).
Stanford Phrasal: A Phrase-Based Translation System – Stanford Phrasal is a state-of-the-art statistical phrase-based machine translation system, written in Java.
Stanford English Tokenizer – A tokenizer divides text into a sequence of tokens, which roughly correspond to “words”.
Stanford Tokens Regex – A framework for defining patterns over sequences of tokens.
Stanford Temporal Tagger – SUTime is a library for recognizing and normalizing time expressions.
Stanford SPIED – Learning entities from unlabeled text starting with seed sets using patterns in an iterative fashion.
Twitter Text Java – A Java implementation of Twitter’s text processing library.
MALLET – A Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
OpenNLP – a machine learning based toolkit for the processing of natural language text.
LingPipe – A tool kit for processing text using computational linguistics.
ClearTK – ClearTK provides a framework for developing statistical natural language processing (NLP) components in Java and is built on top of Apache UIMA. [Deprecated]
Apache cTAKES – Apache Clinical Text Analysis and Knowledge Extraction System (cTAKES) is an open-source natural language processing system for information extraction from electronic medical record clinical free-text.
NLP4J – The NLP4J project provides software and resources for natural language processing. The project started at the Center for Computational Language and EducAtion Research, and is currently developed by the Center for Language and Information Research at Emory University. [Deprecated]
CogcompNLP – This project collects a number of core libraries for Natural Language Processing (NLP) developed in the University of Illinois’ Cognitive Computation Group. Examples include illinois-core-utilities, which provides a set of NLP-friendly data structures and a number of NLP-related utilities that support writing NLP applications, running experiments, etc.; illinois-edison, a library for feature extraction from illinois-core-utilities data structures; and many other packages.

General-Purpose Machine Learning

aerosolve – A machine learning library by Airbnb designed from the ground up to be human friendly.
AMIDST Toolbox – A Java Toolbox for Scalable Probabilistic Machine Learning.
Datumbox – Machine Learning framework for rapid development of Machine Learning and Statistical applications.
ELKI – Java toolkit for data mining. (unsupervised: clustering, outlier detection etc.)
Encog – An advanced neural network and machine learning framework. Encog contains classes to create a wide variety of networks, as well as support classes to normalize and process data for these neural networks. Encog trains using multithreaded resilient propagation. Encog can also make use of a GPU to further speed processing time. A GUI based workbench is also provided to help model and train neural networks.
FlinkML in Apache Flink – Distributed machine learning library in Flink.
H2O – ML engine that supports distributed learning on Hadoop, Spark or your laptop via APIs in R, Python, Scala, REST/JSON.
htm.java – General Machine Learning library using Numenta’s Cortical Learning Algorithm.
liblinear-java – Java version of liblinear.
Mahout – Distributed machine learning.
Meka – An open source implementation of methods for multi-label classification and evaluation (extension to Weka).
MLlib in Apache Spark – Distributed machine learning library in Spark
Hydrosphere Mist – A service for deploying Apache Spark MLlib machine learning models as real-time, batch or reactive web services.
Neuroph – Neuroph is a lightweight Java neural network framework.
ORYX – Lambda Architecture Framework using Apache Spark and Apache Kafka with a specialization for real-time large-scale machine learning.
Samoa – SAMOA is a framework that includes distributed machine learning for data streams with an interface to plug in different stream processing platforms.
RankLib – RankLib is a library of learning to rank algorithms. [Deprecated]
rapaio – statistics, data mining and machine learning toolbox in Java.
RapidMiner – RapidMiner integration into Java code.
Stanford Classifier – A classifier is a machine learning tool that will take data items and place them into one of k classes.
Smile – Statistical Machine Intelligence & Learning Engine.
SystemML – flexible, scalable machine learning (ML) language.
Weka – Weka is a collection of machine learning algorithms for data mining tasks.
LBJava – Learning Based Java is a modeling language for the rapid development of software systems. It offers a convenient, declarative syntax for classifier and constraint definition directly in terms of the objects in the programmer’s application.

Speech Recognition

CMU Sphinx – Open source toolkit for speech recognition, based purely on a Java speech recognition library.

Data Analysis / Data Visualization

Flink – Open source platform for distributed stream and batch data processing.
Hadoop – Hadoop/HDFS.
Onyx – Distributed, masterless, high performance, fault tolerant data processing. Written entirely in Clojure.
Spark – Spark is a fast and general engine for large-scale data processing.
Storm – Storm is a distributed realtime computation system.
Impala – Real-time Query for Hadoop.
DataMelt – Mathematics software for numeric computation, statistics, symbolic calculations, data analysis and data visualization.
Dr. Michael Thomas Flanagan’s Java Scientific Library [Deprecated]

Deep Learning

Deeplearning4j – Scalable deep learning for industry with parallel GPUs.
Keras Beginner Tutorial – Friendly guide on using Keras to implement a simple Neural Network in Python

Javascript

Natural Language Processing

Twitter-text – A JavaScript implementation of Twitter’s text processing library.
natural – General natural language facilities for node.
Knwl.js – A Natural Language Processor in JS.
Retext – Extensible system for analyzing and manipulating natural language.
NLP Compromise – Natural Language processing in the browser.
nlp.js – An NLP library built in node over Natural, with entity extraction, sentiment analysis, automatic language identification, and more.

Data Analysis / Data Visualization

D3.js
High Charts
NVD3.js
dc.js
chartjs
dimple
amCharts
D3xter – Straightforward plotting built on D3. [Deprecated]
statkit – Statistics kit for JavaScript. [Deprecated]
datakit – A lightweight framework for data analysis in JavaScript
science.js – Scientific and statistical computing in JavaScript. [Deprecated]
Z3d – Easily make interactive 3d plots built on Three.js [Deprecated]
Sigma.js – JavaScript library dedicated to graph drawing.
C3.js – customizable library based on D3.js for easy chart drawing.
Datamaps – Customizable SVG map/geo visualizations using D3.js. [Deprecated]
ZingChart – library written on Vanilla JS for big data visualization.
cheminfo – Platform for data visualization and analysis, using the visualizer project.
Learn JS Data
AnyChart
FusionCharts
Nivo – built on top of the awesome d3 and Reactjs libraries

General-Purpose Machine Learning

Auto ML – Automated machine learning, data formatting, ensembling, and hyperparameter optimization for competitions and exploration. Just give it a .csv file!
Convnet.js – ConvNetJS is a Javascript library for training Deep Learning models. [DEEP LEARNING] [Deprecated]
Clusterfck – Agglomerative hierarchical clustering implemented in Javascript for Node.js and the browser. [Deprecated]
Clustering.js – Clustering algorithms implemented in Javascript for Node.js and the browser. [Deprecated]
Decision Trees – NodeJS Implementation of Decision Tree using ID3 Algorithm. [Deprecated]
DN2A – Digital Neural Networks Architecture. [Deprecated]
figue – K-means, fuzzy c-means and agglomerative clustering.
Gaussian Mixture Model – Unsupervised machine learning with multivariate Gaussian mixture model.
Node-fann – FANN (Fast Artificial Neural Network Library) bindings for Node.js [Deprecated]
Keras.js – Run Keras models in the browser, with GPU support provided by WebGL 2.
Kmeans.js – Simple Javascript implementation of the k-means algorithm, for node.js and the browser. [Deprecated]
LDA.js – LDA topic modeling for Node.js
Learning.js – Javascript implementation of logistic regression/c4.5 decision tree [Deprecated]
machinelearn.js – Machine Learning library for the web, Node.js and developers
mil-tokyo – List of several machine learning libraries.
Node-SVM – Support Vector Machine for Node.js
Brain – Neural networks in JavaScript [Deprecated]
Brain.js – Neural networks in JavaScript – continued community fork of Brain.
Bayesian-Bandit – Bayesian bandit implementation for Node and the browser. [Deprecated]
Synaptic – Architecture-free neural network library for Node.js and the browser.
kNear – JavaScript implementation of the k nearest neighbors algorithm for supervised learning.
NeuralN – C++ Neural Network library for Node.js. It has an advantage on large datasets and supports multi-threaded training. [Deprecated]
kalman – Kalman filter for Javascript. [Deprecated]
shaman – Node.js library with support for both simple and multiple linear regression. [Deprecated]
ml.js – Machine learning and numerical analysis tools for Node.js and the Browser!
ml5 – Friendly machine learning for the web!
Pavlov.js – Reinforcement learning using Markov Decision Processes.
MXNet – Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Go, Javascript and more.
TensorFlow.js – A WebGL accelerated, browser based JavaScript library for training and deploying ML models.
JSMLT – Machine learning toolkit with classification and clustering for Node.js; supports visualization (see visualml.io).
xgboost-node – Run XGBoost model and make predictions in Node.js.
Netron – Visualizer for machine learning models.
WebDNN – Fast Deep Neural Network Javascript Framework. WebDNN uses the next-generation JavaScript API WebGPU for GPU execution and WebAssembly for CPU execution.

Misc

stdlib – A standard library for JavaScript and Node.js, with an emphasis on numeric computing. The library provides a collection of robust, high performance libraries for mathematics, statistics, streams, utilities, and more.
sylvester – Vector and Matrix math for JavaScript. [Deprecated]
simple-statistics – A JavaScript implementation of descriptive, regression, and inference statistics. Implemented in literate JavaScript with no dependencies, designed to work in all modern browsers (including IE) as well as in Node.js.
regression-js – A javascript library containing a collection of least squares fitting methods for finding a trend in a set of data.
Lyric – Linear Regression library. [Deprecated]
GreatCircle – Library for calculating great circle distance.
MLPleaseHelp – MLPleaseHelp is a simple ML resource search engine. You can use this search engine right now at https://jgreenemi.github.io/MLPleaseHelp/, provided via Github Pages.
Pipcook – A JavaScript application framework for machine learning and its engineering.

Demos and Scripts

The Bot – Example, created with Synaptic, of how a neural network learns to predict the angle between two points.
Half Beer – Beer glass classifier created with Synaptic.
NSFWJS – Indecent content checker with TensorFlow.js
Rock Paper Scissors – Rock Paper Scissors trained in the browser with TensorFlow.js

Julia

General-Purpose Machine Learning

MachineLearning – Julia Machine Learning library. [Deprecated]
MLBase – A set of functions to support the development of machine learning algorithms.
PGM – A Julia framework for probabilistic graphical models.
DA – Julia package for Regularized Discriminant Analysis.
Regression – Algorithms for regression analysis (e.g. linear regression and logistic regression). [Deprecated]
Local Regression – Local regression, so smooooth!
Naive Bayes – Simple Naive Bayes implementation in Julia. [Deprecated]
Mixed Models – A Julia package for fitting (statistical) mixed-effects models.
Simple MCMC – Basic MCMC sampler implemented in Julia. [Deprecated]
Distances – Julia module for Distance evaluation.
Decision Tree – Decision Tree Classifier and Regressor.
Neural – A neural network in Julia.
MCMC – MCMC tools for Julia. [Deprecated]
Mamba – Markov chain Monte Carlo (MCMC) for Bayesian analysis in Julia.
GLM – Generalized linear models in Julia.
Gaussian Processes – Julia package for Gaussian processes.
Online Learning [Deprecated]
GLMNet – Julia wrapper for fitting Lasso/ElasticNet GLM models using glmnet.
Clustering – Basic functions for clustering data: k-means, dp-means, etc.
SVM – SVM for Julia. [Deprecated]
Kernel Density – Kernel density estimators for Julia.
MultivariateStats – Methods for dimensionality reduction.
NMF – A Julia package for non-negative matrix factorization.
ANN – Julia artificial neural networks. [Deprecated]
Mocha – Deep Learning framework for Julia inspired by Caffe. [Deprecated]
XGBoost – eXtreme Gradient Boosting Package in Julia.
ManifoldLearning – A Julia package for manifold learning and nonlinear dimensionality reduction.
MXNet – Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Go, Javascript and more.
Merlin – Flexible Deep Learning Framework in Julia.
ROCAnalysis – Receiver Operating Characteristics and functions for evaluating probabilistic binary classifiers.
GaussianMixtures – Large scale Gaussian Mixture Models.
ScikitLearn – Julia implementation of the scikit-learn API.
Knet – Koç University Deep Learning Framework.
Flux – Relax! Flux is the ML library that doesn’t make you tensor
MLJ – A Julia machine learning framework

Natural Language Processing

Topic Models – TopicModels for Julia. [Deprecated]
Text Analysis – Julia package for text analysis.
Word Tokenizers – Tokenizers for Natural Language Processing in Julia
Corpus Loaders – A Julia package providing a variety of loaders for various NLP corpora.
Embeddings – Functions and data dependencies for loading various word embeddings
Languages – Julia package for working with various human languages
WordNet – A Julia package for Princeton’s WordNet

Data Analysis / Data Visualization

Graph Layout – Graph layout algorithms in pure Julia.
LightGraphs – Graph modeling and analysis.
Data Frames Meta – Metaprogramming tools for DataFrames.
Julia Data – Library for working with tabular data in Julia. [Deprecated]
Data Read – Read files from Stata, SAS, and SPSS.
Hypothesis Tests – Hypothesis tests for Julia.
Gadfly – Crafty statistical graphics for Julia.
Stats – Statistical tests for Julia.
RDataSets – Julia package for loading many of the data sets available in R.
DataFrames – Library for working with tabular data in Julia.
Distributions – A Julia package for probability distributions and associated functions.
Data Arrays – Data structures that allow missing values. [Deprecated]
Time Series – Time series toolkit for Julia.
Sampling – Basic sampling algorithms for Julia.

Misc Stuff / Presentations

DSP – Digital Signal Processing (filtering, periodograms, spectrograms, window functions).
JuliaCon Presentations – Presentations for JuliaCon.
SignalProcessing – Signal Processing tools for Julia.
Images – An image library for Julia.
DataDeps – Reproducible data setup for reproducible science.

Lua

General-Purpose Machine Learning

Torch7
- cephes – Cephes mathematical functions library, wrapped for Torch. Provides and wraps the 180+ special mathematical functions from the Cephes mathematical library, developed by Stephen L. Moshier. It is used, among many other places, at the heart of SciPy. [Deprecated]
- autograd – Autograd automatically differentiates native Torch code. Inspired by the original Python version.
- graph – Graph package for Torch. [Deprecated]
- randomkit – Numpy’s randomkit, wrapped for Torch. [Deprecated]
- signal – A signal processing toolbox for Torch-7. FFT, DCT, Hilbert, cepstrums, stft.
- nn – Neural Network package for Torch.
- torchnet – A framework for Torch that provides a set of abstractions aimed at encouraging code reuse and modular programming.
- nngraph – This package provides graphical computation for the nn library in Torch7.
- nnx – A completely unstable and experimental package that extends Torch’s builtin nn library.
- rnn – A Recurrent Neural Network library that extends Torch’s nn. RNNs, LSTMs, GRUs, BRNNs, BLSTMs, etc.
- dpnn – Many useful features that aren’t part of the main nn package.
- dp – A deep learning library designed for streamlining research and development using the Torch7 distribution. It emphasizes flexibility through the elegant use of object-oriented design patterns. [Deprecated]
- optim – An optimization library for Torch. SGD, Adagrad, Conjugate-Gradient, LBFGS, RProp and more.
- unsup – A package for unsupervised learning in Torch. Provides modules that are compatible with nn (LinearPsd, ConvPsd, AutoEncoder, …), and self-contained algorithms (k-means, PCA). [Deprecated]
- manifold – A package to manipulate manifolds.
- svm – Torch-SVM library. [Deprecated]
- lbfgs – FFI Wrapper for liblbfgs. [Deprecated]
- vowpalwabbit – An old vowpalwabbit interface to torch. [Deprecated]
- OpenGM – OpenGM is a C++ library for graphical modeling, and inference. The Lua bindings provide a simple way of describing graphs, from Lua, and then optimizing them with OpenGM. [Deprecated]
- spaghetti – Spaghetti (sparse linear) module for torch7 by @MichaelMathieu [Deprecated]
- LuaSHKit – A Lua wrapper around the locality-sensitive hashing library SHKit. [Deprecated]
- kernel smoothing – KNN, kernel-weighted average, local linear regression smoothers. [Deprecated]
- cutorch – Torch CUDA Implementation.
- cunn – Torch CUDA Neural Network Implementation.
- imgraph – An image/graph library for Torch. This package provides routines to construct graphs on images, segment them, build trees out of them, and convert them back to images. [Deprecated]
- videograph – A video/graph library for Torch. This package provides routines to construct graphs on videos, segment them, build trees out of them, and convert them back to videos. [Deprecated]
- saliency – Code and tools around integral images. A library for finding interest points based on fast integral histograms. [Deprecated]
- stitch – Uses Hugin to stitch images and apply the same stitching to a video sequence. [Deprecated]
- sfm – A bundle adjustment/structure from motion package. [Deprecated]
- fex – A package for feature extraction in Torch. Provides SIFT and dSIFT modules. [Deprecated]
- OverFeat – A state-of-the-art generic dense feature extractor. [Deprecated]
- wav2letter – A simple and efficient end-to-end Automatic Speech Recognition (ASR) system from Facebook AI Research.
Numeric Lua
Lunatic Python
SciLua
Lua – Numerical Algorithms [Deprecated]
Lunum [Deprecated]

Demos and Scripts

Core torch7 demos repository.
- linear-regression, logistic-regression
- face detector (training and detection as separate demos)
- mst-based-segmenter
- train-a-digit-classifier
- train-autoencoder
- optical flow demo
- train-on-housenumbers
- train-on-cifar
- tracking with deep nets
- kinect demo
- filter-bank visualization
- saliency-networks
Training a Convnet for the Galaxy-Zoo Kaggle challenge (CUDA demo)
Music Tagging – Music Tagging scripts for torch7.
torch-datasets – Scripts to load several popular datasets including:
- BSR 500
- CIFAR-10
- COIL
- Street View House Numbers
- MNIST
- NORB
Atari2600 – Scripts to generate a dataset with static frames from the Arcade Learning Environment.

Matlab

Computer Vision

Contourlets – MATLAB source code that implements the contourlet transform and its utility functions.
Shearlets – MATLAB code for shearlet transform.
Curvelets – The Curvelet transform is a higher dimensional generalization of the Wavelet transform designed to represent images at different scales and different angles.
Bandlets – MATLAB code for bandlet transform.
mexopencv – Collection and a development kit of MATLAB mex functions for OpenCV library.

Natural Language Processing

NLP – An NLP library for Matlab.

General-Purpose Machine Learning

Training a deep autoencoder or a classifier on MNIST digits – Training a deep autoencoder or a classifier on MNIST digits. [DEEP LEARNING]
Convolutional-Recursive Deep Learning for 3D Object Classification – Convolutional-Recursive Deep Learning for 3D Object Classification. [DEEP LEARNING]
Spider – The spider is intended to be a complete object-oriented environment for machine learning in Matlab.
LibSVM – A Library for Support Vector Machines.
ThunderSVM – An Open-Source SVM Library on GPUs and CPUs
LibLinear – A Library for Large Linear Classification.
Machine Learning Module – Class on machine learning with PDFs, lectures, and code.
Caffe – A deep learning framework developed with cleanliness, readability, and speed in mind.
Pattern Recognition Toolbox – A complete object-oriented environment for machine learning in Matlab.
Pattern Recognition and Machine Learning – This package contains the matlab implementation of the algorithms described in the book Pattern Recognition and Machine Learning by C. Bishop.
Optunity – A library dedicated to automated hyperparameter optimization with a simple, lightweight API to facilitate drop-in replacement of grid search. Optunity is written in Python but interfaces seamlessly with MATLAB.
MXNet – Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Go, Javascript and more.
Machine Learning in MatLab/Octave – examples of popular machine learning algorithms (neural networks, linear/logistic regressions, K-Means, etc.) with code examples and mathematics behind them being explained.

Data Analysis / Data Visualization

ParaMonte – A general-purpose MATLAB library for Bayesian data analysis and visualization via serial/parallel Monte Carlo and MCMC simulations. Documentation can be found here.
matlab_bgl – MatlabBGL is a Matlab package for working with graphs.
gaimc – Efficient pure-Matlab implementations of graph algorithms to complement MatlabBGL’s mex functions.

.NET

Computer Vision

OpenCVDotNet – A wrapper for the OpenCV project to be used with .NET applications.
Emgu CV – Cross-platform wrapper of OpenCV which can be compiled in Mono to run on Windows, Linux, Mac OS X, iOS, and Android.
AForge.NET – Open source C# framework for developers and researchers in the fields of Computer Vision and Artificial Intelligence. Development has now shifted to GitHub.
Accord.NET – Together with AForge.NET, this library can provide image processing and computer vision algorithms to Windows, Windows RT and Windows Phone. Some components are also available for Java and Android.

Natural Language Processing

Stanford.NLP for .NET – A full port of Stanford NLP packages to .NET and also available precompiled as a NuGet package.

General-Purpose Machine Learning

Accord-Framework – The Accord.NET Framework is a complete framework for building machine learning, computer vision, computer audition, signal processing and statistical applications.
Accord.MachineLearning – Support Vector Machines, Decision Trees, Naive Bayesian models, K-means, Gaussian Mixture models and general algorithms such as Ransac, Cross-validation and Grid-Search for machine-learning applications. This package is part of the Accord.NET Framework.
DiffSharp – An automatic differentiation (AD) library providing exact and efficient derivatives (gradients, Hessians, Jacobians, directional derivatives, and matrix-free Hessian- and Jacobian-vector products) for machine learning and optimization applications. Operations can be nested to any level, meaning that you can compute exact higher-order derivatives and differentiate functions that are internally making use of differentiation, for applications such as hyperparameter optimization.
Encog – An advanced neural network and machine learning framework. Encog contains classes to create a wide variety of networks, as well as support classes to normalize and process data for these neural networks. Encog trains using multithreaded resilient propagation. Encog can also make use of a GPU to further speed processing time. A GUI based workbench is also provided to help model and train neural networks.
GeneticSharp – Multi-platform genetic algorithm library for .NET Core and .NET Framework. The library has several implementations of GA operators, like: selection, crossover, mutation, reinsertion and termination.
Infer.NET – Infer.NET is a framework for running Bayesian inference in graphical models. One can use Infer.NET to solve many different kinds of machine learning problems, from standard problems like classification, recommendation or clustering through to customized solutions to domain-specific problems. Infer.NET has been used in a wide variety of domains including information retrieval, bioinformatics, epidemiology, vision, and many others.
ML.NET – ML.NET is a cross-platform open-source machine learning framework which makes machine learning accessible to .NET developers. ML.NET was originally developed in Microsoft Research and evolved into a significant framework over the last decade and is used across many product groups in Microsoft like Windows, Bing, PowerPoint, Excel and more.
Neural Network Designer – DBMS management system and designer for neural networks. The designer application is developed using WPF, and is a user interface which allows you to design your neural network, query the network, create and configure chat bots that are capable of asking questions and learning from your feedback. The chat bots can even scrape the internet for information to return in their output as well as to use for learning.
Synapses – Neural network library in F#.
Vulpes – Deep belief and deep learning implementation written in F# and leverages CUDA GPU execution with Alea.cuBase.
MxNet.Sharp – .NET Standard bindings for Apache MxNet with Imperative, Symbolic and Gluon Interface for developing, training and deploying Machine Learning models in C#. https://mxnet.tech-quantum.com/

Data Analysis / Data Visualization

numl – numl is a machine learning library intended to ease the use of standard modeling techniques for both prediction and clustering.
Math.NET Numerics – Numerical foundation of the Math.NET project, aiming to provide methods and algorithms for numerical computations in science, engineering and everyday use. Supports .Net 4.0, .Net 3.5 and Mono on Windows, Linux and Mac; Silverlight 5, WindowsPhone/SL 8, WindowsPhone 8.1 and Windows 8 with PCL Portable Profiles 47 and 344; Android/iOS with Xamarin.
Sho – Sho is an interactive environment for data analysis and scientific computing that lets you seamlessly connect scripts (in IronPython) with compiled code (in .NET) to enable fast and flexible prototyping. The environment includes powerful and efficient libraries for linear algebra as well as data visualization that can be used from any .NET language, as well as a feature-rich interactive shell for rapid development.

Objective C

General-Purpose Machine Learning

YCML – A Machine Learning framework for Objective-C and Swift (OS X / iOS).
MLPNeuralNet – Fast multilayer perceptron neural network library for iOS and Mac OS X. MLPNeuralNet predicts new examples by trained neural networks. It is built on top of the Apple’s Accelerate Framework, using vectorized operations and hardware acceleration if available. [Deprecated]
MAChineLearning – An Objective-C multilayer perceptron library, with full support for training through backpropagation. Implemented using vDSP and vecLib, it’s 20 times faster than its Java equivalent. Includes sample code for use from Swift.
BPN-NeuralNetwork – Implements a three-layer neural network (input layer, hidden layer and output layer) known as a Back Propagation Neural Network (BPN). The network can be used for product recommendation, user behavior analysis, data mining and data analysis. [Deprecated]
Multi-Perceptron-NeuralNetwork – Implements a multi-perceptron neural network (ニューラルネットワーク) based on Back Propagation Neural Networks (BPN), designed to support an unlimited number of hidden layers.
KRHebbian-Algorithm – An unsupervised, self-learning algorithm (it adjusts the weights) for neural networks in machine learning. [Deprecated]
KRKmeans-Algorithm – Implements the K-Means clustering and classification algorithm. It can be used in data mining and image compression. [Deprecated]
KRFuzzyCMeans-Algorithm – Implements Fuzzy C-Means (FCM), the fuzzy clustering / classification algorithm in machine learning. It can be used in data mining and image compression. [Deprecated]

OCaml

General-Purpose Machine Learning

Oml – A general statistics and machine learning library.
GPR – Efficient Gaussian Process Regression in OCaml.
Libra-Tk – Algorithms for learning and inference with discrete probabilistic models.
TensorFlow – OCaml bindings for TensorFlow.

Perl

Data Analysis / Data Visualization

Perl Data Language, a pluggable architecture for data and image processing, which can be used for machine learning.

General-Purpose Machine Learning

MXnet for Deep Learning, in Perl, also released in CPAN.
Perl Data Language, using AWS machine learning platform from Perl.
Algorithm::SVMLight, implementation of Support Vector Machines with SVMLight under it. [Deprecated]
Several machine learning and artificial intelligence models are included in the AI namespace. For instance, you can find Naïve Bayes.

Perl 6

Data Analysis / Data Visualization

Perl Data Language, a pluggable architecture for data and image processing, which can be used for machine learning.

General-Purpose Machine Learning

PHP

Natural Language Processing

jieba-php – Chinese Words Segmentation Utilities.

General-Purpose Machine Learning

PHP-ML – Machine Learning library for PHP. Algorithms, Cross Validation, Neural Network, Preprocessing, Feature Extraction and much more in one library.
PredictionBuilder – A library for machine learning that builds predictions using a linear regression.
Rubix ML – A high-level machine learning (ML) library that lets you build programs that learn from data using the PHP language.
19 Questions – Machine learning / Bayesian inference for assigning attributes to objects.

Python

Computer Vision

Scikit-Image – A collection of algorithms for image processing in Python.
Jobtensor – A powerful tool for learning Python
Scikit-Opt – Swarm Intelligence in Python (Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Algorithm, Immune Algorithm, Artificial Fish Swarm Algorithm).
SimpleCV – An open source computer vision framework that gives access to several high-powered computer vision libraries, such as OpenCV. Written in Python; runs on Mac, Windows, and Ubuntu Linux.
Vigranumpy – Python bindings for the VIGRA C++ computer vision library.
OpenFace – Free and open source face recognition with deep neural networks.
PCV – Open source Python module for computer vision. [Deprecated]
face_recognition – Face recognition library that recognizes and manipulates faces from Python or from the command line.
dockerface – Easy to install and use deep learning Faster R-CNN face detection for images and video in a docker container.
Detectron – FAIR’s software system that implements state-of-the-art object detection algorithms, including Mask R-CNN. It is written in Python and powered by the Caffe2 deep learning framework. [Deprecated]
detectron2 – FAIR’s next-generation research platform for object detection and segmentation. It is a ground-up rewrite of the previous version, Detectron, and is powered by the PyTorch deep learning framework.
albumentations – A fast and framework-agnostic image augmentation library that implements a diverse set of augmentation techniques. Supports classification, segmentation, and detection out of the box. It was used to win a number of deep learning competitions at Kaggle, Topcoder, and those that were part of the CVPR workshops.
pytesseract – Python-tesseract is an optical character recognition (OCR) tool for Python. That is, it will recognize and “read” the text embedded in images. Python-tesseract is a wrapper for Google’s Tesseract-OCR Engine.
imutils – A library of convenience functions that make basic image processing operations such as translation, rotation, resizing, skeletonization, and displaying Matplotlib images easier with OpenCV and Python.
PyTorchCV – A PyTorch-Based Framework for Deep Learning in Computer Vision.
Self-supervised learning
neural-style-pt – A PyTorch implementation of Justin Johnson’s neural-style (neural style transfer).
Detecto – Train and run a computer vision model with 5-10 lines of code.
neural-dream – A PyTorch implementation of DeepDream.
Openpose – A real-time multi-person keypoint detection library for body, face, hands, and foot estimation
Deep High-Resolution-Net – A PyTorch implementation of CVPR2019 paper “Deep High-Resolution Representation Learning for Human Pose Estimation”
dream-creator – A PyTorch implementation of DeepDream. Allows individuals to quickly and easily train their own custom GoogleNet models with custom datasets for DeepDream.
Lucent – TensorFlow and OpenAI Clarity’s Lucid adapted for PyTorch.
lightly – Lightly is a computer vision framework for self-supervised learning.
Learnergy – Energy-based machine learning models built upon PyTorch.

Natural Language Processing

pkuseg-python – A better version of Jieba, developed by Peking University.
NLTK – A leading platform for building Python programs to work with human language data.
Pattern – A web mining module for the Python programming language. It has tools for natural language processing, machine learning, among others.
Quepy – A Python framework to transform natural language questions to queries in a database query language.
TextBlob – Provides a consistent API for diving into common natural language processing (NLP) tasks. Stands on the giant shoulders of NLTK and Pattern, and plays nicely with both.
YAlign – A sentence aligner, a friendly tool for extracting parallel sentences from comparable corpora. [Deprecated]
jieba – Chinese Words Segmentation Utilities.
SnowNLP – A library for processing Chinese text.
spammy – A library for email spam filtering built on top of NLTK.
loso – Another Chinese segmentation library. [Deprecated]
genius – A Chinese segmenter based on Conditional Random Fields.
KoNLPy – A Python package for Korean natural language processing.
nut – Natural language Understanding Toolkit. [Deprecated]
Rosetta – Text processing tools and wrappers (e.g. Vowpal Wabbit)
BLLIP Parser – Python bindings for the BLLIP Natural Language Parser (also known as the Charniak-Johnson parser). [Deprecated]
PyNLPl – Python Natural Language Processing Library. General purpose NLP library for Python. Also contains some specific modules for parsing common NLP formats, most notably for FoLiA, but also ARPA language models, Moses phrasetables, GIZA++ alignments.
PySS3 – Python package that implements a novel white-box machine learning model for text classification, called SS3. Since SS3 has the ability to visually explain its rationale, this package also comes with easy-to-use interactive visualization tools (online demos).
python-ucto – Python binding to ucto (a unicode-aware rule-based tokenizer for various languages).
python-frog – Python binding to Frog, an NLP suite for Dutch. (pos tagging, lemmatisation, dependency parsing, NER)
python-zpar – Python bindings for ZPar, a statistical part-of-speech-tagger, constituency parser, and dependency parser for English.
colibri-core – Python binding to C++ library for extracting and working with basic linguistic constructions such as n-grams and skipgrams in a quick and memory-efficient way.
spaCy – Industrial strength NLP with Python and Cython.
PyStanfordDependencies – Python interface for converting Penn Treebank trees to Stanford Dependencies.
Distance – Levenshtein and Hamming distance computation. [Deprecated]
Fuzzy Wuzzy – Fuzzy String Matching in Python.
jellyfish – A Python library for doing approximate and phonetic matching of strings.
editdistance – Fast implementation of edit distance.
textacy – Higher-level NLP built on spaCy.
stanford-corenlp-python – Python wrapper for Stanford CoreNLP [Deprecated]
CLTK – The Classical Language Toolkit.
Rasa – A “machine learning framework to automate text- and voice-based conversations.”
yase – Transcode a sentence (or other sequence) to a list of word vectors.
Polyglot – Multilingual text (NLP) processing toolkit.
DrQA – Reading Wikipedia to answer open-domain questions.
Dedupe – A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.
Snips NLU – Natural Language Understanding library for intent classification and entity extraction
NeuroNER – Named-entity recognition using neural networks providing state-of-the-art results.
DeepPavlov – Conversational AI library with many pre-trained Russian NLP models.
BigARTM – Topic modelling platform.
NALP – A Natural Adversarial Language Processing framework built over TensorFlow.
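
Several entries in the list above (Distance, editdistance, jellyfish, Fuzzy Wuzzy) are built around string distances. As a rough illustration of what such a library computes, here is a minimal pure-Python sketch of the Hamming and Levenshtein distances. It is not taken from any of the listed packages; the function names `hamming` and `levenshtein` are hypothetical, and the listed libraries will generally be faster and offer more options.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn `a` into `b` (row-by-row dynamic programming)."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the empty prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(hamming("karolin", "kathrin"))
```

Only two rows of the dynamic-programming table are kept at a time, so memory use grows with the length of the shorter string rather than with the product of both lengths.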

General-Purpose Machine Learning

Shapley – A data-driven framework to quantify the value of classifiers in a machine learning ensemble.
igel – A delightful machine learning tool that allows you to train/fit, test and use models without writing code.
ML Model building – A repository containing classification, clustering, regression, and recommender notebooks with illustrations of how to build them.
ML/DL project template
PyTorch Geometric Temporal – A temporal extension of PyTorch Geometric for dynamic graph representation learning.
Little Ball of Fur – A graph sampling extension library for NetworkX with a Scikit-Learn like API.
Karate Club – An unsupervised machine learning extension library for NetworkX with a Scikit-Learn like API.
Auto_ViML – Automatically Build Variant Interpretable ML models fast! Auto_ViML (pronounced “auto vimal”) is a comprehensive and scalable Python AutoML toolkit with imbalanced handling, ensembling, stacking and built-in feature selection. Featured in Medium article.
PyOD – Python Outlier Detection, a comprehensive and scalable Python toolkit for detecting outlying objects in multivariate data. Featured for advanced models, including Neural Networks/Deep Learning and Outlier Ensembles.
steppy – Lightweight Python library for fast and reproducible machine learning experimentation. Introduces a very simple interface that enables clean machine learning pipeline design.
steppy-toolkit – Curated collection of the neural networks, transformers and models that make your machine learning work faster and more effective.
CNTK – Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit. Documentation can be found here.
Couler – Unified interface for constructing and managing machine learning workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow.
auto_ml – Automated machine learning for production and analytics. Lets you focus on the fun parts of ML, while outputting production-ready code, and detailed analytics of your dataset and results. Includes support for NLP, XGBoost, CatBoost, LightGBM, and soon, deep learning.
machine learning – Automated build consisting of a web interface and a set of programmatic-interface APIs for support vector machines. The corresponding dataset(s) are stored in a SQL database; the generated model(s), used for prediction(s), are stored in a NoSQL datastore.
XGBoost – Python bindings for eXtreme Gradient Boosting (Tree) Library.
Apache SINGA – An Apache Incubating project for developing an open source machine learning library.
Bayesian Methods for Hackers – Book/IPython notebooks on Probabilistic Programming in Python.
Featureforge – A set of tools for creating and testing machine learning features, with a scikit-learn compatible API.
MLlib in Apache Spark – Distributed machine learning library in Spark
Hydrosphere Mist – A service for deploying Apache Spark MLlib machine learning models as real-time, batch or reactive web services.
scikit-learn – A Python module for machine learning built on top of SciPy.
metric-learn – A Python module for metric learning.
SimpleAI – Python implementation of many of the artificial intelligence algorithms described in the book “Artificial Intelligence, a Modern Approach”. It focuses on providing an easy to use, well documented and tested library.
astroML – Machine Learning and Data Mining for Astronomy.
graphlab-create – A library with various machine learning models (regression, clustering, recommender systems, graph analytics, etc.) implemented on top of a disk-backed DataFrame.
BigML – A library that contacts external servers.
pattern – Web mining module for Python.
NuPIC – Numenta Platform for Intelligent Computing.
Pylearn2 – A Machine Learning library based on Theano. [Deprecated]
keras – High-level neural networks frontend for TensorFlow, CNTK and Theano.
Lasagne – Lightweight library to build and train neural networks in Theano.
hebel – GPU-Accelerated Deep Learning Library in Python. [Deprecated]
Chainer – Flexible neural network framework.
prophet – Fast and automated time series forecasting framework by Facebook.
gensim – Topic Modelling for Humans.
topik – Topic modelling toolkit. [Deprecated]
PyBrain – Another Python Machine Learning Library.
Brainstorm – Fast, flexible and fun neural networks. This is the successor of PyBrain.
Surprise – A scikit for building and analyzing recommender systems.
implicit – Fast Python Collaborative Filtering for Implicit Datasets.
LightFM – A Python implementation of a number of popular recommendation algorithms for both implicit and explicit feedback.
Crab – A flexible, fast recommender engine. [Deprecated]
python-recsys – A Python library for implementing a Recommender System.
thinking bayes – Book on Bayesian Analysis.
Image-to-Image Translation with Conditional Adversarial Networks – Implementation of image to image (pix2pix) translation from the paper by Isola et al. [DEEP LEARNING]
Restricted Boltzmann Machines – Restricted Boltzmann Machines in Python. [DEEP LEARNING]
Bolt – Bolt Online Learning Toolbox. [Deprecated]
CoverTree – Python implementation of cover trees, near-drop-in replacement for scipy.spatial.kdtree [Deprecated]
nilearn – Machine learning for NeuroImaging in Python.
neuropredict – Aimed at novice machine learners and non-expert programmers, this package offers easy (no coding needed) and comprehensive machine learning (evaluation and full report of predictive performance WITHOUT requiring you to code) in Python for NeuroImaging and any other type of features. This is aimed at absorbing much of the ML workflow, unlike other packages like nilearn and pymvpa, which require you to learn their API and code to produce anything useful.
imbalanced-learn – Python module to perform undersampling and oversampling with various techniques.
Shogun – The Shogun Machine Learning Toolbox.
Pyevolve – Genetic algorithm framework. [Deprecated]
Caffe – A deep learning framework developed with cleanliness, readability, and speed in mind.
breze – Theano based library for deep and recurrent neural networks.
Cortex – Open source platform for deploying machine learning models in production.
pyhsmm – Library for approximate unsupervised inference in Bayesian Hidden Markov Models (HMMs) and explicit-duration Hidden semi-Markov Models (HSMMs), focusing on the Bayesian Nonparametric extensions, the HDP-HMM and HDP-HSMM, mostly with weak-limit approximations.
SKLL – A wrapper around scikit-learn that makes it simpler to conduct experiments.
neurolab
Spearmint – Spearmint is a package to perform Bayesian optimization according to the algorithms outlined in the paper: Practical Bayesian Optimization of Machine Learning Algorithms. Jasper Snoek, Hugo Larochelle and Ryan P. Adams. Advances in Neural Information Processing Systems, 2012. [Deprecated]
Pebl – Python Environment for Bayesian Learning. [Deprecated]
Theano – Optimizing GPU-meta-programming code generating array oriented optimizing math compiler in Python.
TensorFlow – Open source software library for numerical computation using data flow graphs.
pomegranate – Hidden Markov Models for Python, implemented in Cython for speed and efficiency.
python-timbl – A Python extension module wrapping the full TiMBL C++ programming interface. Timbl is an elaborate k-Nearest Neighbours machine learning toolkit.
deap – Evolutionary algorithm framework.
pydeep – Deep Learning In Python. [Deprecated]
mlxtend – A library consisting of useful tools for data science and machine learning tasks.
neon – Nervana’s high-performance Python-based Deep Learning framework [DEEP LEARNING]. [Deprecated]
Optunity – A library dedicated to automated hyperparameter optimization with a simple, lightweight API to facilitate drop-in replacement of grid search.
Neural Networks and Deep Learning – Code samples for my book “Neural Networks and Deep Learning” [DEEP LEARNING].
Annoy – Approximate nearest neighbours implementation.
TPOT – Tool that automatically creates and optimizes machine learning pipelines using genetic programming. Consider it your personal data science assistant, automating a tedious part of machine learning.
pgmpy – A Python library for working with Probabilistic Graphical Models.
DIGITS – The Deep Learning GPU Training System (DIGITS) is a web application for training deep learning models.
Orange – Open source data visualization and data analysis for novices and experts.
MXNet – Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Go, Javascript and more.
milk – Machine learning toolkit focused on supervised classification. [Deprecated]
TFLearn – Deep learning library featuring a higher-level API for TensorFlow.
REP – an IPython-based environment for conducting data-driven research in a consistent and reproducible way. REP is not trying to substitute scikit-learn, but extends it and provides better user experience. [Deprecated]
rgf_python – Python bindings for Regularized Greedy Forest (Tree) Library.
skbayes – Python package for Bayesian Machine Learning with scikit-learn API.
fuku-ml – Simple machine learning library, including Perceptron, Regression, Support Vector Machine, Decision Tree and more. It is easy to use and easy to learn for beginners.
Xcessiv – A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling.
PyTorch – Tensors and Dynamic neural networks in Python with strong GPU acceleration
PyTorch Lightning – The lightweight PyTorch wrapper for high-performance AI research.
PyTorch Lightning Bolts – Toolbox of models, callbacks, and datasets for AI/ML researchers.
skorch – A scikit-learn compatible neural network library that wraps PyTorch.
ML-From-Scratch – Implementations of Machine Learning models from scratch in Python with a focus on transparency. Aims to showcase the nuts and bolts of ML in an accessible way.
Edward – A library for probabilistic modeling, inference, and criticism. Built on top of TensorFlow.
xRBM – A library for Restricted Boltzmann Machine (RBM) and its conditional variants in Tensorflow.
CatBoost – General purpose gradient boosting on decision trees library with categorical features support out of the box. It is easy to install, well documented and supports CPU and GPU (even multi-GPU) computation.
stacked_generalization – Implementation of machine learning stacking technique as a handy library in Python.
modAL – A modular active learning framework for Python, built on top of scikit-learn.
Cogitare – A Modern, Fast, and Modular Deep Learning and Machine Learning framework for Python.
Parris – Parris, the automated infrastructure setup tool for machine learning algorithms.
neonrvm – neonrvm is an open source machine learning library based on RVM technique. It’s written in C programming language and comes with Python programming language bindings.
Turi Create – Machine learning from Apple. Turi Create simplifies the development of custom machine learning models. You don’t have to be a machine learning expert to add recommendations, object detection, image classification, image similarity or activity classification to your app.
xLearn – A high performance, easy-to-use, and scalable machine learning package, which can be used to solve large-scale machine learning problems. xLearn is especially useful for solving machine learning problems on large-scale sparse data, which is very common in Internet services such as online advertisement and recommender systems.
mlens – A high performance, memory efficient, maximally parallelized ensemble learning, integrated with scikit-learn.
Netron – Visualizer for machine learning models.
Thampi – Machine Learning Prediction System on AWS Lambda
MindsDB – Open Source framework to streamline use of neural networks.
Microsoft Recommenders – Examples and best practices for building recommendation systems, provided as Jupyter notebooks. The repo contains some of the latest state-of-the-art algorithms from Microsoft Research as well as from other companies and institutions.
StellarGraph – Machine Learning on Graphs, a Python library for machine learning on graph-structured (network-structured) data.
BentoML – Toolkit for packaging and deploying machine learning models for serving in production.
MiraiML – An asynchronous engine for continuous & autonomous machine learning, built for real-time usage.
numpy-ML – Reference implementations of ML models written in NumPy.
creme – A framework for online machine learning.
Neuraxle – A framework providing the right abstractions to ease research, development, and deployment of your ML pipelines.
Cornac – A comparative framework for multimodal recommender systems with a focus on models leveraging auxiliary data.
JAX – JAX is Autograd and XLA, brought together for high-performance machine learning research.
Catalyst – High-level utils for PyTorch DL & RL research. It was developed with a focus on reproducibility, fast experimentation and code/idea reuse, so that you can research and develop something new rather than write another regular training loop.
Fastai – High-level wrapper built on top of PyTorch which supports vision, text, tabular data and collaborative filtering.
scikit-multiflow – A machine learning framework for multi-output/multi-label and stream data.
Lightwood – A PyTorch-based framework that breaks down machine learning problems into smaller blocks that can be glued together seamlessly, with the objective of building predictive models with one line of code.
bayeso – A simple, but essential Bayesian optimization package, written in Python.
mljar-supervised – An Automated Machine Learning (AutoML) python package for tabular data. It can handle: Binary Classification, MultiClass Classification and Regression. It provides explanations and markdown reports.
evostra – A fast Evolution Strategy implementation in Python.
Determined – Scalable deep learning training platform, including integrated support for distributed training, hyperparameter tuning, experiment tracking, and model management.
PySyft – A Python library for secure and private Deep Learning built on PyTorch and TensorFlow.
PyGrid – Peer-to-peer network of data owners and data scientists who can collectively train AI models using PySyft
sktime – A unified framework for machine learning with time series
OPFython – A Python-inspired implementation of the Optimum-Path Forest classifier.
Opytimizer – Python-based meta-heuristic optimization techniques.
Gradio – A Python library for quickly creating and sharing demos of models. Debug models interactively in your browser, get feedback from collaborators, and generate public links without deploying anything.
Hub – Fastest unstructured dataset management for TensorFlow/PyTorch. Stream & version-control data. Store even petabyte-scale data in a single numpy-like array on the cloud accessible on any machine. Visit activeloop.ai for more info.
Synthia – Multidimensional synthetic data generation in Python.
ByteHub – An easy-to-use, Python-based feature store. Optimized for time-series data.

Data Analysis / Data Visualization

DataVisualization – A GitHub repository where you can learn data visualization from basic to intermediate level.
Cartopy – Cartopy is a Python package designed for geospatial data processing in order to produce maps and other geospatial data analyses.
SciPy – A Python-based ecosystem of open-source software for mathematics, science, and engineering.
NumPy – A fundamental package for scientific computing with Python.
AutoViz – AutoViz performs automatic visualization of any dataset with a single line of Python code. Give it any input file (CSV, txt or json) of any size and AutoViz will visualize it. See Medium article.
Numba – Python JIT (just in time) compiler to LLVM aimed at scientific Python by the developers of Cython and NumPy.
Mars – A tensor-based framework for large-scale data computation which is often regarded as a parallel and distributed version of NumPy.
NetworkX – A high-productivity software for complex networks.
igraph – binding to igraph library – General purpose graph library.
Pandas – A library providing high-performance, easy-to-use data structures and data analysis tools.
ParaMonte – A general-purpose Python library for Bayesian data analysis and visualization via serial/parallel Monte Carlo and MCMC simulations. Documentation can be found here.
Open Mining – Business Intelligence (BI) in Python (Pandas web interface) [Deprecated]
PyMC – Markov Chain Monte Carlo sampling toolkit.
zipline – A Pythonic algorithmic trading library.
PyDy – Short for Python Dynamics, used to assist with workflow in the modeling of dynamic motion based around NumPy, SciPy, IPython, and matplotlib.
SymPy – A Python library for symbolic mathematics.
statsmodels – Statistical modeling and econometrics in Python.
astropy – A community Python library for Astronomy.
matplotlib – A Python 2D plotting library.
bokeh – Interactive Web Plotting for Python.
plotly – Collaborative web plotting for Python and matplotlib.
altair – A Python to Vega translator.
d3py – A plotting library for Python, based on D3.js.
PyDexter – Simple plotting for Python. Wrapper for D3xterjs; easily render charts in-browser.
ggplot – Same API as ggplot2 for R. [Deprecated]
ggfortify – Unified interface to ggplot2 popular R packages.
Kartograph.py – Rendering beautiful SVG maps in Python.
pygal – A Python SVG Charts Creator.
PyQtGraph – A pure-python graphics and GUI library built on PyQt4 / PySide and NumPy.
pycascading [Deprecated]
Petrel – Tools for writing, submitting, debugging, and monitoring Storm topologies in pure Python.
Blaze – NumPy and Pandas interface to Big Data.
emcee – The Python ensemble sampling toolkit for affine-invariant MCMC.
windML – A Python Framework for Wind Energy Analysis and Prediction.
vispy – GPU-based high-performance interactive OpenGL 2D/3D data visualization library.
cerebro2 – A web-based visualization and debugging platform for NuPIC. [Deprecated]
NuPIC Studio – An all-in-one NuPIC Hierarchical Temporal Memory visualization and debugging super-tool! [Deprecated]
SparklingPandas – Pandas on PySpark (POPS).
Seaborn – A python visualization library based on matplotlib.
bqplot – An API for plotting in Jupyter (IPython).
pastalog – Simple, realtime visualization of neural network training performance.
Superset – A data exploration platform designed to be visual, intuitive, and interactive.
Dora – Tools for exploratory data analysis in Python.
Ruffus – Computation Pipeline library for python.
SOMPY – Self Organizing Map written in Python (Uses neural networks for data analysis).
somoclu – Massively parallel self-organizing maps: accelerates training on multicore CPUs, GPUs, and clusters; has a Python API.
HDBScan – implementation of the hdbscan algorithm in Python – used for clustering
visualize_ML – A python package for data exploration and data analysis. [Deprecated]
scikit-plot – A visualization library for quick and easy generation of common plots in data analysis and machine learning.
Bowtie – A dashboard library for interactive visualizations using flask socketio and react.
lime – Lime is about explaining what machine learning classifiers (or models) are doing. It is able to explain any black box classifier, with two or more classes.
PyCM – PyCM is a multi-class confusion matrix library written in Python that supports both input data vectors and direct matrix input. It is a tool for post-classification model evaluation that supports most class and overall statistics parameters.
Dash – A framework for creating analytical web applications built on top of Plotly.js, React, and Flask
Lambdo – A workflow engine for solving machine learning problems by combining in one analysis pipeline (i) feature engineering and machine learning (ii) model training and prediction (iii) table population and column evaluation via user-defined (Python) functions.
TensorWatch – Debugging and visualization tool for machine learning and data science. It extensively leverages Jupyter Notebook to show real-time visualizations of data in running processes such as machine learning training.
dowel – A little logger for machine learning research. Output any object to the terminal, CSV, TensorBoard, text logs on disk, and more with just one call to logger.log().
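As a rough illustration of what the PyCM entry above means by a multi-class confusion matrix built from input data vectors, here is a minimal sketch using only the Python standard library. The function names `confusion_matrix` and `overall_accuracy` and the sample labels are invented for this example; this is not PyCM's API.

```python
def confusion_matrix(actual, predicted):
    """Count outcomes as matrix[actual_label][predicted_label].

    Hypothetical helper for illustration only (not part of PyCM).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Every label seen in either vector gets a row and a column.
    labels = sorted(set(actual) | set(predicted))
    matrix = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def overall_accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[label][label] for label in matrix)
    return correct / total if total else 0.0


# Made-up sample data with three classes.
actual = ["cat", "dog", "cat", "bird", "dog", "cat"]
predicted = ["cat", "cat", "cat", "bird", "dog", "dog"]

cm = confusion_matrix(actual, predicted)
for label, row in cm.items():
    print(label, row)
print("accuracy:", round(overall_accuracy(cm), 3))
```

Rows are the true classes and columns the predicted ones, so off-diagonal cells show which classes get confused with each other.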

Misc Scripts / iPython Notebooks / Codebases

MiniGrad – A minimal, educational, Pythonic implementation of autograd (~100 loc).
Map/Reduce implementations of common ML algorithms: Jupyter notebooks that cover how to implement from scratch different ML algorithms (ordinary least squares, gradient descent, k-means, alternating least squares), using Python NumPy, and how to then make these implementations scalable using Map/Reduce and Spark.
BioPy – Biologically-Inspired and Machine Learning Algorithms in Python. [Deprecated]
CAEs for Data Assimilation – Convolutional autoencoders for 3D image/field compression applied to reduced order Data Assimilation.
SVM Explorer – Interactive SVM Explorer, using Dash and scikit-learn
pattern_classification
thinking stats 2
hyperopt
numpic
2012-paper-diginorm
A gallery of interesting IPython notebooks
ipython-notebooks
data-science-ipython-notebooks – Continually updated Data Science Python Notebooks: Spark, Hadoop MapReduce, HDFS, AWS, Kaggle, scikit-learn, matplotlib, pandas, NumPy, SciPy, and various command lines.
decision-weights
Sarah Palin LDA – Topic Modeling the Sarah Palin emails.
Diffusion Segmentation – A collection of image segmentation algorithms based on diffusion methods.
Scipy Tutorials – SciPy tutorials. This is outdated, check out scipy-lecture-notes.
Crab – A recommendation engine library for Python.
BayesPy – Bayesian Inference Tools in Python.
scikit-learn tutorials – Series of notebooks for learning scikit-learn.
sentiment-analyzer – Tweets Sentiment Analyzer
sentiment_classifier – Sentiment classifier using word sense disambiguation.
group-lasso – Some experiments with the coordinate descent algorithm used in the (Sparse) Group Lasso model.
jProcessing – Kanji / Hiragana / Katakana to Romaji Converter. Edict Dictionary & parallel sentences Search. Sentence Similarity between two JP Sentences. Sentiment Analysis of Japanese Text. Run Cabocha (ISO-8859-1 configured) in Python.
mne-python-notebooks – IPython notebooks for EEG/MEG data processing using mne-python.
Neon Course – IPython notebooks for a complete course around understanding Nervana’s Neon.
pandas cookbook – Recipes for using Python’s pandas library.
climin – Optimization library focused on machine learning, pythonic implementations of gradient descent, LBFGS, rmsprop, adadelta and others.
Allen Downey’s Data Science Course – Code for Data Science at Olin College, Spring 2014.
Allen Downey’s Think Bayes Code – Code repository for Think Bayes.
Allen Downey’s Think Complexity Code – Code for Allen Downey’s book Think Complexity.
Allen Downey’s Think OS Code – Text and supporting code for Think OS: A Brief Introduction to Operating Systems.
Python Programming for the Humanities – Course for Python programming for the Humanities, assuming no prior knowledge. Heavy focus on text processing / NLP.
GreatCircle – Library for calculating great circle distance.
Optunity examples – Examples demonstrating how to use Optunity in synergy with machine learning libraries.
Dive into Machine Learning with Python Jupyter notebook and scikit-learn – “I learned Python by hacking first, and getting serious later. I wanted to do this with Machine Learning. If this is your style, join me in getting a bit ahead of yourself.”
TDB – TensorDebugger (TDB) is a visual debugger for deep learning. It features interactive, node-by-node debugging and visualization for TensorFlow.
Suiron – Machine Learning for RC Cars.
Introduction to machine learning with scikit-learn – IPython notebooks from Data School’s video tutorials on scikit-learn.
Practical XGBoost in Python – comprehensive online course about using XGBoost in Python.
Introduction to Machine Learning with Python – Notebooks and code for the book “Introduction to Machine Learning with Python”
Pydata book – Materials and IPython notebooks for “Python for Data Analysis” by Wes McKinney, published by O’Reilly Media
Homemade Machine Learning – Python examples of popular machine learning algorithms with interactive Jupyter demos and math being explained
Prodmodel – Build tool for data science pipelines.
the-elements-of-statistical-learning – This repository contains Jupyter notebooks implementing the algorithms found in the book and summary of the textbook.
Hyperparameter-Optimization-of-Machine-Learning-Algorithms – Code for hyperparameter tuning/optimization of machine learning and deep learning algorithms.
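For readers curious what a great circle distance calculation (as in the GreatCircle entry above) involves, here is a minimal sketch using the haversine formula. It is not the GreatCircle library's code; the function name `great_circle_km` and the spherical-Earth radius of 6371 km are assumptions made for this example.

```python
import math

# Mean Earth radius in kilometres; treating the Earth as a sphere is an
# assumption of this sketch.
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=EARTH_RADIUS_KM):
    """Haversine distance between two (latitude, longitude) points in degrees.

    Hypothetical helper for illustration only.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


# A quarter of the way around the equator.
quarter = great_circle_km(0.0, 0.0, 0.0, 90.0)
print(round(quarter, 1), "km")
```

The haversine form is used here because it stays numerically stable for points that are close together.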

Neural Networks

nn_builder – nn_builder is a python package that lets you build neural networks in 1 line
NeuralTalk – NeuralTalk is a Python+numpy project for learning Multimodal Recurrent Neural Networks that describe images with sentences.
Neuron – Neuron is a simple class for time series predictions. It utilizes LNU (Linear Neural Unit), QNU (Quadratic Neural Unit), RBF (Radial Basis Function), MLP (Multi Layer Perceptron), and MLP-ELM (Multi Layer Perceptron – Extreme Learning Machine) neural networks trained with gradient descent or the Levenberg–Marquardt algorithm.
NeuralTalk – NeuralTalk is a Python+numpy project for learning Multimodal Recurrent Neural Networks that describe images with sentences. [Deprecated]
Neuron – Neuron is a simple class for time series predictions. It utilizes LNU (Linear Neural Unit), QNU (Quadratic Neural Unit), RBF (Radial Basis Function), MLP (Multi Layer Perceptron), and MLP-ELM (Multi Layer Perceptron – Extreme Learning Machine) neural networks trained with gradient descent or the Levenberg–Marquardt algorithm. [Deprecated]
Data Driven Code – Very simple implementation of neural networks for dummies in python without using any libraries, with detailed comments.
Machine Learning, Data Science and Deep Learning with Python – LiveVideo course that covers machine learning, Tensorflow, artificial intelligence, and neural networks.
TResNet: High Performance GPU-Dedicated Architecture – TResNet models were designed and optimized to give the best speed-accuracy tradeoff out there on GPUs.
TResNet: Simple and powerful neural network library for python – Variety of supported types of Artificial Neural Network and learning algorithms.
Jina AI – An easier way to build neural search in the cloud. Compatible with Jupyter Notebooks.
sequitur – PyTorch library for creating and training sequence autoencoders in just two lines of code.

Kaggle Competition Source Code

open-solution-home-credit – Source code and experiment results for Home Credit Default Risk.
open-solution-googleai-object-detection – Source code and experiment results for Google AI Open Images – Object Detection Track.
open-solution-salt-identification – Source code and experiment results for TGS Salt Identification Challenge.
open-solution-ship-detection – Source code and experiment results for Airbus Ship Detection Challenge.
open-solution-data-science-bowl-2018 – Source code and experiment results for 2018 Data Science Bowl.
open-solution-value-prediction – Source code and experiment results for Santander Value Prediction Challenge.
open-solution-toxic-comments – Source code for Toxic Comment Classification Challenge.
wiki challenge – An implementation of Dell Zhang’s solution to Wikipedia’s Participation Challenge on Kaggle.
kaggle insults – Kaggle Submission for “Detecting Insults in Social Commentary”.
kaggle_acquire-valued-shoppers-challenge – Code for the Kaggle acquire valued shoppers challenge.
kaggle-cifar – Code for the CIFAR-10 competition at Kaggle, uses cuda-convnet.
kaggle-blackbox – Deep learning made easy.
kaggle-accelerometer – Code for Accelerometer Biometric Competition at Kaggle.
kaggle-advertised-salaries – Predicting job salaries from ads – a Kaggle competition.
kaggle amazon – Amazon access control challenge.
kaggle-bestbuy_big – Code for the Best Buy competition at Kaggle.
kaggle-bestbuy_small
Kaggle Dogs vs. Cats – Code for Kaggle Dogs vs. Cats competition.
Kaggle Galaxy Challenge – Winning solution for the Galaxy Challenge on Kaggle.
Kaggle Gender – A Kaggle competition: discriminate gender based on handwriting.
Kaggle Merck – Merck challenge at Kaggle.
Kaggle Stackoverflow – Predicting closed questions on Stack Overflow.
wine-quality – Predicting wine quality.

Reinforcement Learning

DeepMind Lab – DeepMind Lab is a 3D learning environment based on id Software’s Quake III Arena via ioquake3 and other open source software. Its primary purpose is to act as a testbed for research in artificial intelligence, especially deep reinforcement learning.
Gym – OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms.
Serpent.AI – Serpent.AI is a game agent framework that allows you to turn any video game you own into a sandbox to develop AI and machine learning experiments. For both researchers and hobbyists.
ViZDoom – ViZDoom allows developing AI bots that play Doom using only the visual information (the screen buffer). It is primarily intended for research in machine visual learning, and deep reinforcement learning, in particular.
Roboschool – Open-source software for robot simulation, integrated with OpenAI Gym.
Retro – Retro Games in Gym
SLM Lab – Modular Deep Reinforcement Learning framework in PyTorch.
Coach – Reinforcement Learning Coach by Intel® AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms
garage – A toolkit for reproducible reinforcement learning research
metaworld – An open source robotics benchmark for meta- and multi-task reinforcement learning
acme – An Open Source Distributed Framework for Reinforcement Learning that makes it easy to build and train your agents.
Spinning Up – An educational resource designed to let anyone learn to become a skilled practitioner in deep reinforcement learning

Ruby

Natural Language Processing

Awesome NLP with Ruby – Curated link list for practical natural language processing in Ruby.
Treat – Text REtrieval and Annotation Toolkit, definitely the most comprehensive toolkit I’ve encountered so far for Ruby.
Stemmer – Expose libstemmer_c to Ruby. [Deprecated]
Raspell – raspell is an interface binding for ruby. [Deprecated]
UEA Stemmer – Ruby port of UEALite Stemmer – a conservative stemmer for search and indexing.
Twitter-text-rb – A library that does auto linking and extraction of usernames, lists and hashtags in tweets.

General-Purpose Machine Learning

Awesome Machine Learning with Ruby – Curated list of ML related resources for Ruby.
Ruby Machine Learning – Some Machine Learning algorithms, implemented in Ruby. [Deprecated]
Machine Learning Ruby [Deprecated]
jRuby Mahout – JRuby Mahout is a gem that unleashes the power of Apache Mahout in the world of JRuby. [Deprecated]
CardMagic-Classifier – A general classifier module to allow Bayesian and other types of classifications.
rb-libsvm – Ruby language bindings for LIBSVM which is a Library for Support Vector Machines.
Scoruby – Creates Random Forest classifiers from PMML files.
rumale – Rumale is a machine learning library in Ruby

Data Analysis / Data Visualization

rsruby – Ruby – R bridge.
data-visualization-ruby – Source code and supporting content for my Ruby Manor presentation on Data Visualisation with Ruby. [Deprecated]
ruby-plot – gnuplot wrapper for Ruby, especially for plotting ROC curves into SVG files. [Deprecated]
plot-rb – A plotting library in Ruby built on top of Vega and D3. [Deprecated]
scruffy – A beautiful graphing toolkit for Ruby.
SciRuby
Glean – A data management tool for humans. [Deprecated]
Bioruby
Arel [Deprecated]

Misc

Big Data For Chimps
Listof – Community based data collection, packed in gem. Get list of pretty much anything (stop words, countries, non words) in txt, json or hash. Demo/Search for a list

Rust

General-Purpose Machine Learning

deeplearn-rs – deeplearn-rs provides simple networks that use matrix multiplication, addition, and ReLU under the MIT license.
rustlearn – a machine learning framework featuring logistic regression, support vector machines, decision trees and random forests.
rusty-machine – a pure-rust machine learning library.
leaf – open source framework for machine intelligence, sharing concepts from TensorFlow and Caffe. Available under the MIT license. [Deprecated]
RustNN – RustNN is a feedforward neural network library. [Deprecated]
RusticSOM – A Rust library for Self Organising Maps (SOM).

R

General-Purpose Machine Learning

ahaz – ahaz: Regularization for semiparametric additive hazards regression. [Deprecated]
arules – arules: Mining Association Rules and Frequent Itemsets
biglasso – biglasso: Extending Lasso Model Fitting to Big Data in R.
bmrm – bmrm: Bundle Methods for Regularized Risk Minimization Package.
Boruta – Boruta: A wrapper algorithm for all-relevant feature selection.
bst – bst: Gradient Boosting.
C50 – C50: C5.0 Decision Trees and Rule-Based Models.
caret – Classification and Regression Training: Unified interface to ~150 ML algorithms in R.
caretEnsemble – caretEnsemble: Framework for fitting multiple caret models as well as creating ensembles of such models. [Deprecated]
CatBoost – General purpose gradient boosting on decision trees library with categorical features support out of the box for R.
Clever Algorithms For Machine Learning
CORElearn – CORElearn: Classification, regression, feature evaluation and ordinal evaluation.
CoxBoost – CoxBoost: Cox models by likelihood based boosting for a single survival endpoint or competing risks [Deprecated]
Cubist – Cubist: Rule- and Instance-Based Regression Modeling.
e1071 – e1071: Misc Functions of the Department of Statistics (e1071), TU Wien
earth – earth: Multivariate Adaptive Regression Spline Models
elasticnet – elasticnet: Elastic-Net for Sparse Estimation and Sparse PCA.
ElemStatLearn – ElemStatLearn: Data sets, functions and examples from the book: “The Elements of Statistical Learning, Data Mining, Inference, and Prediction” by Trevor Hastie, Robert Tibshirani and Jerome Friedman.
evtree – evtree: Evolutionary Learning of Globally Optimal Trees.
forecast – forecast: Timeseries forecasting using ARIMA, ETS, STLM, TBATS, and neural network models.
forecastHybrid – forecastHybrid: Automatic ensemble and cross validation of ARIMA, ETS, STLM, TBATS, and neural network models from the “forecast” package.
fpc – fpc: Flexible procedures for clustering.
frbs – frbs: Fuzzy Rule-based Systems for Classification and Regression Tasks. [Deprecated]
GAMBoost – GAMBoost: Generalized linear and additive models by likelihood based boosting. [Deprecated]
gamboostLSS – gamboostLSS: Boosting Methods for GAMLSS.
gbm – gbm: Generalized Boosted Regression Models.
glmnet – glmnet: Lasso and elastic-net regularized generalized linear models.
glmpath – glmpath: L1 Regularization Path for Generalized Linear Models and Cox Proportional Hazards Model.
GMMBoost – GMMBoost: Likelihood-based Boosting for Generalized mixed models. [Deprecated]
grplasso – grplasso: Fitting user specified models with Group Lasso penalty.
grpreg – grpreg: Regularization paths for regression models with grouped covariates.
h2o – A framework for fast, parallel, and distributed machine learning algorithms at scale — Deeplearning, Random forests, GBM, KMeans, PCA, GLM.
hda – hda: Heteroscedastic Discriminant Analysis. [Deprecated]
Introduction to Statistical Learning
ipred – ipred: Improved Predictors.
kernlab – kernlab: Kernel-based Machine Learning Lab.
klaR – klaR: Classification and visualization.
L0Learn – L0Learn: Fast algorithms for best subset selection.
lars – lars: Least Angle Regression, Lasso and Forward Stagewise. [Deprecated]
lasso2 – lasso2: L1 constrained estimation aka ‘lasso’.
LiblineaR – LiblineaR: Linear Predictive Models Based On The Liblinear C/C++ Library.
LogicReg – LogicReg: Logic Regression.
Machine Learning For Hackers
maptree – maptree: Mapping, pruning, and graphing tree models. [Deprecated]
mboost – mboost: Model-Based Boosting.
medley – medley: Blending regression models, using a greedy stepwise approach.
mlr – mlr: Machine Learning in R.
ncvreg – ncvreg: Regularization paths for SCAD- and MCP-penalized regression models.
nnet – nnet: Feed-forward Neural Networks and Multinomial Log-Linear Models. [Deprecated]
pamr – pamr: Pam: prediction analysis for microarrays. [Deprecated]
party – party: A Laboratory for Recursive Partitioning
partykit – partykit: A Toolkit for Recursive Partitioning.
penalized – penalized: L1 (lasso and fused lasso) and L2 (ridge) penalized estimation in GLMs and in the Cox model.
penalizedLDA – penalizedLDA: Penalized classification using Fisher’s linear discriminant. [Deprecated]
penalizedSVM – penalizedSVM: Feature Selection SVM using penalty functions.
quantregForest – quantregForest: Quantile Regression Forests.
randomForest – randomForest: Breiman and Cutler’s random forests for classification and regression.
randomForestSRC – randomForestSRC: Random Forests for Survival, Regression and Classification (RF-SRC).
rattle – rattle: Graphical user interface for data mining in R.
rda – rda: Shrunken Centroids Regularized Discriminant Analysis.
rdetools – rdetools: Relevant Dimension Estimation (RDE) in Feature Spaces. [Deprecated]
REEMtree – REEMtree: Regression Trees with Random Effects for Longitudinal (Panel) Data. [Deprecated]
relaxo – relaxo: Relaxed Lasso. [Deprecated]
rgenoud – rgenoud: R version of GENetic Optimization Using Derivatives
Rmalschains – Rmalschains: Continuous Optimization using Memetic Algorithms with Local Search Chains (MA-LS-Chains) in R.
rminer – rminer: Simpler use of data mining methods (e.g. NN and SVM) in classification and regression. [Deprecated]
ROCR – ROCR: Visualizing the performance of scoring classifiers. [Deprecated]
RoughSets – RoughSets: Data Analysis Using Rough Set and Fuzzy Rough Set Theories. [Deprecated]
rpart – rpart: Recursive Partitioning and Regression Trees.
RPMM – RPMM: Recursively Partitioned Mixture Model.
RSNNS – RSNNS: Neural Networks in R using the Stuttgart Neural Network Simulator (SNNS).
RWeka – RWeka: R/Weka interface.
RXshrink – RXshrink: Maximum Likelihood Shrinkage via Generalized Ridge or Least Angle Regression.
sda – sda: Shrinkage Discriminant Analysis and CAT Score Variable Selection. [Deprecated]
spectralGraphTopology – spectralGraphTopology: Learning Graphs from Data via Spectral Constraints.
SuperLearner – Multi-algorithm ensemble learning packages.
svmpath – svmpath: the SVM Path algorithm. [Deprecated]
tgp – tgp: Bayesian treed Gaussian process models. [Deprecated]
tree – tree: Classification and regression trees.
varSelRF – varSelRF: Variable selection using random forests.
XGBoost.R – R binding for eXtreme Gradient Boosting (Tree) Library.
Optunity – A library dedicated to automated hyperparameter optimization with a simple, lightweight API to facilitate drop-in replacement of grid search. Optunity is written in Python but interfaces seamlessly to R.
igraph – binding to igraph library – General purpose graph library.
MXNet – Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Go, Javascript and more.
TDSP-Utilities – Two data science utilities in R from Microsoft: 1) Interactive Data Exploration, Analysis, and Reporting (IDEAR) ; 2) Automated Modeling and Reporting (AMR).

Data Manipulation | Data Analysis | Data Visualization

dplyr – A data manipulation package that helps to solve the most common data manipulation problems.
ggplot2 – A data visualization package based on the grammar of graphics.
tmap for visualizing geospatial data with static maps, and leaflet for interactive maps.
tm and quanteda are the main packages for managing, analyzing, and visualizing textual data.
shiny is the basis for truly interactive displays and dashboards in R. However, some measure of interactivity can be achieved with htmlwidgets, which bring JavaScript libraries to R. These include plotly, dygraphs, highcharter, and several others.

SAS

General-Purpose Machine Learning

Visual Data Mining and Machine Learning – Interactive, automated, and programmatic modeling with the latest machine learning algorithms in an end-to-end analytics environment, from data prep to deployment. Free trial available.
Enterprise Miner – Data mining and machine learning that creates deployable models using a GUI or code.
Factory Miner – Automatically creates deployable machine learning models across numerous market or customer segments using a GUI.

Data Analysis / Data Visualization

SAS/STAT – For conducting advanced statistical analysis.
University Edition – FREE! Includes all SAS packages necessary for data analysis and visualization, and includes online SAS courses.

Natural Language Processing

Contextual Analysis – Add structure to unstructured text using a GUI.
Sentiment Analysis – Extract sentiment from text using a GUI.
Text Miner – Text mining using a GUI or code.

Demos and Scripts

ML_Tables – Concise cheat sheets containing machine learning best practices.
enlighten-apply – Example code and materials that illustrate applications of SAS machine learning techniques.
enlighten-integration – Example code and materials that illustrate techniques for integrating SAS with other analytics technologies in Java, PMML, Python and R.
enlighten-deep – Example code and materials that illustrate using neural networks with several hidden layers in SAS.
dm-flow – Library of SAS Enterprise Miner process flow diagrams to help you learn by example about specific data mining topics.

Scala

Natural Language Processing

ScalaNLP – ScalaNLP is a suite of machine learning and numerical computing libraries.
Breeze – Breeze is a numerical processing library for Scala.
Chalk – Chalk is a natural language processing library. [Deprecated]
FACTORIE – FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference.
Montague – Montague is a semantic parsing library for Scala with an easy-to-use DSL.
Spark NLP – Natural language processing library built on top of Apache Spark ML to provide simple, performant, and accurate NLP annotations for machine learning pipelines, that scale easily in a distributed environment.

Data Analysis / Data Visualization

MLlib in Apache Spark – Distributed machine learning library in Spark
Hydrosphere Mist – A service for deploying Apache Spark MLlib machine learning models as real-time, batch, or reactive web services.
Scalding – A Scala API for Cascading.
Summing Bird – Streaming MapReduce with Scalding and Storm.
Algebird – Abstract Algebra for Scala.
xerial – Data management utilities for Scala. [Deprecated]
PredictionIO – PredictionIO, a machine learning server for software developers and data engineers.
BIDMat – CPU and GPU-accelerated matrix library intended to support large-scale exploratory data analysis.
Flink – Open source platform for distributed stream and batch data processing.
Spark Notebook – Interactive and Reactive Data Science using Scala and Spark.

General-Purpose Machine Learning

DeepLearning.scala – Creating statically typed dynamic neural networks from object-oriented & functional programming constructs.
Conjecture – Scalable Machine Learning in Scalding.
brushfire – Distributed decision tree ensemble learning in Scala.
ganitha – Scalding powered machine learning. [Deprecated]
adam – A genomics processing engine and specialized file format built using Apache Avro, Apache Spark and Parquet. Apache 2 licensed.
bioscala – Bioinformatics for the Scala programming language
BIDMach – CPU and GPU-accelerated Machine Learning Library.
Figaro – a Scala library for constructing probabilistic models.
H2O Sparkling Water – H2O and Spark interoperability.
FlinkML in Apache Flink – Distributed machine learning library in Flink.
DynaML – Scala Library/REPL for Machine Learning Research.
Saul – Flexible Declarative Learning-Based Programming.
SwiftLearner – Simply written algorithms to help study ML or write your own implementations.
Smile – Statistical Machine Intelligence and Learning Engine.
doddle-model – An in-memory machine learning library built on top of Breeze. It provides immutable objects and exposes its functionality through a scikit-learn-like API.
TensorFlow Scala – Strongly-typed Scala API for TensorFlow.

Scheme

Neural Networks

layer – Neural network inference from the command line, implemented in CHICKEN Scheme.

Swift

General-Purpose Machine Learning

Bender – Fast Neural Networks framework built on top of Metal. Supports TensorFlow models.
Swift AI – Highly optimized artificial intelligence and machine learning library written in Swift.
Swift for TensorFlow – A next-generation platform for machine learning, incorporating the latest research across machine learning, compilers, differentiable programming, systems design, and beyond.
BrainCore – The iOS and OS X neural network framework.
swix – A bare bones library that includes a general matrix language and wraps some OpenCV for iOS development. [Deprecated]
AIToolbox – A toolbox framework of AI modules written in Swift: Graphs/Trees, Linear Regression, Support Vector Machines, Neural Networks, PCA, KMeans, Genetic Algorithms, MDP, Mixture of Gaussians.
MLKit – A simple Machine Learning Framework written in Swift. Currently features Simple Linear Regression, Polynomial Regression, and Ridge Regression.
Swift Brain – The first neural network / machine learning library written in Swift. This is a project for AI algorithms in Swift for iOS and OS X development. This project includes algorithms focused on Bayes theorem, neural networks, SVMs, Matrices, etc…
Perfect TensorFlow – Swift language bindings for TensorFlow. Uses native TensorFlow models on both macOS and Linux.
PredictionBuilder – A library for machine learning that builds predictions using a linear regression.
Awesome CoreML – A curated list of pretrained CoreML models.
Awesome Core ML Models – A curated list of machine learning models in CoreML format.

TensorFlow

General-Purpose Machine Learning

Awesome TensorFlow – A list of all things related to TensorFlow.
Golden TensorFlow – A page of content on TensorFlow, including academic papers and links to related topics.

Tools

Neural Networks

layer – Neural network inference from the command line

Misc

Pinecone – Vector database for applications that require real-time, scalable vector embedding and similarity search.
CatalyzeX – Browser extension (Chrome and Firefox) that automatically finds and shows code implementations for machine learning papers anywhere: Google, Twitter, Arxiv, Scholar, etc.
ML Workspace – All-in-one web-based IDE for machine learning and data science. The workspace is deployed as a docker container and is preloaded with a variety of popular data science libraries (e.g., Tensorflow, PyTorch) and dev tools (e.g., Jupyter, VS Code).
Notebooks – A starter kit for Jupyter notebooks and machine learning. Companion docker images consist of all combinations of python versions, machine learning frameworks (Keras, PyTorch and Tensorflow) and CPU/CUDA versions.
DVC – Data Science Version Control is an open-source version control system for machine learning projects with pipelines support. It makes ML projects reproducible and shareable.
Kedro – Kedro is a data and development workflow framework that implements best practices for data pipelines with an eye towards productionizing machine learning models.
guild.ai – Tool to log, analyze, compare and “optimize” experiments. It’s cross-platform and framework independent, and provides integrated visualizers such as TensorBoard.
Sacred – Python tool to help you configure, organize, log and reproduce experiments. It works like a lab notebook in chemistry or biology. The community has built multiple add-ons leveraging the proposed standard.
MLFlow – Platform to manage the ML lifecycle, including experimentation, reproducibility and deployment. It is framework and language agnostic; take a look at all the built-in integrations.
Weights & Biases – Machine learning experiment tracking, dataset versioning, hyperparameter search, visualization, and collaboration
More tools to improve the ML lifecycle: Catalyst, PachydermIO. The following are GitHub-like and target teams: Weights & Biases, Neptune.Ml, Comet.ml, Valohai.ai, DAGsHub.
MachineLearningWithTensorFlow2ed – A book on general-purpose machine learning techniques, including regression, classification, unsupervised clustering, reinforcement learning, autoencoders, convolutional neural networks, RNNs, and LSTMs, using TensorFlow 1.14.1.
m2cgen – A tool that allows the conversion of ML models into native code (Java, C, Python, Go, JavaScript, Visual Basic, C#, R, PowerShell, PHP, Dart) with zero dependencies.
CML – A library for doing continuous integration with ML projects. Use GitHub Actions and GitLab CI to train and evaluate models in production-like environments and automatically generate visual reports with metrics and graphs in pull/merge requests. Framework and language agnostic.
Pythonizr – An online tool to generate boilerplate machine learning code that uses scikit-learn.

Credits

Some of the Python libraries were cut-and-pasted from vinta.
References for Go were mostly cut-and-pasted from gopherdata.

2. Machine Learning with Python – Part II

This curated list contains 840 awesome open-source projects with a total of 2.8M stars, grouped into 32 categories. All projects are ranked by a project-quality score, which is calculated from various metrics automatically collected from GitHub and different package managers. If you would like to add or update projects, feel free to open an issue, submit a pull request, or directly edit the projects.yaml. Contributions are very welcome!

Discover other best-of lists or create your own.
Subscribe to our newsletter for updates and trending projects.

Machine Learning Frameworks 54 projects
Data Visualization 49 projects
Text Data & NLP 82 projects
Image Data 49 projects
Graph Data 29 projects
Audio Data 23 projects
Geospatial Data 22 projects
Financial Data 23 projects
Time Series Data 20 projects
Medical Data 19 projects
Optical Character Recognition 11 projects
Data Containers & Structures 28 projects
Data Loading & Extraction 23 projects
Web Scraping & Crawling 1 project
Data Pipelines & Streaming 35 projects
Distributed Machine Learning 26 projects
Hyperparameter Optimization & AutoML 45 projects
Reinforcement Learning 19 projects
Recommender Systems 14 projects
Privacy Machine Learning 6 projects
Workflow & Experiment Tracking 35 projects
Model Serialization & Conversion 11 projects
Model Interpretability 46 projects
Vector Similarity Search (ANN) 12 projects
Probabilistics & Statistics 21 projects
Adversarial Robustness 8 projects
GPU Utilities 18 projects
Tensorflow Utilities 13 projects
Sklearn Utilities 17 projects
Pytorch Utilities 27 projects
Database Clients 1 project
Others 52 projects

Explanation

Combined project-quality score
Star count from GitHub
New project (less than 6 months old)
Inactive project (6 months no activity)
Dead project (12 months no activity)
Project is trending up or down
Project was recently added
Warning (e.g. missing/risky license)
Contributors count from GitHub
Fork count from GitHub
Issue count from GitHub
Last update timestamp on package manager
Download count from package manager
Number of dependent projects
Tensorflow related project
Sklearn related project
PyTorch related project
MxNet related project
Apache Spark related project
Jupyter related project
PaddlePaddle related project
Pandas related project
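
Each entry in the category listings that follow has the shape "name (score · stars) – description", where the first number is the combined project-quality score and the second is the GitHub star count. The following is a minimal sketch of how such lines could be parsed and re-ranked. The line pattern is inferred from the listings in this document rather than taken from the list's own tooling, and the names Entry, parse_count, parse_entry, and rank are hypothetical helpers introduced only for illustration. It does not reproduce how the quality score itself is calculated.

```python
import re
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    name: str
    score: int        # combined project-quality score
    stars: int        # GitHub star count, expanded from e.g. "6.1K"
    description: str  # remainder of the line (description and license)


# Assumed line shape, inferred from the listings in this document:
#   "<name> (<score> · <stars>) – <description>"
# An optional trailing " · " inside the parentheses is tolerated.
ENTRY_RE = re.compile(
    r"^(?P<name>.+?) \((?P<score>\d+) · (?P<stars>[\d.]+[KM]?)(?: ·\s*)?\) – (?P<desc>.*)$"
)


def parse_count(text: str) -> int:
    """Expand abbreviated counts such as '160K' or '2.8M' to integers."""
    multipliers = {"K": 1_000, "M": 1_000_000}
    suffix = text[-1]
    if suffix in multipliers:
        return round(float(text[:-1]) * multipliers[suffix])
    return int(text)


def parse_entry(line: str) -> Optional[Entry]:
    """Parse one listing line; return None if it does not match the assumed shape."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    return Entry(
        name=match["name"],
        score=int(match["score"]),
        stars=parse_count(match["stars"]),
        description=match["desc"],
    )


def rank(entries: List[Entry]) -> List[Entry]:
    """Order entries by quality score, breaking ties by star count (both descending)."""
    return sorted(entries, key=lambda e: (e.score, e.stars), reverse=True)


if __name__ == "__main__":
    sample = [
        "Keras (35 · 51K) – Deep Learning for humans. MIT",
        "Tensorflow (44 · 160K) – An Open Source Machine Learning Framework for Everyone. Apache-2",
        "XGBoost (35 · 21K) – Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or.. Apache-2",
    ]
    parsed = [entry for entry in map(parse_entry, sample) if entry is not None]
    for entry in rank(parsed):
        print(entry.score, entry.stars, entry.name)
```

Lines that do not match the assumed pattern are skipped rather than guessed at, so category headings and descriptions pass through without raising errors.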

Machine Learning Frameworks

General-purpose machine learning and deep learning frameworks.

Tensorflow (44 · 160K) – An Open Source Machine Learning Framework for Everyone. Apache-2
PyTorch (39 · 47K) – Tensors and Dynamic neural networks in Python with strong GPU.. BSD-3
PySpark (38 · 29K) – Apache Spark Python API. Apache-2
scikit-learn (37 · 45K) – scikit-learn: machine learning in Python. BSD-3
StatsModels (36 · 6.1K) – Statsmodels: statistical modeling and econometrics in Python. BSD-3
Keras (35 · 51K) – Deep Learning for humans. MIT
XGBoost (35 · 21K) – Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or.. Apache-2
LightGBM (35 · 12K) – A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT,.. MIT
MXNet (34 · 19K) – Lightweight, Portable, Flexible Distributed/Mobile Deep Learning.. Apache-2
Theano (34 · 9.4K) – Theano is a Python library that allows you to define, optimize, and.. BSD-3
PyFlink (33 · 16K) – Apache Flink Python API. Apache-2
pytorch-lightning (33 · 12K) – The lightweight PyTorch wrapper for high-performance.. Apache-2
Fastai (32 · 21K) – The fastai deep learning library. Apache-2
jax (32 · 12K) – Composable transformations of Python+NumPy programs: differentiate,.. Apache-2
Thinc (32 · 2.2K) – A refreshing functional take on deep learning, compatible with your favorite.. MIT
Catboost (31 · 5.8K) – A fast, scalable, high performance Gradient Boosting on Decision.. Apache-2
Chainer (31 · 5.5K) – A flexible framework of neural networks for deep learning. MIT
PaddlePaddle (30 · 15K) – PArallel Distributed Deep LEarning: Machine Learning.. Apache-2
TFlearn (30 · 9.5K) – Deep learning library featuring a higher-level API for TensorFlow. MIT
Vowpal Wabbit (30 · 7.5K) – Vowpal Wabbit is a machine learning system which pushes the.. BSD-3
Turi Create (28 · 10K) – Turi Create simplifies the development of custom machine learning.. BSD-3
Sonnet (28 · 8.8K) – TensorFlow-based neural network library. Apache-2
dyNET (28 · 3.2K) – DyNet: The Dynamic Neural Network Toolkit. Apache-2
tensorpack (27 · 6K) – A Neural Net Training Interface on TensorFlow, with focus.. Apache-2
Ignite (27 · 3.5K) – High-level library to help with training and evaluating neural.. BSD-3
Jina (27 · 2.5K) – An easier way to build neural search on the cloud. Apache-2
Flax (27 · 1.5K) – Flax is a neural network ecosystem for JAX that is designed for.. Apache-2
CNTK (26 · 17K) – Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit. MIT
skorch (26 · 3.8K) – A scikit-learn compatible neural network library that wraps.. BSD-3
mlpack (26 · 3.6K) – mlpack: a scalable C++ machine learning library. BSD-3
Ludwig (25 · 7.6K) – Ludwig is a toolbox that allows to train and evaluate deep.. Apache-2
xLearn (25 · 2.9K) – High performance, easy-to-use, and scalable machine learning (ML).. Apache-2
Neural Network Libraries (24 · 2.4K) – Neural Network Libraries. Apache-2
ktrain (24 · 760) – ktrain is a Python library that makes deep learning and AI more.. Apache-2
tensorflow-upstream (24 · 550) – TensorFlow ROCm port. Apache-2
SHOGUN (23 · 2.8K) – Unified and efficient Machine Learning. BSD-3
einops (23 · 2.6K) – Deep learning operations reinvented (for pytorch, tensorflow, jax and.. MIT
fklearn (23 · 1.3K) – fklearn: Functional Machine Learning. Apache-2
mace (21 · 4.3K) – MACE is a deep learning inference framework optimized for mobile.. Apache-2
Neural Tangents (21 · 1.3K) – Fast and Easy Infinite Neural Networks in Python. Apache-2
ThunderSVM (20 · 1.3K) – ThunderSVM: A Fast SVM Library on GPUs and CPUs. Apache-2
Haiku (20 · 1K) – JAX-based neural network library. Apache-2
Torchbearer (20 · 590) – torchbearer: A model fitting library for PyTorch. MIT
Objax (19 · 580) – Objax is a machine learning framework that provides an Object.. Apache-2
elegy (17 · 180) – Elegy is a framework-agnostic Trainer interface for the Jax.. Apache-2
ThunderGBM (16 · 580) – ThunderGBM: Fast GBDTs and Random Forests on GPUs. Apache-2
NeoML (13 · 570) – Machine learning framework for both deep learning and traditional.. Apache-2

Data Visualization

General-purpose and task-specific data visualization libraries.

Matplotlib (41 · 13K) – matplotlib: plotting with Python. Python-2.0
Seaborn (37 · 8.2K) – Statistical data visualization using matplotlib. BSD-3
Plotly (35 · 9.1K) – The interactive graphing library for Python (includes Plotly Express). MIT
dash (34 · 14K) – Analytical Web Apps for Python, R, Julia, and Jupyter. No JavaScript Required. MIT
Bokeh (33 · 15K) – Interactive Data Visualization in the browser, from Python. BSD-3
pyecharts (31 · 11K) – Python Echarts Plotting Library. MIT
wordcloud (31 · 7.9K) – A little word cloud generator in Python. MIT
Altair (31 · 6.5K) – Declarative statistical visualization library for Python. BSD-3
UMAP (30 · 4.6K) – Uniform Manifold Approximation and Projection. BSD-3
bqplot (30 · 3K) – Plotting library for IPython/Jupyter notebooks. Apache-2
PyQtGraph (30 · 2.3K) – Fast data visualization and GUI tools for scientific / engineering.. MIT
pandas-profiling (29 · 6.9K) – Create HTML profiling reports from pandas DataFrame.. MIT
VisPy (29 · 2.6K) – High-performance interactive 2D/3D data visualization library. BSD-3
Graphviz (29 · 940) – Simple Python interface for Graphviz. MIT
datashader (28 · 2.4K) – Quickly and accurately render even the largest data. BSD-3
HoloViews (28 · 1.8K) – With Holoviews, your data visualizes itself. BSD-3
Cufflinks (27 · 2.1K) – Productivity Tools for Plotly + Pandas. MIT
PyVista (27 · 720) – 3D plotting and mesh analysis through a streamlined interface for the.. MIT
data-validation (27 · 530) – Library for exploring and validating machine learning.. Apache-2
Perspective (26 · 3.3K) – Streaming pivot visualization via WebAssembly. Apache-2
missingno (26 · 2.7K) – Missing data visualization module for Python. MIT
pythreejs (26 · 710) – A Jupyter – Three.js bridge. BSD-3
Facets Overview (25 · 6.5K) – Visualizations for machine learning datasets. Apache-2
Chartify (25 · 2.8K) – Python library that makes it easy for data scientists to create.. Apache-2
HyperTools (25 · 1.6K) – A Python toolbox for gaining geometric insights into high-dimensional.. MIT
hvPlot (25 · 360) – A high-level plotting API for pandas, dask, xarray, and networkx built on.. BSD-3
openTSNE (24 · 760) – Extensible, parallel implementations of t-SNE. BSD-3
PandasGUI (23 · 2.1K) – A GUI for Pandas DataFrames. MIT
python-ternary (23 · 400) – Ternary plotting library for python with matplotlib. MIT
D-Tale (22 · 2.1K) – Visualizer for pandas data structures. ❗️LGPL-2.1
Multicore-TSNE (22 · 1.5K) – Parallel t-SNE implementation with Python and Torch.. BSD-3
Pandas-Bokeh (22 · 630) – Bokeh Plotting Backend for Pandas and GeoPandas. MIT
vega (22 · 300) – IPython/Jupyter notebook module for Vega and Vega-Lite. BSD-3
Sweetviz (20 · 1.4K) – Visualize and compare datasets, target values and associations, with one.. MIT
lets-plot (20 · 520) – An open-source plotting library for statistical data. MIT
joypy (20 · 320) – Joyplots in Python with matplotlib & pandas. MIT
HiPlot (19 · 2K) – HiPlot makes understanding high dimensional data easy. MIT
animatplot (19 · 360) – A python package for animating plots build on matplotlib. MIT
PyWaffle (18 · 400) – Make Waffle Charts in Python. MIT
AutoViz (18 · 310) – Automatically Visualize any dataset, any size with a single line of.. Apache-2
FiftyOne (18 · 220) – Visualize, create, and debug image and video datasets.. Apache-2
data-describe (14 · 270) – datadescribe: Pythonic EDA Accelerator for Data Science. Apache-2
nx-altair (14 · 160) – Draw interactive NetworkX graphs with Altair. MIT

Text Data & NLP

Libraries for processing, cleaning, manipulating, and analyzing text data as well as libraries for NLP tasks such as language detection, fuzzy matching, classification, seq2seq learning, conversational AI, keyword extraction, and translation.

spaCy (37 · 20K) – Industrial-strength Natural Language Processing (NLP) in Python. MIT
transformers (36 · 42K) – Transformers: State-of-the-art Natural Language.. Apache-2
gensim (35 · 12K) – Topic Modelling for Humans. ❗️LGPL-2.1
nltk (34 · 9.7K) – Suite of libraries and programs for symbolic and statistical natural.. Apache-2
AllenNLP (32 · 9.8K) – An open-source NLP research library, built on PyTorch. Apache-2
fairseq (31 · 11K) – Facebook AI Research Sequence-to-Sequence Toolkit written in Python. MIT
ChatterBot (31 · 11K) – ChatterBot is a machine learning, conversational dialog engine.. BSD-3
sentencepiece (31 · 4.9K) – Unsupervised text tokenizer for Neural Network-based text.. Apache-2
fastText (30 · 22K) – Library for fast text representation and classification. MIT
flair (30 · 10K) – A very simple framework for state-of-the-art Natural Language Processing.. MIT
snowballstemmer (30 · 480) – Snowball compiler and stemming algorithms. BSD-3
TextBlob (29 · 7.6K) – Simple, Pythonic, text processing–Sentiment analysis, part-of-speech.. MIT
torchtext (29 · 2.7K) – Data loaders and abstractions for text and NLP. BSD-3
Rasa (28 · 11K) – Open source machine learning framework to automate text- and voice-.. Apache-2
OpenNMT (28 · 4.9K) – Open Source Neural Machine Translation in PyTorch. MIT
sentence-transformers (28 · 4.4K) – Sentence Embeddings with BERT & XLNet. Apache-2
Tokenizers (28 · 4.3K) – Fast State-of-the-Art Tokenizers optimized for Research and.. Apache-2
Dedupe (28 · 2.9K) – A python library for accurate and scalable fuzzy matching, record.. MIT
phonenumbers (28 · 2.6K) – Python port of Google’s libphonenumber. Apache-2
DeepPavlov (26 · 5.1K) – An open source library for deep learning end-to-end dialog.. Apache-2
ftfy (26 · 2.9K) – Fixes mojibake and other glitches in Unicode text, after the fact. MIT
GluonNLP (26 · 2.2K) – Toolkit that enables easy text preprocessing, datasets loading.. Apache-2
TextDistance (26 · 1.9K) – Compute distance between sequences. 30+ algorithms, pure python.. MIT
textacy (26 · 1.6K) – NLP, before and after spaCy. Apache-2
jellyfish (26 · 1.4K) – a python library for doing approximate and phonetic matching of.. BSD-2
TensorFlow Text (26 · 700) – Making text a first-class citizen in TensorFlow. Apache-2
CLTK (26 · 650) – The Classical Language Toolkit. MIT
inflect (26 · 490) – Correctly generate plurals, ordinals, indefinite articles; convert numbers.. MIT
ParlAI (25 · 7K) – A framework for training and evaluating AI models on a variety of.. MIT
PyText (25 · 6.1K) – A natural language modeling framework based on PyTorch. BSD-3
stanza (25 · 5.3K) – Official Stanford NLP Python Library for Many Human Languages. Apache-2
vaderSentiment (25 · 2.9K) – VADER Sentiment Analysis. VADER (Valence Aware Dictionary.. MIT
spark-nlp (25 · 2K) – State of the Art Natural Language Processing. Apache-2
haystack (25 · 1.5K) – End-to-end Python framework for building natural language search.. Apache-2
pyahocorasick (25 · 590) – Python module (C extension and plain python) implementing Aho-.. BSD-3
T5 (24 · 3.2K) – Code for the paper Exploring the Limits of Transfer Learning with a.. Apache-2
Sumy (24 · 2.5K) – Module for automatic summarization of text documents and HTML pages. Apache-2
fastNLP (24 · 2K) – fastNLP: A Modularized and Extensible NLP Framework. Currently still.. Apache-2
pytorch-nlp (24 · 1.9K) – Basic Utilities for PyTorch Natural Language Processing (NLP). BSD-3
scattertext (24 · 1.5K) – Beautiful visualizations of how language differs among.. Apache-2
sense2vec (24 · 1.2K) – Contextually-keyed word vectors. MIT
spacy-transformers (24 · 920) – Use pretrained transformers like BERT, XLNet and GPT-2.. MIT
SciSpacy (24 · 850) – A full spaCy pipeline and models for scientific/biomedical documents. Apache-2
Ciphey (23 · 6.5K) – Automatically decrypt encryptions without knowing the key or cipher,.. MIT
flashtext (23 · 4.7K) – Extract Keywords from sentence or Replace keywords in sentences. MIT
neuralcoref (23 · 2.2K) – Fast Coreference Resolution in spaCy with Neural Networks. MIT
pySBD (23 · 290) – pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence.. MIT
textgenrnn (22 · 4.3K) – Easily train your own text-generating neural network of any.. MIT
fast-bert (22 · 1.5K) – Super easy library for BERT based NLP models. Apache-2
PyTextRank (22 · 1.5K) – Python implementation of TextRank for phrase extraction and.. MIT
FARM (22 · 1.1K) – Fast & easy transfer learning for NLP. Harvesting language models.. Apache-2
DeepMatcher (21 · 3.5K) – Python package for performing Entity and Text Matching using.. BSD-3
gpt-2-simple (21 · 2.5K) – Python package to easily retrain OpenAI’s GPT-2 text-.. MIT
Texar (21 · 2.1K) – Toolkit for Machine Learning, Natural Language Processing, and.. Apache-2
NLP Architect (20 · 2.6K) – A model library for exploring state-of-the-art deep learning.. Apache-2
NeMo (20 · 2.5K) – NeMo: a toolkit for conversational AI. Apache-2
DELTA (20 · 1.4K) – DELTA is a deep learning based natural language and speech.. Apache-2
Sockeye (20 · 990) – Sequence-to-sequence framework with a focus on Neural Machine.. Apache-2
YouTokenToMe (20 · 720) – Unsupervised text tokenizer focused on computational efficiency. MIT
finetune (20 · 630) – Scikit-learn style model finetuning for NLP. MPL-2.0
Texthero (19 · 2.1K) – Text preprocessing, representation and visualization from zero to hero. MIT
textpipe (19 · 280) – Textpipe: clean and extract metadata from text. MIT
Kashgari (18 · 2K) – Kashgari is a production-level NLP Transfer learning framework.. Apache-2
Camphr (18 · 330) – spaCy plugin for Transformers, Udify, ELmo, etc. Apache-2
skift (18 · 210) – scikit-learn wrappers for Python fastText. MIT
Translate (15 · 680) – Translate – a PyTorch Language Library. BSD-3
VizSeq (15 · 310) – An Analysis Toolkit for Natural Language Generation (Translation,.. MIT
OpenNRE (14 · 3K) – An Open-Source Package for Neural Relation Extraction (NRE). MIT
TransferNLP (14 · 290) – NLP library designed for reproducible experimentation.. MIT
NeuralQA (14 · 180) – NeuralQA: A Usable Library for Question Answering on Large Datasets with.. MIT
textvec (13 · 170) – Text vectorization tool to outperform TFIDF for classification tasks. MIT

Image Data

Libraries for image & video processing, manipulation, and augmentation as well as libraries for computer vision tasks such as facial recognition, object detection, and classification.

Pillow (39 · 8.3K) – The friendly PIL fork (Python Imaging Library). ❗️PIL
torchvision (36 · 8.6K) – Datasets, Transforms and Models specific to Computer Vision. BSD-3
scikit-image (33 · 4.2K) – Image processing in Python. BSD-2
imgaug (31 · 11K) – Image augmentation for machine learning experiments. MIT
imageio (31 · 840) – Python library for reading and writing image data. BSD-2
opencv-python (30 · 1.8K) – Automated CI toolchain to produce precompiled opencv-python,.. MIT
Wand (30 · 1.1K) – The ctypes-based simple ImageMagick binding for Python. MIT
Face Recognition (29 · 39K) – The world’s simplest facial recognition api for Python.. MIT
MoviePy (29 · 7.3K) – Video editing with Python. MIT
PyTorch Image Models (28 · 7.9K) – PyTorch image models, scripts, pretrained weights –.. Apache-2
Albumentations (28 · 7.5K) – Fast image augmentation library and easy to use wrapper.. MIT
Kornia (28 · 3.7K) – Open Source Differentiable Computer Vision Library for PyTorch. Apache-2
imutils (28 · 3.6K) – A series of convenience functions to make basic image processing.. MIT
ImageHash (28 · 1.9K) – A Python Perceptual Image Hashing Module. BSD-2
imageai (27 · 6K) – A python library built to empower developers to build applications and.. MIT
GluonCV (27 · 4.6K) – Gluon CV Toolkit. Apache-2
detectron2 (26 · 15K) – Detectron2 is FAIR’s next-generation platform for object.. Apache-2
InsightFace (26 · 8.7K) – Face Analysis Project on MXNet. MIT
MMDetection (25 · 14K) – OpenMMLab Detection Toolbox and Benchmark. Apache-2
PyTorch3D (25 · 4.6K) – PyTorch3D is FAIR’s library of reusable components for deep.. MIT
facenet-pytorch (25 · 1.9K) – Pretrained Pytorch face detection (MTCNN) and recognition.. MIT
mahotas (25 · 670) – Computer Vision in Python. MIT
Augmentor (24 · 4.3K) – Image augmentation library in Python for machine learning. MIT
mtcnn (24 · 1.4K) – MTCNN face detection implementation for TensorFlow, as a PIP package. MIT
Face Alignment (23 · 4.7K) – 2D and 3D Face alignment library build using pytorch. BSD-3
CellProfiler (23 · 550) – An open-source application for biological image analysis. BSD-3
segmentation_models (22 · 3K) – Segmentation models with pretrained backbones. Keras.. MIT
vidgear (22 · 1.7K) – High-performance cross-platform Video Processing Python framework.. Apache-2
pyvips (22 · 300) – python binding for libvips using cffi. MIT
Image Deduplicator (21 · 3.4K) – Finding duplicate images made easy!. Apache-2
Image Super-Resolution (21 · 2.6K) – Super-scale your images and run experiments with.. Apache-2
tensorflow-graphics (21 · 2.4K) – TensorFlow Graphics: Differentiable Graphics Layers.. Apache-2
Classy Vision (21 · 1.2K) – An end-to-end PyTorch framework for image and video.. MIT
Torch Points 3D (21 · 1.1K) – Pytorch framework for doing deep learning on point clouds. BSD-3
MMF (20 · 4.2K) – A modular framework for vision & language multimodal research from.. BSD-3
image-match (20 · 2.5K) – Quickly search over billions of images. Apache-2
nude.py (20 · 790) – Nudity detection with Python. MIT
Caer (20 · 450) – A lightweight Computer Vision library. Scale your models, not boilerplate. MIT
vit-pytorch (18 · 2.9K) – Implementation of Vision Transformer, a simple way to.. MIT
Norfair (18 · 920) – Lightweight Python library for adding real-time 2D object tracking to.. BSD-3
PaddleDetection (17 · 2.3K) – Object detection and instance segmentation toolkit.. Apache-2
lightly (17 · 430) – A python library for self-supervised learning on images. MIT
pycls (15 · 1.5K) – Codebase for Image Classification Research, written in PyTorch. MIT
DE⫶TR (14 · 6.4K) – End-to-End Object Detection with Transformers. Apache-2
PySlowFast (14 · 3.4K) – PySlowFast: video understanding codebase from FAIR for.. Apache-2

Graph Data

Libraries for graph processing, clustering, embedding, and machine learning tasks.

networkx (33 · 8.8K) – Network Analysis in Python. BSD-3
PyTorch Geometric (29 · 10K) – Geometric Deep Learning Extension Library for PyTorch. MIT
dgl (26 · 6.8K) – Python package built to ease deep learning on graph, on top of existing.. Apache-2
StellarGraph (25 · 1.8K) – StellarGraph – Machine Learning on Graphs. Apache-2
Spektral (23 · 1.7K) – Graph Neural Networks with Keras and Tensorflow 2. MIT
ogb (22 · 770) – Benchmark datasets, data loaders, and evaluators for graph machine learning. MIT
Node2Vec (22 · 650) – Implementation of the node2vec algorithm. MIT
torch-cluster (21 · 340) – PyTorch Extension Library of Optimized Graph Cluster.. MIT
AmpliGraph (20 · 1.4K) – Python library for Representation Learning on Knowledge.. Apache-2
PyTorch-BigGraph (19 · 2.7K) – Generate embeddings from large-scale graph-structured.. BSD-3
PyKEEN (19 · 330) – A Python library for learning and evaluating knowledge graph embeddings. MIT
graph-nets (18 · 4.8K) – Build Graph Nets in Tensorflow. Apache-2
DeepGraph (18 · 230) – Analyze Data with Pandas-based Networks. Documentation:. BSD-3
Paddle Graph Learning (17 · 920) – Paddle Graph Learning (PGL) is an efficient and.. Apache-2
kglib (16 · 400) – Grakn Knowledge Graph Library (ML R&D). Apache-2
pytorch_geometric_temporal (16 · 370) – A Temporal Extension Library for PyTorch Geometric. MIT
GraphEmbedding (15 · 1.8K) – Implementation and experiments of graph embedding algorithms. MIT
Euler (14 · 2.5K) – A distributed graph deep learning framework. Apache-2
AutoGL (14 · 590) – An autoML framework & toolkit for machine learning on graphs. MIT
OpenKE (13 · 2.4K) – An Open-Source Package for Knowledge Embedding (KE). MIT
GraphVite (13 · 860) – GraphVite: A General and High-performance Graph Embedding System. Apache-2

Audio Data

Libraries for audio analysis, manipulation, transformation, and extraction, as well as speech recognition and music generation tasks.

DeepSpeech (31 · 17K) – DeepSpeech is an open source embedded (offline, on-device).. MPL-2.0
Pydub (30 · 5.2K) – Manipulate audio with a simple and easy high level interface. MIT
Magenta (29 · 16K) – Magenta: Music and Art Generation with Machine Intelligence. Apache-2
torchaudio (29 · 1.3K) – Data manipulation and transformation for audio signal.. BSD-2
librosa (27 · 4.3K) – Python library for audio and music analysis. ISC
audioread (26 · 360) – cross-library (GStreamer + Core Audio + MAD + FFmpeg) audio decoding.. MIT
spleeter (25 · 16K) – Deezer source separation library including pretrained models. MIT
pyAudioAnalysis (25 · 3.8K) – Python Audio Analysis Library: Feature Extraction,.. Apache-2
python-soundfile (25 · 370) – SoundFile is an audio library based on libsndfile, CFFI, and.. BSD-3
espnet (24 · 3.5K) – End-to-End Speech Processing Toolkit. Apache-2
python_speech_features (23 · 1.9K) – This library provides common speech features for ASR.. MIT
tinytag (23 · 440) – Read music meta data and length of MP3, OGG, OPUS, MP4, M4A, FLAC, WMA and.. MIT
Porcupine (22 · 2.4K) – On-device wake word detection powered by deep learning. Apache-2
DDSP (22 · 1.8K) – DDSP: Differentiable Digital Signal Processing. Apache-2
kapre (21 · 720) – kapre: Keras Audio Preprocessors. MIT
Dejavu (20 · 5.3K) – Audio fingerprinting and recognition in Python. MIT
TTS (20 · 3.3K) – Deep learning for Text to Speech (Discussion forum:.. MPL-2.0
Muda (17 · 180) – A library for augmenting annotated audio data. ISC
Julius (14 · 180) – Fast PyTorch based DSP for audio and 1D signals. MIT

Geospatial Data

Libraries to load, process, analyze, and write geographic data as well as libraries for spatial analysis, map visualization, and geocoding.

pydeck (33 · 8.5K) – WebGL2 powered geospatial visualization layers. MIT
folium (32 · 5.2K) – Python Data. Leaflet.js Maps. MIT
geopy (32 · 3.2K) – Geocoding library for Python. MIT
Shapely (32 · 2.2K) – Manipulation and analysis of geometric objects. BSD-3
GeoPandas (31 · 2.5K) – Python tools for geographic data. BSD-3
pyproj (31 · 580) – Python interface to PROJ (cartographic projections and coordinate.. MIT
Rasterio (30 · 1.4K) – Rasterio reads and writes geospatial raster datasets. BSD-3
Fiona (30 · 780) – Fiona reads and writes geographic data files. BSD-3
ipyleaflet (28 · 1.1K) – A Jupyter – Leaflet.js bridge. MIT
geojson (26 · 600) – Python bindings and utilities for GeoJSON. BSD-3
ArcGIS API (25 · 980) – Documentation and samples for ArcGIS API for Python. Apache-2
PySAL (25 · 830) – PySAL: Python Spatial Analysis Library Meta-Package. BSD-3
GeoViews (22 · 330) – Simple, concise geographical visualization in Python. BSD-3
EarthPy (20 · 230) – A package built to support working with spatial data using open source.. BSD-3
pymap3d (19 · 180) – pure-Python (Numpy optional) 3D coordinate conversions for geospace ecef.. BSD-2

Financial Data

Libraries for algorithmic stock/crypto trading, risk analytics, backtesting, technical analysis, and other tasks on financial data.

zipline (30 · 14K) – Zipline, a Pythonic Algorithmic Trading Library. Apache-2
yfinance (30 · 4.5K) – Yahoo! Finance market data downloader (+faster Pandas Datareader). Apache-2
Alpha Vantage (27 · 3.2K) – A python wrapper for Alpha Vantage API for financial data. MIT
ta (27 · 1.9K) – Technical Analysis Library using Pandas and Numpy. MIT
pyfolio (26 · 3.6K) – Portfolio and risk analytics in Python. Apache-2
empyrical (25 · 740) – Common financial risk and performance metrics. Used by zipline and.. Apache-2
Alphalens (24 · 1.8K) – Performance analysis of predictive (alpha) stock factors. Apache-2
IB-insync (24 · 1.3K) – Python sync/async framework for Interactive Brokers API. BSD-2
bt (24 · 980) – bt – flexible backtesting for Python. MIT
ffn (24 · 800) – ffn – a financial function library for Python. MIT
Enigma Catalyst (23 · 2K) – An Algorithmic Trading Library for Crypto-Assets in Python. Apache-2
stockstats (23 · 730) – Supply a wrapper “StockDataFrame“ based on the.. BSD-3
TensorTrade (21 · 3K) – An open source reinforcement learning framework for training,.. Apache-2
finmarketpy (20 · 2.5K) – Python library for backtesting trading strategies & analyzing.. Apache-2
Qlib (19 · 4.6K) – Qlib is an AI-oriented quantitative investment platform, which aims to.. MIT
tf-quant-finance (19 · 2.5K) – High-performance TensorFlow library for quantitative.. Apache-2
Crypto Signals (18 · 2.7K) – Github.com/CryptoSignal – #1 Quant Trading & Technical Analysis.. MIT

Time Series Data

Libraries for forecasting, anomaly detection, feature extraction, and machine learning on time-series and sequential data.

Prophet (28 · 12K) – Tool for producing high quality forecasts for time series data that has.. MIT
tsfresh (27 · 5.5K) – Automatic extraction of relevant features from time series:. MIT
sktime (27 · 3.7K) – A unified framework for machine learning with time series. BSD-3
pmdarima (26 · 830) – A statistical library designed to fill the void in Python’s time series.. MIT
tslearn (25 · 1.5K) – A machine learning toolkit dedicated to time-series data. BSD-2
Streamz (24 · 920) – Real-time stream processing for python. BSD-3
GluonTS (23 · 1.8K) – Probabilistic time series modeling in Python. Apache-2
Darts (22 · 750) – A python library for easy manipulation and forecasting of time series. Apache-2
STUMPY (20 · 1.7K) – STUMPY is a powerful and scalable Python library for computing a Matrix.. BSD-3
pyts (20 · 890) – A Python package for time series classification. BSD-3
pytorch-forecasting (19 · 830) – Time series forecasting with PyTorch. MIT
seglearn (19 · 430) – Python module for machine learning time series:. BSD-3
matrixprofile-ts (18 · 620) – A Python library for detecting patterns and anomalies.. Apache-2
Auto TS (18 · 190) – Automatically build ARIMA, SARIMAX, VAR, FB Prophet and XGBoost.. Apache-2
ADTK (17 · 610) – A Python toolkit for rule-based/unsupervised anomaly detection in time.. MPL-2.0
tick (17 · 320) – Module for statistical learning, with a particular emphasis on time-.. BSD-3
atspy (16 · 340) – AtsPy: Automated Time Series Models in Python (by @firmai). MIT

Medical Data

Libraries for processing and analyzing medical data such as MRIs, EEGs, genomic data, and other medical imaging formats.

Lifelines (29 · 1.6K) – Survival analysis in Python. MIT
Nilearn (29 · 710) – Machine learning for NeuroImaging in Python. BSD-3
NIPYPE (29 · 560) – Workflows and interfaces for neuroimaging packages. Apache-2
NiBabel (29 · 390) – Python package to access a cacophony of neuro-imaging file formats. MIT
MNE (27 · 1.5K) – MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python. BSD-3
DIPY (27 · 390) – DIPY is the paragon 3D/4D+ imaging library in Python. Contains generic.. BSD-3
Hail (24 · 700) – Scalable genomic data analysis. MIT
NIPY (23 · 290) – Neuroimaging in Python FMRI analysis package. BSD-3
MONAI (22 · 1.8K) – AI Toolkit for Healthcare Imaging. Apache-2
DeepVariant (21 · 2.2K) – DeepVariant is an analysis pipeline that uses a deep neural.. BSD-3
NiftyNet (21 · 1.3K) – [unmaintained] An open-source convolutional neural.. Apache-2
Brainiak (19 · 230) – Brain Imaging Analysis Kit. Apache-2
Glow (19 · 160) – An open-source toolkit for large-scale genomic analysis. Apache-2
Medical Detection Toolkit (12 · 910) – The Medical Detection Toolkit contains 2D + 3D.. Apache-2
MedicalNet (11 · 1.1K) – Many studies have shown that the performance on deep learning is.. MIT

Optical Character Recognition

Libraries for optical character recognition (OCR) and text extraction from images or videos.

Tesseract (30 · 3.5K) – Python-tesseract is an optical character recognition (OCR) tool.. Apache-2
EasyOCR (28 · 11K) – Ready-to-use OCR with 80+ supported languages and all popular writing.. Apache-2
OCRmyPDF (27 · 4K) – OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to.. MPL-2.0
tesserocr (26 · 1.4K) – A Python wrapper for the tesseract-ocr API. MIT
PaddleOCR (24 · 11K) – Awesome multilingual OCR toolkits based on PaddlePaddle.. Apache-2
attention-ocr (21 · 840) – A Tensorflow model for text recognition (CNN + seq2seq with.. MIT
keras-ocr (20 · 780) – A packaged and flexible version of the CRAFT text detector and.. MIT
calamari (19 · 790) – Line based ATR Engine based on OCRopy. Apache-2
doc2text (18 · 1.2K) – Detect text blocks and OCR poorly scanned PDFs in bulk. Python module.. MIT
Mozart (10 · 240) – An optical music recognition (OMR) system. Converts sheet.. Apache-2

Data Containers & Structures

General-purpose data containers & structures as well as utilities & extensions for pandas.

pandas (40 · 29K) – Flexible and powerful data analysis / manipulation library for.. BSD-3
numpy (38 · 17K) – The fundamental package for scientific computing with Python. BSD-3
h5py (36 · 1.5K) – HDF5 for Python – The h5py package is a Pythonic interface to the HDF5.. BSD-3
Arrow (35 · 7.5K) – Apache Arrow is a cross-language development platform for in-memory.. Apache-2
xarray (32 · 2K) – N-D labeled arrays and datasets in Python. Apache-2
numexpr (31 · 1.6K) – Fast numerical array expression evaluator for Python, NumPy, PyTables,.. MIT
TinyDB (29 · 4.1K) – TinyDB is a lightweight document oriented database optimized for your.. MIT
Koalas (29 · 2.7K) – Koalas: pandas API on Apache Spark. Apache-2
Bottleneck (29 · 580) – Fast NumPy array functions written in C. BSD-2
Modin (28 · 5.8K) – Modin: Speed up your Pandas workflows by changing a single line of.. Apache-2
PyTables (28 · 1K) – A Python package to manage extremely large amounts of data. BSD-3
datasketch (27 · 1.4K) – MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog,.. MIT
zarr (26 · 660) – An implementation of chunked, compressed, N-dimensional arrays for Python. MIT
bcolz (25 · 910) – A columnar data container that can be compressed. BSD-3
Arctic (24 · 2.2K) – Arctic is a high performance datastore for numeric data. ❗️LGPL-2.1
swifter (24 · 1.6K) – A package which efficiently applies any function to a pandas.. MIT
Pandaral·lel (24 · 1.4K) – A simple and efficient tool to parallelize Pandas.. BSD-3
Vaex (23 · 5.9K) – Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualize and.. MIT
datatable (21 · 1.2K) – A Python package for manipulating 2-dimensional tabular data.. MPL-2.0
StaticFrame (21 · 220) – Immutable and grow-only Pandas-like DataFrames with a more explicit.. MIT
fletcher (20 · 210) – Pandas ExtensionDType/Array backed by Apache Arrow. MIT
Bounter (17 · 900) – Efficient Counter that uses a limited (bounded) amount of memory.. MIT
PandaPy (14 · 470) – PandaPy has the speed of NumPy and the usability of Pandas 10x to 50x.. MIT

Data Loading & Extraction

Libraries for loading, collecting, and extracting data from a variety of data sources and formats.

Faker (36 · 12K) – Faker is a Python package that generates fake data for you. MIT
xlrd (34 · 1.9K) – Please use openpyxl where you can… BSD-3
xmltodict (32 · 4.3K) – Python module that makes working with XML feel like you are.. MIT
TensorFlow Datasets (32 · 2.7K) – TFDS is a collection of datasets ready to use with.. Apache-2
python-magic (32 · 1.8K) – A python wrapper for libmagic. MIT
Tablib (31 · 3.9K) – Python Module for Tabular Datasets in XLS, CSV, JSON, YAML, &c. MIT
smart-open (30 · 2K) – Utils for streaming large files (S3, HDFS, gzip, bz2…). MIT
Datasets (29 · 6.9K) – The largest hub of ready-to-use NLP datasets for ML models with.. Apache-2
pandas-datareader (29 · 1.9K) – Extract data from a wide range of Internet sources.. BSD-3
snorkel (28 · 4.5K) – A system for quickly generating training data with weak.. Apache-2
csvkit (28 · 4.5K) – A suite of utilities for converting to and working with CSV, the king of.. MIT
tabulator-py (26 · 200) – Python library for reading and writing tabular data via streams. MIT
Intake (25 · 530) – Intake is a lightweight package for finding, investigating, loading and.. BSD-2
SDV (21 · 360) – Synthetic Data Generation for tabular, relational and time series data. MIT
datatest (21 · 240) – Tools for test driven data-wrangling and data validation. Apache-2

Web Scraping & Crawling

Libraries for web scraping, crawling, downloading, and mining.

best-of-web-python – Web Scraping (1.1K) – Collection of web-scraping and crawling libraries.

Data Pipelines & Streaming

Libraries for data batch- and stream-processing, workflow automation, job scheduling, and other data pipeline tasks.

Celery (39 · 17K) – Asynchronous task queue/job queue based on distributed message passing. BSD-3
Airflow (36 · 21K) – Platform to programmatically author, schedule, and monitor.. Apache-2
joblib (35 · 2.4K) – Computing with Python functions. BSD-3
rq (33 · 7.6K) – Simple job queues for Python. BSD-3
luigi (32 · 14K) – Luigi is a Python module that helps you build complex pipelines of batch.. Apache-2
Beam (32 · 4.6K) – Unified programming model to define and execute data processing.. Apache-2
Prefect (30 · 6K) – The easiest way to automate your data. Apache-2
dbt (29 · 2.7K) – dbt (data build tool) enables data analysts and engineers to transform.. Apache-2
faust (28 · 5.4K) – Python Stream Processing. BSD-3
Kedro (28 · 3.6K) – A Python framework for creating reproducible, maintainable and modular.. Apache-2
Dagster (27 · 3K) – A data orchestrator for machine learning, analytics, and ETL. Apache-2
mrjob (27 · 2.5K) – Run MapReduce jobs on Hadoop or Amazon Web Services. Apache-2
petl (27 · 860) – Python Extract Transform and Load Tables of Data. MIT
PyFunctional (26 · 1.8K) – Python library for creating data pipelines with chain functional.. MIT
Hub (25 · 2.7K) – Fastest unstructured dataset management for TensorFlow/PyTorch… MPL-2.0
TFX (25 · 1.4K) – TFX is an end-to-end platform for deploying production ML pipelines. Apache-2
Great Expectations (24 · 3.9K) – Always know what to expect from your data. Apache-2
streamparse (23 · 1.4K) – Run Python in Apache Storm topologies. Pythonic API, CLI.. Apache-2
bonobo (23 · 1.4K) – Extract Transform Load for Python 3.5+. Apache-2
Optimus (23 · 980) – Agile Data Preparation Workflows made easy with dask, cudf,.. Apache-2
pysparkling (23 · 230) – A pure Python implementation of Apache Spark’s RDD and DStream.. MIT
Pypeline (22 · 1.2K) – Concurrent data pipelines in Python. MIT
dpark (20 · 2.6K) – Python clone of Spark, a MapReduce alike framework in Python. BSD-3
mrq (20 · 840) – Mr. Queue – A distributed worker task queue in Python using Redis & gevent. MIT
pdpipe (20 · 590) – Easy pipelines for pandas DataFrames. MIT
ploomber (20 · 210) – A convention over configuration workflow orchestrator. Develop.. Apache-2
spark-deep-learning (18 · 1.8K) – Deep Learning Pipelines for Apache Spark. Apache-2
Mara Pipelines (18 · 1.6K) – A lightweight opinionated ETL framework, halfway between plain.. MIT
TaskTiger (18 · 1K) – Python task queue using Redis. MIT
Databolt Flow (18 · 900) – Python library for building highly effective data science workflows. MIT
BatchFlow (18 · 160) – BatchFlow helps you conveniently work with random or sequential.. Apache-2
flupy (18 · 150) – Fluent data pipelines for python and your shell. MIT
riko (17 · 1.6K) – A Python stream processing engine modeled after Yahoo! Pipes. MIT
zenml (14 · 900) – ZenML: Bring Zen to your ML with reproducible pipelines. Apache-2

Distributed Machine Learning

Libraries that provide capabilities to distribute and parallelize machine learning tasks across large-scale compute infrastructure.

Ray (32 · 15K) – An open source framework that provides a simple, universal API for.. Apache-2
dask (32 · 8K) – Parallel computing with task scheduling. BSD-3
dask.distributed (31 · 1.2K) – A distributed task scheduler for Dask. BSD-3
horovod (29 · 11K) – Distributed training framework for TensorFlow, Keras, PyTorch, and.. Apache-2
ipyparallel (28 · 1.9K) – Interactive Parallel Computing in Python. BSD-3
Mesh (26 · 910) – Mesh TensorFlow: Model Parallelism Made Easier. Apache-2
BigDL (25 · 3.7K) – BigDL: Distributed Deep Learning Framework for Apache Spark. Apache-2
Elephas (25 · 1.5K) – Distributed Deep learning with Keras & Spark. MIT keras
petastorm (25 · 1.1K) – Petastorm library enables single machine or distributed training.. Apache-2
mpi4py (25 · 390) – Python bindings for MPI. BSD-3
DeepSpeed (24 · 4.5K) – DeepSpeed is a deep learning optimization library that makes.. MIT
TensorFlowOnSpark (24 · 3.6K) – TensorFlowOnSpark brings TensorFlow programs to.. Apache-2
dask-ml (24 · 690) – Scalable Machine Learning with Dask. BSD-3
MMLSpark (23 · 2.3K) – Microsoft Machine Learning for Apache Spark. MIT
analytics-zoo (22 · 2.2K) – Distributed Tensorflow, Keras and PyTorch on Apache.. Apache-2
FairScale (21 · 850) – PyTorch extensions for high performance and large scale training. BSD-3
Submit it (21 · 310) – Python 3.6+ toolbox for submitting jobs to Slurm. MIT
Apache Singa (19 · 2.2K) – a distributed deep learning platform. Apache-2
BytePS (18 · 2.7K) – A high performance and generic framework for distributed DNN training. Apache-2
Fiber (18 · 860) – Distributed Computing for AI Made Simple. Apache-2
Hivemind (18 · 660) – Decentralized deep learning in PyTorch. Built to train models on.. MIT
sk-dist (18 · 260) – Distributed scikit-learn meta-estimators in PySpark. Apache-2
somoclu (18 · 220) – Massively parallel self-organizing maps: accelerate training on.. MIT

Hyperparameter Optimization & AutoML

Libraries for hyperparameter optimization, AutoML, and neural architecture search.

Optuna (31 · 4.2K) – A hyperparameter optimization framework. MIT
Hyperopt (30 · 5.5K) – Distributed Asynchronous Hyperparameter Optimization in Python. BSD-3
scikit-optimize (29 · 2.1K) – Sequential model-based optimization with a `scipy.optimize`.. BSD-3
Keras Tuner (28 · 2.3K) – Hyperparameter tuning for humans. Apache-2
AutoKeras (27 · 7.8K) – AutoML library for deep learning. Apache-2
Bayesian Optimization (27 · 4.9K) – A Python implementation of global optimization with.. MIT
NNI (26 · 9.3K) – An open source AutoML toolkit for automate machine learning lifecycle,.. MIT
auto-sklearn (26 · 5.3K) – Automated Machine Learning with scikit-learn. BSD-3
AutoGluon (26 · 3K) – AutoGluon: AutoML for Text, Image, and Tabular Data. Apache-2
nevergrad (26 · 2.8K) – A Python toolbox for performing gradient-free optimization. MIT
BoTorch (26 · 1.9K) – Bayesian optimization in PyTorch. MIT
SMAC3 (26 · 560) – Sequential Model-based Algorithm Configuration. BSD-3
featuretools (25 · 5.4K) – An open source python library for automated feature engineering. BSD-3
Ax (25 · 1.4K) – Adaptive Experimentation Platform. MIT
Hyperas (23 · 2.1K) – Keras + Hyperopt: A very simple wrapper for convenient.. MIT
GPyOpt (23 · 720) – Gaussian Process Optimization using GPy. BSD-3
Talos (22 · 1.4K) – Hyperparameter Optimization for TensorFlow, Keras and PyTorch. MIT
Orion (22 · 180) – Asynchronous Distributed Hyperparameter Optimization. BSD-3
AdaNet (21 · 3.2K) – Fast and flexible AutoML with learning guarantees. Apache-2
mljar-supervised (21 · 950) – Automates Machine Learning Pipeline with Feature Engineering.. MIT
Neuraxle (21 · 380) – A Sklearn-like Framework for Hyperparameter Tuning and AutoML in.. Apache-2
lazypredict (20 · 400) – Lazy Predict help build a lot of basic models without much code.. MIT
optunity (20 · 360) – optimization routines for hyperparameter tuning. BSD-3
Auto ViML (20 · 220) – Automatically Build Multiple ML Models with a Single Line of Code… Apache-2
Test Tube (19 · 660) – Python library to easily log experiments and parallelize.. MIT
Dragonfly (17 · 570) – An open source python library for scalable Bayesian optimisation. MIT
HyperparameterHunter (16 · 650) – Easy hyperparameter optimization and automatic result.. MIT
AlphaPy (16 · 560) – Automated Machine Learning [AutoML] with Python, scikit-learn, Keras,.. Apache-2
Parfit (15 · 200) – A package for parallelizing the fit and flexibly scoring of.. MIT
ENAS (13 · 2.4K) – PyTorch implementation of Efficient Neural Architecture Search via.. Apache-2
Devol (11 · 920) – Genetic neural architecture search with Keras. MIT
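
As a rough illustration of what the hyperparameter optimization libraries above automate, here is a minimal random-search sketch using only the Python standard library. It does not use the API of any listed library; `random_search`, `toy_loss`, and the parameter names `learning_rate` and `dropout` are hypothetical and chosen only for the example. Real tools typically add smarter samplers, pruning, and parallel execution on top of this basic loop.

```python
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]


def random_search(
    objective: Callable[[Params], float],
    space: Dict[str, Tuple[float, float]],
    n_trials: int,
    seed: int = 0,
) -> Tuple[Params, float]:
    """Sample n_trials points uniformly from `space` and keep the lowest score."""
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    best_params: Params = {}
    best_score = float("inf")
    for _ in range(n_trials):
        # Draw one candidate configuration from the search space.
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_loss(params: Params) -> float:
    """Hypothetical stand-in for a validation loss; minimum at (0.1, 0.3)."""
    return (params["learning_rate"] - 0.1) ** 2 + (params["dropout"] - 0.3) ** 2


space = {"learning_rate": (0.0, 1.0), "dropout": (0.0, 1.0)}
best_params, best_score = random_search(toy_loss, space, n_trials=200, seed=42)
print(best_params, best_score)
```

In practice `toy_loss` would be replaced by a function that trains a model with the sampled parameters and returns its validation error.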

Reinforcement Learning

Libraries for building and evaluating reinforcement learning & agent-based systems.

OpenAI Gym (35 · 24K) – A toolkit for developing and comparing reinforcement learning.. MIT
Dopamine (27 · 9.3K) – Dopamine is a research framework for fast prototyping of.. Apache-2
TensorLayer (27 · 6.5K) – Deep Learning and Reinforcement Learning Library for.. Apache-2
TF-Agents (27 · 1.8K) – TF-Agents: A reliable, scalable and easy to use TensorFlow.. Apache-2
TensorForce (25 · 2.9K) – Tensorforce: a TensorFlow library for applied.. Apache-2
ViZDoom (25 · 1.2K) – Doom-based AI Research Platform for Reinforcement Learning from Raw.. MIT
Stable Baselines (24 · 3K) – A fork of OpenAI Baselines, implementations of reinforcement.. MIT
Acme (23 · 2K) – A library of reinforcement learning components and agents. Apache-2
garage (22 · 1.1K) – A toolkit for reproducible reinforcement learning research. MIT
ChainerRL (22 · 930) – ChainerRL is a deep reinforcement learning library built on top of.. MIT
PARL (21 · 1.9K) – A high-performance distributed training framework for Reinforcement.. Apache-2
TRFL (19 · 3.1K) – TensorFlow Reinforcement Learning. Apache-2
Coach (19 · 1.9K) – Reinforcement Learning Coach by Intel AI Lab enables easy.. Apache-2
PFRL (19 · 530) – PFRL: a PyTorch-based deep reinforcement learning library. MIT
ReAgent (17 · 2.8K) – A platform for Reasoning systems (Reinforcement Learning,.. BSD-3
RLax (17 · 570) – A library of reinforcement learning building blocks in JAX. Apache-2 jax

Recommender Systems

Libraries for building and evaluating recommendation systems.

lightfm (27 · 3.5K) – A Python implementation of LightFM, a hybrid recommendation algorithm. Apache-2
implicit (27 · 2.3K) – Fast Python Collaborative Filtering for Implicit Feedback Datasets. MIT
scikit-surprise (26 · 4.7K) – A Python scikit for building and analyzing recommender.. BSD-3
TF Ranking (22 · 2.1K) – Learning to Rank in TensorFlow. Apache-2
Cornac (22 · 310) – A Comparative Framework for Multimodal Recommender Systems. Apache-2
Recommenders (21 · 9.3K) – Best Practices on Recommendation Systems. MIT
fastFM (20 · 910) – fastFM: A Library for Factorization Machines. BSD-3
RecBole (20 · 770) – A unified, comprehensive and efficient recommendation library. MIT
TF Recommenders (19 · 750) – TensorFlow Recommenders is a library for building.. Apache-2
recmetrics (18 · 240) – A library of metrics for evaluating recommender systems. MIT
Case Recommender (16 · 320) – Case Recommender: A Flexible and Extensible Python.. MIT

Privacy Machine Learning

Libraries for encrypted and privacy-preserving machine learning using methods like federated learning & differential privacy.

PySyft (26 · 6.9K) – A library for answering questions using data you cannot see. Apache-2
Opacus (22 · 760) – Training PyTorch models with differential privacy. Apache-2
FATE (20 · 2.8K) – An Industrial Grade Federated Learning Framework. Apache-2
TensorFlow Privacy (20 · 1.4K) – Library for training machine learning models with.. Apache-2
TFEncrypted (20 · 830) – A Framework for Encrypted Machine Learning in TensorFlow. Apache-2
CrypTen (16 · 730) – A framework for Privacy Preserving Machine Learning. MIT

Workflow & Experiment Tracking

Libraries to organize, track, and visualize machine learning experiments.

Tensorboard (36 · 5.2K) – TensorFlow’s Visualization Toolkit. Apache-2
mlflow (32 · 8.6K) – Open source platform for the machine learning lifecycle. Apache-2
DVC (30 · 7.5K) – Data Version Control | Git for Data & Models. Apache-2
wandb client (30 · 2.8K) – A tool for visualizing and tracking your machine learning.. MIT
SageMaker SDK (30 · 1.3K) – A library for training and deploying machine learning.. Apache-2
kaggle (29 · 3.9K) – Official Kaggle API. Apache-2
AzureML SDK (29 · 2.2K) – Python notebooks with ML and deep learning examples with Azure.. MIT
snakemake (29 · 880) – This is the development home of the workflow management system.. MIT
tensorboardX (28 · 6.8K) – tensorboard for pytorch (and chainer, mxnet, numpy, …). MIT
sacred (28 · 3.3K) – Sacred is a tool to help you configure, organize, log and reproduce.. MIT
PyCaret (28 · 3K) – An open-source, low-code machine learning library in Python. MIT
Metaflow (26 · 4.2K) – Build and manage real-life data science projects with ease. Apache-2
Catalyst (26 · 2.5K) – Accelerated deep learning R&D. Apache-2
VisualDL (24 · 3.9K) – Deep Learning Visualization Toolkit. Apache-2
ClearML (24 · 2.2K) – ClearML – Auto-Magical Suite of tools to streamline your ML.. Apache-2
TNT (24 · 1.3K) – Simple tools for logging and visualizing, loading and training. BSD-3
livelossplot (24 · 1K) – Live training loss plot in Jupyter Notebook for Keras, PyTorch.. MIT
ml-metadata (24 · 290) – For recording and retrieving metadata associated with ML.. Apache-2
TensorWatch (22 · 3K) – Debugging, monitoring and visualization for Python Machine Learning.. MIT
knockknock (22 · 2K) – Knock Knock: Get notified when your training ends with only two.. MIT
lore (21 · 1.5K) – Lore makes machine learning approachable for Software Engineers and.. MIT
Guild AI (21 · 550) – Experiment tracking, ML developer tools. Apache-2
Studio.ml (21 · 370) – Studio: Simplify and expedite model building process. Apache-2
quinn (21 · 220) – pyspark methods to enhance developer productivity. Apache-2
hiddenlayer (20 · 1.4K) – Neural network graphs and training metrics for.. MIT
Labml (20 · 500) – Monitor deep learning model training and hardware usage from your mobile.. MIT
gokart (19 · 170) – A wrapper of the data pipeline library luigi. MIT
aim (15 · 880) – Aim a super-easy way to record, search and compare 1000s of ML training.. Apache-2

Model Serialization & Conversion

Libraries to serialize models to files, convert between a variety of model formats, and optimize models for deployment.

onnx (33 · 9.9K) – Open standard for machine learning interoperability. Apache-2
Core ML Tools (26 · 2.1K) – Core ML tools contain supporting tools for Core ML model.. BSD-3
TorchServe (24 · 1.6K) – Model Serving on PyTorch. Apache-2
mmdnn (23 · 5.3K) – MMdnn is a set of tools to help users inter-operate among different deep.. MIT
cortex (21 · 7.4K) – Model serving at scale. Apache-2
m2cgen (21 · 1.8K) – Transform ML models into a native code (Java, C, Python, Go, JavaScript,.. MIT
Hummingbird (20 · 2.3K) – Hummingbird compiles trained ML models into tensor computation for.. MIT
pytorch2keras (18 · 670) – PyTorch to Keras model convertor. MIT
tfdeploy (16 · 350) – Deploy tensorflow graphs for fast evaluation and export to.. BSD-3

Model Interpretability

Libraries to visualize, explain, debug, evaluate, and interpret machine learning models.

shap (34 · 12K) – A game theoretic approach to explain the output of any machine learning model. MIT
Lime (29 · 8.5K) – Lime: Explaining the predictions of any machine learning classifier. BSD-2
pyLDAvis (28 · 1.4K) – Python library for interactive topic model visualization. Port of.. BSD-3
InterpretML (27 · 3.5K) – Fit interpretable models. Explain blackbox machine learning. MIT
Model Analysis (27 · 1K) – Model analysis tools for TensorFlow. Apache-2
yellowbrick (25 · 3.1K) – Visual analysis and diagnostic tools to facilitate machine.. Apache-2
Captum (25 · 2.2K) – Model interpretability and understanding for PyTorch. BSD-3
dtreeviz (25 · 1.4K) – A python library for decision tree visualization and model interpretation. MIT
Fairness 360 (25 · 1.2K) – A comprehensive set of fairness metrics for datasets and.. Apache-2
arviz (25 · 960) – Exploratory analysis of Bayesian models with Python. Apache-2
Lucid (24 · 4.1K) – A collection of infrastructure and tools for research in neural.. Apache-2
DoWhy (24 · 2.7K) – DoWhy is a Python library for causal inference that supports explicit.. MIT
keras-vis (23 · 2.8K) – Neural network visualization toolkit for keras. MIT
TreeInterpreter (23 · 650) – Package for interpreting scikit-learn’s decision tree.. BSD-3
Alibi (22 · 910) – Algorithms for monitoring and explaining machine learning models. Apache-2
keract (22 · 860) – Activation Maps (Layers Outputs) and Gradients in Keras. MIT
random-forest-importances (22 · 420) – Code to compute permutation and drop-column.. MIT
Explainability 360 (21 · 780) – Interpretability and explainability of data and machine.. Apache-2
iNNvestigate (21 · 780) – A toolbox to iNNvestigate neural networks’ predictions!. BSD-2
tf-explain (21 · 780) – Interpretability Methods for tf.keras models with Tensorflow 2.x. MIT
fairlearn (21 · 710) – A Python package to assess and improve fairness of machine.. MIT
aequitas (21 · 360) – Bias and Fairness Audit Toolkit. MIT
explainerdashboard (20 · 370) – Quickly build Explainable AI dashboards that show the inner.. MIT
checklist (19 · 1.3K) – Beyond Accuracy: Behavioral Testing of NLP models with CheckList. MIT
CausalNex (19 · 1K) – A Python library that helps data scientists to infer.. Apache-2
deeplift (19 · 510) – Public facing deeplift repo. MIT
What-If Tool (19 · 460) – Source code/webpage/demos for the What-If Tool. Apache-2
sklearn-evaluation (19 · 290) – Machine learning model evaluation made easy: plots,.. MIT
tcav (18 · 440) – Code for the TCAV ML interpretability project. Apache-2
fairness-indicators (18 · 180) – Tensorflow’s Fairness Evaluation and Visualization.. Apache-2
LIT (17 · 2.4K) – The Language Interpretability Tool: Interactively analyze NLP models for.. Apache-2
ExplainX.ai (17 · 190) – Explainable AI framework for data scientists. Explain & debug any.. MIT
imodels (17 · 190) – Interpretable ML package for concise, transparent, and accurate predictive.. MIT
DiCE (16 · 480) – Generate Diverse Counterfactual Explanations for any machine.. MIT
LOFO (16 · 310) – Leave One Feature Out Importance. MIT
model-card-toolkit (16 · 180) – a tool that leverages rich metadata and lineage.. Apache-2
FlashTorch (15 · 560) – Visualization toolkit for neural networks in PyTorch! Demo –. MIT
Anchor (14 · 630) – Code for High-Precision Model-Agnostic Explanations paper. BSD-2

Vector Similarity Search (ANN)

Libraries for Approximate Nearest Neighbor Search and Vector Indexing/Similarity Search.

ANN Benchmarks (2.1K) – Benchmarks of approximate nearest neighbor libraries in Python.

Faiss (29 · 13K) – A library for efficient similarity search and clustering of dense vectors. MIT
Annoy (29 · 8.2K) – Approximate Nearest Neighbors in C++/Python optimized for memory usage.. Apache-2
NMSLIB (28 · 2.3K) – Non-Metric Space Library (NMSLIB): An efficient similarity search.. Apache-2
hnswlib (26 · 1.4K) – Header-only C++/python library for fast approximate nearest neighbors. Apache-2
Milvus (25 · 5.3K) – An open source embedding vector similarity search engine powered by.. Apache-2
PyNNDescent (25 · 380) – A Python nearest neighbor descent for approximate nearest neighbors. BSD-2
Magnitude (23 · 1.4K) – A fast, efficient universal vector embedding utility package. MIT
NGT (19 · 630) – Nearest Neighbor Search with Neighborhood Graph and Tree for High-.. Apache-2
N2 (19 · 460) – TOROS N2 – lightweight approximate Nearest Neighbor library which runs fast.. Apache-2
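
For context, the libraries above speed up a search that can always be done exactly by brute force. The sketch below shows that exact baseline with cosine similarity in plain Python; it is not the API of any listed library, and the names `cosine_similarity`, `exact_top_k`, and the tiny `corpus` are hypothetical. Approximate methods trade a little recall for avoiding this full scan over every stored vector.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_top_k(
    query: Sequence[float], vectors: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Score every vector against the query and return the k best (index, similarity) pairs."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:k]


# Tiny illustrative corpus of 2-D vectors.
corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
nearest = exact_top_k([1.0, 0.1], corpus, k=2)
print(nearest)
```

The scan costs time proportional to the number of stored vectors per query, which is the cost that index structures in ANN libraries aim to reduce.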

Probabilistics & Statistics

Libraries providing capabilities for probabilistic programming/reasoning, Bayesian inference, Gaussian processes, or statistics.

PyMC3 (32 · 5.6K) – Probabilistic Programming in Python: Bayesian Modeling and.. [Apache-2]
tensorflow-probability (31 · 3.3K) – Probabilistic reasoning and statistical analysis in.. [Apache-2]
hmmlearn (29 · 2.2K) – Hidden Markov Models in Python, with scikit-learn like API. [BSD-3]
Pyro (28 · 6.8K) – Deep universal probabilistic programming with Python and PyTorch. [Apache-2]
GPyTorch (28 · 2.3K) – A highly efficient and modular implementation of Gaussian Processes.. [MIT]
pomegranate (27 · 2.6K) – Fast, flexible and easy to use probabilistic modelling in Python. [MIT]
filterpy (27 · 1.7K) – Python Kalman filtering and optimal estimation library. Implements.. [MIT]
GPflow (27 · 1.4K) – Gaussian processes in TensorFlow. [Apache-2]
pgmpy (25 · 1.7K) – Python Library for learning (Structure and Parameter) and inference.. [MIT]
SALib (24 · 440) – Sensitivity Analysis Library in Python (Numpy). Contains Sobol, Morris,.. [MIT]
bambi (20 · 580) – BAyesian Model-Building Interface (Bambi) in Python. [MIT]
scikit-posthocs (20 · 190) – Multiple Pairwise Comparisons (Post Hoc) Tests in Python. [MIT]
Funsor (19 · 160) – Functional tensors for probabilistic programming. [Apache-2]
pyhsmm (18 · 480) – Bayesian inference in HSMMs and HMMs. [MIT]
Orbit (18 · 340) – A Python package for Bayesian forecasting with object-oriented design.. [Apache-2]
Baal (17 · 320) – Using approximate bayesian posteriors in deep nets for active learning. [Apache-2]
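As a rough illustration of the Bayesian inference these libraries automate, the sketch below performs a conjugate Beta-Binomial update by hand. It uses only the standard library and none of the packages listed above; the `Beta` class and the coin-flip counts are assumptions made up for the example. Probabilistic programming frameworks handle models where no such closed-form update exists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    """Beta(alpha, beta) distribution over a success probability (hypothetical helper)."""

    alpha: float
    beta: float

    def mean(self) -> float:
        # Expected value of a Beta distribution.
        return self.alpha / (self.alpha + self.beta)

    def update(self, successes: int, failures: int) -> "Beta":
        # Conjugacy: a Beta prior combined with binomial data yields a Beta
        # posterior whose parameters are the prior's plus the observed counts.
        return Beta(self.alpha + successes, self.beta + failures)


# Uniform prior, then observe 7 successes and 3 failures (made-up data).
prior = Beta(1.0, 1.0)
posterior = prior.update(successes=7, failures=3)
print(posterior, round(posterior.mean(), 3))
```

The posterior mean lies between the prior mean (0.5) and the observed success rate (0.7), and moves toward the data as more observations arrive.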

Adversarial Robustness

Libraries for testing the robustness of machine learning models against attacks with adversarial/malicious examples.

CleverHans (27 · 5K) – An adversarial example library for constructing attacks, building.. [MIT]
Foolbox (27 · 1.8K) – A Python toolbox to create adversarial examples that fool neural networks.. [MIT]
ART (23 · 2.1K) – Adversarial Robustness Toolbox (ART) – Python Library for Machine Learning.. [MIT]
TextAttack (23 · 1.3K) – TextAttack is a Python framework for adversarial attacks, data.. [MIT]
robustness (18 · 490) – A library for experimenting with, training and evaluating neural.. [MIT]
AdvBox (16 · 1.1K) – Advbox is a toolbox to generate adversarial examples that fool.. [Apache-2]
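To make "adversarial example" concrete, here is a minimal sketch of the fast gradient sign method (FGSM) applied to a hand-written logistic regression model. It is a standard-library illustration only and does not use the API of any library listed above; the weights, input, and `epsilon` are made-up values, and `predict` and `fgsm` are hypothetical helper names.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Probability of class 1 under a logistic regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fgsm(
    weights: Sequence[float],
    bias: float,
    x: Sequence[float],
    label: int,
    epsilon: float,
) -> List[float]:
    """Perturb x by epsilon in the direction that increases the loss.

    For binary cross-entropy, the gradient of the loss with respect to the
    input is (p - label) * weights, so only its sign is needed.
    """
    p = predict(weights, bias, x)
    grad = [(p - label) * w for w in weights]
    signs = [(g > 0) - (g < 0) for g in grad]
    return [xi + epsilon * s for xi, s in zip(x, signs)]


# Toy model and input (values chosen for illustration only).
weights, bias = [2.0, -1.0], 0.0
x, label = [0.5, 0.2], 1

x_adv = fgsm(weights, bias, x, label, epsilon=0.5)
print(predict(weights, bias, x) > 0.5, predict(weights, bias, x_adv) > 0.5)
```

With these toy numbers the clean input is classified as class 1 and the perturbed input is not. The libraries above implement this attack and many stronger ones for real neural networks.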

GPU Utilities

Libraries that require and make use of CUDA/GPU system capabilities to optimize data handling and machine learning tasks.

CuPy (31 · 4.9K) – A NumPy-compatible array library accelerated by CUDA. [MIT]
gpustat (26 · 2.3K) – A simple command-line utility for querying and monitoring GPU status. [MIT]
PyCUDA (25 · 1.1K) – CUDA integration for Python, plus shiny features. [MIT]
Apex (23 · 5.1K) – A PyTorch Extension: Tools for easy mixed precision and distributed.. [BSD-3]
ArrayFire (23 · 3.3K) – ArrayFire: a general purpose GPU library. [BSD-3]
scikit-cuda (23 · 800) – Python interface to GPU-powered libraries. [BSD-3]
cuDF (21 · 3.7K) – cuDF – GPU DataFrame Library. [Apache-2]
py3nvml (21 · 170) – Python 3 Bindings for NVML library. Get NVIDIA GPU status inside.. [BSD-3]
DALI (20 · 3.1K) – A library containing both highly optimized building blocks and an.. [Apache-2]
cuML (19 · 2K) – cuML – RAPIDS Machine Learning Library. [Apache-2]
BlazingSQL (17 · 1.4K) – BlazingSQL is a lightweight, GPU accelerated, SQL engine for.. [Apache-2]
Vulkan Kompute (17 · 350) – General purpose GPU compute framework for cross vendor.. [Apache-2]
cuGraph (16 · 670) – cuGraph – RAPIDS Graph Analytics Library. [Apache-2]
cuSignal (15 · 460) – GPU accelerated signal processing. [Apache-2]

TensorFlow Utilities

Libraries that extend TensorFlow with additional capabilities.

tensorflow-hub (32 · 2.8K) – A library for transfer learning by reusing parts of.. [Apache-2]
tensor2tensor (31 · 11K) – Library of deep learning models and datasets designed to.. [Apache-2]
TF Addons (31 · 1.2K) – Useful extra functionality for TensorFlow 2.x maintained by.. [Apache-2]
TensorFlow Transform (29 · 860) – Input pipeline framework. [Apache-2]
TensorFlow I/O (26 · 420) – Dataset, streaming, and file system extensions.. [Apache-2]
TF Model Optimization (25 · 980) – A toolkit to optimize ML models for deployment for.. [Apache-2]
efficientnet (23 · 1.7K) – Implementation of EfficientNet model. Keras and.. [Apache-2]
TensorFlow Cloud (22 · 230) – The TensorFlow Cloud repository provides APIs that.. [Apache-2]
Neural Structured Learning (21 · 790) – Training neural models with structured signals. [Apache-2]
TensorNets (19 · 980) – High level network definitions with pre-trained weights in.. [MIT]
tffm (18 · 760) – TensorFlow implementation of an arbitrary order Factorization Machine. [MIT]
TF Compression (18 · 450) – Data compression in TensorFlow. [Apache-2]
Saliency (17 · 640) – TensorFlow implementation for SmoothGrad, Grad-CAM, Guided.. [Apache-2]

Sklearn Utilities

Libraries that extend scikit-learn with additional capabilities.

imbalanced-learn (31 · 5.1K) – A Python Package to Tackle the Curse of Imbalanced.. [MIT]
MLxtend (30 · 3.4K) – A library of extension and helper modules for Python’s data.. [BSD-3]
category_encoders (24 · 1.6K) – A library of sklearn compatible categorical variable.. [BSD-3]
sklearn-contrib-lightning (24 · 1.4K) – Large-scale linear classification, regression and.. [BSD-3]
scikit-opt (22 · 2K) – Genetic Algorithm, Particle Swarm Optimization, Simulated.. [MIT]
fancyimpute (22 · 940) – Multivariate imputation and matrix completion algorithms.. [Apache-2]
combo (22 · 480) – (AAAI’ 20) A Python Toolbox for Machine Learning Model.. [BSD-2]
scikit-lego (20 · 440) – Extra blocks for scikit-learn pipelines. [MIT]
DESlib (20 · 320) – A Python library for dynamic classifier and ensemble selection. [BSD-3]
iterative-stratification (19 · 530) – scikit-learn cross validators for iterative.. [BSD-3]
scikit-tda (19 · 270) – Topological Data Analysis for Python. [MIT]
skggm (16 · 180) – Scikit-learn compatible estimation of general graphical models. [MIT]

PyTorch Utilities

Libraries that extend PyTorch with additional capabilities.

pretrainedmodels (27 · 7.8K) – Pretrained ConvNets for pytorch: NASNet, ResNeXt,.. [BSD-3]
pytorch-summary (25 · 3K) – Model summary in PyTorch similar to `model.summary()` in.. [MIT]
pytorch-optimizer (25 · 1.7K) – torch-optimizer: collection of optimizers for.. [Apache-2]
EfficientNet-PyTorch (24 · 5.5K) – A PyTorch implementation of EfficientNet. [Apache-2]
torchdiffeq (24 · 3.4K) – Differentiable ODE solvers with full GPU support and.. [MIT]
PML (24 · 2.8K) – The easiest way to use deep metric learning in your application. Modular,.. [MIT]
SRU (23 · 1.9K) – Training RNNs as Fast as CNNs (https://arxiv.org/abs/1709.02755). [MIT]
Torchmeta (21 · 1.2K) – A collection of extensions and data-loaders for few-shot learning.. [MIT]
torch-scatter (21 · 610) – PyTorch Extension Library of Optimized Scatter Operations. [MIT]
PyTorch Sparse (21 · 360) – PyTorch Extension Library of Optimized Autograd Sparse.. [MIT]
reformer-pytorch (20 · 1.4K) – Reformer, the efficient Transformer, in Pytorch. [MIT]
EfficientNets (20 · 1.3K) – Pretrained EfficientNet, EfficientNet-Lite, MixNet,.. [Apache-2]
Higher (20 · 1.1K) – higher is a pytorch library allowing users to obtain higher.. [Apache-2]
TabNet (20 · 860) – PyTorch implementation of TabNet paper :.. [MIT]
Pytorch Toolbelt (19 · 940) – PyTorch extensions for fast R&D prototyping and Kaggle.. [MIT]
Performer Pytorch (17 · 540) – An implementation of Performer, a linear attention-.. [MIT]
Tensor Sensor (17 · 530) – The goal of this library is to generate more helpful.. [MIT]
tinygrad (15 · 4.1K) – You like pytorch? You like micrograd? You love tinygrad! [MIT]
Lambda Networks (15 · 1.4K) – Implementation of LambdaNetworks, a new approach to.. [MIT]
Torch-Struct (15 · 910) – Fast, general, and tested differentiable structured prediction.. [MIT]
torchsde (15 · 680) – Differentiable SDE solvers with GPU support and efficient.. [Apache-2]
Pywick (15 · 320) – High-level batteries-included neural network training library for.. [MIT]
Tez (14 · 580) – Tez is a super-simple and lightweight Trainer for PyTorch. It.. [Apache-2]
micrograd (12 · 1.6K) – A tiny scalar-valued autograd engine and a neural net library.. [MIT]

Database Clients

Libraries for connecting to, operating, and querying databases.

best-of-python – DB Clients (1.5K) – Collection of database clients for python.

Others

scipy (40 · 8K) – Ecosystem of open-source software for mathematics, science, and engineering. [BSD-3]
SymPy (36 · 7.9K) – A computer algebra system written in pure Python. [BSD-3]
Autograd (30 · 5.2K) – Efficiently computes derivatives of numpy code. [MIT]
hdbscan (29 · 1.8K) – A high performance implementation of HDBSCAN clustering. [BSD-3]
PyOD (28 · 4.2K) – (JMLR’19) A Python Toolbox for Scalable Outlier Detection (Anomaly.. [BSD-2]
Keras-Preprocessing (28 · 920) – Utilities for working with image data, text data, and.. [MIT]
Cython BLIS (28 · 160) – Fast matrix-multiplication as a self-contained Python library no.. [BSD-3]
Streamlit (27 · 14K) – Streamlit: The fastest way to build data apps in Python. [Apache-2]
carla (26 · 5.7K) – Open-source simulator for autonomous driving research. [MIT]
Datasette (26 · 4.8K) – An open source multi-tool for exploring and publishing data. [Apache-2]
DeepChem (26 · 2.8K) – Democratizing Deep-Learning for Drug Discovery, Quantum Chemistry,.. [MIT]
agate (26 · 1K) – A Python data analysis library that is optimized for humans instead of machines. [MIT]
pyclustering (26 · 800) – pyclustering is a Python, C++ data mining library. [BSD-3]
Trax (25 · 5.9K) – Trax: Deep Learning with Clear Code and Speed. [Apache-2]
causalml (25 · 1.8K) – Uplift modeling and causal inference with machine learning.. [Apache-2]
Pythran (25 · 1.5K) – Ahead of Time compiler for numeric kernels. [BSD-3]
TabPy (25 · 1K) – Execute Python code on the fly and display results in Tableau visualizations. [MIT]
kmodes (25 · 820) – Python implementations of the k-modes and k-prototypes clustering.. [MIT]
metric-learn (24 · 1.1K) – Metric learning algorithms in Python. [MIT]
PennyLane (24 · 800) – PennyLane is a cross-platform Python library for differentiable.. [Apache-2]
pyopencl (24 · 790) – OpenCL integration for Python, plus shiny features. [MIT]
PySwarms (24 · 740) – A research toolkit for particle swarm optimization in Python. [MIT]
pyjanitor (24 · 640) – Clean APIs for data cleaning. Python implementation of R package Janitor. [MIT]
findspark (24 · 390) – Find pyspark to make it importable. [BSD-3]
datalad (24 · 230) – Keep code, data, containers under control with git and git-annex. [MIT]
Gradio (23 · 2.1K) – Wrap UIs around any model, share with anyone. [Apache-2]
modAL (23 · 1.1K) – A modular active learning framework for Python. [MIT]
PaddleHub (22 · 4.7K) – Awesome pre-trained models toolkit based on.. [Apache-2]
pycm (22 · 1.1K) – Multi-class confusion matrix library in Python. [MIT]
Prince (22 · 590) – Python factor analysis library (PCA, CA, MCA, MFA, FAMD). [MIT]
SUOD (22 · 240) – (MLSys’ 21) An Acceleration System for Large-scale Unsupervised.. [BSD-2]
Mars (21 · 2.1K) – Mars is a tensor-based unified framework for large-scale data.. [Apache-2]
tensorly (21 · 970) – TensorLy: Tensor Learning in Python. [BSD-2]
StreamAlert (20 · 2.5K) – StreamAlert is a serverless, realtime data analysis framework.. [Apache-2]
AstroML (20 · 730) – Machine learning, statistics, and data mining for astronomy and.. [BSD-2]
alibi-detect (20 · 600) – Algorithms for outlier and adversarial instance detection,.. [Apache-2]
baikal (20 · 570) – A graph-based functional API for building complex scikit-learn pipelines. [BSD-3]
BioPandas (20 · 330) – Working with molecular structures in pandas DataFrames. [BSD-3]
scikit-rebate (20 · 310) – A scikit-learn-compatible Python implementation of ReBATE, a.. [MIT]
rrcf (20 · 290) – Implementation of the Robust Random Cut Forest algorithm for anomaly.. [MIT]
Feature Engine (19 · 470) – Feature engineering package with sklearn like functionality. [BSD-3]
apricot (18 · 310) – apricot implements submodular optimization for the purpose of selecting.. [MIT]
River (17 · 1.4K) – Online machine learning in Python. [BSD-3]
traingenerator (10 · 940) – A web app to generate template code for machine learning. [MIT]

Related Resources

Papers With Code: Discover ML papers, code, and evaluation tables.
Sotabench: Discover & compare open-source ML models.
Google Dataset Search: Dataset search engine by Google.
Dataset List: List of the biggest ML datasets from across the web.
Awesome Public Datasets: A topic-centric list of open datasets.
Best-of lists: Discover other best-of lists with awesome open-source projects on all kinds of topics.
best-of-python-dev: A ranked list of awesome python developer tools and libraries.
best-of-web-python: A ranked list of awesome python libraries for web development.

3. Machine Learning – TensorFlow – Part III

https://www.tensorflow.org/

4. Artificial Intelligence (AI) – Part I

Reproduced from GitHub https://github.com/

A curated list of Artificial Intelligence (AI) courses, books, video lectures and papers.

Courses

MIT: Intro to Deep Learning – A seven-day bootcamp designed at MIT to introduce deep learning methods and applications
Deep Blueberry: Deep Learning book – A free five-weekend plan for self-learners covering the basics of deep-learning architectures like CNNs, LSTMs, RNNs, VAEs, GANs, DQN, A3C and more
Spinning Up in Deep Reinforcement Learning – A free deep reinforcement learning course by OpenAI
MIT Artificial Intelligence Videos – MIT AI Course
Grokking Deep Learning in Motion – Beginner’s course to learn deep learning and neural networks without frameworks.
Intro to Artificial Intelligence – Learn the Fundamentals of AI. Course run by Peter Norvig
EdX Artificial Intelligence – The course will introduce the basic ideas and techniques underlying the design of intelligent computer systems
Artificial Intelligence For Robotics – This class will teach you basic methods in Artificial Intelligence, including: probabilistic inference, planning and search, localization, tracking and control, all with a focus on robotics
Machine Learning – Basic machine learning algorithms for supervised and unsupervised learning
Neural Networks For Machine Learning – Algorithmic and practical tricks for artificial neural networks.
Deep Learning – An Introductory course to the world of Deep Learning.
Stanford Statistical Learning – Introductory course on machine learning focusing on: linear and polynomial regression, logistic regression and linear discriminant analysis; cross-validation and the bootstrap, model selection and regularization methods (ridge and lasso); nonlinear models, splines and generalized additive models; tree-based methods, random forests and boosting; support-vector machines.
Knowledge Based Artificial Intelligence – Georgia Tech’s course on Artificial Intelligence focussing on Symbolic AI.
Deep RL Bootcamp Lectures – Deep Reinforcement Bootcamp Lectures – August 2017
Machine Learning Crash Course By Google – Machine Learning Crash Course features a series of lessons with video lectures, real-world case studies, and hands-on practice exercises.
Python Class By Google – This is a free class for people with a little bit of programming experience who want to learn Python. The class includes written materials, lecture videos, and lots of code exercises to practice Python coding.
Deep Learning Crash Course – In this liveVideo course, machine learning expert Oliver Zeigermann teaches you the basics of deep learning.
Artificial Intelligence: A Modern Approach – Stuart Russell & Peter Norvig
- Also consider browsing the list of recommended reading, divided by each chapter in “Artificial Intelligence: A Modern Approach”.
Paradigms Of Artificial Intelligence Programming: Case Studies in Common Lisp – Paradigms of AI Programming is the first text to teach advanced Common Lisp techniques in the context of building major AI systems
Reinforcement Learning: An Introduction – This introductory textbook on reinforcement learning is targeted toward engineers and scientists in artificial intelligence, operations research, neural networks, and control systems, and we hope it will also be of interest to psychologists and neuroscientists.
The Cambridge Handbook Of Artificial Intelligence – Written for non-specialists, it covers the discipline’s foundations, major theories, and principal research areas, plus related topics such as artificial life
The Emotion Machine: Commonsense Thinking, Artificial Intelligence, and the Future of the Human Mind – In this mind-expanding book, scientific pioneer Marvin Minsky continues his groundbreaking research, offering a fascinating new model for how our minds work
Artificial Intelligence: A New Synthesis – Beginning with elementary reactive agents, Nilsson gradually increases their cognitive horsepower to illustrate the most important and lasting ideas in AI
On Intelligence – Hawkins develops a powerful theory of how the human brain works, explaining why computers are not intelligent and how, based on this new theory, we can finally build intelligent machines. An audio version is also available from audible.com
How To Create A Mind – Kurzweil discusses how the brain works, how the mind emerges, brain-computer interfaces, and the implications of vastly increasing the powers of our intelligence to address the world’s problems
Deep Learning – Goodfellow, Bengio and Courville’s introduction to a broad range of topics in deep learning, covering mathematical and conceptual background, deep learning techniques used in industry, and research perspectives.
The Elements of Statistical Learning: Data Mining, Inference, and Prediction – Hastie and Tibshirani cover a broad range of topics, from supervised learning (prediction) to unsupervised learning including neural networks, support vector machines, classification trees and boosting—the first comprehensive treatment of this topic in any book.
Deep Learning and the Game of Go – Deep Learning and the Game of Go teaches you how to apply the power of deep learning to complex human-flavored reasoning tasks by building a Go-playing AI. After exposing you to the foundations of machine and deep learning, you’ll use Python to build a bot and then teach it the rules of the game.
Deep Learning for Search – Deep Learning for Search teaches you how to leverage neural networks, NLP, and deep learning techniques to improve search performance.
Deep Learning with PyTorch – PyTorch puts these superpowers in your hands, providing a comfortable Python experience that gets you started quickly and then grows with you as you—and your deep learning skills—become more sophisticated. Deep Learning with PyTorch will make that journey engaging and fun.
Deep Reinforcement Learning in Action – Deep Reinforcement Learning in Action teaches you the fundamental concepts and terminology of deep reinforcement learning, along with the practical skills and techniques you’ll need to implement it into your own projects.
Grokking Deep Reinforcement Learning – Grokking Deep Reinforcement Learning introduces this powerful machine learning approach, using examples, illustrations, exercises, and crystal-clear teaching.
Fusion in Action – Fusion in Action teaches you to build a full-featured data analytics pipeline, including document and data search and distributed data clustering.
Real-World Natural Language Processing – Early access book on how to create practical NLP applications using Python.
Grokking Machine Learning – Early access book that introduces the most valuable machine learning techniques.
Succeeding with AI – An introduction to managing successful AI projects and applying AI to real-life situations.
Elements of AI (Part 1) – Reaktor/University of Helsinki – An Introduction to AI is a free online course for everyone interested in learning what AI is, what is possible (and not possible) with AI, and how it affects our lives – with no complicated math or programming required.
Essential Natural Language Processing – A hands-on guide to NLP with practical techniques, numerous Python-based examples and real-world case studies.
Kaggle’s micro courses – A series of micro courses offering practical, hands-on knowledge ranging from Python to Deep Learning.
Transfer Learning for Natural Language Processing – A book that gets you up to speed with the relevant ML concepts and then dives into transfer learning for NLP.
Stanford Deep Learning Series – https://www.youtube.com/playlist?list=PLoROMvodv4rOABXSygHTsbvUz4G_YQhOb
Amazon Machine Learning Developer Guide – A book for ML developers that introduces ML concepts and strategies with many practical examples.
Machine Learning for Humans – A series of simple, plain-English explanations accompanied by math, code, and real-world examples.

Books

Machine Learning for Mortals (Mere and Otherwise) – Early access book that provides basics of machine learning and using R programming language.
How Machine Learning Works – Mostafa Samir. Early access book that introduces machine learning from both practical and theoretical aspects in a non-threatening way.
MachineLearningWithTensorFlow2ed – a book on general purpose machine learning techniques: regression, classification, unsupervised clustering, reinforcement learning, autoencoders, convolutional neural networks, RNNs, and LSTMs, using TensorFlow 1.14.1.
Serverless Machine Learning – a book for machine learning engineers on how to train and deploy machine learning systems on public clouds like AWS, Azure, and GCP, using a code-oriented approach.
The Hundred-Page Machine Learning Book – all you need to know about Machine Learning in a hundred pages, supervised and unsupervised learning, SVM, neural networks, ensemble methods, gradient descent, cluster analysis and dimensionality reduction, autoencoders and transfer learning, feature engineering and hyperparameter tuning.

Programming

Prolog Programming For Artificial Intelligence – This best-selling guide to Prolog and Artificial Intelligence concentrates on the art of using the basic mechanisms of Prolog to solve interesting AI problems.
AI Algorithms, Data Structures and Idioms in Prolog, Lisp and Java – PDF here
Python Tools for Machine Learning
Python for Artificial Intelligence

Philosophy

Super Intelligence – Superintelligence asks the question: what happens when machines surpass humans in general intelligence? A really great book.
Our Final Invention: Artificial Intelligence And The End Of The Human Era – Our Final Invention explores the perils of the heedless pursuit of advanced AI. Until now, human intelligence has had no rival. Can we coexist with beings whose intelligence dwarfs our own? And will they allow us to?
How to Create a Mind: The Secret of Human Thought Revealed – Ray Kurzweil, director of engineering at Google, explores the process of reverse-engineering the brain to understand precisely how it works, then applies that knowledge to create vastly intelligent machines.
Minds, Brains, And Programs – The 1980 paper by philosopher John Searle that contains the famous ‘Chinese Room’ thought experiment. Probably the most famous attack on the notion of a Strong AI possessing a ‘mind’ or a ‘consciousness’, and interesting reading for those interested in the intersection of AI and philosophy of mind.
Gödel, Escher, Bach: An Eternal Golden Braid – Written by Douglas Hofstadter and taglined “a metaphorical fugue on minds and machines in the spirit of Lewis Carroll”, this wonderful journey into the fundamental concepts of mathematics, symmetry and intelligence won the Pulitzer Prize for General Non-Fiction in 1980. A major theme throughout is the emergence of meaning from seemingly ‘meaningless’ elements, like 1’s and 0’s, arranged in special patterns.
Life 3.0: Being Human in the Age of Artificial Intelligence – Max Tegmark, professor of Physics at MIT, discusses how Artificial Intelligence may affect crime, war, justice, jobs, society and our very sense of being human both in the near and far future.

Free Content

Foundations Of Computational Agents – This book is published by Cambridge University Press, 2010
The Quest For Artificial Intelligence – This book traces the history of the subject, from the early dreams of eighteenth-century (and earlier) pioneers to the more successful work of today’s AI engineers.
Stanford CS229 – Machine Learning – This course provides a broad introduction to machine learning and statistical pattern recognition.
Computers and Thought: A practical Introduction to Artificial Intelligence – The book covers computer simulation of human activities, such as problem solving and natural language understanding; computer vision; AI tools and techniques; an introduction to AI programming; symbolic and neural network models of cognition; the nature of mind and intelligence; and the social implications of AI and cognitive science.
Society of Mind – Marvin Minsky’s seminal work on how our mind works. Many Symbolic AI concepts derive from it.
Artificial Intelligence and Molecular Biology – The current volume is an effort to bridge that range of exploration, from nucleotide to abstract concept, in contemporary AI/MB research.
Brief Introduction To Educational Implications Of Artificial Intelligence – This book is designed to help preservice and inservice teachers learn about some of the educational implications of current uses of Artificial Intelligence as an aid to solving problems and accomplishing tasks.
Encyclopedia: Computational intelligence – Scholarpedia is a peer-reviewed open-access encyclopedia written and maintained by scholarly experts from around the world.
Ethical Artificial Intelligence – a book by Bill Hibbard that combines several peer reviewed papers and new material to analyze the issues of ethical artificial intelligence.
Golden Artificial Intelligence – a cluster of pages on artificial intelligence and machine learning.
R2D3 – A website with explanations on topics from Machine Learning to Statistics, all illustrated with beautiful animated infographics and real-life examples. Available in various languages.

Code

ExplainX – ExplainX is a fast, light-weight, and scalable explainable AI framework for data scientists to explain any black-box model to business stakeholders.
AIMACode – Source code for “Artificial Intelligence: A Modern Approach” in Common Lisp, Java, Python. More to come.
FANN – Fast Artificial Neural Network Library, native for C
FARGonautica – Source code of Douglas Hofstadter’s Fluid Concepts and Creative Analogies Ph.D. projects.

Videos

A tutorial on Deep Learning
Basics of Computational Reinforcement Learning
Deep Reinforcement Learning
Intelligent agents and paradigms for AI
The Unreasonable Effectiveness Of Deep Learning – The Director of Facebook’s AI Research, Dr. Yann LeCun gives a talk on deep convolutional neural networks and their applications to machine learning and computer vision
AWS Machine Learning in Motion – This interactive liveVideo course gives you a crash course in using AWS for machine learning, teaching you how to build a fully-working predictive algorithm.
Deep Learning with R in Motion – Deep Learning with R in Motion teaches you to apply deep learning to text and images using the powerful Keras library and its R language interface.
Grokking Deep Learning in Motion – Grokking Deep Learning in Motion will not just teach you how to use a single library or framework, you’ll actually discover how to build these algorithms completely from scratch!
Reinforcement Learning in Motion – This liveVideo breaks down key concepts like how RL systems learn, how to sense and process environmental data, and how to build and train AI agents.

Learning

Deep Learning: Methods And Applications – Free book from Microsoft Research
Neural Networks And Deep Learning – Neural networks and deep learning currently provide the best solutions to many problems in image recognition, speech recognition, and natural language processing. This book will teach you the core concepts behind neural networks and deep learning
Machine Learning: A Probabilistic Perspective – This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach
Deep Learning – Yoshua Bengio, Ian Goodfellow and Aaron Courville put together this currently free (and draft version) book on deep learning. The book is kept up-to-date and covers a wide range of topics in depth (up to and including sequence-to-sequence learning).
Getting Started with Deep Learning and Python
Machine Learning Mastery
Deep Learning.net – Aggregation site for DL resources
Awesome Machine Learning – Like this GitHub list, but ML-focused
FastML
Awesome Deep Learning Resources – Rough list of learning resources for Deep Learning
Professional and In-Depth Machine Learning Video Courses – A collection of free professional and in depth Machine Learning and Data Science video tutorials and courses
Professional and In-Depth Artificial Intelligence Video Courses – A collection of free professional and in depth Artificial Intelligence video tutorials and courses
Professional and In-Depth Deep Learning Video Courses – A collection of free professional and in depth Deep Learning video tutorials and courses
Introduction to Machine Learning – Introductory level machine learning crash course
Awesome Graph Classification – Learning from graph structured data
Awesome Community Detection – Clustering graph structured data
Awesome Decision Tree Papers – Decision tree papers from machine learning conferences
Awesome Gradient Boosting Papers – Gradient boosting papers from machine learning conferences
Awesome Fraud Detection Papers – Fraud detection papers from machine learning conferences
Awesome Neural Art – Creating art and manipulating images using deep neural networks.

Organizations

Journals

Competitions

Newsletters

AI Digest – A weekly newsletter to keep up to date with AI, machine learning, and data science.

Misc

Open Cognition Project – We’re undertaking a serious effort to build a thinking machine
AITopics – Large aggregation of AI resources
AIResources – Directory of open source software and open access data for the AI research community
Artificial Intelligence Subreddit
AI Experiments with Google

5. Artificial Intelligence (AI) – Part II

A curated list of awesome awesomeness about artificial intelligence(AI).

Artificial Intelligence(AI)

Machine Learning(ML)

Deep Learning(DL)

Computer Vision(CV)

CV
CV2
CV-People
DeepFakes
Event-based Vision Resources
Embodied Vision
Research Topics
- Action Recognition
- Colorization
- Image Classification
  - imgclsmob
- Image Registration
- Object Detection
  - amusi/Object Detection
  - hoya012/Object Detection
  - Tiny Object Detection
  - Small Object Detection
  - Video Object Detection
  - Anchor Free Object Detection
- Face
  - Face Detection & Recognition
  - awesome-face
  - Facial Expression Recognition (FER)
  - Face Landmark Detection
  - Landmark Detection
- Gaze Estimation
- HDR Image Synthesis
- Image Segmentation
  - Semantic Segmentation
  - Segmentation.X
  - Panoptic Segmentation
  - Weakly Supervised Semantic Segmentation
  - Referring Image Segmentation
- Object Tracking
  - Visual Tracking1
  - Visual Tracking2
  - Multi-Object Tracking
  - Tracking and Detection
  - daily-paper-visual-tracking
  - Multimodal Tracking
- Pose estimation
  - Object Pose Estimation
  - Human Pose estimation
    - Human Pose estimation 1
    - Human Pose estimation 2
  - Hand Pose estimation
- Human Motion
- Human-Object Interaction(HOI)
- Long-tailed
- Scene Text
  - Scene Text Localization and Recognition
  - Scene Text Localization & Recognition Resources
  - Scene Text Detection and Recognition
  - Text Detection and Recognition
  - Scene Text Recognition Resources
- Super Resolution
  - Super Resolution (ChaofWang)
  - Super Resolution (ptkin)
  - Image Super Resolution
  - Video Super Resolution
- 3D
  - 3D Reconstruction
- OCR
- Re-ID
  - Person Re-ID(1)
  - Person Re-ID(2)
  - Vehicle Re-ID(1)
  - Vehicle Re-ID(2)
- Pedestrian Attribute Recognition
- Person Search
- Image Captioning
- Question Answering
- Crowd Counting
- Lane Detection
- Low Light Enhancement
- Image Retrieval
  - Awesome image retrieval papers (1)
  - Awesome image retrieval papers (2)
- Medical Imaging
  - Medical Data
  - Medical imaging datasets
  - Awesome GAN for Medical Imaging
  - Deep Learning for Medical Applications
  - Medical Image Segmentation
- Image Inpainting
- Image/Video Dehazing
  - awesome-dehazing
  - DehazeZoo
- Image Denoising
  - reproducible-image-denoising-state-of-the-art
  - Image-Denoising-State-of-the-art
  - Image and Video Denoising
  - Awesome-Denoise
- Image Deraining
- Image/Video Deblurring
- Image to Image(img2img)
  - lzhbrian/Image to Image
  - xiaweihao/Image to Image
- Underwater Image Enhancement
- Video Analysis
- Video Object Segmentation(VOS)
- Edge Detection
- Local and Global Descriptor
- Salience
  - Salient Object Detection(SOD)
  - Saliency Detection & Segmentation
  - RGB-D Salient Object Detection
- Fashion + AI
- Event-based Vision Resources
- Video Stabilization
- Visual Transformer
  - Transformer-in-Vision
  - Awesome Transformer for Vision Resources List
  - Awesome-Visual-Transformer
  - Awesome Visual Representation Learning with Transformers

Natural Language Processing(NLP)

Speech Recognition

Programming Languages

Framework

Datasets

Segmentation & Saliency detection

AI Career

Awesome Software Engineering for Machine Learning

6. Artificial Intelligence (AI) – Part III

A curated list of artificial intelligence resources (Courses, Tools, App, Open Source Project)

Courses & Articles

AI & ML Events – Discover the best upcoming hand-picked events in the field of artificial intelligence and machine learning
Machine Learning – Stanford University. This course provides a broad introduction to machine learning, data mining, and statistical pattern recognition. Taught by Andrew Ng.
MIT Artificial Intelligence Videos – MIT. This course includes interactive demonstrations intended to stimulate interest and to help students gain intuition about how artificial intelligence methods work under a variety of circumstances.
Machine Learning – Basic machine learning algorithms for supervised and unsupervised learning.
Deep Learning for Natural Language Processing – University of Oxford. This is an applied course focussing on recent advances in analysing and generating speech and text using recurrent neural networks.
TensorFlow for Deep Learning Research – Stanford University. This course will cover the fundamentals and contemporary usage of the TensorFlow library for deep learning research. We aim to help students understand the graphical computational model of TensorFlow.
Deep Learning for Natural Language Processing – Stanford University. Natural language processing (NLP) is one of the most important technologies of the information age. Understanding complex language utterances is also a crucial part of artificial intelligence. Applications of NLP are everywhere because people communicate most everything in language: web search, advertisement, emails, customer service, language translation, radiology reports, etc.

Machine Learning – Cornell University. This course will introduce you to technologies for building data-centric information systems on the World Wide Web, show the practical applications of such systems, and discuss their design and their social and policy context by examining cross-cutting issues such as citizen science, data journalism and open government. Course work involves lectures and readings as well as weekly homework assignments, and a semester-long project in which the students demonstrate their expertise in building data-centric Web information systems.
Deep Learning Explained – Microsoft. This course provides the level of detail needed to enable engineers, data scientists, and technology managers to develop an intuitive understanding of the key concepts behind this game-changing technology.
Machine Learning: Regression – University of Washington. In our first case study, predicting house prices, you will create models that predict a continuous value (price) from input features (square footage, number of bedrooms and bathrooms, and so on). This is just one of the many places where regression can be applied.
Machine Learning: Clustering & Retrieval – University of Washington. A reader is interested in a specific news article and you want to find similar articles to recommend. What is the right notion of similarity? Moreover, what if there are millions of other documents? Each time you want to retrieve a new document, do you need to search through all other documents? How do you group similar documents together? How do you discover new, emerging topics that the documents cover?
Neural Networks for Machine Learning – University of Toronto, with Geoffrey Hinton. Learn about artificial neural networks and how they’re being used for machine learning, as applied to speech and object recognition, image segmentation, modeling language and human motion, etc. We’ll emphasize both the basic algorithms and the practical tricks needed to get them to work well.
Machine Learning With Big Data – University of California, San Diego. Need to incorporate data-driven decisions into your process? This course provides an overview of machine learning techniques to explore, analyze, and leverage data. You will be introduced to tools and algorithms you can use to create machine learning models that learn from data, and to scale those models up to big data problems.

Artificial Intelligence

Introduction to Artificial Intelligence – UC Berkeley. This course will introduce the basic ideas and techniques underlying the design of intelligent computer systems. A specific emphasis will be on the statistical and decision-theoretic modeling paradigm.
Advanced Artificial Intelligence – Cornell University. The design of systems that are among top 10 performers in the world (human, computer, or hybrid human-computer).
Artificial Intelligence (AI) – Columbia University, with Professor Ansaf Salleb-Aouissi. This course will provide a broad understanding of the basic techniques for building intelligent computer systems and an understanding of how AI is applied to problems.

Generative Adversarial Networks (GANs)

A Beginner’s Guide to Generative Adversarial Networks (GANs) – Generative adversarial networks (GANs) are deep neural net architectures comprised of two nets, pitting one against the other.
GAN — What is Generative Adversary Networks GAN? – GAN is about creating, like drawing a portrait or composing a symphony. This is hard compared to other deep learning fields.

Robotics

Artificial Intelligence for Robotics – Georgia Tech. Taught by Sebastian Thrun.

Advanced Robotics – UC Berkeley. The course introduces the math and algorithms underneath state-of-the-art robotic systems. The majority of these techniques are heavily based on probabilistic reasoning and optimization, two areas with wide applicability in modern artificial intelligence.

Artificial Intelligence Company & Research Institute

Business Intelligence & Analytics

Arago/HIRO — autonomously optimise IT and business operations
Arimo — solution to help predict customer activity and fraud
Ayasdi — a suite of intelligent applications for enterprise
DataRobot — a range of products to improve enterprise products
Dataminr — discovers events and breaking information before the news
Einstein — a smarter Salesforce
Fuzzy AI — adds intelligent decision making to web and mobile apps
Logz.io — helps you index, search, visualise and analyse your data
NXT AI — is a framework for temporal pattern recognition and prediction
Paxata — to transform raw data into useful information automatically
Poweredby.ai — helps you monitor server bugs
Sundown — automates repetitive tasks within your business
UBIX — making complex data science easy for enterprise

Machine Learning

Geometric Intelligence – Geometric Intelligence is a part of Uber AI Labs
kaggle – a platform for predictive modelling and analytics competitions in which companies and researchers post data and statisticians and data miners compete to produce the best models for predicting and describing the data

Robotics

Boston Dynamics – an engineering and robotics design company that is best known for the development of BigDog
iRobot – manufacturer of the famous robotic vacuum cleaner
DJI – industry leader in drones for both commercial and industrial needs.
Fetch Robotics – The future of e-commerce fulfillment and R&D robots.
ABB Robotics – the largest manufacturer of industrial robots
Aldebaran Robotics – creator of the NAO robot
FANUC – industrial robots manufacturer with the biggest install base
Rethink Robotics – creator of the collaborative robot Baxter

Conversational Interfaces & Chatbots

API.ai — advanced tools needed to build conversational user interfaces
Chatfuel — build a Facebook chatbot without coding
Comm.ai — add voice and chat API to websites and apps
Conversica — conversational interfaces to help get more sales
EDDI — create, test and deploy chatbots
FPT AI Platform — automated interaction with end-users
Golem.ai — natural language interpretation tool for developers
Gong — analyses and improves sales conversations and discovery calls
Kasisto — conversational AI platform for the finance industry
KITT.AI — create conversational agents using a visual interface
Maluuba — teaching machines to think, reason and communicate
Massively — build chatbots for business
Meya — build, train and host bots in one platform
MindMeld — improved version of Siri
Motion AI — chat bots made easy
msg.ai — chatbot with management dashboard
Octane AI — marketing automation for messaging
OpenAI Gym — open source interface to reinforcement learning tasks
Orbit — tools to help automate conversational AI
Pool — personal assistant that helps you get more work done
Recast — collaborative platform to build, train, deploy intelligent bots
Reply.ai — platform to build and manage your conversational strategy
Semantic Machines — conversational AI for work, travel, shop and play
Snips — add a voice assistant to your connected product
Servo — full spectrum bot and voice which integrates with existing systems
UNU.ai — using the Swarm Intelligence (group brainpower) for chatbots
Unify — e-commerce chatbot
uTu — multi-channel bot analytics and data management
Wechaty – Wechaty is a Bot Framework for Wechat Personal Account which can help you create a bot
Wit.ai — easily create text or voice based bots for preferred platform
Wysh — enterprise scale chatbot with payment methods
Zero AI — voice interface that understands meaning, intent and context

Data Science

BigML — single platform for all predictive use cases
CrowdFlower — helps with sentiment analysis, search relevance, and more
Dataiku — data science platform to prototype, deploy, and run at scale
DataScience — enterprise data science platform for R&D and production
Domino Data Lab — platform for collaborating, building and deploying
Kaggle — helps you learn, work, and play with machine learning models
RapidMiner — makes data science teams more productive
Seldon — helps DS teams put machine learning models into production
SherlockML — a platform to build, test, and deploy AI algorithms
Spark — research engine, capable of discovering complex patterns in data
Tamr — makes data unification of data silos possible
Trifacta — helps put data into useful structures for analysis
Yhat — allows data scientists to deploy and update predictive models rapidly
Yseop — automate the writing of reports, websites, emails, articles and more

Development

AnOdot — detects business incidents
Bonsai — develop more adaptive, trusted and programmable AI models
Deckard.ai — helps predict project timelines
Fuzzy.ai — adds intelligent decision making to web and mobile apps
IBM Watson — AI platform for business
Gigster — connecting projects with the right team
Kite — augments your coding environment with web available knowledge
Layer 6 AI — deep learning platform for prediction and personalisation
Morph — makes developing chatbots for your business easy
Ozz — make your bot smarter, by helping it self learn
RainforestQA — rapid web and mobile app testing
SignifAI — increase server uptime and predict downtime
Turtle — project management and chat software that’s easy for teams
Neural Network Libraries – by Sony. Sony demonstrates its interest in deep learning by releasing its own open source deep learning framework.
TensorFlow neural network playground – Play with neural networks visually in your browser to get a feel for what they are and what they do.

Vehicle

Vinli — turns any car into a smart car
Apollo – by Baidu. Newly launched open source platform for building autonomous vehicles.

Insurance / Legal

Docubot — can advise you on legal issues
Driveway — tracks and rewards safe drivers

Artificial Intelligence Tools

Personal Tools

Amazon Echo / Alexa — everyday personal assistant for in-home
Apple Siri — everyday personal assistant on iPhone and Mac
Brin — helps you make smarter business decisions
Chatfuel — create a Facebook chatbot in 7 minutes
Findo — smart search assistant across email, files and personal cloud.
Fembot — your AI girlfriend
Fin — a powerful personal assistant
Focus — helps you focus, get tasks done and prioritise your day
Gatebox — a holographic anime assistant in an espresso machine
Google Assistant — everyday personal assistant
Howdy – a friendly, trainable bot that helps Slack teams with work
Hound — everyday personal assistant
Julie Desk — meeting scheduling assistant (aimed at C-Suite)
Kono — meeting scheduling assistant
Lifos — dynamic independent entities that interact with the web and social
Ling — similar to Amazon Echo
Luka — chatbot messenger for people and other chatbots
Lyra — monitor and analyse your carbon emissions
Magic – Magic is a special phone number you text to get anything you want, hassle free
Microsoft Cortana – Cortana is a voice-controlled virtual assistant for Microsoft Windows Phone 8.1. Comparable to Siri, the intelligent assistant enabled on Apple devices, Microsoft’s Cortana will use the Bing search engine and data stored on the user’s smartphone to make personalized recommendations.
MyWave – Melbourne-based which makes a personal call
Meeco – Sydney-based, a robot lawyer
Mimetic — meeting scheduling assistant
My Ally — handles meeting scheduling and manages calendar
Mycroft — is the world’s first open source voice assistant
myWave — chatbot to help you throughout your daily life
Remi — like Siri with an interface
Replika — your AI friend that you raise through text conversations
SkipFlag — automatically discover and organise your work
Spoken — virtual assistant with an interface
Vesper — virtual assistant aimed at C-Suite
Viv — like Siri but 10x better
x.ai — x.ai is a personal assistant who schedules meetings for you
Zoom.ai — personal assistant to help you at work

Education Tools

Thirdleap — helps children to learn maths
Woogie — the conversational AI robot that makes learning and discovery fun for children
XiaoJing Bot – supports management of WeChat groups, including removing group members

Health / Medical Tools

Abi — your virtual health assistant
Ada — can help if you’re feeling unwell
Airi — personal health coach
Alz.ai — helps you care for loved ones with Alzheimer’s
Bitesnap — food recognition from meal photos to help count calories
doc.ai — makes lab results easy to understand
Gyan — helps you go from symptoms to likely conditions
Joy — helps you track and improve your mental health
Kiwi — helps you to reduce and quit smoking
Tess by X2AI — therapist in your pocket
Sleep.ai — diagnose snoring and tooth grinding

Travel AI Tools

Ada — chatbot that helps you navigate and make decisions
Emma — automatically calculates and adds meeting travel time
ETA — helps you manage travel itineraries and meetings
HelloGbye — book complex trips with simple speech
Mezi — helps with booking flights, hotels, restaurant reservations and more
Nexar — dash cam app that helps you drive safer
Ready — traffic forecaster and travel time prediction
Spatial — reveal the social layer of cities

Finance AI Tools

Abe — fast answers about your finances
Andy — a personal Tax Accountant
Ara — helps you budget
Bond — helps you achieve your financial goals
Mylo — rounds up your everyday purchases and invests the spare change
Olivia — helps you manage your finances
Responsive — institutional-grade active portfolio management
Roger — helps you pay bills easily
Xoe.ai — AI lending chatbot

Language / Translation AI Tools

Microsoft Translator — language translator powered by neural networks
Watson.ai — legal, academic and financial translations

IoT / IIoT

Aerial — home activity, movement and identity sensor
Bridge.ai — smart-home platform focused on speech and sound
Cubic — one place to connect your smart home devices
Grojo — grow room controller and monitoring system
Home — autonomous home operations with connected devices
Hello — helps you monitor and improve your sleep
Josh — whole house voice control
Mycroft — is the world’s first open source voice assistant
Nanit — baby monitor that measures sleep and caregiver interactions
Nest — a range of in-home devices such as Thermostat, security and alarms

Research

Apollo — breaks down articles and PDFs into quick, readable dot points
Ferret.ai — helps you research by summarising articles and search ability
Iris — helps you research and visualise concepts in research papers

Tools

CaptionBot — Microsoft describes any photo
Crowdfunding.ai — crowdfunding platform for AI projects
Fieldguide — universal field guide that suggests possible matches

Books

Reinforcement Learning: An Introduction – This introductory textbook on reinforcement learning is targeted toward engineers and scientists in artificial intelligence, operations research, neural networks, and control systems, and we hope it will also be of interest to psychologists and neuroscientists.

Blogs, Papers, and Articles

Deep learning reading list – A thorough list of academic survey papers on the subjects of reinforcement learning, computer vision, NLP & speech, disentangling factors, transfer learning, practical tricks, sparse coding, foundation theory, feedforward networks, large scale deep learning, recurrent networks, hyper parameters, optimization, and unsupervised feature learning.
Deep Learning in a Nutshell (https://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-core-concepts/) – by Tim Dettmers, via NVIDIA (2015). These articles are digestible and do not rely heavily on math. There are three parts: Part 1 (a gentle introduction to deep learning that covers core concepts and vocabulary), Part 2 (history of deep learning and methods of training deep learning architectures quickly and efficiently), and Part 3 (sequence learning with a focus on natural language processing).
TensorFlow – Large-Scale Machine Learning on Heterogeneous Distributed Systems by Google Research (2015). How TensorFlow works.

Development

Caffe – Deep learning framework.

Bot Development

Alexa Skill Kit – Library for effortless Alexa Skill development with AWS Lambda
Facebook Messenger chatbot boilerplate – PHP Facebook Messenger chatbot boilerplate
Facebook Messenger wit.ai node.js boilerplate – Facebook Messenger wit.ai node.js boilerplate
Telegram Bot API PHP SDK – Telegram Bot API PHP SDK. Supports Laravel out of the box
Wechaty – Wechaty is a Bot Framework for Wechat Personal Account which can help you create a bot
Node.js Messenger Bot – A Node client for the Facebook Messenger Platform
BootBot – Facebook Messenger Bot Framework for Node.js
Ruby Telegram bot boilerplate
python-telegram-bot – This library provides a pure Python interface for the Telegram Bot API

Haskell

C++

Java

Julia

Javascript

Twitter-text – Twitter’s text processing library
natural – General natural language facilities for node
Clustering.js – Clustering algorithms implemented for Node.js and the browser
Kmeans.js – Implementation of the k-means algorithm, for node.js and the browser
sylvester – Vector and Matrix math for JavaScript.
DN2A – Digital Neural Networks Architecture
Knwl.js – A Natural Language Processor in JS
NLP Compromise – Natural Language processing in the browser
science.js – Scientific and statistical computing in JavaScript.
Machine Learning – Machine learning library for Node.js
machineJS – Automated machine learning, data formatting, ensembling, and hyperparameter optimization for competitions and exploration.
Node-fann – FANN (Fast Artificial Neural Network Library) bindings for Node.js
brain.js – Neural Networks
Synaptic – Neural Networks
Natural – Natural Language Processing
ConvNetJS – Convolutional Neural Networks
mljs – A set of sub-libraries with a variety of functions
Neataptic – Neural Networks
Webdnn – Deep Learning

Python

Lasagne – Lightweight Python library for deep learning (built on Theano).

PHP

R

TensorFlow

News

AI Weekly — a weekly collection of news and resources on AI and ML
Approximately Correct — AI and Machine Learning blog
Axiomzen — AI newsletter delivered every 2 weeks
Concerning.ai — AI commentators
Fast.ai — dedicated to making the power of deep learning accessible to all
Machinelearning.ai — dedicated news and updates for ML and AI
Machine Learning Weekly — a hand-curated newsletter on ML and DL
Artificial Intelligence News — ScienceDaily. Everything on AI including futuristic robots with artificial intelligence, computer models of human intelligence and more.

Podcast

Podcast with Yoshua Bengio – The Rise of Neural Networks and Deep Learning in Our Everyday Lives. An exciting overview of the power of neural networks as well as their current influence and future potential.

Events and Conferences

The AI Conference — an annual event where leading AI researchers and top industry practitioners meet and collaborate
The AI Forum — Montreal based AI conference
Artificial Intelligence Conference — Bootstrap Labs Venture firm
Events.ai — the one stop shop for AI/ML/DL events and conferences
Nucl.ai — game AI conference and courses
Chatbot Summit – Chatbot Summit Berlin is the second international Chatbot Summit destined to bring together the leading players of the newly formed Chatbot economy
Deep learning Google Group – Where deep learning enthusiasts and researchers hang out and share the latest news.
Deep learning research groups – A list of many of the academic and industry labs focused on deep learning.

Location

Amsterdam — AI community and events
Berlin — AI community and events
Beijing – AI community and events
Brisbane – AI community and events
Hamburg — AI community and events
Hong Kong — AI community and events
London — AI community and events
Madrid — AI community and events
Melbourne – AI community and events
Milan — AI community and events
New York — AI community and events
Oslo — AI community and events
San Francisco AI meetup – A local meetup for AI enthusiasts and researchers that we’re involved in.
Seattle — AI community and events
Shanghai — AI community and events
Shenzhen — AI community and events
Singapore — AI community and events
Stockholm — AI community and events
Sydney – AI community and events
Chatbots NYC – Meetup in New York City

AI, ML & Big Data

1. Machine Learning – Part I

Table of Contents

Frameworks and Libraries

Tools

APL

General-Purpose Machine Learning

C

General-Purpose Machine Learning

Computer Vision

C++

Computer Vision

General-Purpose Machine Learning

Natural Language Processing

Speech Recognition

Sequence Analysis

Gesture Detection

Common Lisp

General-Purpose Machine Learning

Clojure

Natural Language Processing

General-Purpose Machine Learning

Deep Learning

Data Analysis

Data Visualization

Interop

Misc

Extra

Crystal

General-Purpose Machine Learning

Elixir

General-Purpose Machine Learning

Natural Language Processing

Erlang

General-Purpose Machine Learning

Fortran

General-Purpose Machine Learning

Data Analysis / Data Visualization

Go

Natural Language Processing

General-Purpose Machine Learning

Spatial analysis and geometry

Data Analysis / Data Visualization

Computer vision

Reinforcement learning

Haskell

General-Purpose Machine Learning

Java

Natural Language Processing

General-Purpose Machine Learning

Speech Recognition

Data Analysis / Data Visualization

Deep Learning

Javascript

Natural Language Processing

Data Analysis / Data Visualization

General-Purpose Machine Learning

Misc

Demos and Scripts

Julia

General-Purpose Machine Learning

Natural Language Processing

Data Analysis / Data Visualization

Misc Stuff / Presentations

Lua

General-Purpose Machine Learning

Demos and Scripts

Matlab

Computer Vision

Natural Language Processing

General-Purpose Machine Learning

Data Analysis / Data Visualization

.NET

Computer Vision

Natural Language Processing

General-Purpose Machine Learning

Data Analysis / Data Visualization

Objective C

General-Purpose Machine Learning

OCaml