AI, ML & Big Data
1. Machine Learning – Part I
Reproduced from GitHub https://github.com/
A curated list of awesome machine learning frameworks, libraries and software (by language). Inspired by awesome-php.
Further resources:
- For a list of free machine learning books available for download, go here.
- For a list of professional machine learning events, go here.
- For a list of (mostly) free machine learning courses available online, go here.
- For a list of blogs and newsletters on data science and machine learning, go here.
- For a list of free-to-attend meetups and local events, go here.
Table of Contents
Frameworks and Libraries
- Awesome Machine Learning
Tools
APL
General-Purpose Machine Learning
- naive-apl – Naive Bayesian Classifier implementation in APL. [Deprecated]
C
General-Purpose Machine Learning
- Darknet – Darknet is an open source neural network framework written in C and CUDA. It is fast, easy to install, and supports CPU and GPU computation.
- Recommender – A C library for product recommendations/suggestions using collaborative filtering (CF).
- Hybrid Recommender System – A hybrid recommender system based upon scikit-learn algorithms. [Deprecated]
- neonrvm – neonrvm is an open source machine learning library based on the RVM technique. It is written in C and comes with Python bindings.
- cONNXr – An ONNX runtime written in pure C99 with zero dependencies, focused on small embedded devices. Run inference on your machine learning models no matter which framework you trained them with. Easy to install and compiles everywhere, even on very old devices.
- libonnx – A lightweight, portable pure C99 ONNX inference engine for embedded devices with hardware acceleration support.
Computer Vision
- CCV – C-based/Cached/Core Computer Vision Library, A Modern Computer Vision Library.
- VLFeat – VLFeat is an open and portable library of computer vision algorithms, which has a Matlab toolbox.
C++
Computer Vision
- DLib – DLib has C++ and Python interfaces for face detection and training general object detectors.
- EBLearn – Eblearn is an object-oriented C++ library that implements various machine learning models [Deprecated]
- OpenCV – OpenCV has C++, C, Python, Java and MATLAB interfaces and supports Windows, Linux, Android and Mac OS.
- VIGRA – VIGRA is a generic cross-platform C++ computer vision and machine learning library for volumes of arbitrary dimensionality with Python bindings.
- Openpose – A real-time multi-person keypoint detection library for body, face, hands, and foot estimation
General-Purpose Machine Learning
- BanditLib – A simple Multi-armed Bandit library. [Deprecated]
- Caffe – A deep learning framework developed with cleanliness, readability, and speed in mind. [DEEP LEARNING]
- CatBoost – General purpose gradient boosting on decision trees library with categorical features support out of the box. It is easy to install, contains fast inference implementation and supports CPU and GPU (even multi-GPU) computation.
- CNTK – The Computational Network Toolkit (CNTK) by Microsoft Research, is a unified deep-learning toolkit that describes neural networks as a series of computational steps via a directed graph.
- CUDA – This is a fast C++/CUDA implementation of convolutional neural networks. [DEEP LEARNING]
- DeepDetect – A machine learning API and server written in C++11. It makes state of the art machine learning easy to work with and integrate into existing applications.
- Distributed Machine learning Tool Kit (DMTK) – A distributed machine learning (parameter server) framework by Microsoft. Enables training models on large data sets across multiple machines. Current tools bundled with it include: LightLDA and Distributed (Multisense) Word Embedding.
- DLib – A suite of ML tools designed to be easy to embed in other applications.
- DSSTNE – A software library created by Amazon for training and deploying deep neural networks using GPUs which emphasizes speed and scale over experimental flexibility.
- DyNet – A dynamic neural network library working well with networks that have dynamic structures that change for every training instance. Written in C++ with bindings in Python.
- Fido – A highly-modular C++ machine learning library for embedded electronics and robotics.
- igraph – General purpose graph library.
- Intel(R) DAAL – A high performance software library developed by Intel and optimized for Intel’s architectures. The library provides algorithmic building blocks for all stages of data analytics and can process data in batch, online and distributed modes.
- LightGBM – Microsoft’s fast, distributed, high performance gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
- libfm – A generic approach that can mimic most factorization models through feature engineering.
- MLDB – The Machine Learning Database is a database designed for machine learning. Send it commands over a RESTful API to store data, explore it using SQL, then train machine learning models and expose them as APIs.
- mlpack – A scalable C++ machine learning library.
- MXNet – Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Go, Javascript and more.
- ParaMonte – A general-purpose library with C/C++ interface for Bayesian data analysis and visualization via serial/parallel Monte Carlo and MCMC simulations. Documentation can be found here.
- proNet-core – A general-purpose network embedding framework: pair-wise representations optimization Network Edit.
- PyCUDA – Python interface to CUDA
- ROOT – A modular scientific software framework. It provides all the functionalities needed to deal with big data processing, statistical analysis, visualization and storage.
- shark – A fast, modular, feature-rich open-source C++ machine learning library.
- Shogun – The Shogun Machine Learning Toolbox.
- sofia-ml – Suite of fast incremental algorithms.
- Stan – A probabilistic programming language implementing full Bayesian statistical inference with Hamiltonian Monte Carlo sampling.
- Timbl – A software package/C++ library implementing several memory-based learning algorithms, among which IB1-IG, an implementation of k-nearest neighbor classification, and IGTree, a decision-tree approximation of IB1-IG. Commonly used for NLP.
- Vowpal Wabbit (VW) – A fast out-of-core learning system.
- Warp-CTC – A fast parallel implementation of Connectionist Temporal Classification (CTC), on both CPU and GPU.
- XGBoost – A parallelized optimized general purpose gradient boosting library.
- ThunderGBM – A fast library for GBDTs and Random Forests on GPUs.
- ThunderSVM – A fast SVM library on GPUs and CPUs.
- LKYDeepNN – A header-only C++11 neural network library with few dependencies. Documentation is written natively in Traditional Chinese.
- xLearn – A high performance, easy-to-use, and scalable machine learning package, which can be used to solve large-scale machine learning problems. xLearn is especially useful for solving machine learning problems on large-scale sparse data, which is very common in Internet services such as online advertising and recommender systems.
- Featuretools – A library for automated feature engineering. It excels at transforming transactional and relational datasets into feature matrices for machine learning using reusable feature engineering “primitives”.
- skynet – A library for training neural networks, with a C interface and networks defined in JSON. Written in C++ with bindings in Python, C++ and C#.
- Feast – A feature store for the management, discovery, and access of machine learning features. Feast provides a consistent view of feature data for both model training and model serving.
- Hopsworks – A data-intensive platform for AI with the industry’s first open-source feature store. The Hopsworks Feature Store provides both a feature warehouse for training and batch based on Apache Hive and a feature serving database, based on MySQL Cluster, for online applications.
- Polyaxon – A platform for reproducible and scalable machine learning and deep learning.
Natural Language Processing
- BLLIP Parser – BLLIP Natural Language Parser (also known as the Charniak-Johnson parser).
- colibri-core – C++ library, command line tools, and Python binding for extracting and working with basic linguistic constructions such as n-grams and skipgrams in a quick and memory-efficient way.
- CRF++ – Open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data & other Natural Language Processing tasks. [Deprecated]
- CRFsuite – CRFsuite is an implementation of Conditional Random Fields (CRFs) for labeling sequential data. [Deprecated]
- frog – Memory-based NLP suite developed for Dutch: PoS tagger, lemmatiser, dependency parser, NER, shallow parser, morphological analyzer.
- libfolia – C++ library for the FoLiA format
- MeTA – MeTA: ModErn Text Analysis is a C++ data sciences toolkit that facilitates mining big text data.
- MIT Information Extraction Toolkit – C, C++, and Python tools for named entity recognition and relation extraction
- ucto – Unicode-aware regular-expression based tokenizer for various languages. Tool and C++ library. Supports FoLiA format.
Speech Recognition
- Kaldi – Kaldi is a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. Kaldi is intended for use by speech recognition researchers.
Sequence Analysis
- ToPS – This is an object-oriented framework that facilitates the integration of probabilistic models for sequences over a user defined alphabet. [Deprecated]
Gesture Detection
- grt – The Gesture Recognition Toolkit (GRT) is a cross-platform, open-source, C++ machine learning library designed for real-time gesture recognition.
Common Lisp
General-Purpose Machine Learning
- mgl – Neural networks (boltzmann machines, feed-forward and recurrent nets), Gaussian Processes.
- mgl-gpr – Evolutionary algorithms. [Deprecated]
- cl-libsvm – Wrapper for the libsvm support vector machine library. [Deprecated]
- cl-online-learning – Online learning algorithms (Perceptron, AROW, SCW, Logistic Regression).
- cl-random-forest – Implementation of Random Forest in Common Lisp.
Clojure
Natural Language Processing
- Clojure-openNLP – Natural Language Processing in Clojure (opennlp).
- Inflections-clj – Rails-like inflection library for Clojure and ClojureScript.
General-Purpose Machine Learning
- tech.ml – A machine learning platform based on tech.ml.dataset, supporting not just ml algorithms, but also relevant ETL processing; wraps multiple machine learning libraries
- clj-ml – A machine learning library for Clojure built on top of Weka and friends.
- clj-boost – Wrapper for XGBoost
- Touchstone – Clojure A/B testing library.
- Clojush – The Push programming language and the PushGP genetic programming system implemented in Clojure.
- lambda-ml – Simple, concise implementations of machine learning techniques and utilities in Clojure.
- Infer – Inference and machine learning in Clojure. [Deprecated]
- Encog – Clojure wrapper for Encog (v3) (Machine-Learning framework that specializes in neural-nets). [Deprecated]
- Fungp – A genetic programming library for Clojure. [Deprecated]
- Statistiker – Basic Machine Learning algorithms in Clojure. [Deprecated]
- clortex – General Machine Learning library using Numenta’s Cortical Learning Algorithm. [Deprecated]
- comportex – Functionally composable Machine Learning library using Numenta’s Cortical Learning Algorithm. [Deprecated]
Deep Learning
- MXNet – Bindings to Apache MXNet – part of the MXNet project
- Deep Diamond – A fast Clojure Tensor & Deep Learning library
- jutsu.ai – Clojure wrapper for deeplearning4j with some added syntactic sugar.
- cortex – Neural networks, regression and feature learning in Clojure.
- Flare – Dynamic Tensor Graph library in Clojure (think PyTorch, DyNet, etc.)
- dl4clj – Clojure wrapper for Deeplearning4j.
Data Analysis
- tech.ml.dataset – Clojure dataframe library and pipeline for data processing and machine learning
- Tablecloth – A dataframe grammar wrapping tech.ml.dataset, inspired by several R libraries
- Panthera – Clojure API wrapping Python’s Pandas library
- Incanter – Incanter is a Clojure-based, R-like platform for statistical computing and graphics.
- PigPen – Map-Reduce for Clojure.
- Geni – a Clojure dataframe library that runs on Apache Spark
Data Visualization
- Hanami – Clojure(Script) library and framework for creating interactive visualization applications based on Vega-Lite (VGL) and/or Vega (VG) specifications. Automatic framing and layouts along with a powerful templating system for abstracting visualization specs.
- Saite – Clojure(Script) client/server application for dynamic interactive explorations and the creation of live shareable documents capturing them using Vega/Vega-Lite, CodeMirror, markdown, and LaTeX
- Oz – Data visualisation using Vega/Vega-Lite and Hiccup, and a live-reload platform for literate-programming
- Envision – Clojure Data Visualisation library, based on Statistiker and D3.
- Pink Gorilla Notebook – A Clojure/ClojureScript notebook application/library based on Gorilla-REPL
- clojupyter – A Jupyter kernel for Clojure – run Clojure code in Jupyter Lab, Notebook and Console.
- notespace – Notebook experience in your Clojure namespace
- Delight – A listener that streams your Spark event logs to Delight, a free and improved Spark UI
Interop
- Java Interop – Clojure has Native Java Interop from which Java’s ML ecosystem can be accessed
- JavaScript Interop – ClojureScript has Native JavaScript Interop from which JavaScript’s ML ecosystem can be accessed
- Libpython-clj – Interop with Python
- ClojisR – Interop with R and Renjin (R on the JVM)
Misc
- Neanderthal – Fast Clojure Matrix Library (native CPU, GPU, OpenCL, CUDA)
- kixistats – A library of statistical distribution sampling and transducing functions
- fastmath – A collection of functions for mathematical and statistical computing, machine learning, etc., wrapping several JVM libraries
- matlib – A Clojure library of optimisation and control theory tools and convenience functions based on Neanderthal.
Extra
- Scicloj – Curated list of ML related resources for Clojure.
Crystal
General-Purpose Machine Learning
- machine – Simple machine learning algorithm.
- crystal-fann – FANN (Fast Artificial Neural Network) binding.
Elixir
General-Purpose Machine Learning
- Simple Bayes – A Simple Bayes / Naive Bayes implementation in Elixir.
- emel – A simple and functional machine learning library written in Elixir.
- Tensorflex – Tensorflow bindings for the Elixir programming language.
Natural Language Processing
- Stemmer – An English (Porter2) stemming implementation in Elixir.
Erlang
General-Purpose Machine Learning
- Disco – Map Reduce in Erlang. [Deprecated]
Fortran
General-Purpose Machine Learning
- neural-fortran – A parallel neural net microframework. Read the paper here.
Data Analysis / Data Visualization
- ParaMonte – A general-purpose Fortran library for Bayesian data analysis and visualization via serial/parallel Monte Carlo and MCMC simulations. Documentation can be found here.
Go
Natural Language Processing
- snowball – Snowball Stemmer for Go.
- word-embedding – Word Embeddings: the full implementation of word2vec, GloVe in Go.
- sentences – Golang implementation of Punkt sentence tokenizer.
- go-ngram – In-memory n-gram index with compression. [Deprecated]
- paicehusk – Golang implementation of the Paice/Husk Stemming Algorithm. [Deprecated]
- go-porterstemmer – A native Go clean room implementation of the Porter Stemming algorithm. [Deprecated]
General-Purpose Machine Learning
- birdland – A recommendation library in Go.
- eaopt – An evolutionary optimization library.
- leaves – A pure Go implementation of the prediction part of GBRTs, including XGBoost and LightGBM.
- gobrain – Neural Networks written in Go.
- go-featureprocessing – Fast and convenient feature processing for low latency machine learning in Go.
- go-mxnet-predictor – Go binding for MXNet c_predict_api to do inference with a pre-trained model.
- go-ml-benchmarks – Benchmarks of machine learning inference for Go.
- go-ml-transpiler – An open source Go transpiler for machine learning models.
- golearn – Machine learning for Go.
- goml – Machine learning library written in pure Go.
- gorgonia – Deep learning in Go.
- goro – A high-level machine learning library in the vein of Keras.
- gorse – An offline recommender system backend based on collaborative filtering written in Go.
- therfoo – An embedded deep learning library for Go.
- neat – Plug-and-play, parallel Go framework for NeuroEvolution of Augmenting Topologies (NEAT). [Deprecated]
- go-pr – Pattern recognition package in Go lang. [Deprecated]
- go-ml – Linear / Logistic regression, Neural Networks, Collaborative Filtering and Gaussian Multivariate Distribution. [Deprecated]
- GoNN – GoNN is an implementation of Neural Network in Go Language, which includes BPNN, RBF, PCN. [Deprecated]
- bayesian – Naive Bayesian Classification for Golang. [Deprecated]
- go-galib – Genetic Algorithms library written in Go / Golang. [Deprecated]
- Cloudforest – Ensembles of decision trees in Go/Golang. [Deprecated]
- go-dnn – Deep Neural Networks for Golang (powered by MXNet)
Spatial analysis and geometry
Data Analysis / Data Visualization
- dataframe-go – Dataframes for machine-learning and statistics (similar to pandas).
- gota – Dataframes.
- gonum/mat – A linear algebra package for Go.
- gonum/optimize – Implementations of optimization algorithms.
- gonum/plot – A plotting library.
- gonum/stat – A statistics library.
- SVGo – The Go Language library for SVG generation.
- glot – Glot is a plotting library for Golang built on top of gnuplot.
- globe – Globe wireframe visualization.
- gonum/graph – General-purpose graph library.
- go-graph – Graph library for Go/Golang language. [Deprecated]
- RF – Random forests implementation in Go. [Deprecated]
Computer vision
- GoCV – Package for computer vision using OpenCV 4 and beyond.
Reinforcement learning
- gold – A reinforcement learning library.
Haskell
General-Purpose Machine Learning
- haskell-ml – Haskell implementations of various ML algorithms. [Deprecated]
- HLearn – a suite of libraries for interpreting machine learning models according to their algebraic structure. [Deprecated]
- hnn – Haskell Neural Network library.
- hopfield-networks – Hopfield Networks for unsupervised learning in Haskell. [Deprecated]
- DNNGraph – A DSL for deep neural networks. [Deprecated]
- LambdaNet – Configurable Neural Networks in Haskell. [Deprecated]
Java
Natural Language Processing
- Cortical.io – Retina: an API performing complex NLP operations (disambiguation, classification, streaming text filtering, etc…) as quickly and intuitively as the brain.
- IRIS – Cortical.io’s FREE NLP, Retina API Analysis Tool (written in JavaFX!) – See the Tutorial Video.
- CoreNLP – Stanford CoreNLP provides a set of natural language analysis tools which can take raw English language text input and give the base forms of words.
- Stanford Parser – A natural language parser is a program that works out the grammatical structure of sentences.
- Stanford POS Tagger – A Part-Of-Speech Tagger (POS Tagger).
- Stanford Named Entity Recognizer – Stanford NER is a Java implementation of a Named Entity Recognizer.
- Stanford Word Segmenter – Tokenization of raw text is a standard pre-processing step for many NLP tasks.
- Tregex, Tsurgeon and Semgrex – Tregex is a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes (the name is short for “tree regular expressions”).
- Stanford Phrasal: A Phrase-Based Translation System – Stanford Phrasal is a state-of-the-art statistical phrase-based machine translation system, written in Java.
- Stanford English Tokenizer – A tokenizer divides text into a sequence of tokens, which roughly correspond to “words”.
- Stanford Tokens Regex – A framework for defining patterns over sequences of tokens.
- Stanford Temporal Tagger – SUTime is a library for recognizing and normalizing time expressions.
- Stanford SPIED – Learning entities from unlabeled text starting with seed sets using patterns in an iterative fashion.
- Twitter Text Java – A Java implementation of Twitter’s text processing library.
- MALLET – A Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
- OpenNLP – a machine learning based toolkit for the processing of natural language text.
- LingPipe – A tool kit for processing text using computational linguistics.
- ClearTK – ClearTK provides a framework for developing statistical natural language processing (NLP) components in Java and is built on top of Apache UIMA. [Deprecated]
- Apache cTAKES – Apache Clinical Text Analysis and Knowledge Extraction System (cTAKES) is an open-source natural language processing system for information extraction from electronic medical record clinical free-text.
- NLP4J – The NLP4J project provides software and resources for natural language processing. The project started at the Center for Computational Language and EducAtion Research, and is currently developed by the Center for Language and Information Research at Emory University. [Deprecated]
- CogcompNLP – This project collects a number of core libraries for Natural Language Processing (NLP) developed in the University of Illinois’ Cognitive Computation Group, for example illinois-core-utilities which provides a set of NLP-friendly data structures and a number of NLP-related utilities that support writing NLP applications, running experiments, etc, illinois-edison a library for feature extraction from illinois-core-utilities data structures and many other packages.
General-Purpose Machine Learning
- aerosolve – A machine learning library by Airbnb designed from the ground up to be human friendly.
- AMIDST Toolbox – A Java Toolbox for Scalable Probabilistic Machine Learning.
- Datumbox – Machine Learning framework for rapid development of Machine Learning and Statistical applications.
- ELKI – Java toolkit for data mining. (unsupervised: clustering, outlier detection etc.)
- Encog – An advanced neural network and machine learning framework. Encog contains classes to create a wide variety of networks, as well as support classes to normalize and process data for these neural networks. Encog trains using multithreaded resilient propagation. Encog can also make use of a GPU to further speed processing time. A GUI based workbench is also provided to help model and train neural networks.
- FlinkML in Apache Flink – Distributed machine learning library in Flink.
- H2O – ML engine that supports distributed learning on Hadoop, Spark or your laptop via APIs in R, Python, Scala, REST/JSON.
- htm.java – General Machine Learning library using Numenta’s Cortical Learning Algorithm.
- liblinear-java – Java version of liblinear.
- Mahout – Distributed machine learning.
- Meka – An open source implementation of methods for multi-label classification and evaluation (extension to Weka).
- MLlib in Apache Spark – Distributed machine learning library in Spark
- Hydrosphere Mist – A service for deploying Apache Spark MLlib machine learning models as real-time, batch or reactive web services.
- Neuroph – Neuroph is a lightweight Java neural network framework.
- ORYX – Lambda Architecture Framework using Apache Spark and Apache Kafka with a specialization for real-time large-scale machine learning.
- Samoa – SAMOA is a framework that includes distributed machine learning for data streams with an interface to plug in different stream processing platforms.
- RankLib – RankLib is a library of learning to rank algorithms. [Deprecated]
- rapaio – statistics, data mining and machine learning toolbox in Java.
- RapidMiner – RapidMiner integration into Java code.
- Stanford Classifier – A classifier is a machine learning tool that will take data items and place them into one of k classes.
- Smile – Statistical Machine Intelligence & Learning Engine.
- SystemML – flexible, scalable machine learning (ML) language.
- Weka – Weka is a collection of machine learning algorithms for data mining tasks.
- LBJava – Learning Based Java is a modeling language for the rapid development of software systems, offers a convenient, declarative syntax for classifier and constraint definition directly in terms of the objects in the programmer’s application.
Speech Recognition
- CMU Sphinx – Open source toolkit for speech recognition, built on a pure-Java speech recognition library.
Data Analysis / Data Visualization
- Flink – Open source platform for distributed stream and batch data processing.
- Hadoop – Hadoop/HDFS.
- Onyx – Distributed, masterless, high performance, fault tolerant data processing. Written entirely in Clojure.
- Spark – Spark is a fast and general engine for large-scale data processing.
- Storm – Storm is a distributed realtime computation system.
- Impala – Real-time Query for Hadoop.
- DataMelt – Mathematics software for numeric computation, statistics, symbolic calculations, data analysis and data visualization.
- Dr. Michael Thomas Flanagan’s Java Scientific Library [Deprecated]
Deep Learning
- Deeplearning4j – Scalable deep learning for industry with parallel GPUs.
- Keras Beginner Tutorial – Friendly guide on using Keras to implement a simple Neural Network in Python
Javascript
Natural Language Processing
- Twitter-text – A JavaScript implementation of Twitter’s text processing library.
- natural – General natural language facilities for node.
- Knwl.js – A Natural Language Processor in JS.
- Retext – Extensible system for analyzing and manipulating natural language.
- NLP Compromise – Natural Language processing in the browser.
- nlp.js – An NLP library built in Node.js on top of Natural, with entity extraction, sentiment analysis, automatic language identification, and more.
Data Analysis / Data Visualization
- D3.js
- High Charts
- NVD3.js
- dc.js
- chartjs
- dimple
- amCharts
- D3xter – Straight forward plotting built on D3. [Deprecated]
- statkit – Statistics kit for JavaScript. [Deprecated]
- datakit – A lightweight framework for data analysis in JavaScript
- science.js – Scientific and statistical computing in JavaScript. [Deprecated]
- Z3d – Easily make interactive 3d plots built on Three.js [Deprecated]
- Sigma.js – JavaScript library dedicated to graph drawing.
- C3.js – customizable library based on D3.js for easy chart drawing.
- Datamaps – Customizable SVG map/geo visualizations using D3.js. [Deprecated]
- ZingChart – library written on Vanilla JS for big data visualization.
- cheminfo – Platform for data visualization and analysis, using the visualizer project.
- Learn JS Data
- AnyChart
- FusionCharts
- Nivo – built on top of the awesome d3 and Reactjs libraries
General-Purpose Machine Learning
- Auto ML – Automated machine learning, data formatting, ensembling, and hyperparameter optimization for competitions and exploration. Just give it a .csv file!
- Convnet.js – ConvNetJS is a Javascript library for training Deep Learning models. [DEEP LEARNING] [Deprecated]
- Clusterfck – Agglomerative hierarchical clustering implemented in Javascript for Node.js and the browser. [Deprecated]
- Clustering.js – Clustering algorithms implemented in Javascript for Node.js and the browser. [Deprecated]
- Decision Trees – NodeJS Implementation of Decision Tree using ID3 Algorithm. [Deprecated]
- DN2A – Digital Neural Networks Architecture. [Deprecated]
- figue – K-means, fuzzy c-means and agglomerative clustering.
- Gaussian Mixture Model – Unsupervised machine learning with multivariate Gaussian mixture model.
- Node-fann – FANN (Fast Artificial Neural Network Library) bindings for Node.js [Deprecated]
- Keras.js – Run Keras models in the browser, with GPU support provided by WebGL 2.
- Kmeans.js – Simple Javascript implementation of the k-means algorithm, for node.js and the browser. [Deprecated]
- LDA.js – LDA topic modeling for Node.js
- Learning.js – Javascript implementation of logistic regression/c4.5 decision tree [Deprecated]
- machinelearn.js – Machine Learning library for the web, Node.js and developers
- mil-tokyo – List of several machine learning libraries.
- Node-SVM – Support Vector Machine for Node.js
- Brain – Neural networks in JavaScript [Deprecated]
- Brain.js – Neural networks in JavaScript – continued community fork of Brain.
- Bayesian-Bandit – Bayesian bandit implementation for Node and the browser. [Deprecated]
- Synaptic – Architecture-free neural network library for Node.js and the browser.
- kNear – JavaScript implementation of the k nearest neighbors algorithm for supervised learning.
- NeuralN – C++ Neural Network library for Node.js. It has an advantage on large datasets and supports multi-threaded training. [Deprecated]
- kalman – Kalman filter for Javascript. [Deprecated]
- shaman – Node.js library with support for both simple and multiple linear regression. [Deprecated]
- ml.js – Machine learning and numerical analysis tools for Node.js and the Browser!
- ml5 – Friendly machine learning for the web!
- Pavlov.js – Reinforcement learning using Markov Decision Processes.
- MXNet – Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Go, Javascript and more.
- TensorFlow.js – A WebGL accelerated, browser based JavaScript library for training and deploying ML models.
- JSMLT – Machine learning toolkit with classification and clustering for Node.js; supports visualization (see visualml.io).
- xgboost-node – Run XGBoost model and make predictions in Node.js.
- Netron – Visualizer for machine learning models.
- WebDNN – Fast Deep Neural Network Javascript Framework. WebDNN uses next generation JavaScript API, WebGPU for GPU execution, and WebAssembly for CPU execution.
Misc
- stdlib – A standard library for JavaScript and Node.js, with an emphasis on numeric computing. The library provides a collection of robust, high performance libraries for mathematics, statistics, streams, utilities, and more.
- sylvester – Vector and Matrix math for JavaScript. [Deprecated]
- simple-statistics – A JavaScript implementation of descriptive, regression, and inference statistics. Implemented in literate JavaScript with no dependencies, designed to work in all modern browsers (including IE) as well as in Node.js.
- regression-js – A javascript library containing a collection of least squares fitting methods for finding a trend in a set of data.
- Lyric – Linear Regression library. [Deprecated]
- GreatCircle – Library for calculating great circle distance.
- MLPleaseHelp – MLPleaseHelp is a simple ML resource search engine, available at https://jgreenemi.github.io/MLPleaseHelp/ and hosted on GitHub Pages.
- Pipcook – A JavaScript application framework for machine learning and its engineering.
Demos and Scripts
- The Bot – Example of how the neural network learns to predict the angle between two points created with Synaptic.
- Half Beer – Beer glass classifier created with Synaptic.
- NSFWJS – Indecent content checker with TensorFlow.js
- Rock Paper Scissors – Rock Paper Scissors trained in the browser with TensorFlow.js
Julia
General-Purpose Machine Learning
- MachineLearning – Julia Machine Learning library. [Deprecated]
- MLBase – A set of functions to support the development of machine learning algorithms.
- PGM – A Julia framework for probabilistic graphical models.
- DA – Julia package for Regularized Discriminant Analysis.
- Regression – Algorithms for regression analysis (e.g. linear regression and logistic regression). [Deprecated]
- Local Regression – Local regression, so smooooth!
- Naive Bayes – Simple Naive Bayes implementation in Julia. [Deprecated]
- Mixed Models – A Julia package for fitting (statistical) mixed-effects models.
- Simple MCMC – Basic MCMC sampler implemented in Julia. [Deprecated]
- Distances – Julia module for Distance evaluation.
- Decision Tree – Decision Tree Classifier and Regressor.
- Neural – A neural network in Julia.
- MCMC – MCMC tools for Julia. [Deprecated]
- Mamba – Markov chain Monte Carlo (MCMC) for Bayesian analysis in Julia.
- GLM – Generalized linear models in Julia.
- Gaussian Processes – Julia package for Gaussian processes.
- Online Learning [Deprecated]
- GLMNet – Julia wrapper for fitting Lasso/ElasticNet GLM models using glmnet.
- Clustering – Basic functions for clustering data: k-means, dp-means, etc.
- SVM – SVM for Julia. [Deprecated]
- Kernel Density – Kernel density estimators for Julia.
- MultivariateStats – Methods for dimensionality reduction.
- NMF – A Julia package for non-negative matrix factorization.
- ANN – Julia artificial neural networks. [Deprecated]
- Mocha – Deep Learning framework for Julia inspired by Caffe. [Deprecated]
- XGBoost – eXtreme Gradient Boosting Package in Julia.
- ManifoldLearning – A Julia package for manifold learning and nonlinear dimensionality reduction.
- MXNet – Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Go, JavaScript and more.
- Merlin – Flexible Deep Learning Framework in Julia.
- ROCAnalysis – Receiver Operating Characteristics and functions for evaluating probabilistic binary classifiers.
- GaussianMixtures – Large scale Gaussian Mixture Models.
- ScikitLearn – Julia implementation of the scikit-learn API.
- Knet – Koç University Deep Learning Framework.
- Flux – Relax! Flux is the ML library that doesn’t make you tensor
- MLJ – A Julia machine learning framework
Natural Language Processing
- Topic Models – TopicModels for Julia. [Deprecated]
- Text Analysis – Julia package for text analysis.
- Word Tokenizers – Tokenizers for Natural Language Processing in Julia
- Corpus Loaders – A Julia package providing a variety of loaders for various NLP corpora.
- Embeddings – Functions and data dependencies for loading various word embeddings
- Languages – Julia package for working with various human languages
- WordNet – A Julia package for Princeton’s WordNet
Data Analysis / Data Visualization
- Graph Layout – Graph layout algorithms in pure Julia.
- LightGraphs – Graph modeling and analysis.
- Data Frames Meta – Metaprogramming tools for DataFrames.
- Julia Data – Library for working with tabular data in Julia. [Deprecated]
- Data Read – Read files from Stata, SAS, and SPSS.
- Hypothesis Tests – Hypothesis tests for Julia.
- Gadfly – Crafty statistical graphics for Julia.
- Stats – Statistical tests for Julia.
- RDataSets – Julia package for loading many of the data sets available in R.
- DataFrames – Library for working with tabular data in Julia.
- Distributions – A Julia package for probability distributions and associated functions.
- Data Arrays – Data structures that allow missing values. [Deprecated]
- Time Series – Time series toolkit for Julia.
- Sampling – Basic sampling algorithms for Julia.
Misc Stuff / Presentations
- DSP – Digital Signal Processing (filtering, periodograms, spectrograms, window functions).
- JuliaCon Presentations – Presentations for JuliaCon.
- SignalProcessing – Signal Processing tools for Julia.
- Images – An image library for Julia.
- DataDeps – Reproducible data setup for reproducible science.
Lua
General-Purpose Machine Learning
- Torch7
- cephes – Cephes mathematical functions library, wrapped for Torch. Provides and wraps the 180+ special mathematical functions from the Cephes mathematical library, developed by Stephen L. Moshier. It is used, among many other places, at the heart of SciPy. [Deprecated]
- autograd – Autograd automatically differentiates native Torch code. Inspired by the original Python version.
- graph – Graph package for Torch. [Deprecated]
- randomkit – Numpy’s randomkit, wrapped for Torch. [Deprecated]
- signal – A signal processing toolbox for Torch-7. FFT, DCT, Hilbert, cepstrums, stft.
- nn – Neural Network package for Torch.
- torchnet – Framework for Torch that provides a set of abstractions aimed at encouraging code reuse and modular programming.
- nngraph – This package provides graphical computation for the nn library in Torch7.
- nnx – A completely unstable and experimental package that extends Torch’s builtin nn library.
- rnn – A Recurrent Neural Network library that extends Torch’s nn. RNNs, LSTMs, GRUs, BRNNs, BLSTMs, etc.
- dpnn – Many useful features that aren’t part of the main nn package.
- dp – A deep learning library designed for streamlining research and development using the Torch7 distribution. It emphasizes flexibility through the elegant use of object-oriented design patterns. [Deprecated]
- optim – An optimization library for Torch. SGD, Adagrad, Conjugate-Gradient, LBFGS, RProp and more.
- unsup – A package for unsupervised learning in Torch. Provides modules that are compatible with nn (LinearPsd, ConvPsd, AutoEncoder, …), and self-contained algorithms (k-means, PCA). [Deprecated]
- manifold – A package to manipulate manifolds.
- svm – Torch-SVM library. [Deprecated]
- lbfgs – FFI Wrapper for liblbfgs. [Deprecated]
- vowpalwabbit – An old vowpalwabbit interface to Torch. [Deprecated]
- OpenGM – OpenGM is a C++ library for graphical modeling, and inference. The Lua bindings provide a simple way of describing graphs, from Lua, and then optimizing them with OpenGM. [Deprecated]
- spaghetti – Spaghetti (sparse linear) module for Torch7 by @MichaelMathieu. [Deprecated]
- LuaSHKit – A Lua wrapper around the locality-sensitive hashing library SHKit. [Deprecated]
- kernel smoothing – KNN, kernel-weighted average, local linear regression smoothers. [Deprecated]
- cutorch – Torch CUDA Implementation.
- cunn – Torch CUDA Neural Network Implementation.
- imgraph – An image/graph library for Torch. This package provides routines to construct graphs on images, segment them, build trees out of them, and convert them back to images. [Deprecated]
- videograph – A video/graph library for Torch. This package provides routines to construct graphs on videos, segment them, build trees out of them, and convert them back to videos. [Deprecated]
- saliency – Code and tools around integral images. A library for finding interest points based on fast integral histograms. [Deprecated]
- stitch – Uses Hugin to stitch images and applies the same stitching to a video sequence. [Deprecated]
- sfm – A bundle adjustment/structure from motion package. [Deprecated]
- fex – A package for feature extraction in Torch. Provides SIFT and dSIFT modules. [Deprecated]
- OverFeat – A state-of-the-art generic dense feature extractor. [Deprecated]
- wav2letter – A simple and efficient end-to-end Automatic Speech Recognition (ASR) system from Facebook AI Research.
- Numeric Lua
- Lunatic Python
- SciLua
- Lua – Numerical Algorithms [Deprecated]
- Lunum [Deprecated]
Demos and Scripts
- Core torch7 demos repository.
- linear-regression, logistic-regression
- face detector (training and detection as separate demos)
- mst-based-segmenter
- train-a-digit-classifier
- train-autoencoder
- optical flow demo
- train-on-housenumbers
- train-on-cifar
- tracking with deep nets
- kinect demo
- filter-bank visualization
- saliency-networks
- Training a Convnet for the Galaxy-Zoo Kaggle challenge (CUDA demo)
- Music Tagging – Music Tagging scripts for torch7.
- torch-datasets – Scripts to load several popular datasets including:
- BSR 500
- CIFAR-10
- COIL
- Street View House Numbers
- MNIST
- NORB
- Atari2600 – Scripts to generate a dataset with static frames from the Arcade Learning Environment.
Matlab
Computer Vision
- Contourlets – MATLAB source code that implements the contourlet transform and its utility functions.
- Shearlets – MATLAB code for shearlet transform.
- Curvelets – The Curvelet transform is a higher dimensional generalization of the Wavelet transform designed to represent images at different scales and different angles.
- Bandlets – MATLAB code for bandlet transform.
- mexopencv – Collection and a development kit of MATLAB mex functions for OpenCV library.
Natural Language Processing
- NLP – An NLP library for Matlab.
General-Purpose Machine Learning
- Training a deep autoencoder or a classifier on MNIST digits – Training a deep autoencoder or a classifier on MNIST digits. [DEEP LEARNING]
- Convolutional-Recursive Deep Learning for 3D Object Classification – Convolutional-Recursive Deep Learning for 3D Object Classification. [DEEP LEARNING]
- Spider – The spider is intended to be a complete object-oriented environment for machine learning in Matlab.
- LibSVM – A Library for Support Vector Machines.
- ThunderSVM – An Open-Source SVM Library on GPUs and CPUs
- LibLinear – A Library for Large Linear Classification.
- Machine Learning Module – Class on machine learning with PDFs, lectures, and code.
- Caffe – A deep learning framework developed with cleanliness, readability, and speed in mind.
- Pattern Recognition Toolbox – A complete object-oriented environment for machine learning in Matlab.
- Pattern Recognition and Machine Learning – This package contains the matlab implementation of the algorithms described in the book Pattern Recognition and Machine Learning by C. Bishop.
- Optunity – A library dedicated to automated hyperparameter optimization with a simple, lightweight API to facilitate drop-in replacement of grid search. Optunity is written in Python but interfaces seamlessly with MATLAB.
- MXNet – Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Go, JavaScript and more.
- Machine Learning in MatLab/Octave – Examples of popular machine learning algorithms (neural networks, linear/logistic regressions, K-Means, etc.) with code examples and explanations of the mathematics behind them.
Data Analysis / Data Visualization
- ParaMonte – A general-purpose MATLAB library for Bayesian data analysis and visualization via serial/parallel Monte Carlo and MCMC simulations. Documentation can be found here.
- matlab_bgl – MatlabBGL is a Matlab package for working with graphs.
- gaimc – Efficient pure-Matlab implementations of graph algorithms to complement MatlabBGL’s mex functions.
.NET
Computer Vision
- OpenCVDotNet – A wrapper for the OpenCV project to be used with .NET applications.
- Emgu CV – Cross-platform wrapper of OpenCV which can be compiled in Mono to run on Windows, Linux, Mac OS X, iOS, and Android.
- AForge.NET – Open source C# framework for developers and researchers in the fields of Computer Vision and Artificial Intelligence. Development has now shifted to GitHub.
- Accord.NET – Together with AForge.NET, this library can provide image processing and computer vision algorithms to Windows, Windows RT and Windows Phone. Some components are also available for Java and Android.
Natural Language Processing
- Stanford.NLP for .NET – A full port of Stanford NLP packages to .NET and also available precompiled as a NuGet package.
General-Purpose Machine Learning
- Accord-Framework – The Accord.NET Framework is a complete framework for building machine learning, computer vision, computer audition, signal processing and statistical applications.
- Accord.MachineLearning – Support Vector Machines, Decision Trees, Naive Bayesian models, K-means, Gaussian Mixture models and general algorithms such as Ransac, Cross-validation and Grid-Search for machine-learning applications. This package is part of the Accord.NET Framework.
- DiffSharp – An automatic differentiation (AD) library providing exact and efficient derivatives (gradients, Hessians, Jacobians, directional derivatives, and matrix-free Hessian- and Jacobian-vector products) for machine learning and optimization applications. Operations can be nested to any level, meaning that you can compute exact higher-order derivatives and differentiate functions that are internally making use of differentiation, for applications such as hyperparameter optimization.
- Encog – An advanced neural network and machine learning framework. Encog contains classes to create a wide variety of networks, as well as support classes to normalize and process data for these neural networks. Encog trains using multithreaded resilient propagation. Encog can also make use of a GPU to further speed processing time. A GUI based workbench is also provided to help model and train neural networks.
- GeneticSharp – Multi-platform genetic algorithm library for .NET Core and .NET Framework. The library has several implementations of GA operators, like: selection, crossover, mutation, reinsertion and termination.
- Infer.NET – Infer.NET is a framework for running Bayesian inference in graphical models. One can use Infer.NET to solve many different kinds of machine learning problems, from standard problems like classification, recommendation or clustering through to customized solutions to domain-specific problems. Infer.NET has been used in a wide variety of domains including information retrieval, bioinformatics, epidemiology, vision, and many others.
- ML.NET – ML.NET is a cross-platform open-source machine learning framework which makes machine learning accessible to .NET developers. ML.NET was originally developed in Microsoft Research and evolved into a significant framework over the last decade and is used across many product groups in Microsoft like Windows, Bing, PowerPoint, Excel and more.
- Neural Network Designer – DBMS management system and designer for neural networks. The designer application is developed using WPF, and is a user interface which allows you to design your neural network, query the network, create and configure chat bots that are capable of asking questions and learning from your feedback. The chat bots can even scrape the internet for information to return in their output as well as to use for learning.
- Synapses – Neural network library in F#.
- Vulpes – Deep belief and deep learning implementation written in F# and leverages CUDA GPU execution with Alea.cuBase.
- MxNet.Sharp – .NET Standard bindings for Apache MXNet with Imperative, Symbolic and Gluon interfaces for developing, training and deploying machine learning models in C#. https://mxnet.tech-quantum.com/
Data Analysis / Data Visualization
- numl – numl is a machine learning library intended to ease the use of standard modeling techniques for both prediction and clustering.
- Math.NET Numerics – Numerical foundation of the Math.NET project, aiming to provide methods and algorithms for numerical computations in science, engineering and everyday use. Supports .Net 4.0, .Net 3.5 and Mono on Windows, Linux and Mac; Silverlight 5, WindowsPhone/SL 8, WindowsPhone 8.1 and Windows 8 with PCL Portable Profiles 47 and 344; Android/iOS with Xamarin.
- Sho – Sho is an interactive environment for data analysis and scientific computing that lets you seamlessly connect scripts (in IronPython) with compiled code (in .NET) to enable fast and flexible prototyping. The environment includes powerful and efficient libraries for linear algebra as well as data visualization that can be used from any .NET language, as well as a feature-rich interactive shell for rapid development.
Objective C
General-Purpose Machine Learning
- YCML – A Machine Learning framework for Objective-C and Swift (OS X / iOS).
- MLPNeuralNet – Fast multilayer perceptron neural network library for iOS and Mac OS X. MLPNeuralNet predicts new examples by trained neural networks. It is built on top of the Apple’s Accelerate Framework, using vectorized operations and hardware acceleration if available. [Deprecated]
- MAChineLearning – An Objective-C multilayer perceptron library, with full support for training through backpropagation. Implemented using vDSP and vecLib, it’s 20 times faster than its Java equivalent. Includes sample code for use from Swift.
- BPN-NeuralNetwork – Implements a three-layer neural network (input layer, hidden layer and output layer) known as a Back Propagation Neural Network (BPN). It can be used for product recommendation, user behavior analysis, data mining and data analysis. [Deprecated]
- Multi-Perceptron-NeuralNetwork – Implements a multilayer perceptron neural network based on Back Propagation Neural Networks (BPN), designed to support an unlimited number of hidden layers.
- KRHebbian-Algorithm – An unsupervised, self-learning algorithm (it adjusts the weights) for neural networks in machine learning. [Deprecated]
- KRKmeans-Algorithm – Implements the K-Means clustering and classification algorithm. It can be used in data mining and image compression. [Deprecated]
- KRFuzzyCMeans-Algorithm – Implements Fuzzy C-Means (FCM), a fuzzy clustering / classification algorithm in machine learning. It can be used in data mining and image compression. [Deprecated]
OCaml
General-Purpose Machine Learning
- Oml – A general statistics and machine learning library.
- GPR – Efficient Gaussian Process Regression in OCaml.
- Libra-Tk – Algorithms for learning and inference with discrete probabilistic models.
- TensorFlow – OCaml bindings for TensorFlow.
Perl
Data Analysis / Data Visualization
- Perl Data Language, a pluggable architecture for data and image processing, which can be used for machine learning.
General-Purpose Machine Learning
- MXNet for Deep Learning, in Perl, also released on CPAN.
- Perl Data Language, using AWS machine learning platform from Perl.
- Algorithm::SVMLight, implementation of Support Vector Machines with SVMLight under it. [Deprecated]
- Several machine learning and artificial intelligence models are included in the AI namespace. For instance, you can find Naïve Bayes.
Perl 6
Data Analysis / Data Visualization
- Perl Data Language, a pluggable architecture for data and image processing, which can be used for machine learning.
General-Purpose Machine Learning
PHP
Natural Language Processing
- jieba-php – Chinese Words Segmentation Utilities.
General-Purpose Machine Learning
- PHP-ML – Machine Learning library for PHP. Algorithms, Cross Validation, Neural Network, Preprocessing, Feature Extraction and much more in one library.
- PredictionBuilder – A library for machine learning that builds predictions using a linear regression.
- Rubix ML – A high-level machine learning (ML) library that lets you build programs that learn from data using the PHP language.
- 19 Questions – Machine learning / Bayesian inference for assigning attributes to objects.
Python
Computer Vision
- Scikit-Image – A collection of algorithms for image processing in Python.
- Jobtensor – A powerful tool for learning Python
- Scikit-Opt – Swarm Intelligence in Python (Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Algorithm, Immune Algorithm, Artificial Fish Swarm Algorithm).
- SimpleCV – An open source computer vision framework that gives access to several high-powered computer vision libraries, such as OpenCV. Written in Python; runs on Mac, Windows, and Ubuntu Linux.
- Vigranumpy – Python bindings for the VIGRA C++ computer vision library.
- OpenFace – Free and open source face recognition with deep neural networks.
- PCV – Open source Python module for computer vision. [Deprecated]
- face_recognition – Face recognition library that recognizes and manipulates faces from Python or from the command line.
- dockerface – Easy to install and use deep learning Faster R-CNN face detection for images and video in a docker container.
- Detectron – FAIR’s software system that implements state-of-the-art object detection algorithms, including Mask R-CNN. It is written in Python and powered by the Caffe2 deep learning framework. [Deprecated]
- detectron2 – FAIR’s next-generation research platform for object detection and segmentation. It is a ground-up rewrite of the previous version, Detectron, and is powered by the PyTorch deep learning framework.
- albumentations – A fast and framework-agnostic image augmentation library that implements a diverse set of augmentation techniques. Supports classification, segmentation, and detection out of the box. It was used to win a number of deep learning competitions at Kaggle, Topcoder and those that were part of the CVPR workshops.
- pytesseract – Python-tesseract is an optical character recognition (OCR) tool for Python. That is, it will recognize and “read” the text embedded in images. Python-tesseract is a wrapper for Google’s Tesseract-OCR Engine.
- imutils – A library containing convenience functions to make basic image processing operations such as translation, rotation, resizing, skeletonization, and displaying Matplotlib images easier with OpenCV and Python.
- PyTorchCV – A PyTorch-Based Framework for Deep Learning in Computer Vision.
- Self-supervised learning
- neural-style-pt – A PyTorch implementation of Justin Johnson’s neural-style (neural style transfer).
- Detecto – Train and run a computer vision model with 5-10 lines of code.
- neural-dream – A PyTorch implementation of DeepDream.
- Openpose – A real-time multi-person keypoint detection library for body, face, hands, and foot estimation
- Deep High-Resolution-Net – A PyTorch implementation of the CVPR 2019 paper “Deep High-Resolution Representation Learning for Human Pose Estimation”.
- dream-creator – A PyTorch implementation of DeepDream. Allows individuals to quickly and easily train their own custom GoogleNet models with custom datasets for DeepDream.
- Lucent – TensorFlow and OpenAI Clarity’s Lucid adapted for PyTorch.
- lightly – Lightly is a computer vision framework for self-supervised learning.
- Learnergy – Energy-based machine learning models built upon PyTorch.
Natural Language Processing
- pkuseg-python – A better version of Jieba, developed by Peking University.
- NLTK – A leading platform for building Python programs to work with human language data.
- Pattern – A web mining module for the Python programming language. It has tools for natural language processing, machine learning, among others.
- Quepy – A Python framework to transform natural language questions to queries in a database query language.
- TextBlob – Providing a consistent API for diving into common natural language processing (NLP) tasks. Stands on the giant shoulders of NLTK and Pattern, and plays nicely with both.
- YAlign – A sentence aligner, a friendly tool for extracting parallel sentences from comparable corpora. [Deprecated]
- jieba – Chinese Words Segmentation Utilities.
- SnowNLP – A library for processing Chinese text.
- spammy – A library for email spam filtering built on top of NLTK.
- loso – Another Chinese segmentation library. [Deprecated]
- genius – A Chinese segmenter based on Conditional Random Fields.
- KoNLPy – A Python package for Korean natural language processing.
- nut – Natural language Understanding Toolkit. [Deprecated]
- Rosetta – Text processing tools and wrappers (e.g. Vowpal Wabbit)
- BLLIP Parser – Python bindings for the BLLIP Natural Language Parser (also known as the Charniak-Johnson parser). [Deprecated]
- PyNLPl – Python Natural Language Processing Library. General purpose NLP library for Python. Also contains some specific modules for parsing common NLP formats, most notably for FoLiA, but also ARPA language models, Moses phrasetables, GIZA++ alignments.
- PySS3 – Python package that implements a novel white-box machine learning model for text classification, called SS3. Since SS3 has the ability to visually explain its rationale, this package also comes with easy-to-use interactive visualization tools (online demos).
- python-ucto – Python binding to ucto (a unicode-aware rule-based tokenizer for various languages).
- python-frog – Python binding to Frog, an NLP suite for Dutch (POS tagging, lemmatisation, dependency parsing, NER).
- python-zpar – Python bindings for ZPar, a statistical part-of-speech-tagger, constituency parser, and dependency parser for English.
- colibri-core – Python binding to C++ library for extracting and working with basic linguistic constructions such as n-grams and skipgrams in a quick and memory-efficient way.
- spaCy – Industrial strength NLP with Python and Cython.
- PyStanfordDependencies – Python interface for converting Penn Treebank trees to Stanford Dependencies.
- Distance – Levenshtein and Hamming distance computation. [Deprecated]
- Fuzzy Wuzzy – Fuzzy String Matching in Python.
- jellyfish – A Python library for doing approximate and phonetic matching of strings.
- editdistance – Fast implementation of edit distance.
- textacy – Higher-level NLP built on spaCy.
- stanford-corenlp-python – Python wrapper for Stanford CoreNLP [Deprecated]
- CLTK – The Classical Language Toolkit.
- Rasa – A “machine learning framework to automate text- and voice-based conversations.”
- yase – Transcodes a sentence (or other sequence) to a list of word vectors.
- Polyglot – Multilingual text (NLP) processing toolkit.
- DrQA – Reading Wikipedia to answer open-domain questions.
- Dedupe – A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.
- Snips NLU – Natural Language Understanding library for intent classification and entity extraction
- NeuroNER – Named-entity recognition using neural networks providing state-of-the-art results.
- DeepPavlov – Conversational AI library with many pre-trained Russian NLP models.
- BigARTM – Topic modelling platform.
- NALP – A Natural Adversarial Language Processing framework built over TensorFlow.
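Several entries above (Distance, editdistance, jellyfish) deal with string distances. As a rough illustration of what those metrics compute, here is a minimal pure-Python sketch of Hamming and Levenshtein distance. It is not the API of any listed package; the function names `hamming` and `levenshtein` are assumptions for this example, and the listed libraries use optimized implementations.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Minimum number of single-item insertions, deletions, or substitutions
    needed to turn sequence a into sequence b (dynamic programming)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # delete item_a
                            curr[j - 1] + 1,      # insert item_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(hamming("karolin", "kathrin"))
    print(levenshtein("kitten", "sitting"))
```

Keeping only the previous row holds memory at O(len(b)) while the running time stays O(len(a) · len(b)).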
General-Purpose Machine Learning
- Shapley – A data-driven framework to quantify the value of classifiers in a machine learning ensemble.
- igel – A delightful machine learning tool that allows you to train/fit, test and use models without writing code.
- ML Model building – A repository containing classification, clustering, regression, and recommender notebooks, with illustrations of how to build them.
- ML/DL project template
- PyTorch Geometric Temporal – A temporal extension of PyTorch Geometric for dynamic graph representation learning.
- Little Ball of Fur – A graph sampling extension library for NetworkX with a Scikit-Learn like API.
- Karate Club – An unsupervised machine learning extension library for NetworkX with a Scikit-Learn like API.
- Auto_ViML – Automatically build variant interpretable ML models fast! Auto_ViML (pronounced “auto vimal”) is a comprehensive and scalable Python AutoML toolkit with imbalanced handling, ensembling, stacking and built-in feature selection. Featured in a Medium article.
- PyOD – Python Outlier Detection, a comprehensive and scalable Python toolkit for detecting outlying objects in multivariate data. Features advanced models, including neural networks/deep learning and outlier ensembles.
- steppy – Lightweight Python library for fast and reproducible machine learning experimentation. Introduces a very simple interface that enables clean machine learning pipeline design.
- steppy-toolkit – Curated collection of neural networks, transformers and models that make your machine learning work faster and more effective.
- CNTK – Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit. Documentation can be found here.
- Couler – Unified interface for constructing and managing machine learning workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow.
- auto_ml – Automated machine learning for production and analytics. Lets you focus on the fun parts of ML, while outputting production-ready code, and detailed analytics of your dataset and results. Includes support for NLP, XGBoost, CatBoost, LightGBM, and soon, deep learning.
- machine learning – Automated build consisting of a web interface and a set of programmatic-interface APIs for support vector machines. The corresponding datasets are stored in a SQL database; the generated models, used for prediction, are stored in a NoSQL datastore.
- XGBoost – Python bindings for eXtreme Gradient Boosting (Tree) Library.
- Apache SINGA – An Apache Incubating project for developing an open source machine learning library.
- Bayesian Methods for Hackers – Book/iPython notebooks on Probabilistic Programming in Python.
- Featureforge – A set of tools for creating and testing machine learning features, with a scikit-learn compatible API.
- MLlib in Apache Spark – Distributed machine learning library in Spark.
- Hydrosphere Mist – A service for deploying Apache Spark MLlib machine learning models as real-time, batch or reactive web services.
- scikit-learn – A Python module for machine learning built on top of SciPy.
- metric-learn – A Python module for metric learning.
- SimpleAI – Python implementation of many of the artificial intelligence algorithms described in the book “Artificial Intelligence: A Modern Approach”. It focuses on providing an easy-to-use, well-documented and tested library.
- astroML – Machine Learning and Data Mining for Astronomy.
- graphlab-create – A library with various machine learning models (regression, clustering, recommender systems, graph analytics, etc.) implemented on top of a disk-backed DataFrame.
- BigML – A library that contacts external servers.
- pattern – Web mining module for Python.
- NuPIC – Numenta Platform for Intelligent Computing.
- Pylearn2 – A Machine Learning library based on Theano. [Deprecated]
- keras – High-level neural networks frontend for TensorFlow, CNTK and Theano.
- Lasagne – Lightweight library to build and train neural networks in Theano.
- hebel – GPU-Accelerated Deep Learning Library in Python. [Deprecated]
- Chainer – Flexible neural network framework.
- prophet – Fast and automated time series forecasting framework by Facebook.
- gensim – Topic Modelling for Humans.
- topik – Topic modelling toolkit. [Deprecated]
- PyBrain – Another Python Machine Learning Library.
- Brainstorm – Fast, flexible and fun neural networks. This is the successor of PyBrain.
- Surprise – A scikit for building and analyzing recommender systems.
- implicit – Fast Python Collaborative Filtering for Implicit Datasets.
- LightFM – A Python implementation of a number of popular recommendation algorithms for both implicit and explicit feedback.
- Crab – A flexible, fast recommender engine. [Deprecated]
- python-recsys – A Python library for implementing a Recommender System.
- thinking bayes – Book on Bayesian Analysis.
- Image-to-Image Translation with Conditional Adversarial Networks – Implementation of image-to-image (pix2pix) translation from the paper by Isola et al. [DEEP LEARNING]
- Restricted Boltzmann Machines – Restricted Boltzmann Machines in Python. [DEEP LEARNING]
- Bolt – Bolt Online Learning Toolbox. [Deprecated]
- CoverTree – Python implementation of cover trees, near-drop-in replacement for scipy.spatial.kdtree [Deprecated]
- nilearn – Machine learning for NeuroImaging in Python.
- neuropredict – Aimed at novice machine learners and non-expert programmers, this package offers easy (no coding needed) and comprehensive machine learning (evaluation and full report of predictive performance WITHOUT requiring you to code) in Python for NeuroImaging and any other type of features. This is aimed at absorbing much of the ML workflow, unlike other packages like nilearn and pymvpa, which require you to learn their API and code to produce anything useful.
- imbalanced-learn – Python module to perform undersampling and oversampling with various techniques.
- Shogun – The Shogun Machine Learning Toolbox.
- Pyevolve – Genetic algorithm framework. [Deprecated]
- Caffe – A deep learning framework developed with cleanliness, readability, and speed in mind.
- breze – Theano based library for deep and recurrent neural networks.
- Cortex – Open source platform for deploying machine learning models in production.
- pyhsmm – Library for approximate unsupervised inference in Bayesian Hidden Markov Models (HMMs) and explicit-duration Hidden semi-Markov Models (HSMMs), focusing on the Bayesian Nonparametric extensions, the HDP-HMM and HDP-HSMM, mostly with weak-limit approximations.
- SKLL – A wrapper around scikit-learn that makes it simpler to conduct experiments.
- neurolab
- Spearmint – Spearmint is a package to perform Bayesian optimization according to the algorithms outlined in the paper: Practical Bayesian Optimization of Machine Learning Algorithms. Jasper Snoek, Hugo Larochelle and Ryan P. Adams. Advances in Neural Information Processing Systems, 2012. [Deprecated]
- Pebl – Python Environment for Bayesian Learning. [Deprecated]
- Theano – An optimizing, GPU-meta-programming, code-generating, array-oriented math compiler in Python.
- TensorFlow – Open source software library for numerical computation using data flow graphs.
- pomegranate – Hidden Markov Models for Python, implemented in Cython for speed and efficiency.
- python-timbl – A Python extension module wrapping the full TiMBL C++ programming interface. TiMBL is an elaborate k-Nearest Neighbours machine learning toolkit.
- deap – Evolutionary algorithm framework.
- pydeep – Deep Learning In Python. [Deprecated]
- mlxtend – A library consisting of useful tools for data science and machine learning tasks.
- neon – Nervana’s high-performance Python-based Deep Learning framework [DEEP LEARNING]. [Deprecated]
- Optunity – A library dedicated to automated hyperparameter optimization with a simple, lightweight API to facilitate drop-in replacement of grid search.
- Neural Networks and Deep Learning – Code samples for my book “Neural Networks and Deep Learning” [DEEP LEARNING].
- Annoy – Approximate nearest neighbours implementation.
- TPOT – Tool that automatically creates and optimizes machine learning pipelines using genetic programming. Consider it your personal data science assistant, automating a tedious part of machine learning.
- pgmpy – A Python library for working with Probabilistic Graphical Models.
- DIGITS – The Deep Learning GPU Training System (DIGITS) is a web application for training deep learning models.
- Orange – Open source data visualization and data analysis for novices and experts.
- MXNet – Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Go, Javascript and more.
- milk – Machine learning toolkit focused on supervised classification. [Deprecated]
- TFLearn – Deep learning library featuring a higher-level API for TensorFlow.
- REP – An IPython-based environment for conducting data-driven research in a consistent and reproducible way. REP does not try to replace scikit-learn; it extends it and provides a better user experience. [Deprecated]
- rgf_python – Python bindings for Regularized Greedy Forest (Tree) Library.
- skbayes – Python package for Bayesian Machine Learning with scikit-learn API.
- fuku-ml – Simple machine learning library, including Perceptron, Regression, Support Vector Machine, Decision Tree and more. It is easy to use and easy to learn for beginners.
- Xcessiv – A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling.
- PyTorch – Tensors and Dynamic neural networks in Python with strong GPU acceleration
- PyTorch Lightning – The lightweight PyTorch wrapper for high-performance AI research.
- PyTorch Lightning Bolts – Toolbox of models, callbacks, and datasets for AI/ML researchers.
- skorch – A scikit-learn compatible neural network library that wraps PyTorch.
- ML-From-Scratch – Implementations of Machine Learning models from scratch in Python with a focus on transparency. Aims to showcase the nuts and bolts of ML in an accessible way.
- Edward – A library for probabilistic modeling, inference, and criticism. Built on top of TensorFlow.
- xRBM – A library for Restricted Boltzmann Machine (RBM) and its conditional variants in Tensorflow.
- CatBoost – General purpose gradient boosting on decision trees library with categorical features support out of the box. It is easy to install, well documented and supports CPU and GPU (even multi-GPU) computation.
- stacked_generalization – Implementation of machine learning stacking technique as a handy library in Python.
- modAL – A modular active learning framework for Python, built on top of scikit-learn.
- Cogitare – A Modern, Fast, and Modular Deep Learning and Machine Learning framework for Python.
- Parris – Parris, the automated infrastructure setup tool for machine learning algorithms.
- neonrvm – neonrvm is an open source machine learning library based on RVM technique. It’s written in C programming language and comes with Python programming language bindings.
- Turi Create – Machine learning from Apple. Turi Create simplifies the development of custom machine learning models. You don’t have to be a machine learning expert to add recommendations, object detection, image classification, image similarity or activity classification to your app.
- xLearn – A high performance, easy-to-use, and scalable machine learning package, which can be used to solve large-scale machine learning problems. xLearn is especially useful for solving machine learning problems on large-scale sparse data, which is very common in Internet services such as online advertisement and recommender systems.
- mlens – A high performance, memory efficient, maximally parallelized ensemble learning library, integrated with scikit-learn.
- Netron – Visualizer for machine learning models.
- Thampi – Machine Learning Prediction System on AWS Lambda
- MindsDB – Open Source framework to streamline use of neural networks.
- Microsoft Recommenders – Examples and best practices for building recommendation systems, provided as Jupyter notebooks. The repo contains some of the latest state of the art algorithms from Microsoft Research as well as from other companies and institutions.
- StellarGraph – Machine Learning on Graphs, a Python library for machine learning on graph-structured (network-structured) data.
- BentoML – Toolkit for packaging and deploying machine learning models for serving in production.
- MiraiML – An asynchronous engine for continuous & autonomous machine learning, built for real-time usage.
- numpy-ML – Reference implementations of ML models written in NumPy.
- creme – A framework for online machine learning.
- Neuraxle – A framework providing the right abstractions to ease research, development, and deployment of your ML pipelines.
- Cornac – A comparative framework for multimodal recommender systems with a focus on models leveraging auxiliary data.
- JAX – JAX is Autograd and XLA, brought together for high-performance machine learning research.
- Catalyst – High-level utils for PyTorch DL & RL research. It was developed with a focus on reproducibility, fast experimentation and reuse of code and ideas, so that you can research and develop something new rather than write another regular training loop.
- Fastai – High-level wrapper built on top of PyTorch which supports vision, text, tabular data and collaborative filtering.
- scikit-multiflow – A machine learning framework for multi-output/multi-label and stream data.
- Lightwood – A PyTorch-based framework that breaks down machine learning problems into smaller blocks that can be glued together seamlessly, with the objective of building predictive models with one line of code.
- bayeso – A simple, but essential Bayesian optimization package, written in Python.
- mljar-supervised – An Automated Machine Learning (AutoML) Python package for tabular data. It can handle binary classification, multiclass classification and regression. It provides explanations and markdown reports.
- evostra – A fast Evolution Strategy implementation in Python.
- Determined – Scalable deep learning training platform, including integrated support for distributed training, hyperparameter tuning, experiment tracking, and model management.
- PySyft – A Python library for secure and private Deep Learning built on PyTorch and TensorFlow.
- PyGrid – Peer-to-peer network of data owners and data scientists who can collectively train AI models using PySyft
- sktime – A unified framework for machine learning with time series
- OPFython – A Python-inspired implementation of the Optimum-Path Forest classifier.
- Opytimizer – Python-based meta-heuristic optimization techniques.
- Gradio – A Python library for quickly creating and sharing demos of models. Debug models interactively in your browser, get feedback from collaborators, and generate public links without deploying anything.
- Hub – Fastest unstructured dataset management for TensorFlow/PyTorch. Stream & version-control data. Store even petabyte-scale data in a single numpy-like array on the cloud accessible on any machine. Visit activeloop.ai for more info.
- Synthia – Multidimensional synthetic data generation in Python.
- ByteHub – An easy-to-use, Python-based feature store. Optimized for time-series data.
Data Analysis / Data Visualization
- DataVisualization – A GitHub repository where you can learn data visualization from basic to intermediate level.
- Cartopy – Cartopy is a Python package designed for geospatial data processing in order to produce maps and other geospatial data analyses.
- SciPy – A Python-based ecosystem of open-source software for mathematics, science, and engineering.
- NumPy – A fundamental package for scientific computing with Python.
- AutoViz – AutoViz performs automatic visualization of any dataset with a single line of Python code. Give it any input file (CSV, txt or json) of any size and AutoViz will visualize it. See Medium article.
- Numba – Python JIT (just in time) compiler to LLVM aimed at scientific Python by the developers of Cython and NumPy.
- Mars – A tensor-based framework for large-scale data computation which is often regarded as a parallel and distributed version of NumPy.
- NetworkX – A high-productivity software for complex networks.
- igraph – binding to igraph library – General purpose graph library.
- Pandas – A library providing high-performance, easy-to-use data structures and data analysis tools.
- ParaMonte – A general-purpose Python library for Bayesian data analysis and visualization via serial/parallel Monte Carlo and MCMC simulations. Documentation can be found here.
- Open Mining – Business Intelligence (BI) in Python (Pandas web interface) [Deprecated]
- PyMC – Markov Chain Monte Carlo sampling toolkit.
- zipline – A Pythonic algorithmic trading library.
- PyDy – Short for Python Dynamics, used to assist with workflow in the modeling of dynamic motion based around NumPy, SciPy, IPython, and matplotlib.
- SymPy – A Python library for symbolic mathematics.
- statsmodels – Statistical modeling and econometrics in Python.
- astropy – A community Python library for Astronomy.
- matplotlib – A Python 2D plotting library.
- bokeh – Interactive Web Plotting for Python.
- plotly – Collaborative web plotting for Python and matplotlib.
- altair – A Python to Vega translator.
- d3py – A plotting library for Python, based on D3.js.
- PyDexter – Simple plotting for Python. Wrapper for D3xterjs; easily render charts in-browser.
- ggplot – Same API as ggplot2 for R. [Deprecated]
- ggfortify – Unified interface to ggplot2 popular R packages.
- Kartograph.py – Rendering beautiful SVG maps in Python.
- pygal – A Python SVG Charts Creator.
- PyQtGraph – A pure-python graphics and GUI library built on PyQt4 / PySide and NumPy.
- pycascading [Deprecated]
- Petrel – Tools for writing, submitting, debugging, and monitoring Storm topologies in pure Python.
- Blaze – NumPy and Pandas interface to Big Data.
- emcee – The Python ensemble sampling toolkit for affine-invariant MCMC.
- windML – A Python Framework for Wind Energy Analysis and Prediction.
- vispy – GPU-based high-performance interactive OpenGL 2D/3D data visualization library.
- cerebro2 – A web-based visualization and debugging platform for NuPIC. [Deprecated]
- NuPIC Studio – An all-in-one NuPIC Hierarchical Temporal Memory visualization and debugging super-tool! [Deprecated]
- SparklingPandas – Pandas on PySpark (POPS).
- Seaborn – A Python visualization library based on matplotlib.
- bqplot – An API for plotting in Jupyter (IPython).
- pastalog – Simple, realtime visualization of neural network training performance.
- Superset – A data exploration platform designed to be visual, intuitive, and interactive.
- Dora – Tools for exploratory data analysis in Python.
- Ruffus – Computation pipeline library for Python.
- SOMPY – Self Organizing Map written in Python (Uses neural networks for data analysis).
- somoclu – Massively parallel self-organizing maps: accelerates training on multicore CPUs, GPUs, and clusters; has a Python API.
- HDBScan – Implementation of the HDBSCAN algorithm in Python, used for clustering.
- visualize_ML – A Python package for data exploration and data analysis. [Deprecated]
- scikit-plot – A visualization library for quick and easy generation of common plots in data analysis and machine learning.
- Bowtie – A dashboard library for interactive visualizations using Flask-SocketIO and React.
- lime – Lime is about explaining what machine learning classifiers (or models) are doing. It is able to explain any black box classifier, with two or more classes.
- PyCM – PyCM is a multi-class confusion matrix library written in Python that supports both input data vectors and direct matrix input. It is a tool for post-classification model evaluation that supports most class-level and overall statistics parameters.
- Dash – A framework for creating analytical web applications built on top of Plotly.js, React, and Flask
- Lambdo – A workflow engine for solving machine learning problems by combining in one analysis pipeline (i) feature engineering and machine learning (ii) model training and prediction (iii) table population and column evaluation via user-defined (Python) functions.
- TensorWatch – Debugging and visualization tool for machine learning and data science. It extensively leverages Jupyter Notebook to show real-time visualizations of data in running processes such as machine learning training.
- dowel – A little logger for machine learning research. Output any object to the terminal, CSV, TensorBoard, text logs on disk, and more with just one call to logger.log().
Misc Scripts / iPython Notebooks / Codebases
- MiniGrad – A minimal, educational, Pythonic implementation of autograd (~100 loc).
- Map/Reduce implementations of common ML algorithms: Jupyter notebooks that cover how to implement from scratch different ML algorithms (ordinary least squares, gradient descent, k-means, alternating least squares), using Python NumPy, and how to then make these implementations scalable using Map/Reduce and Spark.
- BioPy – Biologically-Inspired and Machine Learning Algorithms in Python. [Deprecated]
- CAEs for Data Assimilation – Convolutional autoencoders for 3D image/field compression applied to reduced order Data Assimilation.
- SVM Explorer – Interactive SVM Explorer, using Dash and scikit-learn
- pattern_classification
- thinking stats 2
- hyperopt
- numpic
- 2012-paper-diginorm
- A gallery of interesting IPython notebooks
- ipython-notebooks
- data-science-ipython-notebooks – Continually updated Data Science Python Notebooks: Spark, Hadoop MapReduce, HDFS, AWS, Kaggle, scikit-learn, matplotlib, pandas, NumPy, SciPy, and various command lines.
- decision-weights
- Sarah Palin LDA – Topic Modeling the Sarah Palin emails.
- Diffusion Segmentation – A collection of image segmentation algorithms based on diffusion methods.
- Scipy Tutorials – SciPy tutorials. This is outdated, check out scipy-lecture-notes.
- Crab – A recommendation engine library for Python.
- BayesPy – Bayesian Inference Tools in Python.
- scikit-learn tutorials – Series of notebooks for learning scikit-learn.
- sentiment-analyzer – Tweets Sentiment Analyzer
- sentiment_classifier – Sentiment classifier using word sense disambiguation.
- group-lasso – Some experiments with the coordinate descent algorithm used in the (Sparse) Group Lasso model.
- jProcessing – Kanji / Hiragana / Katakana to Romaji Converter. Edict Dictionary & parallel sentences Search. Sentence Similarity between two JP Sentences. Sentiment Analysis of Japanese Text. Run Cabocha (ISO-8859-1 configured) in Python.
- mne-python-notebooks – IPython notebooks for EEG/MEG data processing using mne-python.
- Neon Course – IPython notebooks for a complete course around understanding Nervana’s Neon.
- pandas cookbook – Recipes for using Python’s pandas library.
- climin – Optimization library focused on machine learning, pythonic implementations of gradient descent, LBFGS, rmsprop, adadelta and others.
- Allen Downey’s Data Science Course – Code for Data Science at Olin College, Spring 2014.
- Allen Downey’s Think Bayes Code – Code repository for Think Bayes.
- Allen Downey’s Think Complexity Code – Code for Allen Downey’s book Think Complexity.
- Allen Downey’s Think OS Code – Text and supporting code for Think OS: A Brief Introduction to Operating Systems.
- Python Programming for the Humanities – Course for Python programming for the Humanities, assuming no prior knowledge. Heavy focus on text processing / NLP.
- GreatCircle – Library for calculating great circle distance.
- Optunity examples – Examples demonstrating how to use Optunity in synergy with machine learning libraries.
- Dive into Machine Learning with Python Jupyter notebook and scikit-learn – “I learned Python by hacking first, and getting serious later. I wanted to do this with Machine Learning. If this is your style, join me in getting a bit ahead of yourself.”
- TDB – TensorDebugger (TDB) is a visual debugger for deep learning. It features interactive, node-by-node debugging and visualization for TensorFlow.
- Suiron – Machine Learning for RC Cars.
- Introduction to machine learning with scikit-learn – IPython notebooks from Data School’s video tutorials on scikit-learn.
- Practical XGBoost in Python – comprehensive online course about using XGBoost in Python.
- Introduction to Machine Learning with Python – Notebooks and code for the book “Introduction to Machine Learning with Python”
- Pydata book – Materials and IPython notebooks for “Python for Data Analysis” by Wes McKinney, published by O’Reilly Media
- Homemade Machine Learning – Python examples of popular machine learning algorithms with interactive Jupyter demos and explanations of the underlying math.
- Prodmodel – Build tool for data science pipelines.
- the-elements-of-statistical-learning – This repository contains Jupyter notebooks implementing the algorithms found in the book and summary of the textbook.
- Hyperparameter-Optimization-of-Machine-Learning-Algorithms – Code for hyperparameter tuning/optimization of machine learning and deep learning algorithms.
Neural Networks
- nn_builder – nn_builder is a Python package that lets you build neural networks in one line.
- NeuralTalk – NeuralTalk is a Python+numpy project for learning Multimodal Recurrent Neural Networks that describe images with sentences. [Deprecated]
- Neuron – Neuron is a simple class for time series predictions. It utilizes LNU (Linear Neural Unit), QNU (Quadratic Neural Unit), RBF (Radial Basis Function), MLP (Multi Layer Perceptron), and MLP-ELM (Multi Layer Perceptron – Extreme Learning Machine) neural networks learned with gradient descent or the Levenberg–Marquardt algorithm. [Deprecated]
- Data Driven Code – Very simple implementation of neural networks for dummies in Python without using any libraries, with detailed comments.
- Machine Learning, Data Science and Deep Learning with Python – LiveVideo course that covers machine learning, TensorFlow, artificial intelligence, and neural networks.
- TResNet: High Performance GPU-Dedicated Architecture – TResNet models were designed and optimized to give the best speed-accuracy tradeoff out there on GPUs.
- TResNet: Simple and powerful neural network library for Python – Variety of supported types of Artificial Neural Network and learning algorithms.
- Jina AI – An easier way to build neural search in the cloud. Compatible with Jupyter Notebooks.
- sequitur – PyTorch library for creating and training sequence autoencoders in just two lines of code.
Kaggle Competition Source Code
- open-solution-home-credit – Source code and experiment results for Home Credit Default Risk.
- open-solution-googleai-object-detection – Source code and experiment results for Google AI Open Images – Object Detection Track.
- open-solution-salt-identification – Source code and experiment results for TGS Salt Identification Challenge.
- open-solution-ship-detection – Source code and experiment results for Airbus Ship Detection Challenge.
- open-solution-data-science-bowl-2018 – Source code and experiment results for 2018 Data Science Bowl.
- open-solution-value-prediction – Source code and experiment results for Santander Value Prediction Challenge.
- open-solution-toxic-comments – Source code for Toxic Comment Classification Challenge.
- wiki challenge – An implementation of Dell Zhang’s solution to Wikipedia’s Participation Challenge on Kaggle.
- kaggle insults – Kaggle Submission for “Detecting Insults in Social Commentary”.
- kaggle_acquire-valued-shoppers-challenge – Code for the Kaggle acquire valued shoppers challenge.
- kaggle-cifar – Code for the CIFAR-10 competition at Kaggle, uses cuda-convnet.
- kaggle-blackbox – Deep learning made easy.
- kaggle-accelerometer – Code for Accelerometer Biometric Competition at Kaggle.
- kaggle-advertised-salaries – Predicting job salaries from ads – a Kaggle competition.
- kaggle amazon – Amazon access control challenge.
- kaggle-bestbuy_big – Code for the Best Buy competition at Kaggle.
- kaggle-bestbuy_small
- Kaggle Dogs vs. Cats – Code for Kaggle Dogs vs. Cats competition.
- Kaggle Galaxy Challenge – Winning solution for the Galaxy Challenge on Kaggle.
- Kaggle Gender – A Kaggle competition: discriminate gender based on handwriting.
- Kaggle Merck – Merck challenge at Kaggle.
- Kaggle Stackoverflow – Predicting closed questions on Stack Overflow.
- wine-quality – Predicting wine quality.
Reinforcement Learning
- DeepMind Lab – DeepMind Lab is a 3D learning environment based on id Software’s Quake III Arena via ioquake3 and other open source software. Its primary purpose is to act as a testbed for research in artificial intelligence, especially deep reinforcement learning.
- Gym – OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms.
- Serpent.AI – Serpent.AI is a game agent framework that allows you to turn any video game you own into a sandbox to develop AI and machine learning experiments. For both researchers and hobbyists.
- ViZDoom – ViZDoom allows developing AI bots that play Doom using only the visual information (the screen buffer). It is primarily intended for research in machine visual learning, and deep reinforcement learning, in particular.
- Roboschool – Open-source software for robot simulation, integrated with OpenAI Gym.
- Retro – Retro Games in Gym
- SLM Lab – Modular Deep Reinforcement Learning framework in PyTorch.
- Coach – Reinforcement Learning Coach by Intel® AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms
- garage – A toolkit for reproducible reinforcement learning research
- metaworld – An open source robotics benchmark for meta- and multi-task reinforcement learning
- acme – An open source distributed framework for reinforcement learning that makes it easy to build and train your agents.
- Spinning Up – An educational resource designed to let anyone learn to become a skilled practitioner in deep reinforcement learning
Ruby
Natural Language Processing
- Awesome NLP with Ruby – Curated link list for practical natural language processing in Ruby.
- Treat – Text REtrieval and Annotation Toolkit, definitely the most comprehensive toolkit I’ve encountered so far for Ruby.
- Stemmer – Expose libstemmer_c to Ruby. [Deprecated]
- Raspell – raspell is an interface binding for ruby. [Deprecated]
- UEA Stemmer – Ruby port of UEALite Stemmer – a conservative stemmer for search and indexing.
- Twitter-text-rb – A library that does auto linking and extraction of usernames, lists and hashtags in tweets.
General-Purpose Machine Learning
- Awesome Machine Learning with Ruby – Curated list of ML related resources for Ruby.
- Ruby Machine Learning – Some Machine Learning algorithms, implemented in Ruby. [Deprecated]
- Machine Learning Ruby [Deprecated]
- jRuby Mahout – JRuby Mahout is a gem that unleashes the power of Apache Mahout in the world of JRuby. [Deprecated]
- CardMagic-Classifier – A general classifier module to allow Bayesian and other types of classifications.
- rb-libsvm – Ruby language bindings for LIBSVM which is a Library for Support Vector Machines.
- Scoruby – Creates Random Forest classifiers from PMML files.
- rumale – Rumale is a machine learning library in Ruby
Data Analysis / Data Visualization
- rsruby – Ruby – R bridge.
- data-visualization-ruby – Source code and supporting content for my Ruby Manor presentation on Data Visualisation with Ruby. [Deprecated]
- ruby-plot – gnuplot wrapper for Ruby, especially for plotting ROC curves into SVG files. [Deprecated]
- plot-rb – A plotting library in Ruby built on top of Vega and D3. [Deprecated]
- scruffy – A beautiful graphing toolkit for Ruby.
- SciRuby
- Glean – A data management tool for humans. [Deprecated]
- Bioruby
- Arel [Deprecated]
Misc
- Big Data For Chimps
- Listof – Community based data collection, packed in gem. Get list of pretty much anything (stop words, countries, non words) in txt, json or hash. Demo/Search for a list
Rust
General-Purpose Machine Learning
- deeplearn-rs – deeplearn-rs provides simple networks that use matrix multiplication, addition, and ReLU under the MIT license.
- rustlearn – A machine learning framework featuring logistic regression, support vector machines, decision trees and random forests.
- rusty-machine – A pure-Rust machine learning library.
- leaf – open source framework for machine intelligence, sharing concepts from TensorFlow and Caffe. Available under the MIT license. [Deprecated]
- RustNN – RustNN is a feedforward neural network library. [Deprecated]
- RusticSOM – A Rust library for Self Organising Maps (SOM).
R
General-Purpose Machine Learning
- ahaz – ahaz: Regularization for semiparametric additive hazards regression. [Deprecated]
- arules – arules: Mining Association Rules and Frequent Itemsets
- biglasso – biglasso: Extending Lasso Model Fitting to Big Data in R.
- bmrm – bmrm: Bundle Methods for Regularized Risk Minimization Package.
- Boruta – Boruta: A wrapper algorithm for all-relevant feature selection.
- bst – bst: Gradient Boosting.
- C50 – C50: C5.0 Decision Trees and Rule-Based Models.
- caret – Classification and Regression Training: Unified interface to ~150 ML algorithms in R.
- caretEnsemble – caretEnsemble: Framework for fitting multiple caret models as well as creating ensembles of such models. [Deprecated]
- CatBoost – General purpose gradient boosting on decision trees library with categorical features support out of the box for R.
- Clever Algorithms For Machine Learning
- CORElearn – CORElearn: Classification, regression, feature evaluation and ordinal evaluation.
- CoxBoost – CoxBoost: Cox models by likelihood based boosting for a single survival endpoint or competing risks [Deprecated]
- Cubist – Cubist: Rule- and Instance-Based Regression Modeling.
- e1071 – e1071: Misc Functions of the Department of Statistics (e1071), TU Wien
- earth – earth: Multivariate Adaptive Regression Spline Models
- elasticnet – elasticnet: Elastic-Net for Sparse Estimation and Sparse PCA.
- ElemStatLearn – ElemStatLearn: Data sets, functions and examples from the book: “The Elements of Statistical Learning, Data Mining, Inference, and Prediction” by Trevor Hastie, Robert Tibshirani and Jerome Friedman.
- evtree – evtree: Evolutionary Learning of Globally Optimal Trees.
- forecast – forecast: Time series forecasting using ARIMA, ETS, STLM, TBATS, and neural network models.
- forecastHybrid – forecastHybrid: Automatic ensemble and cross validation of ARIMA, ETS, STLM, TBATS, and neural network models from the “forecast” package.
- fpc – fpc: Flexible procedures for clustering.
- frbs – frbs: Fuzzy Rule-based Systems for Classification and Regression Tasks. [Deprecated]
- GAMBoost – GAMBoost: Generalized linear and additive models by likelihood based boosting. [Deprecated]
- gamboostLSS – gamboostLSS: Boosting Methods for GAMLSS.
- gbm – gbm: Generalized Boosted Regression Models.
- glmnet – glmnet: Lasso and elastic-net regularized generalized linear models.
- glmpath – glmpath: L1 Regularization Path for Generalized Linear Models and Cox Proportional Hazards Model.
- GMMBoost – GMMBoost: Likelihood-based Boosting for Generalized mixed models. [Deprecated]
- grplasso – grplasso: Fitting user specified models with Group Lasso penalty.
- grpreg – grpreg: Regularization paths for regression models with grouped covariates.
- h2o – A framework for fast, parallel, and distributed machine learning algorithms at scale: deep learning, random forests, GBM, KMeans, PCA, GLM.
- hda – hda: Heteroscedastic Discriminant Analysis. [Deprecated]
- Introduction to Statistical Learning
- ipred – ipred: Improved Predictors.
- kernlab – kernlab: Kernel-based Machine Learning Lab.
- klaR – klaR: Classification and visualization.
- L0Learn – L0Learn: Fast algorithms for best subset selection.
- lars – lars: Least Angle Regression, Lasso and Forward Stagewise. [Deprecated]
- lasso2 – lasso2: L1 constrained estimation aka ‘lasso’.
- LiblineaR – LiblineaR: Linear Predictive Models Based On The Liblinear C/C++ Library.
- LogicReg – LogicReg: Logic Regression.
- Machine Learning For Hackers
- maptree – maptree: Mapping, pruning, and graphing tree models. [Deprecated]
- mboost – mboost: Model-Based Boosting.
- medley – medley: Blending regression models, using a greedy stepwise approach.
- mlr – mlr: Machine Learning in R.
- ncvreg – ncvreg: Regularization paths for SCAD- and MCP-penalized regression models.
- nnet – nnet: Feed-forward Neural Networks and Multinomial Log-Linear Models. [Deprecated]
- pamr – pamr: Pam: prediction analysis for microarrays. [Deprecated]
- party – party: A Laboratory for Recursive Partitioning
- partykit – partykit: A Toolkit for Recursive Partitioning.
- penalized – penalized: L1 (lasso and fused lasso) and L2 (ridge) penalized estimation in GLMs and in the Cox model.
- penalizedLDA – penalizedLDA: Penalized classification using Fisher’s linear discriminant. [Deprecated]
- penalizedSVM – penalizedSVM: Feature Selection SVM using penalty functions.
- quantregForest – quantregForest: Quantile Regression Forests.
- randomForest – randomForest: Breiman and Cutler’s random forests for classification and regression.
- randomForestSRC – randomForestSRC: Random Forests for Survival, Regression and Classification (RF-SRC).
- rattle – rattle: Graphical user interface for data mining in R.
- rda – rda: Shrunken Centroids Regularized Discriminant Analysis.
- rdetools – rdetools: Relevant Dimension Estimation (RDE) in Feature Spaces. [Deprecated]
- REEMtree – REEMtree: Regression Trees with Random Effects for Longitudinal (Panel) Data. [Deprecated]
- relaxo – relaxo: Relaxed Lasso. [Deprecated]
- rgenoud – rgenoud: R version of GENetic Optimization Using Derivatives
- Rmalschains – Rmalschains: Continuous Optimization using Memetic Algorithms with Local Search Chains (MA-LS-Chains) in R.
- rminer – rminer: Simpler use of data mining methods (e.g. NN and SVM) in classification and regression. [Deprecated]
- ROCR – ROCR: Visualizing the performance of scoring classifiers. [Deprecated]
- RoughSets – RoughSets: Data Analysis Using Rough Set and Fuzzy Rough Set Theories. [Deprecated]
- rpart – rpart: Recursive Partitioning and Regression Trees.
- RPMM – RPMM: Recursively Partitioned Mixture Model.
- RSNNS – RSNNS: Neural Networks in R using the Stuttgart Neural Network Simulator (SNNS).
- RWeka – RWeka: R/Weka interface.
- RXshrink – RXshrink: Maximum Likelihood Shrinkage via Generalized Ridge or Least Angle Regression.
- sda – sda: Shrinkage Discriminant Analysis and CAT Score Variable Selection. [Deprecated]
- spectralGraphTopology – spectralGraphTopology: Learning Graphs from Data via Spectral Constraints.
- SuperLearner – Multi-algorithm ensemble learning packages.
- svmpath – svmpath: the SVM Path algorithm. [Deprecated]
- tgp – tgp: Bayesian treed Gaussian process models. [Deprecated]
- tree – tree: Classification and regression trees.
- varSelRF – varSelRF: Variable selection using random forests.
- XGBoost.R – R binding for eXtreme Gradient Boosting (Tree) Library.
- Optunity – A library dedicated to automated hyperparameter optimization with a simple, lightweight API to facilitate drop-in replacement of grid search. Optunity is written in Python but interfaces seamlessly to R.
- igraph – binding to igraph library – General purpose graph library.
- MXNet – Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Go, Javascript and more.
- TDSP-Utilities – Two data science utilities in R from Microsoft: 1) Interactive Data Exploration, Analysis, and Reporting (IDEAR) ; 2) Automated Modeling and Reporting (AMR).
Data Manipulation | Data Analysis | Data Visualization
- dplyr – A data manipulation package that helps to solve the most common data manipulation problems.
- ggplot2 – A data visualization package based on the grammar of graphics.
- tmap – Visualizes geospatial data with static maps; leaflet covers interactive maps.
- tm and quanteda – The main packages for managing, analyzing, and visualizing textual data.
- shiny – The basis for fully interactive displays and dashboards in R. Some interactivity can also be achieved with htmlwidgets, which bring JavaScript libraries to R; these include plotly, dygraphs, highcharter, and several others.
SAS
General-Purpose Machine Learning
- Visual Data Mining and Machine Learning – Interactive, automated, and programmatic modeling with the latest machine learning algorithms in an end-to-end analytics environment, from data prep to deployment. Free trial available.
- Enterprise Miner – Data mining and machine learning that creates deployable models using a GUI or code.
- Factory Miner – Automatically creates deployable machine learning models across numerous market or customer segments using a GUI.
Data Analysis / Data Visualization
- SAS/STAT – For conducting advanced statistical analysis.
- University Edition – FREE! Includes all SAS packages necessary for data analysis and visualization, and includes online SAS courses.
Natural Language Processing
- Contextual Analysis – Add structure to unstructured text using a GUI.
- Sentiment Analysis – Extract sentiment from text using a GUI.
- Text Miner – Text mining using a GUI or code.
Demos and Scripts
- ML_Tables – Concise cheat sheets containing machine learning best practices.
- enlighten-apply – Example code and materials that illustrate applications of SAS machine learning techniques.
- enlighten-integration – Example code and materials that illustrate techniques for integrating SAS with other analytics technologies in Java, PMML, Python and R.
- enlighten-deep – Example code and materials that illustrate using neural networks with several hidden layers in SAS.
- dm-flow – Library of SAS Enterprise Miner process flow diagrams to help you learn by example about specific data mining topics.
Scala
Natural Language Processing
- ScalaNLP – ScalaNLP is a suite of machine learning and numerical computing libraries.
- Breeze – Breeze is a numerical processing library for Scala.
- Chalk – Chalk is a natural language processing library. [Deprecated]
- FACTORIE – FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference.
- Montague – Montague is a semantic parsing library for Scala with an easy-to-use DSL.
- Spark NLP – Natural language processing library built on top of Apache Spark ML to provide simple, performant, and accurate NLP annotations for machine learning pipelines, that scale easily in a distributed environment.
Data Analysis / Data Visualization
- MLlib in Apache Spark – Distributed machine learning library in Spark
- Hydrosphere Mist – A service for deploying Apache Spark MLlib machine learning models as real-time, batch, or reactive web services.
- Scalding – A Scala API for Cascading.
- Summing Bird – Streaming MapReduce with Scalding and Storm.
- Algebird – Abstract Algebra for Scala.
- xerial – Data management utilities for Scala. [Deprecated]
- PredictionIO – PredictionIO, a machine learning server for software developers and data engineers.
- BIDMat – CPU and GPU-accelerated matrix library intended to support large-scale exploratory data analysis.
- Flink – Open source platform for distributed stream and batch data processing.
- Spark Notebook – Interactive and Reactive Data Science using Scala and Spark.
General-Purpose Machine Learning
- DeepLearning.scala – Creating statically typed dynamic neural networks from object-oriented & functional programming constructs.
- Conjecture – Scalable Machine Learning in Scalding.
- brushfire – Distributed decision tree ensemble learning in Scala.
- ganitha – Scalding powered machine learning. [Deprecated]
- adam – A genomics processing engine and specialized file format built using Apache Avro, Apache Spark and Parquet. Apache 2 licensed.
- bioscala – Bioinformatics for the Scala programming language
- BIDMach – CPU and GPU-accelerated Machine Learning Library.
- Figaro – a Scala library for constructing probabilistic models.
- H2O Sparkling Water – H2O and Spark interoperability.
- FlinkML in Apache Flink – Distributed machine learning library in Flink.
- DynaML – Scala Library/REPL for Machine Learning Research.
- Saul – Flexible Declarative Learning-Based Programming.
- SwiftLearner – Simply written algorithms to help study ML or write your own implementations.
- Smile – Statistical Machine Intelligence and Learning Engine.
- doddle-model – An in-memory machine learning library built on top of Breeze. It provides immutable objects and exposes its functionality through a scikit-learn-like API.
- TensorFlow Scala – Strongly-typed Scala API for TensorFlow.
Scheme
Neural Networks
- layer – Neural network inference from the command line, implemented in CHICKEN Scheme.
Swift
General-Purpose Machine Learning
- Bender – Fast Neural Networks framework built on top of Metal. Supports TensorFlow models.
- Swift AI – Highly optimized artificial intelligence and machine learning library written in Swift.
- Swift for TensorFlow – A next-generation platform for machine learning, incorporating the latest research across machine learning, compilers, differentiable programming, systems design, and beyond.
- BrainCore – The iOS and OS X neural network framework.
- swix – A bare bones library that includes a general matrix language and wraps some OpenCV for iOS development. [Deprecated]
- AIToolbox – A toolbox framework of AI modules written in Swift: Graphs/Trees, Linear Regression, Support Vector Machines, Neural Networks, PCA, KMeans, Genetic Algorithms, MDP, Mixture of Gaussians.
- MLKit – A simple Machine Learning Framework written in Swift. Currently features Simple Linear Regression, Polynomial Regression, and Ridge Regression.
- Swift Brain – The first neural network / machine learning library written in Swift. This is a project for AI algorithms in Swift for iOS and OS X development. This project includes algorithms focused on Bayes theorem, neural networks, SVMs, Matrices, etc…
- Perfect TensorFlow – Swift language bindings for TensorFlow. Supports native TensorFlow models on both macOS and Linux.
- PredictionBuilder – A library for machine learning that builds predictions using a linear regression.
- Awesome CoreML – A curated list of pretrained CoreML models.
- Awesome Core ML Models – A curated list of machine learning models in CoreML format.
TensorFlow
General-Purpose Machine Learning
- Awesome TensorFlow – A list of all things related to TensorFlow.
- Golden TensorFlow – A page of content on TensorFlow, including academic papers and links to related topics.
Tools
Neural Networks
- layer – Neural network inference from the command line
Misc
- Pinecone – Vector database for applications that require real-time, scalable vector embedding and similarity search.
- CatalyzeX – Browser extension (Chrome and Firefox) that automatically finds and shows code implementations for machine learning papers anywhere: Google, Twitter, Arxiv, Scholar, etc.
- ML Workspace – All-in-one web-based IDE for machine learning and data science. The workspace is deployed as a docker container and is preloaded with a variety of popular data science libraries (e.g., Tensorflow, PyTorch) and dev tools (e.g., Jupyter, VS Code).
- Notebooks – A starter kit for Jupyter notebooks and machine learning. Companion docker images consist of all combinations of python versions, machine learning frameworks (Keras, PyTorch and Tensorflow) and CPU/CUDA versions.
- DVC – Data Science Version Control is an open-source version control system for machine learning projects with pipelines support. It makes ML projects reproducible and shareable.
- Kedro – Kedro is a data and development workflow framework that implements best practices for data pipelines with an eye towards productionizing machine learning models.
- guild.ai – Tool to log, analyze, compare, and “optimize” experiments. It is cross-platform and framework independent, and provides integrated visualizers such as TensorBoard.
- Sacred – Python tool to help you configure, organize, log and reproduce experiments. Like a notebook lab in the context of Chemistry/Biology. The community has built multiple add-ons leveraging the proposed standard.
- MLFlow – Platform to manage the ML lifecycle, including experimentation, reproducibility, and deployment. It is framework and language agnostic; take a look at all the built-in integrations.
- Weights & Biases – Machine learning experiment tracking, dataset versioning, hyperparameter search, visualization, and collaboration
- More tools to improve the ML lifecycle: Catalyst, PachydermIO. The following are GitHub-like and target teams: Weights & Biases, Neptune.Ml, Comet.ml, Valohai.ai, DAGsHub.
- MachineLearningWithTensorFlow2ed – A book on general-purpose machine learning techniques (regression, classification, unsupervised clustering, reinforcement learning, autoencoders, convolutional neural networks, RNNs, and LSTMs) using TensorFlow 1.14.1.
- m2cgen – A tool that allows the conversion of ML models into native code (Java, C, Python, Go, JavaScript, Visual Basic, C#, R, PowerShell, PHP, Dart) with zero dependencies.
- CML – A library for doing continuous integration with ML projects. Use GitHub Actions and GitLab CI to train and evaluate models in production-like environments and automatically generate visual reports with metrics and graphs in pull/merge requests. Framework and language agnostic.
- Pythonizr – An online tool to generate boilerplate machine learning code that uses scikit-learn.
Credits
- Some of the python libraries were cut-and-pasted from vinta
- References for Go were mostly cut-and-pasted from gopherdata
2. Machine Learning with Python – Part II
This curated list contains 840 awesome open-source projects with a total of 2.8M stars grouped into 32 categories. All projects are ranked by a project-quality score, which is calculated from various metrics automatically collected from GitHub and different package managers. If you would like to add or update projects, feel free to open an issue, submit a pull request, or directly edit the projects.yaml. Contributions are very welcome!
Contents
- Machine Learning Frameworks 54 projects
- Data Visualization 49 projects
- Text Data & NLP 82 projects
- Image Data 49 projects
- Graph Data 29 projects
- Audio Data 23 projects
- Geospatial Data 22 projects
- Financial Data 23 projects
- Time Series Data 20 projects
- Medical Data 19 projects
- Optical Character Recognition 11 projects
- Data Containers & Structures 28 projects
- Data Loading & Extraction 23 projects
- Web Scraping & Crawling 1 project
- Data Pipelines & Streaming 35 projects
- Distributed Machine Learning 26 projects
- Hyperparameter Optimization & AutoML 45 projects
- Reinforcement Learning 19 projects
- Recommender Systems 14 projects
- Privacy Machine Learning 6 projects
- Workflow & Experiment Tracking 35 projects
- Model Serialization & Conversion 11 projects
- Model Interpretability 46 projects
- Vector Similarity Search (ANN) 12 projects
- Probabilistics & Statistics 21 projects
- Adversarial Robustness 8 projects
- GPU Utilities 18 projects
- Tensorflow Utilities 13 projects
- Sklearn Utilities 17 projects
- Pytorch Utilities 27 projects
- Database Clients 1 project
- Others 52 projects
Explanation
- Combined project-quality score
- Star count from GitHub
- New project (less than 6 months old)
- Inactive project (6 months no activity)
- Dead project (12 months no activity)
- Project is trending up or down
- Project was recently added
- Warning (e.g. missing/risky license)
- Contributors count from GitHub
- Fork count from GitHub
- Issue count from GitHub
- Last update timestamp on package manager
- Download count from package manager
- Number of dependent projects
- Tensorflow related project
- Sklearn related project
- PyTorch related project
- MxNet related project
- Apache Spark related project
- Jupyter related project
- PaddlePaddle related project
- Pandas related project
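Entries in the category lists pair each project name with its combined project-quality score and GitHub star count, as in `Tensorflow (44 · 160K)`. As a minimal sketch, assuming that `Name (score · stars)` prefix and the `K`/`M` shorthand seen in this list, the two numbers could be extracted and used to re-rank entries. The names `Entry`, `parse_entry`, and `rank` are hypothetical helpers for illustration and are not part of any tool behind the list; breaking ties by star count is also an assumption.

```python
import re
from typing import List, NamedTuple


class Entry(NamedTuple):
    name: str
    score: int  # combined project-quality score
    stars: int  # GitHub star count, expanded from the K/M shorthand


# Assumed entry prefix: "Name (score · stars", e.g. "Tensorflow (44 · 160K".
ENTRY_RE = re.compile(
    r"^(?P<name>.+?) \((?P<score>\d+) · (?P<stars>\d+(?:\.\d+)?)(?P<unit>[KM]?)"
)
MULTIPLIER = {"": 1, "K": 1_000, "M": 1_000_000}


def parse_entry(line: str) -> Entry:
    """Parse one list entry; raise ValueError if the prefix does not match."""
    text = line.strip()
    if text.startswith("- "):
        text = text[2:]
    match = ENTRY_RE.match(text)
    if match is None:
        raise ValueError(f"not a scored entry: {line!r}")
    stars = round(float(match["stars"]) * MULTIPLIER[match["unit"]])
    return Entry(match["name"], int(match["score"]), stars)


def rank(lines: List[str]) -> List[Entry]:
    """Sort by score, breaking ties by star count, highest first."""
    return sorted(
        (parse_entry(line) for line in lines),
        key=lambda e: (e.score, e.stars),
        reverse=True,
    )
```

Calling `rank` on a handful of entry lines returns `Entry` tuples ordered the way the list presents them, which makes it easy to filter by a minimum score or star count.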
Machine Learning Frameworks
General-purpose machine learning and deep learning frameworks.
- Tensorflow (44 · 160K) – An Open Source Machine Learning Framework for Everyone. Apache-2
- PyTorch (39 · 47K) – Tensors and Dynamic neural networks in Python with strong GPU.. BSD-3
- PySpark (38 · 29K) – Apache Spark Python API. Apache-2
- scikit-learn (37 · 45K) – scikit-learn: machine learning in Python. BSD-3
- StatsModels (36 · 6.1K) – Statsmodels: statistical modeling and econometrics in Python. BSD-3
- Keras (35 · 51K) – Deep Learning for humans. MIT
- XGBoost (35 · 21K) – Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or.. Apache-2
- LightGBM (35 · 12K) – A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT,.. MIT
- MXNet (34 · 19K) – Lightweight, Portable, Flexible Distributed/Mobile Deep Learning.. Apache-2
- Theano (34 · 9.4K) – Theano is a Python library that allows you to define, optimize, and.. BSD-3
- PyFlink (33 · 16K) – Apache Flink Python API. Apache-2
- pytorch-lightning (33 · 12K) – The lightweight PyTorch wrapper for high-performance.. Apache-2
- Fastai (32 · 21K) – The fastai deep learning library. Apache-2
- jax (32 · 12K) – Composable transformations of Python+NumPy programs: differentiate,.. Apache-2
- Thinc (32 · 2.2K) – A refreshing functional take on deep learning, compatible with your favorite.. MIT
- Catboost (31 · 5.8K) – A fast, scalable, high performance Gradient Boosting on Decision.. Apache-2
- Chainer (31 · 5.5K) – A flexible framework of neural networks for deep learning. MIT
- PaddlePaddle (30 · 15K) – PArallel Distributed Deep LEarning: Machine Learning.. Apache-2
- TFlearn (30 · 9.5K) – Deep learning library featuring a higher-level API for TensorFlow. MIT
- Vowpal Wabbit (30 · 7.5K) – Vowpal Wabbit is a machine learning system which pushes the.. BSD-3
- Turi Create (28 · 10K) – Turi Create simplifies the development of custom machine learning.. BSD-3
- Sonnet (28 · 8.8K) – TensorFlow-based neural network library. Apache-2
- dyNET (28 · 3.2K) – DyNet: The Dynamic Neural Network Toolkit. Apache-2
- tensorpack (27 · 6K) – A Neural Net Training Interface on TensorFlow, with focus.. Apache-2
- Ignite (27 · 3.5K) – High-level library to help with training and evaluating neural.. BSD-3
- Jina (27 · 2.5K) – An easier way to build neural search on the cloud. Apache-2
- Flax (27 · 1.5K) – Flax is a neural network ecosystem for JAX that is designed for.. Apache-2 [jax]
- CNTK (26 · 17K) – Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit. MIT
- skorch (26 · 3.8K) – A scikit-learn compatible neural network library that wraps.. BSD-3
- mlpack (26 · 3.6K) – mlpack: a scalable C++ machine learning library. BSD-3
- Ludwig (25 · 7.6K) – Ludwig is a toolbox that allows to train and evaluate deep.. Apache-2
- xLearn (25 · 2.9K) – High performance, easy-to-use, and scalable machine learning (ML).. Apache-2
- Neural Network Libraries (24 · 2.4K) – Neural Network Libraries. Apache-2
- ktrain (24 · 760) – ktrain is a Python library that makes deep learning and AI more.. Apache-2
- tensorflow-upstream (24 · 550) – TensorFlow ROCm port. Apache-2
- SHOGUN (23 · 2.8K) – Unified and efficient Machine Learning. BSD-3
- einops (23 · 2.6K) – Deep learning operations reinvented (for pytorch, tensorflow, jax and.. MIT
- fklearn (23 · 1.3K) – fklearn: Functional Machine Learning. Apache-2
- mace (21 · 4.3K) – MACE is a deep learning inference framework optimized for mobile.. Apache-2
- Neural Tangents (21 · 1.3K) – Fast and Easy Infinite Neural Networks in Python. Apache-2
- ThunderSVM (20 · 1.3K) – ThunderSVM: A Fast SVM Library on GPUs and CPUs. Apache-2
- Haiku (20 · 1K) – JAX-based neural network library. Apache-2
- Torchbearer (20 · 590) – torchbearer: A model fitting library for PyTorch. MIT
- Objax (19 · 580) – Objax is a machine learning framework that provides an Object.. Apache-2 [jax]
- elegy (17 · 180) – Elegy is a framework-agnostic Trainer interface for the Jax.. Apache-2 [jax]
- ThunderGBM (16 · 580) – ThunderGBM: Fast GBDTs and Random Forests on GPUs. Apache-2
- NeoML (13 · 570) – Machine learning framework for both deep learning and traditional.. Apache-2
Data Visualization
General-purpose and task-specific data visualization libraries.
- Matplotlib (41 · 13K) – matplotlib: plotting with Python. Python-2.0
- Seaborn (37 · 8.2K) – Statistical data visualization using matplotlib. BSD-3
- Plotly (35 · 9.1K) – The interactive graphing library for Python (includes Plotly Express). MIT
- dash (34 · 14K) – Analytical Web Apps for Python, R, Julia, and Jupyter. No JavaScript Required. MIT
- Bokeh (33 · 15K) – Interactive Data Visualization in the browser, from Python. BSD-3
- pyecharts (31 · 11K) – Python Echarts Plotting Library. MIT
- wordcloud (31 · 7.9K) – A little word cloud generator in Python. MIT
- Altair (31 · 6.5K) – Declarative statistical visualization library for Python. BSD-3
- UMAP (30 · 4.6K) – Uniform Manifold Approximation and Projection. BSD-3
- bqplot (30 · 3K) – Plotting library for IPython/Jupyter notebooks. Apache-2
- PyQtGraph (30 · 2.3K) – Fast data visualization and GUI tools for scientific / engineering.. MIT
- pandas-profiling (29 · 6.9K) – Create HTML profiling reports from pandas DataFrame.. MIT
- VisPy (29 · 2.6K) – High-performance interactive 2D/3D data visualization library. BSD-3
- Graphviz (29 · 940) – Simple Python interface for Graphviz. MIT
- datashader (28 · 2.4K) – Quickly and accurately render even the largest data. BSD-3
- HoloViews (28 · 1.8K) – With Holoviews, your data visualizes itself. BSD-3
- Cufflinks (27 · 2.1K) – Productivity Tools for Plotly + Pandas. MIT
- PyVista (27 · 720) – 3D plotting and mesh analysis through a streamlined interface for the.. MIT
- data-validation (27 · 530) – Library for exploring and validating machine learning.. Apache-2
- Perspective (26 · 3.3K) – Streaming pivot visualization via WebAssembly. Apache-2
- missingno (26 · 2.7K) – Missing data visualization module for Python. MIT
- pythreejs (26 · 710) – A Jupyter – Three.js bridge. BSD-3
- Facets Overview (25 · 6.5K) – Visualizations for machine learning datasets. Apache-2
- Chartify (25 · 2.8K) – Python library that makes it easy for data scientists to create.. Apache-2
- HyperTools (25 · 1.6K) – A Python toolbox for gaining geometric insights into high-dimensional.. MIT
- hvPlot (25 · 360) – A high-level plotting API for pandas, dask, xarray, and networkx built on.. BSD-3
- openTSNE (24 · 760) – Extensible, parallel implementations of t-SNE. BSD-3
- PandasGUI (23 · 2.1K) – A GUI for Pandas DataFrames. MIT
- python-ternary (23 · 400) – Ternary plotting library for python with matplotlib. MIT
- D-Tale (22 · 2.1K) – Visualizer for pandas data structures. ❗️LGPL-2.1
- Multicore-TSNE (22 · 1.5K) – Parallel t-SNE implementation with Python and Torch.. BSD-3
- Pandas-Bokeh (22 · 630) – Bokeh Plotting Backend for Pandas and GeoPandas. MIT
- vega (22 · 300) – IPython/Jupyter notebook module for Vega and Vega-Lite. BSD-3
- Sweetviz (20 · 1.4K) – Visualize and compare datasets, target values and associations, with one.. MIT
- lets-plot (20 · 520) – An open-source plotting library for statistical data. MIT
- joypy (20 · 320) – Joyplots in Python with matplotlib & pandas. MIT
- HiPlot (19 · 2K) – HiPlot makes understanding high dimensional data easy. MIT
- animatplot (19 · 360) – A python package for animating plots build on matplotlib. MIT
- PyWaffle (18 · 400) – Make Waffle Charts in Python. MIT
- AutoViz (18 · 310) – Automatically Visualize any dataset, any size with a single line of.. Apache-2
- FiftyOne (18 · 220) – Visualize, create, and debug image and video datasets.. Apache-2
- data-describe (14 · 270) – datadescribe: Pythonic EDA Accelerator for Data Science. Apache-2
- nx-altair (14 · 160) – Draw interactive NetworkX graphs with Altair. MIT
Text Data & NLP
Libraries for processing, cleaning, manipulating, and analyzing text data as well as libraries for NLP tasks such as language detection, fuzzy matching, classification, seq2seq learning, conversational AI, keyword extraction, and translation.
- spaCy (37 · 20K) – Industrial-strength Natural Language Processing (NLP) in Python. MIT
- transformers (36 · 42K) – Transformers: State-of-the-art Natural Language.. Apache-2
- gensim (35 · 12K) – Topic Modelling for Humans. ❗️LGPL-2.1
- nltk (34 · 9.7K) – Suite of libraries and programs for symbolic and statistical natural.. Apache-2
- AllenNLP (32 · 9.8K) – An open-source NLP research library, built on PyTorch. Apache-2
- fairseq (31 · 11K) – Facebook AI Research Sequence-to-Sequence Toolkit written in Python. MIT
- ChatterBot (31 · 11K) – ChatterBot is a machine learning, conversational dialog engine.. BSD-3
- sentencepiece (31 · 4.9K) – Unsupervised text tokenizer for Neural Network-based text.. Apache-2
- fastText (30 · 22K) – Library for fast text representation and classification. MIT
- flair (30 · 10K) – A very simple framework for state-of-the-art Natural Language Processing.. MIT
- snowballstemmer (30 · 480) – Snowball compiler and stemming algorithms. BSD-3
- TextBlob (29 · 7.6K) – Simple, Pythonic, text processing–Sentiment analysis, part-of-speech.. MIT
- torchtext (29 · 2.7K) – Data loaders and abstractions for text and NLP. BSD-3
- Rasa (28 · 11K) – Open source machine learning framework to automate text- and voice-.. Apache-2
- OpenNMT (28 · 4.9K) – Open Source Neural Machine Translation in PyTorch. MIT
- sentence-transformers (28 · 4.4K) – Sentence Embeddings with BERT & XLNet. Apache-2
- Tokenizers (28 · 4.3K) – Fast State-of-the-Art Tokenizers optimized for Research and.. Apache-2
- Dedupe (28 · 2.9K) – A python library for accurate and scalable fuzzy matching, record.. MIT
- phonenumbers (28 · 2.6K) – Python port of Google’s libphonenumber. Apache-2
- DeepPavlov (26 · 5.1K) – An open source library for deep learning end-to-end dialog.. Apache-2
- ftfy (26 · 2.9K) – Fixes mojibake and other glitches in Unicode text, after the fact. MIT
- GluonNLP (26 · 2.2K) – Toolkit that enables easy text preprocessing, datasets loading.. Apache-2
- TextDistance (26 · 1.9K) – Compute distance between sequences. 30+ algorithms, pure python.. MIT
- textacy (26 · 1.6K) – NLP, before and after spaCy. Apache-2
- jellyfish (26 · 1.4K) – a python library for doing approximate and phonetic matching of.. BSD-2
- TensorFlow Text (26 · 700) – Making text a first-class citizen in TensorFlow. Apache-2
- CLTK (26 · 650) – The Classical Language Toolkit. MIT
- inflect (26 · 490) – Correctly generate plurals, ordinals, indefinite articles; convert numbers.. MIT
- ParlAI (25 · 7K) – A framework for training and evaluating AI models on a variety of.. MIT
- PyText (25 · 6.1K) – A natural language modeling framework based on PyTorch. BSD-3
- stanza (25 · 5.3K) – Official Stanford NLP Python Library for Many Human Languages. Apache-2
- vaderSentiment (25 · 2.9K) – VADER Sentiment Analysis. VADER (Valence Aware Dictionary.. MIT
- spark-nlp (25 · 2K) – State of the Art Natural Language Processing. Apache-2
- haystack (25 · 1.5K) – End-to-end Python framework for building natural language search.. Apache-2
- pyahocorasick (25 · 590) – Python module (C extension and plain python) implementing Aho-.. BSD-3
- T5 (24 · 3.2K) – Code for the paper Exploring the Limits of Transfer Learning with a.. Apache-2
- Sumy (24 · 2.5K) – Module for automatic summarization of text documents and HTML pages. Apache-2
- fastNLP (24 · 2K) – fastNLP: A Modularized and Extensible NLP Framework. Currently still.. Apache-2
- pytorch-nlp (24 · 1.9K) – Basic Utilities for PyTorch Natural Language Processing (NLP). BSD-3
- scattertext (24 · 1.5K) – Beautiful visualizations of how language differs among.. Apache-2
- sense2vec (24 · 1.2K) – Contextually-keyed word vectors. MIT
- spacy-transformers (24 · 920) – Use pretrained transformers like BERT, XLNet and GPT-2.. MIT [spacy]
- SciSpacy (24 · 850) – A full spaCy pipeline and models for scientific/biomedical documents. Apache-2
- Ciphey (23 · 6.5K) – Automatically decrypt encryptions without knowing the key or cipher,.. MIT
- flashtext (23 · 4.7K) – Extract Keywords from sentence or Replace keywords in sentences. MIT
- neuralcoref (23 · 2.2K) – Fast Coreference Resolution in spaCy with Neural Networks. MIT
- pySBD (23 · 290) – pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence.. MIT
- textgenrnn (22 · 4.3K) – Easily train your own text-generating neural network of any.. MIT
- fast-bert (22 · 1.5K) – Super easy library for BERT based NLP models. Apache-2
- PyTextRank (22 · 1.5K) – Python implementation of TextRank for phrase extraction and.. MIT
- FARM (22 · 1.1K) – Fast & easy transfer learning for NLP. Harvesting language models.. Apache-2
- DeepMatcher (21 · 3.5K) – Python package for performing Entity and Text Matching using.. BSD-3
- gpt-2-simple (21 · 2.5K) – Python package to easily retrain OpenAI’s GPT-2 text-.. MIT
- Texar (21 · 2.1K) – Toolkit for Machine Learning, Natural Language Processing, and.. Apache-2
- NLP Architect (20 · 2.6K) – A model library for exploring state-of-the-art deep learning.. Apache-2
- NeMo (20 · 2.5K) – NeMo: a toolkit for conversational AI. Apache-2
- DELTA (20 · 1.4K) – DELTA is a deep learning based natural language and speech.. Apache-2
- Sockeye (20 · 990) – Sequence-to-sequence framework with a focus on Neural Machine.. Apache-2
- YouTokenToMe (20 · 720) – Unsupervised text tokenizer focused on computational efficiency. MIT
- finetune (20 · 630) – Scikit-learn style model finetuning for NLP. MPL-2.0
- Texthero (19 · 2.1K) – Text preprocessing, representation and visualization from zero to hero. MIT
- textpipe (19 · 280) – Textpipe: clean and extract metadata from text. MIT
- Kashgari (18 · 2K) – Kashgari is a production-level NLP Transfer learning framework.. Apache-2
- Camphr (18 · 330) – spaCy plugin for Transformers, Udify, ELmo, etc. Apache-2 [spacy]
- skift (18 · 210) – scikit-learn wrappers for Python fastText. MIT
- Translate (15 · 680) – Translate – a PyTorch Language Library. BSD-3
- VizSeq (15 · 310) – An Analysis Toolkit for Natural Language Generation (Translation,.. MIT
- OpenNRE (14 · 3K) – An Open-Source Package for Neural Relation Extraction (NRE). MIT
- TransferNLP (14 · 290) – NLP library designed for reproducible experimentation.. MIT
- NeuralQA (14 · 180) – NeuralQA: A Usable Library for Question Answering on Large Datasets with.. MIT
- textvec (13 · 170) – Text vectorization tool to outperform TFIDF for classification tasks. MIT
Image Data
Libraries for image & video processing, manipulation, and augmentation as well as libraries for computer vision tasks such as facial recognition, object detection, and classification.
- Pillow (39 · 8.3K) – The friendly PIL fork (Python Imaging Library). ❗️PIL
- torchvision (36 · 8.6K) – Datasets, Transforms and Models specific to Computer Vision. BSD-3
- scikit-image (33 · 4.2K) – Image processing in Python. BSD-2
- imgaug (31 · 11K) – Image augmentation for machine learning experiments. MIT
- imageio (31 · 840) – Python library for reading and writing image data. BSD-2
- opencv-python (30 · 1.8K) – Automated CI toolchain to produce precompiled opencv-python,.. MIT
- Wand (30 · 1.1K) – The ctypes-based simple ImageMagick binding for Python. MIT
- Face Recognition (29 · 39K) – The world’s simplest facial recognition api for Python.. MIT
- MoviePy (29 · 7.3K) – Video editing with Python. MIT
- PyTorch Image Models (28 · 7.9K) – PyTorch image models, scripts, pretrained weights –.. Apache-2
- Albumentations (28 · 7.5K) – Fast image augmentation library and easy to use wrapper.. MIT
- Kornia (28 · 3.7K) – Open Source Differentiable Computer Vision Library for PyTorch. Apache-2
- imutils (28 · 3.6K) – A series of convenience functions to make basic image processing.. MIT
- ImageHash (28 · 1.9K) – A Python Perceptual Image Hashing Module. BSD-2
- imageai (27 · 6K) – A python library built to empower developers to build applications and.. MIT
- GluonCV (27 · 4.6K) – Gluon CV Toolkit. Apache-2
- detectron2 (26 · 15K) – Detectron2 is FAIR’s next-generation platform for object.. Apache-2
- InsightFace (26 · 8.7K) – Face Analysis Project on MXNet. MIT
- MMDetection (25 · 14K) – OpenMMLab Detection Toolbox and Benchmark. Apache-2
- PyTorch3D (25 · 4.6K) – PyTorch3D is FAIR’s library of reusable components for deep.. MIT
- facenet-pytorch (25 · 1.9K) – Pretrained Pytorch face detection (MTCNN) and recognition.. MIT
- mahotas (25 · 670) – Computer Vision in Python. MIT
- Augmentor (24 · 4.3K) – Image augmentation library in Python for machine learning. MIT
- mtcnn (24 · 1.4K) – MTCNN face detection implementation for TensorFlow, as a PIP package. MIT
- Face Alignment (23 · 4.7K) – 2D and 3D Face alignment library build using pytorch. BSD-3
- CellProfiler (23 · 550) – An open-source application for biological image analysis. BSD-3
- segmentation_models (22 · 3K) – Segmentation models with pretrained backbones. Keras.. MIT
- vidgear (22 · 1.7K) – High-performance cross-platform Video Processing Python framework.. Apache-2
- pyvips (22 · 300) – python binding for libvips using cffi. MIT
- Image Deduplicator (21 · 3.4K) – Finding duplicate images made easy! Apache-2
- Image Super-Resolution (21 · 2.6K) – Super-scale your images and run experiments with.. Apache-2
- tensorflow-graphics (21 · 2.4K) – TensorFlow Graphics: Differentiable Graphics Layers.. Apache-2
- Classy Vision (21 · 1.2K) – An end-to-end PyTorch framework for image and video.. MIT
- Torch Points 3D (21 · 1.1K) – Pytorch framework for doing deep learning on point clouds. BSD-3
- MMF (20 · 4.2K) – A modular framework for vision & language multimodal research from.. BSD-3
- image-match (20 · 2.5K) – Quickly search over billions of images. Apache-2
- nude.py (20 · 790) – Nudity detection with Python. MIT
- Caer (20 · 450) – A lightweight Computer Vision library. Scale your models, not boilerplate. MIT
- vit-pytorch (18 · 2.9K) – Implementation of Vision Transformer, a simple way to.. MIT
- Norfair (18 · 920) – Lightweight Python library for adding real-time 2D object tracking to.. BSD-3
- PaddleDetection (17 · 2.3K) – Object detection and instance segmentation toolkit.. Apache-2
- lightly (17 · 430) – A python library for self-supervised learning on images. MIT
- pycls (15 · 1.5K) – Codebase for Image Classification Research, written in PyTorch. MIT
- DE⫶TR (14 · 6.4K) – End-to-End Object Detection with Transformers. Apache-2
- PySlowFast (14 · 3.4K) – PySlowFast: video understanding codebase from FAIR for.. Apache-2
Libraries for graph processing, clustering, embedding, and machine learning tasks.
- networkx (33 · 8.8K) – Network Analysis in Python. BSD-3
- PyTorch Geometric (29 · 10K) – Geometric Deep Learning Extension Library for PyTorch. MIT
- dgl (26 · 6.8K) – Python package built to ease deep learning on graph, on top of existing.. Apache-2
- StellarGraph (25 · 1.8K) – StellarGraph – Machine Learning on Graphs. Apache-2
- Spektral (23 · 1.7K) – Graph Neural Networks with Keras and Tensorflow 2. MIT
- ogb (22 · 770) – Benchmark datasets, data loaders, and evaluators for graph machine learning. MIT
- Node2Vec (22 · 650) – Implementation of the node2vec algorithm. MIT
- torch-cluster (21 · 340) – PyTorch Extension Library of Optimized Graph Cluster.. MIT
- AmpliGraph (20 · 1.4K) – Python library for Representation Learning on Knowledge.. Apache-2
- PyTorch-BigGraph (19 · 2.7K) – Generate embeddings from large-scale graph-structured.. BSD-3
- PyKEEN (19 · 330) – A Python library for learning and evaluating knowledge graph embeddings. MIT
- graph-nets (18 · 4.8K) – Build Graph Nets in Tensorflow. Apache-2
- DeepGraph (18 · 230) – Analyze Data with Pandas-based Networks. Documentation:. BSD-3
- Paddle Graph Learning (17 · 920) – Paddle Graph Learning (PGL) is an efficient and.. Apache-2
- kglib (16 · 400) – Grakn Knowledge Graph Library (ML R&D). Apache-2
- pytorch_geometric_temporal (16 · 370) – A Temporal Extension Library for PyTorch Geometric. MIT
- GraphEmbedding (15 · 1.8K) – Implementation and experiments of graph embedding algorithms. MIT
- Euler (14 · 2.5K) – A distributed graph deep learning framework. Apache-2
- AutoGL (14 · 590) – An autoML framework & toolkit for machine learning on graphs. MIT
- OpenKE (13 · 2.4K) – An Open-Source Package for Knowledge Embedding (KE). MIT
- GraphVite (13 · 860) – GraphVite: A General and High-performance Graph Embedding System. Apache-2
Libraries for audio analysis, manipulation, transformation, and extraction, as well as speech recognition and music generation tasks.
- DeepSpeech (31 · 17K) – DeepSpeech is an open source embedded (offline, on-device).. MPL-2.0
- Pydub (30 · 5.2K) – Manipulate audio with a simple and easy high level interface. MIT
- Magenta (29 · 16K) – Magenta: Music and Art Generation with Machine Intelligence. Apache-2
- torchaudio (29 · 1.3K) – Data manipulation and transformation for audio signal.. BSD-2
- librosa (27 · 4.3K) – Python library for audio and music analysis. ISC
- audioread (26 · 360) – cross-library (GStreamer + Core Audio + MAD + FFmpeg) audio decoding.. MIT
- spleeter (25 · 16K) – Deezer source separation library including pretrained models. MIT
- pyAudioAnalysis (25 · 3.8K) – Python Audio Analysis Library: Feature Extraction,.. Apache-2
- python-soundfile (25 · 370) – SoundFile is an audio library based on libsndfile, CFFI, and.. BSD-3
- espnet (24 · 3.5K) – End-to-End Speech Processing Toolkit. Apache-2
- python_speech_features (23 · 1.9K) – This library provides common speech features for ASR.. MIT
- tinytag (23 · 440) – Read music meta data and length of MP3, OGG, OPUS, MP4, M4A, FLAC, WMA and.. MIT
- Porcupine (22 · 2.4K) – On-device wake word detection powered by deep learning. Apache-2
- DDSP (22 · 1.8K) – DDSP: Differentiable Digital Signal Processing. Apache-2
- kapre (21 · 720) – kapre: Keras Audio Preprocessors. MIT
- Dejavu (20 · 5.3K) – Audio fingerprinting and recognition in Python. MIT
- TTS (20 · 3.3K) – Deep learning for Text to Speech (Discussion forum:.. MPL-2.0
- Muda (17 · 180) – A library for augmenting annotated audio data. ISC
- Julius (14 · 180) – Fast PyTorch based DSP for audio and 1D signals. MIT
Libraries to load, process, analyze, and write geographic data as well as libraries for spatial analysis, map visualization, and geocoding.
- pydeck (33 · 8.5K) – WebGL2 powered geospatial visualization layers. MIT
- folium (32 · 5.2K) – Python Data. Leaflet.js Maps. MIT
- geopy (32 · 3.2K) – Geocoding library for Python. MIT
- Shapely (32 · 2.2K) – Manipulation and analysis of geometric objects. BSD-3
- GeoPandas (31 · 2.5K) – Python tools for geographic data. BSD-3
- pyproj (31 · 580) – Python interface to PROJ (cartographic projections and coordinate.. MIT
- Rasterio (30 · 1.4K) – Rasterio reads and writes geospatial raster datasets. BSD-3
- Fiona (30 · 780) – Fiona reads and writes geographic data files. BSD-3
- ipyleaflet (28 · 1.1K) – A Jupyter – Leaflet.js bridge. MIT
- geojson (26 · 600) – Python bindings and utilities for GeoJSON. BSD-3
- ArcGIS API (25 · 980) – Documentation and samples for ArcGIS API for Python. Apache-2
- PySAL (25 · 830) – PySAL: Python Spatial Analysis Library Meta-Package. BSD-3
- GeoViews (22 · 330) – Simple, concise geographical visualization in Python. BSD-3
- EarthPy (20 · 230) – A package built to support working with spatial data using open source.. BSD-3
- pymap3d (19 · 180) – pure-Python (Numpy optional) 3D coordinate conversions for geospace ecef.. BSD-2
Libraries for algorithmic stock/crypto trading, risk analytics, backtesting, technical analysis, and other tasks on financial data.
- zipline (30 · 14K) – Zipline, a Pythonic Algorithmic Trading Library. Apache-2
- yfinance (30 · 4.5K) – Yahoo! Finance market data downloader (+faster Pandas Datareader). Apache-2
- Alpha Vantage (27 · 3.2K) – A python wrapper for Alpha Vantage API for financial data. MIT
- ta (27 · 1.9K) – Technical Analysis Library using Pandas and Numpy. MIT
- pyfolio (26 · 3.6K) – Portfolio and risk analytics in Python. Apache-2
- empyrical (25 · 740) – Common financial risk and performance metrics. Used by zipline and.. Apache-2
- Alphalens (24 · 1.8K) – Performance analysis of predictive (alpha) stock factors. Apache-2
- IB-insync (24 · 1.3K) – Python sync/async framework for Interactive Brokers API. BSD-2
- bt (24 · 980) – bt – flexible backtesting for Python. MIT
- ffn (24 · 800) – ffn – a financial function library for Python. MIT
- Enigma Catalyst (23 · 2K) – An Algorithmic Trading Library for Crypto-Assets in Python. Apache-2
- stockstats (23 · 730) – Supply a wrapper “StockDataFrame” based on the.. BSD-3
- TensorTrade (21 · 3K) – An open source reinforcement learning framework for training,.. Apache-2
- finmarketpy (20 · 2.5K) – Python library for backtesting trading strategies & analyzing.. Apache-2
- Qlib (19 · 4.6K) – Qlib is an AI-oriented quantitative investment platform, which aims to.. MIT
- tf-quant-finance (19 · 2.5K) – High-performance TensorFlow library for quantitative.. Apache-2
- Crypto Signals (18 · 2.7K) – Github.com/CryptoSignal – #1 Quant Trading & Technical Analysis.. MIT
Libraries for forecasting, anomaly detection, feature extraction, and machine learning on time-series and sequential data.
- Prophet (28 · 12K) – Tool for producing high quality forecasts for time series data that has.. MIT
- tsfresh (27 · 5.5K) – Automatic extraction of relevant features from time series:. MIT
- sktime (27 · 3.7K) – A unified framework for machine learning with time series. BSD-3
- pmdarima (26 · 830) – A statistical library designed to fill the void in Python’s time series.. MIT
- tslearn (25 · 1.5K) – A machine learning toolkit dedicated to time-series data. BSD-2
- Streamz (24 · 920) – Real-time stream processing for python. BSD-3
- GluonTS (23 · 1.8K) – Probabilistic time series modeling in Python. Apache-2
- Darts (22 · 750) – A python library for easy manipulation and forecasting of time series. Apache-2
- STUMPY (20 · 1.7K) – STUMPY is a powerful and scalable Python library for computing a Matrix.. BSD-3
- pyts (20 · 890) – A Python package for time series classification. BSD-3
- pytorch-forecasting (19 · 830) – Time series forecasting with PyTorch. MIT
- seglearn (19 · 430) – Python module for machine learning time series:. BSD-3
- matrixprofile-ts (18 · 620) – A Python library for detecting patterns and anomalies.. Apache-2
- Auto TS (18 · 190) – Automatically build ARIMA, SARIMAX, VAR, FB Prophet and XGBoost.. Apache-2
- ADTK (17 · 610) – A Python toolkit for rule-based/unsupervised anomaly detection in time.. MPL-2.0
- tick (17 · 320) – Module for statistical learning, with a particular emphasis on time-.. BSD-3
- atspy (16 · 340) – AtsPy: Automated Time Series Models in Python (by @firmai). MIT
Libraries for processing and analyzing medical data such as MRIs, EEGs, genomic data, and other medical imaging formats.
- Lifelines (29 · 1.6K) – Survival analysis in Python. MIT
- Nilearn (29 · 710) – Machine learning for NeuroImaging in Python. BSD-3
- NIPYPE (29 · 560) – Workflows and interfaces for neuroimaging packages. Apache-2
- NiBabel (29 · 390) – Python package to access a cacophony of neuro-imaging file formats. MIT
- MNE (27 · 1.5K) – MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python. BSD-3
- DIPY (27 · 390) – DIPY is the paragon 3D/4D+ imaging library in Python. Contains generic.. BSD-3
- Hail (24 · 700) – Scalable genomic data analysis. MIT
- NIPY (23 · 290) – Neuroimaging in Python FMRI analysis package. BSD-3
- MONAI (22 · 1.8K) – AI Toolkit for Healthcare Imaging. Apache-2
- DeepVariant (21 · 2.2K) – DeepVariant is an analysis pipeline that uses a deep neural.. BSD-3
- NiftyNet (21 · 1.3K) – [unmaintained] An open-source convolutional neural.. Apache-2
- Brainiak (19 · 230) – Brain Imaging Analysis Kit. Apache-2
- Glow (19 · 160) – An open-source toolkit for large-scale genomic analysis. Apache-2
- Medical Detection Toolkit (12 · 910) – The Medical Detection Toolkit contains 2D + 3D.. Apache-2
- MedicalNet (11 · 1.1K) – Many studies have shown that the performance on deep learning is.. MIT
Libraries for optical character recognition (OCR) and text extraction from images or videos.
- Tesseract (30 · 3.5K) – Python-tesseract is an optical character recognition (OCR) tool.. Apache-2
- EasyOCR (28 · 11K) – Ready-to-use OCR with 80+ supported languages and all popular writing.. Apache-2
- OCRmyPDF (27 · 4K) – OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to.. MPL-2.0
- tesserocr (26 · 1.4K) – A Python wrapper for the tesseract-ocr API. MIT
- PaddleOCR (24 · 11K) – Awesome multilingual OCR toolkits based on PaddlePaddle.. Apache-2
- attention-ocr (21 · 840) – A Tensorflow model for text recognition (CNN + seq2seq with.. MIT
- keras-ocr (20 · 780) – A packaged and flexible version of the CRAFT text detector and.. MIT
- calamari (19 · 790) – Line based ATR Engine based on OCRopy. Apache-2
- doc2text (18 · 1.2K) – Detect text blocks and OCR poorly scanned PDFs in bulk. Python module.. MIT
- Mozart (10 · 240) – An optical music recognition (OMR) system. Converts sheet.. Apache-2
General-purpose data containers & structures as well as utilities & extensions for pandas.
- pandas (40 · 29K) – Flexible and powerful data analysis / manipulation library for.. BSD-3
- numpy (38 · 17K) – The fundamental package for scientific computing with Python. BSD-3
- h5py (36 · 1.5K) – HDF5 for Python – The h5py package is a Pythonic interface to the HDF5.. BSD-3
- Arrow (35 · 7.5K) – Apache Arrow is a cross-language development platform for in-memory.. Apache-2
- xarray (32 · 2K) – N-D labeled arrays and datasets in Python. Apache-2
- numexpr (31 · 1.6K) – Fast numerical array expression evaluator for Python, NumPy, PyTables,.. MIT
- TinyDB (29 · 4.1K) – TinyDB is a lightweight document oriented database optimized for your.. MIT
- Koalas (29 · 2.7K) – Koalas: pandas API on Apache Spark. Apache-2
- Bottleneck (29 · 580) – Fast NumPy array functions written in C. BSD-2
- Modin (28 · 5.8K) – Modin: Speed up your Pandas workflows by changing a single line of.. Apache-2
- PyTables (28 · 1K) – A Python package to manage extremely large amounts of data. BSD-3
- datasketch (27 · 1.4K) – MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog,.. MIT
- zarr (26 · 660) – An implementation of chunked, compressed, N-dimensional arrays for Python. MIT
- bcolz (25 · 910) – A columnar data container that can be compressed. BSD-3
- Arctic (24 · 2.2K) – Arctic is a high performance datastore for numeric data. ❗️LGPL-2.1
- swifter (24 · 1.6K) – A package which efficiently applies any function to a pandas.. MIT
- Pandaral·lel (24 · 1.4K) – A simple and efficient tool to parallelize Pandas.. BSD-3
- Vaex (23 · 5.9K) – Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualize and.. MIT
- datatable (21 · 1.2K) – A Python package for manipulating 2-dimensional tabular data.. MPL-2.0
- StaticFrame (21 · 220) – Immutable and grow-only Pandas-like DataFrames with a more explicit.. MIT
- fletcher (20 · 210) – Pandas ExtensionDType/Array backed by Apache Arrow. MIT
- Bounter (17 · 900) – Efficient Counter that uses a limited (bounded) amount of memory.. MIT
- PandaPy (14 · 470) – PandaPy has the speed of NumPy and the usability of Pandas 10x to 50x.. MIT
Libraries for loading, collecting, and extracting data from a variety of data sources and formats.
- Faker (36 · 12K) – Faker is a Python package that generates fake data for you. MIT
- xlrd (34 · 1.9K) – Please use openpyxl where you can… BSD-3
- xmltodict (32 · 4.3K) – Python module that makes working with XML feel like you are.. MIT
- TensorFlow Datasets (32 · 2.7K) – TFDS is a collection of datasets ready to use with.. Apache-2
- python-magic (32 · 1.8K) – A python wrapper for libmagic. MIT
- Tablib (31 · 3.9K) – Python Module for Tabular Datasets in XLS, CSV, JSON, YAML, &c. MIT
- smart-open (30 · 2K) – Utils for streaming large files (S3, HDFS, gzip, bz2…). MIT
- Datasets (29 · 6.9K) – The largest hub of ready-to-use NLP datasets for ML models with.. Apache-2
- pandas-datareader (29 · 1.9K) – Extract data from a wide range of Internet sources.. BSD-3
- snorkel (28 · 4.5K) – A system for quickly generating training data with weak.. Apache-2
- csvkit (28 · 4.5K) – A suite of utilities for converting to and working with CSV, the king of.. MIT
- tabulator-py (26 · 200) – Python library for reading and writing tabular data via streams. MIT
- Intake (25 · 530) – Intake is a lightweight package for finding, investigating, loading and.. BSD-2
- SDV (21 · 360) – Synthetic Data Generation for tabular, relational and time series data. MIT
- datatest (21 · 240) – Tools for test driven data-wrangling and data validation. Apache-2
Libraries for web scraping, crawling, downloading, and mining.
- best-of-web-python – Web Scraping (1.1K) – Collection of web-scraping and crawling libraries.
Libraries for data batch- and stream-processing, workflow automation, job scheduling, and other data pipeline tasks.
- Celery (39 · 17K) – Asynchronous task queue/job queue based on distributed message passing. BSD-3
- Airflow (36 · 21K) – Platform to programmatically author, schedule, and monitor.. Apache-2
- joblib (35 · 2.4K) – Computing with Python functions. BSD-3
- rq (33 · 7.6K) – Simple job queues for Python. BSD-3
- luigi (32 · 14K) – Luigi is a Python module that helps you build complex pipelines of batch.. Apache-2
- Beam (32 · 4.6K) – Unified programming model to define and execute data processing.. Apache-2
- Prefect (30 · 6K) – The easiest way to automate your data. Apache-2
- dbt (29 · 2.7K) – dbt (data build tool) enables data analysts and engineers to transform.. Apache-2
- faust (28 · 5.4K) – Python Stream Processing. BSD-3
- Kedro (28 · 3.6K) – A Python framework for creating reproducible, maintainable and modular.. Apache-2
- Dagster (27 · 3K) – A data orchestrator for machine learning, analytics, and ETL. Apache-2
- mrjob (27 · 2.5K) – Run MapReduce jobs on Hadoop or Amazon Web Services. Apache-2
- petl (27 · 860) – Python Extract Transform and Load Tables of Data. MIT
- PyFunctional (26 · 1.8K) – Python library for creating data pipelines with chain functional.. MIT
- Hub (25 · 2.7K) – Fastest unstructured dataset management for TensorFlow/PyTorch… MPL-2.0
- TFX (25 · 1.4K) – TFX is an end-to-end platform for deploying production ML pipelines. Apache-2
- Great Expectations (24 · 3.9K) – Always know what to expect from your data. Apache-2
- streamparse (23 · 1.4K) – Run Python in Apache Storm topologies. Pythonic API, CLI.. Apache-2
- bonobo (23 · 1.4K) – Extract Transform Load for Python 3.5+. Apache-2
- Optimus (23 · 980) – Agile Data Preparation Workflows made easy with dask, cudf,.. Apache-2
- pysparkling (23 · 230) – A pure Python implementation of Apache Spark’s RDD and DStream.. MIT
- Pypeline (22 · 1.2K) – Concurrent data pipelines in Python. MIT
- dpark (20 · 2.6K) – Python clone of Spark, a MapReduce alike framework in Python. BSD-3
- mrq (20 · 840) – Mr. Queue – A distributed worker task queue in Python using Redis & gevent. MIT
- pdpipe (20 · 590) – Easy pipelines for pandas DataFrames. MIT
- ploomber (20 · 210) – A convention over configuration workflow orchestrator. Develop.. Apache-2
- spark-deep-learning (18 · 1.8K) – Deep Learning Pipelines for Apache Spark. Apache-2
- Mara Pipelines (18 · 1.6K) – A lightweight opinionated ETL framework, halfway between plain.. MIT
- TaskTiger (18 · 1K) – Python task queue using Redis. MIT
- Databolt Flow (18 · 900) – Python library for building highly effective data science workflows. MIT
- BatchFlow (18 · 160) – BatchFlow helps you conveniently work with random or sequential.. Apache-2
- flupy (18 · 150) – Fluent data pipelines for python and your shell. MIT
- riko (17 · 1.6K) – A Python stream processing engine modeled after Yahoo! Pipes. MIT
- zenml (14 · 900) – ZenML: Bring Zen to your ML with reproducible pipelines. Apache-2
Libraries that provide capabilities to distribute and parallelize machine learning tasks across large-scale compute infrastructure.
- Ray (32 · 15K) – An open source framework that provides a simple, universal API for.. Apache-2
- dask (32 · 8K) – Parallel computing with task scheduling. BSD-3
- dask.distributed (31 · 1.2K) – A distributed task scheduler for Dask. BSD-3
- horovod (29 · 11K) – Distributed training framework for TensorFlow, Keras, PyTorch, and.. Apache-2
- ipyparallel (28 · 1.9K) – Interactive Parallel Computing in Python. BSD-3
- Mesh (26 · 910) – Mesh TensorFlow: Model Parallelism Made Easier. Apache-2
- BigDL (25 · 3.7K) – BigDL: Distributed Deep Learning Framework for Apache Spark. Apache-2
- Elephas (25 · 1.5K) – Distributed Deep learning with Keras & Spark. MIT
- petastorm (25 · 1.1K) – Petastorm library enables single machine or distributed training.. Apache-2
- mpi4py (25 · 390) – Python bindings for MPI. BSD-3
- DeepSpeed (24 · 4.5K) – DeepSpeed is a deep learning optimization library that makes.. MIT
- TensorFlowOnSpark (24 · 3.6K) – TensorFlowOnSpark brings TensorFlow programs to.. Apache-2
- dask-ml (24 · 690) – Scalable Machine Learning with Dask. BSD-3
- MMLSpark (23 · 2.3K) – Microsoft Machine Learning for Apache Spark. MIT
- analytics-zoo (22 · 2.2K) – Distributed Tensorflow, Keras and PyTorch on Apache.. Apache-2
- FairScale (21 · 850) – PyTorch extensions for high performance and large scale training. BSD-3
- Submit it (21 · 310) – Python 3.6+ toolbox for submitting jobs to Slurm. MIT
- Apache Singa (19 · 2.2K) – a distributed deep learning platform. Apache-2
- BytePS (18 · 2.7K) – A high performance and generic framework for distributed DNN training. Apache-2
- Fiber (18 · 860) – Distributed Computing for AI Made Simple. Apache-2
- Hivemind (18 · 660) – Decentralized deep learning in PyTorch. Built to train models on.. MIT
- sk-dist (18 · 260) – Distributed scikit-learn meta-estimators in PySpark. Apache-2
- somoclu (18 · 220) – Massively parallel self-organizing maps: accelerate training on.. MIT
Hyperparameter Optimization & AutoML
Libraries for hyperparameter optimization, AutoML, and neural architecture search.
- Optuna (31 · 4.2K) – A hyperparameter optimization framework. MIT
- Hyperopt (30 · 5.5K) – Distributed Asynchronous Hyperparameter Optimization in Python. BSD-3
- scikit-optimize (29 · 2.1K) – Sequential model-based optimization with a `scipy.optimize`.. BSD-3
- Keras Tuner (28 · 2.3K) – Hyperparameter tuning for humans. Apache-2
- AutoKeras (27 · 7.8K) – AutoML library for deep learning. Apache-2
- Bayesian Optimization (27 · 4.9K) – A Python implementation of global optimization with.. MIT
- NNI (26 · 9.3K) – An open source AutoML toolkit to automate machine learning lifecycle,.. MIT
- auto-sklearn (26 · 5.3K) – Automated Machine Learning with scikit-learn. BSD-3
- AutoGluon (26 · 3K) – AutoGluon: AutoML for Text, Image, and Tabular Data. Apache-2
- nevergrad (26 · 2.8K) – A Python toolbox for performing gradient-free optimization. MIT
- BoTorch (26 · 1.9K) – Bayesian optimization in PyTorch. MIT
- SMAC3 (26 · 560) – Sequential Model-based Algorithm Configuration. BSD-3
- featuretools (25 · 5.4K) – An open source python library for automated feature engineering. BSD-3
- Ax (25 · 1.4K) – Adaptive Experimentation Platform. MIT
- Hyperas (23 · 2.1K) – Keras + Hyperopt: A very simple wrapper for convenient.. MIT
- GPyOpt (23 · 720) – Gaussian Process Optimization using GPy. BSD-3
- Talos (22 · 1.4K) – Hyperparameter Optimization for TensorFlow, Keras and PyTorch. MIT
- Orion (22 · 180) – Asynchronous Distributed Hyperparameter Optimization. BSD-3
- AdaNet (21 · 3.2K) – Fast and flexible AutoML with learning guarantees. Apache-2
- mljar-supervised (21 · 950) – Automates Machine Learning Pipeline with Feature Engineering.. MIT
- Neuraxle (21 · 380) – A Sklearn-like Framework for Hyperparameter Tuning and AutoML in.. Apache-2
- lazypredict (20 · 400) – Lazy Predict helps build a lot of basic models without much code.. MIT
- optunity (20 · 360) – optimization routines for hyperparameter tuning. BSD-3
- Auto ViML (20 · 220) – Automatically Build Multiple ML Models with a Single Line of Code… Apache-2
- Test Tube (19 · 660) – Python library to easily log experiments and parallelize.. MIT
- Dragonfly (17 · 570) – An open source python library for scalable Bayesian optimisation. MIT
- HyperparameterHunter (16 · 650) – Easy hyperparameter optimization and automatic result.. MIT
- AlphaPy (16 · 560) – Automated Machine Learning [AutoML] with Python, scikit-learn, Keras,.. Apache-2
- Parfit (15 · 200) – A package for parallelizing the fit and flexibly scoring of.. MIT
- ENAS (13 · 2.4K) – PyTorch implementation of Efficient Neural Architecture Search via.. Apache-2
- Devol (11 · 920) – Genetic neural architecture search with Keras. MIT
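As a point of reference for what hyperparameter optimization libraries automate, the simplest baseline is random search: sample configurations from a search space, score each one, and keep the best. The sketch below uses only the standard library; the function name `random_search`, the dictionary-of-ranges search space, and the toy objective are illustrative assumptions and do not reflect the API of any library listed above.

```python
import random


def random_search(objective, space, n_trials=50, seed=0):
    """Minimize `objective` by sampling uniformly from `space`.

    `space` maps each (hypothetical) parameter name to a (low, high) range.
    Returns the best parameter dict found and its score.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        # Draw one candidate configuration from the search space.
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy objective standing in for a validation loss; its minimum is at x = 2.
def toy_loss(params):
    return (params["x"] - 2.0) ** 2


best, loss = random_search(toy_loss, {"x": (-5.0, 5.0)}, n_trials=200)
```

Dedicated libraries typically replace the uniform sampling step with a smarter strategy (for example, Bayesian optimization) and add pruning, parallel execution, and result logging.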
Libraries for building and evaluating reinforcement learning & agent-based systems.
- OpenAI Gym (35 · 24K) – A toolkit for developing and comparing reinforcement learning.. MIT
- Dopamine (27 · 9.3K) – Dopamine is a research framework for fast prototyping of.. Apache-2
- TensorLayer (27 · 6.5K) – Deep Learning and Reinforcement Learning Library for.. Apache-2
- TF-Agents (27 · 1.8K) – TF-Agents: A reliable, scalable and easy to use TensorFlow.. Apache-2
- TensorForce (25 · 2.9K) – Tensorforce: a TensorFlow library for applied.. Apache-2
- ViZDoom (25 · 1.2K) – Doom-based AI Research Platform for Reinforcement Learning from Raw.. MIT
- Stable Baselines (24 · 3K) – A fork of OpenAI Baselines, implementations of reinforcement.. MIT
- Acme (23 · 2K) – A library of reinforcement learning components and agents. Apache-2
- garage (22 · 1.1K) – A toolkit for reproducible reinforcement learning research. MIT
- ChainerRL (22 · 930) – ChainerRL is a deep reinforcement learning library built on top of.. MIT
- PARL (21 · 1.9K) – A high-performance distributed training framework for Reinforcement.. Apache-2
- TRFL (19 · 3.1K) – TensorFlow Reinforcement Learning. Apache-2
- Coach (19 · 1.9K) – Reinforcement Learning Coach by Intel AI Lab enables easy.. Apache-2
- PFRL (19 · 530) – PFRL: a PyTorch-based deep reinforcement learning library. MIT
- ReAgent (17 · 2.8K) – A platform for Reasoning systems (Reinforcement Learning,.. BSD-3
- RLax (17 · 570) – A library of reinforcement learning building blocks in JAX. Apache-2
Libraries for building and evaluating recommendation systems.
- lightfm (27 · 3.5K) – A Python implementation of LightFM, a hybrid recommendation algorithm. Apache-2
- implicit (27 · 2.3K) – Fast Python Collaborative Filtering for Implicit Feedback Datasets. MIT
- scikit-surprise (26 · 4.7K) – A Python scikit for building and analyzing recommender.. BSD-3
- TF Ranking (22 · 2.1K) – Learning to Rank in TensorFlow. Apache-2
- Cornac (22 · 310) – A Comparative Framework for Multimodal Recommender Systems. Apache-2
- Recommenders (21 · 9.3K) – Best Practices on Recommendation Systems. MIT
- fastFM (20 · 910) – fastFM: A Library for Factorization Machines. BSD-3
- RecBole (20 · 770) – A unified, comprehensive and efficient recommendation library. MIT
- TF Recommenders (19 · 750) – TensorFlow Recommenders is a library for building.. Apache-2
- recmetrics (18 · 240) – A library of metrics for evaluating recommender systems. MIT
- Case Recommender (16 · 320) – Case Recommender: A Flexible and Extensible Python.. MIT
Libraries for encrypted and privacy-preserving machine learning using methods like federated learning & differential privacy.
- PySyft (26 · 6.9K) – A library for answering questions using data you cannot see. Apache-2
- Opacus (22 · 760) – Training PyTorch models with differential privacy. Apache-2
- FATE (20 · 2.8K) – An Industrial Grade Federated Learning Framework. Apache-2
- TensorFlow Privacy (20 · 1.4K) – Library for training machine learning models with.. Apache-2
- TFEncrypted (20 · 830) – A Framework for Encrypted Machine Learning in TensorFlow. Apache-2
- CrypTen (16 · 730) – A framework for Privacy Preserving Machine Learning. MIT
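To make the idea of differential privacy concrete, here is a minimal sketch of the classic Laplace mechanism applied to a counting query, written with the standard library only. The function name `private_count` and the example data are illustrative assumptions; none of the libraries above is used, and production systems need considerably more care (privacy budget accounting, secure noise generation).

```python
import random


def private_count(values, predicate, epsilon, rng):
    """Return a noisy count of items matching `predicate`.

    A counting query has sensitivity 1 (adding or removing one record
    changes the count by at most 1), so Laplace noise with scale
    1 / epsilon gives epsilon-differential privacy for this query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    scale = 1.0 / epsilon
    # The difference of two independent exponential samples with mean
    # `scale` follows a Laplace distribution with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


rng = random.Random(42)
ages = [23, 35, 41, 29, 52, 61]
noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng)
```

A smaller `epsilon` means stronger privacy and noisier answers; a larger one gives results closer to the true count.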
Workflow & Experiment Tracking
Libraries to organize, track, and visualize machine learning experiments.
- Tensorboard (36 · 5.2K) – TensorFlow’s Visualization Toolkit. Apache-2
- mlflow (32 · 8.6K) – Open source platform for the machine learning lifecycle. Apache-2
- DVC (30 · 7.5K) – Data Version Control | Git for Data & Models. Apache-2
- wandb client (30 · 2.8K) – A tool for visualizing and tracking your machine learning.. MIT
- SageMaker SDK (30 · 1.3K) – A library for training and deploying machine learning.. Apache-2
- kaggle (29 · 3.9K) – Official Kaggle API. Apache-2
- AzureML SDK (29 · 2.2K) – Python notebooks with ML and deep learning examples with Azure.. MIT
- snakemake (29 · 880) – This is the development home of the workflow management system.. MIT
- tensorboardX (28 · 6.8K) – tensorboard for pytorch (and chainer, mxnet, numpy, …). MIT
- sacred (28 · 3.3K) – Sacred is a tool to help you configure, organize, log and reproduce.. MIT
- PyCaret (28 · 3K) – An open-source, low-code machine learning library in Python. MIT
- Metaflow (26 · 4.2K) – Build and manage real-life data science projects with ease. Apache-2
- Catalyst (26 · 2.5K) – Accelerated deep learning R&D. Apache-2
- VisualDL (24 · 3.9K) – Deep Learning Visualization Toolkit. Apache-2
- ClearML (24 · 2.2K) – ClearML – Auto-Magical Suite of tools to streamline your ML.. Apache-2
- TNT (24 · 1.3K) – Simple tools for logging and visualizing, loading and training. BSD-3
- livelossplot (24 · 1K) – Live training loss plot in Jupyter Notebook for Keras, PyTorch.. MIT
- ml-metadata (24 · 290) – For recording and retrieving metadata associated with ML.. Apache-2
- TensorWatch (22 · 3K) – Debugging, monitoring and visualization for Python Machine Learning.. MIT
- knockknock (22 · 2K) – Knock Knock: Get notified when your training ends with only two.. MIT
- lore (21 · 1.5K) – Lore makes machine learning approachable for Software Engineers and.. MIT
- Guild AI (21 · 550) – Experiment tracking, ML developer tools. Apache-2
- Studio.ml (21 · 370) – Studio: Simplify and expedite model building process. Apache-2
- quinn (21 · 220) – pyspark methods to enhance developer productivity. Apache-2
- hiddenlayer (20 · 1.4K) – Neural network graphs and training metrics for.. MIT
- Labml (20 · 500) – Monitor deep learning model training and hardware usage from your mobile.. MIT
- gokart (19 · 170) – A wrapper of the data pipeline library luigi. MIT
- aim (15 · 880) – Aim a super-easy way to record, search and compare 1000s of ML training.. Apache-2
Model Serialization & Conversion
Libraries to serialize models to files, convert between a variety of model formats, and optimize models for deployment.
- onnx (33 · 9.9K) – Open standard for machine learning interoperability. Apache-2
- Core ML Tools (26 · 2.1K) – Core ML tools contain supporting tools for Core ML model.. BSD-3
- TorchServe (24 · 1.6K) – Model Serving on PyTorch. Apache-2
- mmdnn (23 · 5.3K) – MMdnn is a set of tools to help users inter-operate among different deep.. MIT
- cortex (21 · 7.4K) – Model serving at scale. Apache-2
- m2cgen (21 · 1.8K) – Transform ML models into a native code (Java, C, Python, Go, JavaScript,.. MIT
- Hummingbird (20 · 2.3K) – Hummingbird compiles trained ML models into tensor computation for.. MIT
- pytorch2keras (18 · 670) – PyTorch to Keras model convertor. MIT
- tfdeploy (16 · 350) – Deploy tensorflow graphs for fast evaluation and export to.. BSD-3
Libraries to visualize, explain, debug, evaluate, and interpret machine learning models.
- shap (34 · 12K) – A game theoretic approach to explain the output of any machine learning model. MIT
- Lime (29 · 8.5K) – Lime: Explaining the predictions of any machine learning classifier. BSD-2
- pyLDAvis (28 · 1.4K) – Python library for interactive topic model visualization. Port of.. BSD-3
- InterpretML (27 · 3.5K) – Fit interpretable models. Explain blackbox machine learning. MIT
- Model Analysis (27 · 1K) – Model analysis tools for TensorFlow. Apache-2
- yellowbrick (25 · 3.1K) – Visual analysis and diagnostic tools to facilitate machine.. Apache-2
- Captum (25 · 2.2K) – Model interpretability and understanding for PyTorch. BSD-3
- dtreeviz (25 · 1.4K) – A python library for decision tree visualization and model interpretation. MIT
- Fairness 360 (25 · 1.2K) – A comprehensive set of fairness metrics for datasets and.. Apache-2
- arviz (25 · 960) – Exploratory analysis of Bayesian models with Python. Apache-2
- Lucid (24 · 4.1K) – A collection of infrastructure and tools for research in neural.. Apache-2
- DoWhy (24 · 2.7K) – DoWhy is a Python library for causal inference that supports explicit.. MIT
- keras-vis (23 · 2.8K) – Neural network visualization toolkit for keras. MIT
- TreeInterpreter (23 · 650) – Package for interpreting scikit-learn’s decision tree.. BSD-3
- Alibi (22 · 910) – Algorithms for monitoring and explaining machine learning models. Apache-2
- keract (22 · 860) – Activation Maps (Layers Outputs) and Gradients in Keras. MIT
- random-forest-importances (22 · 420) – Code to compute permutation and drop-column.. MIT
- Explainability 360 (21 · 780) – Interpretability and explainability of data and machine.. Apache-2
- iNNvestigate (21 · 780) – A toolbox to iNNvestigate neural networks’ predictions! BSD-2
- tf-explain (21 · 780) – Interpretability Methods for tf.keras models with Tensorflow 2.x. MIT
- fairlearn (21 · 710) – A Python package to assess and improve fairness of machine.. MIT
- aequitas (21 · 360) – Bias and Fairness Audit Toolkit. MIT
- explainerdashboard (20 · 370) – Quickly build Explainable AI dashboards that show the inner.. MIT
- checklist (19 · 1.3K) – Beyond Accuracy: Behavioral Testing of NLP models with CheckList. MIT
- CausalNex (19 · 1K) – A Python library that helps data scientists to infer.. Apache-2
- deeplift (19 · 510) – Public facing deeplift repo. MIT
- What-If Tool (19 · 460) – Source code/webpage/demos for the What-If Tool. Apache-2
- sklearn-evaluation (19 · 290) – Machine learning model evaluation made easy: plots,.. MIT
- tcav (18 · 440) – Code for the TCAV ML interpretability project. Apache-2
- fairness-indicators (18 · 180) – Tensorflow’s Fairness Evaluation and Visualization.. Apache-2
- LIT (17 · 2.4K) – The Language Interpretability Tool: Interactively analyze NLP models for.. Apache-2
- ExplainX.ai (17 · 190) – Explainable AI framework for data scientists. Explain & debug any.. MIT
- imodels (17 · 190) – Interpretable ML package for concise, transparent, and accurate predictive.. MIT
- DiCE (16 · 480) – Generate Diverse Counterfactual Explanations for any machine.. MIT
- LOFO (16 · 310) – Leave One Feature Out Importance. MIT
- model-card-toolkit (16 · 180) – a tool that leverages rich metadata and lineage.. Apache-2
- FlashTorch (15 · 560) – Visualization toolkit for neural networks in PyTorch! Demo –. MIT
- Anchor (14 · 630) – Code for High-Precision Model-Agnostic Explanations paper. BSD-2
Vector Similarity Search (ANN)
Libraries for Approximate Nearest Neighbor Search and Vector Indexing/Similarity Search.
- ANN Benchmarks (2.1K) – Benchmarks of approximate nearest neighbor libraries in Python.
- Faiss (29 · 13K) – A library for efficient similarity search and clustering of dense vectors. MIT
- Annoy (29 · 8.2K) – Approximate Nearest Neighbors in C++/Python optimized for memory usage.. Apache-2
- NMSLIB (28 · 2.3K) – Non-Metric Space Library (NMSLIB): An efficient similarity search.. Apache-2
- hnswlib (26 · 1.4K) – Header-only C++/python library for fast approximate nearest neighbors. Apache-2
- Milvus (25 · 5.3K) – An open source embedding vector similarity search engine powered by.. Apache-2
- PyNNDescent (25 · 380) – A Python nearest neighbor descent for approximate nearest neighbors. BSD-2
- Magnitude (23 · 1.4K) – A fast, efficient universal vector embedding utility package. MIT
- NGT (19 · 630) – Nearest Neighbor Search with Neighborhood Graph and Tree for High-.. Apache-2
- N2 (19 · 460) – TOROS N2 – lightweight approximate Nearest Neighbor library which runs fast.. Apache-2
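For orientation, the libraries in this category speed up a computation that can be written exactly in a few lines. The following is a minimal sketch of exact (brute-force) nearest-neighbor search by cosine similarity, using only the Python standard library. The function names `cosine_similarity` and `nearest_neighbors` and the toy vectors are illustrative assumptions and do not come from any of the listed libraries; approximate indexes trade some of this exactness for speed on large collections.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, vectors, k=2):
    """Return the indices of the k vectors most similar to `query`.

    Exact search: every stored vector is scored, so the cost grows
    linearly with the size of the collection.
    """
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    # Highest similarity first; ties are broken by the lower index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


# Hypothetical toy data: four 2-dimensional embeddings.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
top = nearest_neighbors([1.0, 0.0], vectors, k=2)
print(top)
```

A brute-force baseline like this is also a convenient way to check the recall of an approximate index on a small sample before relying on it at scale.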
Libraries providing capabilities for probabilistic programming/reasoning, Bayesian inference, Gaussian processes, or statistics.
- PyMC3 (32 · 5.6K) – Probabilistic Programming in Python: Bayesian Modeling and.. Apache-2
- tensorflow-probability (31 · 3.3K) – Probabilistic reasoning and statistical analysis in.. Apache-2
- hmmlearn (29 · 2.2K) – Hidden Markov Models in Python, with scikit-learn like API. BSD-3
- Pyro (28 · 6.8K) – Deep universal probabilistic programming with Python and PyTorch. Apache-2
- GPyTorch (28 · 2.3K) – A highly efficient and modular implementation of Gaussian Processes.. MIT
- pomegranate (27 · 2.6K) – Fast, flexible and easy to use probabilistic modelling in Python. MIT
- filterpy (27 · 1.7K) – Python Kalman filtering and optimal estimation library. Implements.. MIT
- GPflow (27 · 1.4K) – Gaussian processes in TensorFlow. Apache-2
- pgmpy (25 · 1.7K) – Python Library for learning (Structure and Parameter) and inference.. MIT
- SALib (24 · 440) – Sensitivity Analysis Library in Python (Numpy). Contains Sobol, Morris,.. MIT
- bambi (20 · 580) – BAyesian Model-Building Interface (Bambi) in Python. MIT
- scikit-posthocs (20 · 190) – Multiple Pairwise Comparisons (Post Hoc) Tests in Python. MIT
- Funsor (19 · 160) – Functional tensors for probabilistic programming. Apache-2
- pyhsmm (18 · 480) – Bayesian inference in HSMMs and HMMs. MIT
- Orbit (18 · 340) – A Python package for Bayesian forecasting with object-oriented design.. Apache-2
- Baal (17 · 320) – Using approximate bayesian posteriors in deep nets for active learning. Apache-2
Libraries for testing the robustness of machine learning models against attacks with adversarial/malicious examples.
- CleverHans (27 · 5K) – An adversarial example library for constructing attacks, building.. MIT
- Foolbox (27 · 1.8K) – A Python toolbox to create adversarial examples that fool neural networks.. MIT
- ART (23 · 2.1K) – Adversarial Robustness Toolbox (ART) – Python Library for Machine Learning.. MIT
- TextAttack (23 · 1.3K) – TextAttack is a Python framework for adversarial attacks, data.. MIT
- robustness (18 · 490) – A library for experimenting with, training and evaluating neural.. MIT
- AdvBox (16 · 1.1K) – Advbox is a toolbox to generate adversarial examples that fool.. Apache-2
Libraries that require and make use of CUDA/GPU system capabilities to optimize data handling and machine learning tasks.
- CuPy (31 · 4.9K) – A NumPy-compatible array library accelerated by CUDA. MIT
- gpustat (26 · 2.3K) – A simple command-line utility for querying and monitoring GPU status. MIT
- PyCUDA (25 · 1.1K) – CUDA integration for Python, plus shiny features. MIT
- Apex (23 · 5.1K) – A PyTorch Extension: Tools for easy mixed precision and distributed.. BSD-3
- ArrayFire (23 · 3.3K) – ArrayFire: a general purpose GPU library. BSD-3
- scikit-cuda (23 · 800) – Python interface to GPU-powered libraries. BSD-3
- cuDF (21 · 3.7K) – cuDF – GPU DataFrame Library. Apache-2
- py3nvml (21 · 170) – Python 3 Bindings for NVML library. Get NVIDIA GPU status inside.. BSD-3
- DALI (20 · 3.1K) – A library containing both highly optimized building blocks and an.. Apache-2
- cuML (19 · 2K) – cuML – RAPIDS Machine Learning Library. Apache-2
- BlazingSQL (17 · 1.4K) – BlazingSQL is a lightweight, GPU accelerated, SQL engine for.. Apache-2
- Vulkan Kompute (17 · 350) – General purpose GPU compute framework for cross vendor.. Apache-2
- cuGraph (16 · 670) – cuGraph – RAPIDS Graph Analytics Library. Apache-2
- cuSignal (15 · 460) – GPU accelerated signal processing. Apache-2
Libraries that extend TensorFlow with additional capabilities.
- tensorflow-hub (32 · 2.8K) – A library for transfer learning by reusing parts of.. Apache-2
- tensor2tensor (31 · 11K) – Library of deep learning models and datasets designed to.. Apache-2
- TF Addons (31 · 1.2K) – Useful extra functionality for TensorFlow 2.x maintained by.. Apache-2
- TensorFlow Transform (29 · 860) – Input pipeline framework. Apache-2
- TensorFlow I/O (26 · 420) – Dataset, streaming, and file system extensions.. Apache-2
- TF Model Optimization (25 · 980) – A toolkit to optimize ML models for deployment for.. Apache-2
- efficientnet (23 · 1.7K) – Implementation of EfficientNet model. Keras and.. Apache-2
- TensorFlow Cloud (22 · 230) – The TensorFlow Cloud repository provides APIs that.. Apache-2
- Neural Structured Learning (21 · 790) – Training neural models with structured signals. Apache-2
- TensorNets (19 · 980) – High level network definitions with pre-trained weights in.. MIT
- tffm (18 · 760) – TensorFlow implementation of an arbitrary order Factorization Machine. MIT
- TF Compression (18 · 450) – Data compression in TensorFlow. Apache-2
- Saliency (17 · 640) – TensorFlow implementation for SmoothGrad, Grad-CAM, Guided.. Apache-2
Libraries that extend scikit-learn with additional capabilities.
- imbalanced-learn (31 · 5.1K) – A Python Package to Tackle the Curse of Imbalanced.. MIT
- MLxtend (30 · 3.4K) – A library of extension and helper modules for Python’s data.. BSD-3
- category_encoders (24 · 1.6K) – A library of sklearn compatible categorical variable.. BSD-3
- sklearn-contrib-lightning (24 · 1.4K) – Large-scale linear classification, regression and.. BSD-3
- scikit-opt (22 · 2K) – Genetic Algorithm, Particle Swarm Optimization, Simulated.. MIT
- fancyimpute (22 · 940) – Multivariate imputation and matrix completion algorithms.. Apache-2
- combo (22 · 480) – (AAAI’ 20) A Python Toolbox for Machine Learning Model.. BSD-2
- scikit-lego (20 · 440) – Extra blocks for scikit-learn pipelines. MIT
- DESlib (20 · 320) – A Python library for dynamic classifier and ensemble selection. BSD-3
- iterative-stratification (19 · 530) – scikit-learn cross validators for iterative.. BSD-3
- scikit-tda (19 · 270) – Topological Data Analysis for Python. MIT
- skggm (16 · 180) – Scikit-learn compatible estimation of general graphical models. MIT
Libraries that extend PyTorch with additional capabilities.
- pretrainedmodels (27 · 7.8K) – Pretrained ConvNets for pytorch: NASNet, ResNeXt,.. BSD-3
- pytorch-summary (25 · 3K) – Model summary in PyTorch similar to `model.summary()` in.. MIT
- pytorch-optimizer (25 · 1.7K) – torch-optimizer – collection of optimizers for.. Apache-2
- EfficientNet-PyTorch (24 · 5.5K) – A PyTorch implementation of EfficientNet. Apache-2
- torchdiffeq (24 · 3.4K) – Differentiable ODE solvers with full GPU support and.. MIT
- PML (24 · 2.8K) – The easiest way to use deep metric learning in your application. Modular,.. MIT
- SRU (23 · 1.9K) – Training RNNs as Fast as CNNs (https://arxiv.org/abs/1709.02755). MIT
- Torchmeta (21 · 1.2K) – A collection of extensions and data-loaders for few-shot learning.. MIT
- torch-scatter (21 · 610) – PyTorch Extension Library of Optimized Scatter Operations. MIT
- PyTorch Sparse (21 · 360) – PyTorch Extension Library of Optimized Autograd Sparse.. MIT
- reformer-pytorch (20 · 1.4K) – Reformer, the efficient Transformer, in Pytorch. MIT
- EfficientNets (20 · 1.3K) – Pretrained EfficientNet, EfficientNet-Lite, MixNet,.. Apache-2
- Higher (20 · 1.1K) – higher is a pytorch library allowing users to obtain higher.. Apache-2
- TabNet (20 · 860) – PyTorch implementation of TabNet paper :.. MIT
- Pytorch Toolbelt (19 · 940) – PyTorch extensions for fast R&D prototyping and Kaggle.. MIT
- Performer Pytorch (17 · 540) – An implementation of Performer, a linear attention-.. MIT
- Tensor Sensor (17 · 530) – The goal of this library is to generate more helpful.. MIT
- tinygrad (15 · 4.1K) – You like pytorch? You like micrograd? You love tinygrad! MIT
- Lambda Networks (15 · 1.4K) – Implementation of LambdaNetworks, a new approach to.. MIT
- Torch-Struct (15 · 910) – Fast, general, and tested differentiable structured prediction.. MIT
- torchsde (15 · 680) – Differentiable SDE solvers with GPU support and efficient.. Apache-2
- Pywick (15 · 320) – High-level batteries-included neural network training library for.. MIT
- Tez (14 · 580) – Tez is a super-simple and lightweight Trainer for PyTorch. It.. Apache-2
- micrograd (12 · 1.6K) – A tiny scalar-valued autograd engine and a neural net library.. MIT
Libraries for connecting to, operating, and querying databases.
- best-of-python – DB Clients (1.5K) – Collection of database clients for python.
- scipy (40 · 8K) – Ecosystem of open-source software for mathematics, science, and engineering. BSD-3
- SymPy (36 · 7.9K) – A computer algebra system written in pure Python. BSD-3
- Autograd (30 · 5.2K) – Efficiently computes derivatives of numpy code. MIT
- hdbscan (29 · 1.8K) – A high performance implementation of HDBSCAN clustering. BSD-3
- PyOD (28 · 4.2K) – (JMLR’19) A Python Toolbox for Scalable Outlier Detection (Anomaly.. BSD-2
- Keras-Preprocessing (28 · 920) – Utilities for working with image data, text data, and.. MIT
- Cython BLIS (28 · 160) – Fast matrix-multiplication as a self-contained Python library no.. BSD-3
- Streamlit (27 · 14K) – Streamlit: The fastest way to build data apps in Python. Apache-2
- carla (26 · 5.7K) – Open-source simulator for autonomous driving research. MIT
- Datasette (26 · 4.8K) – An open source multi-tool for exploring and publishing data. Apache-2
- DeepChem (26 · 2.8K) – Democratizing Deep-Learning for Drug Discovery, Quantum Chemistry,.. MIT
- agate (26 · 1K) – A Python data analysis library that is optimized for humans instead of machines. MIT
- pyclustering (26 · 800) – pyclustering is a Python, C++ data mining library. BSD-3
- Trax (25 · 5.9K) – Trax: Deep Learning with Clear Code and Speed. Apache-2
- causalml (25 · 1.8K) – Uplift modeling and causal inference with machine learning.. Apache-2
- Pythran (25 · 1.5K) – Ahead of Time compiler for numeric kernels. BSD-3
- TabPy (25 · 1K) – Execute Python code on the fly and display results in Tableau visualizations. MIT
- kmodes (25 · 820) – Python implementations of the k-modes and k-prototypes clustering.. MIT
- metric-learn (24 · 1.1K) – Metric learning algorithms in Python. MIT
- PennyLane (24 · 800) – PennyLane is a cross-platform Python library for differentiable.. Apache-2
- pyopencl (24 · 790) – OpenCL integration for Python, plus shiny features. MIT
- PySwarms (24 · 740) – A research toolkit for particle swarm optimization in Python. MIT
- pyjanitor (24 · 640) – Clean APIs for data cleaning. Python implementation of R package Janitor. MIT
- findspark (24 · 390) – Find pyspark to make it importable. BSD-3
- datalad (24 · 230) – Keep code, data, containers under control with git and git-annex. MIT
- Gradio (23 · 2.1K) – Wrap UIs around any model, share with anyone. Apache-2
- modAL (23 · 1.1K) – A modular active learning framework for Python. MIT
- PaddleHub (22 · 4.7K) – Awesome pre-trained models toolkit based on.. Apache-2
- pycm (22 · 1.1K) – Multi-class confusion matrix library in Python. MIT
- Prince (22 · 590) – Python factor analysis library (PCA, CA, MCA, MFA, FAMD). MIT
- SUOD (22 · 240) – (MLSys’ 21) An Acceleration System for Large-scale Unsupervised.. BSD-2
- Mars (21 · 2.1K) – Mars is a tensor-based unified framework for large-scale data.. Apache-2
- tensorly (21 · 970) – TensorLy: Tensor Learning in Python. BSD-2
- StreamAlert (20 · 2.5K) – StreamAlert is a serverless, realtime data analysis framework.. Apache-2
- AstroML (20 · 730) – Machine learning, statistics, and data mining for astronomy and.. BSD-2
- alibi-detect (20 · 600) – Algorithms for outlier and adversarial instance detection,.. Apache-2
- baikal (20 · 570) – A graph-based functional API for building complex scikit-learn pipelines. BSD-3
- BioPandas (20 · 330) – Working with molecular structures in pandas DataFrames. BSD-3
- scikit-rebate (20 · 310) – A scikit-learn-compatible Python implementation of ReBATE, a.. MIT
- rrcf (20 · 290) – Implementation of the Robust Random Cut Forest algorithm for anomaly.. MIT
- Feature Engine (19 · 470) – Feature engineering package with sklearn like functionality. BSD-3
- apricot (18 · 310) – apricot implements submodular optimization for the purpose of selecting.. MIT
- River (17 · 1.4K) – Online machine learning in Python. BSD-3
- traingenerator (10 · 940) – A web app to generate template code for machine learning. MIT
Related Resources
- Papers With Code: Discover ML papers, code, and evaluation tables.
- Sotabench: Discover & compare open-source ML models.
- Google Dataset Search: Dataset search engine by Google.
- Dataset List: List of the biggest ML datasets from across the web.
- Awesome Public Datasets: A topic-centric list of open datasets.
- Best-of lists: Discover other best-of lists with awesome open-source projects on all kinds of topics.
- best-of-python-dev: A ranked list of awesome python developer tools and libraries.
- best-of-web-python: A ranked list of awesome python libraries for web development.
3. Machine Learning – TensorFlow – Part III
4. Artificial Intelligence (AI) – Part I
Reproduced from GitHub https://github.com/
A curated list of Artificial Intelligence (AI) courses, books, video lectures and papers.
Contents
- Courses
- Books
- Programming
- Philosophy
- Free Content
- Code
- Videos
- Learning
- Organizations
- Journals
- Competitions
- Newsletters
- Misc
Courses
- MIT: Intro to Deep Learning – A seven-day bootcamp designed at MIT to introduce deep learning methods and applications
- Deep Blueberry: Deep Learning book – A free five-weekend plan for self-learners to learn the basics of deep-learning architectures like CNNs, LSTMs, RNNs, VAEs, GANs, DQN, A3C and more
- Spinning Up in Deep Reinforcement Learning – A free deep reinforcement learning course by OpenAI
- MIT Artificial Intelligence Videos – MIT AI Course
- Grokking Deep Learning in Motion – Beginner’s course to learn deep learning and neural networks without frameworks.
- Intro to Artificial Intelligence – Learn the Fundamentals of AI. Course run by Peter Norvig
- EdX Artificial Intelligence – The course will introduce the basic ideas and techniques underlying the design of intelligent computer systems
- Artificial Intelligence For Robotics – This class will teach you basic methods in Artificial Intelligence, including: probabilistic inference, planning and search, localization, tracking and control, all with a focus on robotics
- Machine Learning – Basic machine learning algorithms for supervised and unsupervised learning
- Neural Networks For Machine Learning – Algorithmic and practical tricks for artificial neural networks.
- Deep Learning – An Introductory course to the world of Deep Learning.
- Stanford Statistical Learning – Introductory course on machine learning focusing on: linear and polynomial regression, logistic regression and linear discriminant analysis; cross-validation and the bootstrap, model selection and regularization methods (ridge and lasso); nonlinear models, splines and generalized additive models; tree-based methods, random forests and boosting; support-vector machines.
- Knowledge Based Artificial Intelligence – Georgia Tech’s course on Artificial Intelligence focussing on Symbolic AI.
- Deep RL Bootcamp Lectures – Deep Reinforcement Bootcamp Lectures – August 2017
- Machine Learning Crash Course By Google – Machine Learning Crash Course features a series of lessons with video lectures, real-world case studies, and hands-on practice exercises.
- Python Class By Google – This is a free class for people with a little bit of programming experience who want to learn Python. The class includes written materials, lecture videos, and lots of code exercises to practice Python coding.
- Deep Learning Crash Course – In this liveVideo course, machine learning expert Oliver Zeigermann teaches you the basics of deep learning.
- Artificial Intelligence: A Modern Approach – Stuart Russell & Peter Norvig
- Also consider browsing the list of recommended reading, divided by each chapter in “Artificial Intelligence: A Modern Approach”.
- Paradigms Of Artificial Intelligence Programming: Case Studies in Common Lisp – Paradigms of AI Programming is the first text to teach advanced Common Lisp techniques in the context of building major AI systems
- Reinforcement Learning: An Introduction – This introductory textbook on reinforcement learning is targeted toward engineers and scientists in artificial intelligence, operations research, neural networks, and control systems, and we hope it will also be of interest to psychologists and neuroscientists.
- The Cambridge Handbook Of Artificial Intelligence – Written for non-specialists, it covers the discipline’s foundations, major theories, and principal research areas, plus related topics such as artificial life
- The Emotion Machine: Commonsense Thinking, Artificial Intelligence, and the Future of the Human Mind – In this mind-expanding book, scientific pioneer Marvin Minsky continues his groundbreaking research, offering a fascinating new model for how our minds work
- Artificial Intelligence: A New Synthesis – Beginning with elementary reactive agents, Nilsson gradually increases their cognitive horsepower to illustrate the most important and lasting ideas in AI
- On Intelligence – Hawkins develops a powerful theory of how the human brain works, explaining why computers are not intelligent and how, based on this new theory, we can finally build intelligent machines. An audio version is also available from audible.com
- How To Create A Mind – Kurzweil discusses how the brain works, how the mind emerges, brain-computer interfaces, and the implications of vastly increasing the powers of our intelligence to address the world’s problems
- Deep Learning – Goodfellow, Bengio and Courville’s introduction to a broad range of topics in deep learning, covering mathematical and conceptual background, deep learning techniques used in industry, and research perspectives.
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction – Hastie, Tibshirani, and Friedman cover a broad range of topics, from supervised learning (prediction) to unsupervised learning, including neural networks, support vector machines, classification trees and boosting: the first comprehensive treatment of this topic in any book.
- Deep Learning and the Game of Go – Deep Learning and the Game of Go teaches you how to apply the power of deep learning to complex human-flavored reasoning tasks by building a Go-playing AI. After exposing you to the foundations of machine and deep learning, you’ll use Python to build a bot and then teach it the rules of the game.
- Deep Learning for Search – Deep Learning for Search teaches you how to leverage neural networks, NLP, and deep learning techniques to improve search performance.
- Deep Learning with PyTorch – PyTorch puts these superpowers in your hands, providing a comfortable Python experience that gets you started quickly and then grows with you as you—and your deep learning skills—become more sophisticated. Deep Learning with PyTorch will make that journey engaging and fun.
- Deep Reinforcement Learning in Action – Deep Reinforcement Learning in Action teaches you the fundamental concepts and terminology of deep reinforcement learning, along with the practical skills and techniques you’ll need to implement it into your own projects.
- Grokking Deep Reinforcement Learning – Grokking Deep Reinforcement Learning introduces this powerful machine learning approach, using examples, illustrations, exercises, and crystal-clear teaching.
- Fusion in Action – Fusion in Action teaches you to build a full-featured data analytics pipeline, including document and data search and distributed data clustering.
- Real-World Natural Language Processing – Early access book on how to create practical NLP applications using Python.
- Grokking Machine Learning – Early access book that introduces the most valuable machine learning techniques.
- Succeeding with AI – An introduction to managing successful AI projects and applying AI to real-life situations.
- Elements of AI (Part 1) – Reaktor/University of Helsinki – An Introduction to AI is a free online course for everyone interested in learning what AI is, what is possible (and not possible) with AI, and how it affects our lives – with no complicated math or programming required.
- Essential Natural Language Processing – A hands-on guide to NLP with practical techniques, numerous Python-based examples and real-world case studies.
- Kaggle’s micro courses – A series of micro courses offering practical, hands-on knowledge ranging from Python to Deep Learning.
- Transfer Learning for Natural Language Processing – A book that gets you up to speed with the relevant ML concepts and then dives into transfer learning for NLP.
- Stanford Deep Learning Series – https://www.youtube.com/playlist?list=PLoROMvodv4rOABXSygHTsbvUz4G_YQhOb
- Amazon Machine Learning Developer Guide – A book for ML developers which introduces ML concepts and strategies with many practical examples.
- Machine Learning for Humans – A series of simple, plain-English explanations accompanied by math, code, and real-world examples.
Books
- Machine Learning for Mortals (Mere and Otherwise) – Early access book that provides basics of machine learning and using R programming language.
- How Machine Learning Works – Mostafa Samir. Early access book that introduces machine learning from both practical and theoretical aspects in a non-threatening way.
- MachineLearningWithTensorFlow2ed – a book on general purpose machine learning techniques: regression, classification, unsupervised clustering, reinforcement learning, autoencoders, convolutional neural networks, RNNs, and LSTMs, using TensorFlow 1.14.1.
- Serverless Machine Learning – a book for machine learning engineers on how to train and deploy machine learning systems on public clouds like AWS, Azure, and GCP, using a code-oriented approach.
- The Hundred-Page Machine Learning Book – all you need to know about Machine Learning in a hundred pages, supervised and unsupervised learning, SVM, neural networks, ensemble methods, gradient descent, cluster analysis and dimensionality reduction, autoencoders and transfer learning, feature engineering and hyperparameter tuning.
Programming
- Prolog Programming For Artificial Intelligence – This best-selling guide to Prolog and Artificial Intelligence concentrates on the art of using the basic mechanisms of Prolog to solve interesting AI problems.
- AI Algorithms, Data Structures and Idioms in Prolog, Lisp and Java – PDF here
- Python Tools for Machine Learning
- Python for Artificial Intelligence
Philosophy
- Super Intelligence – Superintelligence asks the question: what happens when machines surpass humans in general intelligence? A really great book.
- Our Final Invention: Artificial Intelligence And The End Of The Human Era – Our Final Invention explores the perils of the heedless pursuit of advanced AI. Until now, human intelligence has had no rival. Can we coexist with beings whose intelligence dwarfs our own? And will they allow us to?
- How to Create a Mind: The Secret of Human Thought Revealed – Ray Kurzweil, director of engineering at Google, explores the process of reverse-engineering the brain to understand precisely how it works, then applies that knowledge to create vastly intelligent machines.
- Minds, Brains, And Programs – The 1980 paper by philosopher John Searle that contains the famous ‘Chinese Room’ thought experiment. Probably the most famous attack on the notion of a Strong AI possessing a ‘mind’ or a ‘consciousness’, and interesting reading for those interested in the intersection of AI and philosophy of mind.
- Gödel, Escher, Bach: An Eternal Golden Braid – Written by Douglas Hofstadter and taglined “a metaphorical fugue on minds and machines in the spirit of Lewis Carroll”, this wonderful journey into the fundamental concepts of mathematics, symmetry and intelligence won the Pulitzer Prize for General Non-Fiction in 1980. A major theme throughout is the emergence of meaning from seemingly ‘meaningless’ elements, like 1’s and 0’s, arranged in special patterns.
- Life 3.0: Being Human in the Age of Artificial Intelligence – Max Tegmark, professor of Physics at MIT, discusses how Artificial Intelligence may affect crime, war, justice, jobs, society and our very sense of being human both in the near and far future.
Free Content
- Foundations Of Computational Agents – This book is published by Cambridge University Press, 2010
- The Quest For Artificial Intelligence – This book traces the history of the subject, from the early dreams of eighteenth-century (and earlier) pioneers to the more successful work of today’s AI engineers.
- Stanford CS229 – Machine Learning – This course provides a broad introduction to machine learning and statistical pattern recognition.
- Computers and Thought: A practical Introduction to Artificial Intelligence – The book covers computer simulation of human activities, such as problem solving and natural language understanding; computer vision; AI tools and techniques; an introduction to AI programming; symbolic and neural network models of cognition; the nature of mind and intelligence; and the social implications of AI and cognitive science.
- Society of Mind – Marvin Minsky’s seminal work on how our mind works. Many Symbolic AI concepts have been derived from it.
- Artificial Intelligence and Molecular Biology – The current volume is an effort to bridge that range of exploration, from nucleotide to abstract concept, in contemporary AI/MB research.
- Brief Introduction To Educational Implications Of Artificial Intelligence – This book is designed to help preservice and inservice teachers learn about some of the educational implications of current uses of Artificial Intelligence as an aid to solving problems and accomplishing tasks.
- Encyclopedia: Computational intelligence – Scholarpedia is a peer-reviewed open-access encyclopedia written and maintained by scholarly experts from around the world.
- Ethical Artificial Intelligence – a book by Bill Hibbard that combines several peer reviewed papers and new material to analyze the issues of ethical artificial intelligence.
- Golden Artificial Intelligence – a cluster of pages on artificial intelligence and machine learning.
- R2D3 – A website with explanations on topics from Machine Learning to Statistics, all supported by beautiful animated infographics and real-life examples. Available in various languages.
Code
- ExplainX – ExplainX is a fast, light-weight, and scalable explainable AI framework for data scientists to explain any black-box model to business stakeholders.
- AIMACode – Source code for “Artificial Intelligence: A Modern Approach” in Common Lisp, Java, Python. More to come.
- FANN – Fast Artificial Neural Network Library, native for C
- FARGonautica – Source code of Douglas Hofstadter’s Fluid Concepts and Creative Analogies Ph.D. projects.
Videos
- A tutorial on Deep Learning
- Basics of Computational Reinforcement Learning
- Deep Reinforcement Learning
- Intelligent agents and paradigms for AI
- The Unreasonable Effectiveness Of Deep Learning – The Director of Facebook’s AI Research, Dr. Yann LeCun gives a talk on deep convolutional neural networks and their applications to machine learning and computer vision
- AWS Machine Learning in Motion – This interactive liveVideo course gives you a crash course in using AWS for machine learning, teaching you how to build a fully-working predictive algorithm.
- Deep Learning with R in Motion – Deep Learning with R in Motion teaches you to apply deep learning to text and images using the powerful Keras library and its R language interface.
- Grokking Deep Learning in Motion – Grokking Deep Learning in Motion will not just teach you how to use a single library or framework, you’ll actually discover how to build these algorithms completely from scratch!
- Reinforcement Learning in Motion – This liveVideo breaks down key concepts like how RL systems learn, how to sense and process environmental data, and how to build and train AI agents.
Learning
- Deep Learning: Methods And Applications – Free book from Microsoft Research
- Neural Networks And Deep Learning – Neural networks and deep learning currently provide the best solutions to many problems in image recognition, speech recognition, and natural language processing. This book will teach you the core concepts behind neural networks and deep learning
- Machine Learning: A Probabilistic Perspective – This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach
- Deep Learning – Yoshua Bengio, Ian Goodfellow and Aaron Courville put together this currently free (and draft version) book on deep learning. The book is kept up-to-date and covers a wide range of topics in depth (up to and including sequence-to-sequence learning).
- Getting Started with Deep Learning and Python
- Machine Learning Mastery
- Deep Learning.net – Aggregation site for DL resources
- Awesome Machine Learning – Like this Github, but ML-focused
- FastML
- Awesome Deep Learning Resources – Rough list of learning resources for Deep Learning
- Professional and In-Depth Machine Learning Video Courses – A collection of free professional and in depth Machine Learning and Data Science video tutorials and courses
- Professional and In-Depth Artificial Intelligence Video Courses – A collection of free professional and in depth Artificial Intelligence video tutorials and courses
- Professional and In-Depth Deep Learning Video Courses – A collection of free professional and in depth Deep Learning video tutorials and courses
- Introduction to Machine Learning – Introductory level machine learning crash course
- Awesome Graph Classification – Learning from graph structured data
- Awesome Community Detection – Clustering graph structured data
- Awesome Decision Tree Papers – Decision tree papers from machine learning conferences
- Awesome Gradient Boosting Papers – Gradient boosting papers from machine learning conferences
- Awesome Fraud Detection Papers – Fraud detection papers from machine learning conferences
- Awesome Neural Art – Creating art and manipulating images using deep neural networks.
Organizations
- IEEE Computational Intelligence Society
- Machine Intelligence Research Institute
- OpenAI
- Association For The Advancement of Artificial Intelligence
- Google DeepMind Research
- Nvidia Deep Learning
- AI Google
- Facebook AI
- IBM Research
- Microsoft Research
Journals
- AI & Society
- AI Communications
- AI Magazine
- Annals of Mathematics and Artificial Intelligence
- Applicable Algebra in Engineering, Communication and Computing
- Applied Artificial Intelligence
- Applied Intelligence
- Artificial Intelligence for Engineering Design, Analysis and Manufacturing
- Artificial Intelligence Review
- Artificial Intelligence
- Automated Software Engineering
- Autonomous Agents and Multi-Agent Systems
- Computational and Mathematical Organization Theory
- Computational Intelligence
- Electronic Transactions on Artificial Intelligence
- Evolutionary Intelligence
- EXPERT—IEEE Intelligent Systems
- IEEE Transactions on Automation Science and Engineering
- Intelligent Industrial Systems
- International Journal of Intelligent Systems
- International Journal on Artificial Intelligence Tools
- Journal of Artificial Intelligence Research
- Journal of Automated Reasoning
- Journal of Experimental and Theoretical Artificial Intelligence
- Journal of Intelligent Information Systems
- Journal on Data Semantics
- Knowledge Engineering Review
- Minds and Machines
- Progress in Artificial Intelligence
Competitions
Newsletters
- AI Digest. A weekly newsletter to keep up to date with AI, machine learning, and data science. Archive.
Misc
- Open Cognition Project – We’re undertaking a serious effort to build a thinking machine
- AITopics – Large aggregation of AI resources
- AIResources – Directory of open source software and open access data for the AI research community
- Artificial Intelligence Subreddit
- AI Experiments with Google
5. Artificial Intelligence (AI) – Part II
A curated list of awesome awesomeness about artificial intelligence(AI).
Table of Contents
- Artificial Intelligence(AI)
- Machine Learning(ML)
- Deep Learning(DL)
- Computer Vision(CV)
- Natural Language Processing(NLP)
- Speech Recognition
- Other Research Topics
- Programming Languages
- Framework
- Datasets
- AI Career
Artificial Intelligence(AI)
Machine Learning(ML)
- ML
- ML-Source-Code
- ML-CN
- Adversarial-ML
- Quantum-ML
- 3D-Machine-Learning
- Machine Learning Interpretability
- Machine Learning System
- Mobile Machine Learning
- Machine Learning Problems
- Gradient Boosting
- Decision Tree
Deep Learning(DL)
- DL
- DL-Papers
- DL-Resources
- DeepLearning-500-questions
- Deep-Learning-in-Production
- DNN Compression and Acceleration
- Architecture Search
- Deep Learning for Graphs
- Real-time Network
- Deep Learning Interpretability
- Graph-based Deep Learning
Computer Vision(CV)
- CV
- CV2
- CV-People
- DeepFakes
- Event-based Vision Resources
- Embodied Vision
- Research Topics
- Action Recognition
- Colorization
- Image Classification
- Image Registration
- Object Detection
- Face
- Gaze Estimation
- HDR Image Synthesis
- Image Segmentation
- Object Tracking
- Pose Estimation
- Object Pose Estimation
- Human Pose Estimation
- Hand Pose Estimation
- Human Motion
- Human-Object Interaction(HOI)
- Long-tailed
- Scene Text
- Super Resolution
- 3D
- OCR
- Re-ID
- Pedestrian Attribute Recognition
- Person Search
- Image Captioning
- Question Answering
- Crowd Counting
- Lane Detection
- Low Light Enhancement
- Image Retrieval
- Medical Imaging
- Image Inpainting
- Image/Video Dehazing
- Image Denoising
- Image Deraining
- Image/Video Deblurring
- Image to Image(img2img)
- Underwater Image Enhancement
- Video Analysis
- Video Object Segmentation(VOS)
- Edge Detection
- Local and Global Descriptor
- Salience
- Fashion + AI
- Event-based Vision Resources
- Video Stabilization
- Visual Transformer
Natural Language Processing(NLP)
- NLP
- NLP-progress
- CoreNLP
- NLPIR
- nlp_course
- nlp-datasets
- nlp-reading-group
- NLP Paper
- Awesome-Chinese-NLP: Resources for Chinese natural language processing
- awesome-dl4nlp
- awesome-sentence-embedding
- Research Topics
Speech Recognition
Other Research Topics
- Bayesian
- Capsule Networks
- Contrastive-Learning
- Data Augmentation
- Embedded AI
- GAN(Generative Adversarial Networks)
- GAN-Case-Study
- really-awesome-gan
- AdversarialNetsPapers
- the-gan-zoo
- Keras-GAN
- gans-awesome-applications: Curated list of awesome GAN applications and demos
- image-to-image translation
- GAN Inversion
- Graph Neural Networks(GNN)
- Semi-Supervised Learning
- SLAM
- Reinforcement Learning
- Transfer Learning
- Trajectory Prediction
- Zero-Shot Learning
- Few-Shot Learning
- Federated Learning
- poga/Federated Learning
- Federated Computing/Learning
- ChanChiChoi/Federated Learning
- Meta-Learning
- Open Set Recognition
- Self-Supervised
- Graph Classification
- Incremental Learning
- AutoML
- Model Compression
- Binary Neural Networks
- Multimodal Research
- Multimodal Machine Learning
- Neural Rendering
- NeRF
- Domain Adaptation
- Robotics
- Recommender Systems
- Autonomous Vehicles
- Anomaly Detection
- Yochengliu/Point Cloud Analysis
- NUAAXQ/Point Cloud Analysis
- 3D Point Clouds
- Affective_Computing
- Knowledge Distillation
- Click-Through Rate Prediction
- Label Noise
- VAE
- Imbalanced Learning