Python随身听 (Python Walkman) - 2020-10-30 - Tech Picks
🤩Python随身听 Tech Picks: /MrS0m30n3/youtube-dl-gui
👉A cross-platform front-end GUI of the popular youtube-dl written in wxPython.
😎TOPICS:
youtube-dl,python,wxpython,gui,cross-platform,youtube-dlg,downloader,video,video-downloader,linux,windows,youtube-dl-gui,pypi,youtube
⭐️STARS:5585, gained today↑:221
👉README:
youtube-dlG
A cross-platform front-end GUI of the popular youtube-dl media downloader, written in wxPython.
Supported sites
Screenshots
Requirements
Downloads
URL: https://github.com/MrS0m30n3/youtube-dl-gui
🤩Python随身听 Tech Picks: /lrvick/youtube-dl
👉RIAA: Please go die in a fire.
😎TOPICS: (none)
⭐️STARS:481, gained today↑:184
👉README:
youtube-dl - download videos from youtube.com or other video platforms
INSTALLATION
To install it right away for all UNIX users (Linux, macOS, etc.), type:
If you do not have curl, you can alternatively use a recent wget:
Windows users can download an .exe file and place it in any location on their [PATH](...
URL: https://github.com/lrvick/youtube-dl
🤩Python随身听 Tech Picks: /donnemartin/system-design-primer
👉Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
😎TOPICS:
programming,development,design,design-system,system,design-patterns,web,web-application,webapp,python,interview,interview-questions,interview-practice
⭐️STARS:110438, gained today↑:159
👉README:
*English ∙ 日本語 ∙ 简体中文 ∙ 繁體中文 | العَرَبِيَّة ∙ বাংলা ∙ Português do Brasil ∙ Deutsch ∙ ελληνικά ∙ עברית ∙ Italiano ∙ 한국어 ∙ فارسی ∙ Polski ∙ русский язык ∙ Español ∙ [...
URL: https://github.com/donnemartin/system-design-primer
🤩Python随身听 Tech Picks: /l1ving/youtube-dl
👉A copyright-respecting fork of youtube-dl
😎TOPICS: (none)
⭐️STARS:1262, gained today↑:196
👉README:
youtube-dl - download videos from youtube.com or other video platforms
CHANGES
You can view the changes made to ytdl-org/youtube-dl here
You can view the archived tags here: youtube-dl/releases
You can view the archived unmerged pull requests here: youtube-dl/tree/archive/recovered-github-prs
INSTALLATION
To install it right away for all UNIX users (Linux, macOS, etc.), typ...
URL: https://github.com/l1ving/youtube-dl
🤩Python随身听 Tech Picks: /youtube-dl2/youtube-dl
👉Repository with the code of youtube-dl
😎TOPICS:
youtube-dl,dmca-takedown
⭐️STARS:460, gained today↑:87
👉README:
youtube-dl - download videos from youtube.com or other video platforms
INSTALLATION
To install it right away for all UNIX users (Linux, macOS, etc.), type:
If you do not have curl, you can alternatively use a recent wget:
Windows users can download an .exe file and place it in any location on their [PATH](...
URL: https://github.com/youtube-dl2/youtube-dl
🤩Python随身听 Tech Picks: /alexa/bort
👉Repository for the paper "Optimal Subarchitecture Extraction for BERT"
😎TOPICS: (none)
⭐️STARS:243, gained today↑:95
👉README:
Bort
Companion code for the paper "Optimal Subarchitecture Extraction for BERT."
Bort is an optimal subset of architectural parameters for the BERT architecture, extracted by applying a fully polynomial-time approximation scheme (FPTAS) for neural architecture search. Bort has an effective size (that is, not counting the embedding layer) of 5.5% of the original BERT-large architecture, and 16% of its net size. It can also be pretrained in 288 GPU hours, which is 1.2% of the time required to pretrain the highest-performing BERT parametric architectural variant, RoBERTa-large.
It is also 7.9x faster on a CPU, and performs better than other compressed variants of the architecture and some of the non-compressed variants: it obtains an average performance improvement of between 0.3% and 31%, absolute, with respect to BERT-large on multiple public natural language understanding (NLU) benchmarks.
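The 1.2% figure can be inverted to estimate the RoBERTa-large pretraining budget it implies. This is a back-of-the-envelope sketch. It assumes the 1.2% was rounded to one decimal place, which would put the true ratio somewhere in [1.15%, 1.25%). The constant and function names are illustrative and do not come from the Bort repository.

```python
from fractions import Fraction

# Figures quoted in the Bort README excerpt.
BORT_GPU_HOURS = 288
QUOTED_RATIO = Fraction(12, 1000)  # "1.2% of the time" required for RoBERTa-large

# Assumption: 1.2% is rounded to one decimal, so the true ratio lies in [1.15%, 1.25%).
RATIO_LOW = Fraction(115, 10000)
RATIO_HIGH = Fraction(125, 10000)


def implied_roberta_hours(ratio):
    """GPU hours for RoBERTa-large implied by Bort's 288 h being `ratio` of it."""
    return BORT_GPU_HOURS / ratio


point = implied_roberta_hours(QUOTED_RATIO)
lowest = implied_roberta_hours(RATIO_HIGH)   # a larger ratio implies fewer hours
highest = implied_roberta_hours(RATIO_LOW)

print(f"point estimate: {float(point):,.0f} GPU hours")
print(f"range: {float(lowest):,.0f} to {float(highest):,.0f} GPU hours")
```

Under that rounding assumption, the quoted percentage points to a RoBERTa-large budget on the order of 24,000 GPU hours. The exact figure should be checked against the paper itself.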
Here are the corresponding GLUE scores on the test set:
|Model|Score|CoLA|SST-2|MRPC|STS-B|QQP|MNLI...
URL: https://github.com/alexa/bort
🤩Python随身听 Tech Picks: /EssayKillerBrain/EssayKiller_V2
👉A first-generation creative-writing AI based on the open-source GPT-2.0 | Extensible and evolvable
😎TOPICS: (none)
⭐️STARS:1119, gained today↑:59
👉README:
EssayKiller
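A general-purpose AI framework for writing argumentative essays, intended only for discussion and popular-science purposes.
Bilibili video: https://www.bilibili.com/video/BV1pr4y1w7uM/
Project overview
EssayKiller is a generative text-writing AI framework built on recent models from the OCR and NLP fields. The current first-version fine-tuned model targets gaokao (Chinese college entrance exam) essays, mainly argumentative ones. It can generate essays that make sense to human readers, and in testing most of them reached the passing level of an ordinary high-school student's essay.
Acknowledgements
Thanks to the open-source author @imcaspar for providing the Chinese GPT-2 pretraining framework and data support.
Thanks to @白小鱼博士, @YJango博士, @画渣花小烙, @万物拣史, @柴知道, @风羽酱-sdk, @WhatOnEarth, @这知识好冷, [@科技狐](https://space.bilibili.com/404334...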
URL: https://github.com/EssayKillerBrain/EssayKiller_V2
🤩Python随身听 Tech Picks: /python/cpython
👉The Python programming language
😎TOPICS: (none)
⭐️STARS:34235, gained today↑:20
👉README:
This is Python version 3.10.0 alpha 1
Copyright (c) 2001-2020 Python Software Foundation. All rights reserved.
See the end of...
URL: https://github.com/python/cpython
🤩Python随身听 Tech Picks: /corkami/mitra
👉A generator of binary polyglots
😎TOPICS: (none)
⭐️STARS:350, gained today↑:24
👉README:
Mitra
A tool to generate binary polyglots (files that are valid with several file formats).
Loosely named after Μιθραδάτης, a famous polyglot.
Pronounced mɪtrə.
What's new
How to use
mitra.py file1.png file2.dcm
gives you a working PNG/DICOM polyglot. Check Corkami mini or tiny PoCs for input files, and the formats repository for some extra technical info.
Features
It tries different layouts:
Stacks (appended data), Cavities (blank space), Parasites (comments), and Zippers (mutual comments).
It gives the offsets where the payloads 'switch sides', for multi-ciphertexts.
Example: Z(80-162-286)-DICOM^TIFF.be3b767b.dcm.tif is a DICOM/TIFF zipper where the payloads switch sides at offsets 0x80, 0x162 and 0x286.
The -s option extracts the 2 payloads separately, mixed with...
URL: https://github.com/corkami/mitra
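The Mitra example filename Z(80-162-286)-DICOM^TIFF.be3b767b.dcm.tif appears to encode three things in one name: the layout, the offsets where the payloads switch sides, and the formats involved. The helper below is a hypothetical sketch for decoding names of that shape. It is not part of Mitra. Its pattern is inferred from this single example and assumes three things: a letter prefix names the layout, the parenthesised dash-separated numbers are hexadecimal offsets (as the 0x80, 0x162, 0x286 reading suggests), and format names are joined with `^`.

```python
import re

# Hypothetical pattern inferred from "Z(80-162-286)-DICOM^TIFF.be3b767b.dcm.tif":
# <layout letters>(<hex offsets joined by '-'>)-<formats joined by '^'>.<rest>
NAME_PATTERN = re.compile(
    r"^(?P<layout>[A-Za-z]+)"
    r"\((?P<offsets>[0-9A-Fa-f]+(?:-[0-9A-Fa-f]+)*)\)"
    r"-(?P<formats>[^.]+)\."
)


def parse_polyglot_name(filename):
    """Split a Mitra-style output name into layout, switch offsets, and formats.

    Returns None when the name does not follow the assumed pattern.
    """
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    return {
        "layout": match.group("layout"),
        # Offsets are written without a 0x prefix but are read as hexadecimal.
        "offsets": [int(part, 16) for part in match.group("offsets").split("-")],
        "formats": match.group("formats").split("^"),
    }


info = parse_polyglot_name("Z(80-162-286)-DICOM^TIFF.be3b767b.dcm.tif")
print(info["layout"], [hex(o) for o in info["offsets"]], info["formats"])
# Z ['0x80', '0x162', '0x286'] ['DICOM', 'TIFF']
```

Names that do not match the assumed shape return None instead of raising an error, so the helper can be run over a mixed directory listing.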
🤩Python随身听 Tech Picks: /chubin/cheat.sh
👉the only cheat sheet you need
😎TOPICS:
cheatsheet,curl,terminal,command-line,cli,examples,documentation,help,tldr
⭐️STARS:19591, gained today↑:30
👉README:
Unified access to the best community-driven cheat sheet repositories of the world.
Let's imagine for a moment that there is such a thing as an ideal cheat sheet.
What should it look like?
What features should it have?
Such a thing exists.
Features
cheat.sh
URL: https://github.com/chubin/cheat.sh
🤩Python随身听 Tech Picks: /sherlock-project/sherlock
👉🔎 Hunt down social media accounts by username across social networks
😎TOPICS:
osint,reconnaissance,linux,macos,cli,sherlock,python3,windows,redteam,tools,information-gathering
⭐️STARS:16512, gained today↑:110
👉README:
Hunt down social media accounts by username across social networks
URL: https://github.com/sherlock-project/sherlock
🤩Python随身听 Tech Picks: /lucidrains/performer-pytorch
👉An implementation of Performer, a linear attention-based transformer, in Pytorch
😎TOPICS:
artificial-intelligence,deep-learning,attention-mechanism,attention,transformers
⭐️STARS:142, gained today↑:18
👉README:
Performer - Pytorch
An implementation of Performer, a linear attention-based transformer variant with a Fast Attention Via positive Orthogonal Random features approach (FAVOR+).
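The core FAVOR+ idea can be sketched in a few lines. The softmax kernel exp(q·k/√d) is replaced by an inner product of positive random features, and the matrix products are reordered so that the cost grows linearly with sequence length. This is a pure-Python sketch with three stated simplifications: it uses i.i.d. Gaussian projections rather than orthogonal ones, it applies no numerical stabilisation, and it does no causal masking. It is not the performer-pytorch implementation, and all function names are hypothetical.

```python
import math
import random


def gaussian_projections(m, d, seed=0):
    """Draw m i.i.d. N(0, I_d) vectors (FAVOR+ would additionally orthogonalize them)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]


def positive_features(x, omegas):
    """phi(x)_i = exp(w_i . x - |x|^2 / 2) / sqrt(m), so E[phi(x) . phi(y)] = exp(x . y)."""
    half_sq_norm = sum(v * v for v in x) / 2.0
    scale = 1.0 / math.sqrt(len(omegas))
    return [scale * math.exp(sum(w * v for w, v in zip(omega, x)) - half_sq_norm)
            for omega in omegas]


def linear_attention(Q, K, V, omegas):
    """Approximate softmax(Q K^T / sqrt(d)) V with cost linear in the sequence length."""
    d = len(Q[0])
    s = d ** -0.25  # split the 1/sqrt(d) temperature between queries and keys
    phi_q = [positive_features([v * s for v in q], omegas) for q in Q]
    phi_k = [positive_features([v * s for v in k], omegas) for k in K]
    m, dv = len(omegas), len(V[0])
    # Key-side summaries, built once: phi(K)^T V (m x dv) and phi(K)^T 1 (length m).
    kv = [[sum(fk[r] * v[c] for fk, v in zip(phi_k, V)) for c in range(dv)]
          for r in range(m)]
    k_sum = [sum(fk[r] for fk in phi_k) for r in range(m)]
    out = []
    for fq in phi_q:
        denom = sum(a * b for a, b in zip(fq, k_sum))
        out.append([sum(fq[r] * kv[r][c] for r in range(m)) / denom
                    for c in range(dv)])
    return out


def softmax_attention(Q, K, V):
    """Exact, quadratic-cost softmax attention for comparison."""
    d = len(Q[0])
    out = []
    for q in Q:
        w = [math.exp(sum(a * b for a, b in zip(q, k)) / math.sqrt(d)) for k in K]
        total = sum(w)
        out.append([sum(wi * v[c] for wi, v in zip(w, V)) / total
                    for c in range(len(V[0]))])
    return out


# Tiny example: 3 tokens, head dimension 4, 2 value channels.
Q = [[0.1, 0.2, -0.1, 0.3], [0.0, -0.2, 0.2, 0.1], [0.3, 0.1, 0.0, -0.2]]
K = [[0.2, 0.0, 0.1, -0.1], [-0.1, 0.3, 0.2, 0.0], [0.1, -0.2, 0.3, 0.2]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

approx = linear_attention(Q, K, V, gaussian_projections(4000, 4, seed=0))
exact = softmax_attention(Q, K, V)
for a_row, e_row in zip(approx, exact):
    print([round(a, 3) for a in a_row], [round(e, 3) for e in e_row])
```

The key design choice is the order of operations. phi(K)^T V is summarised once over all keys, so the n x n attention matrix is never formed. Because every feature is positive, each output row is still a convex combination of the value rows, just as with exact softmax.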
Install
$ pip install performer-pytorch
Usage
Performer Language Model
import torch
from performer_pytorch import PerformerLM

model = PerformerLM(
    num_tokens = 20000,
    max_seq_len = 2048,             # max sequence length
    dim = 512,                      # dimension
    depth = 6,                      # layers
    heads = 8,                      # heads
    causal = False,                 # auto-regressive or not
    nb_features = 256,              # number of random features; if not set, defaults to (d * log(d)), where d is the dimension of each head
    generalized_attention = False,  # defaults to the softmax approximation; set to True for generalized attention
    # kernel_fn = nn....  (the README excerpt is truncated here)
)
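The comment on nb_features gives a default of d * log(d), where d is the per-head dimension. For the configuration above, d = dim / heads = 512 / 8 = 64. The quick check below rests on two assumptions that the excerpt does not state: the log is the natural logarithm, and the result is truncated to an integer. The function name is hypothetical.

```python
import math


def default_nb_features(dim, heads):
    """Default random-feature count d * log(d), with d the per-head dimension.

    Assumes a natural log and truncation to int; the README excerpt does not say.
    """
    d = dim // heads  # dimension of each head
    return int(d * math.log(d))


print(default_nb_features(512, 8))  # 266
```

If these assumptions hold, the explicit nb_features = 256 in the example is slightly below the default for this head size.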
URL: https://github.com/lucidrains/performer-pytorch
🤩Python随身听 Tech Picks: /soimort/you-get
👉⏬ Dumb downloader that scrapes the web
😎TOPICS: (none)
⭐️STARS:36083, gained today↑:175
👉README:
You-Get
NOTICE: Read this if you are looking for the conventional "Issues" tab.
You-Get is a tiny command-line utility to download media contents (videos, audios, images) from the Web, in case there is no other handy way to do it.
Here's how you use you-get to download a video from YouTube:
$ you-get 'https://www.youtube.com/watch?v=jNQXAC9IVRw'
site:                YouTube
title:               Me at the zoo
stream:
    - itag:          43
      container:     webm
      quality:       medium
      size:          0.5 MiB (564215 bytes)
    # download-with: you-get --itag=43 [URL]

Downloading Me at the zoo.webm ...
100% ( 0.5/ 0.5MB) ├██████████████████████████████████┤[1/1] 6 MB/s

Saving Me at the zoo.en.srt ... Done.
And here's why you might want to use it:
URL: https://github.com/soimort/you-get
🤩Python随身听 Tech Picks: /awslabs/aws-serverless-data-lake-framework
👉Enterprise-grade, production-hardened, serverless data lake on AWS
😎TOPICS:
serverless,framework,data-lake,analytics,aws,etl,data-engineering,lake-formation
⭐️STARS:80, gained today↑:13
👉README:
Serverless Data Lake Framework (SDLF)
An AWS Professional Service open source initiative | aws-proserve-opensource@amazon.com
The Serverless Data Lake Framework (SDLF) is a collection of reusable artifacts aimed at accelerating the delivery of enterprise data lakes on AWS, shortening the deployment time to production from several months to a few weeks. It can be used by AWS teams, partners and customers to implement the foundational structure of a data lake following best practices. It is used in production by more than thirty large organizations, including public references such as Embraer, Formula One, Hudl, and David Jones.
Public References
Motivation
A data lake gives your organization agility. It provides a repository where consumers can quickly find the data they need and use it in their business projects. However, building a data lake can be complex; there’s...
URL: https://github.com/awslabs/aws-serverless-data-lake-framework
🤩Python随身听 Tech Picks: /numpy/numpy
👉The fundamental package for scientific computing with Python.
😎TOPICS:
numpy,python
⭐️STARS:15244, gained today↑:11
👉README:
NumPy is the fundamental package needed for scientific computing with Python.
It provides:
URL: https://github.com/numpy/numpy
🤩Python随身听 Tech Picks: /google-research/multilingual-t5
👉(no description)
😎TOPICS: (none)
⭐️STARS:330, gained today↑:28
👉README:
mT5: Multilingual T5
Multilingual T5 (mT5) is a massively multilingual pretrained text-to-text
transformer model, trained following a similar recipe as T5.
This repo can be used to reproduce the experiments in the mT5 paper.
Languages covered
mT5 is pretrained on the mC4 corpus, covering 101 languages:
Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque,
Belarusian, Bengali, Bulgarian, Burmese, Catalan, Cebuano, Chichewa, Chinese,
Corsican, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino,
Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole,
...
URL: https://github.com/google-research/multilingual-t5
🤩Python随身听 Tech Picks: /microsoft/playwright-python
👉Python version of the Playwright testing and automation library.
😎TOPICS:
playwright
⭐️STARS:1245, gained today↑:36
👉README:
Docs | Website | Python API reference
Playwright is a Python library to automate Chromium, Firefox and WebKit browsers with a single API. Playwright delivers automation that is ever-green, capable, reliable and fast. See how Playwright is better.
|Browser|Linux|macOS|Windows|
|---|---|---|---|
|Chromium 86.0.4238.0|✅|✅|✅|
|WebKit 14.0|✅|✅|✅|
|Firefox 80.0b8|✅|✅|✅|
Headless execution is supported for all browsers on all platforms.
URL: https://github.com/microsoft/playwright-python
🤩Python随身听 Tech Picks: /man-group/dtale
👉Visualizer for pandas data structures
😎TOPICS:
python27,python3,react,flask,pandas,ipython,jupyter-notebook,react-virtualized,data-analysis,data-visualization,visualization,plotly-dash,data-science,xarray
⭐️STARS:1582, gained today↑:19
👉README:
What is it?
D-Tale is the combination of a Flask back-end and a React front-end to bring you an easy way to view & analyze Pandas data structures. It integrates seamlessly with ipython notebooks & python/ipython terminals. Currently this tool supports such Pandas objects as DataFrame, Series, MultiIndex, DatetimeIndex & RangeIndex.
Origins
D-Tale was the product of a SAS to Pyth...
URL: https://github.com/man-group/dtale
🤩Python随身听 Tech Picks: /scastillo/not-youtube-dl
👉This is not youtube-dl
😎TOPICS: (none)
⭐️STARS:754, gained today↑:61
👉README:
this is not youtube-dl - it does not download videos from youtube.com or other video platforms
INSTALLATION
To install it right away for all UNIX users (Linux, macOS, etc.), type:
If you do not have curl, you can alternatively use a recent wget:
Windows users can download an .exe file and place it in any l...
URL: https://github.com/scastillo/not-youtube-dl
🤩Python随身听 Tech Picks: /google-research/google-research
👉Google Research
😎TOPICS:
machine-learning,ai,research
⭐️STARS:13721, gained today↑:226
👉README:
Google Research
This repository contains code released by
Google Research.
All datasets in this repository are released under the CC BY 4.0 International
license, which can be found here:
https://creativecommons.org/licenses/by/4.0/legalcode. All source files in this
repository are released under the Apache 2.0 license, the text of which can be
found in the LICENSE file.
Because the re...
URL: https://github.com/google-research/google-research
🤩Python随身听 Tech Picks: /deepmind/deepmind-research
👉This repository contains implementations and illustrative code to accompany DeepMind publications
😎TOPICS: (none)
⭐️STARS:2521, gained today↑:11
👉README:
DeepMind Research
This repository contains implementations and illustrative code to accompany
DeepMind publications. Along with publishing papers to accompany research
conducted at DeepMind, we release open-source environments, data sets, and code to
enable the broader research community to engage with our work and build upon it,
with the ultimate goal of accelerating scientific progress to benefit society.
For example, you can build on our implementations of the Deep Q-Network or
Differential Neural Computer, or experiment in the same environments we use for
our research, such as DeepMind Lab or StarCraft II.
If you enjoy building tools, environments, software librar...
URL: https://github.com/deepmind/deepmind-research
🤩Python随身听 Tech Picks: /Pierian-Data/Complete-Python-3-Bootcamp
👉Course Files for Complete Python 3 Bootcamp Course on Udemy
😎TOPICS: (none)
⭐️STARS:12650, gained today↑:14
👉README:
Complete-Python-3-Bootcamp
Course Files for Complete Python 3 Bootcamp Course on Udemy
Get it now for ...
URL: https://github.com/Pierian-Data/Complete-Python-3-Bootcamp
🤩Python随身听 Tech Picks: /pycaret/pycaret
👉An open-source, low-code machine learning library in Python
😎TOPICS:
data-science,citizen-data-scientists,python,machine-learning,pycaret,ml
⭐️STARS:2444, gained today↑:13
👉README:
PyCaret 2.2
What is PyCaret?
PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that greatly shortens the experiment cycle and makes you more productive.
Compared with other open-source machine learning libraries, PyCaret is an alternative low-code library that can replace hundreds of lines of code with only a few. This makes experiments much faster and more efficient. PyCaret is essentially a Python wrapper around several machine learning libraries and frameworks such as scikit-learn, XGBoost, LightGBM, CatBoost, spaCy, Optuna, Hyperopt, Ray, and many more.
The design and simplicity of PyCaret are inspired by the emerging role of citizen data scientists, a term first used by Gartner. Citizen data scientists are power users who can perform both simple and moderately sophisticated analytical tasks that would previously have required more experti...
URL: https://github.com/pycaret/pycaret
🤩Python随身听 Tech Picks: /ageron/handson-ml2
👉A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
😎TOPICS: (none)
⭐️STARS:11066, gained today↑:20
👉README:
Machine Learning Notebooks
This project aims at teaching you the fundamentals of Machine Learning in
Python. It contains the example code and solutions to the exercises in the second edition of my O'Reilly book Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow.
Note: If you are looking for the first edition notebooks, check out ageron/handson-ml.
Quick Start
Want to play with these notebooks online without having to install anything?
Use any of the following services.
WARNING: Please be aware that these services provide temporary environments: anything you do will be deleted after a while, so make sure you download any data you care about.
URL: https://github.com/ageron/handson-ml2