
Datasets

1. Physical Unclonable Functions Data Set
URL: https://archive.ics.uci.edu/ml/datasets/Physical+Unclonable+Functions
Problem addressed: The dataset is generated from a simulation of Physical Unclonable Functions (PUFs), specifically XOR Arbiter PUFs. PUFs are used for authentication purposes.
Number of rows: >1 million
Number of columns: 129
Size: >1GB
Characteristics: Multivariate
Attribute characteristics: Integer
Related tasks: Classification

2. Kitsune Network Attack Dataset Data Set
URL: https://archive.ics.uci.edu/ml/datasets/Kitsune+Network+Attack+Dataset
Problem addressed: A cybersecurity dataset containing nine different network attacks on a commercial IP-based surveillance system and an IoT network. The dataset includes reconnaissance, MitM, DoS, and botnet attacks.
Number of rows: >1 million
Number of columns: 115
Size: >1GB
Characteristics: Multivariate, Sequential, Time-Series
Attribute characteristics: Real
Related tasks: Classification, Clustering, Causal-Discovery

3. NYC Parking Tickets
URL: https://www.kaggle.com/new-york-city/nyc-parkingtickets
Problem addressed: The NYC Department of Finance collects data on every parking ticket issued in NYC (~10M per year). This data is made publicly available to aid in ticket resolution and to guide policymakers. When are tickets most likely to be issued? Is there any seasonality? Where are tickets most commonly issued? What are the most common years and types of cars to be ticketed?
Number of rows: >1 million
Number of columns: 51
Size: >1GB

4. N-BaIoT Dataset to Detect IoT Botnet Attacks
URL: https://www.kaggle.com/mkashifn/nbaiotdataset#1.benign.csv
Problem addressed: This dataset addresses the lack of public botnet datasets, especially for the IoT. It provides real traffic data gathered from 9 commercial IoT devices authentically infected by Mirai and BASHLITE.
Number of rows: >1 million
Number of columns: 115
Size: >1GB
Characteristics: Multivariate, Sequential
Attribute characteristics: Real
Related tasks: Classification, Clustering

5. DDoS Dataset
URL: https://www.kaggle.com/devendra416/ddosdatasets#unbalaced_20_80_dataset.csv
Problem addressed: DDoS balanced and unbalanced datasets. No recent datasets devoted exclusively to DDoS are available in the public domain, although IDS datasets are. The author therefore extracted DDoS flows from other public IDS datasets (CSE-CIC-IDS2018-AWS, CICIDS2017, CIC DoS data set (2016)). To introduce more variance, the DDoS data is extracted from IDS datasets that were produced in different years with different experimental DDoS traffic generation tools. The extracted DDoS flows are combined with "Benign" flows, which are extracted separately from the same base datasets, into a single large dataset.
Number of rows: >1 million
Number of columns: 85
Size: >1GB

6. Best FE on clean and filtered data
URL: https://www.kaggle.com/icarofreire/best-filter-andfeatureengineering#final_train2.csv
Problem addressed: The two CSV files here are the train and test data from Kaggle's Ion Switching Competition, with drift removed and a Kalman filter applied to reduce noise.
Number of rows: >1 million
Number of columns: 80
Size: >1GB

7. Dados_Brasil
URL: https://www.kaggle.com/camposfabio/dadosbrasil#Educacao_Basica_2018%20%20Docentes_Sudeste.csv
Problem addressed: This is a dataset with information on basic education in Brazil in 2018.
Number of rows: >1 million
Number of columns: 132
Size: >1GB

8. KASANDR Data Set
URL: http://archive.ics.uci.edu/ml/datasets/KASANDR
Problem addressed: KASANDR is a novel, publicly available collection for recommendation systems that records the behavior of customers of the European leader in e-Commerce advertising, Kelkoo.
Number of rows: >1 million
Number of columns: 2158859
Size: >1GB
Characteristics: Multivariate
Attribute characteristics: Integer
Related tasks: Causal-Discovery

9. DeepSat (SAT-4) Airborne Dataset
URL: https://www.kaggle.com/crawford/deepsatsat4?select=X_test_sat4.csv
Problem addressed: 500,000 image patches covering four broad land cover classes.
Number of rows: >1 million
Number of columns: 3136
Size: >1GB

10. Complete 2017 Program Year Open Payments Dataset
URL: https://www.cms.gov/OpenPayments/Explore-theData/Dataset-Downloads
Problem addressed: A complete set of all data from the 2017 Program Year, which includes data reported about payments made from January 1 through December 31, 2017.
Number of rows: >1 million
Number of columns: 75
Size: >1GB

11. PAMAP2 Physical Activity Monitoring Data Set
URL: https://archive.ics.uci.edu/ml/datasets/PAMAP2+Physical+Activity+Monitoring
Problem addressed: The PAMAP2 Physical Activity Monitoring dataset contains data of 18 different physical activities, performed by 9 subjects wearing 3 inertial measurement units and a heart rate monitor.
Number of rows: >1 million
Number of columns: 52
Size: <1GB
Characteristics: Multivariate, Time-Series
Attribute characteristics: Real
Related tasks: Classification

12. Los Angeles Building and Safety Permits
URL: https://www.kaggle.com/cityofLA/los-angeles-buildingand-safety-permits
Problem addressed: This is a dataset hosted by the city of Los Angeles.
Number of rows: >1 million
Number of columns: 65
Size: <1GB

13. Predict Outcome of Pregnancy
URL: https://www.kaggle.com/rajanand/ahs-woman-1
Problem addressed: This dataset contains data from the Annual Health Survey. Is it possible to predict the pregnancy outcome (live birth/stillbirth/abortion)?
Number of rows: >1 million
Number of columns: 201
Size: <1GB

14. SIFT10M Data Set
URL: https://archive.ics.uci.edu/ml/datasets/SIFT10M
Problem addressed: In SIFT10M, each data point is a SIFT feature extracted from Caltech-256 by the open source VLFeat library. The corresponding patches of the SIFT features are provided.
Number of rows: >1 million
Number of columns: 128
Size: <1GB
Characteristics: Multivariate
Attribute characteristics: Integer
Related tasks: Causal-Discovery

15. Human Activity Recognition from Continuous Ambient Sensor Data Data Set
URL: https://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+from+Continuous+Ambient+Sensor+Data
Problem addressed: This dataset represents ambient data collected in homes with volunteer residents. Data are collected continuously while residents perform their normal routines.
Number of rows: >1 million
Number of columns: 37
Size: >1GB
Characteristics: Multivariate, Sequential, Time-Series
Attribute characteristics: Integer, Real
Related tasks: Classification
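
The row counts, column counts, and sizes listed above can be checked against a downloaded CSV file without loading it into memory. The following is a minimal sketch using only the Python standard library. The function names `csv_shape` and `describe`, the UTF-8 encoding, the assumption that the first row is a header, and the exact thresholds (1,000,000 rows, 10^9 bytes) are illustrative choices, not part of the catalog.

```python
import csv
import io
import os


def csv_shape(stream):
    """Return (data_rows, columns) for CSV text read from an open file object.

    The first row is treated as the header (an assumption); the remaining
    rows are streamed one at a time, so memory use stays flat even for
    files with millions of rows.
    """
    reader = csv.reader(stream)
    header = next(reader, None)
    if header is None:  # empty file
        return 0, 0
    rows = sum(1 for _ in reader)
    return rows, len(header)


def describe(path):
    """Summarize a local CSV file using buckets like those in the catalog.

    Assumes the file is UTF-8 encoded; the thresholds are illustrative.
    """
    with open(path, newline="", encoding="utf-8") as f:
        rows, cols = csv_shape(f)
    size_bytes = os.path.getsize(path)
    return {
        "rows": ">1 million" if rows > 1_000_000 else "<=1 million",
        "columns": cols,
        "size": ">1GB" if size_bytes > 10**9 else "<=1GB",
    }


# Tiny in-memory stand-in for a real download (hypothetical data).
sample = io.StringIO("a,b,c\n1,2,3\n4,5,6\n")
print(csv_shape(sample))  # prints (2, 3)
```

For a real file, `describe("path/to/file.csv")` would return the three buckets for that file; the path here is a placeholder.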