Hadoop Distributed File System
The Hadoop Distributed File System (HDFS) is a Java-based file system that provides scalable, reliable data storage across large clusters of commodity servers. It was designed as a fault-tolerant, distributed storage system that works closely with MapReduce.
HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode (a master server that manages the file system namespace and regulates access to files by clients) and a number of DataNodes. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks (typically 64 MB in size), and these blocks are stored on a set of DataNodes. The NameNode executes file system namespace operations such as opening, closing, and renaming files and directories, and it determines the mapping of blocks to DataNodes. The DataNodes serve read and write requests from the file system's clients; they also perform block creation, deletion, and replication on instruction from the NameNode.
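The block-to-DataNode mapping can be inspected on a running cluster with the fsck utility, which asks the NameNode to report the blocks of a file and where their replicas live. A minimal sketch (the file path here is a hypothetical example):

# Report the blocks of a file and the DataNodes holding each replica
# (/user/akceptor/sample.txt is a hypothetical example path)
hadoop fsck /user/akceptor/sample.txt -files -blocks -locations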
HDFS can be managed from the console (via the hadoop fs command), which offers a set of Linux-like commands:
| Command | Description | Syntax |
| --- | --- | --- |
| mkdir | Create a directory at the given path | hadoop fs -mkdir <paths> |
| ls | List the files at the given path | hadoop fs -ls <args> |
| lsr | Recursive version of ls; similar to Unix ls -R | hadoop fs -lsr <args> |
| touchz | Create a zero-length file at the given path | hadoop fs -touchz <path[filename]> |
| cat | Copy source paths to STDOUT | hadoop fs -cat <path[filename]> |
| cp | Copy files from source to destination; with multiple sources, the destination must be a directory | hadoop fs -cp <source> <dest> |
| put | Copy one or more items from the local file system to the destination file system; also reads input from STDIN and writes it to the destination file system | hadoop fs -put <source:localFile> <destination> |
| get | Copy files to the local file system | hadoop fs -get <source> <dest:localFileSystem> |
| copyFromLocal | Similar to put, except that the source is restricted to local files | hadoop fs -copyFromLocal <src:localFileSystem> <dest:Hdfs> |
| copyToLocal | Similar to get, except that the destination is restricted to local files | hadoop fs -copyToLocal <src:Hdfs> <dest:localFileSystem> |
| mv | Move files from source to destination; moving files across file systems is not permitted | hadoop fs -mv <src> <dest> |
| rm | Remove the files specified as arguments; deletes a directory only when it is empty | hadoop fs -rm <arg> |
| rmr | Recursive version of rm | hadoop fs -rmr <arg> |
| stat | Return the stat information on the path | hadoop fs -stat <path>, e.g. hadoop fs -stat /user/akceptor/dir1 |
| tail | Similar to the Unix tail command | hadoop fs -tail <path[filename]> |
| test | -e: check whether the file exists, returning 0 if true; -z: check whether the file is zero length, returning 0 if true; -d: return 1 if the path is a directory, otherwise 0 | hadoop fs -test -[ezd] <path> |
| text | Output the file in text format; the allowed formats are zip and TextRecordInputStream | hadoop fs -text <src> |
| du | Display the aggregate length of files at the given path | hadoop fs -du <path> |
| dus | Display a summary of file lengths | hadoop fs -dus <args> |
| expunge | Empty the trash | hadoop fs -expunge |
| chgrp | Change the group association of files; with -R, makes the change recursively. The user must be the owner of the files, or a super-user | hadoop fs -chgrp [-R] GROUP <path> |
| chmod | Change the permissions of files; with -R, makes the change recursively. The user must be the owner of the file, or a super-user | hadoop fs -chmod [-R] <MODE[,MODE] \| OCTALMODE> <path> |
| chown | Change the owner of files; with -R, makes the change recursively. The user must be a super-user | hadoop fs -chown [-R] [OWNER][:[GROUP]] <path> |
| getmerge | Take a source directory and a destination file as input and concatenate the files in the source directory into the destination local file; optionally, addnl can be set to add a newline character at the end of each file | hadoop fs -getmerge <src> <localdst> [addnl] |
| setrep | Change the replication factor of a file; with -R, makes the change recursively | hadoop fs -setrep [-R] <rep> <path> |
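As a quick illustration, the following session strings several of the commands above together, from creating a directory to cleaning it up (a sketch assuming a running cluster; the paths and file names are hypothetical):

# Create a working directory in HDFS
hadoop fs -mkdir /user/akceptor/demo
# Upload a local file into it
hadoop fs -put notes.txt /user/akceptor/demo
# List the directory and print the uploaded file
hadoop fs -ls /user/akceptor/demo
hadoop fs -cat /user/akceptor/demo/notes.txt
# Raise the file's replication factor to 3
hadoop fs -setrep 3 /user/akceptor/demo/notes.txt
# Script-friendly existence check: exit code 0 means the file exists
hadoop fs -test -e /user/akceptor/demo/notes.txt
# Copy the file back to the local file system and remove the directory
hadoop fs -get /user/akceptor/demo/notes.txt notes-copy.txt
hadoop fs -rmr /user/akceptor/demo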
Videos to understand what HDFS is:
- Introduction to HDFS and live demo
- HDFS tutorial video
- HortonWorks HDFS introduction: http://hortonworks.com/hadoop/hdfs/
- HortonWorks HDFS tutorial: http://hortonworks.com/hadoop-tutorial/hello-world-an-introduction-to-hadoop-hcatalog-hive-and-pig/#usingHDP
- Apache HDFS design: https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#Introduction
- HortonWorks HDFS CLI tutorial: http://hortonworks.com/hadoop-tutorial/using-commandline-manage-files-hdfs/
- Official documentation: https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/FileSystemShell.html