Hadoop Distributed File System (HDFS) is a Java-based file system that provides scalable, reliable data storage across large clusters of commodity servers. It was designed to be a scalable, fault-tolerant, distributed storage system that works closely with MapReduce.
HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode (a master server that manages the file system namespace and regulates access to files by clients) and a number of DataNodes. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks (typically 64 MB in size), and these blocks are stored on a set of DataNodes.
The NameNode executes file system namespace operations such as opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes serve read and write requests from the file system’s clients; they also perform block creation, deletion, and replication upon instruction from the NameNode.
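As a back-of-the-envelope illustration of block splitting, the number of blocks a file occupies is the ceiling of its size divided by the block size (the 200 MB file below is made up for the example; the block size is configurable per cluster):

```shell
# How many 64 MB blocks does a 200 MB file occupy?
BLOCK_SIZE=$((64 * 1024 * 1024))   # 64 MB in bytes
FILE_SIZE=$((200 * 1024 * 1024))   # a hypothetical 200 MB file
# ceiling division: blocks = ceil(file_size / block_size)
BLOCKS=$(( (FILE_SIZE + BLOCK_SIZE - 1) / BLOCK_SIZE ))
echo "$BLOCKS"                     # 4 blocks: three full, plus one 8 MB tail block
```

Each of those blocks is then replicated across DataNodes independently.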
HDFS can be managed via a console application (the ‘hadoop fs’ command) with a set of Linux-like commands:
Creates a directory at the given path
|hadoop fs -mkdir <paths>|
Lists the files for the given path
|hadoop fs -ls <args>|
Recursive version of ls. Similar to Unix ls -R.
|hadoop fs -lsr <args>|
Creates a zero-length file at the given path
|hadoop fs -touchz <path[filename]>|
Copies source paths to STDOUT
|hadoop fs -cat <path[filename]>|
Copies files from source to destination. When multiple sources are given, the destination must be a directory
|hadoop fs -cp <source> <dest>|
Copies single or multiple items from the local file system to the destination file system. Also reads input from STDIN and writes to the destination file system
|hadoop fs -put <source:localFile> <destination>|
Copy files to the local file system
|hadoop fs -get <source> <dest:localFileSystem>|
Similar to put except that the source is limited to local files
|hadoop fs -copyFromLocal <src:localFileSystem> <dest:Hdfs>|
Similar to get except that the destination is limited to local files
|hadoop fs -copyToLocal <src:Hdfs> <dest:localFileSystem>|
Moves files from source to destination.
Moving files across file systems is not permitted
|hadoop fs -mv <src> <dest>|
Removes files specified as arguments.
Deletes a directory only when it is empty
|hadoop fs -rm <arg>|
Recursive version of delete
|hadoop fs -rmr <arg>|
Returns the stat information on the path
|hadoop fs -stat <path>|
Example: |hadoop fs -stat /user/akceptor/dir1|
Similar to the Unix tail command
|hadoop fs -tail <path[filename]>|
-e check to see if the file exists. Return 0 if true.
-z check to see if the file is zero length. Return 0 if true.
-d check to see if the path is a directory. Return 0 if true
|hadoop fs -test -[ezd]<path>|
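‘hadoop fs -test’ follows the usual Unix convention of exit code 0 for success, so it is typically used for branching in scripts. A local sketch of the same pattern using the shell's own test builtin (note the local flags differ slightly: -s tests for non-zero size, whereas HDFS uses -z for zero length):

```shell
# Exit-code pattern: branch on a path check before acting on it.
# With HDFS you would write: if hadoop fs -test -e /some/path; then ... fi
f=$(mktemp)                                    # create an empty temp file
if [ -e "$f" ]; then echo "exists"; fi         # prints "exists"
if [ ! -s "$f" ]; then echo "zero length"; fi  # prints "zero length"
rm -f "$f"
```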
Outputs the file in text format. The allowed formats are zip and TextRecordInputStream
|hadoop fs -text <src>|
Displays the aggregate length of files contained in a directory, or the length of a single file
|hadoop fs -du <path>|
Displays a summary of file lengths
|hadoop fs -dus <args>|
Empty the trash
|hadoop fs -expunge|
Change group association of files. With -R, make the change recursively.
The user must be the owner of files, or a super-user
|hadoop fs -chgrp [-R] GROUP <path>|
Change the permissions of files. With -R, make the change recursively.
The user must be the owner of the file, or a super-user
|hadoop fs -chmod [-R] <MODE[,MODE] | OCTALMODE> <path>|
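The octal modes are the same ones used by the Unix chmod command: one digit each for owner, group, and others, with each digit built from read (4), write (2), and execute (1). A local sketch on a temp file (the stat -c flag assumes GNU coreutils):

```shell
# 640 = rw- r-- ---  (owner: read+write, group: read, others: none)
f=$(mktemp)
chmod 640 "$f"
stat -c %a "$f"    # prints 640 (GNU stat; on macOS use: stat -f %Lp)
rm -f "$f"
```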
Change the owner of files. With -R, make the change recursively.
The user must be a super-user
|hadoop fs -chown [-R] [OWNER][:[GROUP]] <path>|
Takes a source directory and a destination file as input and concatenates the files in the source directory into the destination local file.
Optionally, addnl can be set to add a newline character at the end of each file
|hadoop fs -getmerge <src> <localdst> [addnl]|
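Conceptually, getmerge is “fetch every file in the source directory, then concatenate them into one local file”. A local analogue using cat (the paths below are made up for illustration):

```shell
# Emulate getmerge locally: concatenate a directory's part files into one file.
mkdir -p /tmp/srcdir_demo
printf 'part one\n' > /tmp/srcdir_demo/part-00000
printf 'part two\n' > /tmp/srcdir_demo/part-00001
cat /tmp/srcdir_demo/part-* > /tmp/merged_demo.txt   # concatenated in filename order
cat /tmp/merged_demo.txt
rm -rf /tmp/srcdir_demo /tmp/merged_demo.txt         # clean up the demo files
```

This is handy for collecting the part-NNNNN output files of a MapReduce job into a single local file.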
Changes the replication factor of a file. With -R, makes the change recursively.
|hadoop fs -setrep [-R] <rep> <path>|
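Since every block is stored replication-factor times, the raw disk space a file consumes is its logical size multiplied by the replication factor. A quick sanity check (both numbers illustrative; the usual default replication factor is 3):

```shell
# Raw disk consumed = logical file size x replication factor
FILE_MB=200       # a hypothetical 200 MB file
REPLICATION=3     # common default replication factor
echo "$(( FILE_MB * REPLICATION )) MB of raw storage"   # 600 MB of raw storage
```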
Videos and links to understand what HDFS is:
- Introduction to HDFS and live demo (video)
- HDFS tutorial video
- HortonWorks HDFS introduction: http://hortonworks.com/hadoop/hdfs/
- HortonWorks HDFS tutorial: http://hortonworks.com/hadoop-tutorial/hello-world-an-introduction-to-hadoop-hcatalog-hive-and-pig/#usingHDP
- Apache HDFS design: https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#Introduction
- HortonWorks HDFS CLI tutorial: http://hortonworks.com/hadoop-tutorial/using-commandline-manage-files-hdfs/
- Official documentation: https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/FileSystemShell.html