Hadoop Distributed File System (HDFS) is a Java-based file system that provides scalable, reliable data storage across large clusters of commodity servers. It was designed to be a scalable, fault-tolerant, distributed storage system that works closely with MapReduce.
HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode (a master server that manages the file system namespace and regulates access to files by clients) and a number of DataNodes. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks (typically 64 MB in size), and these blocks are stored on a set of DataNodes.
The NameNode executes file system namespace operations such as opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes serve read and write requests from the file system’s clients; they also perform block creation, deletion, and replication upon instruction from the NameNode.
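As a back-of-the-envelope illustration of block splitting, the number of blocks a file occupies is the ceiling of its size divided by the block size (the 200 MB file below is made up for the example; the block size is configurable per cluster):

```shell
# How many 64 MB blocks does a 200 MB file occupy?
BLOCK_SIZE=$((64 * 1024 * 1024))   # 64 MB in bytes
FILE_SIZE=$((200 * 1024 * 1024))   # a hypothetical 200 MB file
# ceiling division: blocks = ceil(file_size / block_size)
BLOCKS=$(( (FILE_SIZE + BLOCK_SIZE - 1) / BLOCK_SIZE ))
echo "$BLOCKS"                     # 4 blocks: three full, plus one 8 MB tail block
```

Each of those blocks is then replicated across DataNodes independently.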
HDFS can be managed via a console application (the ‘hadoop fs’ command) with a set of Linux-like commands:
Creates a directory at the given path
|hadoop fs -mkdir <paths>|
Lists the files for the given path
|hadoop fs -ls <args>|
Recursive version of ls. Similar to Unix ls -R.
|hadoop fs -lsr <args>|
Creates a zero-length file at the given path
|hadoop fs -touchz <path[filename]>|
Copies source paths to STDOUT
|hadoop fs -cat <path[filename]>|
Copies files from source to destination. When multiple sources are given, the destination must be a directory
|hadoop fs -cp <source> <dest>|
Copies single or multiple items from the local file system to the destination file system. Also reads input from STDIN and writes to the destination file system
|hadoop fs -put <source:localFile> <destination>|
Copy files to the local file system
|hadoop fs -get <source> <dest:localFileSystem>|
Similar to put except that the source is limited to local files
|hadoop fs -copyFromLocal <src:localFileSystem> <dest:Hdfs>|
Similar to get except that the destination is limited to local files
|hadoop fs -copyToLocal <src:Hdfs> <dest:localFileSystem>|
Moves files from source to destination.
Moving files across file systems is not permitted
|hadoop fs -mv <src> <dest>|
Removes files specified as arguments.
Deletes a directory only when it is empty
|hadoop fs -rm <arg>|
Recursive version of delete
|hadoop fs -rmr <arg>|
Returns the stat information on the path
|hadoop fs -stat <path>|
Example: |hadoop fs -stat /user/akceptor/dir1|
Similar to the Unix tail command
|hadoop fs -tail <path[filename]>|
-e check to see if the file exists. Return 0 if true.
-z check to see if the file is zero length. Return 0 if true.
-d check to see if the path is a directory. Return 0 if true
|hadoop fs -test -[ezd]<path>|
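‘hadoop fs -test’ follows the usual Unix convention of exit code 0 for success, so it is typically used for branching in scripts. A local sketch of the same pattern using the shell's own test builtin (note the local flags differ slightly: -s tests for non-zero size, whereas HDFS uses -z for zero length):

```shell
# Exit-code pattern: branch on a path check before acting on it.
# With HDFS you would write: if hadoop fs -test -e /some/path; then ... fi
f=$(mktemp)                                    # create an empty temp file
if [ -e "$f" ]; then echo "exists"; fi         # prints "exists"
if [ ! -s "$f" ]; then echo "zero length"; fi  # prints "zero length"
rm -f "$f"
```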
Outputs the file in text format. The allowed formats are zip and TextRecordInputStream
|hadoop fs -text <src>|
Displays the aggregate length of files contained in a directory, or the length of a single file
|hadoop fs -du <path>|
Displays a summary of file lengths
|hadoop fs -dus <args>|
Empty the trash
|hadoop fs -expunge|
Change group association of files. With -R, make the change recursively.
The user must be the owner of files, or a super-user
|hadoop fs -chgrp [-R] GROUP <path>|
Change the permissions of files. With -R, make the change recursively.
The user must be the owner of the file, or a super-user
|hadoop fs -chmod [-R] <MODE[,MODE] | OCTALMODE> <path>|
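The octal modes are the same ones used by the Unix chmod command: one digit each for owner, group, and others, with each digit built from read (4), write (2), and execute (1). A local sketch on a temp file (the stat -c flag assumes GNU coreutils):

```shell
# 640 = rw- r-- ---  (owner: read+write, group: read, others: none)
f=$(mktemp)
chmod 640 "$f"
stat -c %a "$f"    # prints 640 (GNU stat; on macOS use: stat -f %Lp)
rm -f "$f"
```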
Change the owner of files. With -R, make the change recursively.
The user must be a super-user
|hadoop fs -chown [-R] [OWNER][:[GROUP]] <path>|
Takes a source directory and a destination file as input and concatenates the files in the source directory into the destination local file.
Optionally, addnl can be set to add a newline character at the end of each file
|hadoop fs -getmerge <src> <localdst> [addnl]|
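Conceptually, getmerge is “fetch every file in the source directory, then concatenate them into one local file”. A local analogue using cat (the paths below are made up for illustration):

```shell
# Emulate getmerge locally: concatenate a directory's part files into one file.
mkdir -p /tmp/srcdir_demo
printf 'part one\n' > /tmp/srcdir_demo/part-00000
printf 'part two\n' > /tmp/srcdir_demo/part-00001
cat /tmp/srcdir_demo/part-* > /tmp/merged_demo.txt   # concatenated in filename order
cat /tmp/merged_demo.txt
rm -rf /tmp/srcdir_demo /tmp/merged_demo.txt         # clean up the demo files
```

This is handy for collecting the part-NNNNN output files of a MapReduce job into a single local file.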
Changes the replication factor of a file. With -R, makes the change recursively.
|hadoop fs -setrep [-R] <rep> <path>|
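Since every block is stored replication-factor times, the raw disk space a file consumes is its logical size multiplied by the replication factor. A quick sanity check (both numbers illustrative; the usual default replication factor is 3):

```shell
# Raw disk consumed = logical file size x replication factor
FILE_MB=200       # a hypothetical 200 MB file
REPLICATION=3     # common default replication factor
echo "$(( FILE_MB * REPLICATION )) MB of raw storage"   # 600 MB of raw storage
```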
Videos and links to understand what HDFS is:
- Introduction to HDFS and live demo (video)
- HDFS tutorial video
- HortonWorks HDFS introduction: http://hortonworks.com/hadoop/hdfs/
- HortonWorks HDFS tutorial: http://hortonworks.com/hadoop-tutorial/hello-world-an-introduction-to-hadoop-hcatalog-hive-and-pig/#usingHDP
- Apache HDFS design: https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#Introduction
- HortonWorks HDFS CLI tutorial: http://hortonworks.com/hadoop-tutorial/using-commandline-manage-files-hdfs/
- Official documentation: https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/FileSystemShell.html