Facebook's systems initially processed "Big Data" for tasks ranging from simple reporting and business intelligence to large-scale measurements and reports from multiple perspectives.
Facebook built its data-analysis systems on top of Hadoop and Hive to analyze enormous amounts of data in geographically distributed datacenters using well-equipped machines.
More recently, Facebook has combined cloud computing solutions with its globally distributed infrastructure to provide its features and services.
Facebook is a commercial, hosted Online Social Network (OSN) that attracts users with its features and attracts marketers who pay to display targeted ads.
The OSN provides synchronous and asynchronous communication of user-generated content, including text, multimedia such as audio and video, and updates from third-party OSN applications, all through a social graph.
Wall: a user's original profile space where they can upload photographs, videos, and files.
News Feed: a running list of friends' activity shown on the user's home page, including profile updates, events, and friends' wall posts.
Timeline: a place to organize photographs, videos, posts, and content by upload or creation time.
Friendship: Facebook is built around "friending" other users; each user controls their own friend list.
Notifications: track recent events and alert users to actions on their profile page, wall, and timeline, such as comments, likes, and shared media.
Networks, groups, and pages: Facebook lets users construct networks, groups, and pages around an idea or community.
Messaging and outbox: a service that allows users to send messages to each other, addressed to any number of friends at a time.
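The features above all revolve around the social graph. As a minimal Python sketch, assuming a single in-memory process, the hypothetical `ToySocialNetwork` class models friendship as an undirected graph and derives walls, a timeline, a News Feed, and wall notifications from it. All class, method, and field names are illustrative assumptions, not Facebook's API.

```python
import heapq
import itertools
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    author: str       # who wrote the post
    wall_owner: str   # whose wall the post appears on
    text: str
    timestamp: int    # creation time, e.g. seconds since the epoch


class ToySocialNetwork:
    """Hypothetical in-memory model of walls, friendship, News Feed and
    notifications; not Facebook's actual implementation."""

    def __init__(self):
        self.friends = defaultdict(set)         # user -> set of friends
        self.walls = defaultdict(list)          # user -> posts on their wall
        self.notifications = defaultdict(list)  # user -> alert messages

    def befriend(self, a, b):
        # Friendship is mutual, so the edge is stored in both directions.
        self.friends[a].add(b)
        self.friends[b].add(a)

    def post(self, author, wall_owner, text, timestamp):
        # Users may write on their own wall or on a friend's wall.
        if author != wall_owner and wall_owner not in self.friends[author]:
            raise PermissionError(f"{author} is not a friend of {wall_owner}")
        p = Post(author, wall_owner, text, timestamp)
        self.walls[wall_owner].append(p)
        if author != wall_owner:
            # Alert the wall owner about activity on their wall.
            self.notifications[wall_owner].append(f"{author} wrote on your wall")
        return p

    def timeline(self, user):
        # The user's own wall, ordered by creation time.
        return sorted(self.walls[user], key=lambda p: p.timestamp)

    def news_feed(self, user, limit=10):
        # Running list of friends' activity: the newest posts on friends' walls.
        candidates = itertools.chain.from_iterable(
            self.walls[f] for f in self.friends[user])
        return heapq.nlargest(limit, candidates, key=lambda p: p.timestamp)
```

This sketch builds the feed when it is read, which keeps posting cheap; a large system might instead precompute feeds when posts are written. The source does not say which approach Facebook uses.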
As an OSN, Facebook relies on internationally distributed datacenters that depend heavily on centralized U.S. datacenters for scalability, availability, openness, reliability, and security.
The system follows a three- or four-tier design in which client requests are served in the following steps:
Because the system is global, scalability and reliability are essential: Facebook serves billions of requests and must respond within seconds. These requirements call for both size scalability and geographical scalability.
Facebook uses Hadoop for data discovery and for processing unstructured data (text, logs, and events) as well as structured data. Hadoop was designed to handle vast amounts of data whose preparation and processing would otherwise be cost-prohibitive.
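Hadoop processes data such as logs and events with the MapReduce model. The following single-process Python sketch only simulates the Map, Shuffle, and Reduce phases that Hadoop distributes across cluster nodes; the log format and the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `run_job` are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    # Hypothetical log format: "<timestamp> <event_type> <user_id>".
    # Emit (event_type, 1) for every well-formed line; skip malformed ones.
    parts = line.split()
    if len(parts) >= 2:
        yield parts[1], 1


def shuffle_phase(pairs):
    # Group all values emitted for the same key, as Hadoop's shuffle does
    # before handing each key to a reducer.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine every value for one key into a single result.
    return key, sum(values)


def run_job(lines):
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    logs = ["1700000000 like u1", "1700000005 comment u2", "1700000009 like u3"]
    print(run_job(logs))  # {'like': 2, 'comment': 1}
```

In a real cluster, many mappers run in parallel on the nodes that hold the input data, and the shuffle moves intermediate pairs over the network to the reducers.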
A typical Hadoop setup consists of master and worker nodes running specific software. Using multiple master nodes helps Hadoop avoid a single point of failure.
Job Tracker: interacts with client applications and assigns Map and Reduce tasks to cluster nodes.
Task Tracker: runs on a cluster node and receives tasks such as Map, Shuffle, and Reduce from the Job Tracker on the master node.
Name Node (NN): tracks every file in the Hadoop Distributed File System (HDFS); client applications call the NN to locate, add, copy, or remove files.
Data Node (DN): stores HDFS files, maintains indexes, and interacts with client applications and the NN.
Worker Nodes: slave servers that run jobs through their Data Node and Task Tracker.
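To illustrate the division of labor between the NN and the DNs, here is a minimal Python sketch of NameNode-style bookkeeping. The `ToyNameNode` class is hypothetical: it keeps only metadata (files split into blocks, and the DataNodes holding each block's replicas), places replicas round-robin rather than with HDFS's rack-aware policy, and omits copying, heartbeats, and failure handling. Its defaults of 128 MiB blocks and three replicas mirror the HDFS defaults for `dfs.blocksize` and `dfs.replication` in Hadoop 2 and later.

```python
import itertools


class ToyNameNode:
    """Hypothetical NameNode-style metadata service.

    It stores no file contents, only which blocks make up each file and
    which DataNodes hold a replica of each block."""

    def __init__(self, datanodes, block_size=128 * 1024 * 1024, replication=3):
        if replication > len(datanodes):
            raise ValueError("need at least as many DataNodes as replicas")
        self.datanodes = list(datanodes)
        self.block_size = block_size
        self.replication = replication
        self.files = {}      # path -> list of block ids
        self.locations = {}  # block id -> DataNodes holding a replica
        self._block_ids = itertools.count()
        # Simplification: round-robin placement instead of rack awareness.
        self._next_start = itertools.cycle(range(len(self.datanodes)))

    def add(self, path, size_bytes):
        if path in self.files:
            raise FileExistsError(path)
        n_blocks = -(-size_bytes // self.block_size)  # ceiling division
        blocks = []
        for _ in range(n_blocks):
            block_id = next(self._block_ids)
            start = next(self._next_start)
            # Place each replica of this block on a different DataNode.
            self.locations[block_id] = [
                self.datanodes[(start + i) % len(self.datanodes)]
                for i in range(self.replication)
            ]
            blocks.append(block_id)
        self.files[path] = blocks

    def locate(self, path):
        # A client asks the NameNode where each block lives, then reads the
        # data directly from those DataNodes.
        return [(b, self.locations[b]) for b in self.files[path]]

    def remove(self, path):
        for b in self.files.pop(path):
            del self.locations[b]
```

Keeping only metadata on the master is what lets a single NN coordinate a large cluster: the bulk data traffic flows between clients and DNs.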
After establishing a TCP connection, a Facebook user's browser receives HTML responses from Facebook's servers. Consider, for example, the users generating traffic and Facebook's datacenters in California.
The bandwidth and latency between users outside the U.S. and these distant datacenters can be unreliable and harmful to performance, so decision makers should evaluate several options to preserve network dependability and system availability and to protect the system from network bottlenecks.
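A back-of-the-envelope estimate shows why distance matters. Assuming signals travel through optical fiber at roughly 200,000 km/s (about two thirds of the speed of light in vacuum), a minimal page fetch needs one round trip for the TCP handshake and one for the HTTP request. The 12,000 km and 500 km path lengths in this Python sketch are assumed, illustrative figures, not measurements from the source.

```python
FIBER_KM_PER_S = 200_000  # light in optical fiber: roughly two thirds of c


def min_rtt_ms(distance_km):
    # Propagation delay only, there and back; queuing, transmission and
    # server processing time would all add to this.
    return 2 * distance_km * 1000 / FIBER_KM_PER_S


def min_fetch_ms(distance_km, round_trips=2):
    # One round trip for the TCP handshake plus one for the HTTP
    # request and response, ignoring TLS and TCP slow start.
    return round_trips * min_rtt_ms(distance_km)


if __name__ == "__main__":
    print(min_fetch_ms(12_000))  # 240.0 (ms), assumed distant U.S. datacenter
    print(min_fetch_ms(500))     # 10.0 (ms), assumed nearby regional server
```

Even this lower bound is 24 times larger for the distant datacenter, before queuing, bandwidth limits, or extra round trips are counted.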
Content items are served by Facebook's servers and its CDN, which has widespread servers in Russia, Egypt, Sweden, and the UK.
Although regional CDN servers are an attractive option for expanding the infrastructure, other solutions could better support Facebook's huge growth and datacenter expansion: TCP proxies and regional OSN caching servers would improve network performance and reduce latency. However, these solutions are still under evaluation and have not yet been deployed, so performance remains slow.
In figure C.1, a user contacts U.S. web servers, and the connection through the CDN must be maintained for more than four steps to fulfill the user's request.
With TCP proxies (figure C.2), a user can be served entirely by contacting a regional server, although sometimes the connection must be established with the origin servers and completed through their CDN. With regional OSN cache servers (figure C.3), requests are served entirely by the regional servers, with only occasional requests to the origin servers. Both solutions would help Facebook avoid poor performance.
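The regional OSN caching approach of figure C.3 behaves like a read-through cache: requests are answered locally, and only misses travel to the origin servers. Here is a minimal Python sketch, assuming an LRU eviction policy and a caller-supplied `fetch_from_origin` function; the `RegionalCache` class and both of these choices are hypothetical, not Facebook's design.

```python
from collections import OrderedDict


class RegionalCache:
    """Hypothetical read-through LRU cache in front of distant origin servers."""

    def __init__(self, fetch_from_origin, capacity=1000):
        self.fetch_from_origin = fetch_from_origin  # called only on a miss
        self.capacity = capacity
        self.items = OrderedDict()  # key -> value, least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.items[key]
        self.misses += 1
        value = self.fetch_from_origin(key)  # the slow, long-distance request
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry
        return value
```

A real OSN cache would also need invalidation so that users do not see stale profiles or posts; this sketch leaves that out.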
Hadoop RPCs run over TCP connections. Instead of declaring an RPC timeout, an RPC client pings the RPC server when it reaches the TCP socket timeout; if the server is alive and can still communicate, the client keeps waiting. If an RPC server is experiencing a communication burst, the client should wait and keep routing its traffic to it.
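The ping-on-timeout behavior can be sketched with Python's standard `socket` module. This is a minimal sketch under stated assumptions: the newline-terminated framing, the `PING` message, and the `call_with_ping` and `slow_server` helpers are hypothetical, and Hadoop's real IPC client uses its own wire format and configuration.

```python
import socket
import threading
import time

PING = b"PING\n"  # hypothetical ping frame; Hadoop's real wire format differs


def call_with_ping(sock, request, ping_interval=0.1, max_pings=50):
    """Send a request and wait for a newline-terminated reply.

    A socket timeout is not treated as an RPC failure: the client pings the
    server and keeps waiting while the connection is still usable.
    Returns (reply, number_of_pings_sent)."""
    sock.settimeout(ping_interval)
    sock.sendall(request)
    reply = b""
    pings = 0
    while not reply.endswith(b"\n"):
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            if pings >= max_pings:
                raise TimeoutError(f"no reply after {pings} pings")
            sock.sendall(PING)  # raises OSError if the connection is broken
            pings += 1
            continue
        if not chunk:
            raise ConnectionError("server closed the connection")
        reply += chunk
    return reply, pings


def slow_server(conn, delay):
    conn.recv(4096)        # read the request
    time.sleep(delay)      # simulate a server busy with a communication burst
    conn.sendall(b"OK\n")  # pings sent meanwhile are simply left unread


if __name__ == "__main__":
    client, server = socket.socketpair()
    worker = threading.Thread(target=slow_server, args=(server, 0.35))
    worker.start()
    reply, pings = call_with_ping(client, b"hypothetical-request\n")
    worker.join()
    print(reply, pings)
    client.close()
    server.close()
```

The `max_pings` bound is an added safeguard in this sketch so that a server that stays silent forever still produces an error eventually.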
A few years ago, Facebook's distributed system used Hadoop and Hive to store and analyze enormous data sets. Most of these analyses were offline batch processes designed to maximize throughput and efficiency, while some were online.
Using Facebook as a case study, we have analyzed the system, described its features, and laid out its architecture, communications, and components. This paper presents the results of a comprehensive investigation into Facebook's distributed system and its datacenters.
Hadoop is an example of an existing technology on which Facebook has built its own systems, and these systems rely on the availability and stability provided by well-equipped datacenters.