Revealed: Facebook's Technology as Seen Through Its Internal Source Code (Part 1)

Date: 2022-04-26

Warning

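All of the code in this article was obtained through legitimate means.

Preface

I am a die-hard Facebook fan. Facebook has contributed a great deal to the open-source community and regularly opens up its internal software; Phabricator, libphutil, and XHP are all good examples.

Phabricator is a web-based code review tool developed at Facebook. Engineers can discuss any piece of code, a single line or several, right on the page. The reviewing engineer can accept a change, or raise questions and ask the original author to revise it.

For a while I was researching and tuning Phabricator and XHP (a PHP extension), and in the process I unexpectedly came across a lot of internal Facebook material.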

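An Unexpected Discovery

It was around June 2013, and I was already using Phabricator while fixing bugs. If I remember correctly, Phabricator had thrown a PhutilBootloaderException error.

At the time I did not know how Phabricator worked internally, so I googled the error message. As you would expect, I found the source code and a few reference links. One of the links stood out: a post on Pastebin (a lightweight text-sharing service) containing a good deal of Facebook's internal data.

Naturally, this caught my interest. Here is what I found...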

[emir@dev3003 ~/devtools/libphutil] arc diff --trace
>>> [0] <conduit> conduit.connect()
<<< [0] <conduit> 98,172 us
>>> [1] <exec> $ (cd '/home/emir/devtools/libphutil'; git rev-parse --show-cdup)
<<< [1] <exec> 13,629 us
>>> [2] <exec> $ (cd '/home/emir/devtools/libphutil/'; git rev-parse --verify HEAD^)
<<< [2] <exec> 17,024 us
>>> [3] <exec> $ (cd '/home/emir/devtools/libphutil/'; git diff --no-ext-diff --no-textconv --raw 'HEAD^' --)
>>> [4] <exec> $ (cd '/home/emir/devtools/libphutil/'; git diff --no-ext-diff --no-textconv --raw HEAD --)
>>> [5] <exec> $ (cd '/home/emir/devtools/libphutil/'; git ls-files --others --exclude-standard)
>>> [6] <exec> $ (cd '/home/emir/devtools/libphutil/'; git ls-files -m)
<<< [5] <exec> 73,004 us
<<< [6] <exec> 74,084 us
<<< [4] <exec> 77,907 us
<<< [3] <exec> 80,606 us
>>> [7] <exec> $ (cd '/home/emir/devtools/libphutil/'; git log --first-parent --format=medium 'HEAD^'..HEAD)
<<< [7] <exec> 16,390 us
>>> [8] <conduit> differential.parsecommitmessage()
<<< [8] <conduit> 106,631 us
Linting...
>>> [9] <exec> $ (cd '/home/emir/devtools/libphutil'; git rev-parse --show-cdup)
<<< [9] <exec> 9,976 us
>>> [10] <exec> $ (cd '/home/emir/devtools/libphutil/'; git merge-base 'HEAD^' HEAD)
<<< [10] <exec> 13,472 us
>>> [11] <exec> $ (cd '/home/emir/devtools/libphutil/'; git diff --no-ext-diff --no-textconv --raw '00645a0aec09edc7f0f1f573032991ae94faa01b' --)
>>> [12] <exec> $ (cd '/home/emir/devtools/libphutil/'; git diff --no-ext-diff --no-textconv --raw HEAD --)
>>> [13] <exec> $ (cd '/home/emir/devtools/libphutil/'; git ls-files --others --exclude-standard)
>>> [14] <exec> $ (cd '/home/emir/devtools/libphutil/'; git ls-files -m)
<<< [11] <exec> 19,092 us
<<< [14] <exec> 15,219 us
<<< [12] <exec> 21,602 us
<<< [13] <exec> 43,139 us
>>> [15] <exec> $ (cd '/home/emir/devtools/libphutil/'; git diff --no-ext-diff --no-textconv -M -C --no-color --src-prefix=a/ --dst-prefix=b/ -U32767 '00645a0aec09edc7f0f1f573032991ae94faa01b' --)
<<< [15] <exec> 28,318 us
>>> [16] <exec> $ '/home/engshare/devtools/libphutil/src/parser/xhpast/bin/xhpast' --version
<<< [16] <exec> 11,420 us
>>> [17] <exec> $ '/home/engshare/devtools/arcanist/scripts/phutil_analyzer.php' '/home/emir/devtools/libphutil/src/markup/engine/remarkup/markuprule/hyperlink'
<<< [17] <exec> 490,196 us
>>> [18] <exec> $ '/home/engshare/devtools/arcanist/scripts/phutil_analyzer.php' '/home/engshare/devtools/libphutil/src/markup'
>>> [19] <exec> $ '/home/engshare/devtools/arcanist/scripts/phutil_analyzer.php' '/home/engshare/devtools/libphutil/src/markup/engine/remarkup/markuprule/base'
>>> [20] <exec> $ '/home/engshare/devtools/arcanist/scripts/phutil_analyzer.php' '/home/engshare/devtools/libphutil/src/parser/uri'
>>> [21] <exec> $ '/home/engshare/devtools/arcanist/scripts/phutil_analyzer.php' '/home/engshare/devtools/libphutil/src/utils'
<<< [18] <exec> 498,899 us
<<< [19] <exec> 497,710 us
<<< [20] <exec> 517,740 us
<<< [21] <exec> 556,267 us
>>> [22] <exec> $ '/home/engshare/devtools/libphutil/src/parser/xhpast/bin/xhpast'
<<< [22] <exec> 10,066 us
 LINT OKAY  No lint problems.
Running unit tests...
HipHop Fatal error: Uncaught exception exception 'PhutilBootloaderException' with message 'The phutil library '' has not been loaded!' in /home/engshare/devtools/libphutil/src/__phutil_library_init__.php:124\nStack trace:\n#0 /home/engshare/devtools/libphutil/src/__phutil_library_init__.php(177): PhutilBootloader->getLibraryRoot()\n#1 /home/engshare/devtools/arcanist/src/unit/engine/phutil/PhutilUnitTestEngine.php(53): PhutilBootloader->moduleExists()\n#2 /home/engshare/devtools/arcanist/src/workflow/unit/ArcanistUnitWorkflow.php(113): PhutilUnitTestEngine->run()\n#3 /home/engshare/devtools/arcanist/src/workflow/diff/ArcanistDiffWorkflow.php(1172): ArcanistUnitWorkflow->run()\n#4 /home/engshare/devtools/arcanist/src/workflow/diff/ArcanistDiffWorkflow.php(225): ArcanistDiffWorkflow->runUnit()\n#5 /home/engshare/devtools/arcanist/scripts/arcanist.php(257): ArcanistDiffWorkflow->run()\n#6 {main}

Okay, this clearly is not actual source code. It is only command-line output, but it still tells us some interesting things.

Analyzing the Data

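We learn the username "emir". It may be the user's first name, or it could be a first initial plus a surname (E. Mir). The output was presumably meant for another Facebook engineer to look at, so posting it publicly on Pastebin was not a wise move. It makes the poster an easy target for attackers and invites unnecessary trouble.

"dev3003" is the name of the machine emir was using at the time. It also suggests that Facebook has at least 3,000 machines supporting development work (assuming the numbering counts up from 1, an assumption I am fairly confident about).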

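`/home/engshare/devtools/` is where libphutil and arcanist are installed. If memory serves, `/home/engshare/` is shared between development machines over NFS. Nothing especially interesting here, although other scripts may well live under this directory.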

The output also includes execution timings and Git hashes.
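To make the timing information concrete, here is a minimal Python sketch that totals the microsecond figures from completion lines of the form `<<< [n] <kind> 13,629 us`. The function name `summarize_trace` and the regular expression are my own illustration, written against the line format visible in the paste; they are not part of arcanist.

```python
import re

# Matches completion lines such as "<<< [1] <exec> 13,629 us".
# The pattern is inferred from the pasted output, not from arcanist itself.
TRACE_DONE = re.compile(r"^<<< \[(\d+)\] <(\w+)> ([\d,]+) us$")


def summarize_trace(lines):
    """Return total microseconds per call kind (e.g. 'exec', 'conduit')."""
    totals = {}
    for line in lines:
        match = TRACE_DONE.match(line.strip())
        if match is None:
            continue  # skip ">>>" start lines and ordinary output
        kind = match.group(2)
        micros = int(match.group(3).replace(",", ""))
        totals[kind] = totals.get(kind, 0) + micros
    return totals


sample = [
    ">>> [0] <conduit> conduit.connect()",
    "<<< [0] <conduit> 98,172 us",
    ">>> [1] <exec> $ (cd '/home/emir/devtools/libphutil'; git rev-parse --show-cdup)",
    "<<< [1] <exec> 13,629 us",
    "<<< [2] <exec> 17,024 us",
]
print(summarize_trace(sample))  # {'conduit': 98172, 'exec': 30653}
```

Even on this small sample, the single conduit round trip costs more than the two local git commands combined, which is the kind of detail an outsider can read straight off a pasted trace.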

After that, I went looking for more Pastebin posts of the same kind, and I was not disappointed!

[25/10/2013] Promoting The Meme Bank (1/1) - Campaign Update Failed: Campaign 6009258279237: Value cannot be null (Value given: null) TAAL[BLAME_files,www/flib/core/utils/enforce.php,www/flib/core/utils/EnforceBase.php]

Interestingly, this one reveals paths and file names. "flib" (Facebook Library) is an internal library of utilities that support development. Let's dig a little deeper...

[ksalas@dev578 ~/www] ./scripts/intl/intl_string.php scan .
Loading modules, hang on...
Analyzing directory `.'
Error: Command `ulimit -s 65536 && /mnt/vol/engshare/tools/fbt_extractor -tasks 32 '/data/users/ksalas/www-hg'' failed with error #2:
stdout:

stderr:
warning: parsing problem in /data/users/ksalas/www-hg/flib/intern/third-party/phpunit/phpunit/Tests/TextUI/dataprovider-log-xml-isolation.phpt
warning: parsing problem in /data/users/ksalas/www-hg/flib/intern/third-party/phpunit/phpunit/Tests/TextUI/dataprovider-log-xml.phpt
warning: parsing problem in /data/users/ksalas/www-hg/flib/intern/third-party/phpunit/phpunit/Tests/TextUI/log-xml.phpt
warning: parsing problem in /data/users/ksalas/www-hg/scripts/sandcastle/local_testing/script_for_test_commits.php
warning: parsing problem in /data/users/ksalas/www-hg/lib/arcanist/lint/linter/__tests__/hphpast/php-tags-script.lint-test
LEXER: unrecognised symbol, in token rule:'
warning: parsing problem in /data/users/ksalas/www-hg/scripts/intern/test/test.php
warning: parsing problem in /data/users/ksalas/www-hg/scripts/intern/test/test2.php
Fatal error: exception Common.Todo
Fatal error: exception Sys_error("Broken pipe")


Type intl_string.php --help to get more information about how to use this script.

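ksalas on dev578 appears to be running a string parser. `intl_string.php` tries to run `/mnt/vol/engshare/tools/fbt_extractor`, which tells us that other scripts live under `/mnt/vol/engshare/`. We can also see that PHPUnit is used for unit testing, and "www-hg" is a Mercurial checkout! It was already well known that they had migrated from Subversion to Git.

"That's still not god damn source code!" I hear someone cry...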

Index: flib/core/db/queryf.php
===================================================================
--- flib/core/db/queryf.php
+++ flib/core/db/queryf.php
@@ -1104,11 +1104,12 @@
 *  @author rmcelroy
  */
 function mysql_query_all($sql, $ok_sql, $conn, $params) {
+ FBTraceDB::rqsend($ok_sql);
  switch (SQLQueryType::parse($sql)) {
    case SQLQueryType::READ:
      $t_start = microtime(true);
      $result = mysql_query_read($ok_sql, $conn);
      $t_end = microtime(true);
      $t_delta = $t_end - $t_start;
      if ($t_delta > ProfilingThresholds::$queryReadDuration) {
         ProfilingThresholds::recordDurationError('mysql.queryReadDuration',

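`flib/core/db/queryf.php` is the file in question. At first glance, this is simply a diff of a file that holds MySQL-related functions, and the function visible here is `mysql_query_all()`. From the code available, it looks like a fairly simple query function that times read queries and records an error when the duration exceeds a profiling threshold. The real thing may well be more complex, but unfortunately we will probably never know.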
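The timing pattern visible in the diff (take a timestamp, run the read, compare the elapsed time with a threshold) can be sketched in a few lines of Python. This is only an illustration of the pattern: `timed_read`, `QUERY_READ_DURATION`, and `slow_queries` are hypothetical names, and the 0.5-second threshold is an assumption, since the leaked snippet does not show the real value or what `recordDurationError()` does with it.

```python
import time

# Hypothetical threshold in seconds; the real value of
# ProfilingThresholds::$queryReadDuration is not shown in the paste.
QUERY_READ_DURATION = 0.5

# Stand-in for wherever a real system would record slow-query events.
slow_queries = []


def timed_read(run_query, sql, threshold=QUERY_READ_DURATION,
               clock=time.perf_counter):
    """Run a read query and record it if it takes longer than `threshold`."""
    t_start = clock()
    result = run_query(sql)
    t_delta = clock() - t_start
    if t_delta > threshold:
        slow_queries.append(("queryReadDuration", sql, t_delta))
    return result


# A fake query runner keeps the sketch self-contained.
rows = timed_read(lambda sql: ["row"], "SELECT 1")
print(rows, slow_queries)
```

Passing the clock in as a parameter is a design choice for testability: a fake clock lets the slow path be exercised without actually sleeping.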

I am posting some of the sample code I found; all of it can be downloaded via the link at the bottom of this article.

diff --git a/flib/entity/user/personal/EntPersonalUser.php b/flib/entity/user/personal/EntPersonalUser.php
index 4de7ad8..439c162 100644
--- a/flib/entity/user/personal/EntPersonalUser.php
+++ b/flib/entity/user/personal/EntPersonalUser.php
@@ -306,13 +306,15 @@ class EntPersonalUser extends EntProfile

   public function prepareFriendIDs() {
     require_module_lazy('friends');
-    // TODO: add privacy checks!
     DT('ReciprocalFriends')->add($this->id);
     return null;
   }

   public function getFriendIDs() {
-    return DT('ReciprocalFriends')->get($this->id);
+    if ($this->canSeeFriends()) {
+      return DT('ReciprocalFriends')->get($this->id);
+    }
+    return array();
   }

   /**
@@ -397,6 +399,7 @@ class EntPersonalUser extends EntProfile
       $this->viewerCanSee,
       array(
         PrivacyConcepts::EXISTENCE,
+        PrivacyConcepts::FRIENDS,
         // Note that we're fetching GENDER here because it's PAI
         // so it's cheap and because we don't want to add a prepareGender
         // call here if we don't have to.
@@ -418,6 +421,10 @@ class EntPersonalUser extends EntProfile
     return must_prepare($this->viewerCanSee)->canSee();
   }

+  protected function canSeeFriends() {
+    return must_prepare($this->viewerCanSee)->canSeeFriends();
+  }
+

# update your local master branch
  git checkout master
  git pull --rebase

# never do any work on master branch
# create & switch to new branch instead
  git checkout -b my_branch

# rebase 'my_branch' onto master
  git checkout my_branch
  git rebase master

# list branches
  git branch

# delete 'my_branch' branch
  $ git branch -d my_branch

# shows status
  $ git status

stage file, also remove conflict
  $ git add <file>

revert file to head revision
  $ git checkout -- <file>

commit change
  $ git commit -a --amend
   -a       stages all modified files
   --amend  overwrites last commit

show all local history (amend commits, branch changes, etc.)
  $ git reflog

show history (there is lot of options)
  $ git log
  $ git log --pretty=oneline --abbrev-commit --author=plamenko
  $ git log -S"text to search"

show last commit (what is about to be sent for diff)
  $ git show

get the version of the file from the given commit
  $ git checkout <commit> path/to/file

fetch & merge
  $ git pull --rebase

resolving conflicts:
  use ours:
    $ git checkout --ours index.html
  use theirs:
    $ git checkout --theirs index.html

commit author:
  $ git config --global user.name "Ognjen Dragoljevic"
  $ git config --global user.email plamenko@fb.com

 After doing this, you may fix the identity used for this commit with:
  $ git commit --amend --reset-author

commit template:
 /mnt/vol/engshare/admin/scripts/templates/git-commit-template.txt

rename a branch:
  $ git branch -m old_branch new_branch

interactive rebase
  $ git rebase -i master
 pick
 edit
   make changes
   ...
    $ git commit -a --amend
    $ git rebase --continue
 exec
    $ arc diff
    $ arc amend
    $ git push --dry-run origin HEAD:master // remove dry-run to do actual push
   ...

to update commit message in phabricator
  $ arc diff --verbatim

#!/bin/bash
#
# Creates a new www sandbox managed by git.
#
# Usage: git-clone-www [dirname]
#
# dirname defaults to "www-git".
#

DIRNAME=${1:-www-git}

NFS_REPO=/home/engshare/git/tfb

# Are we running on a machine that has a local shared copy of the git repo?
if [ -d /data/git/tfb ]; then
  # Yes. Reuse its objects directory.
  echo "Cloning the local host's shared www repository..."
  PARENT=/data/git/tfb
  SHARE=-s
else
  # Nope, copy the NFS server's objects locally so as not to be dog slow.
  echo "Copying from the shared www repository on the NFS server..."
  PARENT=$NFS_REPO
  SHARE=
fi

if [ ! -d $HOME/local ]; then
  echo "You don't seem to have a 'local' symlink in your home directory."
  echo "Fix that and try again."
  exit 1
fi

cd $HOME/local
if [ -d "$DIRNAME" ]; then
  echo "You already have a $DIRNAME directory; won't overwrite it."
  echo "Aborting."
  exit 1
fi

# We clone the shared repository here rather than running "git svn clone"
# because it's much, much more efficient. And the clone has some options:
#
# -n = Don't check out working copy yet.
# -s = Reference the origin's .git/objects directory rather than copying.
#      Saves gobs of disk space and makes the clone nearly instantaneous.
#      We don't do this if there's no local-disk shared repo.

git clone $SHARE -n "$PARENT" "$DIRNAME"

cd "$DIRNAME"

# If we're sharing a local repository's objects, use the NFS server as a
# fallback so stuff doesn't break if we use this repo from another host
# that doesn't have a /data/git/tfb directory.
ALTERNATES=.git/objects/info/alternates
if [ -s $ALTERNATES ]; then
  echo $NFS_REPO/.git/objects >> $ALTERNATES
fi

# We want to use the same remote branch name ("remotes/trunk") for git-svn
# and for fetches from the shared git repo, so set that up explicitly.
git config remote.origin.url "file://$PARENT/.git"
git config remote.origin.fetch refs/remotes/trunk:refs/remotes/trunk
git config --remove-section branch.master

# Enable the standard commit template
git config commit.template /home/engshare/admin/scripts/templates/git-commit-template.txt

# Enable recording of rebase conflict resolutions
git config rerere.enabled true

# Now fetch from the shared repo. This mostly just creates the new "trunk"
# branch since we already have the objects thanks to the initial "git clone".
git fetch origin

# Blow away the "origin/" branches created by "git clone" -- we don't need them.
rm -rf .git/refs/remotes/origin

# Now it's time to turn this plain old git repo into a git-svn repo. Really
# all we need is the svn-remote configuration (installed above) and a
# metadata file with some version information. git-svn is smart enough to
# rebuild the other stuff it needs.

echo ""
echo "Synchronizing with svn..."

git svn init -i trunk svn+ssh://tubbs/svnroot/tfb/trunk/www

# Now tweak the git-svn config a little bit so it's easier for someone to
# go add more "fetch" lines if they want to track svn-side branches in
# addition to trunk. This doesn't affect any of the existing history.
git config svn-remote.svn.url svn+ssh://tubbs/svnroot
git config svn-remote.svn.fetch tfb/trunk/www:refs/remotes/trunk

# Let git-svn update its mappings and fetch the latest revisions. This can
# spew lots of uninteresting output so suppress it.
git svn fetch > /dev/null

echo ""
echo "Checking out working copy..."

# We use git reset here because the git svn fetch might have advanced trunk
# to a newer revision than the master branch created by git clone.
git reset --hard trunk

if [ ! -d "$HOME/$DIRNAME" ]; then
  echo ""
  echo "Making home dir symlink: $HOME/$DIRNAME"
  ln -s "local/$DIRNAME" "$HOME/$DIRNAME"
else
  echo ""
  echo "$HOME/$DIRNAME already exists; leaving it alone."
fi

echo ""
echo "All done. To make this your new main sandbox directory, run"
echo ""
echo "    rm -rf ~/www"
echo "    ln -s ~/$DIRNAME ~/www"
echo ""

Facebook's MySQL Database Password

Finally, I want to share something I found interesting: a Facebook MySQL password, apparently dumped from an array with `print_r()`.

array( 'ip' => '10.21.209.92', 'db_name' => 'insights', 'user' => 'mark', 'pass' => 'e5p0nd4', 'mode' => 'r', 'port' => 3306, 'cleanup' => false, 'num_retries' => 3, 'log_after_num_retries' => 4, 'reason' => 'insights', 'cdb' => true, 'flags' => 0, 'is_shadow' => false, 'backoff_retry' => false, )
Host: 10.21.209.92 (Private IP)
Database Name: insights
User: mark
Password: e5p0nd4

Okay, granted, Facebook's database servers sit behind plenty of firewalls, but this is probably still not the most secure password.

Lessons Learned

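So what did we learn today? We should never post internal source code on public-facing sites (such as the sharing service Pastebin). One more thing: make sure debugging information is never shown to users.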
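As a rough illustration of that advice, the Python sketch below scrubs a few obviously sensitive patterns (home-directory usernames, `user@host` prompts, IPv4 addresses, and `'pass' => '...'` entries) from text before it is shared. The `scrub` function and its patterns are hypothetical and deliberately simplistic, and the sample values are made up; a real scrubber would need rules tuned to the organisation, and it is no substitute for keeping such output off public sites in the first place.

```python
import re

# Illustrative patterns only, modelled on the kinds of detail seen in the
# pastes above. Order matters: each rule runs on the output of the previous.
REDACTIONS = [
    (re.compile(r"/home/[^/\s']+"), "/home/<user>"),
    (re.compile(r"\b[\w.-]+@[\w.-]+"), "<user>@<host>"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<ip>"),
    (re.compile(r"('pass(?:word)?'\s*=>\s*)'[^']*'"), r"\1'<redacted>'"),
]


def scrub(text):
    """Return `text` with each sensitive pattern replaced by a placeholder."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


# Made-up sample values, not taken from the leaked material.
prompt = "[alice@dev42 ~/www] cd '/home/alice/www'"
config = "array( 'ip' => '10.0.0.5', 'user' => 'bob', 'pass' => 'hunter2', )"
print(scrub(prompt))
print(scrub(config))
```

Note that the database user name in the second sample is left untouched, which shows how easily a pattern-based scrubber misses things.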

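Download

Disclaimer: for study and research purposes only.

Link: http://pan.baidu.com/s/1kTEBXuJ Password: p3ve Password: freebuf.com

[Source: Sinthetic Labs; translated by intern editor 鸢尾; when reposting, please credit FreeBuf Hackers and Geeks (FreeBuf.COM)]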