Function start detection in stripped binaries is an undecidable problem. This should be no surprise. Many problems in the program analysis domain fall into this category. Add to that the numerous types of CPU architectures, compilers, programming languages, application binary interfaces (ABIs), etc. and you’re left with an interesting, multifaceted, hard problem. Accurate detection of both function starts and the low-level basic blocks is often the first step in program analysis. Performing this task accurately is critical. These foundational analysis artifacts are often the starting point for many automated analysis tools.

Being undecidable, there is no single algorithm to identify all function starts across the wide variety of program binaries. Among some of the approaches enlisted to try and solve this problem are machine learning algorithms (e.g. Byteweight and Neural Networks) which use signature based pattern recognition. At first glance the results of these learners look promising but they are often biased to their training set.

Other approaches to function detection (e.g. Nucleus and Function Interface Analysis) are more sensible. Their research is focused on control-flow, data-flow, and function interface analysis rather than signature detection. Yet, these solutions fall short of addressing the entire problem domain across architectures in a generalized way.

For example, the Nucleus approach can be improved upon since it relies on heuristics to handle indirect control flow. What’s wrong with heuristics? They must be updated constantly for the latest compiler constructs, and can frequently produce invalid results, especially in highly optimized binaries. Even today, many tools rely on heuristics. That’s a dilemma. Even though they solve a specific problem really well, they rarely generalize to solve an entire class of problems.

Enter The Ninja

Binary Ninja is an extensible reverse engineering tool that provides a generalized, architecture-agnostic approach to program analysis. Therefore, one of our goals is to develop a function detection algorithm that comprises the best techniques available while being architecture-agnostic. We therefore intentionally overlook signature matching and heuristics as primary methods to provide this capability (though do not discount them entirely).

Carving Binary Spaces With Confidence

Given the multifaceted nature of function detection in stripped binaries, it is common to apply multiple strategies over multiple analysis passes. The order of those strategies is driven by the desire to achieve positive function identification with the highest degree of confidence first, as this should reduce the scope, complexity, and possibility for mistakes in later analysis stages. This is important. Mistakes early in the binary analysis process can degrade the results of higher-level analysis later. The search space for each analysis phase is the current set of unknown regions in the binary that are not tagged as code or data. After each phase, the search space shrinks by tagging regions as code and/or data. Currently, Binary Ninja performs analysis in four distinct phases:

  • Recursive Descent
  • Call Target Analysis
  • Control Flow Graph (CFG) Analysis
  • Tail Call Analysis

After analysis is complete, the desired result is a binary that is annotated in a similar way to a binary with full symbol information, or even better.

Recursive Descent

In earlier versions of Binary Ninja, automatic function detection is limited to our recursive descent algorithm. This algorithm analyzes the control flow from the defined entry point(s) and attempts to follow all code paths. Since the algorithm is based on control flow from a well-defined entry point it is inherently proficient at differentiating between code and data. A unique ability of Binary Ninja’s recursive descent algorithm is the ability to perform multi-threaded* analysis during this phase.

Results from recursive descent are highly accurate, but there are several limitations: handling indirect control flow constructs, disconnected functions, and tail call function identification. We present our solutions to these challenges later.

* Note that multi-threaded analysis is only available in Binary Ninja Commercial

Jump Table Inference With Generic Value-Set Analysis

Many tools approach indirect dataflow resolution with heuristics. Even some of the newest binary analysis tools still rely on heuristics to detect instruction sequences and patterns in order to locate jump tables. The problem is that every time a new compiler/optimization/architecture is released, the compilers’ indirect control flow idioms could change, undercutting the effectiveness of the once-sufficient heuristic. This creates a big problem for most types of function-detection algorithms. Broken jump tables negatively impact function detection because each of the indirect jump targets can appear to be a potential function as they are valid native blocks of instructions. Depending on the architecture, well-connected basic blocks may even appear within the jump table itself. Attempting to filter out all of these types of candidate functions is not only a tedious, but difficult task.

Table Value on HoverTable Value on Hover – Figure 1

Since adding heuristics to solve a problem can be an eternal cycle of misery, Binary Ninja provides a practical implementation of generic value-set analysis. Value-set analysis is in the domain of abstract interpretation. Essentially, it is a method to interpret the semantics of low-level CPU instructions and track the possible value of variables and/or registers at each instruction point. The value represents an approximation that is path sensitive. Given this capability it becomes trivial to query the possible values at an indirect jump-site. One example is shown in Figure 1. If the possible values represent a reasonable set of jump targets, then the recursive descent algorithm can continue. This reduces the resolution of an indirect control flow instruction idiom to a simple query via our possible value set API.

Multiple IndirectSolving jump tables with multiple levels of indirection – Figure 2

Because our value set analysis occurs within our Binary Ninja Intermediate Language (BNIL) set of abstractions, it is inherently architecture agnostic. This means that when we add a new architecture to Binary Ninja all of the existing value set analysis is automatically applied by Binary Ninja’s analysis core. The result is a lifted, clean binary ready for consumption. The next several sections describe the additional analysis phases which make up our implementation of linear sweep.

CALL TARGET ANALYSIS

This analysis phase sequentially disassembles the unknown regions in the search space while aggregating all potential call targets. For our purposes, a call target is the destination of a call instruction as returned from the architecture. Once complete, the call targets are ordered by the number of cross-references in descending order and handed off to the recursive descent algorithm for further analysis.

CONTROL FLOW GRAPH ANALYSIS

This analysis pass is based on the Control Flow Graph (CFG) recovery technique discussed in the research paper titled Compiler-Agnostic Function Detection in Binaries (i.e. Nucleus). A key insight from this paper is that compilers typically use different control-flow constructs for interprocedural control-flow versus intraprocedural control-flow. The general algorithm for the CFG recovery is to perform a linear disassembly of the unknown regions in the search space creating basic blocks. Then connect the basic blocks into groupings and group the basic blocks based on intraprocedural control-flow constructs. The basic block groupings become candidate functions which undergo some additional analysis to find the possible entry point. For more details about CFG recovery please reference the Nucleus paper. Once we have possible entry points, we hand them off to the recursive descent algorithm to perform the heavy lifting.

TAIL CALL ANALYSIS

Compilers employ tail call optimizations to reduce the overhead of adjusting the stack before and after the call to a function. This technique is typically applied when the last thing a function does before it returns is call another function. This optimization often appears as a jump to the target function. The caller stack frame is available to the target function, which may or may not require a stack frame.

Recursive traversal by itself does not detect tail call functions since a jump is simply a direct control flow that is followed by the algorithm. Note that the techniques used in the linear sweep passes described thus far attempt to find functions in areas of the binary that are not already indicated as part of a basic block or data variable.

Tail Call Before AnalysisTail call before analysis – Figure 3

For Binary Ninja, we add a tail call analysis pass as the final stage of our function detection. This pass requires iterating over existing functions. Generally, if we detect a basic block that jumps before or after the current function boundary, then it is likely a tail call function. This works well for some cases, however there are other times where the tail call function is part of the current function under analysis, as shown in Figure 3. For these situations we use our static dataflow capability to query the stack frame offset at the jump site as well as in the candidate tail call function. This allows Binary Ninja to determine if the stack offset is zero and whether a particular jump site may in fact be a tail call. Figure 4 illustrates the result of Binary Ninja’s tail-call analysis, the correct identification of a tail-called function.

Tail Call After AnalysisTail call after analysis – Figure 4

Evaluation

We currently evaluate and refine our function detection capability on three different data sets. These include the ByteWeight data set, a subset of binaries from the Cyber Grand Challenge (CGC) corpus of binaries, as well as our own BusyBox binaries that are cross-compiled for various architectures. For most of our results we compare Binary Ninja with Nucleus only. The reason for this is that Nucleus is readily available, and easy to set up and generate results for comparison.

The results presented below were generated with Binary Ninja 1.1.988-dev, IDA Pro Version 7.0.170914, and Nucleus (c98b48c). ByteWeight results are reposted from the original work.

GROUND TRUTH DATA

Unless otherwise noted, in each data set for all of the non-stripped ELF binaries, we generate ground truth data for function starts by parsing output from the readelf binary utility. We also exclude PLT entries or thunks as was done in prior research.

Reference the Sensitivity and Specificity article regarding statistical measures calculated in the tables below. Note the following definitions:

  • True Positives (TP): identified function starts that match ground truth data
  • False Positives (FP): identified function starts that do not match ground truth data
  • False Negatives (FN): function starts in ground truth data that are not detected

BYTEWEIGHT DATA SET

This data set consists of 2200 binaries that target both x86-32 and x86-64 architectures. 2064 of the binaries are compiled with gcc or icc across various optimization levels. The remaining 136 binaries are PEs compiled with msvc. Note that the ground truth data for the pe-x86/pe-x86-64 binaries are obtained from the original Byteweight data set. It should be noted that Nucleus never presented results for the icc compiler. We hypothesize the algorithm performance is degraded on icc due to the use of heuristics for resolving indirect control flow.

Original ByteWeight Data Set (gcc/icc/msvc)

Binary Ninja Byteweight Nucleus
Precision 0.9734 0.9730 0.8130
Recall 0.9632 0.9744 0.9347
F1 0.9683 0.9737 0.8696

We also include a modified ByteWeight data set that swaps out the icc compiled binaries with ones compiled with clang/llvm. (Thanks to Dennis for providing the clang binaries.)

Modified ByteWeight Data Set (gcc/clang/msvc)

Binary Ninja Nucleus
Precision 0.9702 0.9704
Recall 0.9753 0.9461
F1 0.9727 0.9581

DARPA CHALLENGE BINARIES

This data set consists of a subset of the DARPA challenge binaries created for the Cyber Grand Challenge (CGC) and adapted by Trail Of Bits. We have plans to expand this data set to multiple compilers but for now it includes 283 binaries compiled with clang/llvm for the x86-32 architecture.

DARPA Challenge Binaries (clang/llvm)

Binary Ninja Nucleus
Precision 0.9996 0.9985
Recall 0.9994 0.9815
F1 0.9995 0.9899

BUSYBOX MULTIARCH BINARIES

This data set consists of 16 BusyBox binaries where each is cross-compiled for a different architecture. All binaries are compiled with gcc using the default BusyBox settings. One set has optimizations enabled and another has them disabled.

Busybox MultiArch Binaries (gcc)

Binary Ninja IDA Pro v7.0 Nucleus
Precision Recall F1 Precision Recall F1 Precision Recall F1
busybox_aarch64_opt 0.9958 0.9551 0.9750 0.8461 0.8355 0.8408 0.9857 0.9101 0.9464
busybox_aarch64 0.9980 0.9989 0.9984 0.9001 0.9600 0.9291 0.8397 0.9861 0.9071
busybox_arm-thumb_opt 0.8791 0.9411 0.9091 0.8569 0.8818 0.8692 0.0121 0.0453 0.0190
busybox_arm-thumb 0.9258 0.9727 0.9487 0.8985 0.9223 0.9103 0.0081 0.0314 0.0128
busybox_arm_opt 0.9976 0.9890 0.9933 0.8499 0.8580 0.8539 0.1973 0.2142 0.2054
busybox_arm 0.9964 0.9961 0.9963 0.8993 0.9351 0.9169 0.2722 0.2729 0.2725
busybox_i386_opt 1.0000 0.9944 0.9972 0.8638 0.9598 0.9093 1.0000 0.9639 0.9816
busybox_i386 0.9975 0.9986 0.9980 0.9035 0.9822 0.9412 0.9961 0.9886 0.9923
busybox_mips32_opt 0.9812 0.9380 0.9591 0.7448 0.9104 0.8194 0.9451 0.9156 0.9301
busybox_mips32 0.9886 0.9906 0.9896 0.8232 0.9583 0.8857 0.9861 0.8478 0.9118
busybox_mipsel32_opt 0.9788 0.9382 0.9580 0.7440 0.9122 0.8196
busybox_mipsel32 0.9886 0.9905 0.9896 0.8207 0.9621 0.8858
busybox_powerpc_opt 0.8449 0.9416 0.8906 0.7610 0.9486 0.8445 0.8144 0.7777 0.7957
busybox_powerpc 0.9019 0.9928 0.9452 0.8245 0.9833 0.8970 0.7635 0.9872 0.8611
busybox_x86-64_opt 1.0000 0.9924 0.9962 0.8613 0.9551 0.9058 1.0000 0.9555 0.9773
busybox_x86-64 0.9986 0.9994 0.9990 0.8982 0.9629 0.9294 0.9943 0.9898 0.9920
Overall 0.9662 0.9795 0.9728 0.8454 0.9373 0.8890 0.4932 0.6183 0.5487

Mission Accomplished

We have a few architectures where performance lags. However, these are typically due to incomplete lifting and/or other architectural differences (e.g. delay slots, multiple architectures in a binary). We are very pleased with the results, but we are not finished. Stay tuned.

Raw Results

We think it’s important to be transparent and support reproducibility of our work so we’ve published the binaries used in testing.

Acknowledgements

Thanks to the following whose shoulders we stand on and whose contributions helped this work:

 

Post originally appeared at:

https://binary.ninja/2017/11/06/architecture-agnostic-function-detection-in-binaries.html

In the event that your Windows machine has been compromised or for any other reason, this Penetration Testing Cheat Sheet is intended to help.

This Penetration Testing Cheat Sheet article is for Windows Administrators and security personnel to better execute a thorough examination of their framework (inside and out) keeping in mind the end goal is to search for indications of compromise.

Apart From this You can Read Many Penetration testing Articles Here .

Also Read –   Become Master in Cyber Security with Complete Advance Level Security Course Bundle

1.Unusual Log Entries:

Check your logs for suspicious events, such as:

  • “Event log service was stopped.”
  • “Windows File Protection is not active on this system.”
  • “The protected System file [file name] was not restored to its original, valid version because of the Windows File Protection…”
  • “The MS Telnet Service has started successfully.”
  • Look for a large number of failed logon attempts or locked out accounts.

Penetration Testing Cheat Sheet To do this using the GUI, run the Windows event viewer:

C:> eventvwr.msc

Using the command prompt:

C:> eventquery.vbs | more

Or, to focus on a particular event log:

C:> eventquery.vbs /L security

Also Read:   Google : Microsoft is putting Windows 7 and 8.1 users in danger By only patching Windows 10

2.Unusual Processes and Services:

Look for unusual/unexpected processes, and focus on processes with User Name “SYSTEM” or “Administrator” (or users in the Administrators’ group). You need to be familiar with normal processes and services and search for deviations.

Using the GUI, run Task Manager:

C:> taskmgr.exe

Using the command prompt:

C:> tasklist
C:> wmic process list full

Also look for unusual services.

Using the GUI:
C:> services.msc

Using the command prompt:

C:> net start
C:> sc query

For a list of services associated with each process:

C:> tasklist /svc

3 Complete Advance Level  Security Course Bundle for just $59

3.Unusual Files and Registry Keys

Check file space usage to look for sudden major decreases in free space, using the GUI (right-click on a partition), or type:

C:> dir c:

Look for unusually big files:

Start–> Search–>For Files of Folders… Search Options–>Size–>At Least 10000KB

Look for strange programs referred to in registry keys associated with system start up:

HKLMSoftwareMicrosoftWindowsCurrentVersionRun

HKLMSoftwareMicrosoftWindowsCurrentVersionRunonce 

HKLMSoftwareMicrosoftWindowsCurrentVersionRunonceEx

Note that you should also check the HKCU counterparts (replace HKLM with HKCU above).
Using the GUI:

C:> regedit

Using the command prompt:

C:> reg query <reg key>

4.Penetration Testing Cheat Sheet for Unusual Network Usage

Look at file shares, and make sure each has a defined business purpose:

C:> net view \127.0.0.1

Look at who has an open session with the machine:

C:> net session

Look at which sessions this machine has opened with other systems:

C:> net use

Look at NetBIOS over TCP/IP activity:

C:> nbtstat –S

Look for unusual listening TCP and UDP ports:

C:> netstat –na

For continuously updated and scrolling output of this command every 5 seconds:

C:> netstat –na 5

The –o flag shows the owning process id:

C:> netstat –nao 5

The –b flag shows the executable name and the DLLs loaded for the network connection.

C:> netstat –naob 5

Note that the –b flag uses excessive CPU resources.
Again, you need to understand normal port usage for the system and look for deviations.

Also, check Windows Firewall configuration:

C:> netsh firewall show config

5.Unusual Scheduled Tasks

Look for unusually scheduled tasks, especially those that run as a user in the Administrators group, as SYSTEM, or with a blank user name.

Using the GUI, run Task Scheduler:

Start–>Programs–>Accessories–>System Tools–>Scheduled Tasks

Using the command prompt:

C:> schtasks

Check other autostart items as well for unexpected entries, remembering to check user autostart directories and registry keys.

Using the GUI, run msconfig and look at the Startup tab:

Start –> Run, msconfig.exe

Using the command prompt:

C:> wmic startup list full

6.Unusual Accounts

Look for new, unexpected accounts in the Administrators group:

C:> lusrmgr.msc

Click on Groups, Double Click on Administrators, then check members of this group.
This can also be done at the command prompt:

C:> net user
C:> net localgroup administrators

7.Other Unusual Items

Look for unusually sluggish performance and a single unusual process hogging the CPU:

Task Manager –> Process and Performance tabs

Look for unusual system crashes, beyond the normal level for the given system.

On a periodic basis (daily, weekly, or each time you logon to a system you manage,) run through these quick steps to look for anomalous behavior that might be caused by a computer intrusion. Each of these commands runs locally on a system.

 

 

The article originally appeared on:

facebook-link-spoofing

While scrolling on Facebook how you decide which link/article should be clicked or opened?

Facebook timeline and Messenger display title, description, thumbnail image and URL of every shared-link, and this information are enough to decide if the content is of your interest or not.

Since Facebook is full of spam, clickbait and fake news articles these days, most users do not click every second link served to them.

But yes, the possibility of opening an article is much higher when the content of your interest comes from a legitimate and authoritative website, like YouTube or Instagram.

However, what if a link shared from a legitimate website lands you into trouble?

Even before links shared on Facebook could not be edited, but to stop the spread of misinformation and false news, the social media giant also removed the ability for Pages to edit title, description, thumbnail image of a link in July 2017.However, it turns out that—spammers can spoof URLs of the shared-links to trick users into visiting pages they do not expect, redirecting them to phishing or fake news websites with malware or malicious content.

Discovered by 24-year-old security researcher Barak Tawily, a simple trick could allow anyone to spoof URLs by exploiting the way Facebook fetch link previews.

In brief, Facebook scans shared-link for Open Graph meta tags to determine page properties, specifically ‘og:url’, ‘og:image’ and ‘og:title’ to fetch its URL, thumbnail image and title respectively.

facebook security

Interestingly, Tawily found that Facebook does not validate if the link mentioned in ‘og:url’ meta tag is same as the page URL, allowing spammers to spread malicious web pages on Facebook with spoofed URLs by just adding legitimate URLs in ‘og:url’ Open Graph meta tag on their websites.

“In my opinion, all Facebook users think that preview data shown by Facebook is reliable, and will click the links they are interested in, which makes them easily targeted by attackers that abuse this feature in order to perform several types of attacks, including phishing campaigns/ads/click fraud pay-per-click,” Tawily told The Hacker News.

Tawily reported the issue to Facebook, but the social media giant refused to recognise it as a security flaw and referred that Facebook uses “Linkshim” to protect against such attacks.

If you are unaware, every time a link is clicked on Facebook, a system called “Linkshim” checks that URL against the company’s own blacklist of malicious links to avoid phishing and malicious websites.This means if an attacker is using a new domain for generating spoofed links, it would not be easy for Linkshim system to identify if it is malicious.

Although Linkshim also uses machine learning to identify never-seen-before malicious pages by scanning its content, Tawily found that the protection mechanism could be bypassed by serving non-malicious content explicitly to Facebook bot based on User-Agent or IP address.

Tawily has also provided a demo video to show the attack in action. You can watch the video above.

Since there is no way to check the actual URL behind a shared link on Facebook without opening it, there is a little user can do to protect themselves except being vigilant.

Article appeared on:

Project taken from:

https://www.hackster.io/shiva-siddharth/simultaneously-run-alexa-and-google-assistant-on-pi-12e8df