Questions - SabaiDiscuss Demo Site

1 vote

1 answer

20k views

So visitors don't need to be registered to access this screen?

General Questions

So visitors don’t need to be registered to access this screen?

Ben asked 13 years ago
last active 13 years ago

1 vote

In reply to: Best way to prevent SQL injection?

For those unsure of how to use PDO (coming from the mysql_ functions), I made a very, very simple PDO wrapper that is a single file. It exists to show how easy it is to do all the common things applications need done. Works with PostgreSQL, MySQL, and SQLite.

Basically, read it while you read the manual to see how to put the PDO functions to use in real life to make it simple to store and retrieve values in the format you want.

I want a single column
$count = DB::column('SELECT COUNT(*) FROM `user`');
I want an array(key => value) results (i.e. for making a selectbox)
$pairs = DB::pairs('SELECT `id`, `username` FROM `user`');
I want a single row result
$user = DB::row('SELECT * FROM `user` WHERE `id` = ?', array($user_id));
I want an array of results
$banned_users = DB::fetch('SELECT * FROM `user` WHERE `banned` = ?', array(TRUE));
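
To give an idea of what such a wrapper could look like, here is a minimal sketch. It is not the original single-file library: the class body, the connect() and query() helpers and the in-memory SQLite table are all assumptions made for illustration, and only the four method names come from the calls above.

```php
<?php
// Hypothetical sketch of a tiny PDO wrapper; NOT the original library.
final class DB
{
    private static ?PDO $pdo = null;

    // Assumed helper: open the connection and make PDO throw on errors.
    public static function connect(string $dsn, ?string $user = null, ?string $pass = null): void
    {
        self::$pdo = new PDO($dsn, $user, $pass, [
            PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
        ]);
    }

    // Assumed helper: prepare and execute with bound parameters.
    public static function query(string $sql, array $params = []): PDOStatement
    {
        $stmt = self::$pdo->prepare($sql);
        $stmt->execute($params);
        return $stmt;
    }

    // First column of the first row.
    public static function column(string $sql, array $params = [])
    {
        return self::query($sql, $params)->fetchColumn();
    }

    // array(first column => second column).
    public static function pairs(string $sql, array $params = []): array
    {
        return self::query($sql, $params)->fetchAll(PDO::FETCH_KEY_PAIR);
    }

    // A single row as an associative array (false if there is none).
    public static function row(string $sql, array $params = [])
    {
        return self::query($sql, $params)->fetch(PDO::FETCH_ASSOC);
    }

    // All rows as associative arrays.
    public static function fetch(string $sql, array $params = []): array
    {
        return self::query($sql, $params)->fetchAll(PDO::FETCH_ASSOC);
    }
}

// Demo data in an in-memory SQLite database (needs the pdo_sqlite extension).
DB::connect('sqlite::memory:');
DB::query('CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT, banned INTEGER)');
DB::query('INSERT INTO user (id, username, banned) VALUES (?, ?, ?)', [1, 'ann', 0]);
DB::query('INSERT INTO user (id, username, banned) VALUES (?, ?, ?)', [2, 'bob', 1]);

$count = DB::column('SELECT COUNT(*) FROM user');
$pairs = DB::pairs('SELECT id, username FROM user');
$user = DB::row('SELECT * FROM user WHERE id = ?', [2]);
$banned_users = DB::fetch('SELECT * FROM user WHERE banned = ?', [1]);
```

Every method goes through prepare() and execute(), so values are always bound as parameters rather than concatenated into the SQL string.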

NOTE: This answer was originally posted at StackOverflow.com by Xeoncross

Mary answered 13 years ago

1 vote

In reply to: php code test

Albert, thanks for responding. Adding an option to open links in a new page would make your plugin much more flexible. Regarding putting 4 spaces before code, I guess no one would know to do this unless you notify them beforehand.

That was our honest testing and review.

demo answered 13 years ago

0 votes

In reply to: Is this a suitable platform to run a peer review system for academia ?

hello

demo answered 13 years ago

1 vote

In reply to: How to parse and process HTML with PHP?

phpQuery and QueryPath are extremely similar in replicating the fluent jQuery API. That’s also why they’re two of the easiest approaches to properly parse HTML in PHP.

Examples for QueryPath

Basically you first create a queryable DOM tree from an HTML string:

 $qp = qp("<html><body><h1>title</h1>..."); // or give filename or URL

The resulting object contains a complete tree representation of the HTML document. It can be traversed using DOM methods. But the common approach is to use CSS selectors like in jQuery:

 $qp->find("div.classname")->children()->...;

 foreach ($qp->find("p img") as $img) {
     print qp($img)->attr("src");
 }

Mostly you want to use simple #id and .class or DIV tag selectors for ->find(). But you can also use XPath statements, which sometimes are faster. Also typical jQuery methods like ->children() and ->text() and particularly ->attr() simplify extracting the right HTML snippets. (And already have their SGML entities decoded.)

 $qp->xpath("//div/p[1]");  // get first paragraph in a div

QueryPath also allows injecting new tags into the stream (->append), and later output and prettify an updated document (->writeHTML). It can not only parse malformed HTML, but also various XML dialects (with namespaces), and even extract data from HTML microformats (XFN, vCard).

 $qp->find("a[target=_blank]")->toggleClass("usability-blunder");

phpQuery or QueryPath?

Generally QueryPath is better suited for manipulation of documents, while phpQuery also implements some pseudo AJAX methods (just HTTP requests) to more closely resemble jQuery. It is said that phpQuery is often faster than QueryPath (because it has fewer features overall).

For further information on the differences see this comparison:
http://web.archive.org/web/20101230230134/http://www.tagbytag.org/articles/phpquery-vs-querypath (Original source went missing, so here’s an internet archive link. Yes, you can still locate missing pages, people.)

And here’s a comprehensive QueryPath introduction: http://www.ibm.com/developerworks/opensource/library/os-php-querypath/index.html?S_TACT=105AGX01&S_CMP=HP

Advantages

Simplicity and Reliability
Simple to use alternatives ->find("a img, a object, div a")
Proper data unescaping (in comparison to regular expression grepping)

NOTE: This answer was originally posted at StackOverflow.com by mario

Dorothy answered 15 years ago
last active 13 years ago

1 vote

In reply to: How to parse and process HTML with PHP?

Native XML Extensions

I prefer using one of the native XML extensions since they come bundled with PHP, are usually faster than all the 3rd party libs and give me all the control I need over the markup.

DOM

The DOM extension allows you to operate on XML documents through the DOM API with PHP 5. It is an implementation of the W3C’s Document Object Model Core Level 3, a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents.

DOM is capable of parsing and modifying real world (broken) HTML and it can do XPath queries. It is based on libxml.

It takes some time to get productive with DOM, but that time is well worth it IMO. Since DOM is a language agnostic interface, you’ll find implementations in many languages, so if you need to change your programming language, chances are you will already know how to use that language’s DOM API.

A basic usage example can be found in Grabbing the href attribute of an A element and a general conceptual overview can be found at Noob question about DOMDocument in php.

How to use the DOM extension has been covered extensively on Stack Overflow, so if you choose to use it, you can be sure most of the issues you run into can be solved by searching/browsing Stack Overflow.
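
As a rough sketch of the two claims above (broken HTML is accepted, and XPath queries work), the following uses only the bundled DOM extension. The sample markup and variable names are made up for illustration.

```php
<?php
// Deliberately sloppy markup: unclosed <p> and <a> tags, no closing </html>.
$html = '<html><body><div><p>First<p>Second <a href="/one">one</a>'
      . '<a href="/two">two</div></body>';

$doc = new DOMDocument();

// Collect libxml parse warnings internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);
$doc->loadHTML($html);               // uses libxml's HTML parser
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($doc);

// Grab the href attribute of every A element.
$hrefs = [];
foreach ($xpath->query('//a[@href]') as $a) {
    $hrefs[] = $a->getAttribute('href');
}

// First paragraph inside a div; the parser closed the dangling <p> for us.
$firstParagraph = $xpath->query('//div/p[1]')->item(0)->textContent;
```

The libxml_use_internal_errors() calls are optional, but without them real-world markup tends to flood the log with warnings.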

XMLReader

The XMLReader extension is an XML Pull parser. The reader acts as a cursor going forward on the document stream and stopping at each node on the way.

XMLReader, like DOM, is based on libxml. I am not aware of how to trigger the HTML Parser Module, so chances are using XMLReader for parsing broken HTML might be less robust than using DOM where you can explicitly tell it to use libxml’s HTML Parser Module.

A basic usage example can be found at getting all values from h1 tags using php.
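
To illustrate the cursor idea, here is a small sketch that walks a well-formed document node by node and collects the h1 texts. The input string is invented for the example, and it has to be well-formed since the XML parser is used here.

```php
<?php
$xml = '<html><body><h1>Title one</h1><p>text</p><h1>Title two</h1></body></html>';

$reader = new XMLReader();
$reader->XML($xml);                  // read from a string instead of a file

$headings = [];
// read() moves the cursor forward to the next node until the stream ends.
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'h1') {
        $headings[] = $reader->readString();   // text content of the current element
    }
}
$reader->close();
```

Because only the current node is held at any time, this style suits large files where building a full DOM tree would cost too much memory.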

SimpleXml

The SimpleXML extension provides a very simple and easily usable toolset to convert XML to an object that can be processed with normal property selectors and array iterators.

SimpleXML is an option when you know the HTML is valid XHTML. If you need to parse broken HTML, don’t even consider SimpleXml because it will choke.

A basic usage example can be found at A simple program to CRUD node and node values of xml file and there are lots of additional examples in the PHP Manual.
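
A short sketch of both points, property-style access on valid XHTML and failure on broken markup. The sample strings are assumptions made up for the example.

```php
<?php
$xhtml = '<html><body><h1>Title</h1><ul><li>one</li><li>two</li></ul>'
       . '<a href="/home">Home</a></body></html>';

$doc = simplexml_load_string($xhtml);

// Child elements are properties, attributes are array keys.
$title = (string) $doc->body->h1;
$href = (string) $doc->body->a['href'];

// Repeated elements can be iterated like an array.
$items = [];
foreach ($doc->body->ul->li as $li) {
    $items[] = (string) $li;
}

// Broken markup (an unclosed <br>) is rejected: the function returns false.
$previous = libxml_use_internal_errors(true);
$broken = simplexml_load_string('<p>unclosed<br></p>');
libxml_clear_errors();
libxml_use_internal_errors($previous);
```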

3rd Party Libraries (libxml based)

If you prefer to use a 3rd party lib, I’d suggest using a lib that actually uses DOM/libxml underneath instead of String Parsing.

phpQuery

phpQuery is a server-side, chainable, CSS3 selector driven Document Object Model (DOM) API based on jQuery JavaScript Library written in PHP5 and provides additional Command Line Interface (CLI).

Zend_Dom

Zend_Dom provides tools for working with DOM documents and structures. Currently, we offer Zend_Dom_Query, which provides a unified interface for querying DOM documents utilizing both XPath and CSS selectors.

QueryPath

QueryPath is a PHP library for manipulating XML and HTML. It is designed to work not only with local files, but also with web services and database resources. It implements much of the jQuery interface, but it is heavily tuned for server-side use.

FluentDom

FluentDOM is a jQuery-like fluent XML interface for the DOMDocument in PHP.

fDOMDocument

fDOMDocument extends the standard DOM to use exceptions at all occasions of errors instead of PHP warnings or notices. They also add various custom methods and shortcuts for convenience and to simplify the usage of DOM.

3rd Party (not libxml based)

The benefit of building upon DOM/libxml is that you get good performance out of the box because you are based on a native extension. However, not all 3rd party libs go down this route; some of them are listed below.

SimpleHtmlDom

A HTML DOM parser written in PHP5+ lets you manipulate HTML in a very easy way!

Requires PHP 5+.

Supports invalid HTML.

Find tags on an HTML page with selectors just like jQuery.

Extract contents from HTML in a single line.

I generally do not recommend this parser. The codebase is horrible and the parser itself is rather slow and memory hungry. Any of the libxml based libraries should outperform this easily.

Ganon

A universal tokenizer and HTML/XML/RSS DOM Parser

Ability to manipulate elements and their attributes

Supports invalid HTML and UTF8

Can perform advanced CSS3-like queries on elements (like jQuery — namespaces supported)

A HTML beautifier (like HTML Tidy)

Minify CSS and Javascript

Sort attributes, change character case, correct indentation, etc.

Extensible

Parsing documents using callbacks based on current character/token

Operations separated in smaller functions for easy overriding

Fast and Easy

Never used it. Can’t tell if it’s any good.

HTML 5

You can use the above for parsing HTML5, but there can be quirks due to the markup HTML5 allows. So for HTML5 you want to consider using a dedicated parser, like

html5lib

Python and PHP implementations of an HTML parser based on the WHATWG HTML5 specification for maximum compatibility with major desktop web browsers.

We might see more dedicated parsers once HTML5 is finalized.

WebServices

If you don’t feel like programming PHP, you can also utilize Web Services. In general, I found very little utility for these, but that’s just me and my use cases.

YQL

The YQL Web Service enables applications to query, filter, and combine data from different sources across the Internet. YQL statements have a SQL-like syntax, familiar to any developer with database experience.

ScraperWiki

ScraperWiki’s external interface allows you to extract data in the form you want for use on the web or in your own applications. You can also extract information about the state of any scraper.

Regular Expressions

Last and least recommended, you can extract data from HTML with Regular Expressions. In general using Regular Expressions on HTML is discouraged.

Most of the snippets you will find on the web to match markup are brittle. In most cases they only work for a very particular piece of HTML. Tiny markup changes, like adding a space somewhere, can make the Regex fail when it’s not properly written. You should know what you are doing before using Regex on HTML.

HTML parsers already know the syntactical rules of HTML. Regular expressions have to be taught them with each new Regex you write. Regex are fine in some cases, but it really depends on your use case.

You can write more reliable parsers, but writing a complete and reliable custom parser with Regular Expressions is a waste of time when the aforementioned libraries already exist and do a much better job at this.
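
To make the brittleness concrete, here is a small sketch. The pattern and the two markup strings are invented for the example: the regex works on the first string and silently stops matching once an extra space and attribute appear, while a DOM based helper (hrefsViaDom() is a hypothetical name) handles both.

```php
<?php
// A typical copy-and-paste pattern: it assumes href is the first and only attribute.
$pattern = '/<a href="([^"]*)">/';

$original = '<p><a href="/docs">Docs</a></p>';
$changed  = '<p><a  class="nav" href="/docs">Docs</a></p>';   // extra space + attribute

$regexOnOriginal = preg_match($pattern, $original) === 1;
$regexOnChanged  = preg_match($pattern, $changed) === 1;

// Hypothetical helper: the parser already knows how attributes and whitespace work.
function hrefsViaDom(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $hrefs[] = $a->getAttribute('href');
    }
    return $hrefs;
}

$domOnOriginal = hrefsViaDom($original);
$domOnChanged  = hrefsViaDom($changed);
```

The pattern could of course be patched for this particular change, but that is exactly the "teaching" step that has to be repeated for every new variation.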

Also see Parsing Html The Cthulhu Way

Books

If you want to spend some money, have a look at

PHP Architect’s Guide to Web Scraping with PHP

I am not affiliated with PHP Architect or the authors.

NOTE: This answer was originally posted at StackOverflow.com by Gordon

Pamela answered 15 years ago
last active 13 years ago

0 votes

13 answers

58k views

Can I install/update wordpress plugins without providing ftp access?

wordpress wordpress-plugin

I am using wordpress on my live server which only uses sftp using an ssh key. I want to install...

Russell asked 17 years ago
last active 17 years ago

1 vote

1 answer

19k views

retrieving all users and their reputation

Support Questions

How can I retrieve all users and their reputation into the content of my page template?

G asked 13 years ago
last active 13 years ago

0 votes

2 answers

20k views

Is this a suitable platform to run a peer review system for academia ?

PDO

Note that when using PDO to access a MySQL database, real prepared statements are not used by default. To fix this you have to disable the emulation of prepared statements. An example of creating a connection using PDO is:

$dbConnection = new PDO('mysql:dbname=dbtest;host=127.0.0.1;charset=utf8', 'user', 'pass');

$dbConnection->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
$dbConnection->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

In the above example the error mode isn’t strictly necessary, but it is advised to add it. This way the script will not stop with a Fatal Error when something goes wrong, and the developer gets the chance to catch any errors, which are thrown as PDOExceptions.

What is mandatory, however, is the first setAttribute() line, which tells PDO to disable emulated prepared statements and use real prepared statements. This makes sure the statement and the values aren’t parsed by PHP before being sent to the MySQL server (giving a possible attacker no chance to inject malicious SQL).
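
As a minimal sketch of what catching such an error looks like: this uses an in-memory SQLite database instead of the MySQL DSN above so that it runs without a server (an assumption for illustration, it needs the pdo_sqlite extension), and missing_table is a made-up name.

```php
<?php
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$errorMessage = null;
try {
    // The table does not exist, so PDO throws instead of failing silently.
    $db->query('SELECT * FROM missing_table');
} catch (PDOException $e) {
    // The script keeps running and can log or handle the problem here.
    $errorMessage = $e->getMessage();
}
```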

Although you can set the charset in the options of the constructor it’s important to note that ‘older’ versions of PHP (< 5.3.6) silently ignored the charset parameter in the DSN.

Explanation

What happens is that the SQL statement you pass to prepare is parsed and compiled by the database server. By specifying parameters (either a ? or a named parameter like :name in the example above) you tell the database engine what you want to filter on. Then when you call execute the prepared statement is combined with the parameter values you specify.

The important thing here is that the parameter values are combined with the compiled statement, not a SQL string. SQL injection works by tricking the script into including malicious strings when it creates SQL to send to the database. So by sending the actual SQL separately from the parameters you limit the risk of ending up with something you didn’t intend. Any parameters you send when using a prepared statement will just be treated as strings (although the database engine may do some optimization so parameters may end up as numbers too, of course). In the example above, if the $name variable contains 'Sarah'; DELETE FROM employees the result would simply be a search for the string "'Sarah'; DELETE FROM employees", and you will not end up with an empty table.
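
Here is a small sketch of that scenario. The employees table, its contents and the SELECT statement are assumptions made for illustration, and an in-memory SQLite database stands in for MySQL so it can be run as is.

```php
<?php
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE employees (name TEXT)');
$db->exec("INSERT INTO employees (name) VALUES ('Sarah')");
$db->exec("INSERT INTO employees (name) VALUES ('Tom')");

// Hostile input that would be dangerous if concatenated into the SQL string.
$name = "'Sarah'; DELETE FROM employees";

// The SQL is prepared first; the value is sent separately as a parameter.
$stmt = $db->prepare('SELECT * FROM employees WHERE name = :name');
$stmt->execute([':name' => $name]);
$matches = $stmt->fetchAll(PDO::FETCH_ASSOC);

// No employee has that literal name, and nothing was deleted.
$remaining = (int) $db->query('SELECT COUNT(*) FROM employees')->fetchColumn();
```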

Another benefit with using prepared statements is that if you execute the same statement many times in the same session it will only be parsed and compiled once, giving you some speed gains.

Oh, and since you asked about how to do it for an insert, here’s an example (using PDO):

$preparedStatement = $db->prepare('INSERT INTO table (column) VALUES (:column)');

$preparedStatement->execute(array(':column' => $unsafeValue));
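
A slightly longer sketch of the same idea, showing the statement being prepared once and executed several times. The notes table and the sample values are made up, and SQLite in memory is used only so the example runs without a MySQL server.

```php
<?php
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE notes (body TEXT)');

// Prepared once...
$insert = $db->prepare('INSERT INTO notes (body) VALUES (:body)');

// ...executed many times, each time with a different (possibly hostile) value.
$unsafeValues = ['plain', "O'Reilly", "x'); DROP TABLE notes; --"];
foreach ($unsafeValues as $unsafeValue) {
    $insert->execute([':body' => $unsafeValue]);
}

// Every value is stored verbatim; quotes and SQL keywords are just data.
$stored = $db->query('SELECT body FROM notes ORDER BY rowid')->fetchAll(PDO::FETCH_COLUMN);
```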

NOTE: This answer was originally posted at StackOverflow.com by Theo

Dorothy answered 17 years ago
last active 13 years ago

0 votes

In reply to: How do you use bcrypt for hashing passwords in PHP?

This is awesome, my God

demo answered 13 years ago

1 vote

9 answers

50k views

Why does WordPress still use addslashes(), register_globals() and magic_quotes?

wordpress php security escaping global-variables

In order to gain more experience in Wordpress I delved into its code base to study its inner work...

Wayne asked 14 years ago
last active 11 years ago

